New contributions wanted

Posted by Phil on July 06, 2018, 11:45:03 AM
From looking at wkHTMLtoPDF help requests, I have some ideas for additional "contrib" utilities:

  • Add new page headers and footers (such as page numbering, a given date, and fixed text), or replace existing ones. It's probably too much trouble to extract heading/footing text from the page content, although it might be possible based on relative text sizes (assuming noticeably larger text is a section heading). Existing page numbers might also be extracted before being overwritten (e.g., to move or reformat them). It might be better to have a human examine the page and designate where the existing headers and footers are, so they can be cleanly removed before new ones are added. Chapter and section text in the replacement header/footer would probably have to be added manually for each page range.
  • Extend the background, etc. to the bottom of the last page. This comes from a request to carry a body background color all the way to the end of the last page, even if the text content ends partway down. The same goes for background watermarks, images, etc., which may be incomplete on the existing last page and would have to be copied from a previous page.
  • Find presumed section headers (based on relative font size), and clean up orphans by moving some content to the next page. This would of course have a cascading effect on the content of all following pages. It might be better to leave the selection of new page breaks to a human user.
  • Clean up known problems left by other packages such as wkHTMLtoPDF, for example improper splitting of tables (a break in the middle of a line of text, a thead left immediately before a page break, etc.). If clear patterns can be discovered, such as a line at the bottom of one page being repeated at the top of the next, this might be feasible, although it would be easy to go down a rabbit hole with something like this! Again, it's probably best to mark these up manually, to move the page break to the desired location. Don't forget that table borders (outlines) would need to be shortened or extended to match.
  • Reflow a document to new page sizes and margins. This requires being able to recognize paragraphs. Recognition could rely on extra vertical space between paragraphs, first-line indentation, or short last lines, with manual cleanup for any cases that are missed.
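As a rough illustration of the paragraph-recognition idea in the last item, here is a minimal Python sketch. It assumes the text lines of one single-column page have already been extracted by some other tool into a hypothetical `Line` record (text, left edge, baseline position increasing down the page, line width, font size). It starts a new paragraph on extra vertical space, a first-line indent, or a short preceding line, and also treats noticeably larger text as a presumed heading, per the relative-font-size idea above. All thresholds (1.5 × the median line gap, 2 pt of indent, 80% of the widest line, 1.2 × the median font size) are arbitrary guesses that would need tuning, and missed cases would still need manual cleanup.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    """Hypothetical record for one extracted line of text (not from any real library)."""
    text: str
    x: float          # left edge of the line, in points
    y: float          # baseline position, in points, increasing down the page
    width: float      # width of the line's text, in points
    font_size: float


def is_heading(line: Line, body_size: float, ratio: float = 1.2) -> bool:
    """Presume a line is a section heading if its font is noticeably larger than body text."""
    return line.font_size >= body_size * ratio


def split_paragraphs(lines, vspace_factor=1.5, indent_tol=2.0, short_frac=0.8):
    """Group lines (in reading order) into paragraphs using simple layout heuristics."""
    if not lines:
        return []
    body_size = median(l.font_size for l in lines)       # typical body font size
    gaps = [b.y - a.y for a, b in zip(lines, lines[1:])]
    normal_gap = median(gaps) if gaps else 0.0           # typical line-to-line spacing
    left = min(l.x for l in lines)                       # assumed left margin
    full_width = max(l.width for l in lines)             # assumed full line width

    paras = [[lines[0]]]
    for prev, cur in zip(lines, lines[1:]):
        new_para = (
            cur.y - prev.y > normal_gap * vspace_factor   # extra vertical space
            or cur.x - left > indent_tol                  # first-line indent
            or prev.width < full_width * short_frac       # short last line before it
            or is_heading(cur, body_size)                 # heading starts its own block
            or is_heading(prev, body_size)                # text after a heading is new
        )
        if new_para:
            paras.append([cur])
        else:
            paras[-1].append(cur)
    # Naively join each paragraph's lines with spaces (no de-hyphenation).
    return [" ".join(l.text for l in p) for p in paras]


if __name__ == "__main__":
    sample = [
        Line("Chapter One", 72, 100, 120, 16),
        Line("Lorem ipsum dolor sit", 72, 124, 450, 10),
        Line("amet consectetur.", 72, 136, 200, 10),
        Line("Second paragraph", 72, 148, 450, 10),
        Line("ends here.", 72, 160, 150, 10),
    ]
    for para in split_paragraphs(sample):
        print(para)
```

Lines within a paragraph are simply joined with spaces; hyphenated line ends, multiple columns, lists, and centered text are not handled, which is where the manual cleanup would come in.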
Things like extracting pages and combining them into new documents might best be left to existing tools such as PDFtk, although a utility could help with (manually) trimming unwanted leading and trailing content during extraction, and possibly reflowing what remains onto new pages.
Last edit: September 12, 2021, 08:27:53 AM by Phil