
New contributions wanted


Phil (Global Moderator)
July 06, 2018, 11:45:03 AM
Looking through wkHTMLtoPDF help requests, I've come up with some ideas for additional "contrib" utilities:

  • Add new page headers and footers, or replace existing ones, with content such as page numbering, a given date, and fixed text. It's probably too much trouble to try to extract header/footer text from the page content, although it might be possible based on relative text sizes (assuming larger text is a section heading). Existing page numbers might also be extracted before being overwritten (e.g., to move or reformat them).
  • Extend backgrounds, etc. to the bottom of the last page. This comes from a request to carry a body background color all the way to the end of the last page, even if the text content ends partway down.
  • Find presumed section headings (based on relative font size), and clean up orphans (e.g., a heading left at the bottom of a page) by moving some content to the next page. This would of course have a cascading effect on all of the content after it.
  • Clean up known problems in output from other packages such as wkHTMLtoPDF, for example improper splitting of tables (a break in the middle of a line of text, a thead immediately before a page break, etc.). If clear patterns can be discovered, such as a line at the bottom of one page repeated at the top of the next, this might be feasible, although it would be easy to go down a rabbit hole with something like this!
  • Reflow a document to new page sizes and margins. This requires being able to recognize paragraphs.
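As a rough illustration of the relative-font-size heuristic for spotting section headings (and a heading orphaned at the bottom of a page), here's a minimal Python sketch. It assumes the text has already been extracted into runs carrying a font size and a baseline y position; the `TextRun` class, the function names, and the 1.2 size ratio are all hypothetical choices for illustration, not part of any existing library. Ideally the body size would be computed over the whole document rather than a single page.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TextRun:
    """A run of extracted text (hypothetical structure for this sketch)."""
    text: str
    size: float  # font size in points
    y: float     # baseline, in PDF units up from the page bottom


def body_size(runs):
    """Guess the body-text size: the font size covering the most characters."""
    weight = Counter()
    for run in runs:
        weight[run.size] += len(run.text)
    return weight.most_common(1)[0][0]


def presumed_headings(runs, base, ratio=1.2):
    """Runs at least `ratio` times the body size are presumed to be headings."""
    return [run for run in runs if run.size >= ratio * base]


def ends_with_orphaned_heading(page_runs, base, ratio=1.2):
    """True if the lowest text on a page is a presumed heading, i.e. nothing
    follows the heading on this page."""
    if not page_runs:
        return False
    lowest = min(page_runs, key=lambda run: run.y)  # smallest y = lowest on page
    return lowest.size >= ratio * base


runs = [
    TextRun("Introduction", 18, 700),
    TextRun("Some body text", 11, 680),
    TextRun("Next Section", 16, 80),
]
base = body_size(runs)
print(ends_with_orphaned_heading(runs, base))  # True
```

A heading flagged this way would be a candidate to move to the next page, with the cascading effect mentioned above.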
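The "line at the bottom of one page repeated at the top of the next" pattern could be checked along these lines. This is only a sketch: it assumes each page has already been reduced to a list of text lines in top-to-bottom order, with running headers and footers stripped (otherwise the first line of every page would just be the header). The function names are hypothetical.

```python
def normalize(line):
    """Collapse runs of whitespace so minor spacing differences don't matter."""
    return " ".join(line.split())


def repeated_boundary_lines(pages):
    """Return the indexes i where the last line of page i reappears as the
    first line of page i + 1 (a hint that a table row was split or repeated).

    `pages` is a list of pages, each a list of text lines from top to bottom,
    assumed to have running headers/footers already removed.
    """
    hits = []
    for i in range(len(pages) - 1):
        this_page, next_page = pages[i], pages[i + 1]
        if not this_page or not next_page:
            continue  # nothing to compare on an empty page
        last, first = normalize(this_page[-1]), normalize(next_page[0])
        if last and last == first:
            hits.append(i)
    return hits


pages = [
    ["Name  Qty", "Widget  3", "Gadget  5"],
    ["Gadget 5", "Gizmo 2"],
]
print(repeated_boundary_lines(pages))  # [0]
```

Exact matching is deliberately strict; real wkHTMLtoPDF output might need fuzzier comparison, which is where the rabbit hole begins.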
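For reflowing, one crude way to recognize paragraphs is to look for unusually large vertical gaps between consecutive lines. A minimal sketch, assuming lines arrive as (baseline y, text) pairs in top-to-bottom order (PDF y coordinates decrease down the page); `group_paragraphs` and the 1.5 gap factor are hypothetical. It ignores indented first lines, hyphenation, and multi-column layouts, and the median gap only represents "normal" line spacing when most gaps fall within paragraphs.

```python
from statistics import median


def group_paragraphs(lines, gap_factor=1.5):
    """Group (y, text) lines into paragraphs by vertical spacing.

    `lines` must be in top-to-bottom order; in PDF coordinates y decreases
    down the page, so the gap between consecutive lines is y[i] - y[i + 1].
    A gap larger than `gap_factor` times the median gap starts a new paragraph.
    """
    if not lines:
        return []
    gaps = [lines[i][0] - lines[i + 1][0] for i in range(len(lines) - 1)]
    typical = median(gaps) if gaps else 0.0
    paragraphs = [[lines[0][1]]]
    for gap, (_, text) in zip(gaps, lines[1:]):
        if gap > gap_factor * typical:
            paragraphs.append([text])    # big gap: start a new paragraph
        else:
            paragraphs[-1].append(text)  # normal spacing: same paragraph
    # Join each paragraph's lines with spaces (hyphenation is not handled).
    return [" ".join(p) for p in paragraphs]
```

The resulting paragraph strings could then be re-set at a new page size and margins by whatever layout code does the reflow.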
Things like extracting pages and combining them into new documents might best be left to existing tools such as PDFtk, although something might be done about (manually) trimming unwanted leading and trailing content during extraction, and possibly reflowing what remains onto new pages.