
[RT 132446] Corrupt fonts


Offline Phil

  • Global Moderator
  • Hero Member
  • *****
  • 687
    • View Profile
[RT 132446] Corrupt fonts
« April 28, 2020, 06:57:30 PM »
Tue Apr 28 17:03:19 2020 mike.edwards@ceprinter.com - Ticket created [Reply] [Forward]
Subject:    Corrupt fonts
Date:    Tue, 28 Apr 2020 21:03:09 +0000
To:    "bug-PDF-API2@rt.cpan.org" <bug-PDF-API2@rt.cpan.org>
From:    Mike Edwards <mike.edwards@ceprinter.com>

Not sure if this is a bug, per se, but I am getting different results on two different Ubuntu Linux boxes...

My project involves taking many individual PDFs, combining them into one, and adding a 2D barcode to every other page as I go. This works just fine on one machine (Ubuntu 16.04), and does not throw errors on my other machine (Ubuntu 18.04) during run-time, but the end result gives errors in Acrobat, such as "The font 'OFCCJF+MyriadPro-Bold' contains bad /Flags", "...contains bad /Widths", etc., with many glyphs being scrambled. (The source PDFs that I am combining do not exhibit this behavior.) I am guessing this is an issue with embedded fonts and the environment in which PDF::API2 finds the fonts to embed, but I'm not sure. The font paths on both machines contain the fonts that are crashing, but I must be missing something. Is there a way to force a font to be explicitly and fully embedded in the final PDF that gets written out?

I am somewhat stymied as I have been running this same code (with minor tweaks) for a few years on this annual project but haven't figured this out yet. I could run everything on the box I know that works, but am trying to spread out the load.

Thanks,
Mike



--

Mike Edwards

2700 Bell Avenue | Des Moines, IA
w: 515-280-9765

www.cprinters.com



Tue Apr 28 18:54:05 2020 PMPERRY@cpan.org - Correspondence added

Is MyriadPro being used for the Barcode, or is it for other text being added? Do all source PDFs have fully embedded fonts, or are some expecting to see local font files? I assume that you've carefully checked where this font is used, and whether it's embedded, and whether its use is added during the combine process.

It sounds like it would be difficult to provide a small sample case to show what's going on. If that's the case, can you try using PDF::Builder instead, and see if it behaves any differently from PDF::API2? It's worth a try if nothing else works. You can install it right alongside PDF::API2, and change your program to use PDF::Builder instead of PDF::API2.

If you can't or won't do that, but don't want to publicly provide source PDFs and code, you can contact me offline to provide the materials on a confidential basis. I can try it on both PDF::API2 and PDF::Builder. I can only test on Windows, so there's no guarantee that I can reproduce your Linux problem.
« Last Edit: April 28, 2020, 07:03:54 PM by Phil »

Re: [RT 132446] Corrupt fonts
« Reply #1: April 29, 2020, 09:25:29 AM »
Wed Apr 29 08:32:39 2020 mike.edwards@ceprinter.com - Correspondence added

MyriadPro is in the source PDFs but is not being injected as part of my script (see attached). In looking at the properties of the source PDFs, the fonts are embedded subsets. The only font being used in the script is Helvetica-Condensed. It doesn't look like PDF::Builder supports the creation of datamatrix (2D) barcodes, but I could be wrong.

Wed Apr 29 09:19:28 2020 PMPERRY@cpan.org - Correspondence added
Quote
It doesn't look like PDF::Builder supports the creation of datamatrix (2D) barcodes, but I could be wrong.

Neither PDF::API2 nor PDF::Builder natively supports datamatrix barcodes, but if you can use Barcode::DataMatrix with PDF::API2, it should work OK with PDF::Builder.

I took a very quick skim through Barcode::DataMatrix, and it doesn't appear to be pulling in any fonts. As you said, your code only adds Helvetica Condensed. If your code works fine on one machine, and fails on another, there must be some critical difference between the two configurations. Have you checked that your PDF::API2 is up to date (not relying on some other system to keep it updated)? Have you confirmed that the Myriad fonts are identical on the two systems? Have you tried PDF::Builder, just to see if it works more consistently (on both systems)?
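One way to confirm that the font files on two machines really are identical is to generate a checksum manifest on each box and diff the two listings. The following is only a sketch: the function name `font_manifest` and the list of font file extensions are assumptions made for illustration, not anything from the ticket or from PDF::API2.

```python
import hashlib
import sys
from pathlib import Path

# Assumed set of font file extensions; adjust to whatever the font path holds.
FONT_SUFFIXES = {".otf", ".ttf", ".pfb", ".afm"}


def font_manifest(font_dir):
    """Map each font file's path (relative to font_dir) to its SHA-256 digest."""
    root = Path(font_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in FONT_SUFFIXES:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(root))] = digest
    return manifest


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one "digest  relative/path" line per font file.
    for name, digest in font_manifest(sys.argv[1]).items():
        print(f"{digest}  {name}")
```

Running it against the same font directory on both machines and comparing the two outputs would show any file that differs or is missing. Note that this only compares the font files on disk; it says nothing about the subsets already embedded in the source PDFs.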

One Ubuntu system is 2 years older than the other, which itself is almost 2 years old. Did PDF::API2 come with these install images, or did you install it? Have you double-checked your PDF source files to make sure they are clean as a whistle and not subtly corrupted themselves? And that the MyriadPro fonts are actually embedded and not expected to be on the Reader machine?

Your first post asked about embedding fonts. Only TrueType/OpenType fonts get embedded, as far as I can tell. And that's only when you use ttfont(), not corefont(). The default IS to embed the subset of glyphs used. I presume your Helvetica Condensed Type1 (PS) is going to require that font be present on the Reader machine.

The worst case scenario would be that some versions of PDF::API2 are subtly corrupting embedded fonts while incorporating them into a combined PDF. PDF::Builder uses very similar code, and may suffer from the same problem. I will try your code to combine some of my PDFs, but no promises that I can replicate it.

By the way, PDF::API2 supports up to PDF 1.4. Any 1.5 or up features MIGHT get corrupted -- that's been seen before. Check your input PDFs to see what version they are.
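For a quick check of the declared version across many input files, the `%PDF-x.y` header at the start of each file can be read directly. This is a minimal sketch with a hypothetical helper name (`pdf_header_version`). It reports only the header value; a `/Version` entry in the document catalog may override the header, and this sketch does not look for it.

```python
import re
import sys

HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_header_version(path):
    """Return the version string from the %PDF-x.y header, or None if absent."""
    with open(path, "rb") as fh:
        head = fh.read(1024)  # the header is expected near the start of the file
    match = HEADER_RE.search(head)
    return match.group(1).decode("ascii") if match else None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the declared version for every file named on the command line.
    for name in sys.argv[1:]:
        print(f"{pdf_header_version(name)}\t{name}")
```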

Re: [RT 132446] Corrupt fonts
« Reply #2: April 29, 2020, 08:33:45 PM »
Wed Apr 29 09:38:07 2020 PMPERRY@cpan.org - Correspondence added

I see that your code uses both "openpage" and "import_page" calls. Note that problems have been reported with both of those sometimes corrupting read-in PDFs (bugs 130722 and 130805). One of them involved already-corrupted PDFs being read in -- when you try to use a reader (such as Adobe) to read the source PDFs, does it ask to SAVE the PDF when you exit the reader? If so, it cleaned up some original damage to the file! Anyway, you might want to read those two bug reports and see if anything rings a bell.

Wed Apr 29 13:58:40 2020 mike.edwards@ceprinter.com - Correspondence added

It seems that it is less about which machine it's run on. I tried the same script on the same set of PDFs two different times on the same machine and one set was corrupt and the other was fine. I did get a good working set from 27,982 source PDFs on the newer box, though. So now I'm at somewhat of a loss, but let me see if I can answer your questions in order:

  1.  The version on the older box is 2.033-0 and was installed from cpan. The other box runs 2.033-1, installed from apt repo.
  2.  The fonts are identical, copied from one to the other.
  3.  I have not yet tried PDF::Builder as that would be time-consuming. If I get a chance, I will try it out though.
  4.  The module was installed as described above.
  5.  The source files are apparently clean, but I have not opened all of them. (I am not prompted to save them when opening in Acrobat Pro on my Windows workstation.)
  6.  The fonts are in the source PDFs as embedded subsets.
  7.  Source PDFs are all version 1.6 but do not have any form elements or anything funky.
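Rather than opening every source file in Acrobat, the subset-embedded fonts can be listed with a rough byte scan. Subset font names carry a six-uppercase-letter tag and a plus sign, as in the 'OFCCJF+MyriadPro-Bold' from the error message. The sketch below (the helper name `subset_font_names` is hypothetical) only sees font dictionaries stored uncompressed; files that keep them inside compressed object streams will show nothing, so an empty result proves little.

```python
import re
import sys

# Matches e.g. "/BaseFont /OFCCJF+MyriadPro-Bold" or "/FontName/ABCDEF+Foo".
SUBSET_FONT_RE = re.compile(
    rb"/(?:BaseFont|FontName)\s*/([A-Z]{6}\+[^\s/<>\[\]()]+)"
)


def subset_font_names(path):
    """Return the sorted, de-duplicated subset font names visible in the raw bytes."""
    with open(path, "rb") as fh:
        data = fh.read()
    return sorted({name.decode("latin-1") for name in SUBSET_FONT_RE.findall(data)})


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in sys.argv[1:]:
        print(name, subset_font_names(name))
```

Comparing the list from a source PDF with the list from the combined output could show whether a font name was changed or lost during the merge.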

I looked at those bug reports and they do not seem to apply here.

I appreciate your help.

Mike

Wed Apr 29 20:22:12 2020 PMPERRY@cpan.org - Correspondence added

If I understand you, you are not getting consistent pass or failure for given inputs on the same box? That's rather unsettling. One thing you expect computers to do is produce the same results (correct or not) consistently for the same inputs. I don't think I've seen this behavior before for this software. I think that PDF::API2 still adds a timestamp to font object names (PDF::Builder had that removed), but offhand I can't think of how that would cause the time-varying results you're seeing.

Version 2.033 is a bit long in the tooth (3 years old this July). The current version is 2.037. You might want to think about updating to current, and if that doesn't cure it, giving PDF::Builder a try (version 3.018 was just released). All you should have to do is change all occurrences of "PDF::API2" in your program to "PDF::Builder", and it should work.
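The rename described above is a plain text substitution, so it can be tried on a copy of the script without touching the original. A minimal sketch, assuming the program is a single Perl source file; the function name `switch_to_builder` and the command-line usage are invented for illustration.

```python
import re
import sys
from pathlib import Path


def switch_to_builder(source_text):
    """Replace every whole-word occurrence of PDF::API2 with PDF::Builder."""
    return re.sub(r"\bPDF::API2\b", "PDF::Builder", source_text)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical): python switch.py combine.pl combine_builder.pl
    original = Path(sys.argv[1]).read_text()
    Path(sys.argv[2]).write_text(switch_to_builder(original))
```

Because the result goes to a second file, the PDF::API2 and PDF::Builder versions can be run side by side on the same inputs.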

I hope to give your code a try soon and see if I can replicate any problem.

Wed Apr 29 20:31:15 2020 PMPERRY@cpan.org - Correspondence added

Quote
Source PDFs are all version 1.6 but do not have any form elements or anything funky.

Hmm. There are lots of things that could give PDF::API2 indigestion that are not forms or anything especially funky. PDF::API2 is quite well behaved up to 1.4, but beyond that, who knows.

Perhaps you could make copies of some offending PDFs and use an editor (such as ViM or Notepad++) to change the version number from 1.6 to 1.4. Then try to read it into something like Acrobat, and see if there are any complaints. If not, it might really be 1.4 or lower, but if there are errors, you have a 1.5 or higher item in the PDF that PDF::API2 may be SILENTLY choking on. PDF::Builder may be a little better behaved concerning such things, and can tell you if it's unhappy about some things.
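The same header edit can be scripted so that it always works on a copy and never on the original. This is a sketch under the assumption that the header has the usual `%PDF-x.y` form with single-digit major and minor numbers; the function name `copy_with_header_version` is hypothetical. The replacement is the same length as the original header, so byte offsets elsewhere in the file are not shifted.

```python
import re
import shutil
import sys

HEADER_RE = re.compile(rb"%PDF-\d\.\d")


def copy_with_header_version(src, dst, version="1.4"):
    """Copy src to dst, then overwrite the %PDF-x.y header in dst with `version`."""
    new_header = b"%PDF-" + version.encode("ascii")
    shutil.copyfile(src, dst)
    with open(dst, "r+b") as fh:
        head = fh.read(1024)
        match = HEADER_RE.search(head)
        if match is None:
            raise ValueError(f"no %PDF-x.y header found in {src}")
        if len(new_header) != match.end() - match.start():
            raise ValueError("replacement header must be the same length")
        fh.seek(match.start())
        fh.write(new_header)  # in-place, same-length overwrite
    return dst


if __name__ == "__main__" and len(sys.argv) == 3:
    copy_with_header_version(sys.argv[1], sys.argv[2])
```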

Re: [RT 132446] Corrupt fonts
« Reply #3: May 01, 2020, 07:50:18 PM »
Thu Apr 30 14:26:11 2020 mike.edwards [...] ceprinter.com - Correspondence added

Yeah, stupid computers.

Heh. I changed a sample PDF to 1.4 as you suggested and Acrobat did not complain at all when it opened it. I've been doing this annual project for several years with PDF::API2 and have had a similar issue with font rendering but always worked it out--I just never documented the solution.

I ran a larger set (138,615) overnight last night on the newer machine, and they seemed to build OK. Unfortunately, I've lost track of the different things I've tried to resolve this, but maybe I'm OK now and this is really a non-issue.

Thank you for lending an ear. I will try to give PDF::Builder a shot. What sets it apart?

Thu Apr 30 17:57:31 2020 PMPERRY [...] cpan.org - Correspondence added

Quote
Heh. I changed a sample PDF to 1.4 as you suggested and Acrobat did not complain at all when it opened it.
So -- at least that PDF -- is probably actually PDF 1.4 and not using 1.5+ features. Most PDF-production programs just set their PDF version to the highest needed for features they support, rather than actually keeping track of features used. Nevertheless, PDF::API2 shouldn't have trouble with a PDF with "too high" a version number.
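A complementary check is to look for structures that did not exist before PDF 1.5. Object streams and cross-reference streams were both introduced in 1.5, and their stream dictionaries are normally stored uncompressed, so a byte scan can spot them. The sketch below is a rough heuristic with hypothetical names (`MARKERS`, `find_post_14_markers`); it covers only these two features, so an empty result does not prove the file is free of other post-1.4 features.

```python
import re
import sys

# Two structures introduced in PDF 1.5; this list is deliberately incomplete.
MARKERS = {
    "object stream (PDF 1.5)": re.compile(rb"/Type\s*/ObjStm"),
    "cross-reference stream (PDF 1.5)": re.compile(rb"/Type\s*/XRef"),
}


def find_post_14_markers(path):
    """Return the labels of any PDF 1.5 structures found in the raw file bytes."""
    with open(path, "rb") as fh:
        data = fh.read()
    return [label for label, pattern in MARKERS.items() if pattern.search(data)]


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in sys.argv[1:]:
        print(name, find_post_14_markers(name))
```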

Quote
I ran a larger set (138,615) overnight last night on the newer machine, and they seemed to build OK.
Just to avoid confusion, is this with unmodified PDFs (i.e., not reset to version 1.4)? So you just can't consistently reproduce the problem? This is one of those phase-of-the-Moon correlations with problems? Did you ever upgrade to PDF::API2 2.037?

Quote
Thank you for lending an ear. I will try to give PDF::Builder a shot. What sets it apart?
Friends, Romans, countrymen,...

Builder is a fork of API2 (about 4 years ago) and has mostly the same functionality (in fact, a superset). I've fixed a number of open bugs, made enhancements to TIFF and PNG image handling, and just added support for HarfBuzz complex script hacking (ligatures, kerning, cursive/connected scripts, RTL support). I've made changes to handle read-in PDFs greater than version 1.4 a little better (I feel). Other than that, it's still quite compatible with API2.

Re: [RT 132446] Corrupt fonts
« Reply #4: May 05, 2020, 07:46:33 PM »
Mon May 04 21:19:13 2020 PMPERRY [...] cpan.org - Correspondence added

On Thu Apr 30 14:26:11 2020, mike.edwards@ceprinter.com wrote:
Quote
I will try to give PDF::Builder a shot.

FYI, I just added INFO/CONVERSION to PDF::Builder's GitHub repository (for eventual release in 3.019), to assist if you run into any troubles trying to change over to Builder. It mostly pertains to removed deprecated functions and some minor incompatibilities in results. Still, for the most part, you should find Builder to be a superset of API2's functionality. Best wishes!