Wed Apr 29 09:38:07 2020 PMPERRY@cpan.org - Correspondence added
I see that your code uses both "openpage" and "import_page" calls. Note that problems have been reported with both of those sometimes corrupting read-in PDFs (bugs 130722 and 130805). One of them involved already-corrupted PDFs being read in -- when you try to use a reader (such as Adobe) to read the source PDFs, does it ask to SAVE the PDF when you exit the reader? If so, it cleaned up some original damage to the file! Anyway, you might want to read those two bug reports and see if anything rings a bell.
Wed Apr 29 13:58:40 2020 mike.edwards@ceprinter.com - Correspondence added
It seems that it is less about which machine it's run on. I tried the same script on the same set of PDFs two different times on the same machine and one set was corrupt and the other was fine. I did get a good working set from 27,982 source PDFs on the newer box, though. So now I'm at somewhat of a loss, but let me see if I can answer your questions in order:
1. The version on the older box is 2.033-0 and was installed from CPAN. The other box runs 2.033-1, installed from the apt repo.
2. The fonts are identical, copied from one to the other.
3. I have not yet tried PDF::Builder as that would be time-consuming. If I get a chance, I will try it out though.
4. The module was installed as described above.
5. The source files are apparently clean, but I have not opened all of them. (I am not prompted to save them when opening in Acrobat Pro on my Windows workstation.)
6. The fonts are in the source PDFs as embedded subsets.
7. Source PDFs are all version 1.6 but do not have any form elements or anything funky.
I looked at those bug reports and they do not seem to apply here.
I appreciate your help.
Mike
Wed Apr 29 20:22:12 2020 PMPERRY@cpan.org - Correspondence added
If I understand you, you are not getting consistent pass or failure for given inputs on the same box? That's rather unsettling. One thing you expect computers to do is produce the same results (correct or not) consistently for the same inputs. I don't think I've seen this behavior before for this software. I think that PDF::API2 still adds a timestamp to font object names (PDF::Builder had that removed), but offhand I can't think of how that would cause the time-varying results you're seeing.
Version 2.033 is a bit long in the tooth (3 years old this July). The current version is 2.037. You might want to think about updating to current, and if that doesn't cure it, giving PDF::Builder a try (version 3.018 was just released). All you should have to do is change all occurrences of "PDF::API2" in your program to "PDF::Builder", and it should work.
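If it helps, the rename described above is mechanical enough to script. This is only a rough sketch in Python, not something from either module: the helper name switch_to_builder and the sample Perl lines are made up for illustration, and it assumes a blind textual swap is acceptable for your script. Review the result by hand before running it.

```python
def switch_to_builder(perl_source):
    """Return (new_source, count) after swapping the module name.

    Assumption: every literal "PDF::API2" in the text should become
    "PDF::Builder", including any longer names that start with it.
    """
    count = perl_source.count("PDF::API2")
    return perl_source.replace("PDF::API2", "PDF::Builder"), count


if __name__ == "__main__":
    # Hypothetical two-line Perl script, used only to show the effect.
    sample = "use PDF::API2;\nmy $pdf = PDF::API2->open('in.pdf');\n"
    converted, replaced = switch_to_builder(sample)
    print(f"{replaced} occurrence(s) replaced")
    print(converted, end="")
```

Writing the converted text to a new file rather than over the original keeps the PDF::API2 version around for comparison.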
I hope to give your code a try soon and see if I can replicate any problem.
Wed Apr 29 20:31:15 2020 PMPERRY@cpan.org - Correspondence added
> Source PDFs are all version 1.6 but do not have any form elements or anything funky.
Hmm. There are lots of things that could give PDF::API2 indigestion that are not forms or anything especially funky. PDF::API2 is quite well behaved up to 1.4, but beyond that, who knows.
Perhaps you could make copies of some offending PDFs and use an editor (such as Vim or Notepad++) to change the version number from 1.6 to 1.4. Then try to read each edited copy into something like Acrobat, and see if there are any complaints. If not, the file might really be 1.4 or lower, but if there are errors, you have a 1.5 or higher item in the PDF that PDF::API2 may be SILENTLY choking on. PDF::Builder may be a little better behaved concerning such things, and can tell you if it's unhappy about some things.
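If editing by hand gets tedious, the header change could be scripted along these lines. This is a minimal sketch in Python under a few assumptions: the function name relabel_pdf_version is made up, the "%PDF-x.y" header is assumed to sit within the first 1024 bytes, and only the header is touched. As far as I know, a /Version entry in the document catalog can override the header, and this sketch does not look for one.

```python
import re
from pathlib import Path

# The header looks like "%PDF-1.6"; group 1 captures the "1.6" part.
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def relabel_pdf_version(src, dst, new_version=b"1.4"):
    """Copy src to dst with the header version replaced.

    Returns the old version as bytes, or None (and writes nothing)
    if no header is found in the first 1024 bytes.
    """
    if not re.fullmatch(rb"\d\.\d", new_version):
        raise ValueError("new_version must look like b'1.4'")
    data = Path(src).read_bytes()
    match = HEADER_RE.search(data, 0, 1024)
    if match is None:
        return None
    start, end = match.span(1)
    # Old and new versions are both three bytes, so the file length and
    # every byte offset recorded in the cross-reference table stay valid.
    Path(dst).write_bytes(data[:start] + new_version + data[end:])
    return match.group(1)
```

The source file is never modified; only the copy at dst gets the new header, so the original stays available for comparison.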