RT 121832 - Invalid PDF file in test suite

*

Offline Phil

  • Global Moderator
  • Sr. Member
  • *****
  • 387
    • View Profile
RT 121832 - Invalid PDF file in test suite
« May 23, 2017, 12:35:38 PM »
Tue May 23 09:40:43 2017 futuramedium [...] yandex.ru - Ticket created
Subject:    Invalid PDF file in test suite

"sample-xrefstm.pdf" can't be read by any PDF viewer I tried. Line 12 should contain "/N 2" (not "1"): /N is the number of objects in the object stream. (Ehm, what does the test that uses this PDF actually test, if the file is broken? :)

After that change the file can be viewed OK with Ghostscript, Firefox, etc. It is still not good enough for Adobe Reader, but that looks like a Reader issue: it seems to refuse uncompressed object streams and/or xref streams (I'm not sure which; it doesn't matter).

Also, MediaBox is required, and though most software forgives its absence, it's probably better if the test suite contains 100% valid PDFs.
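
For anyone who wants to check an /N value by hand: a decoded object stream begins with pairs of integers (object number, then byte offset), one pair per object, and /First gives the length of that header. A minimal sketch of the count follows. The helper name `count_objstm_entries` and the sample data are hypothetical and are not taken from the test file.

```perl
use strict;
use warnings;

# Count the "objnum offset" pairs in the header of a decoded object stream.
# $data is the decoded stream content, $first is the /First value.
sub count_objstm_entries {
    my ( $data, $first ) = @_;
    my @ints = substr( $data, 0, $first ) =~ /(\d+)/g;
    die "odd number of integers in object stream header\n" if @ints % 2;
    return @ints / 2;
}

# Hypothetical uncompressed stream holding two objects (numbers 11 and 12).
my $data  = "11 0 12 11 << /A 1 >> << /B 2 >>";
my $first = 11;    # the header "11 0 12 11 " is 11 bytes long

my $n_declared = 1;    # a wrong /N value, as in the report above
my $n_found    = count_objstm_entries( $data, $first );
print "/N says $n_declared, header lists $n_found\n"
    if $n_declared != $n_found;
```

This only looks at the header pairs; it does not check that the offsets point at well-formed objects.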

*

Offline Phil

Re: RT 121832 - Invalid PDF file in test suite
« Reply #1: May 24, 2017, 06:01:31 PM »
Tue May 23 22:41:32 2017 steve [...] deefs.net - Correspondence added

Quote
"sample-xrefstm.pdf" can't be read by any PDF viewer I tried. Line 12 should contain "/N 2" (not "1"): /N is the number of objects in the object stream. (Ehm, what does the test that uses this PDF actually test, if the file is broken? :)
Thanks -- I've updated the PDF.  To answer your question, this file is used to test parts of the object stream handling code, which is still pretty new.  It's not a perfect or complete test, but I figure some tests are better than no tests, which is where we started.  If it opens in PDF readers, that's a bonus, but isn't the most important thing.

Quote
it's probably better if the test suite contains 100% valid PDFs.
Agreed.  Do you know of any software or site that provides minimal, ASCII-charset sample PDFs that are suitable for testing individual parts of the PDF specification?  I haven't been able to find anything, so I'm writing/modifying them by hand to get the features I need for testing purposes.  I'd be happy not to have to do that.  :-)
#
Tue May 23 22:41:32 2017 The RT System itself - Status changed from 'new' to 'open'
#
Tue May 23 22:41:37 2017 steve [...] deefs.net - Status changed from 'open' to 'patched'

*

Offline Phil

Re: RT 121832 - Invalid PDF file in test suite
« Reply #2: May 24, 2017, 06:41:14 PM »
Tried the new PDF from GitHub (PDF::API2). It still fails to open, but t/02-xrefstm.t still completes all 4 tests successfully.

*

Offline Phil

Re: RT 121832 - Invalid PDF file in test suite
« Reply #3: May 25, 2017, 09:48:13 AM »
Wed May 24 18:01:59 2017 $_ = 'spro^^*%*^6ut# [...] &$%*c>#!^!#&!pan.org'; y/a-z.@//cd; print - Correspondence added

On Tue May 23 22:41:32 2017, SSIMMS wrote:
Quote
Quote
On Tue May 23 09:40:43 2017, vadimr wrote:
"sample-xrefstm.pdf" can't be read by any PDF viewer I tried. Line 12 should contain "/N 2" (not "1"): /N is the number of objects in the object stream. (Ehm, what does the test that uses this PDF actually test, if the file is broken? :)

Thanks -- I've updated the PDF. To answer your question, this file is used to test parts of the object stream handling code, which is still pretty new. It's not a perfect or complete test, but I figure some tests are better than no tests, which is where we started. If it opens in PDF readers, that's a bonus, but isn't the most important thing.
Quote
it's probably better if the test suite contains 100% valid PDFs.
Agreed. Do you know of any software or site that provides minimal, ASCII-charset sample PDFs that are suitable for testing individual parts of the PDF specification? I haven't been able to find anything, so I'm writing/modifying them by hand to get the features I need for testing purposes. I'd be happy not to have to do that. :-)
I am doing exactly the same thing with PDF::Tiny. Feel free to steal my test PDFs.
#
Thu May 25 05:49:46 2017 futuramedium [...] yandex.ru - Correspondence added
From:    futuramedium [...] yandex.ru

Quote
Do you know of any software or site that provides minimal, ASCII-charset sample PDFs that are suitable for testing individual parts of the PDF specification?
No, sorry, haven't seen such a collection.

The only reason I opened "sample-xrefstm.pdf" at all was that I was looking for a minimal PDF 1.5 file to report another issue. I'll describe that issue here instead of creating a new ticket. Strictly speaking, it's an Adobe issue, not PDF::API2's. Unfortunately, in the bubble I live in, files that Acrobat rejects are automatically considered invalid. Whether it should concern PDF::API2 is for you to decide :).

The (fixed) "sample-xrefstm.pdf" is not well suited for investigating this, because Acrobat rejects it "as is" from the start (which, as I said, is probably Adobe's issue as well).

But consider any other 1.5 file with an xref stream. Opening it with PDF::API2, making changes, and saving to a file produces an incrementally updated PDF in which the original xref stream is left intact and the xref section appended by PDF::API2 is a "classical" table. Nowhere in the specification is this prohibited, and other viewers are happy to open such files. But Acrobat either rejects them or tries to "fix" them, breaking them completely.

I think it should be mentioned in "known issues", at least.

A workaround could be to have "save as" rebuild the PDF completely instead of appending, i.e. not to update incrementally. I mean something like CAM::PDF::cleanoutput vs CAM::PDF::output. Then there would be a single "classical" xref table. Going further, an incremental update could append either a "classical" or a streamed xref section. I understand such changes can be difficult to implement.
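
To check whether a given file has the mixed layout described above (an older xref stream with a "classical" table appended on top), something like the following rough sketch may help. It uses no PDF library and only pattern-matches the raw bytes, so it can be fooled by unusual files; the function name `xref_section_kinds` is hypothetical.

```perl
use strict;
use warnings;

# Walk the chain of cross-reference sections in raw PDF bytes, newest first,
# and report whether each one is a classical table or an xref stream.
sub xref_section_kinds {
    my ($pdf) = @_;
    my ( @kinds, %seen );

    # The last "startxref" in the file points at the newest section.
    my ($offset) = $pdf =~ /.*startxref\s+(\d+)/s;

    while ( defined $offset and not $seen{$offset}++ ) {
        last if $offset >= length $pdf;
        my $section = substr $pdf, $offset;

        if    ( $section =~ /^xref\b/ )            { push @kinds, 'table' }
        elsif ( $section =~ /^\d+\s+\d+\s+obj\b/ ) { push @kinds, 'stream' }
        else                                       { last }

        # The dictionary that may hold /Prev ends before "startxref"
        # (classical table) or before "stream" (xref stream).
        my ($dict) = $section =~ /^(.*?)(?:startxref|stream)/s;
        ($offset) = defined $dict ? $dict =~ m{/Prev\s+(\d+)} : ();
    }
    return @kinds;
}

# Usage: perl thisfile.pl some.pdf
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "$ARGV[0]: $!";
    local $/;                       # slurp the whole file
    my $pdf = <$fh>;
    print join( ', ', xref_section_kinds($pdf) ), "\n";
}
```

A result that starts with "table" and also contains "stream" would be the combination that Acrobat reportedly rejects.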
#
Thu May 25 07:19:58 2017 futuramedium [...] yandex.ru - Correspondence added
From:    futuramedium [...] yandex.ru

Sorry, I see now that it's a known issue, already discussed (RT 117184).

*

Offline Phil

Re: RT 121832 - Invalid PDF file in test suite
« Reply #4: May 29, 2017, 11:56:34 AM »
Mon May 29 10:01:38 2017 futuramedium [...] yandex.ru - Correspondence added
From:    futuramedium [...] yandex.ru

Here's a method to add to API2.pm that rebuilds a PDF::API2 instance and saves it to a file with a single "classical" xref table.
Code:
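sub not_very_clean_output {
    my ( $self, $fn ) = @_;

    # Clear the state left over from opening an existing file, presumably
    # so that saveas() writes a complete file instead of appending.
    delete $self-> { reopened };
    delete $self-> { pdf }{ ' update' };
    delete $self-> { pdf }{ ' loc' };

    # Object numbers already queued for output; don't queue them twice.
    my %done;
    $done{ $self-> { pdf }{ ' objects' }{ $_-> uid }[ 0 ]} ++
        for @{ $self-> { pdf }{' outlist'}};

    my %h;      # obj_num => gen_num

    # Walk every xref section, newest first, via the ' prev' chain.
    my $tdict = $self-> { pdf };
    while ( defined $tdict ) {
        my $sect = $tdict-> { ' xref' };

        for ( keys %$sect ) {

            next unless /./;            # skip an empty key
            next if $done{ $_ };

            my $ary = $sect-> { $_ };
            # Three-element entries appear to carry a generation number
            # and an n/f flag; skip free ones. Others get generation 0.
            next if $#$ary == 2 and $ary-> [ 2 ] eq 'f';
            $h{ $_ } = $#$ary == 2
                ? $ary-> [ 1 ]
                : 0;
        }
        $tdict = $tdict->{ ' prev' };
    }

    for my $objnum ( sort { $a <=> $b } keys %h ) {

        my $obj = $self-> { pdf }-> read_objnum( $objnum, $h{ $objnum });
        $obj-> realise;
        # Xref streams and object streams are not carried over.
        # The PDF specification spells the type "XRef" (capital R).
        $self-> { pdf }-> out_obj( $obj )
            unless $obj-> { Type } and
                   $obj-> { Type }-> val =~ /^(XRef|ObjStm)$/;  # skip
    }

    $self-> saveas( $fn );
}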

It's "not very clean" because it does the bare minimum: objects are not renumbered (i.e. the range is not consolidated and holes are not eliminated), and unused objects are not discarded. It seems to work but has not been tested extensively; it may serve as a base, or help anyone in desperate need. Also, the stability of the instance after calling this has not been tested, so it may be best to destroy it or re-open the file.

The "next unless /./;" line is there because of line 815 of PDF::API2::Basic::PDF::File. <see reference in RT 121911 -- Mod.>

==========================================================
This code does not appear to be in either the API2 2.033 release or the GitHub library. Where
is the "patch" mentioned below? 117184 is still unresolved at this time.
==========================================================
If this involves cross-reference streams, keep in mind that they are a PDF 1.5 feature, not
valid in PDF 1.4! As of 117184, the last word was that Builder could read, but not write,
cross-reference streams (still a 1.5 feature).
==========================================================


« Last Edit: December 25, 2017, 07:55:22 PM by Phil »

*

Offline Phil

Re: RT 121832 - Invalid PDF file in test suite
« Reply #5: July 03, 2017, 10:35:20 AM »
RT 117184 points to this ticket as a possible solution.

*

Offline Phil

Re: RT 121832 - Invalid PDF file in test suite
« Reply #6: July 03, 2017, 10:38:39 AM »
Sun Jul 02 23:46:16 2017 steve [...] deefs.net - Status changed from 'patched' to 'resolved'
#
Sun Jul 02 23:46:21 2017 steve [...] deefs.net - Fixed in 2.032 added