RT 117184 - Unable to write an opened PDF containing cross-reference streams

*

Offline Phil

  • Global Moderator
  • Hero Member
  • *****
  • 529
    • View Profile
Fri Mar 11 11:03:10 2016 profires [...] gmail.com - Ticket #112932: Ticket created
Subject:    Can't call method "outobjdeep" in 2.026
 
Hi,

I'm using the library in a simple script to update the info structure of PDF files.
References:
Perl version: strawberry version 5.18.2
OS: Windows 7 Enterprise

PDF input files are produced by a Java program with PDF version 1.4 (and I never had a problem with these).

The issue happens occasionally, when users add an annotation with Acrobat Reader, which changes the PDF version to 1.7 (according to the Acrobat Reader that they use).

After this modification we get the known old bug #48683, as we are using PDF-API2-2.021.

I've just tried the newly released PDF-API2-2.026 to see whether anything has changed, and I get a new error message:
<<
Can't call method "outobjdeep" on an undefined value at D:/tm_programs/perl_portable_pdf/perl/site/lib/PDF/API2/Basic/PDF/Objind.pm line 170.
 
Below is an extract from my sample script:
<<
Code: [Select]
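use PDF::API2;
use POSIX qw(strftime);

my $pdf = PDF::API2->open($source) or die "Can't open PDF file $source: $!";
my $nowDate = strftime("%Y%m%d%H%M%S", localtime());

# Update the CreationDate entry of the info structure, then overwrite the source file.
my %h = $pdf->info(
    'CreationDate' => $nowDate,
);
$pdf->saveas($source);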

As this is my first time reporting a bug, please excuse any mistakes.
#
Tue Mar 15 15:07:43 2016 steve [...] deefs.net - Ticket #112932: Correspondence added
 
Are you able to attach a PDF that demonstrates this problem?  If you'd rather it not be publicly visible, you can instead send one to me privately.
#
Tue Mar 15 15:07:43 2016 The RT System itself - Ticket #112932: Status changed from 'new' to 'open'
#
Thu Mar 17 12:33:25 2016 profires [...] gmail.com - Ticket #112932: Correspondence added
Subject:    Re: [rt.cpan.org #112932] Can't call method "outobjdeep" in 2.026
Date:    Thu, 17 Mar 2016 16:33:03 +0000
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Francesco Fiorentino <profires [...] gmail.com>
 
Attached are a sample.pdf, on which the attached Perl script (test.pl) works correctly, and a modified version (sampleMod.pdf), on which I get the error message listed above. sampleMod.pdf was obtained by adding a highlight with Adobe Reader XI and saving the file.

« Last Edit: March 07, 2019, 07:41:58 PM by Phil »

*

Offline Phil

#
Tue Apr 26 04:07:46 2016 profires [...] gmail.com - Ticket #112932: Correspondence added
Subject:    Re: [rt.cpan.org #112932] Can't call method "outobjdeep" in 2.026
Date:    Tue, 26 Apr 2016 08:07:25 +0000
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Francesco Fiorentino <profires [...] gmail.com>
 
Hi,

With 2.027 released, I see that the error message no longer appears but, using the same input attached previously, it produces an unreadable file.

Thanks,
Francesco

#
Wed Jun 01 10:12:24 2016 profires [...] gmail.com - Ticket #112932: Correspondence added
Subject:    Re: [rt.cpan.org #112932] Can't call method "outobjdeep" in 2.026
Date:    Wed, 01 Jun 2016 14:12:03 +0000
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Francesco Fiorentino <profires [...] gmail.com>
 
Any feedback about that?

#
Thu Jun 02 09:55:22 2016 steve [...] deefs.net - Ticket #112932: Correspondence added

I suspect that it's the same issue as ticket #113293.
#
Tue Jun 07 15:00:48 2016 MELMOTHX [...] cpan.org - Ticket #112932: Correspondence added
 
Actually, the issue seems unrelated. End of the modified PDF:

Code: [Select]
startxref
116
%%EOF
8 0 obj << /CreationDate (20160607205416) /Creator (Apache FOP Version 1.1) /ModDate (D:20160317171139+01'00') /PDFVersion (1.4) /Producer (Apache FOP Version 1.1) >> endobj
xref
0 1
0000000000 65535 f
8 1
0000009549 00000 n
trailer
<< /Type /XRef /DecodeParms << /Columns 4 /Predictor 12 >> /Filter /FlateDecode /ID [ <951086a159fa774291c81f007ad52c0e> <d0fd218e4aa35740b313e56bfd43b2db> ] /Index [ 9 18 ] /Info 8 0 R /Length 60 /Prev 116 /Root 10 0 R /Size 1 /W [ 1 2 1 ] >>
startxref
9723
%%EOF

At first glance, it looks like the code just appends this section and keeps the original PDF verbatim, hence the breakage.
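
In the dump above, the final startxref (9723) points to a classic "xref" table, while its trailer still carries /Type /XRef, /W and /Index, keys that belong to a cross-reference stream dictionary. A minimal sketch for checking what the last startxref of a file points to, using only core Perl; the helper name xref_kind and the 1 KiB search window are assumptions made for illustration, not part of PDF::API2:

```perl
use strict;
use warnings;

# Hypothetical helper: classify what the LAST startxref pointer refers to.
sub xref_kind {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Can't open $path: $!";
    my $size = -s $fh;

    # Assumption: the last startxref sits within the final 1 KiB of the file.
    my $window = $size < 1024 ? $size : 1024;
    seek $fh, $size - $window, 0 or die "seek failed: $!";
    read $fh, my $tail, $window;

    # The greedy .* makes this capture the last startxref in the window.
    my ($offset) = $tail =~ /.*startxref\s+(\d+)\s+%%EOF/s
        or return 'no startxref';

    seek $fh, $offset, 0 or die "seek failed: $!";
    read $fh, my $head, 32;
    return 'classic xref table' if $head =~ /\Axref\b/;
    return 'indirect object'    if $head =~ /\A\d+\s+\d+\s+obj\b/;   # possibly an XRef stream
    return 'unrecognized';
}

print "$_: ", xref_kind($_), "\n" for @ARGV;
```

An "indirect object" result only suggests a cross-reference stream; confirming it would mean parsing the object's dictionary for /Type /XRef, which this sketch does not attempt.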
#
Wed Aug 24 05:06:45 2016 dietrich.streifert [...] googlemail.com - Ticket created
Subject:    Simply opening and saving a multipage PDF file corrupts the file
Date:    Wed, 24 Aug 2016 11:06:30 +0200
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Dietrich Streifert <dietrich.streifert [...] googlemail.com>
 
This is for perl 5.16 on centos 7.2 using a simple test file (filename "test.pdf" ) with four pages:

The following code
Code: [Select]
use PDF::API2;

# Open the file and save it again without making any changes.
my $pdf = PDF::API2->open("test.pdf");
$pdf->saveas("test-mod.pdf");
$pdf->end;
generates a corrupt file "test-mod.pdf" that can no longer be read by, e.g., Acrobat Reader, which reports that the document cannot be opened (code 14).

This behaviour makes PDF::API2 unusable for even the simplest modifications.

I've attached both the perl code and the test file (don't know if this gets through the email bug submission at rt.cpan.org)

#
Subject:    [rt.cpan.org #117184]
Date:    Wed, 24 Aug 2016 10:12:45 -0400
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Phil M Perry
 
I see that the original test.pdf is PDF version 1.5. Maybe there's something in there that got corrupted when reading into PDF::API2. Is it possible to create your test.pdf in version 1.4 or even 1.3? Admittedly that's not a great solution -- PDF::API2 needs to be brought into the 21st century and handle up to version 1.7 correctly -- but it may do for the time being.
#
Wed Aug 24 10:13:02 2016 The RT System itself - Status changed from 'new' to 'open'
#
Wed Aug 24 10:27:59 2016 dietrich.streifert [...] googlemail.com - Correspondence added
Subject:    Re: [rt.cpan.org #117184]
Date:    Wed, 24 Aug 2016 16:27:45 +0200
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Dietrich Streifert <dietrich.streifert [...] googlemail.com>
 
You're right! It works if I convert test.pdf to PDF-Version 1.4.
#
Thu Oct 06 23:34:09 2016 steve [...] deefs.net - Subject changed from 'Simply opening and saving a multipage PDF file corrupts the file' to 'Unable to write an opened PDF containing cross-reference streams'
#
Thu Oct 06 23:34:09 2016 steve [...] deefs.net - Severity Wishlist added
#
Thu Oct 06 23:42:42 2016 steve [...] deefs.net - Correspondence added

PDF::API2 got support for reading files with cross-reference streams in version 2.026, but it doesn't yet support writing those files.

The easiest way to implement this would be to convert the object stream to regular objects and save the file normally.  That would eliminate the need to teach PDF::API2 how to write a cross-reference stream, though that's the other option.  Doing so will typically produce a file that's a little smaller, but it isn't necessary.

As a workaround until someone adds that support, you can use importPageIntoForm to copy each page into a new PDF file, or use other copy methods to get the data from the original file to a new one.
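
A minimal sketch of the copy workaround described above, assuming the importpage method (which builds on importPageIntoForm) behaves as documented for the installed PDF::API2 version. The helper name copy_pages and the file names are hypothetical, and the result has not been checked against the test files from this ticket:

```perl
use strict;
use warnings;
use PDF::API2;

# Hypothetical helper: copy every page of $in into a brand-new document
# and save that, instead of re-saving the opened file.
sub copy_pages {
    my ($in, $out) = @_;
    my $src = PDF::API2->open($in);
    my $dst = PDF::API2->new();
    for my $n (1 .. $src->pages) {
        # A target index of 0 appends the imported page to $dst.
        $dst->importpage($src, $n, 0);
    }
    $dst->saveas($out);
    return;
}

# Usage: perl copy.pl test.pdf test-copy.pdf
copy_pages(@ARGV) if @ARGV == 2;
```

The new file is written from scratch with a classic cross-reference table. Document-level data such as the info structure is presumably not carried over by this loop and would need to be set again on the new document.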
#
Fri Oct 07 00:21:57 2016 steve [...] deefs.net - Ticket #112932: Correspondence added

On Thu Jun 02 09:55:22 2016, SSIMMS wrote:

Ok, not #113293, but it does appear to be the same as #117184.  sampleMod.pdf contains a cross-reference stream.  PDF::API2 can read them as of version 2.026, but it doesn't know how to write a cross-reference stream yet, nor how to convert from a cross-reference stream to a cross-reference table (which would likely be the easier of the two to implement).

A potential solution and a workaround are given in ticket #117184.
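
For the "convert to a cross-reference table" direction, the output side is a plain text format. A minimal sketch, in core Perl, of rendering a set of entries as a classic table; the helper name xref_table and the input layout are assumptions for illustration and are not PDF::API2 internals:

```perl
use strict;
use warnings;

# Hypothetical helper. $entries maps object numbers to
# [ $offset_or_next_free, $generation, 'n' (in use) or 'f' (free) ].
sub xref_table {
    my ($entries) = @_;
    my @nums = sort { $a <=> $b } keys %$entries;
    my $out  = "xref\n";
    while (@nums) {
        # Collect one run of consecutive object numbers (one subsection).
        my @run = (shift @nums);
        push @run, shift @nums while @nums && $nums[0] == $run[-1] + 1;
        $out .= "$run[0] " . scalar(@run) . "\n";
        for my $n (@run) {
            my ($pos, $gen, $type) = @{ $entries->{$n} };
            # Each entry is exactly 20 bytes, including the two-byte line ending.
            $out .= sprintf "%010d %05d %s \n", $pos, $gen, $type;
        }
    }
    return $out;
}

# Reproduces the table from the broken file shown earlier in this thread.
print xref_table({
    0 => [ 0,    65535, 'f' ],
    8 => [ 9549, 0,     'n' ],
});
```

A classic table can only describe free and in-use objects stored directly in the file, so objects held inside object streams would first have to be written out as regular objects.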
#
Fri Oct 07 00:22:50 2016 steve [...] deefs.net - Ticket #112932: Merged into ticket #117184
#
Fri Oct 07 00:22:50 2016 steve [...] deefs.net - Merged into ticket #117184

<formatting cleanup - Mod.>
« Last Edit: March 07, 2019, 07:42:27 PM by Phil »

*

Offline Phil

Steve has rejected RT 120450 as a duplicate of this bug.
120450 reopened, closed as PATCHED
« Last Edit: March 07, 2019, 07:42:48 PM by Phil »

*

Offline Phil

Sun Jul 02 23:45:57 2017 steve [...] deefs.net - Correspondence added

Possible solution in ticket 121832.
« Last Edit: March 07, 2019, 07:43:05 PM by Phil »

*

Offline Phil

 PhilterPaper commented 1 Dec

A fundamental problem here is that cross-reference streams are PDF 1.5, while PDF::Builder (and API2) are supposed to be 1.4. It should refuse to read in a PDF 1.5 file until all 1.5 features are fully implemented! At any rate, it may be able to read these streams, but is not yet able to write them, so the former capability isn't terribly useful.
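
A minimal sketch of the kind of check this implies, reading the version declared in the file header with core Perl only. The helper names pdf_header_version and is_pdf15_or_later are hypothetical and are not part of PDF::Builder or PDF::API2. Note that the document catalog may also carry a /Version entry that overrides the header, which this sketch ignores:

```perl
use strict;
use warnings;

# Hypothetical helper: return [major, minor] from the "%PDF-x.y" header,
# or undef if no header is found near the start of the file.
sub pdf_header_version {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Can't open $path: $!";
    read $fh, my $head, 1024;
    return $head =~ /%PDF-(\d+)\.(\d+)/ ? [ $1, $2 ] : undef;
}

# Hypothetical helper: true when the declared version is 1.5 or later.
sub is_pdf15_or_later {
    my ($version) = @_;
    my ($major, $minor) = @$version;
    return $major > 1 || ($major == 1 && $minor >= 5) ? 1 : 0;
}

for my $file (@ARGV) {
    my $v = pdf_header_version($file) or do { warn "$file: no PDF header\n"; next };
    printf "%s: PDF %d.%d%s\n", $file, @$v,
        is_pdf15_or_later($v) ? ' (may contain cross-reference streams)' : '';
}
```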
« Last Edit: March 07, 2019, 07:43:26 PM by Phil »

*

Offline Phil

Originally opened on RT as 112932 Can't call method "outobjdeep" in 2.026. Another ticket, RT 117184 Simply opening and saving a multipage PDF file corrupts the file was renamed to Unable to write an opened PDF containing cross-reference streams and the two tickets were merged under the latter's name. To make things more consistent, the title of this ticket will be changed, too.

RT 121832 is suggested as a fix, but the code does not appear to be in PDF::API2 or PDF::Builder. Need to check more on this, as 121832 was closed as fixed in both products! RT 117184 is still open on PDF::API2.

Add:
Outputting cross-reference streams would be marked as "PDF 1.5" output, which would probably be automatic anyway if the only way to generate a cross-reference stream is by reading in a PDF 1.5 (or higher) file. If we come up with a way to natively generate a cross-reference stream, it would have to force 1.5 output.
« Last Edit: March 09, 2019, 12:44:18 PM by Phil »

*

Offline Phil

Mon Apr 01 11:44:37 2019 PMPERRY@cpan.org - Correspondence added

Ticket 121832 is marked as fixed (resolved), but I don't think Vadim's code was put in, and I don't think the current PDF::API2 (nor PDF::Builder) can deal with writing back out a PDF 1.5 cross-reference stream. I don't know for sure what was "fixed" in that ticket. Perhaps it would be a good time to take another look at either writing out a cross-reference stream or converting it to a classic xref table.

In PDF::Builder, the cross-reference stream output would automatically bump the PDF version to 1.5 (simply reading in such a PDF in the first place will also do so). I have no problems with doing that -- on the other hand, is there a strong argument for converting to an xref table, to stay at PDF 1.4? Cross-reference streams, once read in, seem to be causing more and more trouble, so it would be good to deal with them once and for all.

*

Offline Phil

Tue Apr 02 22:13:42 2019 futuramedium [...] yandex.ru - Correspondence added

Phil,

I have a better alternative to the patch (hack) from #121832. To please Acrobat/Reader, an incremental update can append either a classical xref table or a compressed xref stream. The new patch seems to be working. The test PDF file is from this thread.

1) Producing "hybrid files" to ensure "compatibility with older applications" is not implemented (it was not even contemplated -- I don't think it's important anymore).

2) No support for files > ~4 GB (with this patch; it would not be difficult in general).

3) Somewhat lousy compression (because there is no prediction) if someone updates an unusually large number of objects (which is generally unlikely).

4) Of course, updated objects are not stuffed into streams, and furthermore this patch does nothing to "use modern compression" when the file is clean-output (IIRC, PDF::API2 can't do it anyway).

5) Important -- this patch also applies changes (the 2 topmost changes) as per #121911.

In fact, the fixes are very minimal: existing code is mostly re-used to collect updates made to the xref table (instead of writing them as they come) and then apply them appropriately in either of the 2 modes.

+ One (minor) digression: the documentation could be clearer that an instance becomes unusable after calling "saveas" -- to prevent someone from writing scripts such as the one with the commented fragment below.
Code: [Select]
use warnings;
use strict;
use feature 'say';

use PDF::API2;

my $pdf = PDF::API2-> open( "test.pdf" );
$pdf-> page;
$pdf-> page;
$pdf-> page;

$pdf-> saveas( "test-mod.pdf" );

# $pdf-> page;
# $pdf-> page;
# $pdf-> saveas( "test-mod++.pdf" );

__END__
Code: [Select]
--- PDF\API2\Basic\PDF\File.old Fri Jul 7 04:53:59 2017
+++ PDF\API2\Basic\PDF\File.pm Wed Apr 3 04:01:26 2019
@@ -522,6 +522,7 @@
         if (defined $result->{'Type'} and defined $types{$result->{'Type'}->val}) {
             bless $result, $types{$result->{'Type'}->val};
+            $result-> {' outto'} = [ $self ];
         }
         # gdj: FIXME: if any of the ws chars were crs, then the whole
         # string might not have been read.
@@ -540,7 +541,7 @@
         }
         $result->{' parent'} = $self;
         weaken $result->{' parent'};
-        $result->{' realised'} = 0;
+#??        $result->{' realised'} = 0;
         # gdj: FIXME: if any of the ws chars were crs, then the whole
         # string might not have been read.
     }
@@ -1282,7 +1283,7 @@
     $tdict->{'Size'} = PDFNum($self->{' maxobj'});

     my $tloc = $fh->tell();
-    $fh->print("xref\n");
+    my @out;
     my @xreflist = sort { $self->{' objects'}{$a->uid}[0] <=> $self->{' objects'}{$b->uid}[0] } (@{$self->{' printed'} || []}, @{$self->{' free'} || []});

@@ -1314,25 +1315,25 @@
 #            $fh->printf("0 1\n%010d 65535 f \n", $ff);
 #        }
         if ($i > $#xreflist || $self->{' objects'}{$xreflist[$i]->uid}[0] != $j + 1) {
-            $fh->print(($first == -1 ? "0 " : "$self->{' objects'}{$xreflist[$first]->uid}[0] ") . ($i - $first) . "\n");
+            push @out, ($first == -1 ? "0 " : "$self->{' objects'}{$xreflist[$first]->uid}[0] ") . ($i - $first) . "\n";
             if ($first == -1) {
-                $fh->printf("%010d 65535 f \n", defined $freelist[$k] ? $self->{' objects'}{$freelist[$k]->uid}[0] : 0);
+                push @out, sprintf("%010d 65535 f \n", defined $freelist[$k] ? $self->{' objects'}{$freelist[$k]->uid}[0] : 0);
                 $first = 0;
             }
             for ($j = $first; $j < $i; $j++) {
                 my $xref = $xreflist[$j];
                 if (defined $freelist[$k] && defined $xref && "$freelist[$k]" eq "$xref") {
                     $k++;
-                    $fh->print(pack("A10AA5A4",
+                    push @out, pack("A10AA5A4",
                                     sprintf("%010d", (defined $freelist[$k] ?
                                                       $self->{' objects'}{$freelist[$k]->uid}[0] : 0)), " ",
                                     sprintf("%05d", $self->{' objects'}{$xref->uid}[1] + 1),
-                                    " f \n"));
+                                    " f \n");
                 }
                 else {
-                    $fh->print(pack("A10AA5A4", sprintf("%010d", $self->{' locs'}{$xref->uid}), " ",
+                    push @out, pack("A10AA5A4", sprintf("%010d", $self->{' locs'}{$xref->uid}), " ",
                                     sprintf("%05d", $self->{' objects'}{$xref->uid}[1]),
-                                    " n \n"));
+                                    " n \n");
                 }
             }
             $first = $i;
@@ -1342,9 +1343,48 @@
             $j++;
         }
     }
     $fh->print("trailer\n"); 
-    $tdict->outobjdeep($fh, $self);
-    $fh->print("\nstartxref\n$tloc\n%%EOF\n");
+    if ( exists $tdict-> { Type } and $tdict-> { Type }-> val eq 'XRef' ) {
+
+        my ( @index, @stream );
+        my $len = 2;                                # 2 or 4 will do
+        for ( @out ) {
+            $_ = [ split ];
+            die if $_-> [ 0 ] >= 0xFFFFFFFF;       # extremely unlikely, but better (any?) message would help
+            $len = 4 if $_-> [ 0 ] >= 0xFFFF;
+            @$_ == 2 ? push @index, @$_ : push @stream, $_
+        }
+        my $c = $len == 2 ? 'n' : 'N';
+        my $stream = join '', map {
+            pack "C${c}C", $_-> [ 2 ] eq 'n' ? 1 : 0, @{ $_ }[ 0 .. 1 ]
+        } @stream;
+
+        $i = $self->{ ' maxobj' } ++;
+        $self-> add_obj( $tdict, $i, 0 );
+        $self-> out_obj( $tdict );
+
+        push @index, $i, 1;
+        $stream .= pack "C${c}C", 1, $tloc, 0;
+
+        $tdict-> { Size } = PDFNum( ++ $i );
+        $tdict-> { Index } = PDFArray( map PDFNum( $_ ), @index );
+        $tdict-> { W } = PDFArray( map PDFNum( $_ ), 1, $len, 1 );
+        $tdict-> { Filter } = PDFName( 'FlateDecode' );
+
+        delete $tdict-> { DecodeParms };    # For such streams, prediction improves compression hugely,
+                                            # but "outfilt" just can't do it, alas.
+
+        $stream = PDF::API2::Basic::PDF::Filter::FlateDecode-> new-> outfilt( $stream, 1 );
+        $tdict-> { ' stream' } = $stream;
+        $tdict-> { ' nofilt' } = 1;
+        delete $tdict-> { Length };
+        $self-> ship_out;
+    }
+    else {
+        $fh->print("xref\n", @out, "trailer\n");
+        $tdict->outobjdeep($fh, $self);
+        $fh->print("\n");
+    }
+    $fh->print("startxref\n$tloc\n%%EOF\n");
 }
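
The binary rows that the patch above builds for /W [ 1 $len 1 ] can be illustrated in isolation. A minimal sketch in core Perl; the helper name pack_xref_rows and the sample entries are hypothetical, and only the row layout (one type byte, a big-endian offset, one generation byte) is taken from the patch:

```perl
use strict;
use warnings;

# Hypothetical helper: pack [ $offset, $generation, 'n' or 'f' ] entries
# into fixed-width rows matching /W [ 1 $len 1 ], with $len being 2 or 4.
sub pack_xref_rows {
    my ($len, @entries) = @_;
    my $field = $len == 2 ? 'n' : 'N';      # big-endian 16- or 32-bit offset
    return join '', map {
        my ($offset, $gen, $type) = @$_;
        # Type 1 = in use, type 0 = free. The generation must fit in one byte.
        pack "C${field}C", ($type eq 'n' ? 1 : 0), $offset, $gen;
    } @entries;
}

my $rows = pack_xref_rows(2, [ 0, 0, 'f' ], [ 9549, 0, 'n' ]);
print unpack('H*', $rows), "\n";    # prints 0000000001254d00
```

With $len == 2 every row is 4 bytes long, so an offset above 0xFFFF requires the 4-byte form.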

*

Offline Phil

Wed Apr 03 12:53:41 2019 futuramedium [...] yandex.ru - Correspondence added

I should have chosen the offset length (2 or 4 bytes) based on $tloc only. Fixed. Also, added filtering to the XRef stream. The raw (uncompressed) stream length will grow by up to 25% (as with the file being tested) because of the byte prepended to each "row", but for any substantial changes to a PDF file, the compression ratio will improve significantly. E.g., if 6 instead of 3 pages are appended in the example script, the compressed stream length already becomes 42 vs. 44 bytes for filtered/unfiltered data.

One concern may be that gennum is limited to 1 byte but, in reality, generation numbers haven't been used (nor objnums re-used) for a long time. In the test file, and in all "modern" (with XRef stream) files I've seen, the 1st XRef table entry is "0 0 f". IIRC PDF 2.0 says gennum is always 0.
Code: [Select]
--- PDF\API2\Basic\PDF\File.old Fri Jul 7 04:53:59 2017
+++ PDF\API2\Basic\PDF\File.pm Wed Apr 3 19:27:37 2019
@@ -522,6 +522,7 @@
         if (defined $result->{'Type'} and defined $types{$result->{'Type'}->val}) {
             bless $result, $types{$result->{'Type'}->val};
+            $result-> {' outto'} = [ $self ];
         }
         # gdj: FIXME: if any of the ws chars were crs, then the whole
         # string might not have been read.
@@ -540,7 +541,7 @@
         }
         $result->{' parent'} = $self;
         weaken $result->{' parent'};
-        $result->{' realised'} = 0;
+#??        $result->{' realised'} = 0;
         # gdj: FIXME: if any of the ws chars were crs, then the whole
         # string might not have been read.
     }
@@ -1282,7 +1283,7 @@
     $tdict->{'Size'} = PDFNum($self->{' maxobj'});

     my $tloc = $fh->tell();
-    $fh->print("xref\n");
+    my @out;

     my @xreflist = sort { $self->{' objects'}{$a->uid}[0] <=> $self->{' objects'}{$b->uid}[0] } (@{$self->{' printed'} || []}, @{$self->{' free'} || []});

@@ -1314,25 +1315,25 @@
 #            $fh->printf("0 1\n%010d 65535 f \n", $ff);
 #        }
         if ($i > $#xreflist || $self->{' objects'}{$xreflist[$i]->uid}[0] != $j + 1) {
-            $fh->print(($first == -1 ? "0 " : "$self->{' objects'}{$xreflist[$first]->uid}[0] ") . ($i - $first) . "\n");
+            push @out, ($first == -1 ? "0 " : "$self->{' objects'}{$xreflist[$first]->uid}[0] ") . ($i - $first) . "\n";
             if ($first == -1) {
-                $fh->printf("%010d 65535 f \n", defined $freelist[$k] ? $self->{' objects'}{$freelist[$k]->uid}[0] : 0);
+                push @out, sprintf("%010d 65535 f \n", defined $freelist[$k] ? $self->{' objects'}{$freelist[$k]->uid}[0] : 0);
                 $first = 0;
             }
             for ($j = $first; $j < $i; $j++) {
                 my $xref = $xreflist[$j];
                 if (defined $freelist[$k] && defined $xref && "$freelist[$k]" eq "$xref") {
                     $k++;
-                    $fh->print(pack("A10AA5A4",
+                    push @out, pack("A10AA5A4",
                                     sprintf("%010d", (defined $freelist[$k] ?
                                                       $self->{' objects'}{$freelist[$k]->uid}[0] : 0)), " ",
                                     sprintf("%05d", $self->{' objects'}{$xref->uid}[1] + 1),
-                                    " f \n"));
+                                    " f \n");
                 }
                 else {
-                    $fh->print(pack("A10AA5A4", sprintf("%010d", $self->{' locs'}{$xref->uid}), " ",
+                    push @out, pack("A10AA5A4", sprintf("%010d", $self->{' locs'}{$xref->uid}), " ",
                                     sprintf("%05d", $self->{' objects'}{$xref->uid}[1]),
-                                    " n \n"));
+                                    " n \n");
                 }
             }
             $first = $i;
@@ -1342,9 +1343,48 @@
             $j++;
         }
     }
     $fh->print("trailer\n");
-    $tdict->outobjdeep($fh, $self);
-    $fh->print("\nstartxref\n$tloc\n%%EOF\n");
+    if ( exists $tdict-> { Type } and $tdict-> { Type }-> val eq 'XRef' ) {
+
+        my ( @index, @stream );
+        for ( @out ) {
+            my @a = split;
+            @a == 2 ? push @index, @a : push @stream, \@a
+        }
+        $i = $self->{ ' maxobj' } ++;
+        $self-> add_obj( $tdict, $i, 0 ); 
+        $self-> out_obj( $tdict );
+
+        push @index, $i, 1;
+        push @stream, [ $i, 0, 'n' ];
+
+        $i = $self->{ ' maxobj' } ++;
+        $self-> add_obj( $tdict, $i, 0 );
+        $self-> out_obj( $tdict );
+
+        my $len = $tloc > 0xFFFF ? 4 : 2;           # don't expect files > 4 Gb
+        my $tpl = $tloc > 0xFFFF ? 'CNC' : 'CnC';   # don't expect gennum > 255, it's absurd.
+                                                    # Adobe doesn't use them anymore anyway
+        my $stream = '';
+        my @prev = ( 0 ) x ( $len + 2 );
+        for ( @stream ) {
+            my @line = unpack 'C*', pack $tpl, $_-> [ 2 ] eq 'n' ? 1 : 0, @{ $_ }[ 0 .. 1 ];
+
+            $stream .= pack 'C*', 2,                # prepend filtering method, "PNG Up"
+                map {( $line[ $_ ] - $prev[ $_ ] + 256 ) % 256 } 0 .. $#line;
+            @prev    = @line;
+        }
+        $tdict-> { Size } = PDFNum( $i + 1 );
+        $tdict-> { Index } = PDFArray( map PDFNum( $_ ), @index );
+        $tdict-> { W } = PDFArray( map PDFNum( $_ ), 1, $len, 1 ); 
+        $tdict-> { Filter } = PDFName( 'FlateDecode' );
+
+        $tdict-> { DecodeParms } = PDFDict;
+        $tdict-> { DecodeParms }-> val-> { Predictor } = PDFNum( 12 );
+        $tdict-> { DecodeParms }-> val-> { Columns } = PDFNum( $len + 2 );

+        $stream = PDF::API2::Basic::PDF::Filter::FlateDecode-> new-> outfilt( $stream, 1 );
+        $tdict-> { ' stream' } = $stream;
+        $tdict-> { ' nofilt' } = 1;
+        delete $tdict-> { Length };
+        $self-> ship_out;
+    }
+    else {
+        $fh->print("xref\n", @out, "trailer\n");
+        $tdict->outobjdeep($fh, $self);
+        $fh->print("\n");
+    }
+    $fh->print("startxref\n$tloc\n%%EOF\n");
 }
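
The filtering step in the patch above (/Predictor 12, i.e. the PNG "Up" filter, applied row by row before FlateDecode) can be tried out separately. A minimal sketch in core Perl with a matching decoder for a round trip; the helper names png_up_encode and png_up_decode are hypothetical, and the compression step itself is left out:

```perl
use strict;
use warnings;

# Hypothetical helper: apply the PNG "Up" filter (filter type 2) to rows
# of $columns bytes; each output row gets the filter type byte prepended.
sub png_up_encode {
    my ($columns, $data) = @_;
    my @prev = (0) x $columns;
    my $out  = '';
    for my $row (unpack "(a$columns)*", $data) {
        my @cur = unpack 'C*', $row;
        # Store each byte as the difference to the byte above it, modulo 256.
        $out .= pack 'C*', 2, map { ($cur[$_] - $prev[$_]) % 256 } 0 .. $#cur;
        @prev = @cur;
    }
    return $out;
}

# Hypothetical helper: undo png_up_encode (handles filter type 2 only).
sub png_up_decode {
    my ($columns, $data) = @_;
    my @prev = (0) x $columns;
    my $out  = '';
    for my $row (unpack '(a' . ($columns + 1) . ')*', $data) {
        my ($filter, @delta) = unpack 'C*', $row;
        die "only filter type 2 (Up) is handled here" unless $filter == 2;
        my @cur = map { ($delta[$_] + $prev[$_]) % 256 } 0 .. $#delta;
        $out .= pack 'C*', @cur;
        @prev = @cur;
    }
    return $out;
}

# Two 4-byte rows: in-use objects at offsets 0x254d and 0x2580.
my $raw = pack 'H*', '01254d0001258000';
print unpack('H*', png_up_encode(4, $raw)), "\n";   # prints 0201254d000200003300
```

In the second row only the byte that changed is non-zero, which is why prediction helps the subsequent compression once many similar rows are present.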

*

Offline Phil

Wow! That's quite a bit of work you've put in -- thank you. It's complicated enough that I want to go over it very carefully (and of course, test it thoroughly) before putting it in PDF::Builder. I can't even yet ask any questions about it! I hope to get it in for release 3.014, unless there are complications, in which case it may slide to 3.015 this summer.

*

Offline Phil

Mon Apr 08 17:59:57 2019 futuramedium [...] yandex.ru - Correspondence added

Found minor issues: though harmless, they'd better be fixed. I hope this is the final version; sorry for the mess.
Code: [Select]
--- PDF\API2\Basic\PDF\File.old Fri Jul 7 04:53:59 2017
+++ PDF\API2\Basic\PDF\File.pm Tue Apr 9 00:46:42 2019
@@ -522,6 +522,8 @@

         if (defined $result->{'Type'} and defined $types{$result->{'Type'}->val}) {
             bless $result, $types{$result->{'Type'}->val};
+            $result-> {' outto'} = [ $self ];
+            weaken $_ for @{$result->{' outto'}};
         }
         # gdj: FIXME: if any of the ws chars were crs, then the whole
         # string might not have been read.
@@ -540,7 +542,7 @@
         }
         $result->{' parent'} = $self;
         weaken $result->{' parent'};
-        $result->{' realised'} = 0;
+#??     $result->{' realised'} = 0;
         # gdj: FIXME: if any of the ws chars were crs, then the whole
         # string might not have been read.
     }
@@ -1282,7 +1284,7 @@
     $tdict->{'Size'} = PDFNum($self->{' maxobj'});

     my $tloc = $fh->tell();
-    $fh->print("xref\n");
+    my @out;

     my @xreflist = sort { $self->{' objects'}{$a->uid}[0] <=> $self->{' objects'}{$b->uid}[0] } (@{$self->{' printed'} || []}, @{$self->{' free'} || []});

@@ -1314,25 +1316,25 @@
 #            $fh->printf("0 1\n%010d 65535 f \n", $ff);
 #        }
         if ($i > $#xreflist || $self->{' objects'}{$xreflist[$i]->uid}[0] != $j + 1) {
-            $fh->print(($first == -1 ? "0 " : "$self->{' objects'}{$xreflist[$first]->uid}[0] ") . ($i - $first) . "\n");
+            push @out, ($first == -1 ? "0 " : "$self->{' objects'}{$xreflist[$first]->uid}[0] ") . ($i - $first) . "\n";
             if ($first == -1) {
-                $fh->printf("%010d 65535 f \n", defined $freelist[$k] ? $self->{' objects'}{$freelist[$k]->uid}[0] : 0);
+                push @out, sprintf("%010d 65535 f \n", defined $freelist[$k] ? $self->{' objects'}{$freelist[$k]->uid}[0] : 0);
                 $first = 0;
             }
             for ($j = $first; $j < $i; $j++) {
                 my $xref = $xreflist[$j];
                 if (defined $freelist[$k] && defined $xref && "$freelist[$k]" eq "$xref") {
                     $k++;
-                    $fh->print(pack("A10AA5A4",
+                    push @out, pack("A10AA5A4",
                                     sprintf("%010d", (defined $freelist[$k] ?
                                                       $self->{' objects'}{$freelist[$k]->uid}[0] : 0)), " ",
                                     sprintf("%05d", $self->{' objects'}{$xref->uid}[1] + 1),
-                                    " f \n"));
+                                    " f \n");
                 }
                 else {
-                    $fh->print(pack("A10AA5A4", sprintf("%010d", $self->{' locs'}{$xref->uid}), " ",
+                    push @out, pack("A10AA5A4", sprintf("%010d", $self->{' locs'}{$xref->uid}), " ",
                             sprintf("%05d", $self->{' objects'}{$xref->uid}[1]),
-                            " n \n"));
+                            " n \n");
                 }
             }
             $first = $i;
@@ -1342,9 +1344,53 @@
             $j++;
         }
     }
-    $fh->print("trailer\n");
-    $tdict->outobjdeep($fh, $self);
-    $fh->print("\nstartxref\n$tloc\n%%EOF\n");
+    if ( exists $tdict-> { Type } and $tdict-> { Type }-> val eq 'XRef' ) {
+
+        my ( @index, @stream );
+        for ( @out ) {
+            my @a = split;
+            @a == 2 ? push @index, @a : push @stream, \@a
+        }
+        my $i = $self->{ ' maxobj' } ++;
+        $self-> add_obj( $tdict, $i, 0 );
+        $self-> out_obj( $tdict );
+
+        push @index, $i, 1;
+        push @stream, [ $tloc, 0, 'n' ];
+
+        my $len = $tloc > 0xFFFF ? 4 : 2;           # don't expect files > 4 Gb
+        my $tpl = $tloc > 0xFFFF ? 'CNC' : 'CnC';   # don't expect gennum > 255, it's absurd.
+                                                    # Adobe doesn't use them anymore anyway
+        my $stream = '';
+        my @prev = ( 0 ) x ( $len + 2 );
+        for ( @stream ) {
+            my @line = unpack 'C*', pack $tpl, $_-> [ 2 ] eq 'n' ? 1 : 0, @{ $_ }[ 0 .. 1 ];
+
+            $stream .= pack 'C*', 2, # prepend filtering method, "PNG Up"
+                map {( $line[ $_ ] - $prev[ $_ ] + 256 ) % 256 } 0 .. $#line;
+            @prev    = @line;
+        }
+        $tdict-> { Size } = PDFNum( $i + 1 );
+        $tdict-> { Index } = PDFArray( map PDFNum( $_ ), @index );
+        $tdict-> { W } = PDFArray( map PDFNum( $_ ), 1, $len, 1 );
+        $tdict-> { Filter } = PDFName( 'FlateDecode' );
+
+        $tdict-> { DecodeParms } = PDFDict;
+        $tdict-> { DecodeParms }-> val-> { Predictor } = PDFNum( 12 );
+        $tdict-> { DecodeParms }-> val-> { Columns } = PDFNum( $len + 2 );
+
+        $stream = PDF::API2::Basic::PDF::Filter::FlateDecode-> new-> outfilt( $stream, 1 );
+        $tdict-> { ' stream' } = $stream;
+        $tdict-> { ' nofilt' } = 1;
+        delete $tdict-> { Length };
+        $self-> ship_out;
+    }
+    else {
+        $fh->print("xref\n", @out, "trailer\n");
+        $tdict->outobjdeep($fh, $self);
+        $fh->print("\n");
+    }
+    $fh->print("startxref\n$tloc\n%%EOF\n");
 }