Post without Account — your post will be reviewed, and if appropriate, posted under Anonymous. You can also use this link to report any problems registering.

RT 121911 - Adding pages to existing documents doesn't work if page tree is any

  • 2 Replies
  • 996 Views
*

Offline Phil

  • Global Moderator
  • Sr. Member
  • *****
  • 359
    • View Profile
Sat May 27 09:04:13 2017 futuramedium [...] yandex.ru - Ticket created
Subject:    Adding pages to existing documents doesn't work if page tree is anything but very simple (+ maybe fixed)

Code: [Select]
use strict;
use warnings;
use feature 'say';
use PDF::API2;

my $pdf = PDF::API2-> open( 'sample.pdf' );
$pdf-> page;
$pdf-> saveas( 'sample+.pdf' );

Can't call method "new_obj" on an undefined value at C:/PDL24/perl/site/lib/PDF/API2/Basic/PDF/Pages.pm line 95.

=========================================================
still get this error, now on line 96
=========================================================


Sample.pdf is attached. Its 'Pages' root has a single intermediate node, which further has 11 leaves i.e. pages.

'Pages' are blessed in line 521 of PDF::API2::Basic::PDF::File, i.e. not in constructor of PDF::API2::Basic::PDF::Pages. In line 61 of latter module there's assignment to ' outto' property. If file is opened, this property is never assigned. But in line 207 it is used to create intermediate node. Hence, the above error message.

I suggest adding after line 521 in PDF::API2::Basic::PDF::File:

Code: [Select]
$result-> {' outto'} = [ $self ];

Then script completes, but blank page is prepended (??). It looks like this problem is in line 213 of PDF::API2::Basic::PDF::Pages. Condition is always false, it should be changed to

Code: [Select]
$pindex = -1 if ($pindex == $ppnum - 1);

=========================================================
Now is line 215. I'm not sure about the condition "is always false". If the loop at 212 gets a match,
it breaks to 215 with $pindex = the page number, which will be < $ppnum. If no match, $pindex
should == $ppnum and for loop exits. This sets $pindex to -1 so to push a new page, rather than
splice in a new page in middle of array.
=========================================================
proposed change (also second one below) moves blank page to end, rather than beginning. This is
correct, but it doesn't make sense (no match should produce pindex==ppnum)
=========================================================


Then document is modified as expected. I didn't test these fixes extensively, because I don't yet understand fully internal mechanics of this module.


Mon May 29 07:34:01 2017 futuramedium [...] yandex.ru - Correspondence added
From: futuramedium [...] yandex.ru

No, still not good enough. Adding another page in above script, i.e. duplicating "$pdf-> page;" results in broken PDF file. Looks like increment statement in "for" loop in line 192 of PDF::Builder::Basic::PDF::Pages reads from file and thus previously made changes to object in memory are lost.

I think if object already was ' realised', then it should not be read again. Is that what this flag is for? Then the whole "realise" method of PDF::Builder::Basic::PDF::Objind -- line 157 -- should be replaced with:

Code: [Select]
sub realise {
    my ( $self ) = @_;

    return $self if $self-> { ' realised' } or
                   !$self-> { ' objnum' };
   
    $self-> { ' realised' } = 1;
    $self-> { ' parent' }-> read_obj( $self )
}

===== old code ================================================

sub realise {
    $_[0]->{' realised'}?
         $_[0]:
         $_[0]->{' objnum'}?
                $_[0]->{' parent'}->read_obj(@_):
                $_[0];
}

===========================================================
They do the same thing, but the new code also sets ' realised' to 1.
===========================================================


Further, line 539 of PDF::Builder::Basic::PDF::File should be deleted (commented out), because it just makes no sense (??).

With these changes it seems to work, but it requires more tests from someone who understands internal logic of this distribution.

And, change to line 213 in report above should be duplicated, for the same reason, in line 188 of PDF::Builder::Basic::PDF::Pages.

=========================================================
Same problem as line 215 above.
=========================================================


Mon May 29 09:58:34 2017 futuramedium [...] yandex.ru - Correspondence added
From: futuramedium [...] yandex.ru

Sorry, I forgot about related issue: the PDF::Builder::Basic::PDF::Pages::rebuild_tree is a no-op, and its return contradicts to what POD says. <see CTS3 bug #51 -- Mod.>

  • Speaking about strange code fragments (unrelated issue). What does line 815 of PDF::Builder::Basic::PDF::File was supposed to do?
  • Another unrelated issue: lines 163-181 of PDF::Builder::Basic::PDF::Dict are meant to replace LZWDecode with FlateDecode, preventing second FlateDecode in Filter array. But this fragment will never work.
   
  • It assumes Filter is always an array.
  • Splicing is performed on array that is just not used further. I think @filters array was intended to be spliced.
  • If LZWDecode happens before FlateDecode, it's just replaced with FlateDecode. If there was another FlateDecode further in array, there'll be two of them.
  • If LZWDecode happens after FlateDecode, it's not pushed to @filters. And what is supposedly spliced later, is some other different filter.
   
Ehm, maybe it's better be replaced with:

Code: [Select]
@filters = map { $_-> val } ref $self-> { Filter }{ ' val' } eq 'ARRRAY'
    ? @{ $self-> { Filter }{ ' val' }}
    : ( $self-> { Filter });
my $hasflate = grep { /^FlateDecode$/ } @filters;
@filters = map {
    s/^LZWDecode$/FlateDecode/ && $hasflate
        ? ()
        : "PDF::API2::Basic::PDF::Filter::$_"-> new
} @filters;

But of course there hardly exists a PDF file with both LZWDecode and FlateDecode in Filter array.

 PhilterPaper commented on Oct 5

Quote
Further, line 539 of PDF::Builder::Basic::PDF::File should be deleted (commented out), because it just makes no sense (??).

Line numbers change over time. Is this the line you are referring to?

Code: [Select]
       $result->{' realised'} = 0;
        # gdj: FIXME: if any of the ws chars were crs, then the whole
        # string might not have been read.

========================================================
I don't think that's it, but reporter needs to elaborate on it.
========================================================


Quote
Speaking about strange code fragments (unrelated issue). What does line 815 of PDF::Builder::Basic::PDF::File was supposed to do?

Line numbers change over time. Is this the line you are referring to? It does look a little unproductive. Maybe it's a cut and paste that wasn't cleaned up right, or just an outright typo.

Code: [Select]
$tdict->{' xref'}{$i}[0] = $tdict->{' xref'}{$i}[0];
========================================================
removed from API2 and Builder in early July
========================================================


Quote
Another unrelated issue: lines 163-181 of PDF::Builder::Basic::PDF::Dict are meant to replace LZWDecode with FlateDecode, preventing second FlateDecode in Filter array. But this fragment will never work.

I'll have to take a look at it some time later. It would be a lot easier to open a new issue (ticket) for each such thing, so as one item is cleared, an issue can be closed.
« Last Edit: December 25, 2017, 06:30:37 PM by Phil »

*

Offline Phil

  • Global Moderator
  • Sr. Member
  • *****
  • 359
    • View Profile
Mon May 29 07:34:01 2017 futuramedium [...] yandex.ru - Correspondence added
From:    futuramedium [...] yandex.ru

No, still not good enough. Adding another page in above script, i.e. duplicating "$pdf-> page;" results in broken PDF file. Looks like increment statement in "for" loop in line 192 of PDF::API2::Basic::PDF::Pages reads from file and thus previously made changes to object in memory are lost.

I think if object already was ' realised', then it should not be read again. Is that what this flag is for? Then the whole "realise" method of PDF::API2::Basic::PDF::Objind -- line 157 -- should be replaced with:
Code: [Select]
sub realise {
    my ( $self ) = @_;

    return $self if $self-> { ' realised' } or
                   !$self-> { ' objnum' };
   
    $self-> { ' realised' } = 1;
    $self-> { ' parent' }-> read_obj( $self )
}
Further, line 539 of PDF::API2::Basic::PDF::File should be deleted (commented out), because it just makes no sense (??).

With these changes it seems to work, but it requires more tests from someone who understands internal logic of this distribution.

And, change to line 213 in report above should be duplicated, for the same reason, in line 188 of PDF::API2::Basic::PDF::Pages.

*

Offline Phil

  • Global Moderator
  • Sr. Member
  • *****
  • 359
    • View Profile
Mon May 29 09:58:34 2017 futuramedium [...] yandex.ru - Correspondence added
From:    futuramedium [...] yandex.ru

Sorry, I forgot about related issue: the PDF::API2::Basic::PDF::Pages::rebuild_tree is a no-op, and its return contradicts to what POD says. <see CTS3 bug -- Mod.>

+ Speaking about strange code fragments (unrelated issue). What does line 815 of PDF::API2::Basic::PDF::File was supposed to do?

+ Another unrelated issue: lines 163-181 of PDF::API2::Basic::PDF::Dict are meant to replace LZWDecode with FlateDecode, preventing second FlateDecode in Filter array. But this fragment will never work.

(1) It assumes Filter is always an array.
(2) Splicing is performed on array that is just not used further. I think @filters array was intended to be spliced.
(3) If LZWDecode happens before FlateDecode, it's just replaced with FlateDecode. If there was another FlateDecode further in array, there'll be two of them.
(4) If LZWDecode happens after FlateDecode, it's not pushed to @filters. And what is supposedly spliced later, is some other different filter.

Ehm, maybe it's better be replaced with:
Code: [Select]
@filters = map { $_-> val } ref $self-> { Filter }{ ' val' } eq 'ARRRAY'
    ? @{ $self-> { Filter }{ ' val' }}
    : ( $self-> { Filter });
my $hasflate = grep { /^FlateDecode$/ } @filters;
@filters = map {
    s/^LZWDecode$/FlateDecode/ && $hasflate
        ? ()
        : "PDF::API2::Basic::PDF::Filter::$_"-> new
} @filters;
But of course there hardly exists a PDF file with both LZWDecode and FlateDecode in Filter array.