Post without Account — your post will be reviewed, and if appropriate, posted under Anonymous. You can also use this link to report any problems registering or logging in.

[RT 131657] pdfapi2 regression 0ab55cd53 [Testcase]

  • 2 Replies
  • 57 Views
*

Offline Phil

  • Global Moderator
  • Hero Member
  • *****
  • 632
    • View Profile
[RT 131657] pdfapi2 regression 0ab55cd53 [Testcase]
« February 03, 2020, 03:45:16 PM »
Mon Feb 03 06:05:43 2020 winter [...] bfw-online.de - Ticket created
Subject:    pdfapi2 regression 0ab55cd53 [Testcase]
Date:    Mon, 3 Feb 2020 11:49:28 +0100
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Leon Winter <winter [...] bfw-online.de>

Hi,

we are using your PDF::API2 module (via Debian) and just upgraded to a more recent version where we noticed a bug:

Deep recursion on subroutine "PDF::API2::Basic::PDF::Objind::release" at /usr/share/perl5/PDF/API2/Basic/PDF/Objind.pm line 123.
 at /usr/share/perl5/PDF/API2/Basic/PDF/Objind.pm line 122.
        PDF::API2::Basic::PDF::Objind::release(PDF::API2::Outline=HASH(0x583c1788)) called at /usr/share/perl5/PDF/API2/Basic/PDF/Objind.pm line 122

I could identify the commit 0ab55cd535b3b3ac7f616589d00ff00864626030 as culprit. Using the current version (at master) with this commit reverted we do not run into this bug. We use the module to modify a 200+-page PDF to add outlines and titles.

See my attached test case (Perl script) which I run on a empty 200+ page PDF file which I also attached.

I created the PDF file this way:
 
Code: [Select]
convert xc:none -page A4 a.pdf
 pdftk A=blank.pdf cat A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
 A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
 A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
 A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
 A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
 A A A A A A A A A A A A A A A A A A A A A A A A A A output blank217.pdf

Then run the script with a current pdfapi2 in path, I did:
 PERL5LIB=pdfapi2/lib perl -w testcase.pl

Output after regression:
Deep recursion on subroutine "PDF::API2::Basic::PDF::Objind::release" at pdfapi2/lib/PDF/API2/Basic/PDF/Objind.pm line 123.
[...]

Output before regression (0ab55cd535b3b3ac7f616589d00ff00864626030):
 (nothing)

Thanks,
Leon

*

Offline Phil

  • Global Moderator
  • Hero Member
  • *****
  • 632
    • View Profile
Re: [RT 131657] pdfapi2 regression 0ab55cd53 [Testcase]
« Reply #1: February 11, 2020, 07:51:42 PM »
Tue Feb 11 12:35:18 2020 futuramedium [...] yandex.ru - Correspondence added

Well, I had some time today, what follows is maybe not a patch "as is", rather invitation for a review. Plus, I didn't investigate why there were no warnings before mentioned commit -- just looking at current code (though I suspect it is related to efforts to prevent memory leaks, circular refs, etc., therefore there's no way back to "before that commit").

Reduced to SSCCE, the test would be:
Code: [Select]
use strict;
use warnings;
use PDF::API2;

my $pdf = PDF::API2-> open( 'blank217.pdf' );
my $outlines = $pdf-> outlines;
$outlines-> outline-> dest( $pdf-> openpage( $_ ))
    for 0 .. 100;
$pdf-> { pdf }-> release;

There are from 0 to quite a few warnings then, different from run to run, and only reason for that is "values" returns random order in line 115 https://metacpan.org/release/PDF-API2/source/lib/PDF/API2/Basic/PDF/Objind.pm#L115

Hm-m, OK. The Outlines tree is quite inter-connected, with Prev/Next links for intermediate/leaf, First/Last for root/intermediate, and Parent for all but root. I suspect, because of that, with just 101 bookmarks we easily (but not always) hit the deep recursion limit, then.

Actually, many PDF structures are similarly inter-connected -- e.g. consider a pages tree. I think, though didn't try to confirm, that a relatively (maybe a bit more) complex pages tree can trigger the same warnings when "released", because of Parent member.

I think, with Outlines tree, we don't have to follow any of Parent/First/Last/Prev/Next links to manually "release" them. For simple reason, that leafs/intermediate nodes are members of ' children' arrays. In fact, matter-of-fact autovivification of these arrays (https://metacpan.org/release/PDF-API2/source/lib/PDF/API2/Outline.pm#L131) is somewhat ...disturbing. OK, what I'm suggesting, is that said "Parent/First/Last/Prev/Next" values are weakened, and "weak" refs ARE NOT added to queue to be manually released.

For Outlines, they are released anyway because there are stored in ' children'. For Pages, any "Parent" is already weakened if I'm reading the code right. With patches applied, there are no warnings however many times I run the test. I hope there are no memory leaks then. For structures other than Outlines or Pages, if PDF::API2 ever deals with them, the solution would be then to weaken refs which are released through other links -- then we don't hit "deep recursion" for trees of any complexity. What do you think? :)

Code: [Select]
--- PDF/API2/Outline_old.pm     Thu Feb  6 01:08:40 2020
+++ PDF/API2/Outline.pm Tue Feb 11 20:16:25 2020
@@ -32,36 +32,43 @@
     $self->{'Prev'}    = $prev   if defined $prev;
     $self->{' api'}    = $api;
     weaken $self->{' api'};
+    weaken $self-> {'Parent'} if defined $parent;
+    weaken $self-> {'Prev'} if defined $prev;
     return $self;
 }

 sub parent {
     my $self = shift();
     $self->{'Parent'} = shift() if defined $_[0];
+    weaken $self-> {'Parent'};
     return $self->{'Parent'};
 }

 sub prev {
     my $self = shift();
     $self->{'Prev'} = shift() if defined $_[0];
+    weaken $self-> {'Prev'};
     return $self->{'Prev'};
 }

 sub next {
     my $self = shift();
     $self->{'Next'} = shift() if defined $_[0];
+    weaken $self-> {'Next'};
     return $self->{'Next'};
 }

 sub first {
     my $self = shift();
     $self->{'First'} = $self->{' children'}->[0] if defined $self->{' children'} and defined $self->{' children'}->[0];
+    weaken $self-> {'First'};
     return $self->{'First'};
 }

 sub last {
     my $self = shift();
     $self->{'Last'} = $self->{' children'}->[-1] if defined $self->{' children'} and defined $self->{' children'}->[-1];
+    weaken $self-> {'Last'};
     return $self->{'Last'};
 }
---------------------------------
--- PDF/API2/Basic/PDF/Objind_old.pm    Thu Feb  6 01:08:40 2020
+++ PDF/API2/Basic/PDF/Objind.pm        Tue Feb 11 20:17:34 2020
@@ -14,6 +14,7 @@

 use strict;
 use warnings;
+use Scalar::Util 'isweak';

 our $VERSION = '2.037'; # VERSION

@@ -112,7 +113,7 @@
 sub release {
     my ($self) = @_;

-    my @tofree = values %$self;
+    my @tofree = grep { !isweak $_ } values %$self;
     %$self = ();

     while (my $item = shift @tofree) {

Tue Feb 11 12:41:58 2020 futuramedium [...] yandex.ru - Correspondence added

Sorry, it looks I added "fixed in 2.037"; I tried to add "my line numbers are for 2.037".
« Last Edit: February 11, 2020, 07:56:04 PM by Phil »

*

Offline Phil

  • Global Moderator
  • Hero Member
  • *****
  • 632
    • View Profile
Re: [RT 131657] pdfapi2 regression 0ab55cd53 [Testcase]
« Reply #2: February 14, 2020, 03:46:24 PM »
Thu Feb 13 09:30:19 2020 winter [...] bfw-online.de - Correspondence added

I can confirm that your patch will silence the warnings (also in our production code).

Thanks