
RT 113290 - Crash with Objind 1 does not exist at index 316 at lib/PDF/API2/Basi

Offline Phil

  • Global Moderator
  • Sr. Member
  • *****
  • 387
    • View Profile
Wed Mar 23 09:32:55 2016 melmothx [...] gmail.com - Ticket created
Subject:    Crash with Objind 1 does not exist at index 316 at lib/PDF/API2/Basic/PDF/File.pm line 725.
Date:    Wed, 23 Mar 2016 14:31:55 +0100
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Marco Pessotto <melmothx [...] gmail.com>

See the attached patch: read_stream.patch
diff --git a/lib/PDF/API2/Basic/PDF/File.pm b/lib/PDF/API2/Basic/PDF/File.pm
index a9bba4b..e0b6cc8 100644
--- a/lib/PDF/API2/Basic/PDF/File.pm
+++ b/lib/PDF/API2/Basic/PDF/File.pm
@@ -711,7 +711,7 @@ sub read_objnum {
         my $src = $self->read_objnum($object_location->[0], 0, %opts);
         die 'Cannot find the compressed object stream' unless $src;
 
-        $src->read_stream if $src->{' nofilt'};
+        $src->read_stream(1) if $src->{' nofilt'};
 
         my $map = substr($src->{' stream'}, 0, $src->{'First'}->val);
         my $objects = substr($src->{' stream'}, $src->{'First'}->val);

From the doc, it looks like read_stream without a true argument empties the ' stream' content in some cases, storing it on the disk. But here the code unconditionally assumes that ' stream' is always set to a string.
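
To illustrate the failure mode described above, here is a minimal sketch. It is not the actual File.pm code: the hash keys (' stream', ' streamfile') are taken from this ticket, while the helper name split_object_stream, the sample data, and the cache file path are invented for illustration. It shows that splitting the stream with substr only works when the content is still held in memory.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the code around the patched line. $first plays
# the role of the object stream's 'First' value.
sub split_object_stream {
    my ($src, $first) = @_;

    # Guard for the case the ticket describes: the content may have been
    # written to disk, leaving ' stream' undefined.
    die "stream is not in memory\n" unless defined $src->{' stream'};

    my $map     = substr($src->{' stream'}, 0, $first);
    my $objects = substr($src->{' stream'}, $first);
    return ($map, $objects);
}

# In-memory case: a 10-byte map followed by two tiny objects.
my $in_memory = { ' stream' => '11 0 12 8 <</A 1>><</B 2>>' };
my ($map, $objects) = split_object_stream($in_memory, 10);
print "map: '$map'\n";
print "objects: '$objects'\n";

# On-disk case: only a (made-up) cache file path is recorded.
my $on_disk = { ' streamfile' => '/tmp/hypothetical-cache-file' };
my $ok = eval { split_object_stream($on_disk, 10); 1 };
print "on-disk case: ", ($ok ? "ok\n" : "failed: $@");
```

Without the guard, substr would operate on an undefined value and return an empty map, which would be consistent with the lookup failure reported here, although that link is an assumption.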

WARNING: I'm not sure the patch does the right thing, but it appears to work.

I've attached a sample PDF which triggers the bug. I couldn't strip the file down to a more reasonable size, sorry.
<< unable to attach file (too large). obtain at https://rt.cpan.org/Public/Ticket/Attachment/1643682/881493/46006606.pdf >>
<< unable to attach file (too large). obtain at https://rt.cpan.org/Public/Ticket/Attachment/1610700/861766/large-compressed.pdf >>

perl -Ilib -MPDF::API2 -e 'PDF::API2->open("large-compressed.pdf");'
Objind 1 does not exist at index 316 at lib/PDF/API2/Basic/PDF/File.pm line 725.

Cheers

--
Marco
#
Sat Mar 26 09:16:57 2016 steve [...] deefs.net - Correspondence added

I've applied the patch.  Better would be to change read_objnum to read from the file if $src->{' streamfile'} is set.  Do you want to try implementing that?

If so, I'm not sure that the way Dict.pm creates the streamfile is entirely correct -- if you run into problems with it, let me know, preferably with a large file I can use to troubleshoot.
#
Sat Mar 26 09:16:57 2016 The RT System itself - Status changed from 'new' to 'open'
#
Sat Mar 26 09:16:59 2016 steve [...] deefs.net - Status changed from 'open' to 'patched'
#
Sat Mar 26 09:19:32 2016 steve [...] deefs.net - Correspondence added

On Sat Mar 26 09:16:57 2016, SSIMMS wrote:

... assuming that it's a different file than the one that's already attached to this ticket.
#
Sat Mar 26 10:01:18 2016 melmothx [...] gmail.com - Correspondence added
Subject:    Re: [rt.cpan.org #113290] Crash with Objind 1 does not exist at index 316 at lib/PDF/API2/Basic/PDF/File.pm line 725.
Date:    Sat, 26 Mar 2016 15:01:04 +0100
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Marco Pessotto <melmothx [...] gmail.com>

Hi Steve!

Sure, I can give it a try over the next few days. I'll let you know.

--
Marco
#
Mon Mar 28 11:38:38 2016 melmothx [...] gmail.com - Correspondence added
Subject:    Re: [rt.cpan.org #113290] Crash with Objind 1 does not exist at index 316 at lib/PDF/API2/Basic/PDF/File.pm line 725.
Date:    Mon, 28 Mar 2016 17:38:31 +0200
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Marco Pessotto <melmothx [...] gmail.com>
 
Please see:

https://github.com/ssimms/pdfapi2/pull/4

I'm not sure it's ready to be merged, as testing it effectively is a bit complicated, so I'd appreciate it if you would take a look at it.

--
Marco
#
Sat Apr 30 09:34:36 2016 melmothx [...] gmail.com - Correspondence added
Subject:    Re: [rt.cpan.org #113290] Crash with Objind 1 does not exist at index 316 at lib/PDF/API2/Basic/PDF/File.pm line 725.
Date:    Sat, 30 Apr 2016 15:34:29 +0200
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Marco Pessotto <melmothx [...] gmail.com>

Hello Steve and sorry to bother you.

Besides the patch sitting in the PR (which I understand could require some mulling over), could we have a release with the fix in this ticket, i.e. at this commit: https://github.com/ssimms/pdfapi2/commit/e576b85de116e6ef2e475adb6d5ad68261a84b83 ?

It's too bad that CPAN is delivering a version which breaks when the fix is already in. The fix may not be optimal, but at least it works.

Please let me know if I can be of any help.

Best wishes

--
Marco
#
Thu Jun 30 12:39:32 2016 bkrzno [...] hotmail.com - Correspondence added
Subject:    Re: [rt.cpan.org #113290] Crash with Objind 1 does not exist at index 316 at lib/PDF/API2/Basic/PDF/File.pm line 725.
Date:    Thu, 30 Jun 2016 16:38:54 +0000
To:    "bug-PDF-API2 [...] rt.cpan.org" <bug-PDF-API2 [...] rt.cpan.org>
From:    Branko Krznaric <bkrzno [...] hotmail.com>
 
Thank you for fixing this bug. I have installed the latest release, PDF-API2-2.028. I can now open some PDF files that I was not able to open before, but I get a similar error when opening some encrypted PDFs, e.g. "Objind 807 does not exist at index 0 at lib/PDF/API2/Basic/PDF/File.pm line 722."


This has happened with several PDFs. They all have in common that they are encrypted. I have attached a sample PDF which triggers the bug.


Tech details:

perl v5.20.1 built for MSWin32-x86-multi-thread-64int

OS Win 7, SP 1, 32bit


Thank you in advance for looking into this! I highly appreciate your work.


Branko
46006606.pdf

#
Sun Oct 09 18:26:17 2016 steve [...] deefs.net - Correspondence added

On Thu Jun 30 12:39:32 2016, bkrzno@hotmail.com wrote:
I just took a look at this, and it appears to be a separate issue -- PDF::API2 doesn't know how to read encrypted PDFs.  That would be a nice wishlist item, and doesn't appear to be especially difficult to implement.  Feel free to create a new ticket for it, particularly if you'd like to try to write the code (I can provide pointers if so).
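
Until encrypted files are supported, a caller might want to detect them before opening. The following is a rough heuristic sketch, not part of PDF::API2: the function name looks_encrypted is hypothetical, and it relies on the assumption that an encrypted PDF names an /Encrypt entry in its trailer or cross-reference stream dictionary, which is stored as plain text. A byte scan like this can produce false positives.

```perl
use strict;
use warnings;

# Heuristic only: look for an /Encrypt key whose value is either an
# indirect reference ("4 0 R") or an inline dictionary ("<<").
sub looks_encrypted {
    my ($bytes) = @_;
    return $bytes =~ m{/Encrypt\s*(?:\d+\s+\d+\s+R|<<)} ? 1 : 0;
}

# Made-up trailer fragments for demonstration.
my $plain     = "trailer\n<< /Size 5 /Root 1 0 R >>\n";
my $encrypted = "trailer\n<< /Size 5 /Root 1 0 R /Encrypt 4 0 R >>\n";

printf "plain: %d, encrypted: %d\n",
    looks_encrypted($plain), looks_encrypted($encrypted);
```

For a real file, the bytes would be read in binary mode first, and a positive result should be treated as a hint rather than a guarantee.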

#
Fri Oct 21 14:30:04 2016 steve [...] deefs.net - Correspondence added

I've just rewritten some of the code in Dict.pm and File.pm:

Dict.pm: Previously, read_stream was only creating a stream cache file when a given 4kB block uncompressed to over 16kB, whereas it was supposed to do so whenever the uncompressed stream was more than 32kB.  I've fixed that, and increased the max in-memory stream size from 32kB to 16MB when $force_memory isn't set.
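
The threshold behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in Dict.pm: collect_stream, $max_in_memory, and the returned 'size' key are hypothetical names, and the chunks are plain strings rather than inflated blocks. The point it demonstrates is that the decision to spill to a cache file compares the cumulative size, not the size of a single decoded chunk.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical limit for this sketch (the ticket mentions 32kB, later 16MB).
my $max_in_memory = 32 * 1024;

# Collect decoded chunks; once the running total passes the limit, move
# everything gathered so far into a temp file and keep appending there.
sub collect_stream {
    my ($chunks, $force_memory) = @_;
    my ($buffer, $total, $fh, $path) = ('', 0);

    for my $chunk (@$chunks) {
        $total += length $chunk;
        if ($fh) {
            print {$fh} $chunk;
            next;
        }
        $buffer .= $chunk;
        # Check the cumulative size, not the size of this one chunk.
        if (!$force_memory && $total > $max_in_memory) {
            ($fh, $path) = tempfile(UNLINK => 1);
            binmode($fh);
            print {$fh} $buffer;
            $buffer = '';
        }
    }

    if ($fh) {
        close($fh) or die "Cannot close $path: $!";
        return { ' streamfile' => $path, size => $total };
    }
    return { ' stream' => $buffer, size => $total };
}

# Four 10kB chunks: no single chunk exceeds the limit, but the total does.
my $result = collect_stream([ ('x' x 10240) x 4 ], 0);
print $result->{' streamfile'} ? "spilled to disk\n" : "kept in memory\n";
```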

File.pm: read_objnum will now read from a stream cache file if one exists, rather than requiring that the entire object stream be read into memory first.  I've also added comments and used more descriptive variable names in the hope of improving maintainability.
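
As a rough sketch of the idea (not the actual read_objnum implementation), the code below reads a single object from a decoded object stream stored in a file. It relies on the PDF object stream layout, in which the first /First bytes hold /N pairs of "object-number offset" and the offsets are relative to /First. The function name, its parameters, and the sample data are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read object number $index (0-based) from a decoded object stream kept in
# the file at $path, touching only the header and that object's bytes.
sub read_object_from_stream_file {
    my ($path, $n, $first, $index) = @_;

    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";

    # Header: $n pairs of "object-number offset" within the first $first bytes.
    my $got = read($fh, my $header, $first);
    die "Short read on object stream header\n"
        unless defined $got && $got == $first;
    my @pairs = split ' ', $header;
    die "No object at index $index\n"
        if $index < 0 || $index >= $n || @pairs < 2 * ($index + 1);

    my ($objnum, $start) = @pairs[2 * $index, 2 * $index + 1];

    # An object ends where the next one starts, or at the end of the file.
    my $end = $index + 1 < $n && @pairs >= 2 * ($index + 2)
        ? $pairs[2 * $index + 3]
        : (-s $fh) - $first;

    seek($fh, $first + $start, 0) or die "Cannot seek in $path: $!";
    read($fh, my $body, $end - $start);
    close($fh);
    return ($objnum, $body);
}

# Demonstration with a tiny made-up stream: a 10-byte header, two objects.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} '11 0 12 8 <</A 1>><</B 2>>';
close($tmp_fh);

my ($num, $body) = read_object_from_stream_file($tmp_path, 2, 10, 1);
print "object $num: $body\n";
```

Memory use then depends on the size of the header and of the requested object rather than on the size of the whole stream.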

This should resolve the issue without increasing memory consumption when there's a large object stream.

To test it, you can set $mincache to a small number (e.g. 8192) to ensure that the cache files get created.

Can you confirm that things are working properly for you using the latest code on GitHub?

#
Mon Jan 30 11:50:06 2017 steve [...] deefs.net - Status changed from 'patched' to 'resolved'
#
Mon Jan 30 11:50:12 2017 steve [...] deefs.net - Fixed in 2.031 added