
RT 117210 - Opening damaged files (was Error: "can't call method "realise" on a)

*

Offline Phil

  • Global Moderator
  • Sr. Member
  • *****
  • 417
    • View Profile
Thu Aug 25 08:41:45 2016 regtapy [...] yandex.ru - Ticket created
Subject:    Error: "can't call method "realise" on an undefined value" while open a pdf-file

Hello,

system info:
PDF::API2 VERSION 2.028
perl v5.10.1
FreeBSD 8.3-RELEASE-p3

I get the error "Can't call method "realise" on an undefined value at /usr/local/lib/perl5/site_perl/5.10.1/PDF/API2.pm line 199." when I try to open the attached file (dr-hilton.pdf).

The minimum code is:
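Code:
#!/usr/bin/perl

use strict;
use warnings;

use PDF::API2;

my $file = 'dr-hilton.pdf';

# Trap the exception thrown by open() and report it as a warning.
# Assigning outside the eval block keeps $src_pdf in scope afterwards.
my $src_pdf = eval { PDF::API2->open( $file ) };
if ( $@ ) {
    warn "Error: $@";
}

1;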
Any ideas what's wrong with that?
Thanks

#
Fri Oct 07 00:10:33 2016 steve [...] deefs.net - Correspondence added
It looks like the file is corrupt.  At the end of the file (byte 187481), there's an "xref" line followed by "1 7".  The 1 indicates that the first object number in the cross-reference table will be 1, and the 7 indicates that there are 7 entries (the next seven lines).  However, the next seven lines are numbered from 0 rather than 1.

The "1 7" is invalid -- according to the spec, if there's only one cross-reference table (and this file only has one), it has to start with 0.  PDF::API2 assumes that the 1 is intentional and doesn't return an error.  Since Adobe Reader opens it without complaining, it looks like it assumes the 1 is a mistake and also doesn't return an error.

If you change the "1 7" to "0 7", the file will open properly in both PDF::API2 and Adobe Reader.
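
The check described above can be sketched in plain Perl. This is only an illustration and not PDF::API2 code: the name check_xref_header is hypothetical, the sample data is a truncated made-up table, and the sketch relies on the heuristic assumption that a first entry of the form "0000000000 65535 f" (the head of the free list) always belongs to object 0.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDF::API2.
# Takes the text of a classic cross-reference section and returns the
# (possibly corrected) text followed by a list of warning strings.
sub check_xref_header {
    my ($xref) = @_;
    my @warnings;

    # Subsection header: "xref", EOL, "<first object number> <entry count>",
    # EOL, then the first entry "nnnnnnnnnn ggggg f|n".
    if ($xref =~ /\Axref\s+(\d+)[ ](\d+)\s+(\d{10})[ ](\d{5})[ ]([fn])/) {
        my ($first, $count, $offset, $gen, $type) = ($1, $2, $3, $4, $5);

        # Assumption: "0000000000 65535 f" is the free-list head (object 0),
        # so a non-zero start number in the header is a mistake.
        if ($first != 0 && $type eq 'f' && $gen == 65535 && $offset == 0) {
            push @warnings,
                "xref subsection starts at $first but its first entry looks like object 0";

            # Only patch a single-digit start number: "1" -> "0" keeps the
            # byte length unchanged, so offsets elsewhere stay valid.
            $xref =~ s/\A(xref\s+)\d+([ ]\d+)/${1}0$2/ if length($first) == 1;
        }
    }
    return ($xref, @warnings);
}

# Truncated sample table (only two of the seven entries are shown).
my $damaged = "xref\n1 7\n0000000000 65535 f \n0000000015 00000 n \n";
my ($fixed, @notes) = check_xref_header($damaged);
warn "Warning: $_\n" for @notes;
print $fixed;
```

Because the replacement keeps the byte count the same, the startxref offset at the end of the file would not need to be recalculated in this particular case.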

#
Fri Oct 07 00:10:33 2016 The RT System itself - Status changed from 'new' to 'open'
#
Fri Oct 07 00:10:38 2016 steve [...] deefs.net - Status changed from 'open' to 'rejected'
« Last Edit: April 15, 2017, 03:39:58 PM by Phil »

*

Offline Phil

This (rejected) bug report brings up an interesting question. Can, and should, PDF::API2 tolerate some minor errors in the files it reads in, and fix them? Of course, a warning should be issued telling the user that something was wrong with their input PDF file and that PDF::API2 was able to fix it during the read (the original file should obviously not be touched).

Such a capability should probably wait until the issue of handling different PDF versions (both input and output) is adequately sorted out.
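
To make the idea concrete, here is a rough sketch of "fix during the read, warn, and never touch the original". Nothing here exists in PDF::API2: repaired_copy and the example fixer are hypothetical names, and the single repair shown (an xref header of "1 N" whose first entry is the free-list head) is just an assumed example of a "minor error".

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper, not part of PDF::API2.
# Reads $path, applies $fixer (a code ref that takes the file bytes and returns
# the possibly repaired bytes plus a list of messages), and returns the path of
# a file to open: the original if nothing changed, otherwise a temporary
# repaired copy. The original file is only ever opened for reading.
sub repaired_copy {
    my ($path, $fixer) = @_;

    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my $bytes = do { local $/; <$in> };
    close $in;

    my ($repaired, @messages) = $fixer->($bytes);
    return $path if $repaired eq $bytes;

    warn "Warning: $path: $_ (repaired in a temporary copy)\n" for @messages;

    # The temporary copy is deleted automatically when the program exits.
    my ($out, $tmp_path) = tempfile(SUFFIX => '.pdf', UNLINK => 1);
    binmode $out, ':raw';
    print {$out} $repaired;
    close $out or die "Cannot write $tmp_path: $!";
    return $tmp_path;
}

# Example fixer (an assumption for illustration): turn an "xref / 1 N" header
# into "0 N" when the first entry is the free-list head of object 0.
my $fixer = sub {
    my ($bytes) = @_;
    my @messages;
    if ($bytes =~ s/(\bxref\s+)1([ ]\d+\s+0{10}[ ]65535[ ]f)/${1}0$2/) {
        push @messages, 'cross-reference table started at 1 instead of 0';
    }
    return ($bytes, @messages);
};
```

A caller could then write something like PDF::API2->open( repaired_copy( 'dr-hilton.pdf', $fixer ) ), so that the library only ever sees the repaired copy while the user still gets a warning about the input file.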

*

Offline Phil

See also RT 106020 — a proposal for doing some sort of validation on PDF files being read in. In that case, it may involve some fixup of a somewhat out-of-spec PDF file, as many readers apparently do. Validation may want to wait for implementation of a PDF version number setting.

*

Offline Phil

Pull request on PDF::API2

 gvsyn commented 8 days ago

When trying to open PDFs from Sharp scanners, open fails reporting invalid PDF version. The header is as follows:

%PDF-1.4 Sharp Scanned ImagePDF
%Sharp Non-Encryption
3 0 obj

From the PDF spec there are no restrictions stating that after the minor version there should be a newline or similar. Adjusted the code to not care what there is after the 1.x

Suggested code fix in lib/PDF/API2/Basic/PDF/File.pm:
Code:
@@ -241,7 +241,7 @@ sub open {
     binmode $fh, ':raw';
     $fh->seek(0, 0);            # go to start of file
     $fh->read($buffer, 255);
-    unless ($buffer =~ m/^\%PDF\-1\.(\d)+\s*$cr/mo) {
+    unless ($buffer =~ m/^\%PDF\-1\.(\d)+.*$cr/mo) {
         die "$filename not a PDF file version 1.x";
     }
     $self->{' version'} = $1;

***** Implementation: consider capturing any text that follows %PDF-x.y on the header line and writing it out as a new comment line after the header,
*****  keeping in mind the version min/max for the read-in version number.
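
A possible shape for that, as a standalone sketch rather than a patch to File.pm: parse_pdf_header is a hypothetical name, the sample header uses assumed CR/LF line endings, and unlike the patch above it anchors at the very start of the buffer and captures the full minor version with (\d+). Any min/max version check is left out.

```perl
use strict;
use warnings;

# Hypothetical helper, not the PDF::API2 implementation.
# Parses the first line of a PDF and returns (major, minor, trailing text),
# or an empty list if the buffer does not start with a PDF header.
sub parse_pdf_header {
    my ($buffer) = @_;

    # Accept anything after the version number up to the first CR or LF.
    # (\d+) captures the whole number; a repeated group such as (\d)+ would
    # keep only its last digit.
    return unless $buffer =~ /\A%PDF-(\d+)\.(\d+)([^\r\n]*)/;
    my ($major, $minor, $rest) = ($1, $2, $3);

    $rest =~ s/\A\s+|\s+\z//g;    # trim surrounding whitespace
    return ($major, $minor, $rest);
}

# Header as reported for the Sharp scanner files (line endings assumed).
my $head = "%PDF-1.4 Sharp Scanned ImagePDF\r\n%Sharp Non-Encryption\r\n3 0 obj\r\n";
my ($major, $minor, $comment) = parse_pdf_header($head);

# On output, the extra text can be preserved as its own comment line.
print "%PDF-$major.$minor\n";
print "%$comment\n" if length $comment;
```

Keeping the trailing text as a separate comment line means the header itself is written in the plain %PDF-x.y form while nothing from the input is silently dropped.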
« Last Edit: July 07, 2018, 07:30:26 PM by Phil »