
RT 120397 - Can’t handle newlines in references


Phil

  • Global Moderator
  • Sr. Member
RT 120397 - Can’t handle newlines in references
« February 27, 2017, 11:03:20 AM »
Sun Feb 26 17:41:13 2017 $_ = 'spro^^*%*^6ut# [...] &$%*c>#!^!#&!pan.org'; y/a-z.@//cd; print - Ticket created
Subject:    Can’t handle newlines in references

In PDF syntax, an indirect reference consists of three distinct tokens that can be separated by any PDF whitespace, and even comments.  For example, this is a syntactically valid indirect reference:

1 %eieio
0
R

PDF::API2 does not allow comments at all (based on reading the code; that is not a problem for my PDFs).  But it does choke on newlines if the object is long enough that it has not yet been read completely from the file.
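To make the token rule concrete, here is a minimal Perl sketch that assumes only the whitespace and comment rules of the PDF specification (ISO 32000-1, section 7.2). It matches an indirect reference as an object number, a generation number and the keyword R, separated by any run of whitespace and/or comments. The helper parse_ref is hypothetical and is not how PDF::API2 parses references.

```perl
use strict;
use warnings;
use 5.010;    # for the possessive quantifier *+

# PDF whitespace characters (ISO 32000-1, Table 1): NUL, HT, LF, FF, CR, SP.
my $ws = qr/[\x00\x09\x0A\x0C\x0D\x20]/;

# A comment runs from "%" to the end of the line and counts as whitespace.
# The possessive *+ stops a long comment from being re-split on backtracking.
my $sep = qr/(?:$ws|%[^\x0A\x0D]*+)+/;

# "num gen R", where R must be followed by whitespace, a delimiter, or the end.
my $ref_re = qr/^(\d+)$sep(\d+)${sep}R(?=$ws|[()<>\[\]{}\/%]|\z)/;

# Hypothetical helper: returns (num, gen) if $text starts with a reference,
# or an empty list otherwise.
sub parse_ref {
    my ($text) = @_;
    return $text =~ $ref_re ? ($1, $2) : ();
}

for my $sample ("1 %eieio\n0\nR", "1896\n0\nR\n1\n0\nR", "12 0 Rx") {
    my @ref = parse_ref($sample);
    (my $shown = $sample) =~ s/\n/\\n/g;    # make newlines visible
    print "$shown => ",
        (@ref ? "object $ref[0], generation $ref[1]" : 'no reference'), "\n";
}
```

The last sample is rejected because "Rx" is a different token, not the keyword R.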

This happens with:

1895 0
obj<</Count
253/Kids[1896
0
R
1
0
R
7
0
R
13
0
R
...

etc., with 253 entries.

PDF::API2::Basic::PDF::File::readval needs to read more data if it finds what could be a partial reference.
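The fix requested above can be sketched as a three-way decision. This is a hypothetical illustration, not PDF::API2's readval: classify() and read_ref() are made-up names, and the fixed 8-byte chunks stand in for whatever buffering the real parser does. When the buffer ends while everything in it could still be the start of "num gen R", the sketch reads more data instead of giving up.

```perl
use strict;
use warnings;
use 5.010;

my $ws    = qr/[\x00\x09\x0A\x0C\x0D\x20]/;    # PDF whitespace
my $sep   = qr/(?:$ws|%[^\x0A\x0D]*+)+/;       # whitespace and/or comments
my $delim = qr/[()<>\[\]{}\/%]/;               # PDF delimiter characters

# Hypothetical helper: decide whether $buf starts with an indirect reference.
# Returns ('ref', num, gen), ('not_ref'), or ('need_more') when the buffer
# ends before the answer is known and more data may still arrive.
sub classify {
    my ($buf, $at_eof) = @_;

    # A complete "num gen R" whose R is properly terminated.
    return ('ref', $1, $2)
        if $buf =~ /^(\d+)$sep(\d+)${sep}R(?=$ws|$delim)/
        || ($at_eof && $buf =~ /^(\d+)$sep(\d+)${sep}R\z/);

    # The whole buffer is still a prefix of some reference: read more first.
    return ('need_more')
        if !$at_eof && $buf =~ /^\d+(?:$sep(?:\d+(?:$sep(?:R)?)?)?)?\z/;

    return ('not_ref');
}

# Hypothetical reader loop: append fixed-size chunks until classify() decides.
# read() returns 0 at end of file and undef on error; both end the loop.
sub read_ref {
    my ($fh, $chunk) = @_;
    my $buf = '';
    while (1) {
        my $got    = read($fh, $buf, $chunk, length $buf);
        my @result = classify($buf, !$got);
        return @result unless $result[0] eq 'need_more';
    }
}

my $kids = "1896\n0\nR\n1\n0\nR\n7\n0\nR]";
open my $fh, '<', \$kids or die "cannot open in-memory handle: $!";
my ($kind, $num, $gen) = read_ref($fh, 8);
print "$kind: $num $gen R\n";    # prints "ref: 1896 0 R"
```

With 8-byte chunks the first read ends right after the first R, so classify() cannot yet tell whether R is a complete token and asks for more data before answering.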

Re: RT 120397 - Can’t handle newlines in references
« Reply #1: March 28, 2017, 10:58:46 AM »
Anyone looking at this one might also want to look at rejected bug RT 106020 — I have a vague feeling that they may be somewhat related.

Re: RT 120397 - Can’t handle newlines in references
« Reply #2: April 19, 2017, 08:57:04 PM »
Also look at three developer notes in lib/PDF/API2/Basic/PDF/File.pm: first, a disagreement about whether carriage returns can follow streams, and then two "FIXME" comments about carriage returns interrupting the reading of strings. Any of these might have some bearing on this bug.
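For background on the first note: ISO 32000-1 (section 7.3.8.1) says the keyword stream shall be followed by an end-of-line marker of either CRLF or LF, and not by a carriage return alone. The sketch below shows a check that can run strictly or leniently. stream_data_offset is a hypothetical helper, and it is only an assumption that this rule is what the note in File.pm is debating.

```perl
use strict;
use warnings;
use 5.010;    # for the defined-or operator //

# Hypothetical helper (not the code in File.pm): given the text that follows
# a stream dictionary, return the offset where the stream data begins, or
# undef. The spec requires CRLF or LF after "stream"; $lenient additionally
# accepts a lone CR, for files that do not follow the rule.
sub stream_data_offset {
    my ($text, $lenient) = @_;
    my $ws = qr/[\x00\x09\x0A\x0C\x0D\x20]*/;    # optional leading whitespace

    return $+[0] if $text =~ /^${ws}stream(?:\x0D\x0A|\x0A)/;
    return $+[0] if $lenient && $text =~ /^${ws}stream\x0D/;
    return;
}

for my $case ("stream\r\nDATA", "stream\nDATA", "stream\rDATA") {
    (my $shown = $case) =~ s/\r/\\r/g;    # make CR and LF visible
    $shown =~ s/\n/\\n/g;
    my $strict  = stream_data_offset($case);
    my $lenient = stream_data_offset($case, 1);
    printf "%-16s strict=%s lenient=%s\n", $shown,
        $strict // 'undef', $lenient // 'undef';
}
```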

Re: RT 120397 - Can’t handle newlines in references
« Reply #3: June 26, 2017, 06:57:58 PM »
Sat Jun 24 11:52:14 2017 steve [...] deefs.net - Correspondence added

Thanks for the bug report.

I think this should now be working as expected.  See t/rt120397.t for the cases that are now being tested -- if anything is missing, add a test and let me know.

#
Sat Jun 24 11:52:15 2017 The RT System itself - Status changed from 'new' to 'open'
#
Sat Jun 24 11:52:15 2017 steve [...] deefs.net - Status changed from 'open' to 'patched'
#
Sat Jun 24 14:41:07 2017 $_ = 'spro^^*%*^6ut# [...] &$%*c>#!^!#&!pan.org'; y/a-z.@//cd; print - Correspondence added

On Sat Jun 24 11:52:14 2017, SSIMMS wrote:
Quote
Thanks for the bug report.
 
I think this should now be working as expected. See t/rt120397.t for the cases that are now being tested -- if anything is missing, add a test and let me know.

Thank you.

I’m afraid it is still not working.  Attached is a sample 400-page PDF that it fails on.  This PDF may not actually be valid.  To keep the file size small, I made the Pages object's Kids array reference the same page 400 times.  Adobe Reader does not like this file, but there is nothing in the specification to suggest that the same page object cannot be referenced multiple times in the Kids array.

In any case, it makes a good test.  I am not sure where you would put this in the repository, but a simple
Code:
    ok eval { PDF::API2->open("t/newlines.pdf") }
will suffice.
Subject:    newlines.pdf

#
Sat Jun 24 14:42:10 2017 $_ = 'spro^^*%*^6ut# [...] &$%*c>#!^!#&!pan.org'; y/a-z.@//cd; print - Correspondence added

On Sat Jun 24 14:41:07 2017, SPROUT wrote:
Quote
BTW, the error I get is:

Can't parse `R
3
0
R
3
0
... many times over ...
3
0
R' near 1000 length 313. at /Library/Perl/5.12/PDF/API2/Basic/PDF/File.pm line 682.
#
Sat Jun 24 15:03:07 2017 steve [...] deefs.net - Correspondence added

Can you update to HEAD and try again, please?  I made a couple more fixes an hour or so after updating this ticket, and I'm guessing you don't have that commit.

The PDF you attached opens fine for me on HEAD (but not on the original fix).

#
Sat Jun 24 17:05:04 2017 $_ = 'spro^^*%*^6ut# [...] &$%*c>#!^!#&!pan.org'; y/a-z.@//cd; print - Correspondence added

On Sat Jun 24 15:03:07 2017, SSIMMS wrote:
Quote
Can you update to HEAD and try again, please?  I made a couple more fixes an hour or so after updating this ticket, and I'm guessing you don't have that commit.
 
The PDF you attached opens fine for me on HEAD (but not on the original fix).

Yes, it works now.  Thank you.

Re: RT 120397 - Can’t handle newlines in references
« Reply #4: July 03, 2017, 10:41:23 AM »
Sun Jul 02 23:47:06 2017 steve [...] deefs.net - Status changed from 'patched' to 'resolved'
#
Sun Jul 02 23:47:12 2017 steve [...] deefs.net - Fixed in 2.032 added