
RT 114976 - Huge memory consumption in page splitting

Phil

  • Global Moderator
  • Sr. Member
  • *****
  • 437
RT 114976 - Huge memory consumption in page splitting
« October 20, 2016, 08:00:09 PM »
Wed Jun 01 19:14:36 2016 dosio [...] land.it - Ticket created
Subject:    Huge memory consumption in page splitting
Date:    Thu, 2 Jun 2016 01:14:16 +0200
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Claudio Dosio <dosio [...] land.it>
 
In some cases, while splitting a PDF into single pages, some of the image objects in it "explode" in size. An embedded image of about 1 MB in the PDF can drive the whole system to use more than 8 GB of RAM.

If needed I can provide some PDFs with these cases.

Best regards
Claudio

--
Claudio Dosio
Responsabile R&S Software

<formatting cleanup - Mod.>
« Last Edit: May 01, 2017, 10:44:14 AM by Phil »

Phil
Re: RT 114976 - Huge memory consumption in page splitting
« Reply #1: October 21, 2016, 05:05:44 PM »
Fri Oct 21 16:26:54 2016 steve [...] deefs.net - Correspondence added

Yes, I'll need some sample PDFs and example code to troubleshoot this.
#
Fri Oct 21 16:26:54 2016 The RT System itself - Status changed from 'new' to 'open'
#
Fri Oct 21 16:26:55 2016 steve [...] deefs.net - Status changed from 'open' to 'stalled'

Phil
Re: RT 114976 - Huge memory consumption in page splitting
« Reply #2: October 22, 2016, 07:52:36 PM »
Sat Oct 22 07:47:27 2016 dosio [...] land.it - Correspondence added
Subject:    Re: [rt.cpan.org #114976] Huge memory consumption in page splitting
Date:    Sat, 22 Oct 2016 13:46:25 +0200
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Claudio Dosio <dosio [...] land.it>

Hello,
attached is one of the PDFs that cause the problem. I cannot send the others as they contain private customer data. The problem mainly occurs when at least one of the images in the PDF comes from a scanner or MFC device. Those PDFs probably have something unusual in their structure, but they open without problems in Acrobat Reader and similar tools, which makes it hard to explain to anyone that their PDF has problems.

One workaround I found is to convert the incoming PDF with pdfopt, pdftk, or Ghostscript first, but that takes time and is not always guaranteed to work.
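The Ghostscript route can be scripted as a pre-processing step before PDF::API2 opens the file. The sketch below is an assumption, not part of PDF::API2: it expects a `gs` binary on the PATH (overridable through the third argument), uses Ghostscript's `pdfwrite` device to rewrite the whole document, and only reports whether `gs` exited successfully, not whether the output is equivalent to the input. The helper name `normalize_pdf` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite $in to $out with Ghostscript's pdfwrite
# device. Returns 1 if the command exited with status 0, otherwise 0.
sub normalize_pdf {
    my ($in, $out, $gs) = @_;
    $gs //= 'gs';    # assumption: Ghostscript is installed as 'gs' on PATH

    # List form of system() bypasses the shell, so file names need no quoting.
    # -q suppresses the banner; -o sets the output file and implies
    # -dBATCH -dNOPAUSE.
    my $rc = system($gs, '-q', '-o', $out, '-sDEVICE=pdfwrite', $in);

    # system() returns -1 if the program could not be started at all,
    # otherwise the wait status, which is 0 only on a clean exit.
    return $rc == 0 ? 1 : 0;
}
```

For example, `normalize_pdf($in, "$in.gs.pdf") or warn 'Ghostscript pre-processing failed';` before calling `PDF::API2->open` on the rewritten file. Because the whole document is re-rendered, this is as slow as running Ghostscript by hand.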

If the bug cannot easily be fixed, what I would need is an error from the PDF::API2 library that I can use to return an error condition upstream. What seems to happen now is that the Perl script keeps running even though the page has not been fully split, while any exec/system call that tries to run an external command fails with an out-of-memory condition reported by the external command itself.
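One way to get such an error condition without any change to PDF::API2 is to do the splitting in a child process and have the parent inspect the child's exit status. If the child dies, or runs out of memory (Perl then aborts, or the kernel's OOM killer ends it with a signal), the parent sees a non-zero status it can report upstream, and all of the child's memory is returned to the system when it exits. A minimal Unix-only sketch; `run_isolated` is a hypothetical helper name, not a PDF::API2 function.

```perl
use strict;
use warnings;
use POSIX qw(WIFEXITED WEXITSTATUS WIFSIGNALED WTERMSIG);

# Hypothetical helper: run the code ref $work in a child process.
# Returns '' on success, or a message describing how the child failed.
sub run_isolated {
    my ($work) = @_;

    # Flush STDOUT now so the child does not re-emit buffered output.
    local $| = 1;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: do the memory-hungry work. A die inside $work is caught
        # here and turned into exit status 1.
        my $ok = eval { $work->(); 1 };
        exit($ok ? 0 : 1);
    }

    # Parent: wait for the child and translate its wait status.
    waitpid($pid, 0);
    my $status = $?;
    return '' if WIFEXITED($status) && WEXITSTATUS($status) == 0;
    return 'child exited with status ' . WEXITSTATUS($status) if WIFEXITED($status);
    return 'child killed by signal ' . WTERMSIG($status) if WIFSIGNALED($status);
    return "child ended with wait status $status";
}
```

The per-page work would go inside the code reference, for example `my $err = run_isolated(sub { split_page($pageNum) }); die "page $pageNum: $err\n" if $err;`, where `split_page` stands for the body of the splitting loop. A hard memory cap for the whole script (parent and children) could additionally be set with `ulimit -v` in the shell that launches it.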

Please let me know if I can help in any way to solve the problem.

Best regards
Claudio

--
Claudio Dosio
Responsabile R&S Software

<< File too large to attach. It is available at https://rt.cpan.org/Public/Ticket/Attachment/1678038/900541/prova20150224.pdf >>
<formatting cleanup - Mod.>
« Last Edit: May 01, 2017, 10:46:15 AM by Phil »

Phil
Re: RT 114976 - Huge memory consumption in page splitting
« Reply #3: November 03, 2016, 10:36:29 PM »
Thu Nov 03 19:07:58 2016 steve [...] deefs.net - Correspondence added

Can you give me some example code that demonstrates the problem, please? I just tried a simple script that imports the pages individually into another PDF, and didn't run into any problems.


Phil
Re: RT 114976 - Huge memory consumption in page splitting
« Reply #4: November 09, 2016, 04:05:55 PM »
Wed Nov 09 10:42:22 2016 steve [...] deefs.net - Correspondence added

I came across a file that was showing symptoms similar to the ones you reported, and fixed a few bugs. Please try either the latest code on GitHub or developer release 2.030_001 to see whether it fixes the problem for you as well.

Phil
Re: RT 114976 - Huge memory consumption in page splitting
« Reply #5: November 10, 2016, 04:15:27 PM »
Thu Nov 10 14:18:13 2016 dosio [...] land.it - Correspondence added
Subject:    Re: [rt.cpan.org #114976] Huge memory consumption in page splitting
Date:    Thu, 10 Nov 2016 20:17:49 +0100
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Claudio Dosio <dosio [...] land.it>

The fix solves the problem for some PDFs but not for all. I tried a 1.2 MB PDF on a VM with 4 GB of RAM, and it cannot split the document: it runs out of memory. Unfortunately, I cannot give you the document that produces this behaviour since it contains private data.

Below is the code I use to split and process the pages:

Code:
use PDF::API2;

# $tempDirname, $inPdfFilename, $configParams and isToGlyph() are defined
# elsewhere in the full script; page numbers are assumed to start at 1.
my $inPdf      = PDF::API2->open("$tempDirname/$inPdfFilename");
my $numOfPages = $inPdf->pages;
my $pageNum    = 1;

while ( $pageNum <= $numOfPages ) {

        # Write each page to its own file: <input name>_<page number>
        my $outPdf = PDF::API2->new(-file => "$tempDirname/${inPdfFilename}_${pageNum}");

        # Import the source page as a form XObject and draw it on a new page
        my $xo   = $outPdf->importPageIntoForm( $inPdf, $pageNum );
        my $page = $outPdf->page;
        my $p7m  = '';

        $page->mediabox( $configParams->{'pdfPageW'}, $configParams->{'pdfPageH'} );

        my $gfx = $page->gfx;
        $gfx->formimage( $xo, 0, 0, 1 );
        $outPdf->save();

        if ( isToGlyph( $pageNum, $configParams ) ) # a configuration parameter, which is true in this setup
        {
                # ... does some actions on the split pages (copy, compress, etc.) ...
        }
        $pageNum++;
}

This code runs out of memory before it reaches the if statement.
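Since the problem document cannot be shared, one way to narrow things down locally is to log the process's peak memory after each call in the loop (`importPageIntoForm`, `formimage`, `save`) and see which page and which step makes it jump. A minimal sketch, assuming Linux, where the kernel reports the peak resident set size as the `VmHWM` line of `/proc/self/status`; the helper names `peak_rss_kb` and `log_mem` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: peak resident set size of this process in kB,
# read from /proc/self/status (Linux only). Returns undef elsewhere.
sub peak_rss_kb {
    open my $fh, '<', '/proc/self/status' or return undef;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmHWM:\s+(\d+)\s+kB/;
    }
    return undef;
}

# Hypothetical helper: print the peak RSS to STDERR with a label for the
# step just completed, and return the value for further checks.
sub log_mem {
    my ($label) = @_;
    my $kb = peak_rss_kb();
    printf STDERR "%-30s peak RSS: %s kB\n", $label, defined $kb ? $kb : 'n/a';
    return $kb;
}
```

For example, calling `log_mem("page $pageNum: import")` right after `importPageIntoForm` and `log_mem("page $pageNum: save")` after `save` would show whether the growth happens while importing the page or while writing it out. `VmHWM` is a high-water mark, so it never decreases; what matters is the step where it jumps.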


Many thanks for your help

Claudio


Claudio Dosio
Responsabile R&S Software

<formatting cleanup - Mod.>
« Last Edit: May 01, 2017, 10:49:00 AM by Phil »

Phil
Re: RT 114976 - Huge memory consumption in page splitting
« Reply #6: July 02, 2017, 09:32:49 PM »
Sun Jul 02 19:21:05 2017 steve [...] deefs.net - Correspondence added

PDF::API2 2.032 (just released) includes memory-related improvements that may help in this case.
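To be sure a script actually picks up 2.032 or later, rather than an older copy found earlier in `@INC`, it can check the version when it loads the module. `use PDF::API2 2.032;` does this at compile time; the sketch below does the same at run time so the script can report the problem itself. `module_at_least` is a hypothetical helper; it relies on Perl's standard `UNIVERSAL::VERSION` check, which dies when the installed version is lower than the one requested.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $module loads and its version is >= $min.
sub module_at_least {
    my ($module, $min) = @_;

    # Turn a package name such as PDF::API2 into a file name PDF/API2.pm.
    (my $file = "$module.pm") =~ s{::}{/}g;

    # require dies if the module is missing; VERSION dies if it is too old.
    return eval { require $file; $module->VERSION($min); 1 } ? 1 : 0;
}
```

Usage: `module_at_least('PDF::API2', '2.032') or die 'PDF::API2 2.032 or later is required';`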

Since I'm not able to reproduce this issue, I'm going to close this ticket.  Feel free to create a new one if you have a file you can share (you can also send me a file privately, which I'll use solely to track down this issue).
#
Sun Jul 02 19:21:06 2017 steve [...] deefs.net - Status changed from 'open' to 'rejected'

Phil
Re: RT 114976 - Huge memory consumption in page splitting
« Reply #7: September 14, 2018, 01:13:41 PM »
Note that this ticket is closed as rejected on the PDF::API2 queue, but is still open here. Since a suggested patch is offered, I won't discard it just yet, but if no one shows up with a reproducible test case within 12 months or so, I'll go ahead and close this one.