[RT 122962] Reusing PDF::API2 objects for different PDFs

Phil
« December 25, 2017, 03:05:16 PM »
 PhilterPaper commented on Sep 8

Subject: Reusing PDF::API2 objects for different PDFs

Date: Tue, 5 Sep 2017 12:55:47 +0100
From: Andrew Beverley <andy [...] andybev.com>

Firstly, thanks for a great module. I am using it to generate a PDF with many pages. Producing the whole PDF as one object in one go uses huge amounts of memory, so I now produce each page one-by-one and then concatenate them afterwards using CAM::PDF.
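
The per-page workflow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the ticket: the helper name write_single_page, the file names, and the use of a core font in place of ttfont() are all hypothetical choices made to keep the example self-contained.

```perl
use strict;
use warnings;
use PDF::API2;
use CAM::PDF;

# Hypothetical helper: build a one-page PDF in its own PDF::API2 object
# and write it to $path. A fresh object is created on every call, which
# is the step the ticket reports as slow.
sub write_single_page {
    my ($number, $path) = @_;
    my $pdf  = PDF::API2->new;
    my $page = $pdf->page;
    # Assumption: a core font stands in for the TTF fonts used in the ticket.
    my $font = $pdf->corefont('Helvetica');
    my $text = $page->text;
    $text->font($font, 12);
    $text->translate(100, 700);
    $text->text("Page $number");
    $pdf->saveas($path);
    return $path;
}

# Produce the pages one by one (hypothetical file names).
my @parts = map { write_single_page($_, "part-$_.pdf") } 1 .. 3;

# Concatenate the single-page files with CAM::PDF.
my $merged = CAM::PDF->new(shift @parts) or die $CAM::PDF::errstr;
for my $part (@parts) {
    my $next = CAM::PDF->new($part) or die $CAM::PDF::errstr;
    $merged->appendPDF($next);
}
$merged->cleanoutput('combined.pdf');
```

Only one page's worth of PDF::API2 state is alive at a time, which is where the memory saving comes from. The trade-off is that any per-object setup, such as font loading, is repeated for every page.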

This works well, in that significantly less memory is used, but it is slow, as I am creating a new PDF::API2 object each time.

From the small amount of profiling I have done, a lot of time seems to be spent adding the TTF fonts. I wondered, is there some way to reuse the PDF::API2 object (or just the fonts) and create a fresh page each time?

I have tried various hacks (I won't detail them all here), such as reusing the ttfont object in multiple PDFs, deleting the pages from the object, and so on, but I couldn't get any to work.

Do you have any suggestions please? If you do, and it involves some coding, I would be happy to investigate providing a patch.

Thanks,
Andy

 PhilterPaper commented on Sep 8

on Tue Sep 05 20:01:42 2017 steve [...] deefs.net - Correspondence added

There are probably some ways to speed up that operation, but depending on what kind of coding you're up for trying, it might be possible to solve your original problem instead.

Take a look at my comments on ticket 113516. Currently, when PDF::API2 opens a file, it reads the whole thing into memory, but that wasn't always the case, and the code that PDF::API2 is built on top of doesn't require that everything be loaded in memory either.

It's theoretically possible for you to create a number of pages, write those out to disk, free up the memory, and repeat, without closing and reopening the file. If you want to start down that trail, look at PDF::API2->finishobjects() and follow the path for details about writing out a file in chunks.

Freeing the memory without closing the file may be trickier (I haven't looked into that yet). I'm guessing it'll involve the release_obj() call in PDF::API2::Basic::PDF::File -- if I'm reading the code correctly, that will remove it from the various caches, but without actually removing it from the PDF. The release() call will almost definitely free the memory, but I think that's only supposed to be called when you're done with the file.

As an aside, several comments in the code mention circular references. As of a release or two ago, those should no longer exist (if you find any, please give me a test case), so that should simplify things.

If you get to a point where you can call finishobjects() more than once and get a working file, but are still running out of memory, let me know (preferably with sample code) and we can dive into that problem more deeply.

If that ends up being too complicated and you'd rather keep trying to speed up the ttfont calls, it should be possible to reuse the time-consuming part of that object's creation. It may be as simple as calling $new_pdf->{'pdf'}->new_obj($font_object_from_old_pdf) instead of $new_pdf->ttfont(...). That definitely wouldn't qualify as intended/supported behavior, but it might work.

-- Steve

on Tue Sep 05 20:01:42 2017 The RT System itself - Status changed from 'new' to 'open'

 PhilterPaper commented on Sep 17

on Tue Sep 12 06:52:45 2017 andy [...] andybev.com - Correspondence added

Hi Steve, thanks for the quick and comprehensive reply. I've spent a while trying your suggestions (comments below), but am unfortunately no further forward.

At this point I should say that this is more of a nice-to-have than an essential requirement, so if there are no quick wins for either of us, I will be happy for you to close the ticket. Have a look at the below if you get the time anyway, and let me know what you think.

Quote
Take a look at my comments on ticket 113516. Currently, when PDF::API2 opens a file, it reads the whole thing into memory, but that wasn't always the case, and the code that PDF::API2 is built on top of doesn't require that everything be loaded in memory either.

Thanks. I don't think this particular information helps, as I am writing out, not reading.

Quote
It's theoretically possible for you to create a number of pages, write those out to disk, free up the memory, and repeat, without closing and reopening the file. If you want to start down that trail, look at PDF::API2->finishobjects() and follow the path for details about writing out a file in chunks.

Freeing the memory without closing the file may be trickier (I haven't looked into that yet). I'm guessing it'll involve the release_obj() call in PDF::API2::Basic::PDF::File -- if I'm reading the code correctly, that will remove it from the various caches, but without actually removing it from the PDF. The release() call will almost definitely free the memory, but I think that's only supposed to be called when you're done with the file.

I've spent a while playing around with the above. I seem to be able to write out a PDF in chunks, but whenever I try to do so along with calls to free the memory, I run into problems. Calling finishobjects() by itself doesn't seem to make any difference to memory use, and whenever I combine it with something like save or release_obj, I get:

Code:
Can't call method "new_obj" on an undefined value at /usr/share/perl5/PDF/API2/Basic/PDF/Pages.pm line 92

Quote
If you get to a point where you can call finishobjects() more than once and get a working file, but are still running out of memory, let me know (preferably with sample code) and we can dive into that problem more deeply.

I should have said before that I am using PDF::TextBlock. I don't think this affects the principle though, as I run into similar problems if I remove it and write lots of text using raw calls.

Anyway, FWIW, here is a MWE:

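Code:
use strict;
use warnings;
use PDF::API2;
use PDF::TextBlock;

# Create the PDF with an output file so that it can be written in chunks.
my $pdf = PDF::API2->new(-file => 'mypdf.pdf');

for my $count (1..100) {
    my $page = $pdf->page;
    my $tb = PDF::TextBlock->new({
        pdf  => $pdf,
        page => $page,
        x    => 100,
        y    => 100,
    });
    for my $count2 (1..20) {
        $tb->text("Text $count2");
        $tb->apply;
    }
    # Write out the objects finished so far, once per page.
    $pdf->finishobjects;
}
$pdf->save;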

Quote
If that ends up being too complicated and you'd rather keep trying to speed up the ttfont calls, it should be possible to reuse the time-consuming part of that object's creation. It may be as simple as calling $new_pdf->{'pdf'}->new_obj($font_object_from_old_pdf) instead of $new_pdf->ttfont(...). That definitely wouldn't qualify as intended/supported behavior, but it might work.

Given the relatively modest potential gains, I've decided this is probably best avoided!

Thanks again, and please do feel free to close this ticket if it all looks like too much hassle.

Andy