Post without Account — your post will be reviewed, and if appropriate, posted under Anonymous. You can also use this link to report any problems registering or logging in.

RT 113084 - Unreproducible internal font IDs

  • 2 Replies
  • 1586 Views
*

Offline Phil

  • Global Moderator
  • Sr. Member
  • *****
  • 437
    • View Profile
RT 113084 - Unreproducible internal font IDs
« October 21, 2016, 04:13:18 PM »
Wed Mar 16 09:27:23 2016 dam [...] cpan.org - Ticket created
From:    dam [...] cpan.org
Subject:    Unreproducible internal font IDs

We have the following bug reported to the Debian package of PDF-API2 (https://bugs.debian.org/818363):

It doesn't seem to be a bug in the packaging, so you may want to take a look. Thanks!

Quote
The TrueType, BdFont, CoreFont and Postscript subclasses of PDF::API2::Resource::Font append '~'.time() to the resource identifiers they generate. This is unnecessary because the identifier already includes the result of the pdfkey() function, which gives increasing numbers through a process lifetime.

Adding time() is in itself not enough to guarantee unique IDs, but only introduces unreproducible output and poses a slight performance penalty for the syscall needed to get the current time.

The patch below removes time() from the resource IDs, which is sufficient to get reproducible output.

Thanks for considering,
  Damyan Ivanov,
  Debian Perl Group

Code: [Select]
--- a/lib/PDF/API2/Resource/CIDFont/TrueType.pm
+++ b/lib/PDF/API2/Resource/CIDFont/TrueType.pm
@@ -38,7 +38,7 @@ sub new {
     my ($ff,$data)=PDF::API2::Resource::CIDFont::TrueType::FontFile->new($pdf,$file,@opts);
 
     $class = ref $class if ref $class;
-    my $self=$class->SUPER::new($pdf,$data->{apiname}.pdfkey().'~'.time());
+    my $self=$class->SUPER::new($pdf,$data->{apiname}.pdfkey());
     $pdf->new_obj($self) if(defined($pdf) && !$self->is_obj($pdf));
 
     $self->{' data'}=$data;
@@ -51,7 +51,7 @@ sub new {
 
     $de->{'FontDescriptor'} = $des;
     $de->{'Subtype'} = PDFName($self->iscff ? 'CIDFontType0' : 'CIDFontType2');
-    ## $de->{'BaseFont'} = PDFName(pdfkey().'+'.($self->fontname).'~'.time());
+    ## $de->{'BaseFont'} = PDFName(pdfkey().'+'.($self->fontname));
     $de->{'BaseFont'} = PDFName($self->fontname);
     $de->{'DW'} = PDFNum($self->missingwidth);
     if($opts{-noembed} != 1)
--- a/lib/PDF/API2/Resource/Font/BdFont.pm
+++ b/lib/PDF/API2/Resource/Font/BdFont.pm
@@ -56,7 +56,7 @@ sub new {
     my %opts=@opts;
 
     $class = ref $class if ref $class;
-    $self = $class->SUPER::new($pdf, sprintf('%s+Bdf%02i',pdfkey(),++$BmpNum).'~'.time());
+    $self = $class->SUPER::new($pdf, sprintf('%s+Bdf%02i',pdfkey(),++$BmpNum));
     $pdf->new_obj($self) unless($self->is_obj($pdf));
 
     # adobe bitmap distribution font
@@ -202,7 +202,7 @@ sub readBDF {
         $data->{bbox}{'.notdef'} = [0, 0, 0, 0];
     }
 
-    $data->{fontname}=pdfkey().pdfkey().'~'.time();
+    $data->{fontname}=pdfkey();
     $data->{apiname}=$data->{fontname};
     $data->{flags} = 34;
     $data->{fontbbox} = [ split(/\s+/,$data->{FONTBOUNDINGBOX}) ];
--- a/lib/PDF/API2/Resource/Font/CoreFont.pm
+++ b/lib/PDF/API2/Resource/Font/CoreFont.pm
@@ -164,7 +164,7 @@ sub new
     #}
     
     $class = ref $class if ref $class;
-    $self = $class->SUPER::new($pdf, $data->{apiname}.pdfkey().'~'.time());
+    $self = $class->SUPER::new($pdf, $data->{apiname}.pdfkey());
     $pdf->new_obj($self) unless($self->is_obj($pdf));
     $self->{' data'}=$data;
     $self->{-dokern}=1 if($opts{-dokern});
--- a/lib/PDF/API2/Resource/Font/Postscript.pm
+++ b/lib/PDF/API2/Resource/Font/Postscript.pm
@@ -28,7 +28,7 @@ sub new {
     }
 
     $class = ref $class if ref $class;
-    $self = $class->SUPER::new($pdf, $data->{apiname}.pdfkey().'~'.time());
+    $self = $class->SUPER::new($pdf, $data->{apiname}.pdfkey());
     $pdf->new_obj($self) unless($self->is_obj($pdf));
     $self->{' data'}=$data;
 
@@ -40,7 +40,7 @@ sub new {
     $self->{'FontDescriptor'}=$self->descrByData();
     if(-f $psfile)
     {
-        $self->{'BaseFont'} = PDFName(pdfkey().'+'.($self->fontname).'~'.time());
+        $self->{'BaseFont'} = PDFName(pdfkey().'+'.($self->fontname));
 
         my ($l1,$l2,$l3,$stream)=$self->readPFAPFB($psfile);
#
Subject:    [rt.cpan.org #113084]
Date:    Wed, 16 Mar 2016 11:04:14 -0400
To:    bug-PDF-API2 [...] rt.cpan.org
From:    Phil M Perry

See also #105579. It sounds like it might be a very similar issue.
#
Wed Mar 16 11:04:07 2016 The RT System itself - Status changed from 'new' to 'open'
#
Wed Mar 16 11:13:36 2016 dam [...] cpan.org - Correspondence added
Subject:    Re: [rt.cpan.org #113084]
Date:    Wed, 16 Mar 2016 15:13:19 +0000
To:    Phil M Perry via RT" <bug-PDF-API2 [...] rt.cpan.org>
From:    Damyan Ivanov <dam [...] cpan.org>

Yes, this seems to be the same issue. Interestingly, hash order seems not to be a problem -- after removing the time stamps, the PDF output gets completely deterministic in my tests.
#
Thu May 12 16:23:07 2016 steve [...] deefs.net - Correspondence added

Thanks for the suggestion.  The version control history doesn't go all the way back, so I can't tell why the time() call was added to the IDs initially, but since it exists in more-commonly-used parts of the code and is missing in less-commonly-used parts of the code, I'm going to guess that it was added at some point to work around a name collision issue, possibly related to merging multiple PDFs created by PDF::API2. Using time() isn't as fancy as, say, using UUIDs, but it's a lot faster and likely good enough for most real-world applications.

There are definitely benefits to having the same code produce the same PDF repeatedly, but there are also potential drawbacks, so I'm going to leave this part of the code as is.
#
Thu May 12 16:23:13 2016 steve [...] deefs.net - Status changed from 'open' to 'rejected'

*

Offline Phil

  • Global Moderator
  • Sr. Member
  • *****
  • 437
    • View Profile
Re: RT 113084 - Unreproducible internal font IDs
« Reply #1: January 31, 2017, 10:07:29 AM »
Quote
I'm going to guess that it was added at some point to work around a name collision issue, possibly related to merging multiple PDFs created by PDF::API2. Using time() isn't as fancy as, say, using UUIDs, but it's a lot faster and likely good enough for most real-world applications.

On the other hand, having reproducible output is also important. Rather than adding ~time() in a prophylactic manner ("just in case we run into a collision"), might it not be better to wait until we detect that there is a collision, or other problem, and then do unholy things to the IDs? Or even use UUIDs instead, if that manages collisions adequately while still being reproducible? Keep in mind that time() is being used more or less as a random number generator -- there's still a tiny chance that two same IDs produced in different runs could still have the same timestamp and thus the same ID!

How do other tools handle this? Do they wait until they have detected a collision, and then do something to change an ID? If this only happens during a PDF merge, should we leave the responsibility of detecting such problems to the author of the merge tool, automatically detect and fix up IDs during the writing of the merged PDF, or do something in-between (e.g., supply a tool to detect and manage duplicates)?

*

Offline Phil

  • Global Moderator
  • Sr. Member
  • *****
  • 437
    • View Profile
Re: RT 113084 - Unreproducible internal font IDs
« Reply #2: March 22, 2017, 10:39:04 PM »
Per #105579, I am going to remove the ~time() suffix to font resource names. Please report if you encounter problems that look like a name collision.

To be released (code with ~time() commented out, but not erased) in 3.003.
« Last Edit: March 28, 2017, 01:17:59 PM by Phil »