RT 117031 - UTF8 flag in metadata date fields causes garbage

Phil (Global Moderator)
Thu Aug 18 04:40:10 2016 Jeffrey.Ratcliffe [...] gmail.com - Ticket created
Subject:    UTF8 flag in metadata date fields causes garbage
 
If a date metadata field passed to the info() method has the utf8 flag set, pdfinfo reports garbage for that field in the resulting PDF metadata.

I am working around it by unsetting the utf8 flag as follows:

Code:
use Encode qw(encode);  # encode() returns a byte string with the utf8 flag off
$h{CreationDate} = encode('ASCII', "D:$year$month$day"."000000+00'00'");

Perhaps you could do this in the info() method.
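
A minimal, self-contained sketch of the workaround above, using only core modules. The helper name pdf_date_bytes is hypothetical and not part of PDF::Builder; it assumes the date is wanted in UTC with a fixed +00'00' offset, as in the ticket. It shows that a string can carry the utf8 flag while holding only ASCII characters, and that encode('ASCII', ...) hands back a byte string with the flag cleared.

```perl
use strict;
use warnings;
use Encode qw(encode);
use POSIX qw(strftime);

# Hypothetical helper (not a PDF::Builder function): build a PDF date
# string of the form D:YYYYMMDDHHmmSS+00'00' in UTC and return it as
# plain bytes, so the utf8 flag is guaranteed to be off.
sub pdf_date_bytes {
    my ($epoch) = @_;
    my $date = strftime("D:%Y%m%d%H%M%S+00'00'", gmtime($epoch));
    return encode('ASCII', $date);
}

# Simulate the problem case: an all-ASCII string with the utf8 flag set.
my $flagged = "D:20160818000000+00'00'";
utf8::upgrade($flagged);
my $clean = encode('ASCII', $flagged);

printf "flagged input: utf8 flag %s\n", utf8::is_utf8($flagged) ? 'on' : 'off';
printf "after encode:  utf8 flag %s\n", utf8::is_utf8($clean)   ? 'on' : 'off';
printf "same text:     %s\n", $clean eq $flagged ? 'yes' : 'no';
print  pdf_date_bytes(time), "\n";
```

The result of pdf_date_bytes() could then be assigned to $h{CreationDate} or $h{ModDate} before the hash is passed to info(), as in the one-line workaround.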

*

Phil (Global Moderator)
Re: RT 117031 - UTF8 flag in metadata date fields causes garbage
« Reply #1: November 26, 2018, 09:20:45 PM »
Unless someone comes up with a documented requirement that metadata must not be UTF-8, my code changes (a PDFString wrapper for PDFStr and PDFUtf) allow UTF-8 in metadata fields. I can understand CreationDate and ModDate having very specific field formats with no place for non-ASCII characters, but for everything else it seems to work fine. The pdfinfo utility mentioned prints out the metadata OK, except that wide characters are missing. Since it is a console text output utility, it's not surprising that it does not handle UTF-8 characters. However, it gives no error message.

Unless I'm informed that UTF-8 is not permitted in metadata fields, this fix will be in the next release (3.013). If it becomes necessary to ban UTF-8 in metadata fields, update PDF::Builder::Basic::PDF::Utils to remove the 'm' code from the UTF-8-allowed list.

Further note that direct use of PDFStr() and PDFUtf() is now discouraged; you should use PDFString() with the appropriate usage code.