
RT 57248 - Cyrillic letters

Phil (Global Moderator)
« October 20, 2016, 07:46:46 PM »
Subject:    Cyrillic letters
 
1. The following Cyrillic glyphs (names according to http://www.adobe.com/devnet/font/pdfs/5013.Cyrillic_Font_Spec.pdf)

  afii10047 (uppercase 'Э')
  afii10049 (uppercase 'Я')
  afii10095 (lowercase 'э')

are not displayed when using TrueType fonts. I tried different encodings (CP1251, UTF8) with the same result.

2. When using core fonts, all Cyrillic characters overlap each other with CP1251 encoding, and are not displayed at all with UTF8 encoding.
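For reference, the three glyphs named in item 1 can be tied to concrete byte values. The following is a minimal sketch using only the core Encode module; the helper name cp1251_byte is made up for illustration, and the code points are the Unicode values the Adobe glyph names correspond to.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Unicode code points for the three glyph names in the report.
my %glyphs = (
    afii10047 => 0x042D,    # CYRILLIC CAPITAL LETTER E
    afii10049 => 0x042F,    # CYRILLIC CAPITAL LETTER YA
    afii10095 => 0x044D,    # CYRILLIC SMALL LETTER E
);

# Hypothetical helper: return the single CP1251 byte value for a code point.
sub cp1251_byte {
    my ($codepoint) = @_;
    return ord(encode('cp1251', chr($codepoint)));
}

for my $name (sort keys %glyphs) {
    printf "%s  U+%04X  CP1251 0x%02X\n",
        $name, $glyphs{$name}, cp1251_byte($glyphs{$name});
}
```

These are the bytes 0xDD, 0xDF and 0xFD, which is where the missing glyphs would sit in a CP1251 test string.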

Perl version v5.10.1 built for MSWin32-x86-multi-thread
Binary build 1007 [291969] provided by ActiveState
Operating system Windows Vista Home Premium, Service Pack 1 (ver. 6.0.6001)
Subject:    test-utf8.pdf

Code:
  use locale;
  use POSIX;
  use PDF::API2;

  my $encoding = 'cp1251';

  POSIX::setlocale($encoding)
    or die 'cannot set locale';

  my $pdf = PDF::API2->new();
  $pdf->mediabox( 'A4' );

  my $page = $pdf->page();
  my $txt = $page->text;

  my $font = $pdf->ttfont('Times.ttf', '-encode' => $encoding );
  my $fontsize = 12;
  $txt->font($font,$fontsize);
  $txt->translate(10,700);
  $txt->text("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
  $txt->translate(10,650);
  $txt->text("abcdefghijklmnopqrstuvwxyz");
  $txt->translate(10,600);
  $txt->text("àáâãäå¸æçèéêëìíîïðñòóôõö÷øùüûúýþÿ");
  $txt->translate(10,550);
  $txt->text("ÀÁÂÃÄÅ¨ÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÜÛÚÝÞß");

  $font = $pdf->corefont('Times', '-encode' => $encoding );
  $fontsize = 12;
  $txt->font($font,$fontsize);
  $txt->translate(10,400);
  $txt->text("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
  $txt->translate(10,350);
  $txt->text("abcdefghijklmnopqrstuvwxyz");
  $txt->translate(10,300);
  $txt->text("àáâãäå¸æçèéêëìíîïðñòóôõö÷øùüûúýþÿ");
  $txt->translate(10,250);
  $txt->text("ÀÁÂÃÄÅ¨ÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÜÛÚÝÞß");

  $font = $pdf->corefont('Times', '-encode' => $encoding );
  $fontsize = 12;
  $txt->font($font,$fontsize);
  $txt->translate(10,750);
  $txt->text("Using true type font:");
  $txt->translate(10,450);
  $txt->text("Using core font:");

  $pdf->saveas( 'test.pdf' );
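A note on the accented Latin strings in the listing: they appear to be the CP1251 bytes of the Cyrillic alphabet shown through a Latin-1 view, not the text the reporter intended to print. The sketch below (core Encode only; the helper name latin1_view_to_cyrillic is hypothetical) reverses that view for the last three letters of the lowercase string.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: take text as displayed in a Latin-1 view,
# recover the raw bytes, and reinterpret those bytes as CP1251.
sub latin1_view_to_cyrillic {
    my ($shown) = @_;
    my $bytes = encode('iso-8859-1', $shown);
    return decode('cp1251', $bytes);
}

# The characters U+00FD, U+00FE, U+00FF end the lowercase test string.
my $tail = latin1_view_to_cyrillic("\x{FD}\x{FE}\x{FF}");

# Print the recovered code points (U+044D, U+044E, U+044F).
printf "U+%04X\n", ord($_) for split //, $tail;
```

So the listing only produces the intended bytes if the script file is saved in a single-byte encoding such as Latin-1 or CP1251, not UTF-8.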

Subject:    [rt.cpan.org #57248]
Date:    Mon, 15 Feb 2016 16:40:51 -0500
To:    bug-PDF-API2 [...] rt.cpan.org
 
I modified the example text file to display x40 through xFF for both TrueType and Core fonts. I ran it for CP1251 (Cyrillic), CP1252 (Latin 1), CP1253 (Greek), and CP1254 (Turkish). This is Windows XP SP3, PDF::API2 2.025, Adobe Reader 11.0.08. All four character sets have some variety of MS "Smart Quotes" in the x80 - x9F range. I have not yet tried UTF-8 encoded text.
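The modified test file itself is not attached here. As a rough sketch of how such test strings could be built (the helper name byte_rows and the 16-bytes-per-row layout are assumptions, not taken from the ticket), each row can then be passed to $txt->text() under the encoding being tested:

```perl
use strict;
use warnings;

# Hypothetical helper: split a byte range into rows of up to 16
# consecutive byte values, returned as raw single-byte strings.
sub byte_rows {
    my ($first, $last) = @_;
    my @rows;
    for (my $start = $first; $start <= $last; $start += 16) {
        my $end = $start + 15 > $last ? $last : $start + 15;
        push @rows, join '', map { chr } $start .. $end;
    }
    return @rows;
}

# x40 through xFF, as in the follow-up test.
my @rows = byte_rows(0x40, 0xFF);
printf "%d rows, first row: %s\n", scalar(@rows), $rows[0];
```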

In all cases, the TTF displays perfectly, even the unassigned characters in the Smart Quotes range. The three Cyrillic characters reported missing in the original bug report are present and in the right place. All the core font displays have a problem with the unassigned characters in the Smart Quotes range: they still display as an empty box, but evidently with near-zero width (so that the following character mostly overprints it).

Core Font only problems:
CP1251: All Cyrillic and possibly some other characters print correctly, but apparently have about 33% width and are overprinted by following characters.
CP1252: The unassigned characters in the Smart Quotes range get overprinted, but the rest of the Latin-1 characters look OK.
CP1253: The Greek letters behave just like the Cyrillic letters in 1251.
CP1254: The Turkish letters behave just like the Latin-1 letters in 1252.

The bottom line is that TTF looks OK from here (at least for CP125x encoding), but Core Fonts have trouble with unassigned ("box") characters and non-Latin characters: the characters look OK, but the text position is not advanced far enough, so we get overprinting. Perhaps the font data (especially character width) isn't being read correctly? Since it works for (e.g.) CP1252, it seems odd that it would fail for non-Latin sets (note that Turkish is Latin). That would imply that the font files themselves are defective or non-standard in some way.

Phil (Global Moderator)
Re: RT 57248 - Cyrillic letters
« Reply #1: December 25, 2017, 05:20:28 PM »
 PhilterPaper commented on Nov 3

The current situation is:

1: TTF does not appear to be missing any characters, including the three listed.

2a: The characters overlap because their widths fall back to the "missingwidth" value of 250 in PDF::Builder::Resource::Font::CoreFont::[fontname].pm, which is as little as a quarter of what is needed. Only the standard Latin-1 glyphs, and their widths, are listed. Everything else is "missing". Possibly this could be fixed by extending the [fontname].pm glyph and width tables, but that will be quite a bit of work. It's also possible that, instead of using fixed .pm files, PDF::Builder could read the local copy of the core font files.

Reading the local core font files for metrics and embedding the fonts (see #80) would ensure that all glyphs are always properly rendered.
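To put numbers on the overlap in 2a: glyph widths are expressed in thousandths of the font size, so a width of 250 advances the text position by only 3pt at 12pt. The sketch below is plain arithmetic; the 722 figure is the width of Latin capital 'A' in the standard Times-Roman metrics, used here purely as a stand-in for a wide Cyrillic capital, whose real width is not given in this thread.

```perl
use strict;
use warnings;

# Advance in points for a glyph width given in 1/1000 em units.
sub advance_pt {
    my ($width_units, $font_size) = @_;
    return $width_units / 1000 * $font_size;
}

my $font_size = 12;
my $fallback  = advance_pt(250, $font_size);    # "missingwidth" fallback
my $stand_in  = advance_pt(722, $font_size);    # assumed wide capital

printf "fallback advance: %.3f pt\n", $fallback;
printf "stand-in advance: %.3f pt\n", $stand_in;
printf "fallback is %.0f%% of the stand-in width\n", 100 * 250 / 722;
```

Under that assumption the fallback covers roughly a third of the needed advance, which is in line with the "about 33% width" observed for CP1251 in the earlier test.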

2b: Core fonts do not support UTF-8 -- only single byte encodings at this time. UTF-8 support for core and Type1 fonts would certainly be desirable, but I don't know if it's feasible to add it (see #81).
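Until then, one workaround on the caller's side is to convert Unicode text to the single-byte encoding the core font was opened with, and to detect text that does not fit. This is a minimal sketch using the core Encode module only; the helper name to_single_byte is hypothetical and is not part of PDF::Builder or PDF::API2.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: encode Unicode text into a single-byte encoding.
# Returns the byte string, or undef if any character has no mapping.
sub to_single_byte {
    my ($encoding, $text) = @_;
    my $bytes = eval { encode($encoding, $text, FB_CROAK | LEAVE_SRC) };
    return $bytes;
}

my $text = "\x{42D}\x{44F}";    # Cyrillic capital E, small ya

my $as_1251 = to_single_byte('cp1251', $text);    # two bytes: 0xDD 0xFF
my $as_1252 = to_single_byte('cp1252', $text);    # undef: no Cyrillic in CP1252

print defined $as_1251 ? "fits cp1251\n" : "does not fit cp1251\n";
print defined $as_1252 ? "fits cp1252\n" : "does not fit cp1252\n";
```

The resulting byte string still needs a font opened with the matching '-encode' value, and for core fonts it remains subject to the width problem described in 2a.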

To access core font glyphs which are outside of Latin-1, consider using automap() to break up the font into multiple planes, each of up to 256 characters. However, this still does not appear to give correct character widths (020_corefonts uses automap()).