Consistent code point handling across font types

Offline Phil

« December 25, 2017, 03:27:48 PM »
 PhilterPaper commented on Nov 3

Ref RT 120048/#47

Core fonts and Type1 fonts are currently limited to single byte encodings, and use the automap() method to map their glyphs over multiple planes (of up to 256 glyphs each). It would be good to extend them in some way to handle UTF-8 text, so that one would not need to constantly switch between subfonts (planes) to see and use all the glyphs in a font (see 020_corefonts, 021_psfonts, 021_synfonts). Is there any way to natively use UTF-8 with these font types? We want to avoid automatically running automap and switching planes under the covers, as this would be very bulky and slow. Also, automap does not guarantee that the same code point will map to the same glyph over different versions of a given font file!
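To make the cost of plane switching concrete, here is a minimal Python sketch of the idea. It is not automap() itself and does not use any PDF::Builder code: build_planes, split_into_runs, and the glyph lists are all hypothetical, and the slot assignment (sorted code points, 256 per plane) is an assumption made only for illustration.

```python
from itertools import groupby

PLANE_SIZE = 256  # a single-byte encoding can address at most 256 glyphs


def build_planes(code_points):
    """Give each code point a (plane, byte) slot, filling planes in sorted order.

    Hypothetical stand-in for a plane-mapping step; a real font library
    may order and place glyphs quite differently.
    """
    slots = {}
    for index, cp in enumerate(sorted(set(code_points))):
        slots[cp] = divmod(index, PLANE_SIZE)
    return slots


def split_into_runs(text, slots):
    """Split text into (plane, bytes) runs; every new run means a font switch."""
    runs = []
    per_char = (slots[ord(ch)] for ch in text)
    for plane, group in groupby(per_char, key=lambda slot: slot[0]):
        runs.append((plane, bytes(byte for _, byte in group)))
    return runs


# Two made-up "versions" of a font: v2 gains one glyph (U+2013, en dash).
v1 = build_planes(list(range(0x20, 0x17F)) + [0x20AC])
v2 = build_planes(list(range(0x20, 0x17F)) + [0x2013, 0x20AC])

runs = split_into_runs("zł 5€", v1)
print(len(runs), "runs:", runs)
print("Euro slot in v1:", v1[0x20AC], "in v2:", v2[0x20AC])
```

In this toy model the five-character string "zł 5€" breaks into four runs, so the caller would switch subfonts three times within a single word and number. Adding one glyph to the font also moves the Euro sign to a different byte, which is the same kind of instability described above for different versions of a font file.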

On the other hand, TrueType and OpenType fonts are UTF-8 ready, but utilities such as 021_synfonts need to be extended to show glyphs beyond the first page (plane 0). 022_truefonts shows plane 0 per the encoding, but everything else is listed by CID (Character ID), arranged by Unicode code point. Perhaps automap() could be written to handle this? We want 021_synfonts to display all glyphs for a TrueType font.

The idea is to get consistent text handling, regardless of what kind of font (core, Type1, TrueType, etc.) happens to be used. If you're content to stay in a single-byte encoding, you can do that (although automap should continue to be supported for legacy purposes). If you want to use UTF-8 with core or Type1 fonts, to seamlessly access all glyphs by Unicode code point, you should be able to do that.

 PhilterPaper commented on Nov 15

In Type1 (and possibly core) fonts, Unicode code point information is available, so in theory we can determine the glyph number (GID, G+nnn) for any desired Unicode character at document creation time. However, the current output mechanism is based on a map from single-byte codes to glyph names, so some other mechanism would have to be found.
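One possible direction, sketched in Python purely as an illustration: resolve each character to a glyph name when the text is written, and build the byte-to-glyph-name map from the characters actually used. This is not how PDF::Builder works today, and everything here is an assumption: build_encoding is a hypothetical helper, and GLYPH_NAMES is a made-up excerpt (the glyph names themselves follow the Adobe Glyph List). The sketch also stays within the single-byte limit, so it fails once a text needs more than 255 distinct glyphs.

```python
# Hypothetical excerpt of a Type1 font's code point -> glyph name data.
GLYPH_NAMES = {
    0x0041: "A",
    0x00E9: "eacute",
    0x0142: "lslash",
    0x20AC: "Euro",
}


def build_encoding(text, glyph_names, first_code=1):
    """Assign a byte code to each distinct character, in order of first use.

    Returns (code_to_name, encoded): the single-byte-to-glyph-name map and
    the text re-expressed as bytes under that map.
    """
    code_for = {}       # code point -> assigned byte code
    code_to_name = {}   # assigned byte code -> glyph name
    encoded = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp not in code_for:
            if cp not in glyph_names:
                raise KeyError(f"font has no glyph for U+{cp:04X}")
            code = first_code + len(code_for)
            if code > 255:
                raise ValueError("text needs more than one single-byte plane")
            code_for[cp] = code
            code_to_name[code] = glyph_names[cp]
        encoded.append(code_for[cp])
    return code_to_name, bytes(encoded)


code_to_name, encoded = build_encoding("A€éA", GLYPH_NAMES)
print(code_to_name)
print(encoded)
```

Here the caller works only with Unicode text; the byte codes are an internal detail decided at document creation time. The open question from the comment above remains: a document using more than 255 distinct glyphs from one font would still need several such maps, or a different output mechanism.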