
[RT 128674] error "requested cmap '' not installed" with many CJK fonts

*

Offline Phil

  • Global Moderator
  • Hero Member
  • *****
  • 601
    • View Profile
terefang (Alfred Reibenschuh) said:

yes  < presumably "unchanged default behavior"? >

*

Offline Phil

So how about tweaking the PDF::Builder code to do the following:

  • Unless a (new) option to ttfont says otherwise, try to use the existing 4 .cmap files* for CJK fonts.
  • If .cmap files fail for CJK, and for all others, try the user-supplied list of internal cmaps (if given as a new option, otherwise fall back to a default list), using Bob's gist code (is it then not necessary to set mstable?).
  • If still no joy, call find_ms as before.
   
Would this work? What I'm still unclear about is when to specify a "Microsoft" cmap list and when to specify a "non-Microsoft" cmap list. Should PDF::Builder attempt to discover what platform it's running on? That could work if the font is to be embedded, but what if we specify "-noembed"? I suspect that the current code would favor the Microsoft cmap list. If users choose to specify their own cmap list, should we ask them for both lists rather than just one, or do we leave the burden of figuring out the platform to the user (who then supplies only the appropriate list)?

Should we just always force fonts to be embedded? That is, make "-noembed" a no-op? Are there likely to be CID (glyph number) mismatches if the font is not embedded?
Quote
    so the preference order would be 0/6, 0/4, 3/10, 0/3, 3/1 for script fonts and 3/0 for symbol fonts.
    in microsoft environments the preference would change to 0/6, 3/10, 0/4, 3/1, 0/3 for script fonts

These would be the default cmap lists: always 0/6 first, followed by 0/4 and 3/10 (reversed for an MS environment), followed by 0/3 and 3/1 (reversed for an MS environment). Would a symbol font have only 3/0 anyway (so we can tack it on to the end of both lists), or do we need to look at the font and see if it's a symbol font?
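To make the proposed ordering concrete, here is a minimal sketch in Python (for illustration only; it is not PDF::Builder code). The function name default_cmap_list and its ms_platform argument are hypothetical. It covers only the script-font order from the quote above, since the question of where 3/0 belongs is still open.

```python
def default_cmap_list(ms_platform):
    """Return Platform/Encoding pairs in preference order for script fonts.

    Hypothetical helper: the ordering follows the quoted proposal,
    with each pair swapped when running in a Microsoft environment.
    """
    first_pair = ["0/4", "3/10"]   # tried right after 0/6
    second_pair = ["0/3", "3/1"]   # tried last
    if ms_platform:
        # In an MS environment, prefer the platform 3 entry of each pair.
        first_pair.reverse()
        second_pair.reverse()
    # 0/6 always comes first, whatever the platform.
    return ["0/6"] + first_pair + second_pair


print(default_cmap_list(False))
print(default_cmap_list(True))
```

The symbol-font entry (3/0) is deliberately left out, because whether it can simply be appended to both lists is exactly what is being asked here.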

* I still need to update the .cmap files

*

Offline Phil

I can't move forward with implementing this until I have an idea of what would be considered proper behavior of the code.

  • Is there any point in continuing to allow -noembed if (?) it permits a mismatch of CIDs and glyphs?
  • If the font must be embedded, does it make sense for the code to determine MS/non-MS platform, and choose the appropriate default cmap list? If the user gives two lists, it would pick the right one; if the user gives one list, assume they already know which platform they're on.
  • Is the given use/fallback list (4 given .cmaps unless "no, don't", user-specified cmap list, default cmap list, find_ms() function) a good one? Does it cover all the bases?
  • For symbol fonts, is it safe to simply add "3/0" on to the end of the cmap list, or should the code do something to determine if this is a symbol font, and treat it differently? That is, will a symbol font have only a 3/0 cmap, and no non-symbol font will have a 3/0 cmap?

And where do I find the information to build updated .cmap files? Add: Is there a guarantee that a given Unicode point will always map to the same CID (and thus, correct glyph)? Per point 1, why would there be a fixed mapping for four specific CJK alphabets, but not for other alphabets? I'm wondering why we shouldn't use the built-in cmaps for those fonts, unless the problem is that they specifically don't have cmaps.
« Last Edit: March 28, 2019, 09:44:49 AM by Phil »

*

Offline Phil

I have updated FontFile.pm to look in .cmap files (if -usecmf=>1), then in the list of Platform/Encodings given by -cmaps (or its default list), then using find_ms(), and finally back to .cmap files (regardless of -usecmf setting). One of those should hopefully find a workable CMap! There is also a -debug flag to show diagnostic information in the hunt for a CMap. I am leaving the ticket open for now as a reminder that the four .cmap files are still in need of updating, and I have not yet found a suitable data source (Unicode/glyphID mapping) to generate new .cmap files. Many thanks to Alfred Reibenschuh (original PDF::API2 author) and Bob Hallissy (Font::TTF author) for their assistance.
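The search order described above can be sketched roughly as follows, in Python for illustration only. This is not the FontFile.pm code: hunt_for_cmap and the three try_* callables are hypothetical stand-ins for the .cmap file lookup, the -cmaps list lookup, and find_ms(). Each callable is assumed to return a CMap, or None on failure.

```python
def hunt_for_cmap(try_cmap_file, try_internal, try_find_ms,
                  use_cmf=True, debug=False):
    """Try each CMap source in turn and return (label, cmap).

    Returns (None, None) if every source fails. The callables are
    hypothetical; each returns a CMap object or None.
    """
    steps = []
    if use_cmf:
        # Mirrors -usecmf=>1: consult the .cmap files first.
        steps.append((".cmap file", try_cmap_file))
    # Platform/Encoding list (the -cmaps option, or its default list).
    steps.append(("internal cmap list", try_internal))
    steps.append(("find_ms", try_find_ms))
    # Last resort: the .cmap files again, regardless of use_cmf.
    steps.append((".cmap file (last resort)", try_cmap_file))

    for label, attempt in steps:
        result = attempt()
        if debug:
            # Mirrors the idea of the -debug flag: report each attempt.
            status = "found" if result is not None else "not found"
            print(f"{label}: {status}")
        if result is not None:
            return label, result
    return None, None
```

With use_cmf set, the .cmap lookup appears twice in the chain, which matches the "regardless of -usecmf setting" wording; a real implementation might skip the repeat.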

Update: The issue of whether something should be done with -noembed remains open. Perhaps it should be no-op'd, but I'd like to get some feedback first.
« Last Edit: May 22, 2019, 09:54:14 AM by Phil »

*

Offline Phil

I found some sources from Adobe and I think I have the four .cmap files updated to current status (2019-05-29). Therefore I am closing this issue (in GitHub, 3.016 release). Please feel free to reopen (or open a new one) if you find a better update to these files.