When 020_corefonts is run for Times-Roman and
latin1 encoding, t-caron and c-caron
do show up in the listing (as does e-acute), so they
are in the font. I'm not sure whether the file in use is
times.ttf (in the fourth test I had to change the name from
Times-Roman.ttf) or another file. e-acute is proper Latin-1/Windows-1252, while t-caron and c-caron are actually Latin Extended-A. However, they happen to be in the Times-Roman font and available under "latin1". It's not clear exactly what constitutes "Latin-1" in the eyes of the font designer, but in most fonts under PDF it seems to be close to Windows-1252 (Latin-1 plus smart quotes), plus a few odd characters and ligatures. Times-Roman under
utf8 encoding shows similar glyphs, though at different code points; the two sets are disjoint. All three characters are found in ISO-8859-2, which presumably is why you tried that single-byte encoding.
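The membership claims above can be checked mechanically with the core Encode module (independent of any PDF library): only e-acute is representable in Latin-1, while all three characters are representable in ISO-8859-2. A minimal sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# The three characters under discussion:
#   U+00E9 e-acute  -- in Latin-1 (and Windows-1252)
#   U+0165 t-caron  -- Latin Extended-A, NOT in Latin-1
#   U+010D c-caron  -- Latin Extended-A, NOT in Latin-1
# All three are present in ISO-8859-2 (Latin-2).
for my $enc ('iso-8859-1', 'iso-8859-2') {
    for my $ch ("\x{00E9}", "\x{0165}", "\x{010D}") {
        # FB_CROAK makes encode() die on an unmappable character
        my $ok = eval { encode($enc, $ch, Encode::FB_CROAK); 1 } ? 'yes' : 'no';
        printf "%-10s U+%04X  representable: %s\n", $enc, ord($ch), $ok;
    }
}
```

Running this prints "no" for t-caron and c-caron under iso-8859-1 and "yes" for all three under iso-8859-2, which matches why the Latin-2 workaround in the third example succeeds.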
In your example file, you have
use utf8;, so presumably all
valid multibyte sequences are being treated as UTF-8 characters. If I remove
use utf8;, all special characters (except in the third example, ISO-8859-2) are displayed as pairs of Latin-1 bytes (and the UTF-8 case shows "tofu"), which is not surprising. I see that when you give
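That "pairs of Latin-1 bytes" behavior is exactly what Perl does with an undecoded UTF-8 literal: without use utf8; (or an explicit decode), the two bytes of e-acute are treated as two separate Latin-1 characters. A small illustration using only core Encode:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for e-acute, i.e. what the source file contains on
# disk. Without "use utf8;" Perl sees two separate Latin-1 characters,
# A-tilde (0xC3) and the copyright sign (0xA9) -- the "pairs of
# Latin-1 bytes" effect described above.
my $bytes = "\xC3\xA9";
print length($bytes), "\n";          # 2 (two one-byte characters)

# After decoding, it is a single character, U+00E9.
my $chars = decode('UTF-8', $bytes);
print length($chars), "\n";          # 1
printf "U+%04X\n", ord($chars);      # U+00E9
```

The use utf8; pragma effectively performs that decode on the script's own literals, which is why removing it changes what the PDF library receives.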
-encode => "UTF-8" in the second example, the e-acute is "tofu" (invalid in some way), as are t-caron and c-caron. That doesn't seem right. However, under default (latin1) encoding in the first example, the UTF-8 source e-acute
is recognized. I'm not surprised that t-caron and c-caron are not recognized under default latin1 encoding, as they are not properly Latin-1.
- Example 1, corefonts with default (latin1) encoding: I don't think it should be considered a problem if non-Latin-1 characters (t-caron and c-caron) don't show up when using "latin1" encoding, even though they are defined in the font.
- Example 2, corefonts with explicit "UTF-8" encoding: none of the non-ASCII special characters are recognized. Same with "utf8" encoding. Is "-encode" supposed to be used when the input stream is already UTF-8? I would expect all three special characters to be recognized under UTF-8. First encoding the text to 'utf8' doesn't help; now all three characters are tofu'd.
- Example 3, corefonts with input first converted to Latin-2, and displayed with "latin2" encoding, works. All three UTF-8 special characters survive the Encode conversion and are displayed.
- Example 4, ttffont with default encoding, works.
Should this be treated as a bug of some sort? That is, correct UTF-8 sequences are not being recognized under the default and UTF-8 encodings. Is there an official definition of what you should get with UTF-8 input (and use utf8) under the various encodings? I'm not sure why 020_corefonts shows t-caron and c-caron under "latin1" encoding, but perhaps it has something to do with its input stream not being true UTF-8, but rather built character-by-character. I don't see a problem with corefonts rejecting characters which are not true Latin-1 when "latin1" encoding is specified, but it needs to be consistent with the other encodings and inputs.