## Truly Smart Quotes

Posted on 2017-Mar-01 at 14:26:18 by Phil

When writing extensive amounts of text, such as long-winded postings to this forum, it would be nice to get some assistance from the computer in adding typographically correct (and visually pleasing) markup to the text, in the correct form for the character encoding used and the processing to be done on the text (HTML, BBCode, straight UTF-8 characters, etc.). It can always be done manually, either as you go or as an after-the-fact cleanup, but it’s far more convenient to have it done for you while you concentrate on the wordsmithing.

The big ones are quotation marks and apostrophes. The ASCII straight quotes " and ' are just not satisfactory for real text that you want to be proud of. Proper “quotation marks” and ‘single quotes’ (apostrophes) are nice looking (aren’t they?), but a pain to manually enter as you type. En (–) and em (—) dashes are a lot better looking than single and double hyphens - and -- in properly formatted text. Special punctuation such as ellipses (…) and various forms of trademark signs (®, ™, ℗, and ℠) are surprisingly common.

Microsoft has “Smart Quotes” enabled by default in a number of its products, such as Word. In its versions of common single-byte encodings (e.g., Western/CP 1252/Windows-1252, as opposed to Latin-1/ISO-8859-1), it takes the very rarely used control-character positions 0x80 through 0x9F and reassigns them to a number of commonly used non-ASCII punctuation marks and some accented letters. For CP 1252, the assignments are:

| Hex | Char | Unicode | HTML entity | Name | Original C1 control use |
|---|---|---|---|---|---|
| 80 | € | U+20AC | `&euro;` | Euro | reserved control |
| 81 | (reserved) | U+0081 | (none) | (none) | reserved control |
| 82 | ‚ | U+201A | `&sbquo;` | Low “9” opening quotation mark | Break Permitted Here |
| 83 | ƒ | U+0192 | `&fnof;` or `&#402;` | Florin/script f/folder | No Break Here |
| 84 | „ | U+201E | `&bdquo;` | Low “99” opening quotation mark | Index |
| 85 | … | U+2026 | `&hellip;` | Ellipsis | Next Line |
| 86 | † | U+2020 | `&dagger;` | Single dagger | Start of Selected Area |
| 87 | ‡ | U+2021 | `&Dagger;` | Double dagger | End of Selected Area |
| 88 | ˆ | U+02C6 | `&circ;` | Circumflex ^ accent (non-combining modifier letter) | Character Tabulation Set |
| 89 | ‰ | U+2030 | `&permil;` | Per mille (o/oo) | Character Tabulation with Justification |
| 8A | Š | U+0160 | `&Scaron;` or `&#352;` | S + caron accent | Line Tabulation Set |
| 8B | ‹ | U+2039 | `&lsaquo;` | Single left angle quote < (guillemet) | Partial Line Down |
| 8C | Œ | U+0152 | `&OElig;` | OE ligature | Partial Line Up |
| 8D | (reserved) | U+008D | (none) | (none) | Reverse Line Feed |
| 8E | Ž | U+017D | `&Zcaron;` or `&#381;` | Z + caron accent | Single Shift Two |
| 8F | (reserved) | U+008F | (none) | (none) | Single Shift Three |
| 90 | (reserved) | U+0090 | (none) | (none) | Device Control String |
| 91 | ‘ | U+2018 | `&lsquo;` | “6” opening quotation mark | Private Use One |
| 92 | ’ | U+2019 | `&rsquo;` | “9” closing quotation mark/apostrophe | Private Use Two |
| 93 | “ | U+201C | `&ldquo;` | “66” opening quotation mark | Set Transmit State |
| 94 | ” | U+201D | `&rdquo;` | “99” closing quotation mark | Cancel Character |
| 95 | • | U+2022 | `&bull;` | Solid bullet | Message Waiting |
| 96 | – | U+2013 | `&ndash;` | En-dash | Start of Guarded Area |
| 97 | — | U+2014 | `&mdash;` | Em-dash | End of Guarded Area |
| 98 | ˜ | U+02DC | `&tilde;` | Small tilde ~ diacritic (non-combining) | Start of String |
| 99 | ™ | U+2122 | `&trade;` | Trademark sign | reserved control |
| 9A | š | U+0161 | `&scaron;` or `&#353;` | s + caron accent | Single Character Introducer |
| 9B | › | U+203A | `&rsaquo;` | Single right angle quote > (guillemet) | Control Sequence Introducer |
| 9C | œ | U+0153 | `&oelig;` | oe ligature | String Terminator |
| 9D | (reserved) | U+009D | (none) | (none) | Operating System Command |
| 9E | ž | U+017E | `&zcaron;` or `&#382;` | z + caron accent | Privacy Message |
| 9F | Ÿ | U+0178 | `&Yuml;` | Y + diaeresis/umlaut accent | Application Program Command |

Note that this is for CP-1252. Other Microsoft single-byte code pages may have slightly different assignments. Check before you use such a page!
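One way to check a table like this (and to see how other code pages differ) is to ask a codec directly. The Python sketch below relies only on Python's built-in `cp1252` and `latin-1` codecs; `high_byte_table` is a hypothetical helper, and passing another code page name such as `"cp1250"` lets you compare assignments.

```python
import unicodedata

def high_byte_table(codepage: str = "cp1252"):
    """Decode each byte 0x80-0x9F under a code page and under Latin-1."""
    rows = []
    for b in range(0x80, 0xA0):
        raw = bytes([b])
        latin1 = raw.decode("latin-1")      # always the C1 control U+0080..U+009F
        try:
            mapped = raw.decode(codepage)   # unassigned bytes raise an error
        except UnicodeDecodeError:
            mapped = None                   # e.g. 81, 8D, 8F, 90, 9D in CP 1252
        rows.append((b, mapped, latin1))
    return rows

for b, ch, lat in high_byte_table():
    name = unicodedata.name(ch) if ch else "(undefined)"
    print(f"{b:02X}  Latin-1 U+{ord(lat):04X}  ->  {name}")
```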

Most of these characters are now well supported by modern browsers, although some older browsers may have trouble with a few of them. Note that the double angle quotes « and » are not in this range, although the single versions are; the double versions already sit at 0xAB and 0xBB in Latin-1, which CP 1252 shares. The “American-style” “reversed-9” single and double opening (left) quotes (‛ U+201B and ‟ U+201F) are not included, either.

So, when working with some sort of editor or word processor, how does it know which quotation mark (opening or closing) to use when I type "? How about which single quote (apostrophe) when I type '? Single and double quotes can come unpaired: some publishing styles put an opening double quote at the beginning of a paragraph when the whole paragraph is a quote, but omit the closing quote. When I type ', is that an opening single quote, or an apostrophe used in a contraction? When I type $i--, I don’t want it thinking I want an em-dash there in place of the post-decrement operator! Word and similar products generally do a fairly good job of guessing what I mean, but they can be very insistent about their guesses and refuse to let me override their Smart Quote entries! That is very frustrating. The editor or word processor should learn to just stay out of the way when I override its ruling.
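As a rough illustration of the guessing involved, here is a minimal Python sketch of one possible context heuristic; it is an assumption for illustration, not Word's actual rule set. A straight quote at the start of the text, or after whitespace, an opening bracket, a dash, or another opening quote, becomes an opening quote; anywhere else it becomes a closing quote (which for ' doubles as the apostrophe). A double hyphen becomes an em-dash only between two word characters or when surrounded by spaces, so `$i--` is left alone. The `OPENERS` set and `smarten` function are hypothetical names.

```python
import re

# Characters after which a straight quote is treated as *opening*.
# This set is an illustrative assumption, not any product's actual rules.
OPENERS = set(" \t\n([{<\u2013\u2014\u201c\u2018")

def smarten(text: str) -> str:
    """Replace straight quotes and '--' with typographic forms (heuristic sketch)."""
    out = []
    for ch in text:
        prev = out[-1] if out else None          # previously emitted character
        opening = prev is None or prev in OPENERS
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")  # left or right double
        elif ch == "'":
            out.append("\u2018" if opening else "\u2019")  # left single or apostrophe
        else:
            out.append(ch)
    s = "".join(out)
    # Em-dash only for word--word or " -- "; code such as $i-- matches neither.
    s = re.sub(r"(?<=\w)--(?=\w)", "\u2014", s)
    s = s.replace(" -- ", " \u2014 ")
    return s

print(smarten('He said "don\'t" -- twice; $i-- stays.'))
```

Note that a leading apostrophe, as in 'tis or '90s, still comes out as an opening ‘ under this rule, which is exactly the kind of guess a user needs to be able to override.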

Perhaps a happy medium would be to have editor “buttons” near the text entry window, as some forum software provides for BBCode tags, to insert this special punctuation only when the user calls for it. The downside is that every quote and dash means pausing my typing, moving the mouse to the button, and clicking it (although accelerator keys could help with the more commonly used special characters). Also keep in mind that different intended uses of the typed text can call for different ways of representing a given character, from simply inserting a UTF-8 character to inserting an HTML entity or BBCode markup, whether it was automatically determined or manually inserted.
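The point about different output forms can be sketched in a few lines of Python. The `render` function and its target names (`"text"`, `"html"`, `"html-numeric"`) are hypothetical; BBCode output is not covered. It uses the standard library's `html.entities.codepoint2name`, which only knows the HTML 4 named entities, so a character like Ž falls back to a numeric reference even though HTML5 does define `&Zcaron;`.

```python
from html.entities import codepoint2name

def render(ch: str, target: str) -> str:
    """Express one character in the form a given output target expects (sketch)."""
    cp = ord(ch)
    if target == "text":
        return ch                          # the character itself; encoded on save
    if target == "html":
        name = codepoint2name.get(cp)      # HTML 4 named entities only
        return f"&{name};" if name else f"&#{cp};"
    if target == "html-numeric":
        return f"&#{cp};"                  # decimal numeric character reference
    raise ValueError(f"unknown target: {target!r}")

for c in "\u2019\u2014\u017d":
    print(render(c, "html"), render(c, "html-numeric"))
```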

Posted on 2017-Mar-02 at 04:54:07 by sciurius

I’ve been using a ‘compose key’ on my keyboards since the 80s. VT2xx keyboards at the time had a real compose key; nowadays, on a standard keyboard, I use ‘Right Ctrl’ for this purpose.

Many systems let you define one of the keys to function as compose key.

Adding fancy quotes and diacritical characters is as easy as [Compose] + < + " → “ (left double), [Compose] + > + " → ” (right double), [Compose] + u + " → ü, and so on. Many symbols work as well, like the → arrows.

It has been under your fingertips for ages↦
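Conceptually, a compose key is a lookup on the next two keystrokes. The Python sketch below models that with a small table built from the examples in the post; the `[Compose]` token, the `type_keys` function, and the `- >` sequence for the arrow are assumptions for illustration, not the actual X11 Compose table or any particular Windows utility.

```python
from typing import Dict, Iterable, List, Tuple

COMPOSE_KEY = "[Compose]"  # hypothetical token standing for the compose key press

# Two-key sequences from the examples above; "- >" for the arrow is assumed.
COMPOSE: Dict[Tuple[str, str], str] = {
    ("<", '"'): "\u201c",  # left double quote
    (">", '"'): "\u201d",  # right double quote
    ("u", '"'): "\u00fc",  # u with diaeresis
    ("-", ">"): "\u2192",  # rightwards arrow
}

def type_keys(keys: Iterable[str]) -> str:
    """Turn a stream of key presses into text, applying compose sequences."""
    out: List[str] = []
    pending: List[str] = []   # keys typed since the compose key
    composing = False
    for k in keys:
        if k == COMPOSE_KEY:
            composing, pending = True, []
        elif composing:
            pending.append(k)
            if len(pending) == 2:
                # Unknown sequences fall through as the literal keys.
                out.append(COMPOSE.get((pending[0], pending[1]), "".join(pending)))
                composing = False
        else:
            out.append(k)
    return "".join(out)

print(type_keys(["[Compose]", "<", '"', "H", "i", "[Compose]", ">", '"']))
```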

Posted on 2022-Mar-20 at 12:16:00 by Phil

> It has been under your fingertips for ages↦

Not really. On a standard Windows PC with a US keyboard, it’s a non-standard add-on that the average user isn’t going to know about, much less install. Even experienced users may find that such utilities have to be installed on a per-application basis, and there is no One to Rule Them All (one that works with every application). Then, if it places a UTF-8 sequence into whatever application is in use, that may or may not work well, since the application may expect 16-bit Unicode or some single-byte encoding instead.
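To illustrate that last point, this Python sketch (the sample string is arbitrary) prints the same text as UTF-8, UTF-16 little-endian (the form Windows uses for its wide-character APIs), and CP 1252 bytes, then shows what happens when UTF-8 bytes are read by something that assumes CP 1252.

```python
text = "It\u2019s \u2014 ok"   # contains U+2019 (apostrophe) and U+2014 (em-dash)

for enc in ("utf-8", "utf-16-le", "cp1252"):
    print(f"{enc:10} {text.encode(enc).hex(' ')}")

# UTF-8 bytes handed to a program that decodes them as CP 1252:
garbled = text.encode("utf-8").decode("cp1252")
print(garbled)
```

The three-byte UTF-8 apostrophe comes out as the familiar “â€™” garbage, while the same character is a single byte (0x92) in CP 1252 and two bytes in UTF-16.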

It isn’t as simple in the Windows world as it is in the Unixy world!

All content © copyright 2005 – 2022 by Catskill Technology Services, LLC.