When writing extensive amounts of text, such as long-winded postings to this forum, it would be nice to get some assistance from the computer in adding typographically correct (and visually pleasing) markup to the text, in the correct form for the character encoding used and the processing to be done on the text (HTML, BBCode, straight UTF-8 characters, etc.). It can always be done manually, either as you go or as after-the-fact cleanup, but it's far more convenient to have this done for you while you concentrate on the wordsmithing.
The big ones are quotation marks and apostrophes. The ASCII straight quotes " and ' are just not satisfactory for real text that you want to be proud of. Proper “quotation marks” and ‘single quotes’ (apostrophes) are nice looking (aren't they?), but a pain to manually enter as you type. En (–) and em (—) dashes are a lot better looking than single and double hyphens - and -- in properly formatted text. Special punctuation such as ellipses (…) and various forms of trademark signs (®, ™, ℗, and ℠) are surprisingly common.
Microsoft has "Smart Quotes" enabled by default in a number of its products, such as Word. In its versions of the common single-byte encodings (e.g., Western/CP 1252/Windows-1252; cf. Latin-1/ISO-8859-1), it takes the very rarely used control characters defined at 0x80 through 0x9F and replaces most of them with commonly used non-ASCII punctuation and some accented letters. For CP 1252, the result looks like this:
Hex | Char | Unicode | HTML entity | Name | Reserved use (C1 control) |
80 | € | U+20AC | &euro; | Euro | reserved control |
82 | ‚ | U+201A | &sbquo; | Low-"9" opening quotation mark | Break Permitted Here |
83 | ƒ | U+0192 | &fnof; or &#402; | Florin/script f/folder | No Break Here |
84 | „ | U+201E | &bdquo; | Low-"99" opening quotation mark | Index |
85 | … | U+2026 | &hellip; | Ellipsis | Next Line |
86 | † | U+2020 | &dagger; | Single dagger | Start of Selected Area |
87 | ‡ | U+2021 | &Dagger; | Double dagger | End of Selected Area |
88 | ˆ | U+02C6 | &circ; | Circumflex ^ accent (spacing modifier, not combining) | Character Tabulation Set |
89 | ‰ | U+2030 | &permil; | o/oo per mille | Character Tabulation with Justification |
8A | Š | U+0160 | &Scaron; or &#352; | S + caron accent | Line Tabulation Set |
8B | ‹ | U+2039 | &lsaquo; | Single left angle quote < (guillemet) | Partial Line Down |
8C | Œ | U+0152 | &OElig; | OE ligature | Partial Line Up |
8E | Ž | U+017D | &Zcaron; or &#381; | Z + caron accent | Single Shift Two |
91 | ‘ | U+2018 | &lsquo; | "6" opening quotation mark | Private Use One |
92 | ’ | U+2019 | &rsquo; | "9" closing quotation mark/apostrophe | Private Use Two |
93 | “ | U+201C | &ldquo; | "66" opening quotation mark | Set Transmit State |
94 | ” | U+201D | &rdquo; | "99" closing quotation mark | Cancel Character |
95 | • | U+2022 | &bull; | Solid bullet | Message Waiting |
96 | – | U+2013 | &ndash; | En-dash | Start of Guarded Area |
97 | — | U+2014 | &mdash; | Em-dash | End of Guarded Area |
98 | ˜ | U+02DC | &tilde; | Tilde ~ accent (spacing, not combining) | Start of String |
99 | ™ | U+2122 | &trade; | Trademark TM | reserved control |
9A | š | U+0161 | &scaron; or &#353; | s + caron accent | Single Character Introducer |
9B | › | U+203A | &rsaquo; | Single right angle quote > (guillemet) | Control Sequence Introducer |
9C | œ | U+0153 | &oelig; | oe ligature | String Terminator |
9E | ž | U+017E | &zcaron; or &#382; | z + caron accent | Privacy Message |
9F | Ÿ | U+0178 | &Yuml; | Y + diaeresis/umlaut accent | Application Program Command |
Most of these characters are well supported by current browsers, although some older browsers may have trouble with a few of them. Note that the double angle quotes « and » are not included here, although the single versions are; the double ones already have their own slots in Latin-1 (0xAB and 0xBB).
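For anyone who wants to cross-check the table, here is a minimal Python sketch that walks the 0x80 through 0x9F range and asks the standard `cp1252` codec what each byte decodes to. The function name `cp1252_high_range` is just something I made up for illustration. It relies on Python's codec leaving five slots (81, 8D, 8F, 90, 9D) unassigned, which is why those rows are absent from the table.

```python
def cp1252_high_range():
    """Return (byte, char, 'U+XXXX') for each assigned CP 1252 slot in 0x80-0x9F."""
    rows = []
    for byte in range(0x80, 0xA0):
        try:
            char = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec treats 0x81, 0x8D, 0x8F, 0x90 and 0x9D
            # as undefined, so there is nothing to report for them.
            continue
        rows.append((byte, char, f"U+{ord(char):04X}"))
    return rows


# Print just the hex byte and the code point (ASCII-only output).
for byte, _char, codepoint in cp1252_high_range():
    print(f"{byte:02X} -> {codepoint}")

# For contrast, Latin-1 maps the same bytes straight onto the C1 controls:
# bytes([0x93]).decode("latin-1") is the control character U+0093, not a quote.
```

This contrast is the usual reason "smart" punctuation turns into garbage when CP 1252 text is mislabeled as ISO-8859-1.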
So, when working with some sort of editor or word processor, how does it know which quotation mark (opening or closing) to use when I type "? How about which single quote (apostrophe) when I type '? Single and double quotes can come unpaired: some publishing styles may put an opening double quote at the beginning of a paragraph, when the whole thing is a quote, but omit the closing quote. When I type ', is that an opening single quote, or an apostrophe used in a contraction? When I type $i--, I don't want it thinking I want an em-dash there in place of the post-decrement operator!
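To make the guessing problem concrete, here is a rough Python sketch of the kind of context rule such a tool might apply. This is my own simplification, not how Word or any particular editor actually does it, and the name `smarten` is hypothetical. The assumed rule: a quote is "opening" if it sits at the start of the text or right after whitespace or an opening bracket, and "closing" otherwise; a double hyphen becomes an em-dash only when it has whitespace on both sides.

```python
import re


def smarten(text):
    """Guess curly quotes and em-dashes from the surrounding characters."""
    # Double quote at the start, or after whitespace / an opening bracket: opening.
    text = re.sub(r'(?:^|(?<=[\s(\[{]))"', "\u201c", text)
    # Every remaining double quote is treated as closing.
    text = text.replace('"', "\u201d")
    # Same rule for single quotes, also allowing one right after an opening
    # double quote (nested quotation).
    text = re.sub(r"(?:^|(?<=[\s(\[{\u201c]))'", "\u2018", text)
    # Everything else, including mid-word apostrophes, gets the closing form.
    text = text.replace("'", "\u2019")
    # Only a free-standing " -- " becomes an em-dash, so "$i--" is left alone.
    text = re.sub(r"(?<=\s)--(?=\s)", "\u2014", text)
    return text
```

With this rule, `smarten("don't")` gets an apostrophe and `smarten("$i--")` comes back unchanged, but it still guesses wrong on a leading apostrophe such as 'tis (it picks the opening quote), which is exactly the kind of case where the user needs to be able to override it.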
Word and similar products generally do a fairly good job of guessing what I mean, but they can be very insistent on what they think I mean, and refuse to let me override their Smart Quote entries! That is very frustrating. The editor or word processor should learn to just stay out of the way when I override its ruling.
Perhaps a happy medium would be to have editor "buttons" near the text entry window, as this forum uses for BBCode tags, to insert this special punctuation only when the user calls for it. The downside to this is that every quote and dash means I have to pause my typing and move the mouse to the button and click it (although accelerator keys could help with the more commonly used special characters). Also keep in mind that different intended uses of typed text can call for different ways of indicating that character, from simply inserting a UTF-8 character to inserting an HTML entity or BBCode markup, whether it was automatically determined or manually inserted.
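As a small illustration of "different ways of indicating that character," here is a Python sketch that takes text already containing the typographic characters and emits it either as-is (for UTF-8 output) or with HTML entities. The function `render` and its `target` values are hypothetical names of mine. It uses the standard library's `html.entities.codepoint2name`, which covers only the HTML 4 named entities, so anything else falls back to a numeric reference. It does not escape `<`, `>`, or `&`; that would be a separate step.

```python
from html.entities import codepoint2name


def render(text, target="utf8"):
    """Emit typographic characters in the form the target calls for."""
    if target == "utf8":
        # Literal characters; the caller encodes the result as UTF-8.
        return text
    if target == "html":
        out = []
        for ch in text:
            cp = ord(ch)
            if cp < 128:
                out.append(ch)  # plain ASCII passes through untouched
            elif cp in codepoint2name:
                out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &ldquo;
            else:
                out.append(f"&#{cp};")  # numeric fallback, e.g. &#382;
        return "".join(out)
    raise ValueError(f"unknown target: {target!r}")
```

The point of keeping this as a separate step is that the guess about which character was meant (automatic or by button) and the way it gets written out can then be changed independently.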