And as things fell apart
Nobody paid much attention
You’ve got it, you’ve got it
— (Nothing But) Flowers, The Talking Heads
Posted on 2017-Mar-01 at 14:26:18 by Phil
When writing extensive amounts of text, such as long-winded postings to this forum, it would be nice to get some assistance from the computer in adding typographically correct (and visually pleasing) markup to the text, in the correct form for the character encoding used and the processing to be done on the text (HTML, BBCode, straight UTF-8 characters, etc.). It can always be done manually, either as you go or as after-the-fact cleanup, but it’s far more convenient to have this done for you while you concentrate on the wordsmithing.
The big ones are quotation marks and apostrophes. The ASCII straight quotes " and ' are just not satisfactory for real text that you want to be proud of. Proper “quotation marks” and ‘single quotes’ (apostrophes) are nice looking (aren’t they?), but a pain to manually enter as you type. En (–) and em (—) dashes are a lot better looking than single and double hyphens - and -- in properly formatted text. Special punctuation such as ellipses (…) and various forms of trademark signs (®, ™, ℗, and ℠) are surprisingly common.
Microsoft has “Smart Quotes” enabled (by default) in a number of its products, such as Word. In its versions of common single-byte encodings (e.g., Western/CP 1252/Windows 1252, cf. Latin-1/ISO-8859-1), it takes the very rarely used control characters defined in x80 through x9F, and replaces them with a number of commonly used non-ASCII punctuation marks and some accented letters. For CP 1252, the assignments look like this:
Hex | Char | Unicode | HTML entity | Name | Reserved use |
---|---|---|---|---|---|
80 | € | U+20AC | &euro; | Euro | reserved control |
81 | (reserved) | U+0081 | (none) | (none) | reserved control |
82 | ‚ | U+201A | &sbquo; | Low-“9” opening quotation mark | Break Permitted Here |
83 | ƒ | U+0192 | &fnof; or &#402; | Florin/script f/folder | No Break Here |
84 | „ | U+201E | &bdquo; | Low-“99” opening quotation mark | Index |
85 | … | U+2026 | &hellip; | Ellipsis | Next Line |
86 | † | U+2020 | &dagger; | Single dagger | Start of Selected Area |
87 | ‡ | U+2021 | &Dagger; | Double dagger | End of Selected Area |
88 | ˆ | U+02C6 | &circ; | Circumflex ^ accent (non-combining modifier letter) | Character Tabulation Set |
89 | ‰ | U+2030 | &permil; | o/oo per mille | Character Tabulation with Justification |
8A | Š | U+0160 | &Scaron; or &#352; | S + caron accent | Line Tabulation Set |
8B | ‹ | U+2039 | &lsaquo; | Single left angle quote < (guillemet) | Partial Line Down |
8C | Œ | U+0152 | &OElig; | OE ligature | Partial Line Up |
8D | (reserved) | U+008D | (none) | (none) | Reverse Line Feed |
8E | Ž | U+017D | &Zcaron; or &#381; | Z + caron accent | Single Shift Two |
8F | (reserved) | U+008F | (none) | (none) | Single Shift Three |
90 | (reserved) | U+0090 | (none) | (none) | Device Control String |
91 | ‘ | U+2018 | &lsquo; | “6” opening quotation mark | Private Use One |
92 | ’ | U+2019 | &rsquo; | “9” closing quotation mark/apostrophe | Private Use Two |
93 | “ | U+201C | &ldquo; | “66” opening quotation mark | Set Transmit State |
94 | ” | U+201D | &rdquo; | “99” closing quotation mark | Cancel Character |
95 | • | U+2022 | &bull; | Solid bullet | Message Waiting |
96 | – | U+2013 | &ndash; | En-dash | Start of Guarded Area |
97 | — | U+2014 | &mdash; | Em-dash | End of Guarded Area |
98 | ˜ | U+02DC | &tilde; | Small Tilde ~ diacritic (non-combining) | Start of String |
99 | ™ | U+2122 | &trade; | Trademark TM | reserved control |
9A | š | U+0161 | &scaron; or &#353; | s + caron accent | Single Character Introducer |
9B | › | U+203A | &rsaquo; | Single right angle quote > (guillemet) | Control Sequence Introducer |
9C | œ | U+0153 | &oelig; | oe ligature | String Terminator |
9D | (reserved) | U+009D | (none) | (none) | Operating System Command |
9E | ž | U+017E | &zcaron; or &#382; | z + caron accent | Privacy Message |
9F | Ÿ | U+0178 | &Yuml; | Y + diaeresis/umlaut accent | Application Program Command |
Note that this is for CP-1252. Other Microsoft single-byte code pages may have slightly different assignments. Check before you use such a page!
Most of these characters are now well supported by all browsers, although some older browsers may have trouble with some of them. Note that double angle brackets « and » are not included here, although the single versions are. The “American-style” “reversed-9” single and double opening (left) quotes are not included, either.
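To see the difference between the two readings of bytes x80 through x9F, here is a minimal Python sketch. It relies only on the standard `cp1252` and `latin-1` codecs; the sample bytes and the helper name `decode_cp1252_lenient` are made up for illustration.

```python
# The same bytes mean different things in CP-1252 and in Latin-1 (ISO-8859-1).
raw = bytes([0x93, 0x48, 0x69, 0x94, 0x20, 0x97, 0x20, 0x85])

as_cp1252 = raw.decode("cp1252")   # printable punctuation from the table
as_latin1 = raw.decode("latin-1")  # the same bytes read as C1 control characters

print(as_cp1252)
print([f"U+{ord(c):04X}" for c in as_latin1])


def decode_cp1252_lenient(data: bytes) -> str:
    """Decode CP-1252, turning the unassigned bytes into U+FFFD.

    Python's cp1252 codec treats 0x81, 0x8D, 0x8F, 0x90 and 0x9D as
    undefined, so a strict decode of those bytes raises UnicodeDecodeError.
    """
    return data.decode("cp1252", errors="replace")


print(repr(decode_cp1252_lenient(b"\x81")))
```

Text that arrives labeled as Latin-1 but actually contains CP-1252 punctuation is a common case, which is one reason to check the code page before trusting those byte values.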
So, when working with some sort of editor or word processor, how does it know
which quotation mark (opening or closing) to use when I type "? How about
which single quote (apostrophe) when I type '? Single and double quotes
can come unpaired — some publishing styles may put an opening double quote
at the beginning of a paragraph, when the whole thing is a quote, but omit the
closing quote. When I type ', is that an opening single quote, or an
apostrophe used in a contraction? When I type $i--, I don’t
want it thinking I want an em-dash there in place of the post-decrement
operator! Word and similar products generally do a fairly good job of
guessing what I mean, but they can be very insistent on what they think I
mean, and refuse to let me override their Smart Quote entries! That is very
frustrating. The editor or word processor should learn to just stay out of the
way when I override their ruling.
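As a rough illustration of the kind of guessing involved, here is a minimal Python sketch. The rules are my own simplification (not how Word or any other product actually does it), and the function name `smarten` is hypothetical: a quote at the start of the text or after whitespace or an opening bracket is treated as opening, any other quote as closing, a ' between two word characters as an apostrophe, and -- becomes an em-dash only when it sits between two word characters or between two spaces.

```python
import re

OPEN_DOUBLE, CLOSE_DOUBLE = "\u201c", "\u201d"
OPEN_SINGLE, CLOSE_SINGLE = "\u2018", "\u2019"
EM_DASH = "\u2014"

# A quote counts as "opening" at the start of the text or right after
# whitespace or an opening bracket (an assumption, not a universal rule).
_BEFORE_OPENING = r"(^|[\s(\[{])"


def smarten(text: str) -> str:
    """Heuristically convert ASCII quotes and double hyphens."""
    # Apostrophe between two word characters (don't, it's).
    text = re.sub(r"(?<=\w)'(?=\w)", CLOSE_SINGLE, text)
    # Opening quotes first, then whatever is left is treated as closing.
    text = re.sub(_BEFORE_OPENING + '"', r"\1" + OPEN_DOUBLE, text)
    text = text.replace('"', CLOSE_DOUBLE)
    text = re.sub(_BEFORE_OPENING + "'", r"\1" + OPEN_SINGLE, text)
    text = text.replace("'", CLOSE_SINGLE)
    # "--" only between two word characters or between two spaces, so that
    # operators such as $i-- and --$i are left alone.
    text = re.sub(r"(?<=\w)--(?=\w)|(?<=\s)--(?=\s)", EM_DASH, text)
    return text


print(smarten('"Hello," she said. "Don\'t stop--ever."'))
print(smarten("for (; $i > 0; $i--) {}"))
```

Even this small sketch gets things wrong: a leading apostrophe as in 'tis comes out as an opening single quote, and a paragraph that opens a quotation without closing it is fine only by luck. That is why an easy manual override matters.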
Perhaps a happy medium would be to have editor “buttons” near the text entry window, as some forum software uses for BBCode tags, to insert this special punctuation only when the user calls for it. The downside to this is that every quote and dash means I have to pause my typing and move the mouse to the button and click it (although accelerator keys could help with the more commonly used special characters). Also keep in mind that different intended uses of typed text can call for different ways of indicating that character, from simply inserting a UTF-8 character to inserting an HTML entity or BBCode markup, whether it was automatically determined or manually inserted.
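The point about different output forms can be sketched in a few lines of Python. This is only an illustration: the function names `render` and `render_text` and the target labels are made up, and BBCode is left out because its tags vary between forum packages. The named-entity lookup uses the standard library's `html.entities.codepoint2name`, which covers HTML 4 entity names only, so anything else falls back to a numeric reference.

```python
from html.entities import codepoint2name


def render(ch: str, target: str) -> str:
    """Return one character in the form wanted by the given target."""
    if target == "unicode":
        return ch  # leave it to the output encoding (e.g. UTF-8)
    if target == "html-numeric":
        return f"&#{ord(ch)};"
    if target == "html-named":
        name = codepoint2name.get(ord(ch))
        # No HTML 4 name for this character: fall back to a numeric reference.
        return f"&{name};" if name else f"&#{ord(ch)};"
    raise ValueError(f"unknown target: {target}")


def render_text(text: str, target: str) -> str:
    """Apply render() to every non-ASCII character in text."""
    return "".join(c if ord(c) < 128 else render(c, target) for c in text)


print(render_text("It\u2019s \u201cdone\u201d \u2014 really\u2026", "html-named"))
print(render("\u2117", "html-named"))  # sound recording copyright sign
```

The same function could sit behind an editor button or an automatic converter; only the choice of target changes.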
Posted on 2017-Mar-02 at 04:54:07 by sciurius
I’ve been using a ‘compose key’ on my keyboards since the 80s. VT2xx keyboards at the time had a real compose key; nowadays, on a standard keyboard, I use ‘Right Ctrl’ for this purpose.
Many systems let you define one of the keys to function as a compose key.
Adding fancy quotes and diacritical characters is as easy as [Compose] plus < plus " → “ (left double), [Compose] plus > plus " → ” (right double), [Compose] + u + " → ü and so on. And many symbols as well, like the → arrows.
It has been under your fingertips for ages↦
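For readers who have not met a compose key, the idea can be modeled as a simple lookup from a two-key sequence to one character. The Python sketch below is a toy model only: the table holds just the three sequences mentioned above, and the names `COMPOSE` and `type_keys` are invented here, not taken from any real system's compose configuration.

```python
# Toy compose table: (first key, second key) -> resulting character.
COMPOSE = {
    ("<", '"'): "\u201c",  # left double quote
    (">", '"'): "\u201d",  # right double quote
    ("u", '"'): "\u00fc",  # u with diaeresis
}


def type_keys(keys):
    """Turn a sequence of key presses into text.

    The token "Compose" consumes the next two keys; an unknown pair is
    passed through unchanged in this sketch.
    """
    out = []
    it = iter(keys)
    for key in it:
        if key != "Compose":
            out.append(key)
            continue
        pair = (next(it, ""), next(it, ""))
        out.append(COMPOSE.get(pair, "".join(pair)))
    return "".join(out)


print(type_keys(["Compose", "<", '"', "H", "i", "Compose", ">", '"']))
```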
Posted on 2022-Mar-20 at 12:16:00 by Phil
Not really. On a standard Windows PC with a US keyboard, it’s a non-standard add-on that the average user isn’t going to know about, much less install. Even experienced users can find that such utilities often have to be installed on a per-application basis, and there is not One to Rule Them All (one that works with any application). Then, if it places a UTF-8 sequence into whatever application is in use, it may or may not work well (16-bit Unicode, or some single-byte encoding, may be desired).
It isn’t as simple in the Windows world as it is in the Unixy world!
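To make the encoding concern concrete, here is a small Python sketch showing one left double quotation mark in three encodings, and what happens when the UTF-8 bytes are read by something expecting CP-1252. The variable names are arbitrary; only standard codecs are used.

```python
ch = "\u201c"  # left double quotation mark

utf8 = ch.encode("utf-8")       # three bytes
utf16 = ch.encode("utf-16-le")  # two bytes (one 16-bit unit, little-endian)
cp1252 = ch.encode("cp1252")    # one byte

for label, data in (("UTF-8", utf8), ("UTF-16LE", utf16), ("CP-1252", cp1252)):
    print(f"{label:9} {data.hex(' ')}")

# UTF-8 bytes handed to an application that assumes CP-1252:
garbled = utf8.decode("cp1252")
print(garbled)
```

A utility that injects the three UTF-8 bytes into an application expecting one of the other forms produces exactly this kind of garbage.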
All content © copyright 2005 – 2025
by Catskill Technology Services, LLC.
All rights reserved.
Note that Third Party software (whether Open Source or proprietary) on this
site remains under the copyright and license of its owners.
Catskill Technology Services, LLC does not claim copyright over such software.
This page is https://www.catskilltech.com/utils/show.php?link=truly-smart-quotes
Last updated Sat, 28 Dec 2024 at 11:29 PM