## Why semantic markup?

Posted on 2017-Oct-04 at 17:11:44 (last update on 2022-Apr-24 at 21:05:00) by Phil

Semantic markup is the practice of tagging text with why that text is there, rather than simply “this is how it looks” (styling or presentation markup). This gives structure to text source, which can be useful in at least two ways:

1. Rather than repeating the “how it looks” (appearance) information with every use, it is consolidated into one place for consistency and easy changes to the appearance.
2. The markup can be searched, either to change or to inventory particular uses (every citation, for instance).

If you wanted a certain look to, say, your chapter headings, would you rather do the following each time you started a new chapter?

1. Give a command to skip to the top of a right-hand (odd-numbered) page.
2. Skip down several lines.
3. Give the chapter number right justified, in 30pt Cooper Black.
4. Next line, give the heading My Chapter Title right justified, in 15pt Cooper Black.
5. Skip down several lines.

or,

1. Give markup such as <chap_start>My Chapter Title</chap_start>.

In the second case, some sort of “style file” (such as CSS) knows how you want your chapters started and styled. If you don’t like the look of it, you change things in one place — perhaps a different typeface, or a different size. And the markup language keeps track of the chapter numbers for you. You wouldn’t believe how many people think it’s easier to just do it the first way (rather than learn something new)!
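As a rough illustration of the second approach, here is a minimal Python sketch of a processor that expands `<chap_start>` tags. The `CHAPTER_STYLE` dictionary, the `render_chapters` function, and the bracketed layout instructions it emits are all hypothetical stand-ins for a real style file and formatter. The point is only that the appearance lives in one place and that the processor, not the author, counts the chapters.

```python
import re

# Hypothetical "style file": the single place where chapter appearance is defined.
CHAPTER_STYLE = {
    "typeface": "Cooper Black",
    "number_size": 30,
    "title_size": 15,
    "align": "right",
}

def render_chapters(source, style=CHAPTER_STYLE):
    """Expand each <chap_start>...</chap_start> into explicit layout instructions."""
    counter = 0

    def expand(match):
        nonlocal counter
        counter += 1  # the processor keeps track of chapter numbers
        face = style["typeface"]
        return (
            "[new right-hand page]\n"
            f"[{style['align']}, {style['number_size']}pt {face}] {counter}\n"
            f"[{style['align']}, {style['title_size']}pt {face}] {match.group(1)}\n"
        )

    return re.sub(r"<chap_start>(.*?)</chap_start>", expand, source)

if __name__ == "__main__":
    print(render_chapters("<chap_start>My Chapter Title</chap_start>"))
```

Changing the typeface or size for every chapter then means editing one entry in `CHAPTER_STYLE`, not revisiting each chapter.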

If all you’re doing is a one- or two-page memo that will be printed out once or twice at most, and never updated or consolidated into some sort of collection, such manual operations are acceptable. However, for anything beyond that, you should consider a semantic markup setup. It can even be WYSIWYG editing, so long as the element buttons are semantic descriptions and not just styling. That is, buttons for “emphasis”, “citation”, etc. and not for “italic”, “bold”, “underline”, etc. Certainly for books, manuals, journal articles, and the like, markup with semantics is mandatory.

For example, most WYSIWYG editors allow you to designate text as italic or bold (or both). This is bad practice for anything beyond a brief letter or memo. Let’s say you use italics for emphasized text, for titles (citations), and for foreign words. You just wrote a nice technical report with a word processor, and your boss is so impressed that she asks you to submit it to a technical or scientific journal (as an article). The journal bounces back the manuscript with some style suggestions: “we use bold for emphasis, underlines for citations, and a different typeface for foreign words.” If you had written this in a markup language, it would be easy to change the definition (in the style file) of “emphasis” from italic to bold, of “citation” from italic to underlined, and of “foreign” from italic to a different typeface. If it were a standard markup (such as HTML or LaTeX), you might even be able to simply submit the markup and let the journal or publisher supply the style file. Alas, since you used a word processor, you are going to have to trudge through the manuscript word by word and manually change all the italics, after figuring out why you used italics in each place. Fun! And lest you think this is an exaggeration, I’ve heard of publishers who want typewriter-style (fixed pitch, double-spaced) submissions, so they can print them out, count words and line lengths, and have room for editor’s marks!
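The journal scenario above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the tag names `emphasis`, `citation`, and `foreign`, the two style dictionaries, the serif font standing in for "a different typeface", and the `apply_style` helper. The regular expression is also a simplification that would not survive attribute values containing `>`.

```python
import re

# Two hypothetical "style files", each mapping a semantic tag to the
# (opening, closing) presentation markup it should become.
MY_STYLE = {
    "emphasis": ("<i>", "</i>"),
    "citation": ("<i>", "</i>"),
    "foreign": ("<i>", "</i>"),
}
JOURNAL_STYLE = {
    "emphasis": ("<b>", "</b>"),
    "citation": ("<u>", "</u>"),
    "foreign": ('<span style="font-family: serif">', "</span>"),
}

# Matches an opening or closing tag, ignoring any attributes.
TAG = re.compile(r"<(/?)(\w+)[^>]*>")

def apply_style(source, style):
    """Replace semantic tags with the presentation markup given in `style`."""
    def swap(match):
        closing, name = match.group(1), match.group(2)
        if name not in style:
            return match.group(0)  # leave unknown tags untouched
        opening_markup, closing_markup = style[name]
        return closing_markup if closing else opening_markup

    return TAG.sub(swap, source)

if __name__ == "__main__":
    text = "A <emphasis>key</emphasis> idea from <citation>Walden</citation>."
    print(apply_style(text, MY_STYLE))
    print(apply_style(text, JOURNAL_STYLE))
```

Meeting the journal's guidelines is then a matter of passing `JOURNAL_STYLE` instead of `MY_STYLE`; the manuscript itself is never touched.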

Some WYSIWYG word processors (such as MS Word) can give you limited semantic markup (e.g., designate headings for various purposes, at different levels), but you generally cannot export the result to flat text files (though sometimes you can export to HTML). Even when you can, the output often comes loaded up with all sorts of extra crap (font selections, sizes, etc., that are repeated over and over) that you’d really rather not have to deal with. This is not to say that WYSIWYG word processing can’t deliver good, clean markup; it’s just that it’s usually something tacked on after the fact, and it’s not really designed from the ground up to do that. There are almost always some styling controls or tags mixed in that you’ll need to fix, especially if generic “italic” and “bold” etc. stylings are available.

Once you have your (semantic) markup cleanly separated from the text and from the styling, what can you do with it? Well, such text is better for screen readers, as knowing what the text is for can clue the reader in to how to modulate its voice. For instance, emphasized text might be read slower, louder, and at a lower pitch. A citation might have a slight pause before and after it. And a foreign word might be pronounced correctly (if the language used is included somewhere, such as <foreign lang="fr">après-ski</foreign> embedded within English text).

Another use for flat file text source with markup (tags) is to have some processor scan through it looking for certain tags, and extracting that text into a separate file. For instance, find all citation tags to start building a bibliography for your document. Another could be to extract foreign words and phrases to start building a glossary. In both cases, the list could be sorted and manually or automatically looked over to spot possible misspellings and typos, helping to clean up your source. This must all be done manually if all you did was italicize this material.
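A scan of this kind might look like the following Python sketch. The tag names `citation` and `foreign` and the `extract_tagged` function are assumptions for illustration, and a regular expression is used only because the example is small; a real document would be better served by a proper HTML or XML parser.

```python
import re
from collections import defaultdict

def extract_tagged(source, tags=("citation", "foreign")):
    """Collect the text inside each named tag, de-duplicated and sorted."""
    found = defaultdict(set)
    for tag in tags:
        # \b keeps <foreign> from matching a longer tag name; attributes are skipped.
        pattern = re.compile(rf"<{tag}\b[^>]*>(.*?)</{tag}>", re.DOTALL)
        for match in pattern.finditer(source):
            # Collapse internal whitespace so line-wrapped entries compare equal.
            found[tag].add(" ".join(match.group(1).split()))
    return {tag: sorted(found[tag], key=str.casefold) for tag in tags}

if __name__ == "__main__":
    text = (
        "See <citation>Walden</citation> and <citation>Moby-Dick</citation>; "
        'after <foreign lang="fr">après-ski</foreign>, '
        "reread <citation>Walden</citation>."
    )
    for tag, entries in extract_tagged(text).items():
        print(tag, entries)
```

Because the output is sorted and de-duplicated, near-identical entries end up next to each other, which makes misspellings and typos easier to spot.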

For HTML web page markup, putting as much of the styling as possible into CSS leaves leaner (smaller), cleaner HTML text which search engines prefer to something cluttered and bloated with styling markup. For a journal, magazine, or book submission, it becomes much easier to meet their styling guidelines when style information is consolidated into one place.

Posted on 2022-Apr-07 at 09:47:00 by Phil

As an update to the above article, there are times when a bare <i> or <b> etc. might be necessary. For example, if you are giving a genus and species name such as Homo sapiens, it’s supposed to be rendered in italics. If you are doing a lot of taxonomy work, it could be worthwhile to define a <species> tag that resolves to italicizing the word(s), but unless you’ll be using it a lot, it may not be worth the effort.

So, you will probably end up needing to have bare italic, bold, underline, etc. commands for those times that you need them for a one-off case. See this article for (among other things) some further discussion on the topic. The whole point, however, is that you should be in the habit of using the appropriate semantic commands wherever possible (and available), rather than overloading a simple italic or bold command.

All content © copyright 2005 – 2022 by Catskill Technology Services, LLC.