Creating semantic HTML is the practice of reinforcing the information and meaning of a document rather than simply defining its presentation. Under the principle of separation of concerns, the use of HTML is not simply to put code to a design, but to use the semantics of the language to convey the meaning of the content, or potential content, within.
Semantics (from Ancient Greek: σημαντικός sēmantikós, "significant") is the study of meaning. It focuses on the relation between signifiers, like words, phrases, signs, and symbols, and what they stand for, their denotation. Linguistic semantics is the study of meaning that is used for understanding human expression through language. Other forms of semantics include the semantics of programming languages, formal logics, and semiotics.
A great example is the move from the <i> tag to <em>. In HTML5 the <i> tag was not removed, but it was redefined to mark text in an alternate voice or mood (a technical term, a foreign phrase, a thought), and wherever the intent is emphasis it is preferred to use <em>, as this conveys actual meaning. The presentation of the emphasis is up to the coder of the presentation layer, whether by an italic font, a bold font, underlining, or slower or louder audible speech.
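As a rough sketch of that split, the markup below only says that a word is emphasized, and the stylesheet decides how the emphasis looks. The sentence and the choice of bold styling are made up for illustration; any other presentation would work just as well.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Emphasis example</title>
  <style>
    /* The presentation layer decides how emphasis looks.
       Here it is rendered bold instead of the default italics. */
    em {
      font-style: normal;
      font-weight: bold;
    }
  </style>
</head>
<body>
  <!-- The markup states the meaning (stress emphasis), not the look. -->
  <p>You <em>must</em> save your work before closing the window.</p>
</body>
</html>
```

Changing the look later means editing the stylesheet only; the markup and its meaning stay untouched.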
Italics were originally used to convey the citation of a source, something carried over from print. In this context the use of the <em> tag is inappropriate. To properly cite a reference, it is semantically correct to use the <cite> element. It is again up to the coder of the presentation layer to decide whether it will be styled with italics or some other visual way to denote the context of this content.
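A small sketch of the difference, assuming a made-up sentence and a book title picked only as an example: the title of the cited work goes in <cite>, and the stylesheet, not the element, decides whether it appears in italics.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Citation example</title>
  <style>
    /* Browsers italicize cite by default; the presentation layer
       is free to choose something else, such as an underline. */
    cite {
      font-style: normal;
      text-decoration: underline;
    }
  </style>
</head>
<body>
  <!-- cite marks the title of a work being referenced. -->
  <p>The advice to omit needless words comes from <cite>The Elements of Style</cite>.</p>

  <!-- em marks stress emphasis and says nothing about a source. -->
  <p>Please read it <em>before</em> you start writing.</p>
</body>
</html>
```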
The evolution of HTML over the years, including the redefinition of the <i> tag, has systematically looked to remove presentational elements from HTML and place that concern in the hands of the presentation layer, aka CSS. Note the removal of elements like <strike>, <font> and <center>, as these only convey presentation and are void of any real contextual meaning. The <s> element survived only because it was given a meaning of its own: content that is no longer accurate or relevant.
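As a minimal sketch of that shift, here is the same warning written twice: once with presentational elements that HTML5 no longer permits (<font> and <center>, picked here purely for illustration) and once with the styling moved to CSS. The class name warning and the sample text are assumptions made up for this example.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Presentation belongs in CSS</title>
  <style>
    /* All visual decisions live here, not in the markup. */
    .warning {
      color: red;
      text-align: center;
    }
  </style>
</head>
<body>
  <!-- Obsolete: the markup itself dictates color and alignment.
  <center><font color="red">Your session is about to expire.</font></center>
  -->

  <!-- Preferred: strong says the content is important; CSS decides how it looks. -->
  <p class="warning"><strong>Your session is about to expire.</strong></p>
</body>
</html>
```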
What's interesting to me is when people have discussions that end up at a point where they essentially agree, yet continue to disagree. It's at this point they say, "We are just arguing semantics." It should be noted that the discussion around the Semantic Web starts at this point.
People are constantly in pursuit of greater meaning. And this pursuit is ever-present in the use of the written word and in code. In code there are hard concepts and then there are soft concepts. It is these soft concepts that allow for interpretation. And interpretation without context is meaningless. Semantically meaningless.
Prior to the onset of HTML5, the web was a wild place. The standards were not established until after people started using these tools, and the implementers of browsers needed to bend to the bad habits that had already formed. If they didn't, they would break the web. It is these very habits that continue to cripple ongoing development. This was the discussion around XHTML, which was considered a hard correction, but an extremely destructive one at the same time.
In many ways, we made the bed we lie in. A poorly standardized specification that allowed for interpretation, followed up by browsers that accepted our bad habits, created a generation of coders who feel semantics is a waste of time and basically meaningless. Ironic.
"If it works in the Browser, then it's good enough for me!"
That was yesterday; we need to be thinking about tomorrow. The web isn't just for people to read blogs about their dog; the Internet of Things will vastly change the way that we communicate. Without semantics, we will have no real understanding of all the content around us. And if we don't know that it's there when we need it, is it really there?
Semantics come in many forms. HTML core elements with more meaning are appearing in the spec. Additional tools like RDF, Microformats and ARIA are allowing us to grow our understanding of the content.
Prior to HTML5, a scan of the Internet concluded that there were common patterns in how people were trying to structure their HTML documents. It was from these patterns that concepts like <footer> were born. As these concepts take hold in the community, it is the constant review of emerging patterns that continues to drive standardization in semantics.
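To make that concrete, here is a hedged before-and-after sketch. The class name footer and the sample text are assumptions standing in for the kind of pattern such a scan would have turned up, not data taken from it.

```html
<!-- A common pre-HTML5 pattern: the meaning lives only in a class name
     that each author was free to spell differently. -->
<div class="footer">
  <p>Contact: <a href="mailto:hello@example.com">hello@example.com</a></p>
</div>

<!-- HTML5: the element itself says "this is the footer of its section or page". -->
<footer>
  <p>Contact: <a href="mailto:hello@example.com">hello@example.com</a></p>
</footer>
```

Both versions can be styled identically; only the second tells software other than a stylesheet what the block is.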
In fact, over time and with additional review of developing web sites, another common thread was discovered. Developers were using either a <div> or a <section> with the class of .main or the ARIA role="main" as the primary wrapper of the view. It was from this pattern that the W3C considered a dedicated element for the specification.
The primary purpose of <main> is to map ARIA’s landmark role main to an element in HTML. This will help screen readers and other assistive technologies understand where the main content begins.
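A minimal sketch of that mapping, with a heading and paragraph invented for the example: the first block is the wrapper pattern described above, and the second is the element that replaces it.

```html
<!-- Before: a generic wrapper, with the landmark supplied by hand through ARIA. -->
<div class="main" role="main">
  <h1>Quarterly report</h1>
  <p>The primary content of the page goes here.</p>
</div>

<!-- After: main carries the main landmark role implicitly,
     so no role attribute is needed. -->
<main>
  <h1>Quarterly report</h1>
  <p>The primary content of the page goes here.</p>
</main>
```

A page would use one wrapper or the other, not both; they are shown together here only for comparison.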
In the end, what we are looking for is a consistent library of ideas that we can all agree has the appropriate meaning. These are heated discussions at best. Take for example the work being done with schema.org.
This site provides a collection of schemas that webmasters can use to markup HTML pages in ways recognized by major search providers, and that can also be used for structured data interoperability (e.g. in JSON). Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right Web pages.
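As one possible illustration, the fragment below marks up a person using schema.org's Person type through HTML microdata attributes. The person, job title and URL are invented for the example; the type and the property names (name, jobTitle, url) come from the schema.org vocabulary.

```html
<!-- itemscope starts a new item; itemtype names the shared vocabulary term. -->
<div itemscope itemtype="https://schema.org/Person">
  <!-- Each itemprop ties a piece of visible text to an agreed-upon meaning. -->
  <span itemprop="name">Jane Doe</span>,
  <span itemprop="jobTitle">Front-end developer</span>
  <a itemprop="url" href="https://example.com/jane">example.com/jane</a>
</div>
```

The visible text is unchanged for a human reader, while a crawler that knows the vocabulary can tell a name from a job title without guessing from class names.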
Aside from the word 'webmasters', this is pretty good stuff. As we will discuss later, additional semantic tools commonly refer to this shared library of meaning. The 'Merriam-Webster' of the Internet of Things.
What's interesting to me is the ongoing discussion I have with developers when it comes to the concept of 'adding semantics'. In the end, the root of the discussion is, "How can we add more meaning to this document and/or structure?" The issue is this: if we add meaning, does that meaning mean the same thing to someone else?
A default position for most people trying to add semantics to a document is to use CSS classes. The typical reason is that this is easy. We can create any name we want and it will have meaning to us. While this adds internal meaning, if there is no shared schema, taxonomy or ontology behind it, the process is essentially meaningless to anyone else.
When creating documents, I challenge myself and others with the question, "What am I really saying here?" The crux of this conversation typically revolves around assistive technologies and how people with disabilities can interact with the web. This is the easy problem to solve, as we have a clear understanding of what it means to be disabled and try to interact with this medium.
I challenge you to look past these concepts and imagine how other applications, web crawlers, robots, etc ... look at your content. If we can write software that can read the context of a web document and convey meaning to a person who is visually impaired, why can we not write an application that can also interpret meaning?
What about an internet where content is not delivered by the means we consider appropriate today, but a web where technology knows us better than we know ourselves? A day when applications are able to scan the internet looking for appropriate content to deliver. At that point the concept is void of presentation and 'keyword' searching; it is a process of aggregating content based on its meaning, not relying completely on the actual content (strings of text) itself.
The future is the future and we live in the now, but we need to prepare for the future. If we don't, when it gets here we will not be prepared. Where are my hoverboards that Marty promised? Apparently we didn't prepare for that future.
Throughout this course I will continue to challenge the meaning of content and its representation in HTML documents, either by challenging existing constructs or by evaluating new tools given to us. At every turn we as developers need to challenge the constructs around us and make sure that we have the tools we need to be successful in this medium. But first, we need to completely understand the tools we have access to now and how best to use them.
For now, come away with the knowledge that HTML is not a language for reproducing Photoshop images. Primarily, it is a structural language whose purpose is to create document structure. Secondarily, its purpose is to allow for additional programmatic attributes to which presentation and interaction can be applied.
Maintaining this separation of concerns when building web apps will help you avoid the common problems that web developers encounter every day, and help build a better web for the future.
The Semantic Web
If you’re like me a few months ago, you go to a website and if it looks good and works well, you say “job well done!” What many people don’t think about is the monstrous size of the Internet and how complicated it must be for the makers of websites and applications to latch onto any stability in an ever-evolving world full of emerging technologies.