On-Page SEO · Glossary · Updated Apr 2026

Semantic HTML

Definition

Semantic HTML is the practice of using HTML elements that carry meaning about their content (`<article>`, `<section>`, `<nav>`, `<main>`, `<aside>`, `<header>`, `<footer>`) instead of generic `<div>` containers. Crawlers, screen readers, and LLM scrapers rely on these tags to extract structure.

Long definition

Semantic HTML is the difference between a page that's structured and a page that just looks structured. A <div class="article"> renders the same as <article> by default, but only the second carries machine-readable meaning.

The core semantic elements, per the WHATWG HTML spec (most act as structural landmarks; <figure> and <time> annotate content instead):

<main> — the page's primary content (one per page).
<article> — a self-contained piece of content (a blog post, a product card, a forum reply).
<section> — a thematic grouping inside the page; usually has a heading.
<nav> — primary navigation links.
<aside> — content tangentially related (sidebar, callout, related-posts).
<header> and <footer> — header/footer for the page or for an <article>.
<figure> / <figcaption> — images or media with caption.
<time datetime="..."> — machine-readable dates.
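
As a rough illustration of the list above, the following sketch counts which of these elements a page actually uses. It relies only on Python's standard-library `html.parser`; the `SAMPLE_PAGE` skeleton, the `SemanticTagCounter` class, and the `count_semantic_tags` helper are hypothetical names made up for this example, not part of any tool.

```python
from collections import Counter
from html.parser import HTMLParser

# The elements from the list above that this sketch looks for.
SEMANTIC_TAGS = {
    "main", "article", "section", "nav", "aside",
    "header", "footer", "figure", "figcaption", "time",
}


class SemanticTagCounter(HTMLParser):
    """Counts opening tags for the semantic elements listed above."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag in SEMANTIC_TAGS:
            self.counts[tag] += 1


def count_semantic_tags(html: str) -> Counter:
    parser = SemanticTagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical page skeleton, for illustration only.
SAMPLE_PAGE = """
<header><nav><a href="/">Home</a></nav></header>
<main>
  <article>
    <header>
      <h1>Post title</h1>
      <time datetime="2026-04-01">April 1, 2026</time>
    </header>
    <section><h2>Background</h2><p>Body text.</p></section>
    <figure>
      <img src="chart.png" alt="Chart">
      <figcaption>Figure caption.</figcaption>
    </figure>
  </article>
  <aside><h2>Related posts</h2></aside>
</main>
<footer><p>Contact details.</p></footer>
"""

counts = count_semantic_tags(SAMPLE_PAGE)
for tag in sorted(counts):
    print(f"<{tag}>: {counts[tag]}")
```

A template that reports zero `<main>` elements, or more than one, is a reasonable first thing to fix.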

These tags are read by three audiences. Screen readers use them as landmarks for keyboard navigation — users jump between <nav>, <main>, <aside>. Search crawlers use them as content-vs-chrome signals; Google's documentation explicitly mentions <main> and <article> as helpful for primary-content extraction. LLM scrapers (GPTBot, ClaudeBot, PerplexityBot) extract <article> content as the canonical chunk for retrieval and quoting — divs are noisier and quoted less reliably.
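
To make the content-vs-chrome idea concrete, here is a minimal sketch of how an extractor could isolate the text inside <article> and skip navigation and footer text. It is not how any particular crawler or LLM scraper is implemented; `ArticleTextExtractor`, `extract_article_text`, and the `PAGE` sample are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collects text that appears inside <article> elements only."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <article> elements are currently open
        self.chunks = []  # non-empty text fragments found inside <article>

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_article_text(html: str) -> str:
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page, for illustration only.
PAGE = """
<nav><a href="/">Home</a> <a href="/blog">Blog</a></nav>
<main>
  <article>
    <h1>Semantic HTML</h1>
    <p>Landmarks tell machines where the content is.</p>
  </article>
  <aside>Related posts</aside>
</main>
<footer>Copyright notice</footer>
"""

article_text = extract_article_text(PAGE)
print(article_text)
```

With a <div class="article"> wrapper instead, the same extractor would need site-specific class-name rules to find the content, which is where the extra noise comes from.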

Semantic HTML compounds with structured data. JSON-LD Article schema works better when there's an actual <article> element wrapping the content; Person and Organization schemas pair naturally with <header> and contact <footer> blocks.
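
One way to keep the two in sync is a template check that flags pages declaring Article schema without an <article> element. The sketch below assumes each JSON-LD block is a single JSON object with a top-level "@type"; `ArticleSchemaChecker`, `article_schema_matches_markup`, and the sample pages are hypothetical names for this example.

```python
import json
from html.parser import HTMLParser


class ArticleSchemaChecker(HTMLParser):
    """Notes whether <article> is present and collects JSON-LD script bodies."""

    def __init__(self):
        super().__init__()
        self.has_article_element = False
        self.in_json_ld = False
        self.json_ld_blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.has_article_element = True
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_json_ld = True
            self.json_ld_blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_json_ld = False

    def handle_data(self, data):
        # Script content may arrive in several pieces, so accumulate it.
        if self.in_json_ld:
            self.json_ld_blocks[-1] += data


def article_schema_matches_markup(html: str) -> bool:
    """True only if the page declares Article schema AND has an <article>."""
    parser = ArticleSchemaChecker()
    parser.feed(html)
    parser.close()
    for block in parser.json_ld_blocks:
        data = json.loads(block)
        # Assumption: one JSON object per block with a top-level "@type".
        if isinstance(data, dict) and data.get("@type") == "Article":
            return parser.has_article_element
    return False


# Hypothetical markup, for illustration only.
JSON_LD = """
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Semantic HTML"}
</script>
"""
SEMANTIC_PAGE = JSON_LD + "<main><article><h1>Semantic HTML</h1></article></main>"
DIV_PAGE = JSON_LD + '<div class="article"><h1>Semantic HTML</h1></div>'

semantic_ok = article_schema_matches_markup(SEMANTIC_PAGE)
div_ok = article_schema_matches_markup(DIV_PAGE)
print(semantic_ok, div_ok)
```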

The cost of semantic HTML is roughly zero — it's a tag substitution. The cost of not using it is silent: lower content-extraction confidence, weaker accessibility, less reliable AI quoting. For new templates, default to semantic. For legacy templates, the migration is mostly mechanical.
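
For a legacy template, a first pass can simply list the <div> wrappers whose class names suggest a semantic element. The sketch below only reports candidates and leaves the rewrite to a human; the `CLASS_TO_ELEMENT` mapping is an assumption to adapt to your own class names, and `MigrationCandidateFinder` and `find_migration_candidates` are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical legacy class names mapped to the element that likely fits.
# Adjust this to the class names your own templates use.
CLASS_TO_ELEMENT = {
    "article": "article",
    "post": "article",
    "nav": "nav",
    "sidebar": "aside",
    "header": "header",
    "footer": "footer",
}


class MigrationCandidateFinder(HTMLParser):
    """Finds <div> elements whose class hints at a semantic replacement."""

    def __init__(self):
        super().__init__()
        self.candidates = []  # (legacy class name, suggested element) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        class_attr = dict(attrs).get("class") or ""
        for class_name in class_attr.split():
            if class_name in CLASS_TO_ELEMENT:
                self.candidates.append((class_name, CLASS_TO_ELEMENT[class_name]))
                break  # one suggestion per <div> is enough


def find_migration_candidates(html: str) -> list:
    finder = MigrationCandidateFinder()
    finder.feed(html)
    finder.close()
    return finder.candidates


# Hypothetical legacy template, for illustration only.
LEGACY_TEMPLATE = """
<div class="nav"><a href="/">Home</a></div>
<div class="article wide"><h1>Title</h1><div class="body">Text</div></div>
<div class="sidebar">Related</div>
"""

candidates = find_migration_candidates(LEGACY_TEMPLATE)
for class_name, element in candidates:
    print(f'<div class="{class_name}"> could become <{element}>')
```

Each swap still deserves a quick check of the CSS and scripts that select on the old element or class.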

Common misconceptions

"Semantic HTML is a direct ranking factor." It isn't a documented ranking factor in its own right. It improves the inputs to ranking (content extraction, accessibility, structured-data validity), which in turn improve the signals ranking algorithms read.
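"Divs are fine if I add ARIA roles." ARIA is a fallback for when semantic HTML can't express the role. The W3C's first rule of ARIA use says, in short, to use a native HTML element when one already provides the semantics and behavior you need. <button> beats <div role="button"> because it is focusable and keyboard-operable without extra scripting.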
"<section> and <div> are interchangeable." A <section> should have a heading and represent a thematic chunk. A <div> is a generic container with no meaning of its own. Mixing them up produces a heading structure that misleads assistive technology and extractors.
"LLMs ignore HTML structure and just read text." Modern LLM scrapers parse the DOM and prefer semantic landmarks. Sites with clean <article> markup get cleaner quotations in AI Overviews and Perplexity citations.
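
The ARIA point above lends itself to a quick audit. This sketch flags generic <div> or <span> elements carrying an ARIA role that a native element already covers. The `ROLE_TO_NATIVE` table lists only a few common roles, and `RedundantRoleFinder` and `find_redundant_roles` are hypothetical names for this example.

```python
from html.parser import HTMLParser

# A few ARIA roles and the native element that carries the same semantics.
ROLE_TO_NATIVE = {
    "button": "button",
    "navigation": "nav",
    "main": "main",
    "complementary": "aside",
    "article": "article",
}


class RedundantRoleFinder(HTMLParser):
    """Flags generic elements whose ARIA role has a native equivalent."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        role = dict(attrs).get("role")
        if tag in ("div", "span") and role in ROLE_TO_NATIVE:
            native = ROLE_TO_NATIVE[role]
            self.findings.append(f'<{tag} role="{role}"> could be <{native}>')


def find_redundant_roles(html: str) -> list:
    finder = RedundantRoleFinder()
    finder.feed(html)
    finder.close()
    return finder.findings


# Hypothetical markup, for illustration only.
SNIPPET = """
<div role="navigation"><a href="/">Home</a></div>
<div role="button" tabindex="0">Save</div>
<button type="button">Cancel</button>
<div class="wrapper">Layout only</div>
"""

findings = find_redundant_roles(SNIPPET)
for finding in findings:
    print(finding)
```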
