Updated Apr 2026

Semantic HTML

Definition

Semantic HTML is the practice of using HTML elements that carry meaning about their content (`<article>`, `<section>`, `<nav>`, `<main>`, `<aside>`, `<header>`, `<footer>`) instead of generic `<div>` containers. Crawlers, screen readers, and LLM scrapers rely on these tags to extract structure.

Long definition

Semantic HTML is the difference between a page that's structured and a page that just looks structured. A <div class="article"> renders the same as <article> — the second carries machine-readable meaning that the first does not.

The core semantic elements, as defined in the WHATWG HTML spec (several of them also act as landmarks for assistive technology):

  • <main>: the page's primary content (at most one visible <main> per page).
  • <article>: a self-contained piece of content (a blog post, a product card, a forum reply).
  • <section>: a thematic grouping inside the page, usually with a heading.
  • <nav>: a major block of navigation links.
  • <aside>: tangentially related content (sidebar, callout, related posts).
  • <header> and <footer>: introductory and closing content for the page, or for the nearest enclosing <article> or <section>.
  • <figure> / <figcaption>: images or other media with a caption.
  • <time datetime="...">: machine-readable dates and times.
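
As a rough illustration, the elements above can be combined into one page skeleton and checked mechanically. The sketch below uses Python's standard `html.parser`; the page content, the placeholder date, and the `count_tags` helper are assumptions made for the example, not part of any spec.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical page skeleton; all text content is placeholder only.
PAGE = """
<body>
  <header><nav><a href="/">Home</a></nav></header>
  <main>
    <article>
      <header><h1>Post title</h1></header>
      <p>Published <time datetime="2026-04-01">1 April 2026</time>.</p>
      <section>
        <h2>Background</h2>
        <figure>
          <img src="chart.png" alt="Traffic by month">
          <figcaption>Traffic by month.</figcaption>
        </figure>
      </section>
    </article>
  </main>
  <aside><h2>Related posts</h2></aside>
  <footer><p>Contact details</p></footer>
</body>
"""


class TagCounter(HTMLParser):
    """Counts how often each start tag appears in the document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags(html: str) -> Counter:
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


counts = count_tags(PAGE)
# A template check could flag pages with zero or several <main> elements.
print("single <main>:", counts["main"] == 1)
```

Note that <header> appears twice in the skeleton: once for the page and once inside the <article>, which the element's scoping allows.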

These tags are read by three audiences. Screen readers use them as landmarks for keyboard navigation — users jump between <nav>, <main>, <aside>. Search crawlers use them as content-vs-chrome signals; Google's documentation explicitly mentions <main> and <article> as helpful for primary-content extraction. LLM scrapers (GPTBot, ClaudeBot, PerplexityBot) extract <article> content as the canonical chunk for retrieval and quoting — divs are noisier and quoted less reliably.
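
To make the content-vs-chrome idea concrete, here is a minimal sketch of an extractor that keeps only the text inside <article> and skips nested <nav>, <aside>, and <footer>. It is an illustration built on Python's standard `html.parser`, not a description of how any particular crawler or LLM scraper works; the class name, the skip list, and the sample markup are assumptions.

```python
from html.parser import HTMLParser


class ArticleText(HTMLParser):
    """Collects text found inside <article>, skipping chrome elements."""

    SKIP = {"nav", "aside", "footer"}  # assumed "chrome" for this sketch

    def __init__(self):
        super().__init__()
        self.article_depth = 0  # > 0 while inside an <article>
        self.skip_depth = 0     # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1
        elif tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.article_depth and not self.skip_depth:
            self.chunks.append(text)


def extract_article_text(html: str) -> str:
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


SAMPLE = """
<nav><a href="/">Home</a></nav>
<main>
  <article>
    <h1>Semantic HTML</h1>
    <p>Use elements that carry meaning.</p>
    <aside>Related: ARIA roles</aside>
  </article>
</main>
<footer>Copyright notice</footer>
"""

print(extract_article_text(SAMPLE))
# prints: Semantic HTML Use elements that carry meaning.
```

The same page built from <div class="article"> would return an empty string here unless the extractor were taught that site's class names, which is the practical cost of generic containers.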

Semantic HTML compounds with structured data. JSON-LD Article schema works better when there's an actual <article> element wrapping the content; Person and Organization schemas pair naturally with <header> and contact <footer> blocks.
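
One way to keep the markup and the structured data aligned is to generate both from the same fields. The sketch below is an assumption about how a template helper might look (`render_article` and its parameters are hypothetical); the JSON-LD uses the schema.org `Article` and `Person` types with the `headline`, `author`, and `datePublished` properties.

```python
import json
from html import escape


def render_article(headline: str, author: str, published: str, body_html: str) -> str:
    """Returns an <article> element followed by matching JSON-LD Article data.

    body_html is assumed to be trusted, already-rendered markup.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": published,
    }
    # Escape "</" so a stray "</script>" in a field cannot close the script tag.
    json_ld = json.dumps(data, indent=2).replace("</", "<\\/")
    return (
        "<article>\n"
        "  <header>\n"
        f"    <h1>{escape(headline)}</h1>\n"
        f'    <p>By {escape(author)}, <time datetime="{escape(published)}">'
        f"{escape(published)}</time></p>\n"
        "  </header>\n"
        f"  {body_html}\n"
        "</article>\n"
        f'<script type="application/ld+json">\n{json_ld}\n</script>'
    )


# Placeholder values for illustration only.
print(render_article("Semantic HTML", "A. Writer", "2026-04-01", "<p>Body text.</p>"))
```

Because the headline, author, and date come from one source, the visible <article> and the JSON-LD cannot drift apart.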

The cost of semantic HTML is roughly zero — it's a tag substitution. The cost of not using it is silent: lower content-extraction confidence, weaker accessibility, less reliable AI quoting. For new templates, default to semantic. For legacy templates, the migration is mostly mechanical.
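
For a sense of what "mostly mechanical" can mean, the following sketch rewrites <div> elements whose class names suggest a semantic role. The class-to-element mapping is hypothetical and every legacy template would need its own; a class name does not guarantee the role, so the output still needs a human review. Original attributes, including `class`, are kept so existing stylesheets continue to match.

```python
from html.parser import HTMLParser

# Hypothetical mapping from legacy class names to semantic elements.
CLASS_TO_TAG = {"article": "article", "nav": "nav", "sidebar": "aside",
                "content": "main", "header": "header", "footer": "footer"}


class DivMigrator(HTMLParser):
    """Re-serializes HTML, swapping mapped <div> elements for semantic tags."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.out = []
        self.div_stack = []  # replacement tag name for each open <div>

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            new_tag = next((CLASS_TO_TAG[c] for c in classes if c in CLASS_TO_TAG), "div")
            self.div_stack.append(new_tag)
            raw = "<" + new_tag + raw[len("<div"):]
        self.out.append(raw)

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            tag = self.div_stack.pop()
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def migrate(html: str) -> str:
    parser = DivMigrator()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


legacy = '<div class="nav"><a href="/">Home</a></div><div class="article"><p>Text</p></div>'
print(migrate(legacy))
```

The sketch assumes well-formed input with balanced <div> tags; unbalanced legacy markup would need cleaning first.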

Common misconceptions

  • "Semantic HTML is a direct ranking factor." It isn't a confirmed ranking factor in its own right. It improves the inputs to ranking (content extraction, accessibility, structured-data validity), which in turn improve the signals that ranking algorithms read.
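  • "Divs are fine if I add ARIA roles." ARIA is a fallback for when native HTML can't express the role. The W3C's first rule of ARIA use is to prefer a native HTML element or attribute whenever one already provides the semantics and behavior you need. <button> beats <div role="button"> every time: it is focusable and keyboard-operable by default.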
  • "<section> and <div> are interchangeable." A <section> should have a heading and represent a thematic chunk. A <div> is a generic styling box with no meaning. Mixing them up gives parsers and assistive technology a misleading picture of the page's structure.
  • "LLMs ignore HTML structure and just read text." Modern LLM scrapers parse the DOM and prefer semantic landmarks. Sites with clean <article> markup get cleaner quotations in AI Overviews and Perplexity citations.
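
Two of the misconceptions above are easy to check for in a template. The sketch below is a small, hypothetical lint pass (the `lint` function, the warning wording, and the role table are assumptions and cover only a handful of roles): it flags <div> or <span> elements carrying a role that a native element already provides, and <section> elements that contain no heading.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
# ARIA roles that a native element already covers (illustrative subset).
NATIVE_FOR_ROLE = {"button": "button", "navigation": "nav", "main": "main",
                   "article": "article", "complementary": "aside"}


class SemanticLint(HTMLParser):
    """Collects warnings about ARIA-on-div and heading-less sections."""

    def __init__(self):
        super().__init__()
        self.warnings = []
        self.sections = []  # one flag per open <section>: heading seen yet?

    def handle_starttag(self, tag, attrs):
        role = dict(attrs).get("role")
        if tag in {"div", "span"} and role in NATIVE_FOR_ROLE:
            self.warnings.append(
                f'<{tag} role="{role}">: consider <{NATIVE_FOR_ROLE[role]}>')
        if tag == "section":
            self.sections.append(False)
        elif tag in HEADINGS and self.sections:
            self.sections[-1] = True  # credit the innermost open section

    def handle_endtag(self, tag):
        if tag == "section" and self.sections:
            if not self.sections.pop():
                self.warnings.append("<section> without a heading: consider <div>")


def lint(html: str) -> list:
    parser = SemanticLint()
    parser.feed(html)
    parser.close()
    return parser.warnings


for warning in lint('<div role="button">Go</div><section><p>No heading</p></section>'):
    print(warning)
```

A check like this only catches the mechanical cases; whether a block really is a thematic section remains an editorial judgment.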