Schema Markup That LLMs Actually Use
Six types that move citation rate, six that don't, and the less-is-more rule
The schema audit a typical SEO team ships in 2026 looks impressive on paper and accomplishes very little. Twelve schema types deployed, three rich result eligibilities earned, a green Schema Validator dashboard. None of it correlates with citation rate in ChatGPT, Claude, Perplexity, or Google AI Overviews. The team checks the box and the visibility doesn't move.
The reason is straightforward. LLM retrieval layers don't reward schema coverage; they reward schema correctness on a small set of types they actually read. Six types pull weight in citation behavior, six are read but ignored as ranking signals, and the rest are noise. A site with three well-formed Article, Organization, and Person schemas outperforms a site with twelve schemas where half have property mismatches against the rendered DOM.
This article maps the six schema types LLMs actually use, the six they read but don't reward, the "less is more" rule, and concrete JSON-LD examples for the three highest-leverage types. It assumes you've already deployed at least basic Article schema and now need to know which next moves matter. If you haven't deployed any schema, start with structured-data fundamentals and come back.
Why correctness beats coverage
The classic schema audit asks "which types are eligible for rich results?" and tries to maximize the answer. That question made sense in 2018, when Google's rich-result eligibility was the primary schema use case. It made less sense in 2022 when Google deprecated multiple rich-result types. It barely makes sense in 2026, when LLM retrieval matters as much as rich-result eligibility.
The new question: "which schema types does the retrieval layer read, and is my markup correct enough to be trusted?"
The retrieval layer cross-checks markup against rendered DOM. A Product schema with a price property of $49.99 paired with a rendered page showing $59.99 is detected as inconsistent, and the entire schema block is silently discounted. Not just the price field — the whole block. A FAQPage with Question items that don't appear in the rendered page is detected as marketing fluff and discounted. An Organization schema with a sameAs link to a LinkedIn profile that 404s is detected as stale and discounted.
The audit metric that matters is not schema count. It's schema-DOM consistency. Three schema blocks with 100% consistency outperform twelve blocks with 70% consistency, every time. The retrieval layer does not give partial credit for "mostly correct."
What this means in practice: cut your schema deployment to the types that move citation rate, audit them ruthlessly for DOM consistency, and let the rest go. The audit checklist in the closing section applies the rule.
Article: the foundation, when you bother to fill it
Article (and its subtypes NewsArticle, BlogPosting, TechArticle) is the most-read schema type by every LLM retrieval layer in 2026. The minimum bar for trustable Article markup has six properties:
- headline (matches the <h1> text exactly)
- author as a nested Person with at least name and url
- datePublished (ISO 8601, matches a visible publish date in the rendered DOM)
- dateModified (ISO 8601, matches a visible "Updated" marker if present)
- publisher as a nested Organization with name and logo
- articleBody or a clear mainEntityOfPage pointer
Half the Article schema I see in 2026 audits is missing the nested Person entity for the author. The schema declares "author: 'Enric Ramos'" as a string, which the retrieval layer reads as an unverifiable claim. A nested Person with name, url, and sameAs (LinkedIn, Twitter, ORCID where applicable) turns the author into a verifiable entity that the model can attribute citations to.
A working Article JSON-LD block:
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "Schema Markup That LLMs Actually Use",
"author": {
"@type": "Person",
"name": "Enric Ramos",
"url": "https://crawlsense.com/team/enric-ramos",
"sameAs": [
"https://www.linkedin.com/in/enric-ramos",
"https://twitter.com/enricramos"
]
},
"datePublished": "2026-04-25",
"dateModified": "2026-04-25",
"publisher": {
"@type": "Organization",
"name": "CrawlSense",
"logo": {
"@type": "ImageObject",
"url": "https://crawlsense.com/logo.png"
}
},
"mainEntityOfPage": {
"@type": "WebPage",
"@id": "https://crawlsense.com/blog/geo-ai-search/schema-for-ai-search"
}
}
That block, on a page where the rendered content matches every property, is the single highest-leverage schema move for content sites. It's not glamorous; it's foundational.
Organization: the entity anchor for your domain
Organization schema lives on the homepage (and ideally the about page) and anchors your brand entity in the knowledge graph. The minimum bar:
- name (the canonical brand name, matched against Wikidata if you have an entry)
- url (the canonical homepage)
- logo (square, ideally 512x512 minimum)
- sameAs (an array of authoritative profile URLs)
The sameAs array is the load-bearing property. It's what the retrieval layer uses to disambiguate your organization from others with similar names. The chain that works in 2026:
- Your Wikidata entity (if you have one)
- Your Wikipedia article (if you have one)
- LinkedIn company page (verified)
- Crunchbase (for SaaS, B2B)
- One industry-specific authority (G2 for SaaS, Glassdoor for staffing, GitHub org for dev tools, ProductHunt for consumer tech)
Three to five sameAs links is the sweet spot. Two is borderline. One is insufficient. Ten is over-claiming and starts to dilute trust if any of the links look low-authority.
A working Organization JSON-LD block:
{
"@context": "https://schema.org",
"@type": "Organization",
"name": "CrawlSense",
"url": "https://crawlsense.com",
"logo": "https://crawlsense.com/logo.png",
"sameAs": [
"https://www.wikidata.org/wiki/Q123456789",
"https://www.linkedin.com/company/crawlsense",
"https://www.crunchbase.com/organization/crawlsense",
"https://github.com/crawlsense"
],
"description": "SEO audit and AI-visibility tracking platform for SaaS and content sites.",
"founder": {
"@type": "Person",
"name": "Enric Ramos",
"url": "https://crawlsense.com/team/enric-ramos"
},
"foundingDate": "2024-09-01"
}
The nested founder Person is optional but worth including for small companies — it ties the founder's personal entity to the organizational entity, which strengthens both. For deeper implementation, see Entity SEO: Building the Knowledge Graph LLMs Read.
Person: making authors verifiable entities
Person schema is the third leg of the trustworthy-content tripod. Every bylined article should reference a Person entity, and that entity should be verifiable through sameAs links to professional profiles.
The properties that matter:
- name (the canonical author name as it appears across publications)
- url (the author's bio page on your site)
- jobTitle (current role)
- worksFor (a linked Organization)
- sameAs (array of authoritative professional profiles)
The retrieval layer uses Person schema for two things: attribution (citing this article requires knowing who the author is) and authority weighting (an author with stronger entity signals lifts the article's citation likelihood). Both matter for E-E-A-T-sensitive content, especially YMYL topics.
A working Person block, embedded as the author of an Article:
{
"@type": "Person",
"name": "Enric Ramos",
"url": "https://crawlsense.com/team/enric-ramos",
"jobTitle": "Founder & SEO Lead",
"worksFor": {
"@type": "Organization",
"name": "CrawlSense"
},
"sameAs": [
"https://www.linkedin.com/in/enric-ramos",
"https://twitter.com/enricramos",
"https://scholar.google.com/citations?user=ABC123"
]
}
The Google Scholar link is the strongest authority signal where it applies (academic, technical, research-adjacent topics). LinkedIn and Twitter cover the rest.
FAQPage: read by the retrieval layer, deprecated by Google rich results
FAQPage schema is in an awkward position. Google deprecated FAQ rich results for most sites in August 2023, which meant the visible-in-SERP benefit largely disappeared. But the schema is still read by every LLM retrieval layer in 2026, and it remains a useful grounding signal for genuinely Q&A-shaped content.
The rule: use FAQPage schema when the content is actually question-and-answer shaped, with each Question being a distinct sub-query and each acceptedAnswer being 60-150 words of substantive answer. Don't use it for marketing content dressed up as FAQs ("What makes our product great?" → marketing copy is detected and discounted).
The properties that matter:
- mainEntity (array of Question items)
- Each Question: name (the question text), acceptedAnswer (an Answer entity)
- Each Answer: text (the answer body, matching the rendered DOM exactly)
The match-against-rendered-DOM check is strict here. If your FAQ schema lists a question that doesn't appear in the visible page, the entire FAQPage block is silently discounted. The retrieval layer will not partially trust it.
For a worked pattern, the FAQ at the bottom of this article is implemented as FAQPage schema with each visible question matched 1:1 against a Question entity. The questions are short, the answers are 80-120 words, the rendered text matches the markup. That's the pattern that works.
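A minimal FAQPage block following that pattern, using questions from this article's own FAQ (the answer text is trimmed here to show the shape; in production it must match the rendered answer word for word):
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Does invalid schema hurt my page or just get ignored?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Invalid schema is silently ignored by Google for rich-result eligibility and silently discounted by LLM retrieval layers for grounding. You get zero benefit from the markup."
      }
    },
    {
      "@type": "Question",
      "name": "Does schema markup help with conversational search specifically?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "It helps indirectly, by making your content more reliably attributable. Well-formed schema gives the retrieval layer cleaner attribution targets."
      }
    }
  ]
}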
Product, BreadcrumbList, and HowTo: the second tier
Three more types pull weight, with caveats.
Product. Essential for ecommerce. Properties that matter: name, image, description, brand, offers (with price, priceCurrency, availability). The price-DOM consistency check is the thing teams break most often — when price changes ship to the rendered page but the schema isn't updated, the schema is discounted.
BreadcrumbList. Read by the retrieval layer for site-structure context. Useful for content sites with a clear hierarchy. The schema needs to match the visible breadcrumb in the DOM; mismatches are detected.
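A minimal BreadcrumbList sketch for a three-level trail (the category label here is illustrative; the names and order must match the visible breadcrumb exactly):
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Blog",
      "item": "https://crawlsense.com/blog"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "GEO & AI Search",
      "item": "https://crawlsense.com/blog/geo-ai-search"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "Schema Markup That LLMs Actually Use"
    }
  ]
}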
HowTo. Google deprecated HowTo rich results in September 2023, which means the SERP-visible benefit disappeared. The retrieval layer still reads HowTo schema for genuinely step-by-step content. Use it where the content is actually a procedural how-to, not where it's a list of tips dressed up as steps.
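Where the content genuinely is procedural, a minimal HowTo sketch looks like this; the step names are illustrative and mirror the schema-DOM audit described later in this article:
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to audit schema-DOM consistency",
  "step": [
    {
      "@type": "HowToStep",
      "name": "Render the page",
      "text": "Load the page with JavaScript execution and extract every JSON-LD block."
    },
    {
      "@type": "HowToStep",
      "name": "Diff markup against rendered text",
      "text": "Check that headlines, dates, and prices in the markup appear verbatim in the visible page."
    },
    {
      "@type": "HowToStep",
      "name": "Fix the schema, not the DOM",
      "text": "Update the markup to match what the page shows, then re-validate."
    }
  ]
}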
A worked Product example for a SaaS pricing page:
{
"@context": "https://schema.org",
"@type": "Product",
"name": "CrawlSense Pro",
"description": "AI-visibility tracking and SEO audit platform for SaaS teams.",
"brand": {
"@type": "Brand",
"name": "CrawlSense"
},
"offers": {
"@type": "Offer",
"url": "https://crawlsense.com/pricing",
"priceCurrency": "EUR",
"price": "99.00",
"availability": "https://schema.org/InStock",
"priceValidUntil": "2026-12-31"
}
}
Google treats priceValidUntil as a recommended Offer property rather than a required one, but it should always be a future date: a stale priceValidUntil reads as an out-of-date offer and fails the freshness check even when every other property is correct.
What LLMs read but don't reward
Six schema types are read by retrieval layers but don't materially move citation rate, in my testing across 200+ pages over fifteen months.
Review aggregate ratings without third-party verification. Self-reported star ratings on your own site without a verifiable source (G2, Trustpilot, Google Reviews) are read but heavily discounted. The retrieval layer treats unverified self-ratings as low-trust.
Event schema for marketing events. Read but rarely cited. Events surface through specialized tooling (Google Events, calendar integrations); LLMs don't lean on the schema for general retrieval.
Recipe. Useful for recipe sites, but the retrieval layer reads recipes from prose just as well. The schema is icing, not foundation.
VideoObject. Read, but LLMs don't currently retrieve video content for citation purposes. The schema is more useful for Google Video search than for AI Search.
LocalBusiness. Important for local SEO and Google Business Profile, but not a meaningful AI Search lever. LLMs lean on Google's Local Pack data more than your LocalBusiness schema.
SoftwareApplication. Read, but rarely cited specifically. The retrieval layer prefers Product for SaaS pricing pages and Article for software documentation.
The pattern: schema types that map to specific Google rich-result eligibilities are read but don't generalize to AI Search citation. Schema types that anchor entity verification (Article, Organization, Person) generalize broadly.
The "three not twelve" rule
The audit framework I use is simple. For any content site:
- Always: Article on every article page. Organization on the homepage. Person for every author.
- Where it fits naturally: FAQPage on Q&A-shaped content. BreadcrumbList on hierarchical content. Product on ecommerce/pricing pages.
- Skip unless you have a specific reason: everything else.
For a B2B SaaS content site, that means Article + Organization + Person + occasional FAQPage + Product on pricing. Five total deployments, all of them DOM-consistent, all of them maintained. Not twelve types where six are stale.
The reason "less is more" works: every schema block is a maintenance liability. When the page updates, the schema needs to update. A site with twelve schema types has twelve update points; one falls out of sync within six months and the retrieval layer notices. A site with three schema types maintains them.
The audit move that compounds: kill the schema you don't need. The deletion is the gain.
Validation that catches what Schema Validator misses
Google's Rich Results Test and Schema.org Validator catch syntactic errors. They don't catch the bigger problem: schema-DOM inconsistency.
The validation layer that works has three checks:
Syntactic. Run the page through Google Rich Results Test and Schema.org Validator. Both should be clean. This is table stakes.
DOM consistency. For each schema property, verify it appears in the rendered DOM (after JavaScript execution if relevant). This is where most schema breaks. Tools that help: Screaming Frog with custom extraction, Sitebulb's structured-data audit, or a small Puppeteer script that diffs schema-property values against rendered text.
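A minimal sketch of that diff, assuming Node 18+ with Puppeteer installed; the checked-property list and output format are illustrative, not a packaged tool:
// schema-dom-check.js: diff selected JSON-LD property values against rendered text.
// Assumes "npm install puppeteer"; run with: node schema-dom-check.js https://example.com/page
const puppeteer = require('puppeteer');

// Properties whose values should appear verbatim in the visible page.
// This list is illustrative; extend it to cover the types you deploy.
const CHECKED_PROPS = ['headline', 'name', 'price'];

// Walk a parsed JSON-LD tree and collect string values of the checked properties.
function collectValues(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach((item) => collectValues(item, found));
  } else if (node && typeof node === 'object') {
    for (const [key, value] of Object.entries(node)) {
      if (CHECKED_PROPS.includes(key) && typeof value === 'string') {
        found.push({ prop: key, value });
      }
      collectValues(value, found);
    }
  }
  return found;
}

(async () => {
  const url = process.argv[2];
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });

  // Parse every JSON-LD block after JavaScript execution; skip malformed blocks.
  const blocks = await page.$$eval('script[type="application/ld+json"]', (nodes) =>
    nodes
      .map((n) => {
        try { return JSON.parse(n.textContent); } catch { return null; }
      })
      .filter(Boolean)
  );
  const renderedText = await page.evaluate(() => document.body.innerText);

  // Report each checked property value as present or missing in the rendered page.
  for (const block of blocks) {
    for (const { prop, value } of collectValues(block)) {
      const status = renderedText.includes(value) ? 'OK  ' : 'MISS';
      console.log(`${status} ${prop}: ${value}`);
    }
  }
  await browser.close();
})();
Exact substring matching is deliberately strict: a price rendered as "€99" against a schema value of "99.00" shows as MISS, so expect to whitelist pure formatting differences rather than loosen the schema.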
Freshness. For schemas with date properties (Article, Event, Product priceValidUntil), verify the dates are current. Stale dates discount the schema even when everything else is correct.
The cadence that works: full validation on every new template, spot-check validation on a 10-page sample monthly, full re-audit quarterly. For sites with frequent template changes, weekly spot-checks.
For broader audit framework integration, see The Complete Guide to Technical SEO Audits and the schema-specific entries in the JSON-LD glossary.
Putting schema on your audit checklist
The six-step audit pattern:
- List every schema type currently deployed across your site. If it's more than six types, plan deletions.
- For the remaining types, verify each is in the high-leverage list (Article, Organization, Person, FAQPage, BreadcrumbList, Product). Drop the rest.
- For each remaining schema block, run a DOM consistency check. Fix mismatches by updating the schema, not the DOM.
- For Organization, verify the sameAs chain is current — every link resolves, every profile is verified.
- For Person on authors, verify each author has a real bio page and a current sameAs chain.
- Re-validate quarterly. The maintenance cost is the entire reason "less is more" works — and the reason ten-type schema deployments rot.
This article is a satellite of the broader generative engine optimization playbook, where schema is the structured-grounding leg of the four-leverage-point framework. Citation magnetism, entity authority, structured grounding, and brand-mention density compound when all four are operating; a schema audit alone won't move citations if entity authority is weak.
Frequently asked questions
Should I implement Schema.org types beyond what Google supports for rich results?
Yes, when they map to LLM retrieval layers' reading patterns. Article, Organization, and Person are read by every major LLM regardless of Google rich-result eligibility. HowTo and FAQPage retain LLM value even after Google deprecated their rich results. The rich-result list and the LLM-reading list overlap but aren't identical.
Does invalid schema hurt my page or just get ignored?
Invalid schema is silently ignored by Google for rich-result eligibility, and silently discounted by LLM retrieval layers for grounding. It doesn't actively penalize the page, but you get zero benefit from the markup. For practical purposes, treat invalid schema as worse than no schema, because it's a maintenance liability with no upside.
What's the right format — JSON-LD, microdata, or RDFa?
JSON-LD, with no exceptions in 2026. Google explicitly prefers JSON-LD, all major LLM retrieval layers parse it cleanly, and the maintenance pattern (one block in <head> per schema entity) is the cleanest. Microdata and RDFa are legacy formats that introduce DOM-consistency risk for no benefit.
How do I add schema for content I don't author directly (UGC, comments, reviews)?
Carefully. The retrieval layer's DOM-consistency check applies, which means schema must match what users actually wrote. For genuine UGC reviews, Review schema with author as a Person (even if anonymous, with name: "Verified buyer") is workable. For comment threads, schema is rarely worth the complexity.
Does schema markup help with conversational search specifically?
It helps indirectly, by making your content more reliably attributable. Conversational queries are decomposed into sub-questions, and the retrieval layer scores chunks against each sub-question. Well-formed schema gives the layer cleaner attribution targets, which lifts citation likelihood across conversational and traditional surfaces equally.
When should I add Speakable schema for voice search?
SpeakableSpecification is read by Google Assistant for news content and ignored almost everywhere else. For news publishers, it's worth implementing on time-sensitive content. For content sites, blog posts, and SaaS docs, the implementation cost outweighs the benefit. Skip unless you're publishing news.
The summary on schema in 2026: deploy three types well, not twelve poorly. Article, Organization, Person are the foundation. FAQPage, BreadcrumbList, Product are the targeted additions where they fit naturally. Validate against the rendered DOM, not just the syntax checker. Kill the schema you don't need. The audit move that compounds is the deletion, not the addition.
Related articles
Managing LLM Crawlers: GPTBot, ClaudeBot, Google-Extended
Eight LLM crawlers now hit your site. Some train, some retrieve, some do both. Blocking the wrong one costs you AI-channel visibility for nothing. Here's the matrix and the robots.txt that maps to it.
Optimizing for Perplexity: What Sources Get Cited
Perplexity citations don't follow Google's logic. Older domains, .edu and .gov bias, deeper retrieval, and a freshness signal that punishes thin update cycles. Here's the playbook for the second-largest answer engine.
Tracking Your Brand's Visibility in AI Answers
Five vendors now sell AI-answer visibility tracking. The metrics they report don't match. Here's the toolset, the metric definitions worth using, and a manual sampling protocol when budget rules out vendors.