Tracking Your Brand's Visibility in AI Answers

The vendor stack, the metrics that matter, and the manual protocol when budget rules out tools

Enric Ramos · 12 min read

A CMO asked me last month what her brand's "share of AI" was, expecting a percentage. The honest answer was that no two vendors define the metric the same way, the manual sample I'd run that morning gave a different number than her vendor dashboard, and the figure most useful to her wasn't the one she'd asked for. That conversation is happening in every marketing org with an AI-search budget right now.

The measurement layer for brand visibility in LLMs is twelve months less mature than the surfaces it's trying to measure. Five vendors compete on overlapping but inconsistent metrics. Manual sampling produces cleaner numbers but doesn't scale. Server-log heuristics catch retrieval-bot traffic but not citation outcomes. Most SEO dashboards have no place for any of it. The result is that teams either over-rely on a single vendor's headline number, or they don't measure at all and ship blind.

This article maps the current toolset, defines the four metrics worth tracking, gives a manual sampling protocol that works when vendor budget isn't there, and shows how to integrate AI-visibility data into the SEO dashboard you already maintain. It assumes you've decided AI-channel visibility is worth measuring; if you haven't, read Citation Rate: The KPI Your SEO Dashboard Is Missing first.

Why every vendor reports a different number

Five major vendors sell AI-visibility tracking in April 2026: Profound, Otterly.ai, Athena, Goodie AI, and Mention.AI. A handful of newer entrants — including offerings inside Semrush, Ahrefs, and Conductor — provide partial coverage. None of them measure the same thing.

The variance comes from four design choices each vendor makes independently:

Query universe. Profound builds a query universe per customer based on seed keywords plus expansion. Otterly samples a vendor-curated universe of high-intent prompts. Athena lets you upload your own. Goodie infers prompts from your content. The same brand can score 3% in one universe and 18% in another, both correct under their own definitions.

Surface coverage. Profound covers ChatGPT, Perplexity, Google AI Overviews, and Claude. Otterly covers the same plus Gemini and Bing Copilot. Athena adds You.com and a handful of niche surfaces. The denominator changes with each surface added.

Sampling cadence. Daily samples vs weekly vs on-demand. A vendor that samples daily smooths volatility; a weekly sampler catches refresh cycles less reliably. Compare same-cadence vendors only.

Mention vs citation. This is the biggest divergence. A "mention" is the brand appearing in the generated answer text. A "citation" is the brand's URL appearing in the source list. Vendors blur this distinction inconsistently, and the metrics behave very differently — citations are rarer, more volatile, and more directly attributable to your content.

The practical consequence: pick one vendor as your primary, supplement with manual sampling for ground truth, and resist the urge to compare numbers across vendors. The trend within a single vendor is the signal worth tracking; the absolute number is mostly noise.

The five vendors compared

A short, honest tour of the field as of April 2026.

Profound. Strongest enterprise tool, biggest customer base. Strong query-universe construction, daily sampling, good API. Weak on niche query baskets. Pricing starts around $1,000/month, scales to $10,000+ for global brands. Best fit: B2B SaaS and enterprise brands with budget.

Otterly.ai. Mid-market sweet spot. Solid surface coverage, weekly sampling on the standard tier (daily on enterprise), competitive pricing around $200-$800/month. Less polished UI than Profound, comparable data quality. Best fit: SMB SaaS, agencies tracking client portfolios.

Athena. Lets you bring your own query basket, which is the cleanest design philosophy in the space. Sampling is on-demand and weekly, surface coverage is broad. Pricing is opaque (mid-three figures monthly typical). Best fit: teams that already have a defined prompt universe.

Goodie AI. Newest entrant of the major five. Inferred query universe from content, which makes onboarding fast but the universe is less reliable. Pricing is approachable. Best fit: smaller content teams that want directional data fast.

Mention.AI. Pivoted into this space from social listening. Strongest on sentiment analysis of how the brand is described in answers, weaker on citation tracking. Best fit: brands that care more about how they're characterized than whether they're cited.

A practical recommendation: most teams should start with Otterly or Athena for two months, evaluate the data against manual sampling, then either upgrade to Profound for scale or stay where they are. Vendors are still differentiating fast, and the right answer in mid-2026 may differ from April 2026.

For a deeper read on individual surface optimization, see Optimizing for Perplexity and ChatGPT Search Optimization.

The four metrics actually worth tracking

Strip the vendor dashboards down to first principles and four metrics survive. Track these; ignore the rest.

Mention rate. The percentage of sampled queries where your brand name appears in the generated answer text. Measured per-surface and aggregated. A brand mentioned in 22% of ChatGPT answers and 8% of Perplexity answers has a 15% blended mention rate if the two surfaces carry equal query volume; in general, weight each surface by its share of sampled query volume. Mention rate is the broadest visibility signal and the one most stakeholders intuit.
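The blend is just a volume-weighted average. A minimal sketch of the arithmetic, where the surface names and volume shares are illustrative rather than measured figures:

```python
def blended_mention_rate(per_surface: dict[str, tuple[float, float]]) -> float:
    """per_surface maps surface -> (mention_rate, query_volume_share)."""
    total_volume = sum(volume for _, volume in per_surface.values())
    return sum(rate * volume for rate, volume in per_surface.values()) / total_volume

rates = {
    "chatgpt": (0.22, 0.5),     # 22% mention rate, 50% of sampled query volume
    "perplexity": (0.08, 0.5),  # 8% mention rate, 50% of sampled query volume
}
print(f"Blended mention rate: {blended_mention_rate(rates):.0%}")  # -> 15%
```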

Citation rate. The percentage of sampled queries where your domain appears in the source list of the generated answer. This is rarer than mention rate (a 20% mention rate often coincides with a 4-7% citation rate) and more directly attributable to your content quality. Citation rate is the metric SEO teams should care about most, because it maps cleanly to actions you can take.

Sentiment. The polarity of how your brand is described when mentioned. Negative sentiment is the loudest signal — it usually means an AI hallucination, an out-of-date claim, or a competitor's framing that the model has internalized. Track sentiment as a tripwire, not a primary KPI; the action threshold is the appearance of consistent negative sentiment, not the daily fluctuation.

Attribution accuracy. When the model attributes a claim to your brand, how often is the claim actually correct? This is the sleeper metric. A brand cited frequently but with hallucinated capabilities is in worse shape than a brand cited less often but accurately. Attribution accuracy requires manual sampling — no vendor measures it cleanly.

What to skip: "share of voice" composite metrics that bundle the above into a single number, "AI authority score" black-box rankings, and any metric that doesn't tell you what to fix when it moves. The four above each map to specific actions: low mention rate means brand-entity weakness; low citation rate means content-extraction weakness; negative sentiment means content correction needed; low attribution accuracy means schema and structured-data work.

A manual sampling protocol that works

When the vendor budget isn't there or the data needs ground truth, manual sampling is the fallback. Done well, it produces cleaner numbers than vendor dashboards. Done badly, it produces noise.

The protocol I use has four parts.

Define the query basket. Pick 50 queries — 15 branded, 20 product/category, 15 informational. The branded queries test your visibility on direct prompts about your brand. The product/category queries test visibility on the prompts that actually drive purchase research. The informational queries test entity authority on your topical area. Lock the basket; don't change it for at least a quarter.

Define the surfaces. ChatGPT (with web search on), Claude (with web search on), Perplexity, Google AI Overviews, and Bing Copilot at minimum. Five surfaces × 50 queries = 250 samples per cycle. This is doable in 90-120 minutes a week if you're disciplined.

Define the sampling protocol. Run each query in incognito or a fresh logged-out session. Use a US IP unless your market is non-US. Screenshot every answer. Log six fields per sample: surface, query, brand mentioned (yes/no), domain cited (yes/no), sentiment if mentioned (positive/neutral/negative), accuracy if mentioned (correct/partial/incorrect).
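A minimal sketch of what each log row can look like, assuming a flat CSV file. The column names (including the added date and basket columns) and the example values are illustrative, not a required schema:

```python
import csv
from datetime import date
from pathlib import Path

LOG = Path("ai_visibility_log.csv")
FIELDS = ["date", "surface", "query", "basket", "mentioned", "cited", "sentiment", "accuracy"]

write_header = not LOG.exists()
with LOG.open("a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    writer.writerow({
        "date": date.today().isoformat(),
        "surface": "perplexity",
        "query": "best project management tool for agencies",  # hypothetical basket query
        "basket": "product",
        "mentioned": "yes",
        "cited": "no",
        "sentiment": "neutral",  # leave blank when the brand isn't mentioned
        "accuracy": "correct",   # leave blank when the brand isn't mentioned
    })
```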

Define the cadence. Weekly is the floor. Anything less and you can't separate signal from noise. Monthly is too coarse to catch refresh-cycle effects. Daily is overkill for most teams; it overweights the volatility of the surfaces themselves.

The 12-week trailing rate is the readout. A 12-week trailing mention rate of 14% on the branded basket and 6% on the product basket, with a positive 2-percentage-point trend over the previous quarter, is a clear story. Daily volatility is noise; quarterly trend is signal.
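A sketch of that readout, assuming pandas and the illustrative log columns from the protocol above: 12-week trailing mention and citation rates, per basket.

```python
import pandas as pd

log = pd.read_csv("ai_visibility_log.csv", parse_dates=["date"])
cutoff = log["date"].max() - pd.Timedelta(weeks=12)
recent = log[log["date"] >= cutoff]

# Trailing rates per query basket; add "surface" to the groupby for the per-surface view.
trailing = recent.groupby("basket").agg(
    samples=("query", "size"),
    mention_rate=("mentioned", lambda s: (s == "yes").mean()),
    citation_rate=("cited", lambda s: (s == "yes").mean()),
)
print(trailing.round(3))
```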

For solo SEOs and lean teams, this protocol is the entire measurement layer. For larger teams, it becomes the ground-truth check against vendor data.

Integrating AI visibility into your existing SEO dashboard

Most SEO dashboards are built around organic traffic, rankings, and conversion. AI visibility doesn't fit any of those native columns cleanly, and the temptation is to bolt it onto a separate tab and forget about it. Resist that.

The integration that works has three layers.

Top-line dashboard. Add three metrics to the executive view: mention rate, citation rate, and a sentiment tripwire. Show them with the same time-series treatment as organic traffic — 12-week trailing, with month-over-month and year-over-year deltas. This is the row that tells the CMO whether AI visibility is moving the right way.

Diagnostic layer. Below the top-line, expose the per-surface breakdown. ChatGPT mention rate, Perplexity citation rate, AI Overview citation rate, separately. The surface breakdown is what the SEO team uses to diagnose where to invest next quarter. A brand with 22% on ChatGPT and 4% on Perplexity has a Perplexity-specific problem; treat it differently than a brand with low rates everywhere.

Action layer. The bottom of the dashboard exposes the queries where you've lost or gained citation in the last 30 days. New citations earned go to a "what worked" review for replication. New citations lost go to a "what broke" review for root-cause. This is the layer that turns the dashboard from a reporting tool into an operating tool.
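A sketch of how the action layer can be computed from the same sample log, assuming pandas and the illustrative columns used earlier; a vendor export would need equivalent fields.

```python
import pandas as pd

log = pd.read_csv("ai_visibility_log.csv", parse_dates=["date"])
end = log["date"].max()

def cited_queries(frame: pd.DataFrame) -> set:
    """Queries where the domain appeared in the source list at least once."""
    return set(frame.loc[frame["cited"] == "yes", "query"])

current = log[log["date"] > end - pd.Timedelta(days=30)]
prior = log[(log["date"] <= end - pd.Timedelta(days=30))
            & (log["date"] > end - pd.Timedelta(days=60))]

gained = cited_queries(current) - cited_queries(prior)  # candidates for the "what worked" review
lost = cited_queries(prior) - cited_queries(current)    # candidates for the "what broke" review
print(f"Gained citations: {sorted(gained)}")
print(f"Lost citations: {sorted(lost)}")
```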

Most SEO platforms (Looker Studio, custom BI, GSC-integrated dashboards) accept CSV exports from the major vendors. The integration is a 30-minute Looker Studio job for any team that already has a working SEO dashboard. The Looker Studio template I use is straightforward: vendor CSV → BigQuery table → Looker Studio data source → three rows of cards plus the per-query exception report.
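A sketch of the CSV-to-BigQuery step of that pipeline, assuming the google-cloud-bigquery client library and a vendor export saved locally; the project, dataset, table, and file names are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses default credentials and project
table_id = "my-project.seo_reporting.ai_visibility_vendor"  # placeholder names

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,    # skip the header row in the vendor export
    autodetect=True,        # infer the schema from the CSV
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

with open("vendor_export.csv", "rb") as f:
    load_job = client.load_table_from_file(f, table_id, job_config=job_config)
load_job.result()  # block until the load completes

print(f"{client.get_table(table_id).num_rows} rows now in {table_id}")
```

From there, the BigQuery table is added as a Looker Studio data source and the three dashboard layers are built as cards and an exception table on top of it.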

For broader dashboard architecture and the KPI tree this fits into, see Building an SEO KPI Tree from Revenue Down.

Common mistakes when reporting AI visibility

Four mistakes I see consistently across teams:

Reporting absolute numbers without trend context. "We're at 12% mention rate" tells the CMO nothing useful. "We're at 12%, up from 8% twelve weeks ago, on a basket of 50 queries we've held constant" is a defensible report. Always pair the level with the trend and the basket definition.

Comparing across vendors. A 14% mention rate from Profound and a 9% mention rate from Otterly do not contradict each other; they measure different denominators. Pick one vendor as primary, treat its trend as the operating signal, and don't cross-reference vendor rates as if they were absolute.

Ignoring sentiment until it's a crisis. Negative sentiment in AI answers compounds. A brand described as "expensive" or "buggy" in two surfaces this month is described that way in five surfaces next month, because the surfaces increasingly cite each other. Catch sentiment drift early, fix the source content, and re-measure.

Confusing mention rate with revenue impact. Mention rate is a leading indicator of brand exposure, not a revenue metric. The cleanest revenue read I've seen from AI visibility comes through branded-search velocity, which lifts 4-8 weeks after sustained AI mention rate increases. Don't promise the CMO that "12% mention rate equals X dollars." It doesn't, and the relationship is genuinely fuzzy in 2026.

What to ship in your first measurement quarter

If you're starting from zero on AI-visibility tracking, the 90-day plan that works:

Days 0-15 — Define the basket and the baseline.

  • Build the 50-query basket: 15 branded, 20 product, 15 informational. Lock it.
  • Run the first manual sample across five surfaces. This is your week 1 baseline.
  • Decide whether vendor budget is in scope this quarter. If yes, evaluate Otterly and Athena trials in parallel.

Days 15-45 — Operational cadence.

  • Weekly manual sampling, every Monday morning (or whatever fixed slot).
  • If vendor selected, onboard and run vendor data alongside manual for at least four weeks to validate.
  • Build the dashboard skeleton: top-line, diagnostic, action layers.

Days 45-90 — Acting on the data.

  • Identify the three queries with the largest week-over-week citation movement (positive or negative).
  • For positive movement, document what changed in your content or schema and replicate where applicable.
  • For negative movement, root-cause: schema regression, content drift, competitor entity strengthening.
  • Ship one improvement per week. Re-measure monthly.

By day 90 you have a working measurement function, a vendor relationship validated against ground truth, and a backlog of high-leverage actions to ship in the next quarter. The full operating model is in Generative Engine Optimization: The 2026 Playbook.

Frequently asked questions

How accurate are vendor tools compared to manual sampling?

Within 2-4 percentage points typically, when query baskets are aligned. Vendors tend to over-report mention rate (because they sample broader prompt variations than humans do) and under-report citation rate (because their citation parsing misses some surfaces). For decision-making, the trend matches; the absolute level differs.

Can I use Google Search Console for AI visibility tracking?

No. GSC counts AI Overview impressions in the regular impression total but does not separate them. There is no AI-citation report in GSC as of April 2026. The closest signal in GSC is impression-without-click growth on queries where AI Overviews are common, which is suggestive but not measurable.

How do I attribute revenue to AI visibility?

Carefully. Direct attribution is rarely possible — AI mentions don't pass UTM parameters, citations don't always lead to clicks. The cleanest attribution chain is mention rate → branded search velocity → branded organic conversions, with a 4-8 week lag. Treat AI visibility as upper-funnel and report it that way.

What about zero-click searches?

Zero-click is the broader category that AI visibility lives inside. AI Overviews, featured snippets, and knowledge panels all reduce CTR while preserving (or expanding) impression count. The visibility metrics in this article apply across the zero-click surface; the strategic frame is in Zero-Click Search: Revenue When Users Don't Click.

Does sentiment in AI answers really drive customer behavior?

Increasingly, yes. The 2025 Edelman Trust Barometer showed AI-search results scoring trust ratings comparable to top-tier publications for product research queries. Negative sentiment in AI answers translates into purchase consideration impact at roughly the same rate as negative reviews on G2 or Trustpilot, in the surveys I've seen. Track it.

What's the relationship between AI visibility and traditional SERP rankings?

Strong but not deterministic. High-ranking pages cite more often than low-ranking pages, but the correlation is loose enough that pages ranking #15 regularly outperform pages ranking #2 in citation rate. The chunk-level patterns in Optimizing for AI Overviews explain most of the variance.

The honest summary on AI-visibility tracking in 2026: it's a real measurement problem with imperfect tools and a clear set of metrics that matter. Pick a vendor or run manual sampling, lock your query basket, integrate the data into the dashboard you already have, and report the trend honestly. The teams winning this transition are not the ones with the best tools. They're the ones who decided to measure consistently and act on what they saw.
