Updated Apr 2026

Information gain

Definition

Information gain is a score described in Google's [US Patent 11620273](https://patents.google.com/patent/US11620273B2/) that measures how much new information a given document adds beyond what the user has already seen in other documents on the same topic. It rewards original, substantive contribution and penalizes the third article on a SERP that just restates the first two.

Long definition

The information gain patent — "Contextual estimation of link information gain" — was granted to Google in April 2023, with priority dating to 2018. The patent describes a model that scores a document's incremental value to a user who has already consumed other documents on the same query. The core question: if the user has read documents A and B, how much new information does document C contribute?
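
The A, B, C question can be made concrete with a toy calculation. The sketch below is not the patent's model. It assumes a crude word-overlap proxy, where the score is the share of a candidate's distinct words that appear in none of the documents already read. The function names `content_words` and `novelty_score` and the sample sentences are invented for illustration.

```python
import re


def content_words(text: str) -> set[str]:
    """Lowercased set of words; a crude stand-in for 'units of information'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def novelty_score(candidate: str, already_seen: list[str]) -> float:
    """Share of the candidate's distinct words found in no already-seen document.

    Assumption: word overlap approximates informational overlap. A real
    system would need something far richer than this.
    """
    seen: set[str] = set()
    for doc in already_seen:
        seen |= content_words(doc)
    words = content_words(candidate)
    if not words:
        return 0.0
    return len(words - seen) / len(words)


# Invented toy documents: A and B are what the user has already read.
doc_a = "Espresso needs a fine grind and nine bars of pressure."
doc_b = "A fine grind and high pressure make good espresso."

# Two candidates for document C.
restatement = "Good espresso needs high pressure and a fine grind."
original = "Pre-wetting the coffee puck for five seconds reduces channeling."

print(novelty_score(restatement, [doc_a, doc_b]))  # every word already seen
print(novelty_score(original, [doc_a, doc_b]))     # mostly unseen words
```

Under this proxy the restatement scores zero because it contributes no word the reader has not already met, while the second candidate scores high. Both are equally on-topic, which is the point: the score depends on what was read first, not on the candidate alone.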

This is a different question from relevance, quality, or freshness. A relevant document can have an information gain score near zero if it only repeats what already-ranked documents say. An obscure document can have a high information gain score if it adds a fact, perspective, dataset, or original observation that the rest of the SERP lacks.

The patent is not confirmation that Google ranks by information gain in production. Patents describe inventions Google claims, not necessarily systems that are live in search. But the concept aligns with what Google's public documentation says about the Helpful Content System: content should provide "substantial value compared to other pages in search results", "include original information, reporting, research, or analysis", and "go beyond the obvious".

Practical signals that align with high information gain:

  • Original data — your own survey, your own crawl, your own measurements.
  • Genuine first-hand experience — you used the product, ran the test, visited the place.
  • Synthesis the SERP lacks — connecting two ideas no other ranked page has connected.
  • Counterpoint — a substantiated dissent against the consensus the rest of the SERP repeats.
  • Detail at a depth competitors don't reach — methodology, edge cases, failure modes.

What scores low: rewriting the top three results in different words; publishing the same listicle as everyone else with reordered items; AI-generated synthesis that compresses what's already indexed without adding observations.

Common misconceptions

  • "Information gain is a confirmed ranking factor." It's a patent, not a confirmed live system. The pattern it describes aligns with publicly documented quality signals, but Google has not confirmed that information gain runs in production search.
  • "Longer articles have more information gain." Length doesn't equal gain. A 500-word piece with one original dataset can score higher than a 5,000-word rewrite of existing material. The unit is new information, not word count.
  • "Information gain only matters for journalism or research." It applies to any topic with multiple competing pages. Product reviews, recipes, how-to guides, glossary entries — wherever a SERP exists with several documents covering the same ground, the document that adds the most novel substance has the strongest case to rank.
  • "AI-generated content can't have information gain." It can, when paired with original inputs the AI didn't have — your data, your experience, your testing. AI-generated content built only from re-summarizing already-indexed sources is exactly the pattern information gain would score near zero.
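
The length misconception above is easy to illustrate with a toy comparison. This is a rough sketch, assuming that distinct unseen words are a fair proxy for new information, which is a simplification and not anything the patent specifies. The helper names and the sample sentences are invented and make no factual claim.

```python
import re


def words(text: str) -> list[str]:
    """All lowercased word tokens in order, duplicates included."""
    return re.findall(r"[a-z0-9]+", text.lower())


def new_word_count(candidate: str, seen_docs: list[str]) -> int:
    """Number of distinct words in the candidate that no seen document contains."""
    seen = {w for doc in seen_docs for w in words(doc)}
    return len(set(words(candidate)) - seen)


# Invented toy SERP: what is already ranked for the query.
serp = ["Sourdough starter needs flour, water, and daily feeding."]

# A long page that only repeats the SERP, and a short page with something new.
long_rewrite = " ".join(serp * 50)
short_original = "Rye flour starters peak two hours sooner."

for name, text in [("long rewrite", long_rewrite), ("short original", short_original)]:
    print(f"{name}: {len(words(text))} words, {new_word_count(text, serp)} new")
```

The long page is dozens of times larger by word count yet adds nothing under this proxy, while the short page adds several unseen terms. Word count and novelty are measured on different axes.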