A trove of leaked Google documents has provided an unprecedented look inside Google Search, revealing some of the most crucial elements Google uses to rank content.
What Happened
Thousands of documents, which appear to come from Google’s internal Content API Warehouse, were released on March 13 on GitHub by an automated bot named yoshi-code-bot. These documents were shared with Rand Fishkin, co-founder of SparkToro, earlier this month.
Here is what we’ve learned from Fishkin and from Michael King, CEO of iPullRank, who also reviewed and analysed the documents. King plans to provide further analysis for Search Engine Land soon.
Why We Care
We have been given a rare glimpse into how Google’s ranking algorithm works, which is invaluable for SEOs who can understand what it all means. In 2023, we got an unprecedented look at Yandex Search ranking factors via a leak, which was one of the biggest stories of that year.
This Google document leak? It will likely be one of the biggest stories in the history of SEO and Google Search.
What’s Inside
Here’s what we know about the internal documents, thanks to Fishkin and King:
- Current: The documentation indicates this information is accurate as of March.
- Ranking Features: 2,596 modules are represented in the API documentation with 14,014 attributes.
- Weighting: The documents do not specify how any of the ranking features are weighted, only that they exist.
- Twiddlers: These are re-ranking functions that “can adjust the information retrieval score of a document or change the ranking of a document,” according to King.
- Demotions: Content can be demoted for various reasons, such as:
  - A link doesn’t match the target site.
  - SERP signals indicate user dissatisfaction.
  - Product reviews.
  - Location.
  - Exact match domains.
  - Porn.
- Change History: Google apparently keeps a copy of every version of every page it has ever indexed, meaning Google can “remember” every change ever made to a page. However, Google only uses the last 20 changes of a URL when analysing links.
- Links Matter: Link diversity and relevance remain key, the documents show. PageRank is still very much alive within Google’s ranking features. PageRank for a website’s homepage is considered for every document.
This doesn’t prove Google spokespeople have lied about links not being a “top 3 ranking factor” or links mattering less for ranking. Two things can be true at once. Again, we don’t know how any of these features are weighted.
- Successful Clicks Matter: If you want to rank well, you need to keep creating great content and user experiences. Google uses a variety of measurements, including badClicks, goodClicks, lastLongestClicks, and unsquashedClicks.
Also, longer documents may get truncated, while shorter content gets a score (from 0-512) based on originality. Scores are also given to Your Money Your Life content, like health and news.
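To make the Twiddler and demotion ideas above more concrete, here is a minimal sketch of a re-ranking step. It is not Google's implementation: the `Result` class, the `demote_if` and `rerank` helpers, the example URLs, and the 0.5 demotion factor are all hypothetical, since the documents do not say how any feature is weighted.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Result:
    url: str
    score: float  # hypothetical information retrieval score


# In this sketch a "twiddler" is any function that maps a result list to a new one.
Twiddler = Callable[[List[Result]], List[Result]]


def demote_if(predicate: Callable[[Result], bool], factor: float) -> Twiddler:
    """Build a twiddler that multiplies the score of matching results by `factor`."""
    def twiddle(results: List[Result]) -> List[Result]:
        return [
            replace(r, score=r.score * factor) if predicate(r) else r
            for r in results
        ]
    return twiddle


def rerank(results: List[Result], twiddlers: List[Twiddler]) -> List[Result]:
    """Apply each twiddler in order, then sort by the adjusted score."""
    for twiddle in twiddlers:
        results = twiddle(results)
    return sorted(results, key=lambda r: r.score, reverse=True)


# Toy data: the first result starts out ahead but matches the demotion rule.
results = [
    Result("https://a.example/review", 0.9),
    Result("https://b.example/guide", 0.8),
]
demote_reviews = demote_if(lambda r: "/review" in r.url, factor=0.5)  # assumed rule
ranked = rerank(results, [demote_reviews])
```

The point of the sketch is only the shape of the mechanism: the initial score is computed first, and separate functions adjust it afterwards, which is why a page can lose position without its underlying content score changing.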
What Does It All Mean?
According to King:
“You need to drive more successful clicks using a broader set of queries and earn more link diversity if you want to continue to rank. Conceptually, it makes sense because a very strong piece of content will do that. A focus on driving more qualified traffic to a better user experience will send signals to Google that your page deserves to rank.”
Documents and testimony from the U.S. vs. Google antitrust trial confirmed that Google uses clicks in ranking – especially with its Navboost system, “one of the important signals” Google uses for ranking.
Brand Matters
Fishkin’s big takeaway? Brand matters more than anything else:
“If there was one universal piece of advice I had for marketers seeking to broadly improve their organic search rankings and traffic, it would be: ‘Build a notable, popular, well-recognised brand in your space, outside of Google search.’”
Entities Matter
Authorship lives. Google stores author information associated with content and tries to determine whether an entity is the author of the document.
- SiteAuthority: Google uses something called “siteAuthority.”
Google told us in 2011, after the Panda update launched, that something like this existed, stating publicly that “low quality content on part of a site can impact a site’s ranking as a whole.” However, Google has denied having a website authority score in the years since then.
Chrome Data
A module called ChromeInTotal indicates that Google uses data from its Chrome browser for search ranking.
Whitelists
A couple of modules, isElectionAuthority and isCovidLocalAuthority, indicate Google whitelists certain domains related to elections and COVID. That said, we’ve long known that Google (and Bing) have “exception lists” when “specific algorithms inadvertently impact websites.”
Small Sites
Another feature is smallPersonalSite, which flags a small personal site or blog. King speculated that Google could boost or demote such sites via a Twiddler. However, that remains an open question. Again, we don’t know how these features are weighted.
Other Interesting Findings
According to Google’s internal documents:
- Freshness Matters: Google looks at dates in the byline (bylineDate), URL (syntacticDate), and on-page content (semanticDate).
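- Core Topic Relevance: To determine whether a document is or isn’t a core topic of the website, Google vectorises pages and sites, then compares the page embeddings to the site embeddings. Two attributes relate to this: siteFocusScore, which reflects how much a site sticks to a single topic, and siteRadius, which reflects how far page embeddings deviate from the site embedding.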
- Domain Registration Information: Google stores domain registration information (RegistrationInfo).
- Page Titles Still Matter: Google has a feature called titlematchScore that is believed to measure how well a page title matches a query.
- Font Size and Anchor Text: Google measures the average weighted font size of terms in documents (avgTermWeight) and anchor text.
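The Core Topic Relevance item above describes comparing page embeddings with a site-level embedding. The following is a rough sketch of that general idea, not the method in the leaked documents: the two-dimensional vectors are toy values, treating the site embedding as the mean of its page embeddings is an assumption, and cosine distance is just one common way to compare vectors. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def site_embedding(page_embeddings: List[Sequence[float]]) -> List[float]:
    """Assumption: represent the site as the mean of its page embeddings."""
    n = len(page_embeddings)
    return [sum(column) / n for column in zip(*page_embeddings)]


def topic_distances(page_embeddings: List[Sequence[float]]) -> List[float]:
    """Cosine distance of each page from the site embedding (larger = further off-topic)."""
    centroid = site_embedding(page_embeddings)
    return [1.0 - cosine_similarity(page, centroid) for page in page_embeddings]


# Toy site: two pages on a similar topic and one page on something unrelated.
pages = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]
distances = topic_distances(pages)

# Index of the page that sits furthest from the site's overall topic.
off_topic = max(range(len(pages)), key=distances.__getitem__)
```

In this toy data the third page is the outlier. In practical terms, a site whose pages cluster tightly around one topic would show small distances across the board, while a page far from the cluster is the kind of content this comparison would flag as outside the site's core topic.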
The Articles
- Secrets from the Algorithm: Google Search’s Internal Engineering Documentation Has Leaked by King on iPullRank
- An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See Them by Fishkin on SparkToro
Quick Clarification
There is some dispute as to whether these documents were “leaked” or “discovered.” It’s likely the internal documents were accidentally included in a code review and pushed live from Google’s internal code base, where they were then discovered.
3P Digital’s Takeaway
The revelations from these leaked documents are monumental for anyone involved in SEO. At 3P Digital, we’ve always emphasised the importance of high-quality content, diverse backlinks, and exceptional user experience. These findings confirm that our strategies align with the factors that truly matter in Google’s ranking algorithm.
Understanding the nuances of Google’s ranking factors allows us to refine our approach and deliver even better results for our clients. If you’re looking to enhance your SEO strategy and drive more qualified traffic to your website, our performance-based model ensures that you only pay for the tangible results we deliver.
Let’s discuss how we can leverage these insights to elevate your online presence. Contact us today!
Alex Frew is a prominent figure in the digital marketing landscape in Queensland, Australia. He is the founder of 3P Digital, an agency that operates on a unique pay-per-performance model. This model aligns the agency’s success directly with the success of its clients, ensuring that if the clients don’t see results, the agency doesn’t get paid. This approach has set 3P Digital apart from traditional digital marketing agencies, which often charge retainers regardless of performance.
Before establishing 3P Digital, Alex co-founded and successfully grew several other digital marketing agencies, including Yes Digital, Digital Six, and 2X Digital. His experience spans over eight years, during which he identified a strong demand among Australian businesses for a marketing agency that shares both the rewards and risks of their marketing efforts.
Alex’s leadership at 3P Digital emphasises growth, transparency, and authentic client connections. The agency offers a wide range of services, including PPC management, SEO, YouTube SEO, strategy and research, technical SEO audits, and real-time reporting. This comprehensive service offering is designed to meet the diverse needs of its clients and ensure measurable outcomes.
Overall, Alex Frew’s innovative approach and commitment to client success have made him a notable figure in the Queensland digital marketing community.