Google Search Engine Ranking Algorithm Leak

This breakdown unveils potential Google Search ranking factors, including details on PageRank variations, site authority metrics and more.


A leak of the Google API Content Warehouse documentation was first reported by veteran SEO Rand Fishkin at SparkToro. The leaked material contains links to private repositories and internal pages of Google’s corporate website. This 2,500-page document provides a detailed explanation of how Google’s search engine operates. SEO specialists are actively analyzing the leaked data.

Based on the information presented in the documentation, Google uses over 14,000 factors to rank websites. Among these, user clicks and post-click behavior, known as behavioral factors, play a crucial role. Our HVAC SEO specialists have studied the new data and concluded that clicks and site navigation are now much more significant, while content and links are secondary.


Google Ranking Algorithms

Recent leaks from Google, combined with disclosures from the U.S. Department of Justice’s antitrust case, have shed light on various aspects of Google’s ranking algorithms that contradict some of the company’s public statements. Here are several key points that differ from Google’s claims about its ranking methods and are of great interest to SEO specialists:

User Behavior on Site:
– An important ranking component, NavBoost, utilizes click-based data to elevate or demote a site’s ranking. NavBoost analyzes user clicks on search results, considering parameters such as “goodClicks,” “badClicks,” and “lastLongestClicks.” This allows Google to understand which search results most satisfy users and which pages to promote in rankings. NavBoost also accounts for user behaviors like pogo-sticking (quickly returning to search results after clicking on a result that didn’t meet the user’s needs). The duration of clicks (time spent on a page) is analyzed to determine the usefulness and relevance of a page.
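
The click signals described above can be illustrated with a minimal Python sketch. The leak names the counters but does not say how they are computed, so everything here is an assumption made for illustration: the 10-second threshold, the ClickStats and tally_session names, and the rule that the longest click in a session earns the lastLongestClicks credit.

```python
from dataclasses import dataclass

# Hypothetical threshold: a dwell time below this is treated as pogo-sticking.
SHORT_CLICK_SECONDS = 10.0


@dataclass
class ClickStats:
    """Per-URL counters loosely modeled on the attribute names in the leak."""
    good_clicks: int = 0
    bad_clicks: int = 0
    last_longest_clicks: int = 0


def tally_session(stats_by_url, session):
    """Update per-URL counters from one search session.

    `session` is a list of (url, dwell_seconds) pairs in click order.
    """
    if not session:
        return
    for url, dwell in session:
        stats = stats_by_url.setdefault(url, ClickStats())
        if dwell < SHORT_CLICK_SECONDS:
            stats.bad_clicks += 1   # quick return to the results page
        else:
            stats.good_clicks += 1  # the user stayed on the page
    # Assumption: the click with the longest dwell time gets the credit;
    # on a tie, max() keeps the earlier click.
    longest_url, _ = max(session, key=lambda click: click[1])
    stats_by_url[longest_url].last_longest_clicks += 1


stats = {}
tally_session(stats, [("a.example", 3.0), ("b.example", 95.0)])
tally_session(stats, [("a.example", 4.0), ("c.example", 40.0)])
for url, counters in stats.items():
    print(url, counters)
```

In this toy data, a.example collects only short clicks while b.example and c.example each satisfy one session, which is the kind of contrast a click-based re-ranker could act on.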

Use of Chrome Data:
– The leak revealed that Google collects extensive user behavior data, which is used to assess pages and domains. For example, Google can use the number of clicks on pages in the Chrome browser to determine the most popular URLs on a site, affecting the creation of Sitelinks.

Whitelist of Sites:
– Google maintains whitelists for sites related to travel, COVID, and elections. This allows Google to control search results for controversial or potentially problematic queries, ensuring that only verified and reliable sources are shown.

Domain Authority:
– Although Google has repeatedly claimed not to use the Domain Authority metric in its algorithms, the leaked data showed the existence of a siteAuthority metric used in the Q* system to evaluate site authority. This indicates an internal equivalent to the Domain Authority metric.

Sandboxing:
– Google has claimed that there is no “sandbox” and that new sites are not subject to special restrictions. However, the leak mentions a hostAge attribute used for “fresh spam sandboxing,” confirming that Google does employ some form of “sandboxing” for new or suspicious sites.

Data from EWOK:
– EWOK is an internal Google platform used for search quality evaluation, where live human raters review search results pages and rate them based on criteria such as relevance, usefulness, and trust in the source. Data from these quality raters can be used to directly influence page rankings.

Consideration of Brand Size:
– Popular and well-known brands are prioritized in rankings. Google uses various methods to identify and rank brands, including brand size, which is determined not only by the site itself but also by mentions of the site on the internet (even without links).


Additional Important Points

Date Significance:
– Google actively associates dates with content using various methods: bylineDate (the date specified on the page), syntacticDate (date extracted from the URL or title), and semanticDate (date derived from the page’s content).
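
To make the three date types concrete, here is a rough Python sketch that pulls one candidate date from each location. It is not how Google extracts dates: the regular expression, the first_date and candidate_dates helpers, and the restriction to YYYY-MM-DD or YYYY/MM/DD patterns are all assumptions chosen to keep the example short.

```python
import re
from datetime import date

# Hypothetical pattern: only numeric year-month-day dates are recognized.
ISO_DATE = re.compile(r"(\d{4})[-/](\d{2})[-/](\d{2})")


def first_date(text):
    """Return the first valid date found in `text`, or None."""
    for match in ISO_DATE.finditer(text):
        year, month, day = map(int, match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            continue  # e.g. 2024-13-45 matches the pattern but is not a date
    return None


def candidate_dates(url, title, byline, body):
    """Collect one candidate per date type named in the leak."""
    return {
        "bylineDate": first_date(byline),
        "syntacticDate": first_date(url) or first_date(title),
        "semanticDate": first_date(body),
    }


print(candidate_dates(
    url="https://example.com/2024/05/28/leak",
    title="Leak analysis",
    byline="Published 2024-06-01",
    body="On 2024-05-27 the documents surfaced.",
))
```

The three values disagree in this example, which shows why a page with inconsistent dates in its URL, byline, and body text may be harder to date reliably.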

Original Content and Keywords:
– Brief content is evaluated for originality, impacting its ranking. How closely a page title matches the user’s query remains a crucial factor.

Font Size:
– Google tracks the average weighted font size of terms in documents and links, which also influences ranking.

Home Page PageRank:
– Every document is assigned the PageRank of the home page. It is likely that PageRank and siteAuthority are used as proxies for new pages until their own PageRank is calculated.
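
The proxy idea can be expressed in a few lines of Python. This is only an illustration of the guess stated above, not a documented Google mechanism; the effective_pagerank function and its parameters are hypothetical.

```python
from typing import Optional


def effective_pagerank(page_rank: Optional[float], homepage_rank: float) -> float:
    """Use the home page's PageRank until the page has a value of its own.

    Assumption: `None` stands for "not calculated yet".
    """
    return homepage_rank if page_rank is None else page_rank


print(effective_pagerank(None, 0.6))  # new page: falls back to the home page
print(effective_pagerank(0.2, 0.6))   # established page: uses its own value
```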

Small Sites Demotion:
– Google has a specific flag indicating that a site is a “small personal site.” While definitions for such sites are not provided, Google can easily boost or demote their ranking.

Indexing Level Influences Link Value:
– A metric called sourceType shows the connection between where a page is indexed and the value of its links. The Google index is divided into levels: the most important, regularly updated, and frequently accessed content is stored in flash memory; less important content is stored on solid-state drives, and infrequently updated content is stored on regular hard drives. The higher the level, the more valuable the link. Pages considered “fresh” are also seen as higher quality, which partially explains why links from high-ranking pages and news pages yield better ranking results.

Demotion in Google’s Ranking Algorithms

Demotion refers to the lowering of web page positions in search results due to certain factors that negatively impact their quality or relevance. The data leak revealed that Google employs various algorithmic mechanisms for demotion. Here are some of them:

– Anchor Mismatch: When a link’s anchor text does not match the target site it points to, the site is demoted in the rankings.

– SERP Demotion: A signal indicating potential user dissatisfaction with a page, likely measured by clicks.

– Nav Demotion: Applied to pages that demonstrate poor navigation or a bad user experience.

– Exact Match Domains Demotion: A feature for demoting exact match domains (e.g., buy-cheap-shoes.com) if they do not provide quality content.

– Product Review Demotion: Though specifics are unclear, it is likely related to the recent product reviews update in 2023.

– Location Demotion: Indicates that “global” pages might be demoted in the rankings, suggesting Google aims to associate pages with locations and rank them accordingly.

– Porn Demotion: Demotion for displaying pornographic content.

– Other Link Demotions: Demotions due to various link-related issues.


Ranking System Architecture

The internal system names that appear in the leak outline Google’s architecture and show how its various systems function and interconnect.

Crawling:
– Trawler: The web crawling system. It maintains a crawl queue, tracks crawl rates, and understands how often pages are visited.

Indexing:
– Alexandria: The main indexing system.
– SegIndexer: A system that places documents in tiers within the index.
– TeraGoogle: A secondary indexing system for long-term document storage.

Rendering:
– HtmlrenderWebkitHeadless: The rendering system for JavaScript pages.

Processing:
– LinkExtractor: Extracts links from pages.
– WebMirror: Manages canonicalization and duplication.

Ranking:
– Mustang: The main system for scoring, ranking, and serving results.
– Ascorer: The core ranking algorithm.
– NavBoost: A re-ranking system based on click logs and user behavior.
– FreshnessTwiddler: Ranks documents based on their freshness.
– WebChooserScorer: Defines feature names used in snippet scoring.

Serving:
– Google Web Server (GWS): The server that the Google frontend interacts with, which retrieves data to display to users.
– SuperRoot: The brain of Google Search, sending messages to Google servers and managing the post-processing system for re-ranking and presenting results.
– SnippetBrain: Generates snippets for search results.
– Glue: Combines universal results considering user behavior.
– Cookbook: Generates signals for the system.


What are Twiddlers?

Twiddlers are re-ranking functions that activate after the primary ranking algorithm, Ascorer, has executed. Twiddlers can adjust the information retrieval score of a document or change its ranking, and they can also impose certain category-specific restrictions.
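
The re-ranking step described above can be sketched in Python as a chain of functions applied to an already scored result list. The leak does not document the Twiddler interface, so the Result class, the boost_category and cap_category builders, and the rerank function are hypothetical names used only to show the two behaviors mentioned: adjusting a score and imposing a category restriction.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    url: str
    score: float           # information retrieval score from the primary ranker
    category: str = "web"  # hypothetical label such as "web", "news", "video"


Twiddler = Callable[[List[Result]], List[Result]]


def boost_category(category: str, factor: float) -> Twiddler:
    """Build a twiddler that multiplies the score of one category."""
    def twiddle(results: List[Result]) -> List[Result]:
        return [
            Result(r.url, r.score * factor if r.category == category else r.score, r.category)
            for r in results
        ]
    return twiddle


def cap_category(category: str, limit: int) -> Twiddler:
    """Build a twiddler that keeps at most `limit` results of one category."""
    def twiddle(results: List[Result]) -> List[Result]:
        kept, seen = [], 0
        for r in sorted(results, key=lambda r: r.score, reverse=True):
            if r.category == category:
                seen += 1
                if seen > limit:
                    continue  # restriction reached: drop lower-scored extras
            kept.append(r)
        return kept
    return twiddle


def rerank(results: List[Result], twiddlers: List[Twiddler]) -> List[Result]:
    """Apply each twiddler in order, then sort by the adjusted score."""
    for twiddle in twiddlers:
        results = twiddle(results)
    return sorted(results, key=lambda r: r.score, reverse=True)


initial = [
    Result("a", 1.0, "web"),
    Result("b", 0.9, "news"),
    Result("c", 0.8, "news"),
    Result("d", 0.7, "video"),
]
final = rerank(initial, [boost_category("news", 1.5), cap_category("news", 1)])
print([r.url for r in final])
```

Here the news boost lifts result b above a, and the cap then removes c, the second news result, so the primary ranking is adjusted without being recomputed.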

It is presumed that any function with the suffix Boost operates using the Twiddler framework. Here are some Boosts described in the documentation:

– NavBoost: Adjusts rankings based on user navigation and behavior data.
– QualityBoost: Enhances the ranking of higher-quality content.
– RealTimeBoost: Modifies rankings based on real-time data and events.
– WebImageBoost: Alters rankings by considering the relevance and quality of images on web pages.

Source: https://www.blackhatworld.com/seo/google-documentation-leak-14-000-ranking-factors-and-new-seo-revelations.1606240/
