SEO is full of concepts and buzzwords - a cacophony of case studies and best practices belted out like a children's holiday choir.
What does it actually look like in the wild? What patterns could we see when we study it at scale? What odd blend of curiosity and masochism would volunteer months of their time to study it?
Hi. It's me. I'm the researcher.
Every year, dozens of industry experts come together to analyze the state of the web.
It's a massive undertaking that requires analyzing 83TB of data to craft 21 chapters.
This year, I led an incredible team to study SEO across 16.9 million pages, including home and inner website pages, using the raw stats and trends of the HTTP Archive, Lighthouse, CrUX data, and a dash of miracle.
What is the Web Almanac?
The Web Almanac is an annual report that provides a deep dive into the current state of the web. It's like a yearly checkup for the internet.
The project aims to help web developers, designers, and other industry professionals better understand the current trends, technologies, and best practices in web development.
On a more human note, our modern lives are digital first. Internet access is necessary for large and critical aspects of our lives.
The United Nations has even made the case for connectivity as a human right. This effort is the only project to document our evolving digital landscape.
What did we learn?
In short? A lot.
This year's edition of the SEO chapter is the most in-depth yet, exploring dimensions and metrics that weren't available in previous editions.
Seventy pages can be a bit overwhelming to dive into, so here are five of my favorite bite-sized takeaways.
1. More robots(.txt)
We've been using robots.txt for decades. However, the protocol wasn't formally standardized until September 2022 with RFC 9309.
This formalization has led to stricter enforcement of technical standards and improved adherence to the protocol.
This year we saw an increase in successful robots.txt responses and a decrease in errors, indicating improved implementation of the robots.txt protocol.
We saw successful requests for robots.txt files on 83.9% of requests made as mobile and 83.5% made as desktop, up from 82.4% and 81.5% in 2022.
We're also getting more specific in our robots.txt declarations.
SEO tools, like AhrefsBot and MJ12Bot, are increasingly being named in robots.txt files.
Honestly, this makes sense to me. If you aren't actively using the tool on your site, block it. No need to make competitor research that easy at the cost of your server resources.
Emerging AI crawlers, like GPTBot, are also starting to appear in robots.txt files, reflecting the growing use of these technologies. It will be interesting to see how this file changes as proposed controls for AI crawlers are adopted.
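Naming bots in robots.txt can be checked with nothing more than Python's standard library. Here's a minimal sketch using `urllib.robotparser`; the robots.txt content and the `example.com` domain are illustrative placeholders, not data from the study.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that names SEO and AI crawlers explicitly,
# mirroring the pattern described above. example.com is a placeholder.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: MJ12Bot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The named crawlers are blocked site-wide; everyone else falls through
# to the wildcard group and is allowed.
for bot in ("AhrefsBot", "MJ12Bot", "GPTBot", "Googlebot"):
    print(bot, parser.can_fetch(bot, "https://example.com/page"))
```

The same parser is what `urllib` uses internally, so it's a quick way to sanity-check a robots.txt file before deploying it.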
2. Robots directives: both specific and oddly nonsensical
This was the first year that we included meta robots directives in the Web Almanac. Robots directives are a page-specific approach to controlling how individual HTML pages should be indexed and served, distinct from robots.txt files.
You can use 24 valid robots directive rules to control the indexing and serving of content, which is why it's so surprising that the two most used rules in 2024 were follow and index. These aren't valid directives and are ignored by Googlebot.
Analyzing directives was tricky due to the sheer number of combinations. Each rule is paired with a name. The name value of the robots meta tag specifies to which crawler(s) the rule applies. We had 14 rules and 9 bot names in our data set.
We found that SEOs are being very intentional in curating rules by bot name.
The noarchive rule is meaningless to most bots and applied to only 1% of pages when the generic robots name was used. Bingbot was named for the noarchive rule a shocking 36% of the time. This is likely due to the tag's ability to keep content out of Bing chat answers. Googlebot-news was most likely to be named for a noindex rule.
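Extracting per-bot directives like these from a page is straightforward with the standard library's `html.parser`. A minimal sketch follows; the HTML fragment is illustrative, echoing the bot-specific rules discussed above, and the set of bot names is a small sample rather than the full list from the study.

```python
from html.parser import HTMLParser

# Illustrative markup: a generic rule plus bot-specific noarchive/noindex
# rules, as discussed above.
HTML = """
<head>
  <meta name="robots" content="max-snippet:50">
  <meta name="bingbot" content="noarchive">
  <meta name="googlebot-news" content="noindex">
</head>
"""

class RobotsMetaCollector(HTMLParser):
    # A sample of recognized bot names; the data set had nine.
    BOT_NAMES = {"robots", "bingbot", "googlebot", "googlebot-news"}

    def __init__(self):
        super().__init__()
        self.directives = {}  # bot name -> list of rules

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in self.BOT_NAMES:
            rules = [r.strip() for r in (attrs.get("content") or "").split(",")]
            self.directives.setdefault(name, []).extend(rules)

collector = RobotsMetaCollector()
collector.feed(HTML)
print(collector.directives)
```

Grouping rules by the `name` attribute like this is exactly how you'd tally which crawlers get which directives across a crawl.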
3. Less canonical confusion
Duplicate content happens. We've gotten new insights into the 40+ signals that Google evaluates between duplicate pages. All the more reason to use canonicals!
This year saw canonical usage increase to 65% overall, but more noteworthy, it was the first year that the tag was applied equally on desktop and mobile. The majority of sites implement canonical tags in the raw HTML and rendered HTML, with less than 1% having a mismatch between the two. There's always room for improvement. One in 50 mobile pages had its canonical tag changed during rendering. These tags are extremely useful for SEOs - unless they confuse Google and get ignored.
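Checking for that raw-vs-rendered mismatch is a simple string comparison once you've extracted each canonical href. Here's a sketch; the two HTML strings are illustrative placeholders standing in for a page's raw source and its rendered DOM.

```python
import re

# Placeholders: raw HTML vs. the HTML after JavaScript rendering.
RAW_HTML = '<link rel="canonical" href="https://example.com/page">'
RENDERED_HTML = '<link rel="canonical" href="https://example.com/other">'

# Naive pattern for illustration; a real crawler would use a DOM parser.
CANONICAL_RE = re.compile(r'<link[^>]+rel="canonical"[^>]+href="([^"]+)"')

def canonical_of(html: str):
    """Return the canonical URL declared in the markup, or None."""
    match = CANONICAL_RE.search(html)
    return match.group(1) if match else None

raw, rendered = canonical_of(RAW_HTML), canonical_of(RENDERED_HTML)
print("mismatch" if raw != rendered else "consistent")
```

A mismatch like this one is the "one in 50 mobile pages" case: the tag JavaScript ships to Google isn't the tag the server sent.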
4. Dynamic rendering is on the decline
Back at Google I/O 2018, members of the Search team shared that Googlebot had trouble rendering JavaScript. The next year, the team announced evergreen Googlebot - a much more capable crawler running on the latest Chromium rendering engine. They also shared that dynamic rendering wasn't recommended anymore. (Twice the tech stack, none of the fun.)
Despite the quick change, sites clung to their dynamic rendering. In 2022, 13% of pages had the vary HTTP response header, which enables different content to be served based on the requesting user agent.
This year, we finally saw the dastardly header plummet 92% to 1% of desktop and 2% of mobile pages. Good riddance.
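Spotting this signal in a crawl is a one-liner once you have the response headers. A minimal sketch, with a plain dict standing in for a real HTTP response:

```python
def varies_by_user_agent(headers: dict) -> bool:
    """True if the Vary response header lists User-Agent, the classic
    dynamic-rendering signal discussed above."""
    # Header names are case-insensitive, and Vary can list several fields.
    vary = next((v for k, v in headers.items() if k.lower() == "vary"), "")
    return "user-agent" in [field.strip().lower() for field in vary.split(",")]

print(varies_by_user_agent({"Vary": "User-Agent, Accept-Encoding"}))  # True
print(varies_by_user_agent({"Vary": "Accept-Encoding"}))              # False
```

Note that `Vary: Accept-Encoding` alone is harmless and extremely common; it's only the User-Agent value that points at dynamic rendering.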
5. More sites are less bad
Remember 2020? The year of Coronavirus lockdowns, "you're on mute" Zoom calls, and realizing your website's performance was awful.
When Core Web Vitals (CWV) was released in 2020, just 20% of sites passed the user-centric performance assessments. However, the percentage of sites passing CWV has increased each year.
In 2024, 48% of mobile sites and 54% of desktop sites passed the assessment! This is an exceptional accomplishment when we consider that Interaction to Next Paint replaced First Input Delay as the interactivity metric. Largest Contentful Paint had the lowest pass rate.
One of my favorite aspects of the Web Almanac is the ability to hop to a different chapter to explore topics in greater detail. This year's Performance chapter immaculately breaks down metrics by sub-parts and optimizations for a satisfying look at the state of web performance.
Contributing to the Web Almanac
The Web Almanac is a volunteer project. When the year's project is announced, that's your time to look through open issues in the GitHub repo and raise your hand for where you'd like to contribute. Each chapter has multiple roles that need to be filled, and I've asked my team to share their experiences.
1. Authors
Subject matter experts who lead the content direction for each chapter. Authors are responsible for planning the outline of the chapter, analyzing stats and trends, and writing the annual report.
Being an author is a big commitment, especially on larger or more prominent chapters. (If you're new and interested in contributing, look for the "good first issue" tag.)
"Being the author of such a relevant chapter as SEO in WebAlmanac has been a life-changing experience. This is because I learned a lot (technically), but even more because of the collaborative way in which everything is done. At the end of the day, you have an immense feeling of gratitude for your participation and the impact that the team's work will have on the web community," said Mikael Araújo.
"It's rare you ever get the chance to work with such a huge data set, and it's an honour to be able to take a look at that data, slice and dice it up, and help tell a story about how the web is, how it's changing and to some extent its health," said Dave Smart.
2. Reviewers
Subject matter experts who assist authors with technical reviews during the planning, analyzing, and writing phases. They are the glue that holds authors and analysts together when eyes start crossing from screen fatigue.
"Being a reviewer is a great first step into the Web Almanac if you've not contributed before and are curious how much work is involved in creating each chapter (hint: it's a lot!). You can commit as much (or as little) time as you want. And who knows, you might get the bug and want to write a chapter next year!" said Barry Pollard.
3. Analysts
Unsung chapter heroes responsible for researching the stats and trends used throughout the Almanac. Analysts work closely with authors and reviewers during the planning phase to give direction on the types of stats that are possible from the dataset, and during the analyzing/writing phases to ensure that the stats are used correctly.
"It was one of the most challenging yet fun projects I have worked on while getting to know others in the industry. Realize that it's a labor of love and have fun. Don't be afraid to ask questions if you get stuck," said Chris Nichols.
4. Editors
Technical writers who have a penchant for both technical and non-technical content correctness. Editors have a mastery of the English language and work closely with authors to help wordsmith content and ensure that everything fits together as a cohesive unit.
"In addition to collaborating with the best minds working on the web, as an editor you get a 30,000-foot view from above of the most interesting data and studies, along with their implications. You see the positive and, at times, negative trends, as well as get a sense of the overall health of the web. It's a lot of work as an editor, and there's a ton of fact-checking and details that are involved, but it's 100 percent worth it since you're working alongside industry leaders who are truly committed to making the web better for everyone," said Michael Lewittes.
5. Translators
Technical writers who help internationalize the Almanac and make it globally accessible to the web community. It's undoubtedly cool seeing your work translated into other languages, but helping make this research available to non-English audiences is an incredible contribution.
If you read this year's chapter and thought of something you didn't see explored or a new way to approach the data - consider contributing to the 2025 edition! You'll learn new skills and work with incredible experts.
Contributing authors are invited to create content for Search Engine Land and are chosen for their expertise and contribution to the search community. Our contributors work under the oversight of the editorial staff and contributions are checked for quality and relevance to our readers. The opinions they express are their own.
About the author
Contributor
Jamie Indigo
Jamie Indigo isn't a robot, but speaks bot. As a technical SEO consultant, they study how search engines crawl, render, and index the web. When not working, Jamie likes horror movies, graphic novels, and Dungeons & Dragons.