
Table of Contents
What Is Technical SEO?
Why Is Technical SEO Important?
Tools and Resources
Technical SEO Checklist


You can still get great results without technical SEO (assuming your on-page and off-page optimization are perfect). However, with proper technical optimization, you’ll be able to utilize 100% of the available resources, which may be just the little edge you need to outrank your competitors!

Oleg Donets, SEO Expert


What Is Technical SEO?

My Definition: Technical search engine optimization is a set of activities that focus on optimizing the back-end infrastructure of a website for major search engines, helping maximize ongoing on-page and off-page SEO efforts.

In simple words, technical SEO is an indispensable part of your on-page and off-page optimization strategy; without it, neither of the two will be fully effective.


Why Is Technical SEO Important?

Getting technical SEO right gives you a significant edge over your competitors, who most likely concentrate on the more popular aspects of SEO (like link building) and totally ignore the technical side.

It is a big mistake for them but a huge opportunity for you to dominate the search engines in your desired markets.

Internationally known motivational public speaker, sales coach and self-development author, Brian Tracy, calls it the “Winning Edge” concept. He explains it the following way:

If you become just a little bit better in certain critical areas of selling, it can translate into enormous increases in sales. (source)

Usually I’d give a racing car analogy here, but since I already mentioned Brian Tracy, I’ll use his analogy instead – horse racing 🙂

It is similar to horse racing where out of ten horses the first one wins the race by a nose.

  • Does that mean the winning horse is ten times faster than the horse that comes in a fraction of a second later?
  • Does that mean the winning horse is twenty-five or even fifty percent faster than the second one?

Of course not. The winning horse is just a nose faster. However, that translates into 10x the reward money.

The exact same rule applies in the SEO world. I’d say 95% of all the SEOs out there don’t even pay attention to the technical aspect.

This allows you to come in with your “Winning Edge” and outrank your competitors in the blink of an eye (as long as you also implement the other two essential components of a successful SEO campaign – on-page and off-page optimization).


Tools and Resources

Below are all the tools and resources my team and I use on a daily basis to conduct thorough technical SEO audits.


Technical SEO Checklist

Before we get to the checklist itself, I want to briefly talk about why you need to use a checklist while conducting your technical SEO audit.

Why do you need a technical SEO checklist?

Having a well-organized checklist will allow you to:

  • Effectively and efficiently audit an existing website to identify critical technical issues.
  • Build a brand new site and prevent those same issues from happening at the very beginning.
  • Significantly speed up the entire technical audit process.
  • Create guidance for your web designer on what to do and what not to do while designing a new website.


1. Accessibility

What is Accessibility?

My Definition: Accessibility is the ability of a given website or web application to be accessed by both real people (users) and bots (crawlers). In this section we’re going to discuss only the latter – accessibility of your web property to crawler bots.

Why is Accessibility important?

Even though there are two accessibility aspects to take care of – accessibility to search engine crawlers and accessibility to human visitors – in this guide I’m going to cover the first one: the crawlers.

As far as the technical aspect of search engine optimization is concerned, everything starts with the accessibility of your website or app. If search engine crawlers are unable to even access your web property for one reason or another, there is no point in talking about ranking it.

It is a waste of time because it is impossible to rank something that is not even accessible to search engines. Instead, you first need to make sure your website is easily accessible to the search engines in which you want it to rank.

To do that, go through this checklist covering the critical accessibility aspects of your web property.


1.1 Robots.txt File

What is robots.txt?

It is a plain text file uploaded to the root directory of your website’s host. This file contains directives, part of the “Robots Exclusion Protocol”, that serve as rules denying certain crawler bots access to certain areas of your site. For more info about the robots.txt file and how to use it, read this guide.

Actions to Take:

  • Check if a site has a robots.txt file by appending /robots.txt to the domain in your browser. You’ll see results similar to the image below if you have a robots.txt file. If not, create one by simply creating a plain text file and uploading it to the root directory of your site. You will be able to figure out the default directives to place into your robots.txt file by reading the guide linked above.

  • Check if the entire site or certain parts of it are accidentally blocked in the robots.txt file. If something is blocked, ask yourself if there is a reason for that. In my example (see below) there is only a single page blocked – the default login page (I believe that directive was generated by a security plugin; I may remove it since, according to this post by Joost de Valk, it is no longer a useful practice). That is totally fine because the rest of the site is accessible. The problems start when you have some portions, or an entire site, accidentally blocked from crawlers. And yes, this happens quite frequently.

  • Make sure CSS and JS files are not blocked in the robots.txt file. To understand why, read this post by Joost de Valk.
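If you need a starting point, here is a minimal robots.txt sketch. The disallowed path and sitemap URL are placeholders for illustration, not recommendations for your particular site:

```text
# The rules below apply to all crawlers
User-agent: *

# Example: keep crawlers out of a single page (placeholder path)
Disallow: /wp-login.php

# Announce the XML sitemap's location (placeholder URL)
Sitemap: https://www.example.com/sitemap.xml
```

Note that having no Disallow rules at all leaves the whole site open to crawlers, which is a perfectly valid configuration.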


1.2 Noindex Tag

What is noindex tag?

It is an indication (in the form of a small piece of code inserted into the HTML of your pages) for search engines, used to prevent them from including certain pages of your site in their index.

It is very important to understand that for it to work effectively, the page must not be blocked in the robots.txt file. Search engines, Google in particular, won’t be able to crawl a page that is blocked in robots.txt. Therefore, they won’t see your noindex meta tag in the page’s HTML code, which may cause that page to still appear in search results due to inbound links from your own or other sites.

Actions to Take:

  • Check pages for the meta robots ‘noindex’ tag. If some pages have the ‘noindex’ meta tag added, ask yourself if there is a reason for that.
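As a quick sanity check, here is a small sketch of my own (not from the original guide) that scans a page’s HTML for a robots noindex meta tag using Python’s built-in parser:

```python
from html.parser import HTMLParser

class NoindexFinder(HTMLParser):
    """Flags a <meta name="robots"> tag whose content includes "noindex"."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True

def has_noindex(html: str) -> bool:
    """Return True if the page's HTML carries a robots noindex meta tag."""
    finder = NoindexFinder()
    finder.feed(html)
    return finder.noindex

# Hypothetical page: this one asks to be kept out of the index
page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
print(has_noindex(page))  # True
```

In practice you would feed it the HTML fetched from each URL you audit.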


1.3 Redirects

What are redirects?

This one is quite obvious. A redirect is a response code executed at the server level that tells both browsers (human visitors) and search engines (crawler bots) that a requested URL has moved to another URL.

There are several redirect types in the 3xx series. However, the most frequently used, as well as the most important, are 301 (permanent redirect) and 302 (temporary redirect). Here is a great guide by Rand Fishkin on how to use redirects.

Actions to Take:

  • Make sure all permanent redirects use a proper 301 permanent redirect and not 302, 303, 304, etc.
  • If there are temporary pages with redirects applied to them, make sure a proper 302 temporary redirect is used, not 307s, meta refreshes, or JavaScript redirects.
  • Make sure 301 redirects point directly to the final URL instead of going through redirect chains. Redirect chains may significantly diminish the amount of link equity that reaches the final URL. Here is a nice case study on how redirect chains affected a site’s organic traffic, which was accidentally discovered by their SEO team.
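To illustrate the redirect-chain point, here is a rough Python sketch that walks a chain to its final destination. The URL-to-target mapping is hypothetical stand-in data; in a real audit you would build it from HTTP responses gathered by a crawler:

```python
# Walk a redirect chain and flag chains longer than one hop.

def follow_chain(url, redirects, max_hops=10):
    """Return every URL visited, starting at `url`, until no redirect applies."""
    chain = [url]
    while url in redirects and len(chain) <= max_hops:
        url = redirects[url]
        chain.append(url)
    return chain

# Hypothetical crawl data: each key redirects to its value
redirects = {
    "http://example.com/old-page": "http://example.com/interim-page",
    "http://example.com/interim-page": "https://example.com/final-page",
}

chain = follow_chain("http://example.com/old-page", redirects)
print(" -> ".join(chain))
if len(chain) > 2:
    # More than one hop: repoint the first URL straight at the last one.
    print(f"Chain of {len(chain) - 1} hops; redirect directly to {chain[-1]}")
```

The `max_hops` guard keeps the sketch from looping forever on a circular redirect.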


1.4 Use of JavaScript

What is JavaScript?

I don’t want to get into this here. You can read about what JavaScript is and how it works here.

Actions to Take:

  • Make sure the content on your website is in plain, crawlable HTML and not served via JavaScript.
  • Make sure the links on your website are plain HTML ‘href’ links that are crawlable by search engines and not generated via JavaScript.
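For example, the first link below is plain, crawlable HTML, while the second (a hypothetical JavaScript-only link) gives crawlers nothing to follow:

```html
<!-- Crawlable: a plain href that bots can discover and follow -->
<a href="/services/web-design/">Web Design</a>

<!-- Not reliably crawlable: no href, destination exists only in JS -->
<span onclick="window.location='/services/web-design/'">Web Design</span>
```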


1.5 Use of iFrames

What is iFrame?

I don’t want to get into this here. You can read about what iFrame is and how it works here.

Actions to Take:

  • Make sure the content on your website is in plain, crawlable HTML and not served via iFrames.


1.6 Use of Flash

What is Flash?

I don’t want to get into this here. You can read about what Flash is and how it works here.

Actions to Take:

  • Make sure the major portions of your website (like navigation, buttons, links, etc.) are implemented in plain, crawlable HTML and not in Flash.


2. Crawlability

What is Crawlability?

My Definition: Crawlability is the ability of a website or web application’s content to be crawled by search engines’ automated software, commonly known as “web crawlers”. Crawling is the process of gathering data by discovering publicly available pages on the web, performed by those crawlers. It is responsible for collecting data from the web and transferring it back to a database, where it is organized into what is known as the “index”.

Why is Crawlability important?

Crawling is the entry point for sites into Google’s search results. Efficient crawling of a website helps with its indexing in Google Search. (source)

The main reason crawlability is so important is that, after the crawling process, search engines use the collected data to form their index.

If you want to learn more about what the index is and how it works, check out this article by Google. Having your content included in the index means it is eligible to rank in organic SERPs.

If your website/app has crawlability issues, there is a chance web crawlers won’t be able to crawl your content in full (or at all). If this happens, some or all of your content won’t be included in search engines’ indexes.

That, in turn, means you won’t be able to rank that content. In other words, your first and most important goal should be to make it into the index.

What affects crawlability?

Search engines invest a LOT of money in the resources involved in crawling the web. With the millions of new pages added to the web every single day, they become more and more picky about what to crawl and what not to.

They assign a so-called crawl budget to every website according to how easily and efficiently it can be crawled.

Therefore, you need to make sure you don’t waste web crawlers’ resources, by taking care of all the factors that affect crawlability. Otherwise you may hurt your site’s crawl budget.

So what factors actually affect crawlability?

There are 10 main factors that influence crawlability. Below is a checklist of all of them. Make sure you take care of each one to ensure a high crawl rate.


2.1 Site Structure

What is site structure?

Site structure is the framework, or architecture, of your website’s content. In simple words, site structure is how the content is organized on your site.

A well-organized site structure will not only create a positive experience for your visitors but also help your site rank better.

On the other hand, a poorly organized structure will significantly hinder your rankings and won’t do user experience any favors.

One of your main and most important priorities, especially when designing a new website or redesigning an existing one, should always be creating a well-organized site structure from the very beginning.

If this part is done properly, along with basic on-page optimization, I’m not afraid to say that this alone may result in first-page rankings for low-competition keywords.

Give it some love by building a few quality links and you’ll dominate the SERPs.

Don’t get me wrong, a well-organized site structure alone won’t get you to the first page in more competitive markets. It is obviously not enough.

My point is that proper site structure is imperative for your site’s future rankings and user satisfaction. So make sure you get it right.

Actions to Take:

  • Make sure you build a well-organized, intuitive, and easily crawlable content structure for your site.

Here is an example of poorly-organized content structure:

poorly-organized website structure

With such a confusing content structure, not only will your visitors be frustrated, but web crawlers will also have a hard time understanding the hierarchy of your content and crawling it efficiently.

Here is an example of well-organized website content structure:

well-organized website structure

With this intuitive and well-organized content structure, you’ll please not only your visitors but also web crawlers.

Crawlers will easily understand how your site architecture is designed and what topical content hierarchy is implemented (also known as “siloing”).

This way, they will be able to crawl your site more efficiently (which means they will come more often) and segment your site into topically relevant sections (which will significantly help your pages rank for the relevant keywords of each topic/category).


2.2 HTML Sitemap

What is HTML sitemap?

An HTML sitemap is a plain HTML page like any other regular page on your site. Unlike other pages, though, it is created with a single goal in mind – to list links to all the existing pages of your site.

In other words, this page serves as an index of all the pages on your site. Having an HTML sitemap will not only help human visitors navigate your site but also facilitate crawling for web spiders.

Here is a snapshot from my site’s sitemap page:

HTML sitemap

Actions to Take:

  • Make sure you link to your HTML sitemap page from the footer so that web crawlers can easily crawl your site regardless of which page they arrive on. Here is how it looks on my site:


2.3 XML Sitemap

What is XML sitemap?

Unlike the HTML sitemap, the XML sitemap is created solely for web crawlers. It is a plain XML file that lists all the pages (URLs) of a website that a webmaster wants crawled.

An XML sitemap provides search crawlers with valuable information about each URL.

For instance, it tells them how frequently URLs are likely to change, when they were last updated, and the hierarchical relationships between them.

This allows search crawlers to spider a website more effectively and efficiently. To learn more about XML sitemaps, check out this resource.
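For reference, a minimal XML sitemap following the sitemaps.org protocol looks like this (the URL and values are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2018-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <!-- one <url> entry per page you want crawled -->
</urlset>
```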


Actions to Take:

  • Make sure you are using the proper sitemap protocol format for your XML sitemap. You can read about the proper format here. There are multiple tools that can help you create your sitemap. Here are my two favorites: for WordPress-based sites – the Yoast SEO plugin (here is the guide on how to create the sitemap), and for any site running on any platform – the Screaming Frog tool (here is the guide on how to create the sitemap).
  • Make sure you add your XML sitemap/s to your robots.txt file to inform the search engines of your sitemap’s existence. It is as simple as listing your sitemap’s URL (or URLs if you have several sitemaps) in your robots.txt file. Check out my robots.txt file below:

  • Make sure to submit your XML sitemap to Google Search Console. I highly suggest you do that because it will notify Google about all the URLs included in the sitemap, as well as give you some useful crawl stats. I recorded a short video showing you how to submit your XML sitemap to Google Search Console.

If you don’t have a GSC account, or for some reason don’t want to create one, you can submit your sitemap via an HTTP request. Google’s ping endpoint follows this format – simply replace the placeholder with your own sitemap URL:

http://www.google.com/ping?sitemap=YOUR-SITEMAP-URL-HERE

So, if your sitemap lived at https://www.example.com/sitemap.xml (a placeholder), the full request would look like this:

http://www.google.com/ping?sitemap=https://www.example.com/sitemap.xml

You then copy that entire URL, paste it into the browser, and hit “Enter”; Google will respond with a message confirming that your sitemap was received.


2.4 Internal Linking

What is internal linking?

Internal linking, or interlinking, is the process of connecting relevant content on your site by linking to it from other related content.

Interlinking between pages helps open up additional pathways for web crawlers to follow, thereby facilitating crawling. There are several ways you can leverage internal linking to improve your crawl rate.

Actions to Take:

  • Make sure you have an HTML sitemap page that lists links to every page of your site and is accessible from the footer.
  • Make sure you link from your content (articles, guides, posts, etc.) to other relevant content.
  • Make sure your navigation menus link to important pages of your site.
  • Make sure you utilize breadcrumb navigation throughout your site.
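As an illustration, a breadcrumb trail is just a short list of plain HTML links, each one an extra pathway for crawlers to follow (the URLs below are placeholders):

```html
<nav class="breadcrumbs">
  <a href="/">Home</a> &raquo;
  <a href="/blog/">Blog</a> &raquo;
  <span>Technical SEO Audit</span>
</nav>
```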


2.5 Duplicate Content

What is duplicate content?

This one is quite obvious to most of you, yet some people may not be aware of it.

So duplicate content is any piece of content that resides at a unique URL and is either very similar to, or an exact copy of, another piece of content residing at a different unique URL.

Duplicate content may happen intentionally or unintentionally. It may also occur on your site or outside of it. For crawling purposes, you need to take care of the duplicate content issues that occur on your own site.

However, that doesn’t mean you should ignore the issue outside your site. You definitely need to take care of that as well. I covered it in great detail in this guide.

If duplicate content issues are not resolved, after a while web crawlers will reduce the frequency of their visits or simply lower your site’s crawl budget.

That’s because they will realize your site contains duplicate content, which wastes their resources. They would then prefer to visit other sites, ones that serve unique content, over yours.

Actions to Take:

  • Make sure you identify and fix duplicate content issues on your website. You can find detailed instructions on how to identify and fix duplicate content in the guide I mentioned above.


2.6 Site Speed

What is site speed?

This one may sound like another obvious one, but many SEOs interpret page load time in different ways. Indeed, there are a few metrics by which you can measure page load time.

  • Document Complete – this page load time metric measures the time required to load a page to the point where a visitor can view it. Not all the content on the page is necessarily loaded at this point, and the visitor can’t yet fully interact with it.
  • Fully Rendered – this page load time metric measures the time required to fully load all the elements of a page, including buttons, images, videos, scripts, etc.
  • Time to First Byte (TTFB) – this page load time metric measures the time required for a visitor’s browser to receive the first byte of a response from your server after requesting a particular URL.

Based on this study, there is a direct correlation between a lower TTFB and higher search engine rankings – which, admittedly, is not directly related to our discussion about crawlability.

However, if you look at it from another perspective, ranking itself is determined by a multitude of factors. If one of them is the Time to First Byte metric, then site speed, or page load time, should be taken care of.

In fact, here is one of the responses that John Mueller (Google’s Webmaster Trends Analyst) gave in a Google forum discussion about a certain website’s unreachable URLs:

We’re seeing an extremely high response-time for requests made to your site (at times, over 2 seconds to fetch a single URL). This has resulted in us severely limiting the number of URLs we’ll crawl from your site, and you’re seeing that in Fetch as Google as well. My recommendation would be to make sure that your server is fast & responsive across the board. As our systems see a reduced response-time, they’ll automatically ramp crawling back up (which gives you more room to use Fetch as Google too). (John Mueller, Google, source)

Actions to Take:

  • Make sure you optimize your images for the web before uploading them.
  • Make sure you minify CSS, JavaScript, and HTML (for WordPress sites you can use a caching plugin like this one to do that).
  • Make sure you leverage browser caching (for WordPress sites you can use a caching plugin like this one to do that).
  • Make sure you use a content delivery network (CDN).
  • Make sure you enable GZIP compression of CSS, JavaScript, and HTML files.
  • Make sure you host your site on a fast server.

Here are additional resources on improving site speed, and why it matters, that I highly recommend reading:


2.7 Blocked Content

What is blocked content?

Blocked content is content to which crawler access has been blocked (intentionally or unintentionally) in one of the following three ways:

  • Robots.txt file
  • Password protection
  • Robots meta tag

If you block content, intentionally or unintentionally, through any of these methods, be aware that web crawlers won’t be able to access it.

Password protection: content that is locked behind a password, so that only registered visitors can access it, is unreachable for search bots as well.

That is totally fine if it’s what you intended. Just make sure you mark all links pointing to that content with the ‘nofollow’ attribute so as not to waste crawl budget.

However, if you intended to block that content only from unregistered users, not from crawlers, keep in mind that the password protection will stop search engine crawlers all the same.

Robots meta tag: there are two robots meta tag directives: “noindex” and “nofollow”.

The “noindex” tag looks like this: <meta name="robots" content="noindex">

If this tag is applied to a certain page, crawlers will still crawl it, but the tag indicates that the content on that page should not be included in the index.

The “nofollow” tag looks like this: <meta name="robots" content="nofollow">

If this tag is applied to a certain page, crawlers will crawl it, but they will not follow any of the links on that page, which cuts off that pathway to the other pages the page links to.

Robots.txt file: the same issue arises when you block certain content in your robots.txt file. I covered this extensively in one of the previous sections of this guide, so you can check it out there.

Actions to Take:

  • Make sure that content that you want to be crawled is not blocked by password protected access.
  • Make sure that content that you want to be crawled is not blocked by robots.txt file.
  • Make sure that content that you want to be crawled is not blocked by robots meta tag.
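One quick way to confirm that a URL you care about is not blocked in robots.txt is Python’s built-in robotparser. The rules and URLs below are hypothetical; in practice you would fetch the live robots.txt file from your own domain:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; in practice fetch them from
# https://yourdomain.com/robots.txt instead of hard-coding them.
rules = [
    "User-agent: *",
    "Disallow: /wp-login.php",
]

parser = RobotFileParser()
parser.parse(rules)

# can_fetch(user_agent, url) applies the Robots Exclusion Protocol rules
print(parser.can_fetch("*", "https://example.com/blog/some-post/"))  # True
print(parser.can_fetch("*", "https://example.com/wp-login.php"))     # False
```

Run this against every important URL and you will catch accidental robots.txt blocks before the crawlers do.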


2.8 Crawl Errors

What are crawl errors?

Crawl errors are server-based errors. It is a general term that encompasses a wide array of errors such as DNS, server, robots, and URL errors.

Generally speaking, we can divide crawl errors into two categories:

  • Site errors
  • URL errors

Here is how Google Search Console displays crawl errors:

crawl errors

Here is a great guide, as well as this one, that thoroughly explain each type of error and provide instructions on how to fix them.

Any of the errors listed above, if not taken care of in a timely manner, may prevent web crawlers from crawling your content and subsequently keep it out of the SERPs.

Therefore, it is imperative to consistently monitor your website for any sort of crawl errors. I highly recommend using Google Search Console for that purpose.

Actions to Take:

  • Make sure you add and verify your website in Google Search Console for ongoing monitoring.
  • Make sure you address any errors right away and fix them according to the guides mentioned above.


2.9 Low Quality Content

What is low quality content?

Low quality content refers to thin, poorly written, or spammy content.

If web crawlers identify such content on your site, they may significantly reduce both their crawl frequency and your site’s overall crawl budget.

Crawling low quality content is a waste of resources for search engine bots, and they will respond accordingly.

Actions to Take:

  • Make sure all your content is written properly and is grammatically correct.
  • Make sure you don’t host thin content on your site.
  • Make sure you don’t engage in content spam tactics, like publishing spun text, on your site.


2.10 Broken Links

What are broken links?

Obviously, broken links are links that lead users and crawlers nowhere. In other words, if the final destination of a link doesn’t return a 200 HTTP status code, that link can be considered broken, or dead.

Having broken links on your website is definitely frustrating for human visitors. But it is also a wasteful activity for web crawlers. If you have a few broken links here and there, it is fine.

However, if your site contains a bunch of them, search engine spiders will waste resources crawling them. Remember, any crawling activity that wastes web crawlers’ resources may negatively affect your site’s crawl budget!

So take care of all your broken links, for both better user experience and a higher crawl rate.

Actions to Take:

  • Make sure you fix all, or at least the majority of, your broken links by 301 redirecting them to relevant live pages. You can easily see your broken links in the Google Search Console dashboard:

broken links

In the more detailed report, you’ll see “Soft 404” errors as well as regular “404” (not found) errors.
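If you export your crawl results as a mapping of URLs to HTTP status codes, flagging the broken ones takes only a few lines. The mapping below is made-up illustration data, not from a real crawl:

```python
def broken_links(status_by_url):
    """Return URLs whose final HTTP status code is not 200 OK."""
    return [url for url, status in status_by_url.items() if status != 200]

# Hypothetical crawl results: URL -> final HTTP status code
statuses = {
    "https://example.com/": 200,
    "https://example.com/about/": 200,
    "https://example.com/old-page/": 404,
}

print(broken_links(statuses))  # ['https://example.com/old-page/']
```

Each URL this returns is a candidate for a 301 redirect to a relevant live page.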


3. Indexability

What is Indexability?

My Definition: Indexability is the ability of a given website or web application’s content to be included in search engines’ indexes. Often it refers to the number of pages/URLs of that site/application that are indexed in search engines.

Even though indexation issues usually occur on larger sites, it is equally important for owners of smaller sites to be aware of this problem and know how to identify and fix it, because it can happen on any site.

Why is Indexability important?

Indexability is the third and last phase of the technical SEO process, the one that gives a website the green light to begin ranking.

At this point, it is critically important to make sure all three phases (Accessibility, Crawlability and Indexability) have been taken care of properly.

Think about it: you have awesome, useful content about a certain subject on your site, and you want others to discover it through search engines.

However, without you even having the slightest idea, your website’s back-end infrastructure happens to be so convoluted and complex, or simply lacks some essential elements, that your content is literally undiscoverable.

In this case, regardless of how good your on-page and off-page optimization is, it won’t help with the discoverability of your content, simply because that content doesn’t exist in search engines’ indexes.

There are multiple factors responsible for indexation. The most important are Crawlability, Accessibility and Quality (the quality of the website as a whole, including its content and backlink profile).

Now let’s get to the actionable items for assessing your web property’s indexation status.


3.1 Indexation Rate

What is indexation rate?

Indexation rate is the percentage of your site’s existing content that is actually included (indexed) in search engines’ indexes.

In other words, say you have 10 pages on your site. If 7 of them are indexed, then your indexation ratio is 0.7 (7/10 = 0.7) and your rate is 70% (0.7 × 100 = 70). Very basic math.
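The arithmetic above can be wrapped in a tiny helper; this is a sketch of my own, using the 90% threshold this guide applies later:

```python
def indexation_rate(indexed_pages, total_pages):
    """Indexation rate as a percentage of the pages you want indexed."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return indexed_pages / total_pages * 100

# The example from the text: 7 out of 10 pages indexed
rate = indexation_rate(7, 10)
print(f"Indexation rate: {rate:.0f}%")  # Indexation rate: 70%
if rate < 90:
    print("Below 90% - likely an indexation problem worth investigating.")
```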

Unfortunately, the number of pages you think a website contains is rarely equal to the number of pages that actually exist (unless you’re using some ancient static website platform that only lets you build static HTML pages).

With today’s popular Content Management Systems (CMSs), it’s almost impossible to keep track of all the pages those CMSs create automatically.

In addition, there are plenty of custom website platforms created by people who understand little (if anything) about SEO and its fundamentals.

With all that in place, it shouldn’t be surprising that we estimate X number of pages but in reality end up with a completely different number.

So, you need to make sure you get an accurate number.


3.2 Finding Out Your Site’s Indexation Rate

  • Step 1

Find out the approximate number of pages on your site that you want indexed.

Whether it’s your own website or a client’s, you have to have some idea of how many pages the site contains in order to determine its indexation rate.

Obviously you don’t want to count the pages you intentionally blocked, in the robots.txt file or with the robots meta tag. Otherwise they will skew your rate.

Instead, you want to find out the number of pages that you DO want indexed in search engines, and check the indexation rate against that number.

Once you know that number, you can check how many of those pages are indexed. It all sounds simple, but what if you have a huge site with thousands of pages?

How would you efficiently find out that number?

I recommend using site crawler software like Screaming Frog. With its help, you will be able to count the pages on your website in a few minutes.

This tool allows you to crawl up to 500 URLs for free. For sites with more than 500 URLs, you’ll need to purchase a license.

I recorded a short video showing you how to run a crawl with the Screaming Frog tool to check the number of existing pages on your site.

  • Step 2

Find out the approximate number of pages on your site that are actually indexed in search engines.

You can check this quickly in Google using an advanced search operator. I recorded a short video showing you how to use Google’s site: search operator to check indexed pages.

Unfortunately, this method is not always accurate, yet it gives a decent idea of how many pages are indexed.

To better understand those inaccuracies, I suggest you read this in-depth case study about Google’s indexation issues, written by Patrick Hathaway of URL Profiler (a robust tool that my team and I now use for all our technical SEO audits).

You can also check indexed pages through Google Search Console. On the left side navigation find “Google Index” option.

Under it select “Index Status” option.

Then go to the “Advanced” tab. There you’ll see some nice data about your site’s indexation status.

  • Step 3

After all that, compare the number of pages you found through the Screaming Frog crawl (don’t forget to deduct the pages you intentionally disallowed and those that don’t return a ‘200 OK’ HTTP response) against the number of results returned by Google’s site: command.

Do the basic math (as I showed above) to find your indexation rate. If your rate is lower than 90%, you have an indexation problem.

You most likely have a problem with one or more of the items outlined in this in-depth technical SEO audit guide. So go over each of them and make sure you take care of every item I specified.


Final Thoughts

You see, there are a LOT of things to take care of on your website or app from the technical side. Following this technical SEO audit checklist will set you apart from your competitors.

Implementing the practices listed above will automatically put you a level above the majority of SEOs out there, simply because only a small percentage of them actually take care of technical SEO.

Once you combine the power of all three aspects of a successful SEO strategy (Technical SEO, On-Page SEO and Off-Page SEO), you’ll dominate the search engines. Trust me.

Here is my personal request to you: if the information in this guide was helpful, please share it with the SEO community in any Facebook group you’re a member of.

Also, if you have any questions regarding this guide, or you want to suggest an item you think I’ve missed, feel free to comment below.