How to check response code of each URL in the Sitemap with Google Sheets

Sitemaps without any doubt, are an important part of any SEO campaign. They help Google understand what your website has to offer and bring better visibility for your website.

As your website grows, it becomes hard to keep track on which pages are included in your sitemap file. Everyone generates a sitemap using an automated extension or a 3rd party app. When everything is done automatically, chances of getting shit generated are high (by shit I mean 404s as those are the pages you no longer need). There are many CMS and plugins which are not as sophisticated as you will want them to be and thus they produce shit sometimes.

4xx and 5xx are not the things you want to submit to Google. It’s like asking out your girl with an empty box of ring. It looks funny but the results are fierce.  Although 404s does not affect the ranking of your site but if you throw shit at Google, Google will throw shit at you too.

I recently encountered a situation like that where our CMS was publishing old 404 pages into sitemap and they were regularly updated to the search console. There were numerous pages which had  403, 501 etc. and search console showed a warning that xx pages are not reachable but only showed a few samples. I saw a lot of pages in the crawl errors section which were old and not a part of the sitemap so it was difficult to figure out what exact pages submitted in sitemap are not working from the search console.

I used the method below to figure that out.

Let’s get started.

Start with an empty Google sheet and name it whatever you want.

Enter the below formula in Cell A2 and replace https://www.example.com/sitemap.xml with your actual sitemap URL

=IMPORTXML("https://www.example.com/sitemap.xml", "//*[local-name() ='url']/*[local-name() ='loc']")

Once you have done this, all the URLs in your sitemap.xml file should show up in column A.

Now we need to get the response code for those URLs. By default, you cannot directly get with the formulas in Google Sheets. For this purpose, we will make a formula using the scripts editor.

Go to Tools > Script Editor and follow the steps mentioned below.

  • Name the script HTTPResponse
  • Copy the script mentioned below and replace it with the default code.
function HTTPResponse( uri )
{
 var response_code ;
try {
 response_code = UrlFetchApp .fetch( uri ) .getResponseCode() .toString() ;
 }
catch( error ) {
 response_code = error .toString() .match( / returned code (ddd)./ )[1] ;
 }
finally {
 return response_code ;
 }
  • Hit Save (Don’t change anything else)

Now your HTTPResponse formula is ready to work. Use that formula in cell B2 and you will see the results instantly.

Comment below if you face any issues using this method or if you have an easier method to accomplish the above-mentioned objective.

Reply