Why Screaming Frog Is My Best Friend (and Should Be Yours Too)

Screaming Frog has been my go-to tool before planning any sort of SEO strategy for about two years now. Unfortunately, its ugly and somewhat unintuitive design has been a turn-off for many people I know, which prevents them from realizing its true value. I’m going to walk step-by-step through the process of using the tool, detailing the situations in which you’d need it. This post will be easier to follow if you download the latest version of the tool, located here:

Getting Started

At the top of the Screaming Frog (henceforth SF) window, you’ll see a form that says Enter URL to spider. When you enter the URL, make sure you put the correct version (www vs non-www). If you just copy the homepage URL of the site you’ll be fine.

The unlicensed version of SF will crawl up to 500 URIs, which covers everything from HTML pages to images to CSS and JS files and more (a URI is basically any individual item on a website, URLs included). If you only want to see HTML pages, click the box that says Filter and select HTML. Note that this won’t let you crawl more URIs; you’ll just see fewer items.

The Main Crawl Window

The bulk of the SF window is made up of crawl information, which includes everything you could want to know about the site in question and more, including:

  • Address – the location of the URI
  • Content – what the URI is (HTML file, CSS file, image, etc.)
  • Status Code – the code the URI returns. Useful for finding information about redirects, broken links, etc.
  • Status – what the Status Code means (404 = Not Found, 301 = Moved Permanently, 200 = OK, etc.)
  • Title 1 – the text of the <title> tag on your page
  • Title 1 Length – length, in characters, of the <title> tag
  • Title 1 Pixel Width – length, in pixels, of the <title> tag. We’ve typically estimated that <title> tags in the SERPs cut off after around 70 characters, but newer estimates put the limit closer to 584 px. This measure can be useful in determining the ideal length of your <title> tags.
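
Since pixel width depends on the actual characters used, a character-count rule is only an approximation. Here’s a rough sketch of how you might sanity-check a title against that estimated cutoff; the average character width is an assumption I’m making for illustration, not a Google spec:

```python
# Rough check of whether a <title> fits in the SERP, assuming the
# ~584 px cutoff mentioned above and an assumed average character
# width. Both numbers are estimates, not official figures.
AVG_CHAR_PX = 8.3  # assumed average pixel width per character

def title_fits_serp(title, cutoff_px=584):
    """Return (estimated_width_px, fits) for a title string."""
    est = len(title) * AVG_CHAR_PX
    return est, est <= cutoff_px

width, fits = title_fits_serp("Why Screaming Frog Is My Best Friend")
```

Narrow letters (i, l) and wide ones (W, M) will throw the estimate off, which is exactly why the Pixel Width column is more useful than a raw character count.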

It goes on to list meta description, headings, page inlinks and outlinks, etc. They’re easy enough to figure out, so I won’t go into any more detail (though if someone wants to add them in the comments that would probably be helpful).

Above the main window you’ll see tabs that let you filter the crawl by information type; if you only want to see Response Code information, for example, click that tab for an even more detailed view. This can be especially useful for the next topic: exporting data.

Exporting Screaming Frog Crawl Data

Exporting data from SF can make a number of tasks much, much easier. This is obviously not even close to an exhaustive list—be creative and find new ways to use the tool:

  • Updating title / meta
  • Checking for duplicate headings
  • Finding word count for all your pages
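
As one example of what you can do once the data is exported, here’s a sketch that hunts for duplicate titles or headings in an exported crawl CSV. The column names (“Address”, “Title 1”, “H1-1”, “Word Count”) follow SF’s usual export headers, but check them against your own export before relying on this:

```python
# Post-process an exported Screaming Frog crawl CSV to find
# duplicate values (titles, headings, etc.) across pages.
import csv
import io
from collections import defaultdict

def find_duplicates(csv_text, column):
    """Map each value in `column` to the list of URLs sharing it."""
    seen = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        seen[row[column]].append(row["Address"])
    return {val: urls for val, urls in seen.items() if len(urls) > 1}

sample = """Address,Title 1,H1-1,Word Count
/a,Home,Welcome,250
/b,Home,Welcome,340
/c,Contact,Get in Touch,120
"""
dupes = find_duplicates(sample, "Title 1")
# dupes == {"Home": ["/a", "/b"]}
```

The same function works for any exported column, so checking duplicate H1s or metas is just a matter of swapping the column name.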

The URL Info Window

Below the main crawl window, you’ll find the URL info window. There are 5 tabs:

  • URL Info – displays relevant URL information, including status code, title and meta information, headings, and more. Mostly information you can get from the primary crawl window.
  • In Links – lists every internal page that links to the page you’ve selected, along with the anchor text, alt text, and whether the link is followed or not. This is extremely useful for finding the location of broken links. You can export a list of all in links pointing at 4XX pages, which we’ll get to in a minute.

Right-clicking anywhere in this tab gives you four options: Copy FROM URL, Copy TO URL, Open FROM URL in Browser, Open TO URL in Browser. It should be pretty obvious what each of these does.

  • Out Links – shows every page the selected page links out to, including external pages.
  • Image Info – shows all pages linking to an image, as well as the image alt text.
  • SERP Snippet – mocks up a Google SERP snippet so you can see how altering your title / meta, adding rich snippet information, etc. will affect your site’s appearance in the SERPs.

Bulk Export

This feature is new as of SF 2.5 and allows you to export the following data:

  • All In Links
  • No Response In Links
  • Success (2XX) In Links
  • Redirection (3XX) In Links
  • Client Error (4XX) In Links
  • Server Error (5XX) In Links
  • All Out Links
  • All Anchor Text
  • All Image Alt Text
  • Images Missing Alt Text
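
The Client Error (4XX) In Links export is the one I reach for most: group the broken destinations by the page that links to them and you have a ready-made fix list. A rough sketch, assuming the usual “Source” / “Destination” column headers in the export (verify against your own file):

```python
# Group broken-link destinations by the page that links to them,
# using a Bulk Export > Client Error (4XX) In Links CSV.
import csv
import io
from collections import defaultdict

def broken_links_by_source(csv_text):
    """Map each linking page to the list of broken URLs it points at."""
    by_source = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_source[row["Source"]].append(row["Destination"])
    return dict(by_source)

sample = """Source,Destination,Status Code
/blog/post-1,/old-page,404
/blog/post-1,/missing.jpg,404
/about,/old-page,404
"""
fix_list = broken_links_by_source(sample)
# fix_list == {"/blog/post-1": ["/old-page", "/missing.jpg"],
#              "/about": ["/old-page"]}
```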


Newer SF versions allow for the easy creation of XML sitemaps. Just click Create XML Sitemap under the Sitemaps tab.
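
If you’re curious what that feature actually generates, an XML sitemap is just a sitemaps.org-format `<urlset>` with one `<url><loc>` entry per page. A minimal sketch of building one yourself from a list of crawled URLs:

```python
# Build a minimal sitemaps.org-format XML sitemap from a URL list.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> per URL."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for u in urls:
        url_el = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        loc = ET.SubElement(url_el, "{%s}loc" % SITEMAP_NS)
        loc.text = u
    return ET.tostring(urlset, encoding="unicode")

xml_out = build_sitemap(["https://example.com/", "https://example.com/about"])
```

Real sitemaps can also carry optional `<lastmod>` and `<priority>` elements per URL, which SF lets you configure before exporting.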


Basic SEO Tips Every Web Development Firm Should Follow

As important as search engines like Google are to businesses these days, it kills me to see so many web design / development firms that either don’t know what they’re doing or don’t care enough to follow SEO best practices. Even if you bill yourself as strictly a web development firm, there are certain basics you should know about, because ignoring them can kill your clients’ performance in the search engines.

Two things to note before we get into it:

  • This list is by no means exhaustive, but it should cover most of the basics.
  • I’m not saying every web developer needs to become an SEO expert, but if you’re going to build websites, at least make them with your clients’ best interests at heart.

When transferring domains or changing URLs, make sure you have proper redirects in place.

I just saw this one recently. One of our clients, a vacation rental home company, switched from an old, completely awful looking website to a newer, much more modern one. The new site looks awesome, but we quickly noticed a big problem:

The old site had a ton of dynamically generated URLs, most of which came from the site’s search feature and changed based on certain amenities potential renters could choose from. As their rental index changed, certain pages got dropped and their URLs returned 404 errors. As a result, the old site had a little over 1,000 pages 404ing at any given time. In addition, the site had shifted from a .html extension to a .htm and finally to a .asp, creating new errors at every step of the way.

When the new site launched, the URL structure changed again, but this time it left behind nearly 9,000 404 errors. As of this writing we haven’t had a chance to do a deep dive into the root of the problem, but the main thing we’re concerned about is, obviously, that no redirects were put in place when the site’s URL structure changed. And unfortunately, the site is built on a Microsoft server, which no one on our team really knows much about (Microsoft/IIS servers handle redirects much differently than Linux/Apache servers, using web.config files instead of .htaccess files).
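
For reference, the fix itself is usually small: one rule per moved page (or a pattern rule for a whole URL scheme). The snippets below are a sketch with made-up paths, and the IIS version assumes the URL Rewrite module is installed:

```apache
# Apache (.htaccess): permanently redirect one old page to its new URL
Redirect 301 /old-page.html /new-page/
```

```xml
<!-- IIS (web.config): the same permanent redirect via URL Rewrite -->
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="old-page-redirect" stopProcessing="true">
          <match url="^old-page\.html$" />
          <action type="Redirect" url="/new-page/" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```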

Almost worse than having no redirects in place (which I suspect in the above example, but again I can’t be sure until I see it) is having the wrong type of redirect. We have another client who also just got a new site from a third-party vendor, and this vendor used all 302 redirects for the new URLs instead of 301s. What’s the problem here?

A 301 redirect is a permanent redirect. It tells the search engine crawlers that the URL they are currently crawling has been moved to a new URL, and if they go to that URL they will find the information they are supposed to index.

302 redirects, on the other hand, are temporary redirects, and tell search engines that the move is not permanent. As a result, when a crawler hits the page it has to decide whether to keep the old page or discard it and acknowledge the new one. If the crawler decides to keep the old page, and that page no longer exists or isn’t what you want visitors to see, you could end up losing traffic big time.

Please, please, please avoid duplicate content.

I’m a copywriter first, programmer second, and I see this one ALL. THE. TIME. There are usually two similar cases here:

  • A company partners with a manufacturer, and the manufacturer provides them a website at a low cost, already populated with content. The company is sometimes able to edit this content to match their brand and location, but otherwise most of it stays as is.
  • A web design company offers template-based websites with content already written. Again, this content can often be edited by the site owners, but unless they know a bit about SEO or work with a company who does (and if this were the case, they likely wouldn’t be in this situation in the first place), they don’t touch it.

Now my thought on duplicate content is that for many sites, especially those belonging to local businesses, it will neither kill you nor make you stronger. I say this based on direct observation: I’ve seen companies all over the country rank well in their unique service areas despite having identical copy.

Now just because I don’t hate it doesn’t mean I like it. Because I don’t. At all. In fact, if I were running a web development company, I would do it like this:

The standard web package gets you four pages of copy:

  • Home page
  • About us page
  • Services page
  • Contact page

Each of these would be written with a combination of stock content and specialized information I received from you directly. Any additional pages you wanted written as part of your site launch, from resources pages to specific service pages, would be billed out at a set rate per page. Ongoing work would be billed separately from there.

Obviously the point of this is not to squeeze money out of helpless business owners who don’t know their way around the web. Quite the opposite, in fact—it’s to future-proof their site against Google and other search engines coming down harder on duplicate content than they already are.

Use widgets in as low-risk a manner as possible.

I haven’t seen any effects of this next example, good or bad, but it made me think this morning and that was enough for me to want to include it on the list. The problem is this:

A client uses a third-party company’s widget to display reviews as part of a rotating banner present on every page. These reviews add about 750 words of content to each page, and despite being displayed one at a time in random order via JavaScript, every review still loads in full in every page’s HTML.

Now while I said above that I don’t mind duplicate content across domains for sites in vastly different service areas, I do mind excessive duplicate content at the top of every page within a single site. My problem is this: search engine crawlers don’t always read whole pages. They skim, starting from the top and working their way down until they get a good idea of what the page is about. If all of your pages start the same way, with so much duplicate content that the crawlers never reach the actual page content, you’ll have to rely on your title tags to differentiate pages within the site (the crawlers might not even hit your H1s!). That’s not ideal, especially if you’re making a large investment in content marketing.
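
One way to sanity-check this on your own site is to compare how each page’s opening text overlaps with the shared widget or boilerplate. This is a rough sketch; the sample pages, widget text, and threshold are all made up for illustration:

```python
# Flag pages whose opening words largely match a known boilerplate
# block, i.e. pages where a skimming crawler sees duplicate content
# first. Sample data and threshold are illustrative only.
def shared_prefix_words(a, b):
    """Count how many leading words two texts share."""
    count = 0
    for wa, wb in zip(a.split(), b.split()):
        if wa != wb:
            break
        count += 1
    return count

def flag_duplicate_openings(pages, boilerplate, min_shared=5):
    """Return URLs whose text starts with >= min_shared boilerplate words."""
    return [url for url, text in pages.items()
            if shared_prefix_words(text, boilerplate) >= min_shared]

widget = "Best service ever said one happy customer five stars"
pages = {
    "/a": widget + " Welcome to our plumbing page",
    "/b": widget + " Read about our heating repairs",
    "/c": "Our heating repair team serves the whole metro area",
}
flagged = flag_duplicate_openings(pages, widget)
# flagged == ["/a", "/b"]
```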

I add “stock” content (usually in the form of a “Why Call Client X” section) to the bottom of most pages I write. The reason I put it at the bottom is that I want to front-load my unique content. The last thing you want is to have enough duplicate content across your site’s pages that the crawler stops reading before it gets to your unique content!

Bottom line: even if you’re not an SEO company, please don’t be lazy with web development.

Like I said, this list is by no means exhaustive; it covers only a few of the things web development companies should be doing to ensure their clients’ sites don’t take a dive in Google when their new sites launch. Let me know in the comments if you’ve faced similar or different situations, and be on the lookout for a follow-up post as I come across more.