Robots.txt Checker
Enter a website URL to view its robots.txt file and check whether the URL is blocked from crawling.
What is robots.txt?
The robots.txt file is a simple text file placed at the root of a website that tells web crawlers, such as those used by Google and Bing, which parts of the site they can or cannot visit. It follows a standard protocol known as the Robots Exclusion Protocol, giving site owners control over how bots interact with their content.
How It Works
When a bot lands on your website, the first thing it looks for is the /robots.txt file. Based on the rules defined inside it, the bot decides which URLs to crawl and which ones to avoid. These rules are written using two primary directives:
- User-agent: Specifies the target crawler (e.g., * for all bots, Googlebot for Google).
- Disallow/Allow: Directs bots to avoid or access certain paths.
User-agent: *
Disallow: /private/
Allow: /public/
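To see how these directives behave in practice, here is a minimal sketch using Python's standard-library urllib.robotparser to parse the example rules above and check two sample paths; example.com and the paths are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, supplied as a plain string.
rules = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch(user_agent, url) reports whether a crawler may request the URL.
print(parser.can_fetch("*", "https://example.com/public/index.html"))   # True
print(parser.can_fetch("*", "https://example.com/private/report.pdf"))  # False
```

Paths matched by no rule are treated as crawlable by default.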
Why It Matters for SEO
The robots.txt file plays a crucial role in how search engines interact with your site. A well-optimized file helps search engines focus on your most important pages while keeping them away from duplicate, irrelevant, or private content. It also helps manage your crawl budget by preventing crawlers from wasting requests on unimportant sections.
How to Create and Use a robots.txt File
You can create a robots.txt file using any plain text editor. Ensure it's encoded in UTF-8 and placed in the root of your domain, for example: https://www.yourdomain.com/robots.txt.
Inside the file, specify which bots you're targeting and which parts of your site they can access. Here's a basic example:
User-agent: *
Disallow: /admin/
Allow: /blog/
Sitemap: https://www.yourdomain.com/sitemap.xml
After creating the file, upload it to your web server. You can test and validate it using tools like Google’s Robots.txt Tester or other SEO analyzers.
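Alongside those tools, you can also run a quick programmatic check once the file is live. The sketch below, assuming Python 3.8+ for site_maps(), downloads the file with the standard-library urllib.robotparser and reports whether sample URLs are crawlable; www.yourdomain.com and the paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder domain: point this at the site you just uploaded the file to.
robots_url = "https://www.yourdomain.com/robots.txt"

parser = RobotFileParser()
parser.set_url(robots_url)
parser.read()  # fetches and parses the live robots.txt over HTTP

# Check a few URLs against the rules for a specific crawler.
for url in (
    "https://www.yourdomain.com/blog/first-post",
    "https://www.yourdomain.com/admin/settings",
):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{verdict}: {url}")

# Lists any Sitemap: entries found in the file (Python 3.8+), or None.
print(parser.site_maps())
```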
Best Practices
- Be precise with paths—use slashes correctly to distinguish between files and folders.
- Don’t block critical assets like CSS or JavaScript files. Search engines need them to render your site correctly.
- Review your file regularly to keep it aligned with your site structure and SEO goals.
- Remember: robots.txt is publicly accessible. Avoid using it to hide sensitive data.
Using robots.txt with WordPress
For WordPress sites, robots.txt can be especially useful for managing crawler behavior and performance. A common setup is to block the /wp-admin/ area while still allowing AJAX requests for functionality.
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Upload this file to your root directory and verify its behavior using a robots.txt testing tool. Keep in mind that any major theme, plugin, or content change might require updates to this file.
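If you want to verify that behavior in a script, here is a minimal sketch with Python's standard-library urllib.robotparser. One caveat: this parser applies the first matching rule in file order rather than the most specific one, so the Allow line is placed before the Disallow line in the sketch; the ordering shown above still works as intended for Googlebot, which prefers the most specific match. The domain is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the WordPress example above, with the Allow rule listed first
# because urllib.robotparser stops at the first rule that matches a path.
rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Placeholder domain for illustration.
print(parser.can_fetch("*", "https://www.yourdomain.com/wp-admin/admin-ajax.php"))  # True
print(parser.can_fetch("*", "https://www.yourdomain.com/wp-admin/options.php"))     # False
```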