
robots.txt Generator & Tester

Generate robots.txt files with custom crawler rules and sitemaps, or test an existing robots.txt to see if specific URLs are allowed or blocked.


Embed This Calculator

Add this calculator to your website for free. Copy the single line of code below and paste it into your HTML. The calculator auto-resizes to fit your page.

<script src="https://calchammer.com/embed.js" data-calculator="robots-txt-generator" data-category="everyday"></script>
data-theme: "light", "dark", or "auto"
data-values: Pre-fill inputs, e.g. "amount=1000"
data-max-width: Max width, e.g. "600px"
data-border: "true" or "false"
Or use an iframe instead
<iframe src="https://calchammer.com/embed/everyday/robots-txt-generator" width="100%" height="500" style="border:none;border-radius:12px;" title="Robots Txt Calculator"></iframe>


Understanding robots.txt

The robots.txt file is the first file that well-behaved web crawlers check when visiting a site. It lives at the root of your domain and uses the Robots Exclusion Protocol to communicate which parts of your site crawlers should and should not access. While the standard has been in use since 1994, Google formalized its interpretation in a detailed specification and released an open-source parser in 2019. Every website that wants to control how search engines crawl its pages needs a properly configured robots.txt file.

The file uses a straightforward syntax. Each section begins with a User-agent directive specifying which crawler the rules apply to. An asterisk matches all crawlers. Disallow directives list paths that should not be crawled, while Allow directives create exceptions within broader disallow rules. The Sitemap directive points crawlers to your XML sitemap, helping them discover all the pages on your site. Lines beginning with a hash character are comments and are ignored by crawlers.
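Putting these directives together, a minimal file (the paths and sitemap URL here are illustrative) might look like this:

```text
# Comments start with a hash and are ignored by crawlers
User-agent: *
Disallow: /admin/
Allow: /admin/help/
Sitemap: https://www.example.com/sitemap.xml
```

The Allow line carves an exception out of the broader Disallow rule, so /admin/help/ remains crawlable while the rest of /admin/ is blocked.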


How Web Crawlers Use robots.txt

When a search engine crawler first visits your domain, it requests the /robots.txt file before crawling any other page. If the file exists and contains rules for that crawler's user agent, the crawler follows those rules. If the file does not exist or returns a 404, the crawler assumes all pages are allowed. If the file returns a 5xx server error, most crawlers will temporarily stop crawling the site and retry later, treating an unreadable robots.txt as a reason for caution rather than as permission to crawl. Google caches robots.txt files and refreshes them at least once a day.
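The status-code behavior described above can be sketched as a small decision function. This is a simplified illustration of the convention, not any crawler's actual implementation:

```python
def robots_policy(status_code: int) -> str:
    """Map the HTTP status of a /robots.txt fetch to a crawl decision.

    A simplified sketch of common crawler behavior; real crawlers add
    caching, redirect handling, and retry timers on top of this.
    """
    if status_code == 200:
        return "parse"      # file exists: apply its rules
    if 400 <= status_code < 500:
        return "allow-all"  # missing file (e.g. 404): all pages crawlable
    # 5xx or anything unexpected: pause crawling and retry later
    return "back-off"
```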

Common robots.txt Rules

The most common use case is blocking crawlers from admin areas, login pages, internal search results pages, and duplicate content. For example, Disallow: /admin/ keeps crawlers out of your administration panel, and Disallow: /search keeps your site's internal search results pages from being crawled and surfacing as thin, duplicate content. Blocking PDF files, print-friendly pages, or staging environments is another frequent application. It is important to remember that robots.txt controls crawling, not indexing: a page blocked by robots.txt can still appear in search results if other pages link to it.
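A file combining these common rules (the paths are illustrative) could look like:

```text
User-agent: *
Disallow: /admin/    # administration panel
Disallow: /login     # login page
Disallow: /search    # internal search results: thin, duplicate content
Disallow: /print/    # print-friendly duplicates
```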

Crawling vs. Indexing

A common misconception is that robots.txt can prevent a page from appearing in search results. Blocking a page in robots.txt prevents crawlers from accessing its content, but the URL may still appear in search results if external sites link to it. Google will show the URL with a note that the description is not available because the page is blocked from crawling. To truly prevent indexing, use the noindex meta tag or the X-Robots-Tag HTTP header instead. Critically, the page must be crawlable for Google to see the noindex directive, so do not block a page in robots.txt if you want to use noindex on it.
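For an HTML page, the noindex directive mentioned above goes in the document head. Note the comment: this only works if the page itself is crawlable.

```html
<!-- The page must NOT be blocked in robots.txt,
     or Googlebot will never see this tag -->
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the equivalent is the HTTP response header X-Robots-Tag: noindex.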

Testing Your robots.txt

Before deploying a robots.txt file, always test it to make sure it does not accidentally block important pages. Google Search Console's robots.txt report shows you exactly how Googlebot fetches and interprets your rules. You can also use the tester tab in this tool to paste your robots.txt content and check whether specific URLs are allowed or blocked. Test your most important pages, your sitemap URL, and any pages you specifically want to block to verify the rules are working as intended.
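A quick local check is also possible with Python's standard-library parser. The file content and URLs below are illustrative; note that urllib.robotparser applies rules in file order (first match wins) rather than Google's longest-match precedence, and does not support * or $ wildcards:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check the pages you care about against the rules
print(parser.can_fetch("*", "https://www.example.com/blog/post"))   # True
print(parser.can_fetch("*", "https://www.example.com/admin/"))      # False
print(parser.can_fetch("*", "https://www.example.com/admin/help"))  # True
```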

Frequently Asked Questions

What is a robots.txt file?

A plain text file at the root of a website that tells crawlers which pages they may or may not access. It follows the Robots Exclusion Protocol using directives like User-agent, Disallow, Allow, and Sitemap.

Where does the robots.txt file go?

It must be at the root of the domain at the exact path /robots.txt. Each subdomain needs its own file. It must be served as text/plain in UTF-8 encoding.

Can robots.txt block all crawlers?

A file containing "User-agent: *" followed by "Disallow: /" blocks all well-behaved crawlers. However, malicious bots may ignore it. For true access restriction, use server-side authentication.
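The complete block-everything file is just two lines:

```text
User-agent: *
Disallow: /
```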

Does Google respect robots.txt?

Yes. Googlebot checks robots.txt before crawling any URL. However, blocked pages may still appear in search results if linked from other sites. Use noindex to prevent indexing.

Can I block specific pages with robots.txt?

Yes. Use "Disallow: /path/" for directories or "Disallow: /page.html" for specific pages. Wildcard patterns such as Disallow: /*.pdf$ can match URLs ending in .pdf (the * and $ operators are supported by major crawlers like Google and Bing, though they are not part of the original standard). Allow directives create exceptions.
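For example, a pattern-based rule set with an exception (paths are illustrative):

```text
User-agent: *
Disallow: /*.pdf$          # any URL ending in .pdf
Disallow: /private/        # the whole /private/ directory
Allow: /private/faq.html   # exception within the blocked directory
```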


Disclaimer: This calculator is for informational and educational purposes only. Results are estimates and should not be considered professional advice. Consult a qualified professional before making decisions based on these calculations. See our full Disclaimer.