What is robots.txt?
The robots.txt file is a simple text file that tells bots and crawlers which parts of your website they can visit. Think of it like a sign at the entrance of your website that says "visitors welcome" or "private area." Every bot that follows the rules of the Robots Exclusion Protocol checks this file before crawling your site.
For AI search engines, robots.txt is especially important. It controls whether AI bots like GPTBot (ChatGPT), ClaudeBot (Claude), and PerplexityBot can access your content for training and search results. Configuring it correctly lets you decide which AI bots may use your content, and for what.
Your robots.txt file must be located at yoursite.com/robots.txt. Bots won't look for it anywhere else. If you don't have this file, bots assume they can crawl everything.
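For example, the simplest valid robots.txt allows every bot to crawl everything:
User-agent: *
Disallow:
An empty Disallow value means nothing is disallowed, so this file is equivalent to having no restrictions at all.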
Why robots.txt Matters for AI
AI bots are different from traditional search engine crawlers. They visit your site for two main reasons:
Training Data Collection
Some AI companies use web content to train their language models. They crawl millions of pages to build knowledge bases.
You can control whether your content is used for training by blocking specific bots in robots.txt.
Search Result Generation
AI search engines crawl your content to include it in their search results and answer generation.
Allowing these bots helps your content appear in AI-generated answers, improving your GEO-Score.
The key is finding the right balance. You want AI search engines to access your content for visibility, but you might want to block certain areas or specific training bots. Your robots.txt file gives you this control.
Major AI Bot User-Agents
Each AI bot identifies itself with a unique user-agent string. Here are the most important ones:
GPTBot
Operator: OpenAI
User-agent: GPTBot
Used by: ChatGPT, OpenAI search features
GPTBot crawls content that OpenAI uses for model training and its web features. Blocking it keeps your content out of future training data and may reduce its visibility in ChatGPT's web search results.
ClaudeBot
Operator: Anthropic
User-agent: ClaudeBot
Used by: Claude AI, Anthropic's AI assistant
ClaudeBot accesses web content to provide current information in Claude's responses. It respects robots.txt rules strictly.
PerplexityBot
Operator: Perplexity
User-agent: PerplexityBot
Used by: Perplexity AI search engine
PerplexityBot powers one of the most popular AI search engines. Allowing it improves visibility in Perplexity search results.
Google-Extended
Operator: Google
User-agent: Google-Extended
Used by: Google Gemini AI training
This is separate from Googlebot. Google-Extended collects data for training Gemini. Blocking it doesn't affect normal Google Search indexing.
FacebookBot
Operator: Meta
User-agent: FacebookBot
Used by: Meta AI, Facebook link previews
FacebookBot crawls for link previews and Meta's AI features. It's important for social media visibility.
For a complete list of AI bot user-agents with technical details, see our AI Bot User-Agents Reference.
Basic robots.txt Syntax
The robots.txt file uses a simple syntax with just a few commands:
User-agent
Specifies which bot the following rules apply to. Use * for all bots.
User-agent: GPTBot
User-agent: *
Disallow
Tells bots NOT to access specific paths. Use / to block everything.
Disallow: /admin/
Disallow: /private/
Disallow: /
Allow
Tells bots they CAN access specific paths. Use this to override a broader Disallow rule.
Disallow: /admin/
Allow: /admin/public/
Crawl-delay
Sets a delay in seconds between bot requests. Not supported by all bots.
Crawl-delay: 10
Sitemap
Points bots to your XML sitemap for better crawling efficiency.
Sitemap: https://yoursite.com/sitemap.xml
Common robots.txt Configurations
Here are ready-to-use configurations for common scenarios:
Allow All AI Bots (Recommended for Most Sites)
This configuration welcomes all AI search engines while protecting admin areas:
# Allow all AI bots to crawl
User-agent: *
Allow: /
# Block private areas for all bots
Disallow: /admin/
Disallow: /api/
Disallow: /login/
Disallow: /dashboard/

# Sitemap location
Sitemap: https://yoursite.com/sitemap.xml
Block AI Training, Allow AI Search
Block bots used for training AI models while allowing search bots:
# Block training bots
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Allow search bots
User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

# Default rules for other bots
User-agent: *
Allow: /
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
Selective Content Access
Allow AI bots to access blog content but not product pages:
# AI bots can access blog
User-agent: GPTBot
Allow: /blog/
Disallow: /

User-agent: ClaudeBot
Allow: /blog/
Disallow: /

# Default rules
User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
Block All AI Bots
If you want to opt out of AI search entirely (not recommended for visibility):
# Block all known AI bots
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: CCBot
Disallow: /

# Allow traditional search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
Best Practices
Do These
✓ Place robots.txt in your root directory
✓ Use one rule per line
✓ Include your sitemap location
✓ Test your robots.txt after changes
✓ Allow AI bots for better GEO visibility
✓ Keep the file under 500 KB
Avoid These
✗ Using robots.txt for security
✗ Blocking all bots without reason
✗ Using regular expressions (not supported)
✗ Forgetting to update after site changes
✗ Blocking CSS/JS needed for page rendering
✗ Creating multiple robots.txt files
Testing Your robots.txt
Always test your robots.txt file before deploying it. Use these methods:
Manual Testing
Visit yoursite.com/robots.txt in your browser to verify the following (a scripted version of this check is sketched after the list):
- The file is accessible and loads correctly
- There are no syntax errors or typos
- All user-agent names are spelled correctly
- Paths match your actual site structure
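If you prefer to script this check, Python's standard-library robots.txt parser can report whether a given bot is allowed to fetch a given URL. The URLs below are placeholders for your own pages; note that this parser does not understand wildcard patterns, so complex rules still need a dedicated validator.
# Minimal sketch: check which URLs a given bot may fetch, using Python's
# built-in robots.txt parser. Replace the placeholder URLs with your own.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://yoursite.com/robots.txt")
rp.read()  # fetches and parses the live file

checks = [
    ("GPTBot", "https://yoursite.com/blog/some-post/"),
    ("GPTBot", "https://yoursite.com/admin/"),
    ("PerplexityBot", "https://yoursite.com/"),
]
for agent, url in checks:
    verdict = "allowed" if rp.can_fetch(agent, url) else "blocked"
    print(f"{agent} -> {url}: {verdict}")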
Google Search Console
Use Google Search Console's robots.txt report (the standalone robots.txt Tester has been retired):
- Go to Google Search Console
- Open Settings → robots.txt to view the robots.txt report
- Review when Google last fetched the file and any parsing errors or warnings
- Use the URL Inspection tool to check whether specific URLs are blocked by your rules
Online Validators
Use third-party robots.txt validators:
- Robots.txt Checker: Check syntax and coverage
- Bloffee GEO Analyzer: Validates robots.txt as part of full site analysis
- SEO Tools: Many SEO platforms include robots.txt testing
Server Log Monitoring
Check your server logs to verify bot behavior (a small log-scanning sketch follows this list):
- Look for AI bot user-agent strings in access logs
- Verify bots are respecting your rules
- Identify any unauthorized crawling
- Monitor crawl frequency and patterns
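As a starting point, a short script can tally AI bot hits straight from an access log. The log path and the combined log format assumed below are placeholders; adjust them for your server.
# Minimal sketch: count requests from known AI bots in a web server access log.
# Assumes a combined-format log at the path below -- adjust for your setup.
import re
from collections import Counter

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "FacebookBot", "CCBot"]
hits = Counter()

with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        for bot in AI_BOTS:
            if bot in line:
                # Grab the request path from the quoted request line, e.g. "GET /blog/ HTTP/1.1"
                match = re.search(r'"[A-Z]+ (\S+)', line)
                path = match.group(1) if match else "?"
                hits[(bot, path)] += 1
                break

for (bot, path), count in hits.most_common(20):
    print(f"{count:6d}  {bot:16} {path}")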
Advanced Configurations
Rate Limiting with Crawl-delay
Control how fast bots crawl your site to reduce server load:
User-agent: GPTBot
Crawl-delay: 10
Allow: /

User-agent: ClaudeBot
Crawl-delay: 5
Allow: /
Note: Not all bots support crawl-delay. It's more reliable to use server-side rate limiting.
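If you do want to throttle AI bots at the server level, the idea is simply to track when each bot last hit your site and reject requests that arrive too quickly. Below is a minimal sketch; Flask, the bot list, and the five-second interval are assumptions, so adapt the approach to whatever stack you actually run.
# Minimal sketch: per-bot rate limiting in a Flask app (assumed framework).
import time
from flask import Flask, request, abort

app = Flask(__name__)

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")  # user-agent substrings to throttle
MIN_INTERVAL = 5.0   # seconds required between requests from the same bot
last_seen = {}       # bot name -> timestamp of the last request we served

@app.before_request
def throttle_ai_bots():
    ua = request.headers.get("User-Agent", "")
    for bot in AI_BOTS:
        if bot in ua:
            now = time.time()
            if now - last_seen.get(bot, 0.0) < MIN_INTERVAL:
                abort(429)  # Too Many Requests
            last_seen[bot] = now
            return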
Wildcard Patterns
Use wildcards to match multiple paths (supported by most modern bots):
User-agent: *
# Block all PDF files
Disallow: /*.pdf$
# Block all URLs with query parameters
Disallow: /*?
# Block all admin pages
Disallow: /*/admin/
Multiple Sitemaps
List multiple sitemaps for different content types:
Sitemap: https://yoursite.com/sitemap-pages.xml
Sitemap: https://yoursite.com/sitemap-blog.xml
Sitemap: https://yoursite.com/sitemap-products.xml
Sitemap: https://yoursite.com/sitemap-images.xml
robots.txt Quick Tips
- Start with allowing all AI search bots for maximum visibility
- Only block specific bots if you have a strong reason
- Always include your sitemap location
- Test changes before deploying to production
- Monitor bot access in your server logs
- Update robots.txt when you change site structure
- Remember: robots.txt is not a security measure
Impact on Your GEO-Score
Your robots.txt configuration directly affects your AI Bot Access score, which is a key component of your overall GEO-Score.
Bloffee checks your robots.txt for:
- Whether AI bots can access your content
- Proper syntax and formatting
- Accidental blocking of important pages
- Sitemap declaration
- Overly restrictive rules that hurt visibility
A well-configured robots.txt that welcomes AI bots can improve your GEO-Score by 10-15 points. Blocking important bots can reduce your score by 20-30 points or more.
Related Topics
- AI Bot User-Agents Reference
Complete list of AI bot user-agents with technical details
- AI Bot Access
Learn how bot access affects your GEO-Score
- Meta Tags Complete Guide
Configure meta robots tags for additional bot control