robots.txt + ai.txt

Per RFC 9309

Crawler controls including AI-crawler blocks (ClaudeBot, GPTBot, Google-Extended).

What is this, and when do I need it?

What is this?

robots.txt is a plain-text file at the root of your website (/robots.txt) that tells crawlers which paths they may crawl and which they may not. It is standardised as the Robots Exclusion Protocol in RFC 9309.
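When several Allow and Disallow rules in a group match the same path, RFC 9309 uses the most specific (longest) matching pattern, and Allow wins a tie; `*` matches any run of characters and a trailing `$` anchors the end of the path. The Python sketch below illustrates that rule under simplifying assumptions: the group for the crawler has already been selected, its rules arrive as hypothetical `(directive, pattern)` pairs, and percent-encoding normalisation is left out. `is_allowed` is an illustrative helper, not part of any library.

```python
from __future__ import annotations

import re


def _compile(pattern: str) -> re.Pattern[str]:
    """Turn a robots.txt path pattern into a regex: '*' = any run of characters,
    a trailing '$' = end of the path; everything else matches literally."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide one path against the rules of the group already chosen for a crawler.

    rules: (directive, pattern) pairs such as ("disallow", "/admin/").
    The longest matching pattern wins; on a tie, allow beats disallow.
    If no rule matches, the path is allowed.
    """
    if path == "/robots.txt":  # RFC 9309: /robots.txt itself is always allowed
        return True
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty "Disallow:" matches nothing
            continue
        if _compile(pattern).match(path):  # re.match anchors at the start of the path
            length = len(pattern.encode("utf-8"))
            is_allow = directive.lower() == "allow"
            if length > best_len or (length == best_len and is_allow):
                best_len, allowed = length, is_allow
    return allowed


rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
print(is_allowed(rules, "/admin/settings"))    # False
print(is_allowed(rules, "/admin/public/faq"))  # True: the longer Allow pattern wins
print(is_allowed(rules, "/blog/"))             # True: no rule matches
```

Measuring specificity by pattern length keeps the decision independent of rule order in the file, which is the main difference from simple first-match parsers.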

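Complementing it: ai.txt and llms.txt are aimed specifically at AI systems. With ai.txt you signal whether your content may be used as training material for language models; llms.txt gives language models a short, structured overview of your site. The AI crawlers themselves (ClaudeBot, GPTBot, PerplexityBot) and Google's Google-Extended token are addressed by name in robots.txt. These opt-out signals are not legally binding yet, but reputable providers have respected them so far.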

When do I need it?

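robots.txt is a must for any production website. Without it, crawlers treat every path as allowed, including internal paths and admin pages. A few Disallow lines save crawl budget and keep crawlers out of areas that have no place in search results. Two caveats: Disallow prevents crawling, not indexing, so a blocked URL can still appear in search results if other pages link to it (use a noindex directive or authentication for that); and the file only applies to the host it is served from, so a staging subdomain needs its own robots.txt.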
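RFC 9309 (section 2.3.1) defines what a crawler does depending on how its request for /robots.txt ends: a 2xx response is parsed, a 4xx response such as 404 means the file is "unavailable" and the crawler may access everything, and a 5xx response or network error means "unreachable", so the crawler must assume complete disallow (after a long outage, for example 30 days, it may fall back to treating the file as unavailable or keep using a cached copy). A minimal Python sketch of that mapping; `Policy` and `policy_for_status` are hypothetical names, and redirect handling is reduced to an error.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    PARSE = "parse the file and apply its rules"
    ALLOW_ALL = "no restrictions"
    DISALLOW_ALL = "crawl nothing"


def policy_for_status(status: Optional[int]) -> Policy:
    """Map the outcome of GET /robots.txt to crawler behaviour (RFC 9309, section 2.3.1).

    status is the final HTTP status after redirects; None stands for a
    network error such as a timeout or DNS failure.
    """
    if status is None or 500 <= status <= 599:
        # "unreachable": the crawler MUST assume complete disallow
        return Policy.DISALLOW_ALL
    if 200 <= status <= 299:
        return Policy.PARSE
    if 400 <= status <= 499:
        # "unavailable", e.g. 404 for a missing file: the crawler MAY access everything
        return Policy.ALLOW_ALL
    if 300 <= status <= 399:
        raise ValueError("follow the redirect first (RFC 9309 asks for at least five hops)")
    raise ValueError(f"unexpected status code: {status}")


print(policy_for_status(404))  # Policy.ALLOW_ALL
print(policy_for_status(503))  # Policy.DISALLOW_ALL
```

Note that Python's own urllib.robotparser deviates here: it treats 401 and 403 as "disallow all", while RFC 9309 groups every 4xx code under "unavailable".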

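An AI block in robots.txt, plus ai.txt, is recommended as soon as you publish content with IP value (texts, code, data) that you do not want used for AI training; llms.txt is optional and serves a different goal: helping language models understand your site. In practice this works with the major providers; against crawlers that ignore these signals, only server-side blocking or legal remedies help.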

/robots.txt per RFC 9309
# robots.txt per RFC 9309 (Robots Exclusion Protocol)
# Created with Dernium Webtools

User-agent: *
Disallow: /admin/
Disallow: /api/

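# AI crawler block. Tokens per vendor documentation as of early 2026.
# The list needs ongoing maintenance because vendors change their tokens.
# ChatGPT-User, Perplexity-User and MistralAI-User fetch pages on a user's
# request, and facebookexternalhit also generates link previews; remove them
# if you want to keep those features.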
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: Bytespider
User-agent: Amazonbot
User-agent: Applebot-Extended
User-agent: cohere-ai
User-agent: cohere-training-data-crawler
User-agent: YouBot
User-agent: Meta-ExternalAgent
User-agent: Meta-ExternalFetcher
User-agent: FacebookBot
User-agent: facebookexternalhit
User-agent: ImagesiftBot
User-agent: Diffbot
User-agent: Webzio-Extended
User-agent: omgili
User-agent: omgilibot
User-agent: Timpibot
User-agent: PetalBot
User-agent: AI2Bot
User-agent: Andibot
User-agent: Kangaroo Bot
User-agent: Velen Crawler
User-agent: MistralAI-User
User-agent: DuckAssistBot
User-agent: iaskspider
User-agent: Sidetrade indexer bot
User-agent: ICC-Crawler
User-agent: ISSCyberRiskCrawler
Disallow: /

Sitemap: https://example.com/sitemap.xml

Free, no warranty (best effort). Generated and inspected values are non-binding; we accept no liability for erroneous or incomplete results or configurations. Use and verification are your own responsibility; please test before production use.
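Before deploying, the generated rules can be smoke-tested with Python's standard-library urllib.robotparser. The sketch parses a shortened excerpt of the file above (three of the AI tokens only; paste your full file instead) and checks a few illustrative URLs on example.com. One caveat: urllib.robotparser applies rules in file order, matches user-agent tokens as substrings and does not interpret `*` wildcards inside paths, so it is a sanity check rather than a full RFC 9309 implementation.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the generated robots.txt; replace with your full file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /api/

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

checks = [
    ("Googlebot", "https://example.com/blog/post"),    # falls back to the * group
    ("Googlebot", "https://example.com/admin/login"),  # blocked by Disallow: /admin/
    ("GPTBot", "https://example.com/blog/post"),       # AI group: Disallow: /
    ("ClaudeBot", "https://example.com/"),
]
for agent, url in checks:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent:10} {url:34} {verdict}")

print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```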

Extras: ai.txt and llms.txt

ai.txt, proposed by Spawning, is an opt-out or opt-in signal for AI training pipelines, set per media type (text, images, audio, video, code). llms.txt, proposed at llmstxt.org, is a short Markdown briefing that language models can read to understand the structure of your site.

/ai.txt per Spawning
# ai.txt per Spawning (https://spawning.ai/)
# Opt-out / opt-in signal for AI training pipelines, separate from robots.txt.
# Created with Dernium Webtools

User-Agent: *
Disallow: image, text, audio, video, code

# Domain: example.com
# Serve at https://<domain>/ai.txt

/llms.txt per llmstxt.org
# Example Ltd
> Short description of the site for language models.

## Important content

- [Home](https://example.com/)
- [Imprint](https://example.com/imprint)
- [Contact](https://example.com/contact)

<!-- Created with Dernium Webtools -->
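The llms.txt format from llmstxt.org is deliberately simple: an H1 with the site name, a blockquote summary, and H2 sections whose list items are Markdown links, optionally followed by `: notes`. As a sketch for checking that a generated file has this shape, here is a hypothetical `parse_llms_txt` helper; it ignores free-form paragraphs and link notes and is not an official llmstxt.org tool.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# "- [Title](URL)", optionally followed by ": notes"
LINK = re.compile(r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::.*)?$")


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    sections: dict[str, list[tuple[str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Collect the H1 title, the first blockquote line and the links of each H2 section."""
    doc = LlmsTxt()
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and not doc.summary:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            section = line[3:].strip()
            doc.sections[section] = []
        elif section is not None:
            match = LINK.match(line)
            if match:  # other lines (paragraphs, HTML comments) are skipped
                doc.sections[section].append((match["title"], match["url"]))
    return doc


sample = """# Example Ltd
> Short description of the site for language models.

## Important content

- [Home](https://example.com/)
- [Imprint](https://example.com/imprint)
"""
doc = parse_llms_txt(sample)
print(doc.title)  # Example Ltd
print(doc.sections["Important content"])
```

An empty `title` after parsing is the clearest sign of a malformed file, since the H1 is the only required element.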

Inspect an existing robots.txt

Fetches /robots.txt from the given domain and displays its content.
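To reproduce such an inspection locally, the file can be fetched with Python's urllib. The sketch below is an assumption-laden illustration, not this tool's code: `robots_url` and `fetch_robots` are hypothetical helpers, and the User-Agent string is a placeholder. Since robots.txt lives at the root of each scheme, host and port combination, the URL is rebuilt from those parts only.

```python
from __future__ import annotations

from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def robots_url(page_url: str) -> str:
    """robots.txt sits at the root of each scheme + host + port combination."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(domain: str, timeout: float = 10.0) -> tuple[int | None, str]:
    """Return (HTTP status, body) for https://<domain>/robots.txt; status None on network errors."""
    request = Request(
        robots_url(f"https://{domain}/"),
        headers={"User-Agent": "robots-inspect-sketch/0.1"},  # hypothetical UA string
    )
    try:
        with urlopen(request, timeout=timeout) as response:  # follows redirects
            # RFC 9309 requires UTF-8; undecodable bytes are replaced, not fatal
            return response.status, response.read().decode("utf-8", errors="replace")
    except HTTPError as err:  # 4xx / 5xx responses still carry a status code
        return err.code, ""
    except OSError:  # DNS failure, refused connection, timeout, TLS error
        return None, ""


print(robots_url("https://example.com/some/page?x=1"))  # https://example.com/robots.txt
```

Usage: `status, body = fetch_robots("example.com")` returns the status code alongside the text, so a 404 (no file, everything allowed) can be told apart from an empty file.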


Server path: this inspection does not run locally in your browser; our server performs the DNS lookup or HTTPS request. We do not log the queried domain or the result. Rate limit: 12 requests per minute per IPv4 address or IPv6 /64 subnet.
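Limits are counted per IPv6 /64 rather than per address because a single host or household typically controls a whole /64 and could otherwise rotate addresses to sidestep the limit. A minimal in-memory sketch of such a limiter, assuming a sliding one-minute window; `client_key` and `SlidingWindowLimiter` are hypothetical names, and this is not the service's actual implementation.

```python
from __future__ import annotations

import ipaddress
import time
from collections import defaultdict, deque

LIMIT = 12      # requests allowed per key ...
WINDOW = 60.0   # ... within this many seconds


def client_key(addr: str) -> str:
    """IPv4 clients are counted per address, IPv6 clients per /64 network."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        return str(ipaddress.ip_network(f"{ip}/64", strict=False))
    return str(ip)


class SlidingWindowLimiter:
    """At most `limit` requests per key within any `window` seconds."""

    def __init__(self, limit: int = LIMIT, window: float = WINDOW) -> None:
        self.limit = limit
        self.window = window
        self.hits: defaultdict[str, deque[float]] = defaultdict(deque)

    def allow(self, addr: str, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self.hits[client_key(addr)]
        while hits and now - hits[0] >= self.window:  # forget requests older than the window
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
results = [limiter.allow("2001:db8::1", now=0.0) for _ in range(12)]
print(all(results))                              # True: 12 requests fit
print(limiter.allow("2001:db8::ffff", now=1.0))  # False: same /64, limit reached
print(limiter.allow("192.0.2.7", now=1.0))       # True: separate IPv4 bucket
```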