# Syntopi.com - robots.txt
# Public surfaces are open to all crawlers; auth-gated and editorial
# dashboards are blocked. Sitemap auto-regenerated by scripts/build_sitemap.py.
#
# LLM-training scraper opt-outs are below the general rules. Each
# provider documents its own user-agent string; the conventional
# Disallow: / under each blocks the corpus from being ingested into
# training pipelines. Search-engine indexers (Googlebot, Bingbot,
# DuckDuckBot, etc.) are unaffected: they're separate user-agents
# and continue to fall under the general `User-agent: *` block.
#
# Provider references:
#   GPTBot          - https://platform.openai.com/docs/gptbot
#   ChatGPT-User    - https://platform.openai.com/docs/plugins/bot
#   anthropic-ai    - https://docs.anthropic.com/en/docs/agents-and-tools/web-fetch-tool
#   Claude-Web      - Anthropic legacy crawler (still cited by some sites)
#   Bytespider      - ByteDance / TikTok
#   CCBot           - Common Crawl (used as input to many model trainers)
#   Google-Extended - Google AI training opt-out (separate from Googlebot)
#   Omgilibot       - Webz.io (sells training data to LLM shops)

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /tools/editor/flag-queue.html
Disallow: /tools/editor/my-account.html
Disallow: /tools/editor/my-library.html
Disallow: /tools/editor/review.html

# ── LLM-training opt-outs ────────────────────────────────────────────
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: Google-Extended
Disallow: /

# ── Commercial SEO-backlink crawlers ─────────────────────────────────
# These crawl the public web for their paid B2B products (backlink graphs,
# SEO audits). They don't drive any traffic TO syntopi.com; they crawl us
# so their customers can see who links to us.
# High request rates; many ignore robots.txt anyway, so they're ALSO
# hard-blocked in middleware.js.
# 2026-05-20: SERankingBacklinksBot burned 262 MB in a single minute,
# hitting middleware on every URL.
User-agent: SERankingBacklinksBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: rogerbot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: DataForSeoBot
Disallow: /

User-agent: AwarioRssBot
Disallow: /

User-agent: AwarioSmartBot
Disallow: /

Sitemap: https://syntopi.com/sitemap.xml