Christian Byrne e831daae59 feat(website): point robots.txt at /sitemap-index.xml + AI crawler rules (#11823)
## Summary

Once
[comfy-router#22](https://github.com/Comfy-Org/comfy-router/pull/22)
ships, `comfy.org/sitemap-index.xml` will return a unified index
aggregating both the website sitemap (38 URLs) and the
workflow-templates sitemap.
This PR:

1. Reverts the `Sitemap:` directive to `/sitemap-index.xml` (#11802
pointed it at `/sitemap-0.xml` as a workaround for the 404).
2. Adds explicit `Allow` records for 21 search and AI/LLM crawlers
(GPTBot, ChatGPT-User, OAI-SearchBot, Google-Extended, ClaudeBot,
Claude-Web, anthropic-ai, PerplexityBot, Perplexity-User,
Applebot-Extended, Bytespider, Amazonbot, CCBot, Meta-ExternalAgent,
Meta-ExternalFetcher, Diffbot, etc.).
3. Adds `Disallow:` rules for `/_astro/`, `/_website/`, and
`/_vercel/`, build artifacts that aren't useful to crawl (sketched
below).
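For reference, a sketch of the resulting file shape. This is
illustrative only, not the literal diff; the full crawler list and the
exact grouping of the `Disallow` rules are in the PR:

```txt
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

# ...one record per crawler...

User-agent: *
Allow: /
Disallow: /_astro/
Disallow: /_website/
Disallow: /_vercel/

Sitemap: https://comfy.org/sitemap-index.xml
```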

## Why granular UAs

Stacked `User-agent:` lines (per [RFC 9309
§2.2](https://datatracker.ietf.org/doc/html/rfc9309#section-2.2)) form a
single group sharing one rule block, so every bot in the stack gets
identical treatment. Listing each bot explicitly instead (see the sketch
after this list):

- Signals intent to AI bots that look for their UA in robots.txt before
crawling more aggressively.
- Surfaces our crawl policy clearly to anyone inspecting the file.
- Lets us add per-bot `Disallow` rules in the future without
restructuring.
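For contrast, a minimal sketch of the stacked form this PR avoids: both
UAs below share the single rule block, so a rule can't later be scoped
to just one of them without restructuring the group.

```txt
# Stacked: one group, one shared rule block (RFC 9309 §2.2)
User-agent: GPTBot
User-agent: ClaudeBot
Allow: /
```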

## Merge order

⚠️ **Do NOT merge until comfy-router#22 is deployed to production.**
Until then, `/sitemap-index.xml` returns 404, and merging this PR would
reintroduce the issue #11802 patched. To verify the router change is
live:

```bash
curl -sI https://comfy.org/sitemap-index.xml
# expect: HTTP/2 200, x-served-by: worker-sitemap-index
```

Once that returns 200, this is safe to merge.

## Verification (after merge + deploy)

```bash
# robots.txt is served and points at the unified index
curl -s https://comfy.org/robots.txt | grep '^Sitemap:'
# → Sitemap: https://comfy.org/sitemap-index.xml

# Each AI crawler can fetch it (expect 200 for every UA)
for ua in 'GPTBot/1.0' 'ClaudeBot/1.0' 'PerplexityBot/1.0' 'Google-Extended' 'Applebot-Extended'; do
  curl -s -o /dev/null -w "$ua → %{http_code}\n" -A "$ua" https://comfy.org/robots.txt
done

# Sitemap is reachable from robots.txt
SITEMAP=$(curl -s https://comfy.org/robots.txt | awk -F': ' '/^Sitemap:/ {print $2}')
curl -s "$SITEMAP" | xmllint --noout - && echo "valid XML"
```
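Optionally, confirm the index aggregates both child sitemaps. This
assumes the index is standard sitemap-index XML with one `<loc>` entry
per child sitemap:

```bash
# List the child sitemaps referenced by the index; the website and
# workflow-templates sitemaps should both appear
curl -s https://comfy.org/sitemap-index.xml | grep -o '<loc>[^<]*</loc>'
```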

## Linear / closes

- Closes FE-437 (AI crawler rules)
- Updates FE-432: the robots.txt change in #11802 was a workaround
that's no longer needed once comfy-router#22 ships
