## Summary

Once [comfy-router#22](https://github.com/Comfy-Org/comfy-router/pull/22) ships, `comfy.org/sitemap-index.xml` will return a unified index aggregating both the website (38 URLs) and workflow-templates sitemaps.

This PR:

1. Reverts `Sitemap:` back to `/sitemap-index.xml` (it was changed to `/sitemap-0.xml` in #11802 as a workaround for the 404).
2. Adds explicit allow records for 21 search and AI/LLM crawlers (GPTBot, ChatGPT-User, OAI-SearchBot, Google-Extended, ClaudeBot, Claude-Web, anthropic-ai, PerplexityBot, Perplexity-User, Applebot-Extended, Bytespider, Amazonbot, CCBot, Meta-ExternalAgent, Meta-ExternalFetcher, Diffbot, etc.).
3. Adds `Disallow:` rules for `/_astro/`, `/_website/`, and `/_vercel/` — Vercel build artifacts that aren't useful to crawl.

## Why granular UAs

Stacked `User-agent:` records (per [RFC 9309 §2.2](https://datatracker.ietf.org/doc/html/rfc9309#section-2.2)) share one rule block. Listing each bot explicitly:

- Signals intent to AI bots that check robots.txt for their UA before crawling more aggressively.
- Surfaces our crawl policy clearly to anyone inspecting the file.
- Lets us add per-bot `Disallow:` rules in the future without restructuring.

## Merge order

⚠️ **Do NOT merge until comfy-router#22 is deployed to production.** Until then, `/sitemap-index.xml` returns 404 and this PR would re-break the issue that #11802 patched.

Verification:

```bash
curl -sI https://comfy.org/sitemap-index.xml
# expect: HTTP/2 200, x-served-by: worker-sitemap-index
```

Once that returns 200, this is safe to merge.
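To illustrate the stacked-record layout described above, the relevant records might look like the sketch below. This is an illustration only, not the actual file; the full UA list and paths are in the diff:

```
# AI / LLM crawlers — explicitly allowed (RFC 9309 §2.2: stacked
# User-agent lines share the rule block that follows them)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Allow: /

# Vercel build artifacts — not useful to crawl
User-agent: *
Disallow: /_astro/
Disallow: /_website/
Disallow: /_vercel/

Sitemap: https://comfy.org/sitemap-index.xml
```

Splitting any bot out into its own `User-agent:` group later only requires moving that one line above a new rule block.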
## Verification (after merge + deploy)

```bash
# robots.txt is served and points at the unified index
curl -s https://comfy.org/robots.txt | grep '^Sitemap:'
# → Sitemap: https://comfy.org/sitemap-index.xml

# Each AI crawler can fetch it
for ua in 'GPTBot/1.0' 'ClaudeBot/1.0' 'PerplexityBot/1.0' 'Google-Extended' 'Applebot-Extended'; do
  curl -s -o /dev/null -w "$ua → %{http_code}\n" -A "$ua" https://comfy.org/robots.txt
done

# Sitemap is reachable from robots.txt
SITEMAP=$(curl -s https://comfy.org/robots.txt | awk -F': ' '/^Sitemap:/ {print $2}')
curl -s "$SITEMAP" | xmllint --noout - && echo "valid XML"
```

## Linear / closes

- Closes FE-437 (AI crawler rules)
- Updates FE-432 — the robots.txt change in #11802 was a workaround that's no longer needed once #22 ships
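The `awk` extraction in the verification script can be sanity-checked offline against a literal robots.txt body, without hitting production. A minimal sketch (the robots.txt body below is made up for the test):

```shell
#!/bin/sh
# Hypothetical robots.txt body, for offline testing only
robots='User-agent: GPTBot
Allow: /

Sitemap: https://comfy.org/sitemap-index.xml'

# Same extraction as the verification script: split on ": " and take
# the second field of the Sitemap line. The "://" inside the URL is not
# followed by a space, so it does not trigger an extra split.
sitemap=$(printf '%s\n' "$robots" | awk -F': ' '/^Sitemap:/ {print $2}')
echo "$sitemap"
# → https://comfy.org/sitemap-index.xml
```

If the site ever emits multiple `Sitemap:` lines (robots.txt allows that), this prints one URL per line, which the verification loop would need to account for.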