Files
ComfyUI_frontend/apps/website
Christian Byrne e831daae59 feat(website): point robots.txt at /sitemap-index.xml + AI crawler rules (#11823)
## Summary

Once
[comfy-router#22](https://github.com/Comfy-Org/comfy-router/pull/22)
ships, `comfy.org/sitemap-index.xml` will return a unified index
aggregating both the website (38 URLs) and workflow-templates sitemaps.
This PR:

1. Reverts `Sitemap:` back to `/sitemap-index.xml` (was `/sitemap-0.xml`
in #11802 as a workaround for the 404).
2. Adds explicit allow records for 21 search and AI/LLM crawlers
(GPTBot, ChatGPT-User, OAI-SearchBot, Google-Extended, ClaudeBot,
Claude-Web, anthropic-ai, PerplexityBot, Perplexity-User,
Applebot-Extended, Bytespider, Amazonbot, CCBot, Meta-ExternalAgent,
Meta-ExternalFetcher, Diffbot, etc.).
3. Adds `Disallow:` for `/_astro/`, `/_website/`, `/_vercel/` — Vercel
build artifacts that aren't useful to crawl.

## Why granular UAs

Stacked `User-agent:` records (per [RFC 9309
§2.2](https://datatracker.ietf.org/doc/html/rfc9309#section-2.2)) share
one rule block. Listing each bot explicitly:

- Signals intent to AI bots that look for their UA in robots.txt before
crawling more aggressively.
- Surfaces our crawl policy clearly to anyone inspecting the file.
- Lets us add per-bot Disallows in future without restructuring.

## Merge order

⚠️ **Do NOT merge until comfy-router#22 is deployed to production.**
Until then, `/sitemap-index.xml` returns 404 and this PR would re-break
the issue PR #11802 patched. Verification:

```bash
curl -sI https://comfy.org/sitemap-index.xml
# expect: HTTP/2 200, x-served-by: worker-sitemap-index
```

Once that returns 200, this is safe to merge.

## Verification (after merge + deploy)

```bash
# robots.txt is served and points at the unified index
curl -s https://comfy.org/robots.txt | grep '^Sitemap:'
# → Sitemap: https://comfy.org/sitemap-index.xml

# Each AI crawler can fetch it
for ua in 'GPTBot/1.0' 'ClaudeBot/1.0' 'PerplexityBot/1.0' 'Google-Extended' 'Applebot-Extended'; do
  curl -s -o /dev/null -w "$ua → %{http_code}\n" -A "$ua" https://comfy.org/robots.txt
done

# Sitemap is reachable from robots.txt
SITEMAP=$(curl -s https://comfy.org/robots.txt | awk -F': ' '/^Sitemap:/ {print $2}')
curl -s "$SITEMAP" | xmllint --noout - && echo "valid XML"
```

## Linear / closes

- Closes FE-437 (AI crawler rules)
- Updates FE-432 — the robots.txt change in #11802 was a workaround
that's no longer needed once #22 ships

┆Issue is synchronized with this [Notion
page](https://app.notion.com/p/PR-11823-feat-website-point-robots-txt-at-sitemap-index-xml-AI-crawler-rules-3546d73d3650811dbceedd06c00db444)
by [Unito](https://www.unito.io)
2026-05-01 21:04:45 -07:00
..

@comfyorg/website

Marketing/brand website built with Astro + Vue.

Ashby careers integration

/careers and /zh-CN/careers are rendered from Ashby's public job board API at build time. Data flow:

  1. src/pages/careers.astro awaits fetchRolesForBuild() during the Astro build.
  2. src/utils/ashby.ts calls GET https://api.ashbyhq.com/posting-api/job-board/{board}?includeCompensation=false, validates the envelope and each posting with Zod (src/utils/ashby.schema.ts), and maps to the domain type in src/data/roles.ts.
  3. On any failure (network, HTTP 4xx/5xx, envelope schema drift), the fetcher falls back to the committed JSON snapshot at src/data/ashby-roles.snapshot.json.
  4. src/utils/ashby.ci.ts emits GitHub Actions annotations and a $GITHUB_STEP_SUMMARY block so stale fetches are visible on green builds.

Required environment variables

Both are build-time only. Never prefix with PUBLIC_ (Astro would inline that into the client bundle).

Name Purpose Default (when unset)
WEBSITE_ASHBY_API_KEY Ashby API key (Basic auth) Build uses the committed snapshot
WEBSITE_ASHBY_JOB_BOARD_NAME Ashby public job board slug Build uses the committed snapshot

CI wiring (manual step — required)

This repo's .github/workflows/*.yaml changes cannot be pushed by a GitHub App. A maintainer must apply the following edits once:

.github/workflows/ci-website-build.yaml — pass the env into the build step and run the unit tests before it:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - name: Setup frontend
        uses: ./.github/actions/setup-frontend

      - name: Run website unit tests
        run: pnpm --filter @comfyorg/website test:unit

      - name: Build website
        env:
          WEBSITE_ASHBY_API_KEY: ${{ secrets.WEBSITE_ASHBY_API_KEY }}
          WEBSITE_ASHBY_JOB_BOARD_NAME: ${{ vars.WEBSITE_ASHBY_JOB_BOARD_NAME || 'comfy-org' }}
        run: pnpm --filter @comfyorg/website build

      - name: Verify API key is not leaked into build output
        env:
          WEBSITE_ASHBY_API_KEY: ${{ secrets.WEBSITE_ASHBY_API_KEY }}
        run: |
          set +x
          if [ -z "${WEBSITE_ASHBY_API_KEY:-}" ]; then
            echo "Secret not available in this run; skipping leak check."
            exit 0
          fi
          # grep -rlF prints only file paths (never match content).
          MATCHES=$(grep -rlF --exclude-dir=node_modules --null \
            -e "$WEBSITE_ASHBY_API_KEY" apps/website/dist/ 2>/dev/null \
            | tr '\0' '\n' || true)
          if [ -n "$MATCHES" ]; then
            echo "::error title=Ashby API key leaked into build output::$MATCHES"
            exit 1
          fi

.github/workflows/ci-vercel-website-preview.yaml — add the two env vars to the top-level env: block so vercel build (both deploy-preview and deploy-production jobs) sees them:

env:
  VERCEL_ORG_ID: ${{ secrets.VERCEL_WEBSITE_ORG_ID }}
  VERCEL_PROJECT_ID: ${{ secrets.VERCEL_WEBSITE_PROJECT_ID }}
  VERCEL_TOKEN: ${{ secrets.VERCEL_WEBSITE_TOKEN }}
  VERCEL_SCOPE: comfyui
  WEBSITE_ASHBY_API_KEY: ${{ secrets.WEBSITE_ASHBY_API_KEY }}
  WEBSITE_ASHBY_JOB_BOARD_NAME: ${{ vars.WEBSITE_ASHBY_JOB_BOARD_NAME || 'comfy-org' }}

The secret must also be added to the Vercel project environment (vercel env add WEBSITE_ASHBY_API_KEY … or via the Vercel UI) so that vercel build in the preview job has access to it.

Fork PRs do not exercise this path: ci-vercel-website-preview.yaml receives an empty VERCEL_TOKEN for forks and fails at vercel pull before the build runs. Fork-safe PR interactions (the preview-URL comment) are handled by pr-vercel-website-preview.yaml.

Refreshing the snapshot

When a maintainer wants to update the committed snapshot (e.g. after onboarding/offboarding roles):

WEBSITE_ASHBY_API_KEY=WEBSITE_ASHBY_JOB_BOARD_NAME=comfy-org \
  pnpm --filter @comfyorg/website ashby:refresh-snapshot
git commit apps/website/src/data/ashby-roles.snapshot.json

The script exits non-zero on any non-fresh outcome so stale/empty snapshots can't be accidentally committed.

HubSpot contact form

The contact page uses HubSpot's hosted form embed for the interest form:

<script
  src="https://js-na2.hsforms.net/forms/embed/developer/244637579.js"
  defer
></script>
<div
  class="hs-form-html"
  data-region="na2"
  data-form-id="94e05eab-1373-47f7-ab5e-d84f9e6aa262"
  data-portal-id="244637579"
></div>

The localized /zh-CN/contact page uses the same portal and script with form ID 6885750c-02ef-4aa2-ba0d-213be9cccf93.

This keeps submission handling, validation, anti-spam updates, and field configuration in HubSpot. The local implementation in src/components/contact/HubspotFormEmbed.vue only loads the hosted script and renders the documented embed container.

Scripts

  • pnpm dev — Astro dev server
  • pnpm build — production build to dist/
  • pnpm typecheckastro check
  • pnpm test:unit — Vitest unit tests
  • pnpm test:e2e — Playwright E2E tests (requires pnpm build first)
  • pnpm ashby:refresh-snapshot — refresh the committed careers snapshot