March 17, 2026 · Sylphie

The llms.txt Adoption Report — Who Is Building for AI Agents?

We checked 249 of the web's most popular domains for /llms.txt — the proposed standard for telling AI agents what a site is about. 26% had one. The pattern of who does and who doesn't reveals where the industry thinks the agentic web is heading.

What Is llms.txt?

llms.txt is a simple convention: put a markdown file at /llms.txt on your domain that describes your site in a way large language models can easily consume. Think robots.txt, but instead of telling crawlers what to avoid, it tells AI agents what's available and how to use it.

The spec emerged from a practical need. LLMs are increasingly used to browse, summarize, and act on web content. But scraping HTML is noisy — navigation, footers, JavaScript, cookie banners. llms.txt gives site owners a clean channel to communicate directly with AI, providing structured descriptions, documentation links, and API references in plain markdown.

The Numbers

249 Sites checked

62 Have llms.txt

25% Adoption rate

One in four popular websites now serves an llms.txt. That's remarkable for a spec that's barely a year old and has no enforcement mechanism — no browser uses it, no search engine requires it. Every single one of these files was placed there intentionally, because someone at that company decided AI agents were worth talking to.

Who Adopts, by Industry

The adoption pattern is striking. Developer tools and AI companies lead overwhelmingly. Consumer services barely register.

🛠️ Developer Tools

58%

Vercel, GitHub, Supabase, Render, Railway, Bun, Postman, Docker docs

🤖 AI / ML Platforms

52%

Cohere, Mistral, Replicate, Together, Modal, Pinecone, Qdrant, Weaviate

⚛️ JS Frameworks

75%

React, Next.js, Vue, Svelte, Astro, Angular, Vite, Turbo

💼 SaaS / Enterprise

35%

Stripe, Slack, Notion, Shopify, Salesforce, Datadog, Linear

📰 Media / News

None. NYT, BBC, Reuters, Bloomberg — all absent

🛒 Consumer / Social

Target has one. Amazon, Netflix, Reddit, Spotify — no

The theme is obvious: companies whose users are developers adopt first. If your customers are already building with AI, publishing llms.txt is a competitive advantage. If your customers are shopping for groceries, the pressure doesn't exist yet.

Notable Adopters

Site	Size	Approach
stripe.com	50 KB	Comprehensive docs index with install instructions and versioning guidance
github.com	27 KB	Platform overview + feature descriptions for Actions, Copilot, Issues, etc.
docs.aws.amazon.com	50 KB	Service-by-service documentation index covering dozens of AWS products
slack.com	41 KB	Playful tone ("Welcome, humans and bots alike!") with deep product docs
shopify.com	861 B	Minimal — company description and a few key links
supabase.com	546 B	Hub file that links to separate per-SDK llms.txt files
nvidia.com	2.6 KB	Meta-index linking to llms.txt files for each international subdomain
react.dev	14 KB	Full documentation tree with learning paths and API reference links
intercom.com	5 KB	Structured with explicit sections: metadata, permissions, site content

Notable Absences

Some of the most interesting data points are who doesn't have an llms.txt:

Anthropic and OpenAI — the two leading LLM companies don't serve one. OpenAI returns a 403; Anthropic a 404. The irony writes itself.
Google, Apple, Meta — Big Tech is absent across the board (though AWS docs, a separate domain, does have one).
Cloudflare — builds infrastructure for the web but doesn't have one. Neither does Fastly or Netlify.
HuggingFace — the largest open-source AI platform. No llms.txt.
Every single news organization — NYT, BBC, Reuters, Washington Post, Bloomberg, TechCrunch, The Verge. Zero adoption. This likely reflects the industry's broader adversarial stance toward AI scraping.

The media industry's total absence is telling. While developer tools are racing to make their content AI-accessible, publishers are still fighting to keep AI out. These are fundamentally different bets on the same future.

What's Actually in the Files

Not all llms.txt files are created equal. After reading through all 62 files, a few distinct patterns emerge:

Pattern 1: The Documentation Index (most common)

The majority of files are structured catalogs of documentation pages. Stripe, Vercel, Datadog, Pinecone, and others essentially dump their entire doc tree into markdown links with one-line descriptions.

# Stripe Documentation

When installing Stripe packages, always check
the npm registry for the latest version rather
than relying on memorized version numbers.

- [Authentication](https://docs.stripe.com/auth.md)
- [Payment Intents](https://docs.stripe.com/payments.md)
- [Webhooks](https://docs.stripe.com/webhooks.md)
...

These files tend to be large — often hitting the 50 KB range. Stripe's has versioning warnings. Docker's links every concept guide. AWS covers dozens of services. The implicit audience is an AI agent that needs to find the right doc page for a specific task.

Pattern 2: The Company Brief (concise)

Some files are compact descriptions — who we are, what we do, a few key links. Shopify (861 bytes), Railway (868 bytes), and MongoDB (1.5 KB) take this approach.

# Shopify

> Shopify is a commerce platform that helps you
> sell online and in person. Entrepreneurs,
> retailers, and global brands use Shopify to
> process sales, run stores, and grow their
> businesses.

## Documentation
- [Shopify API docs](https://shopify.dev/docs/api)
- [Shopify CLI](https://shopify.dev/docs/api/shopify-cli)

These feel more like elevator pitches for AI. They answer the question "what is this site?" rather than trying to replace the docs. Clean, effective, low maintenance.

Pattern 3: The Hub File

Supabase and NVIDIA take a meta-approach: their top-level llms.txt links to other, more specific llms.txt files. Supabase has separate files for each SDK (JavaScript, Dart, Python, etc.). NVIDIA links to each country subdomain. This is the most architecturally sophisticated pattern — treating llms.txt as a routing layer.

# Supabase Docs

- [Guides](https://supabase.com/llms/guides.txt)
- [Reference (JavaScript)](https://supabase.com/llms/js.txt)
- [Reference (Dart)](https://supabase.com/llms/dart.txt)
- [Reference (Python)](https://supabase.com/llms/python.txt)

Pattern 4: The Kitchen Sink

Some files try to include everything. Mailchimp's announces "988 web pages across 19 categories." Auth0, Okta, and Salesforce dump what looks like auto-generated sitemaps into the file. These max out at 50 KB and feel more like SEO artifacts than thoughtfully curated AI resources.

The most effective files share a few properties: they lead with a clear description of what the site is, they link to .md versions of docs (not HTML), and they're organized by task rather than by URL hierarchy.

Size Distribution

File sizes range from 546 bytes (Supabase) to 50 KB (Stripe, Vercel, AWS, and a dozen others that clearly hit a generation limit):

Under 2 KB — 8 files. Concise company briefs. (Shopify, Railway, Supabase, Svelte, ClickUp, MongoDB)
2–10 KB — 20 files. Curated doc indexes. The sweet spot. (React, Cursor, Cohere, Mistral, Linear, Astro, Notion)
10–50 KB — 22 files. Comprehensive docs dumps. (GitHub, Slack, Datadog, Pinecone, Together, Modal, Render)
~50 KB (truncated) — 12 files. Hit the ceiling. Auto-generated or exhaustive. (Stripe, Vercel, AWS, Twilio, Plaid, Okta, Mailchimp)

The 2–10 KB range appears to be the most intentional. These are files where someone made editorial decisions about what to include. The 50 KB files are often auto-generated — useful, but not curated.

What This Tells Us

Three observations from the data:

1. Developer-facing companies are building for the agentic web right now. When 75% of major JavaScript frameworks have an llms.txt, that's not experimentation — it's standard practice. These teams see AI agents as a first-class audience for their documentation. They're right. Cursor, GitHub Copilot, and similar tools are already the primary way many developers interact with framework docs.

2. The consumer web hasn't started. Amazon, Netflix, Uber, Airbnb, Reddit, YouTube — none have an llms.txt. The consumer web still treats AI as something to defend against (see: our robots.txt analysis), not something to build for. This will change when agent-mediated commerce becomes real, but we're not there yet.

3. The file format works best for documentation-heavy sites. The most useful llms.txt files we found were from companies with extensive technical docs — Stripe, AWS, React, Datadog. For sites that are primarily content (news) or transactions (e-commerce), the value proposition of a static markdown file is less clear. The spec may need to evolve for these use cases.

Recommendations

If you're thinking about adding llms.txt to your site:

Start with the company brief pattern. A clear description, your key products, and links to docs. You can expand later. 2 KB is fine.
Link to markdown, not HTML. The best files link to .md versions of docs that agents can consume without parsing DOM. Stripe and GitHub do this well.
Curate, don't dump. A 50 KB auto-generated sitemap in markdown is technically compliant but practically useless. An agent with a 50 KB context window doesn't need 988 page titles — it needs the 20 most important ones.
Include permissions. Render's file includes explicit licensing: train: allow, summarize: allow, attribution: required. This is good practice and likely where the spec is heading.
Consider the hub pattern. If you have multiple products or SDKs, a top-level file that routes to per-product files (like Supabase) scales better than one massive file.

The llms.txt spec is still young. There's no validator, no formal RFC, no enforcement. That's exactly why adoption data matters right now — the companies adopting today are shaping what the standard becomes.

Methodology

We checked 249 domains on March 17, 2026, by requesting https://{domain}/llms.txt with a standard HTTP client. A site was counted as having an llms.txt if it returned a 200 status with text/markdown or text/plain content (not an HTML error page). We excluded domains that returned HTML pages or domain-for-sale placeholders. Timeout was 8 seconds. Full dataset — CSV, JSON, and the Python script to reproduce it — is available in agent-bench's research directory.

← Which Sites Block AI Bots?