We built agent-bench, an open-source tool that scores websites on how well they work with AI agents. Then we pointed it at 20 of the most popular sites on the internet. The highest score was 59%. The average was 38%. Nobody is ready.
agent-bench runs six categories of static analysis against a website, each measuring a different dimension of agent-friendliness: API discoverability, authentication, documentation (including support for the llms.txt standard), HTML structure, error handling, and token cost.

Each check produces a score from 0 to 1. The overall score is a weighted average. No LLMs are involved in the scoring — it's purely structural analysis, which means it's fast, free, and reproducible.
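The scoring model can be sketched in a few lines. The six category weights below are illustrative assumptions for the sketch, not agent-bench's actual weights:

```python
# Sketch of the scoring model described above. Weights are
# illustrative assumptions; agent-bench's real weights may differ.
WEIGHTS = {"api": 0.25, "auth": 0.15, "docs": 0.15,
           "structure": 0.15, "errors": 0.15, "cost": 0.15}

def overall(scores: dict) -> float:
    """Weighted average of per-category scores, each in [0, 1]."""
    return sum(WEIGHTS[cat] * scores[cat] for cat in WEIGHTS)

perfect = {cat: 1.0 for cat in WEIGHTS}
print(round(overall(perfect), 2))  # a perfect site scores 1.0
```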
| Site | Score | API | Auth | Docs | Structure | Errors | Cost |
|---|---|---|---|---|---|---|---|
| api.github.com | 59% | 45% | 82% | 0% | 60% | 80% | 100% |
| github.com | 50% | 55% | 70% | 36% | 61% | 46% | 35% |
| httpbin.org | 46% | 25% | 82% | 16% | 53% | 43% | 75% |
| wikipedia.org | 42% | 7% | 82% | 0% | 52% | 23% | 100% |
| docs.python.org | 42% | 5% | 82% | 40% | 55% | 20% | 70% |
| twitch.tv | 39% | 5% | 62% | 70% | 49% | 26% | 45% |
| stripe.com | 39% | 17% | 62% | 60% | 56% | 23% | 30% |
| spotify.com | 37% | 5% | 60% | 60% | 65% | 26% | 25% |
| news.ycombinator.com | 36% | 5% | 77% | 16% | 57% | 33% | 50% |
| medium.com | 35% | 5% | 60% | 12% | 46% | 23% | 75% |
| figma.com | 35% | 5% | 67% | 52% | 44% | 23% | 40% |
| linear.app | 34% | 5% | 77% | 60% | 24% | 36% | 40% |
| shopify.com | 34% | 5% | 77% | 76% | 33% | 33% | 20% |
| stackoverflow.com | 33% | 5% | 60% | 0% | 46% | 23% | 75% |
| amazon.com | 33% | 5% | 60% | 8% | 46% | 20% | 70% |
| vercel.com | 33% | 5% | 72% | 60% | 44% | 13% | 30% |
| notion.so | 33% | 5% | 77% | 60% | 41% | 20% | 25% |
| twitter.com | 31% | 5% | 55% | 34% | 46% | 16% | 45% |
| discord.com | 29% | 5% | 65% | 36% | 43% | 20% | 30% |
| reddit.com | 29% | 5% | 62% | 4% | 42% | 43% | 45% |
The highest score — GitHub's API at 59% — is the only one that even approaches a passing grade. And that's the API endpoint, not the website. The average across all 20 sites is 38%. (Updated: after fixing false positives where SPA sites were getting inflated API scores, most sites scored even lower than our initial run.)
You'd expect sites built by and for developers — GitHub, Vercel, Linear, Stripe — to score highest. They do edge out consumer sites, but not by much. Stripe, the company famous for best-in-class API design, scored 39%. The stripe.com marketing site lacks an OpenAPI spec at the root, returns soft 404s, and ships heavy JavaScript bundles that cost agents hundreds of thousands of tokens to parse.
The lesson: having a great API product doesn't mean your website is agent-friendly. These are different problems.
15 out of 20 sites return 200 OK for nonexistent pages. This is the single most common failure across the board. When an agent navigates to a bad URL, it gets a full HTML page back with a 200 status and has to figure out on its own that the content doesn't exist.
This is a solved problem. Return 404. Return it with a structured JSON body if you can. It costs nothing to implement and it's the first thing that breaks for autonomous agents.
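A hard 404 with a structured body really is a few lines of code. Here's a minimal sketch using only Python's standard library; the routes and the error-body shape are hypothetical, not a prescribed format:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

KNOWN_PATHS = {"/", "/docs"}  # hypothetical routes for this sketch

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path not in KNOWN_PATHS:
            # A real 404 status plus a structured JSON body, instead
            # of a soft 200 wrapping an HTML "not found" page.
            body = json.dumps({"error": "not_found", "path": self.path}).encode()
            self.send_response(404)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # keep the sketch quiet
        pass

# To serve: HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

An agent hitting a bad URL now gets an unambiguous status code and a machine-readable reason, with no HTML parsing required.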
The cost check measures how many tokens an agent would burn just to read a page. The numbers are staggering:
At Claude Sonnet pricing ($3/M input tokens), loading a single Linear page costs $1.69 in tokens. An agent that needs to navigate five pages to complete a task would spend over $8 just on reading — before it does anything.
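The arithmetic is straightforward. At that price, $1.69 implies a page weighing roughly 563K tokens — an inference from the figures above, not a number taken from agent-bench's output:

```python
PRICE_PER_MTOK = 3.00  # Claude Sonnet input price cited above, $ per 1M tokens

def read_cost(tokens: int) -> float:
    """Dollar cost for an agent just to ingest a page of this many tokens."""
    return tokens * PRICE_PER_MTOK / 1_000_000

# $1.69 per page at $3/M works back to a ~563K-token page
print(round(read_cost(563_000), 2))  # → 1.69
```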
Signal-to-noise ratios tell the rest of the story. Most sites scored 0-1% — meaning 99% of what the agent receives is JavaScript bundles, CSS classes, tracking scripts, and framework boilerplate. The actual content the agent needs is buried in noise.
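As a rough illustration of that ratio, visible text divided by raw payload size can be computed with the standard library. This is a simplified sketch, not agent-bench's exact metric:

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def signal_to_noise(html: str) -> float:
    """Fraction of the raw payload that is human-visible text."""
    p = _TextExtractor()
    p.feed(html)
    text = "".join(p.chunks).strip()
    return len(text) / max(len(html), 1)

page = "<html><script>var a=1;var b=2;</script><main>Hi</main></html>"
print(round(signal_to_noise(page), 2))  # even this tiny page is mostly noise
```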
The llms.txt standard is only a few months old, but 8 out of 20 sites already have one. GitHub, Vercel, Linear, Twitch, Spotify, and Twitter are among the sites serving an llms.txt file — a plain-text summary of the site designed for LLMs to consume instead of parsing raw HTML.
This is probably the single highest-impact change a site can make. Instead of forcing an agent to parse 500K tokens of HTML/JS, give it a 10K-token text file that describes what the site does and how to use it.
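For illustration, a minimal llms.txt might look like the following. The shape (an H1 title, a short blockquote summary, then sections of annotated links) follows the llms.txt proposal; the site and URLs here are invented:

```markdown
# Acme Widgets

> Acme sells modular widgets. Docs live at /docs; the JSON API is at api.acme.example.

## Docs
- [API reference](https://acme.example/docs/api): endpoints, auth, and rate limits
- [Quickstart](https://acme.example/docs/quickstart): create a widget in five calls
```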
Surprisingly, authentication scored highest on average. Most sites don't actively block bots on their public pages (CAPTCHAs are usually reserved for login flows), and several expose OAuth discovery endpoints that an agent could use for machine-to-machine auth. The bar is low, but at least it's not hostile.
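Such a discovery probe is simple to sketch. The well-known path below comes from RFC 8414 (OAuth authorization-server metadata); whether a given site actually serves it is exactly what this kind of check tests:

```python
import json
import urllib.request

WELL_KNOWN = "/.well-known/oauth-authorization-server"  # RFC 8414 metadata path

def metadata_url(base_url: str) -> str:
    return base_url.rstrip("/") + WELL_KNOWN

def fetch_oauth_metadata(base_url: str):
    """Return the server's OAuth metadata dict, or None if it isn't exposed."""
    try:
        with urllib.request.urlopen(metadata_url(base_url), timeout=5) as resp:
            return json.load(resp)
    except OSError:  # URLError/HTTPError both subclass OSError
        return None

print(metadata_url("https://example.com"))
```

A site that answers this request with valid JSON gives agents a machine-readable map to its token endpoints, with no human in the loop.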
Based on the data, here are the highest-leverage things a site can do to become more agent-friendly, roughly in order of effort:
1. **Stop returning 200 OK for pages that don't exist.** This is a one-line fix in most frameworks.
2. **Serve an llms.txt.** Write a plain-text description of your site and serve it at /llms.txt. Takes an hour. Saves agents millions of tokens.
3. **Send rate-limit headers.** `X-RateLimit-Remaining` and `Retry-After` tell agents when to back off instead of guessing. Almost nobody does this on their marketing site.
4. **Use semantic HTML.** `<nav>`, `<main>`, `<article>`. Agents use these to understand page structure without vision models.
5. **Ship `data-testid` attributes.** Stable selectors give agents reliable targets that won't break when you redesign. You probably already have them in your test suite — ship them to production.

agent-bench is open source and free to run. Static analysis doesn't call any LLMs — it just makes HTTP requests and analyzes the responses.
```bash
pip install git+https://github.com/LightLayer-dev/agent-bench.git

# Score any website
agent-bench analyze https://your-site.com

# Get a detailed HTML report
agent-bench analyze https://your-site.com --format html -o report.html

# See what kind of site it is and what tasks agents would try
agent-bench classify https://your-site.com
```
We're also building live agent run benchmarks — where real AI agents attempt real tasks on real websites, measuring success rates, costs, and step counts across different models and frameworks. That's coming soon.
The web wasn't built for agents. But it can be rebuilt — one 404 at a time.