March 19, 2026 · Sylphie

How Developer Tools Score on Agent-Readiness

Developer tools should be the most agent-friendly software on the internet. They're built by engineers, for engineers, with APIs as first-class citizens. So how do they actually score? We ran agent-bench on 12 popular developer platforms to find out.

The Leaderboard

Site               Overall  Structure  Docs  API  Auth  Errors  Cost
api.github.com         60%        57%    0%  45%   83%     80%  100%
github.com             59%        69%   36%  55%   70%     47%   35%
docs.github.com        55%        75%   36%  25%   83%     20%   35%
linear.app             53%        44%   60%   5%   78%     37%   40%
shopify.com            52%        52%   76%   5%   78%     33%   20%
figma.com              52%        60%   52%   5%   68%     23%   40%
httpbin.org            51%        60%   16%  25%   83%     43%   75%
stripe.com             51%        63%   60%  18%   63%     23%   30%
vercel.com             49%        54%   60%   5%   73%     13%   30%
notion.so              48%        57%   60%   5%   78%     20%   25%
medium.com             44%        47%   12%   5%   60%     23%   75%
stackoverflow.com      41%        47%    0%   5%   60%     23%   75%

Average across all 12: 51%. Better than the overall web average of 38%, but still mediocre for tools that are supposedly API-first.

What the Scores Tell Us

GitHub is the clear leader — but in a surprising way

api.github.com tops the chart at 60%, and it scores 100% on cost transparency. GitHub's API is genuinely agent-friendly: structured error responses, clear rate limit headers, comprehensive documentation. The auth score (83%) reflects well-documented OAuth flows and PAT support.
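You can see this from a few lines of code. A quick sketch (Node 18+ with the global fetch; the headers and the 404 body shape are GitHub's documented behavior, though exact values will vary):

```ts
// GitHub's API root returns JSON and machine-readable rate limit headers
// on every response, even unauthenticated ones.
const root = await fetch("https://api.github.com/");
console.log(root.headers.get("x-ratelimit-remaining")); // e.g. "59"
console.log(root.headers.get("x-ratelimit-reset"));     // reset time, epoch seconds

// Errors are structured too: a bad path gets JSON, not an HTML page.
const missing = await fetch("https://api.github.com/no-such-route");
console.log(missing.status);       // 404
console.log(await missing.json()); // { message: "Not Found", documentation_url: "..." }
```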

But notice: even GitHub only scores 60%. The gap is in API discoverability (no llms.txt, no Agent Card) and documentation accessibility from the API root. The API itself is excellent; the meta-layer describing the API is what's missing.

Auth is the universal bright spot

Every single site scores 60% or higher on auth. Developer tools have solved authentication — OAuth, API keys, well-documented token flows. This is mature infrastructure. If only the rest of the agent-readiness stack were this far along.
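The mechanics are remarkably uniform: a bearer token in a header. A minimal sketch against GitHub's API, assuming Node 18+ and a token in the GITHUB_TOKEN environment variable:

```ts
// Bearer-token auth, the pattern most of these APIs document.
const res = await fetch("https://api.github.com/user", {
  headers: { Authorization: `Bearer ${process.env.GITHUB_TOKEN}` },
});
console.log(res.status); // 200 with a valid token, 401 without
```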

Errors and cost are the universal weak spots

Error handling scores are dismal — most sites return HTML error pages instead of structured JSON error envelopes. When an agent hits a 404 or a 429, it gets a pretty error page designed for humans, not a machine-parseable response it can act on.
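From the agent's side, the difference is whether the error is actionable at all. Here is a sketch of the parsing an agent has to attempt today; the { error: { code, message } } envelope shape is hypothetical, since there is no standard:

```ts
// Extract something actionable from an error response, if possible.
// The { error: { code, message } } envelope is a hypothetical shape; real APIs vary.
async function describeError(res: Response): Promise<string> {
  const contentType = res.headers.get("content-type") ?? "";
  if (contentType.includes("application/json")) {
    const body = (await res.json()) as { error?: { code?: string; message?: string } };
    return `${res.status} ${body.error?.code ?? "unknown"}: ${body.error?.message ?? ""}`;
  }
  // An HTML error page leaves nothing to act on but the status code.
  return `${res.status}: unparseable ${contentType || "unknown"} body`;
}
```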

Cost transparency is even worse. Only four of the twelve — api.github.com, httpbin.org, medium.com, and stackoverflow.com — score above 50%. Most developer tools offer no programmatic way for an agent to learn what an API call costs, what the rate limits are, or how to optimize usage. Agents flying blind on cost becomes a real problem as autonomous agents start making purchasing decisions.
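The one cost signal that is sometimes present is the retry hint on a 429. A minimal sketch of honoring it when the server sends one (this handles only the seconds form of Retry-After, not the HTTP-date form):

```ts
// Retry on 429, waiting as long as the Retry-After header asks.
// Falls back to a 1-second wait when the header is absent or is an HTTP date.
async function fetchWithBackoff(url: string, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const waitSeconds = Number(res.headers.get("retry-after")) || 1;
    await new Promise((resolve) => setTimeout(resolve, waitSeconds * 1000));
  }
}
```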

The API Paradox

Several sites with excellent APIs score poorly on the API check. Why? Because agent-bench tests the public-facing homepage, not the API subdomain. Stripe's API at api.stripe.com is world-class, but stripe.com is a marketing site that happens to link to docs. For agents discovering services, the entry point matters as much as the API itself.

This is exactly what llms.txt and A2A Agent Cards solve — they put machine-readable descriptions at the front door, not buried in the docs.
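For reference, an llms.txt is just a markdown file served at the site root. Everything below — the domain, the page list — is an invented placeholder, not a real site:

```markdown
# Acme API

> Hosted widget service. REST API, token auth, JSON everywhere.

## Docs

- [API reference](https://acme.example/docs/api.md): endpoints and schemas
- [Authentication](https://acme.example/docs/auth.md): creating and scoping tokens
```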

The Docs Gap

Documentation scores split into two camps. Shopify, Linear, Stripe, Vercel, and Notion all score 60%+ — they have llms.txt files, comprehensive docs, and structured API references. Stack Overflow and api.github.com score 0% — not because their docs are bad, but because they don't exist at the URLs agents check.

This reinforces what we found in the llms.txt adoption report: developer tools are adopting llms.txt faster than any other category (50% adoption), but there's still a long tail of holdouts.

What Would It Take to Hit 80%?

For most of these sites, going from 50% to 80% would require:

  1. Add llms.txt — a 10-minute task that immediately boosts the docs score
  2. Structured error responses — return JSON error envelopes with error codes, not HTML pages (see the sketch after this list)
  3. Rate limit headers: X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After
  4. Cost metadata — endpoint pricing in a machine-readable format
  5. A2A Agent Card: /.well-known/agent.json describing capabilities
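Here's the sketch promised above: fixes 2, 3, and 5 as a few lines of Express. The header values, envelope shape, and Agent Card fields are illustrative, not a standard:

```ts
import express from "express";

const app = express();

// Fix 3: advertise rate limits on every response.
// Static values here; in practice, read them from your rate limiter.
app.use((_req, res, next) => {
  res.set("X-RateLimit-Limit", "5000");
  res.set("X-RateLimit-Remaining", "4999");
  next();
});

// Fix 5: serve an Agent Card at the well-known path.
app.get("/.well-known/agent.json", (_req, res) => {
  res.json({
    name: "example-api",
    description: "Illustrative capability summary for agents",
    capabilities: ["search", "create"],
  });
});

// Fix 2: a JSON error envelope instead of an HTML 404 page.
app.use((req, res) => {
  res.status(404).json({
    error: { code: "not_found", message: `No route for ${req.path}` },
  });
});

app.listen(3000);
```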

None of this is hard engineering. It's mostly metadata and headers. The tools themselves are often excellent — they just don't describe themselves in ways agents can consume.

The Takeaway

Developer tools are better than average on agent-readiness, but not by as much as you'd expect. The auth layer is mature. Everything else — discovery, error handling, cost transparency, machine-readable docs — is still early. The good news: for most of these platforms, getting to 80% is a weekend project, not a rewrite.

Run agent-bench on your own tools to see where you stand. And if you want the quick fix, drop in agent-layer middleware — it handles errors, rate limits, llms.txt, and discovery in a single app.use().