Every major AI platform can now browse websites autonomously. Chrome's auto browse scrolls and clicks. ChatGPT Atlas fills forms and completes purchases. Perplexity Comet researches across tabs. But none of these agents see your website the way a human does.
This is Part 4 in a five-part series on optimizing websites for the agentic web. Part 1 covered the evolution from SEO to AAIO. Part 2 explained how to get your content cited in AI responses. Part 3 mapped the protocols forming the infrastructure layer. This article gets technical: how AI agents actually perceive your website, and what to build for them.
The core insight is one that keeps coming up in my research: the most impactful thing you can do for AI agent compatibility is the same work web accessibility advocates have been pushing for decades. The accessibility tree, originally built for screen readers, is becoming the primary interface between AI agents and your website.
In This Series
- From SEO and CRO to 'AAIO': Why Your Website Needs to Speak to Machines
- Answer Engine Optimization: How to Get Your Content Into AI Responses
- MCP, A2A, NLWeb, and AGENTS.md: The Standards Powering the Agentic Web
- How AI Agents See Your Website (And How to Build for Them) (You are here)
- Selling to AI: How Stripe, Shopify, and OpenAI Are Reinventing Checkout (coming soon)
According to cybersecurity company Imperva's 2025 Bad Bot Report, automated traffic surpassed human traffic for the first time in 2024, accounting for 51% of all web interactions. Not all of that is agentic browsing, but the direction is clear: the non-human audience for your website is already larger than the human one, and it's growing. Throughout this article, I draw exclusively from official documentation, peer-reviewed research, and announcements from the companies building this infrastructure.
Contents
- Three Ways Agents See Your Website
- The Accessibility Tree Is Your Agent Interface
- Semantic HTML: The Agent Foundation
- ARIA: Useful, Not Magic
- The Rendering Question
- Testing Your Agent Interface
- A Checklist for Your Development Team
- Key Takeaways
Three Ways Agents See Your Website
When a human visits your website, they see colors, layout, images, and typography. When an AI agent visits, it sees something entirely different. Understanding what agents actually perceive is the foundation for building websites that work for them.
The major AI platforms use three distinct approaches, and the differences have direct implications for how you should structure your website.
Vision: Reading Screenshots
Anthropic's Computer Use takes the most literal approach. Claude captures screenshots of the browser, analyzes the visual content, and decides what to click or type based on what it "sees." It's a continuous feedback loop: screenshot, reason, act, screenshot. The agent operates at the pixel level, identifying buttons by their visual appearance and reading text from the rendered image.
Google's Project Mariner follows a similar pattern with what Google describes as an "observe-plan-act" loop: observe captures visual elements and underlying code structures, plan formulates action sequences, and act simulates user interactions. Mariner achieved an 83.5% success rate on the WebVoyager benchmark.
The vision approach works, but it's computationally expensive, sensitive to layout changes, and limited by what's visually rendered on screen.
Accessibility Tree: Reading Structure
OpenAI took a different path with ChatGPT Atlas. Their Publishers and Developers FAQ is explicit:
ChatGPT Atlas uses ARIA tags, the same labels and roles that support screen readers, to interpret page structure and interactive elements.
Atlas is built on Chromium, but rather than analyzing rendered pixels, it queries the accessibility tree for elements with specific roles ("button", "link") and accessible names. This is the same data structure that screen readers like VoiceOver and NVDA use to help people with visual disabilities navigate the web.
Microsoft's Playwright MCP, the company's official MCP server for browser automation, takes the same approach. It provides accessibility snapshots rather than screenshots, giving AI models a structured representation of the page. Microsoft deliberately chose accessibility data over visual rendering for its browser automation standard.
Hybrid: Both at Once
In practice, the most capable agents combine approaches. OpenAI's Computer-Using Agent (CUA), which powers both Operator and Atlas, layers screenshot analysis with DOM processing and accessibility tree parsing. It prioritizes ARIA labels and roles, falling back to text content and structural selectors when accessibility data isn't available.
Perplexity's research confirms the same pattern. Their BrowseSafe paper, which details the safety infrastructure behind Comet's browser agent, describes using "hybrid context management combining accessibility tree snapshots with selective vision."
| Platform | Primary Approach | Details |
|---|---|---|
| Anthropic Computer Use | Vision (screenshots) | Screenshot, reason, act feedback loop |
| Google Project Mariner | Vision + code structure | Observe-plan-act with visual and structural data |
| OpenAI Atlas | Accessibility tree | Explicitly uses ARIA tags and roles |
| OpenAI CUA | Hybrid | Screenshots + DOM + accessibility tree |
| Microsoft Playwright MCP | Accessibility tree | Accessibility snapshots, no screenshots |
| Perplexity Comet | Hybrid | Accessibility tree + selective vision |
The pattern is clear. Even platforms that started with vision-first approaches are incorporating accessibility data. And the platforms optimizing for reliability and efficiency (Atlas, Playwright MCP) lead with the accessibility tree.
Your website's accessibility tree isn't a compliance artifact. It's increasingly the primary interface agents use to understand and interact with your website.
Last year, before the European Accessibility Act took effect, I half-joked that it would be ironic if the thing that finally got people to care about accessibility was AI agents, not the people accessibility was designed for. That's no longer a joke.
The Accessibility Tree Is Your Agent Interface
The accessibility tree is a simplified representation of your page's DOM that browsers generate for assistive technologies. Where the full DOM contains every div, span, style, and script, the accessibility tree strips away the noise and exposes only what matters: interactive elements, their roles, their names, and their states.
This is why it works so well for agents. A typical page's DOM might contain thousands of nodes. The accessibility tree reduces that to the elements a user (or agent) can actually interact with: buttons, links, form fields, headings, landmarks. For AI models that process web pages within a limited context window, that reduction is significant.
OpenAI's Publishers and Developers FAQ is very clear about this:
Follow WAI-ARIA best practices by adding descriptive roles, labels, and states to interactive elements like buttons, menus, and forms. This helps ChatGPT recognize what each element does and interact with your site more accurately.
And:
Making your website more accessible helps ChatGPT Agent in Atlas understand it better.
Research data backs this up. The most rigorous data on this comes from a UC Berkeley and University of Michigan study published for CHI 2026, the premier academic conference on human-computer interaction. The researchers tested Claude Sonnet 4.5 on 60 real-world web tasks under different accessibility conditions, collecting 40.4 hours of interaction data across 158,325 events. The results were striking:
| Condition | Task Success Rate | Avg. Completion Time |
|---|---|---|
| Standard (default) | 78.33% | 324.87 seconds |
| Keyboard-only | 41.67% | 650.91 seconds |
| Magnified viewport | 28.33% | 1,072.20 seconds |
Under standard conditions, the agent succeeded nearly 80% of the time. Restrict it to keyboard-only interaction (simulating how screen reader users navigate) and success drops to 42%, taking twice as long. Restrict the viewport (simulating magnification tools) and success drops to 28%, taking over three times as long.
The paper identifies three categories of gaps:
- Perception gaps: agents can't reliably access screen reader announcements or ARIA state changes that would tell them what happened after an action.
- Cognitive gaps: agents struggle to track task state across multiple steps.
- Action gaps: agents underutilize keyboard shortcuts and fail at interactions like drag-and-drop.
The implication is direct. Websites that present a rich, well-labeled accessibility tree give agents the information they need to succeed. Websites that rely on visual cues, hover states, or complex JavaScript interactions without accessible alternatives create the conditions for agent failure.
Perplexity's search API architecture paper from September 2025 reinforces this from the content side. Their indexing system prioritizes content that is "high quality in both substance and form, with information captured in a manner that preserves the original content structure and layout." Websites "heavy on well-structured data in list or table form" benefit from "more formulaic parsing and extraction rules." Structure isn't just helpful. It's what makes reliable parsing possible.
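To make that concrete, here's a minimal sketch (the route data is invented for illustration) of the kind of markup that rewards formulaic extraction: the same facts in a semantic table instead of a paragraph of prose.

<table>
  <caption>Direct flights, Amsterdam to Lisbon</caption>
  <thead>
    <tr><th scope="col">Airline</th><th scope="col">Departure</th><th scope="col">Fare</th></tr>
  </thead>
  <tbody>
    <tr><td>Example Air</td><td>08:15</td><td>€89</td></tr>
    <tr><td>Sample Wings</td><td>13:40</td><td>€112</td></tr>
  </tbody>
</table>

The caption, header cells, and scope attributes give a parser (or a screen reader) the column relationships for free; the same information buried in a sentence has to be inferred.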
Semantic HTML: The Agent Foundation
The accessibility tree is built from your HTML. Use semantic elements, and the browser generates a useful accessibility tree automatically. Skip them, and the tree is sparse or misleading.
This isn't new advice. Web standards advocates have been screaming "use semantic HTML" for two decades. Not everyone listened. What's new is that the audience has expanded. It used to be about screen readers and a relatively small percentage of users. Now it's about every AI agent that visits your website.
Use native elements. A <button> element automatically appears in the accessibility tree with the role "button" and its text content as the accessible name. A <div onclick="doSomething()"> does not. The agent doesn't know it's clickable.
<!-- Agent can identify and interact with this -->
<button type="submit">Search flights</button>
<!-- Agent may not recognize this as interactive -->
<div class="btn btn-primary" onclick="searchFlights()">Search flights</div>
Label your forms. Every input needs an associated label. Agents read labels to understand what data a field expects.
<!-- Agent knows this is an email field -->
<label for="email">Email address</label>
<input type="email" id="email" name="email" autocomplete="email">
<!-- Agent sees an unlabeled text input -->
<input type="text" placeholder="Enter email...">
The autocomplete attribute deserves attention. It tells agents (and browsers) exactly what type of data a field expects, using standardized values like name, email, tel, street-address, and organization. When an agent fills a form on someone's behalf, autocomplete attributes make the difference between confident field mapping and guessing.
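A sketch of what that looks like on a typical contact or checkout form (the fields are illustrative, not a required set):

<label for="full-name">Full name</label>
<input type="text" id="full-name" name="name" autocomplete="name">

<label for="phone">Phone number</label>
<input type="tel" id="phone" name="phone" autocomplete="tel">

<label for="address">Street address</label>
<input type="text" id="address" name="address" autocomplete="street-address">

<label for="company">Company</label>
<input type="text" id="company" name="company" autocomplete="organization">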
Establish heading hierarchy. Use h1 through h6 in logical order. Agents use headings to understand page structure and locate specific content sections. Skip levels (jumping from h1 to h4) create confusion about content relationships.
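A sketch of a logical hierarchy, using the flight-search example from earlier:

<h1>Flight Search</h1>
<h2>Search results</h2>
<h3>Amsterdam to Lisbon</h3>
<h3>Amsterdam to Porto</h3>
<h2>Baggage policies</h2>
<!-- Jumping from h1 straight to h4 would leave agents guessing how sections relate -->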
Use landmark regions. HTML5 landmark elements (<nav>, <main>, <aside>, <footer>, <header>) tell agents where they are on the page. A <nav> element is unambiguously navigation. A <div class="nav-wrapper"> requires interpretation. Clarity for the win, always.
<nav aria-label="Main navigation">
<ul>
<li><a href="/products">Products</a></li>
<li><a href="/pricing">Pricing</a></li>
</ul>
</nav>
<main>
<article>
<h1>Flight Search</h1>
<!-- Primary content -->
</article>
</main>
Microsoft's Playwright test agents, introduced in October 2025, generate test code that uses accessible selectors by default. When the AI generates a Playwright test, it writes:
const todoInput = page.getByRole('textbox', { name: 'What needs to be done?' });
Not CSS selectors. Not XPath. Accessible roles and names. Microsoft built their AI testing tools to find elements the same way screen readers do, because it's more reliable.
The final slide of my Conversion Hotel keynote about optimizing websites for AI agents.
ARIA: Useful, Not Magic
OpenAI recommends ARIA (Accessible Rich Internet Applications), the W3C standard for making dynamic web content accessible. But ARIA is a supplement, not a substitute. Like protein shakes: useful on top of a real diet, counterproductive as a replacement for actual food.
The first rule of ARIA, as defined by the W3C:
If you can use a native HTML element or attribute with the semantics and behavior you require already built in, instead of re-purposing an element and adding an ARIA role, state or property to make it accessible, then do so.
The fact that the W3C had to make "don't use ARIA" the first rule of ARIA tells you everything about how often it gets misused.
Adrian Roselli, a recognized web accessibility expert, raised an important concern in his October 2025 analysis of OpenAI's guidance. He argues that recommending ARIA without sufficient context risks encouraging misuse. Websites that use ARIA are generally less accessible according to WebAIM's annual survey of the top million websites, because ARIA is often applied incorrectly as a band-aid over poor HTML structure. Roselli warns that OpenAI's guidance could incentivize practices like keyword-stuffing in aria-label attributes, the same kind of gaming that plagued meta keywords in early SEO.
The right approach is layered:
- Start with semantic HTML. Use <button>, <nav>, <label>, <select>, and other native elements. These work correctly by default.
- Add ARIA when native HTML isn't enough. Custom components that don't have HTML equivalents (tab panels, tree views, disclosure widgets) need ARIA roles and states to be understandable.
- Use ARIA states for dynamic content. When JavaScript changes the page, ARIA attributes communicate what happened:
<!-- Tells agents whether the menu is open or closed -->
<button aria-expanded="false" aria-controls="menu-panel">Menu</button>
<div id="menu-panel" aria-hidden="true">
<!-- Menu content -->
</div>
- Keep aria-label descriptive and honest. Use it to provide context that isn't visible on screen, like distinguishing between multiple "Delete" buttons on the same page (see the sketch below). Don't stuff it with keywords.
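A sketch of the difference, using hypothetical address-book buttons:

<!-- Two identical-looking buttons, disambiguated for agents and screen readers -->
<button aria-label="Delete billing address">Delete</button>
<button aria-label="Delete shipping address">Delete</button>

<!-- Keyword stuffing: the kind of misuse Roselli warns about -->
<button aria-label="delete remove erase cancel address account">Delete</button>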
The principle is the same one that applies to good SEO: build for the user first, optimize for the system second. Semantic HTML is building for the user. ARIA is fine-tuning for edge cases where HTML falls short.
The Rendering Question
Browser-based agents like Chrome auto browse, ChatGPT Atlas, and Perplexity Comet run on Chromium. They execute JavaScript. They can render your single-page application.
But not everything that visits your website is a full browser agent.
AI crawlers (PerplexityBot, OAI-SearchBot, ClaudeBot) index your content for retrieval and citation. Many of these crawlers do not execute client-side JavaScript. If your page is a blank <div id="root"></div> until React hydrates, these crawlers see an empty page. Your content is invisible to the AI search ecosystem.
Part 2 of this series covered the citation side: AI systems select fragments from indexed content. If your content isn't in the initial HTML, it's not in the index. If it's not in the index, it doesn't get cited. Server-side rendering isn't just a performance optimization.
It's a visibility requirement.
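A simplified sketch of the difference from a non-rendering crawler's point of view (file names and copy are placeholders):

<!-- Client-side rendered: a crawler that skips JavaScript sees an empty shell -->
<body>
  <div id="root"></div>
  <script src="/assets/app.bundle.js"></script>
</body>

<!-- Server-side rendered: the same crawler gets the content in the initial response -->
<body>
  <main>
    <h1>Flight Search</h1>
    <p>Compare fares, baggage policies, and schedules across carriers.</p>
  </main>
  <script src="/assets/app.bundle.js"></script>
</body>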
Even for full browser agents, JavaScript-heavy websites create friction. Dynamic content that loads after interactions, infinite scroll that never signals completion, and forms that reconstruct themselves after each input all create opportunities for agents to lose track of state. The A11y-CUA research attributed part of agent failure to "cognitive gaps": agents losing track of what's happening during complex multi-step interactions. Simpler, more predictable rendering reduces these failures.
Microsoft's guidance from Part 2 applies here directly: "Don't hide important answers in tabs or expandable menus: AI systems may not render hidden content, so key details can be skipped." If information matters, put it in the visible HTML. Don't require interaction to reveal it.
Practical rendering priorities:
- Server-side render or pre-render content pages. If an AI crawler can't see it, it doesn't exist in the AI ecosystem.
- Avoid blank-shell SPAs for content pages. Frameworks like Next.js (which powers this website), Nuxt, and Astro make SSR straightforward.
- Don't hide critical information behind interactions. Prices, specifications, availability, and key details should be in the initial HTML, not behind accordions or tabs.
- Use standard <a href> links for navigation. Client-side routing that doesn't update the URL or uses onClick handlers instead of real links breaks agent navigation (see the sketch after this list).
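A sketch of that last point (the route and handler are hypothetical):

<!-- Agents and crawlers can discover and follow this -->
<a href="/pricing">Pricing</a>

<!-- Nothing to follow: no href, no link role, just a click handler -->
<span class="nav-item" onclick="router.push('/pricing')">Pricing</span>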
Testing Your Agent Interface
You wouldn't ship a website without testing it in a browser. Testing how agents perceive your website is becoming equally important.
Screen reader testing is the best proxy. If VoiceOver (macOS), NVDA (Windows), or TalkBack (Android) can navigate your website successfully, identifying buttons, reading form labels, and following the content structure, agents can likely do the same. Both audiences rely on the same accessibility tree. This isn't a perfect proxy (agents have capabilities screen readers don't, and vice versa), but it catches the majority of issues.
Microsoft's Playwright MCP provides direct accessibility snapshots. If you want to see exactly what an AI agent sees, Playwright MCP generates structured accessibility snapshots of any page. These snapshots strip away visual presentation and show you the roles, names, and states that agents work with. Published as @playwright/mcp on npm, it's the most direct way to view your website through an agent's eyes.
The output looks something like this (simplified):
[heading level=1] Flight Search
[navigation "Main navigation"]
[link] Products
[link] Pricing
[main]
[textbox "Departure airport"] value=""
[textbox "Arrival airport"] value=""
[button] Search flights
If your critical interactive elements don't appear in the snapshot, or appear without useful names, agents will struggle with your website.
Browserbase's Stagehand (v3, released October 2025, and humbly self-described as "the best browser automation framework") provides another angle. It parses both DOM and accessibility trees, and its self-healing execution adapts to DOM changes in real time. It's useful for testing whether agents can complete specific workflows on your website, like filling a form or completing a checkout.
The Lynx browser is a low-tech option worth trying. It's a text-only browser that strips away all visual rendering, showing you roughly what a non-visual agent parses. A trick I picked up from Jes Scholz on the podcast.
A practical testing workflow:
- Run VoiceOver or NVDA through your website's key user flows. Can you complete the core tasks without vision?
- Generate Playwright MCP accessibility snapshots of critical pages. Are interactive elements labeled and identifiable?
- View your page source. Is the primary content in the HTML, or does it require JavaScript to render?
- Load your page in Lynx or disable CSS and check if the content order and hierarchy still make sense. Agents don't see your layout.
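Steps 1 and 2 can also be automated. Here's a minimal Playwright sketch (the URL, labels, and accessible names are placeholders for your own pages) that asserts key elements are exposed by role and accessible name, the same way agents locate them:

import { test, expect } from '@playwright/test';

test('flight search is operable through the accessibility tree', async ({ page }) => {
  // Placeholder URL; point this at the page you care about.
  await page.goto('https://www.example.com/flights');

  // Locate elements the way accessibility-tree agents do: by role and accessible name.
  await expect(page.getByRole('heading', { level: 1, name: 'Flight Search' })).toBeVisible();
  await expect(page.getByRole('textbox', { name: 'Departure airport' })).toBeVisible();
  await expect(page.getByRole('textbox', { name: 'Arrival airport' })).toBeVisible();
  await expect(page.getByRole('button', { name: 'Search flights' })).toBeVisible();

  // If the core flow works through labels and roles alone,
  // screen readers and agents both have what they need.
  await page.getByLabel('Departure airport').fill('AMS');
  await page.getByLabel('Arrival airport').fill('LIS');
  await page.getByRole('button', { name: 'Search flights' }).click();
});

If a test like this passes, the flow is expressible entirely in roles, names, and labels, which is exactly the surface agents operate on.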
A Checklist for Your Development Team
If you're sharing this article with your developers (and you should), here's the prioritized implementation list. Ordered by impact and effort, starting with the changes that affect the most agent interactions for the least work.
High impact, low effort:
- Use native HTML elements. <button> for actions, <a href> for links, <select> for dropdowns. Replace <div onclick> patterns wherever they exist.
- Label every form input. Associate <label> elements with inputs using the for attribute. Add autocomplete attributes with standard values.
- Server-side render content pages. Ensure primary content is in the initial HTML response.
High impact, moderate effort:
- Implement landmark regions. Wrap content in <nav>, <main>, <aside>, and <footer> elements. Add aria-label when multiple landmarks of the same type exist on the same page.
- Fix heading hierarchy. Ensure a single h1, with h2 through h6 in logical order without skipping levels.
- Move critical content out of hidden containers. Prices, specifications, and key details should not require clicks or interactions to reveal.
Moderate impact, low effort:
- Add ARIA states to dynamic components. Use
aria-expanded,aria-controls, andaria-hiddenfor menus, accordions, and toggles. - Use descriptive link text. "Read the full report" instead of "Click here." Agents use link text to understand where links lead.
- Test with a screen reader. Make it part of your QA process, not a one-time audit.
Key Takeaways
-
AI agents perceive websites through three approaches: vision, DOM parsing, and the accessibility tree. The industry is converging on the accessibility tree as the most reliable method. OpenAI Atlas, Microsoft Playwright MCP, and Perplexity's Comet all rely on accessibility data.
-
Web accessibility is no longer just about compliance. The accessibility tree is the literal interface AI agents use to understand your website. The UC Berkeley/University of Michigan study shows agent success rates drop significantly when accessibility features are constrained.
-
Semantic HTML is the foundation. Native elements like
<button>,<label>,<nav>, and<main>automatically create a useful accessibility tree. No framework required. No ARIA needed for the basics. -
ARIA is a supplement, not a substitute. Use it for dynamic states and custom components. But start with semantic HTML and add ARIA only where native elements fall short. Misused ARIA makes websites less accessible, not more.
-
Server-side rendering is an agent visibility requirement. AI crawlers that don't execute JavaScript can't see content in blank-shell SPAs. If your content isn't in the initial HTML, it doesn't exist in the AI ecosystem.
-
Screen reader testing is the best proxy for agent compatibility. If VoiceOver or NVDA can navigate your website, agents probably can too. For direct inspection, Playwright MCP accessibility snapshots show exactly what agents see.
The first three parts of this series covered why the shift matters, how to get cited, and what protocols are being built. This article covered the implementation layer. The encouraging news is that these aren't separate workstreams. Accessible, well-structured websites perform better for humans, rank better in search, get cited more often by AI, and work better for agents. It's the same work serving four audiences.
And the work builds on itself. The semantic HTML and structured data covered here are exactly what WebMCP builds on for its declarative form approach. The accessibility tree your website exposes today becomes the foundation for the structured tool interfaces of tomorrow.
Up next in Part 5: the commerce layer. How Stripe, Shopify, and OpenAI are building the infrastructure for AI agents to complete purchases, and what it means for your checkout flow.

