AI search engines like ChatGPT, Perplexity, and Google AI Mode rank content differently from traditional Google. Here are the 7 factors that matter most in 2026.

For two decades, SEO was largely a game of backlinks, keyword density, and technical crawlability. Google's PageRank algorithm rewarded pages that other pages trusted. Build enough authority, optimise your title tags, and you ranked.
That model is fracturing.
In 2026, a growing share of search queries never reach a traditional results page at all. Google AI Overviews now appear on an estimated 25–30% of all searches, up from roughly 7% at launch in mid-2024. AI-referred traffic — sessions originating from ChatGPT, Perplexity, Google AI Mode, and similar platforms — grew 527% year-over-year between Q1 2025 and Q1 2026, according to data aggregated across large-scale analytics providers. Perplexity alone processes over 100 million queries per day.
These engines don't rank pages. They synthesise answers — and they choose which sources to cite. The criteria they apply are meaningfully different from classic ranking signals, and a new set of controls has grown up around them: llms.txt, updated robots.txt directives, and noai meta tags.

This is not a minor update to SEO. It is a different discipline, now commonly called Generative Engine Optimisation (GEO). If you want to understand the full scope of GEO and how it relates to traditional SEO, see our deep-dive: What Is GEO — Generative Engine Optimisation?
The good news: the factors that drive AI citation are auditable, measurable, and largely fixable. We have distilled them into 7 ranked signals — the same 7 categories that seo.yatna.ai scores every site on. Here is what each one means in practice.
E-E-A-T (Experience, Expertise, Authoritativeness, and Trustworthiness) is not a new concept — Google's Quality Rater Guidelines have referenced it for years. But in 2026, E-E-A-T has become the dominant signal for AI citation selection, accounting for a full quarter of what our audit model measures.
Why so heavy? Because AI engines are fundamentally in the business of reputation arbitrage. When Perplexity or ChatGPT cites a source, it is implicitly vouching for it. The last thing OpenAI wants is for ChatGPT to recommend a page that later turns out to be written by an anonymous contributor with no track record. So these engines have developed increasingly sophisticated methods of assessing whether a source is genuinely authoritative.
Author identity signals. Is there a byline? Does that author have a verifiable web presence — LinkedIn profile, academic publications, industry conference speaker pages, or a detailed author bio with credentials? Pages with no identifiable human author are systematically downweighted by AI citation engines.
First-hand experience markers. Google's addition of the first "E" (Experience) to the original EAT framework was deliberate. AI engines now look for content that demonstrates direct, personal experience: case study data from the author's own work, specific examples, screenshots, original research, and language patterns that signal lived expertise rather than aggregated secondary sources.
Institutional credibility. Does your domain have a history of being cited by authoritative sources? Do you have a clear About page, a physical address or registered business identity, a public contact email (not just a form), and a privacy policy? These signals collectively tell AI engines that a real, accountable organisation operates this site.
Trust signals. HTTPS (non-negotiable), clear editorial policies, dated and maintained content, no pattern of thin or spammy pages in the crawl history.
Make sure your About section is comprehensive: team bios, company history, contact details, registered address.

Technical SEO has always mattered for discoverability. In the AI search era it matters for a different reason: if an AI crawler cannot read your page, your content does not exist for that engine. Full stop.
This factor carries the same 25% weight as E-E-A-T because there is simply no path to AI citation if the technical foundations are broken.
Traditional Google's Googlebot renders JavaScript using a full headless Chrome instance, then indexes the rendered DOM. This is slow, resource-intensive, and happens on a delay — but it works for JavaScript-heavy sites.
Most AI crawlers do not render JavaScript at all. OpenAI's GPTBot, Anthropic's ClaudeBot, Perplexity's PerplexityBot, and Meta's Meta-ExternalAgent are all HTML-only crawlers. If your Next.js app returns a blank <div id="root"> from the server and populates it client-side, these bots see an empty page. This is a catastrophic and extremely common problem on modern web stacks.
The fix is server-side rendering (SSR) or static generation (SSG) — ensuring that full, complete HTML is returned in the initial HTTP response, before any JavaScript runs.
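As a quick sanity check before reaching for a full SSR migration, a rough heuristic can flag pages that ship an empty client-side shell. The sketch below is illustrative, not a standard: the mount-point IDs (root, app, __next) and the 200-character visible-text floor are assumptions you would tune for your own stack.

```python
import re

def looks_client_rendered(html: str) -> bool:
    """Heuristic: flag raw HTML that is likely an empty SPA shell.

    A page that ships only an empty mount point (e.g. <div id="root"></div>)
    and almost no visible text is invisible to HTML-only AI crawlers.
    """
    # Strip scripts and styles, then measure the remaining visible text.
    stripped = re.sub(r"<(script|style)[^>]*>.*?</\1>", "", html,
                      flags=re.DOTALL | re.IGNORECASE)
    text = re.sub(r"<[^>]+>", " ", stripped)
    visible_chars = len(" ".join(text.split()))
    # Common SPA mount-point IDs (illustrative list, not exhaustive).
    empty_mount = re.search(
        r'<div[^>]*id=["\'](root|app|__next)["\'][^>]*>\s*</div>',
        html, flags=re.IGNORECASE)
    return bool(empty_mount) and visible_chars < 200

# What an HTML-only crawler sees from a client-rendered app:
shell = '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>'
# A server-rendered equivalent with real content in the response:
ssr = ('<html><body><main><h1>Technical SEO</h1><p>'
       + 'Full content here. ' * 20 + '</p></main></body></html>')
```

Running the raw HTML you fetch with an AI bot's user agent through a check like this tells you immediately whether those crawlers are seeing anything at all.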
Crawl budget efficiency. Keep your robots.txt clean and intentional. Avoid blocking important content paths. Submit a well-structured sitemap that prioritises your most valuable pages.
Clean HTML structure. Semantic HTML5 elements — <article>, <main>, <section>, <header>, <nav> — help AI parsers understand content hierarchy. Div-soup makes extraction harder and lowers citation probability.
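In practice, the difference is between a skeleton like the following (a minimal illustration) and the same content buried in anonymous divs:

```html
<body>
  <header>
    <nav><!-- primary navigation --></nav>
  </header>
  <main>
    <article>
      <h1>Technical SEO Audits</h1>
      <section>
        <h2>How do AI crawlers index JavaScript sites?</h2>
        <p>Answer-first paragraph that a parser can lift verbatim.</p>
      </section>
    </article>
  </main>
</body>
```

A parser can map this tree to "navigation vs. main content vs. discrete answer block" without guesswork; div-only markup forces it to infer that structure, and inference fails more often than extraction.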
Internal link architecture. A clear, shallow link structure means crawlers reach all your content efficiently. Orphaned pages — those with no internal links pointing to them — are frequently missed.
HTTPS and Core Web Vitals. Beyond speed (which we cover in Factor 5), basic connectivity and security matter. Redirect chains, broken links, and mixed-content warnings create crawl failures that silently remove pages from the AI index.
Use curl -A "GPTBot" <url> to test what AI bots actually see, and review robots.txt to ensure no important sections are inadvertently blocked.

This is where traditional SEO wisdom still applies — but the bar has been raised significantly by AI engines that are, themselves, expert content synthesisers.
AI engines cite content that answers questions clearly and directly. They are particularly drawn to content that is structured in a way that matches the question-answer format of conversational queries — because extracting a citation-ready quote is easier when the answer is contained in a discrete, scannable paragraph.
Answer-first structure. The classic journalist's "inverted pyramid" — most important information first — is ideal for AI citation. If someone asks "what is the best schema markup for a recipe page?", the AI engine wants a page that opens with a direct, complete answer, not one that buries the answer under three paragraphs of preamble.
Semantic coverage without stuffing. AI language models understand topical depth. A page about "technical SEO audits" that never mentions crawlability, canonical tags, or Core Web Vitals will be scored as topically shallow. But keyword-stuffing is counterproductive — these engines understand natural language and can tell the difference between genuine expertise and keyword padding.
Long-tail, question-based headings. Structuring H2 and H3 headings as questions ("How do AI crawlers index JavaScript sites?") significantly increases the probability that your content is pulled into a featured citation for that query. This is one of the highest-leverage on-page changes you can make.
Content freshness. AI engines are actively looking for current information. A page last updated in 2022 citing 2021 statistics will rarely be cited for a query with recency intent. Review and update your most important pages on a documented schedule.
Unique insight, not aggregation. Content that synthesises information available on 20 other sites adds nothing to an AI engine's training data or citation pool. What gets cited is first-hand analysis, original data, counterintuitive positions backed by evidence, and specific how-to guidance that is visibly more detailed than competitor content.
Structured data is the communication protocol between your content and AI engines. While it accounts for 10% of the overall score, its impact on citation probability is disproportionately large for certain content types — particularly FAQ pages, how-to guides, product pages, and review content.
AI engines parse structured data to extract facts without having to interpret free-form prose. A page with FAQPage schema tells the engine exactly which text is a question and which is the answer. An Article schema with author, datePublished, dateModified, and publisher properties makes E-E-A-T signals machine-readable rather than requiring inference.
The schema types with the highest impact on AI citation in 2026:
Article / NewsArticle / BlogPosting: Every editorial page should have this, with full author (as a Person object, not just a string), publisher (as an Organization with logo), datePublished, dateModified, and headline matching the page H1 exactly.
FAQPage: If your page contains a Q&A section — even an embedded FAQ — implementing this schema dramatically increases the chance that specific Q&A pairs are cited verbatim by AI engines. The format is a direct match to how conversational AI queries are processed.
HowTo: Step-by-step instructional content with HowTo schema gives AI engines a structured representation of your process. This is particularly powerful for tools, technical guides, and product tutorials.
Organization / WebSite: Site-level schema that establishes your entity identity — name, URL, logo, social profiles, contact information. This underpins E-E-A-T signals at the domain level.
BreadcrumbList: Helps AI engines understand site hierarchy and where a specific page sits within it.
Product + Review + AggregateRating: For e-commerce and product review sites, these schema types are table stakes for AI-cited product comparisons.
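For illustration, here is roughly what the two highest-impact types look like as JSON-LD. Every name, date, and URL below is a placeholder value, not a recommendation:

```json
[
  {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "The 7 Factors AI Search Engines Use to Rank Content",
    "datePublished": "2026-01-15",
    "dateModified": "2026-02-01",
    "author": {
      "@type": "Person",
      "name": "Jane Smith",
      "url": "https://example.com/authors/jane-smith"
    },
    "publisher": {
      "@type": "Organization",
      "name": "Example Publisher",
      "logo": {
        "@type": "ImageObject",
        "url": "https://example.com/logo.png"
      }
    }
  },
  {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
      {
        "@type": "Question",
        "name": "How do AI crawlers index JavaScript sites?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Most AI crawlers read only the raw HTML response, so answers must be present before any JavaScript runs."
        }
      }
    ]
  }
]
```

Note that headline should mirror the page's H1 exactly, and dateModified should change only when the content genuinely changes.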
Avoid generic WebPage schema when more specific types are available, and avoid flat author strings ("author": "Jane Smith" instead of "author": {"@type": "Person", "name": "Jane Smith", "url": "..."}). Add FAQPage schema to any page that has a Q&A section — this is the single highest-ROI schema implementation for AI citation. Add Article schema with full Person objects for author attribution on all editorial content. Add Organization schema to your homepage with sameAs properties linking to all social profiles and business directory listings.

Page speed and Core Web Vitals have been a Google ranking factor since 2021. For AI search, the dynamic is different — but equally important.
AI crawlers, unlike human users, don't wait for slow pages. Most have hard timeout limits of 5–10 seconds. If your server takes 3 seconds to return the first byte (TTFB), an AI crawler may time out before receiving the full content. A slow page is a partially crawled page — or an uncrawled page.
Google's own data shows that pages in the "Good" band for all three Core Web Vitals metrics are crawled 40% more frequently than pages in the "Needs Improvement" band. More crawl frequency means fresher inclusion in AI knowledge bases and citation pools.
The three metrics to target:
Largest Contentful Paint (LCP) < 2.5s. The main content of the page must render quickly. For AI crawlers (which can't render at all), TTFB and server response time are the relevant proxy metrics — aim for TTFB < 800ms.
Interaction to Next Paint (INP) < 200ms. While AI bots don't interact with pages, poor INP typically signals poor overall frontend performance and JavaScript bloat — which correlates with poor server-side rendering quality.
Cumulative Layout Shift (CLS) < 0.1. Layout stability affects how AI parsers extract content from the DOM. Pages with high CLS often have poorly structured HTML where content positions are determined dynamically, making extraction unreliable.
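The bands themselves are simple threshold checks. This sketch encodes the Good / Needs Improvement / Poor cut-offs as published in Google's web.dev guidance:

```python
# Core Web Vitals thresholds per Google's web.dev guidance:
# (metric: (good_max, needs_improvement_max)); above the second bound is "Poor".
THRESHOLDS = {
    "LCP": (2.5, 4.0),    # seconds
    "INP": (200, 500),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless layout-shift score
}

def cwv_band(metric: str, value: float) -> str:
    """Classify a measured value into Good / Needs Improvement / Poor."""
    good_max, ni_max = THRESHOLDS[metric]
    if value <= good_max:
        return "Good"
    if value <= ni_max:
        return "Needs Improvement"
    return "Poor"
```

Field data (e.g. from the Chrome UX Report) is measured at the 75th percentile of page loads, so a single fast lab run clearing these bars does not mean real users experience "Good".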
Server location and CDN coverage. AI crawlers are distributed globally. A site hosted on a single origin server in one region will have variable response times for crawlers operating from other regions. A CDN is not optional at scale.
Image optimisation. Uncompressed images are the most common cause of slow LCP. We cover image-specific factors in detail in Factor 7 — but from a performance standpoint, every uncompressed image is leaving speed points on the table.
Caching headers. Correct Cache-Control and ETag headers mean that repeat crawls by AI bots are served from cache, reducing both your server load and crawl latency.
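A response carrying sensible caching metadata might look like the following; the values are illustrative, and max-age should be tuned to your publishing cadence:

```http
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: public, max-age=3600, stale-while-revalidate=86400
ETag: "a1b2c3"
Last-Modified: Mon, 02 Feb 2026 10:00:00 GMT
```

With an ETag in place, a revisiting crawler can send If-None-Match and receive a cheap 304 Not Modified instead of the full page.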
This is the newest factor category and the most rapidly evolving. AI readiness is a catch-all for signals that specifically communicate to AI systems how to access, interpret, and cite your content.
While it accounts for only 5% of the overall audit score today, it is growing in importance. Sites that configure these signals correctly now are building a durable advantage as AI search matures.
The llms.txt standard

First proposed in late 2024 and now supported by an increasing number of AI platforms, llms.txt is a plain-text file placed at yourdomain.com/llms.txt. It gives AI language model training and retrieval systems a curated map of your most important content — analogous to sitemap.xml for crawlers, but specifically structured for LLM consumption.
A basic llms.txt file includes a site title, a one-line summary, and a curated list of your key URLs with brief descriptions.
This is entirely voluntary, but it signals to AI systems that you are actively managing your content for AI visibility — a positive trust signal.
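Under the llms.txt proposal, the file is plain Markdown: an H1 site name, a short blockquote summary, and sections of annotated links. A minimal illustrative sketch (all URLs are placeholders):

```markdown
# Example Site

> One-sentence summary of what this site covers and who it is for.

## Key pages

- [Technical SEO audit guide](https://example.com/guides/technical-seo): How AI crawlers read your site
- [Schema markup reference](https://example.com/guides/schema): JSON-LD patterns for AI citation

## Optional

- [Archive](https://example.com/archive): Older posts, lower priority for retrieval
```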
robots.txt directives for AI bots

The robots.txt file now needs to address a much longer list of crawlers than the traditional Googlebot and Bingbot. Major AI crawlers that you should explicitly manage include:
- GPTBot (OpenAI / ChatGPT)
- ClaudeBot (Anthropic)
- PerplexityBot
- Amazonbot
- Meta-ExternalAgent
- Applebot-Extended
- Google-Extended (Google's AI training crawler, separate from regular Googlebot)

You can allow all of them (maximising AI citation potential), block specific ones, or Disallow only the training-focused crawlers such as Google-Extended while leaving answer-engine crawlers free to cite you. The key is to be deliberate — many sites are inadvertently blocking AI crawlers with overly broad robots.txt rules written before these crawlers existed.
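A deliberate policy might look like the following sketch; the specific path rules are illustrative, not a recommendation:

```text
# Allow answer-engine crawlers full access
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Example policy: keep the AI training crawler out of one section
User-agent: Google-Extended
Disallow: /internal-search/

# Default rules for everything else
User-agent: *
Disallow: /admin/
```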
For a comprehensive guide to AI crawler management, see: robots.txt for AI Crawlers in 2026 — The Complete Guide
AI engines use sitemap <lastmod> dates to prioritise recrawling. If your sitemap has stale or inaccurate <lastmod> values — or worse, no <lastmod> at all — you're leaving crawl frequency on the table. Accurate sitemap metadata is a low-effort, high-value signal.
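A sitemap entry with an accurate <lastmod> is trivial to emit; the URL and date below are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/guides/technical-seo</loc>
    <lastmod>2026-02-01</lastmod>
  </url>
</urlset>
```

The important part is that the date is driven by your CMS on every real content change, not hand-edited or generated fresh on each request.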
Create an llms.txt file for your domain following the llms.txt specification and submit it to the major AI platforms that support it. Audit your robots.txt to ensure you are not accidentally blocking AI crawlers you want to allow. Maintain accurate <lastmod> dates for all URLs, updated programmatically on each content change.

Image SEO is often treated as an afterthought. In 2026's AI search landscape, it carries specific significance beyond the traditional "alt text for accessibility" guidance.
AI engines increasingly power visual search, image-based queries, and multimodal retrieval. Google Lens queries have grown over 200% in three years. ChatGPT's image analysis capabilities are now integrated into search. Images that are properly labelled and described become discoverable through entirely new query surfaces.
Alt text. The most fundamental signal. Alt text should describe the image content in natural language — what it shows, who appears in it if relevant, and the context. It should not be keyword-stuffed. An alt text like "Screenshot of seo.yatna.ai audit dashboard showing technical SEO score breakdown" is far more useful than "SEO audit tool free online best 2026".
File names. The image filename is a crawl-time signal that most sites completely ignore. img-3847.webp tells AI engines nothing. technical-seo-audit-score-dashboard.webp is descriptive, machine-readable, and keyword-relevant without being spammy.
Captions. Figure captions are among the most-read elements on a page (eye-tracking studies consistently show this). They also provide AI engines with a second, contextual description of the image — one that situates the image within the surrounding content rather than describing it in isolation.
Structured data for images. ImageObject schema within your Article or Product schema, with caption, description, contentUrl, and creator fields, gives AI engines a machine-readable image annotation that goes beyond what alt text alone provides.
File format and size. WebP is the standard in 2026 — smaller file sizes than JPEG at equivalent quality, with modern browser support across the board. Images over 200KB on a content page are a performance problem (see Factor 5) and a signal of generally low technical standards.
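The alt-text and filename checks above are easy to automate. The sketch below walks a page's <img> tags and flags missing alt text and meaningless filenames; the regex defining a "generic" filename is an illustrative assumption, not an established rule.

```python
import re
from html.parser import HTMLParser

# Illustrative pattern for camera-default / auto-generated filenames.
GENERIC_NAME = re.compile(
    r"^(img|image|photo|screenshot|dsc|untitled)?[-_]?\d+\.\w+$",
    re.IGNORECASE)

class ImageAudit(HTMLParser):
    """Collect <img> tags whose alt text or filename tells AI engines nothing."""

    def __init__(self):
        super().__init__()
        self.issues = []  # list of (src, problem) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src", "")
        filename = src.rsplit("/", 1)[-1]
        if not (a.get("alt") or "").strip():
            self.issues.append((src, "missing or empty alt text"))
        if GENERIC_NAME.match(filename):
            self.issues.append((src, "generic filename"))

auditor = ImageAudit()
auditor.feed('<img src="/media/img-3847.webp">'
             '<img src="/media/audit-dashboard.webp" '
             'alt="Audit dashboard showing score breakdown">')
```

In this example the first image is flagged twice (no alt text, generic name) and the properly labelled second image passes both checks.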
Serve responsive images with srcset for different viewport sizes. Add ImageObject schema to your most important editorial pages, particularly those with data visualisations, product screenshots, or infographics.

The seven factors above are not independent levers — they compound. A site with perfect E-E-A-T but broken technical accessibility will not be cited because the AI crawler cannot read the content. A site with excellent technical foundations but no schema markup will rank lower than a comparable site that gives AI engines machine-readable structured data.
The compounding effect works in your favour too. A site with strong signals across all seven categories achieves a multiplier effect: AI engines cite you not just because you are technically accessible or because you have good author credentials, but because every available signal consistently confirms your authority and reliability.
This is why a comprehensive, multi-factor audit is more valuable than optimising a single dimension. Running an audit that covers all seven factors simultaneously — and identifies the specific issues on your specific site — is the starting point for a meaningful AI search strategy.
For guidance on how to optimise for the two most important AI search platforms specifically, see:
Every site is different. The specific issues dragging down your AI citation potential — whether it's JavaScript rendering blocking AI crawlers, missing author bios undermining E-E-A-T, or a robots.txt file accidentally blocking GPTBot — depend on how your site was built and maintained.
The fastest way to get a clear picture of where you stand across all seven factors is a structured, automated audit.
seo.yatna.ai runs a full 7-factor AI search audit on your site in minutes. It crawls your pages, analyses every factor described in this post, scores each one, and produces a prioritised action list — not a generic checklist, but specific findings tied to specific pages on your specific domain.
The free tier audits up to 5 pages with no credit card required. For most sites, 5 pages is enough to surface the critical issues.
About the Author

Rejith Krishnan
Founder & CEO, lowtouch.ai
Rejith Krishnan is the Founder and CEO of lowtouch.ai and the creator of seo.yatna.ai. He built the AI agent platform that powers seo.yatna.ai's 7-agent audit engine - the same infrastructure lowtouch.ai deploys for enterprise clients across finance, legal, and operations.
Rejith's focus is AI enablement: helping businesses of all sizes - from solo founders and SMBs to enterprise teams - adopt AI agents that genuinely transform how they work. He specialises in deploying Large Language Models and building multi-agent systems that automate complex workflows, enhance discoverability, and deliver measurable outcomes without requiring engineering teams to manage the infrastructure.
He built seo.yatna.ai because AI-first SEO is a prerequisite for AI-era discoverability. Businesses that are not visible to ChatGPT, Perplexity, and Claude are already losing traffic. seo.yatna.ai gives every business - not just enterprise clients with dedicated SEO teams - the same AI-powered audit capability lowtouch.ai builds for its largest customers.