50-Point Technical GEO Audit Checklist: Every Technical Factor That Determines AI Visibility
A technical GEO audit evaluates your website's readiness for AI search engines — not just whether Google can crawl your pages, but whether ChatGPT, Perplexity, Claude, Gemini, and Google AI Overviews can discover, extract, understand, and cite your content. Unlike traditional SEO audits that focus on ranking signals, a GEO audit focuses on six categories: crawlability, schema and structured data, performance, content structure, accessibility, and site architecture.
The distinction matters because a site can rank well in traditional search while being nearly invisible to AI engines. A JavaScript-rendered single-page application might rank #1 for a keyword through strong backlinks and domain authority, while AI crawlers cannot even extract the text content because they do not execute JavaScript. A product page might win Google Shopping placements while being uncitable by Perplexity because it lacks the structured data that AI retrieval systems depend on.
This checklist covers 50 specific technical items organized by category. Each item includes what to check, why it matters for AI visibility, and how to fix it. Work through it systematically — the items are ordered by impact within each category.
Section 1: Crawlability (Items 1-10)
Crawlability determines whether AI engines can access your content at all. A page that cannot be crawled cannot be cited, regardless of how well-optimized its content is.
1. Robots.txt AI Crawler Configuration
Check: Review your robots.txt for directives targeting AI crawlers — GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, ChatGPT-User, Google-Extended, Meta-ExternalAgent, CCBot, and Amazonbot.
Why it matters: Over 560,000 sites now reference GPTBot in their robots.txt. Shopify stores may be blocking AI crawlers by default without the store owner's knowledge. Blocking retrieval crawlers (OAI-SearchBot, ChatGPT-User, PerplexityBot) eliminates your visibility in those platforms entirely.
Fix: Block training crawlers (GPTBot, ClaudeBot, Google-Extended, CCBot, Meta-ExternalAgent). Explicitly allow retrieval crawlers (OAI-SearchBot, ChatGPT-User, PerplexityBot). Review and update quarterly.
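A minimal robots.txt sketch implementing this policy — block training crawlers, allow retrieval crawlers — assuming the example.com domain and sitemap path are replaced with your own; adjust the blocked list to your own stance on training access:

```text
# Block training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

# Allow retrieval crawlers
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```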
2. Server-Side Rendering Verification
Check: Disable JavaScript in your browser and load your key pages. Alternatively, use curl to fetch the raw HTML and verify that all citable text content is present in the response.
Why it matters: AI retrieval crawlers (OAI-SearchBot, ChatGPT-User, PerplexityBot) parse only the initial HTML response. They do not execute JavaScript. Content injected via client-side rendering is invisible to AI search.
Fix: Implement server-side rendering (SSR) or static site generation (SSG) for all pages you want AI engines to cite. For Next.js, use getServerSideProps or getStaticProps. For headless Shopify, ensure your frontend framework renders content server-side.
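The curl check above can be automated. A minimal Python sketch: strip tags from the raw HTML response and verify that key phrases survive. The sample pages and phrases below are hypothetical; in practice, feed in the HTML fetched from your live URLs without JavaScript execution.

```python
import re

def missing_from_raw_html(raw_html: str, phrases: list[str]) -> list[str]:
    """Return the phrases that do NOT appear in the server-rendered HTML.

    Tags are stripped and whitespace collapsed so a phrase split across
    inline markup (e.g. <em>) still counts as present.
    """
    text = re.sub(r"<[^>]+>", " ", raw_html)   # drop tags
    text = re.sub(r"\s+", " ", text).lower()   # normalize whitespace
    return [p for p in phrases if p.lower() not in text]

# A server-rendered page vs. a client-rendered page whose body text
# only exists inside a JavaScript bundle.
ssr_page = "<html><body><h1>Trail Running Shoes</h1><p>Built for wet terrain.</p></body></html>"
csr_page = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

print(missing_from_raw_html(ssr_page, ["Trail Running Shoes", "wet terrain"]))  # []
print(missing_from_raw_html(csr_page, ["Trail Running Shoes"]))  # ['Trail Running Shoes']
```

An empty list means the content is present in the initial HTML and therefore visible to non-rendering crawlers.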
3. HTTP Status Code Audit
Check: Crawl your entire site and catalog every URL's HTTP status code. Identify all 404, 500, 301, 302, and 410 responses.
Why it matters: AI crawlers that repeatedly encounter 404 or 500 errors reduce their crawl frequency and lose confidence in your site. Pages returning error codes also waste crawl budget that could be spent on citable pages.
Fix: Fix or redirect all 404 errors on important pages. Remove 404 URLs from your sitemap. Ensure 301 redirects point to relevant destination pages, not generic homepages.
4. HTTPS Implementation
Check: Verify that all pages are served over HTTPS and that HTTP requests redirect to HTTPS with 301 status codes.
Why it matters: AI systems prefer HTTPS URLs during canonical selection. Mixed HTTP/HTTPS pages create duplicate content that fragments authority signals.
Fix: Enable HTTPS sitewide. Implement 301 redirects from HTTP to HTTPS. Update internal links to use HTTPS URLs. Verify canonical tags reference HTTPS versions.
5. XML Sitemap Validation
Check: Validate your sitemap against the Sitemaps protocol. Verify that all URLs return 200 status, that lastmod values are accurate, and that the sitemap is referenced in robots.txt.
Why it matters: Your sitemap is the primary discovery mechanism for AI crawlers. Sitemaps with 404 URLs, incorrect lastmod values, or validation errors reduce crawler confidence and waste crawl budget.
Fix: Remove non-200 URLs. Update lastmod to reflect actual content modification dates. Add Sitemap: directive to robots.txt. For sites with over 50,000 URLs, implement a sitemap index.
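Part of this validation can be scripted with the standard library. A sketch that parses a sitemap and flags missing or future-dated lastmod values — the sample sitemap is hypothetical, and live HTTP status checks for each URL are left to a separate crawl step:

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def audit_sitemap(xml_text: str) -> list[dict]:
    """Extract each <url> entry and flag missing or future-dated lastmod values."""
    issues = []
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        if lastmod is None:
            issues.append({"loc": loc, "issue": "missing lastmod"})
        elif date.fromisoformat(lastmod[:10]) > date.today():
            issues.append({"loc": loc, "issue": "lastmod in the future"})
    return issues

sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/blog/post</loc></url>
</urlset>"""

for issue in audit_sitemap(sitemap):
    print(issue)  # {'loc': 'https://example.com/blog/post', 'issue': 'missing lastmod'}
```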
6. Crawl Depth Analysis
Check: Map the click depth of every indexable page from the homepage. Identify pages that are 4 or more clicks deep.
Why it matters: AI crawlers allocate crawl budget based on page importance signals, and click depth is one of the strongest. Pages buried 4+ clicks deep receive lower crawl frequency and less AI visibility.
Fix: Restructure internal linking to bring important pages within 3 clicks of the homepage. Add contextual links from high-authority pages to deep pages. Consider adding featured content sections to the homepage.
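Click depth is a breadth-first search over the internal link graph. A sketch, assuming you have already extracted a page-to-links mapping from a crawl (the site graph below is hypothetical):

```python
from collections import deque

def click_depths(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """BFS from the homepage, returning the minimum number of clicks
    needed to reach each page via internal links."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical link graph: homepage -> category -> subcategory -> product
site = {
    "/": ["/shoes", "/about"],
    "/shoes": ["/shoes/trail"],
    "/shoes/trail": ["/shoes/trail/model-x"],
}

depths = click_depths(site)
deep_pages = [p for p, d in depths.items() if d >= 4]
print(depths["/shoes/trail/model-x"])  # 3
```

Pages that appear in your sitemap but not in the returned depth map are unreachable by link-following, which also surfaces orphan pages.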
7. Redirect Chain Resolution
Check: Identify all redirect chains (URL A redirects to URL B, which redirects to URL C). Most crawl tools report chain length.
Why it matters: Each redirect in a chain adds latency and risks losing the AI crawler before it reaches the final page. Chains of 3+ redirects frequently cause AI crawlers to abandon the request.
Fix: Update all redirects to point directly to the final destination URL. Update internal links and sitemap entries to reference the final URL.
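Given a redirect map exported from your server config or crawl tool, chain detection is a simple walk. A sketch with a hypothetical two-hop chain:

```python
def redirect_chain(redirects: dict[str, str], url: str, max_hops: int = 10) -> list[str]:
    """Follow a URL through the redirect map and return the full chain,
    stopping at max_hops to avoid redirect loops."""
    chain = [url]
    while chain[-1] in redirects and len(chain) <= max_hops:
        chain.append(redirects[chain[-1]])
    return chain

# Hypothetical map: /old-page -> /renamed-page -> /final-page
redirects = {"/old-page": "/renamed-page", "/renamed-page": "/final-page"}

chain = redirect_chain(redirects, "/old-page")
print(chain)  # ['/old-page', '/renamed-page', '/final-page']
if len(chain) > 2:
    print(f"Flatten: point {chain[0]} directly at {chain[-1]}")
```

Any chain longer than two entries (origin plus destination) should be flattened to a single 301.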
8. Noindex Tag Audit
Check: Identify all pages with <meta name="robots" content="noindex"> or X-Robots-Tag: noindex headers. Verify that no important pages are accidentally noindexed.
Why it matters: Noindexed pages are excluded from AI search indexes entirely. Accidental noindex on product pages, blog posts, or FAQ pages eliminates their AI visibility.
Fix: Remove noindex from any page you want AI engines to cite. Common accidental sources: staging site settings carried to production, CMS defaults for new pages, and plugin conflicts.
9. Canonical Tag Completeness
Check: Verify that every indexable page has a self-referencing canonical tag. Identify pages with missing canonicals, canonical chains, or canonicals pointing to non-200 URLs.
Why it matters: Missing canonicals force AI systems to guess the preferred URL. Incorrect canonicals can point AI engines to the wrong version of your content.
Fix: Add self-referencing canonicals to all pages. Resolve canonical chains. Fix canonicals pointing to redirected or error pages.
10. IndexNow Implementation
Check: Verify whether your site implements the IndexNow protocol for real-time content update notifications.
Why it matters: IndexNow pushes content change notifications to Bing, Yandex, and other participating search engines immediately. This reduces the lag between content updates and AI search index refresh from days to hours.
Fix: Implement IndexNow API integration. Trigger notifications automatically when product pages, blog posts, or key pages are updated.
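A sketch of the IndexNow submission payload, assuming a hypothetical host and key. The protocol requires the key to be verifiable at a URL on your host; the actual HTTP POST to the IndexNow endpoint is omitted to keep the example offline:

```python
import json

def indexnow_payload(host: str, key: str, urls: list[str]) -> dict:
    """Build the JSON body for an IndexNow batch submission.

    The key must also be served at https://<host>/<key>.txt so the
    receiving endpoint can verify ownership.
    """
    return {
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }

payload = indexnow_payload(
    "www.example.com",
    "abc123",  # hypothetical key
    ["https://www.example.com/products/updated-item"],
)
body = json.dumps(payload)
# Submit `body` with an HTTP POST (Content-Type: application/json) to
# https://api.indexnow.org/indexnow — omitted here to keep the sketch offline.
print(payload["keyLocation"])  # https://www.example.com/abc123.txt
```

Hook this into your CMS or platform's content-update event so notifications fire automatically.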
Section 2: Schema and Structured Data (Items 11-20)
Schema markup is the most direct technical lever for AI citation. Pages with complete, attribute-rich schema achieve a 61.7% citation rate compared to 59.8% for pages with no schema and only 41.6% for pages with minimal or generic schema.
11. Product Schema Completeness
Check: Validate Product schema on all product pages. Verify that name, description, image, price, priceCurrency, availability, brand, sku, and aggregateRating (when reviews exist) are all populated.
Why it matters: AI engines use Product schema to understand and compare products. Incomplete or generic Product schema can perform worse than no schema at all.
Fix: Populate all available Product schema attributes. Use real, current data — not placeholder values.
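A sketch of a complete Product markup block, embedded in a script tag of type application/ld+json. All names, URLs, and values below are hypothetical placeholders — substitute real catalog data. Note that price, priceCurrency, and availability live on the nested Offer object, per Schema.org conventions:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Trail Runner X",
  "description": "Waterproof trail running shoe with an 8mm drop.",
  "image": "https://www.example.com/images/trail-runner-x.jpg",
  "sku": "TRX-001",
  "brand": { "@type": "Brand", "name": "Example Brand" },
  "aggregateRating": { "@type": "AggregateRating", "ratingValue": "4.6", "reviewCount": "212" },
  "offers": {
    "@type": "Offer",
    "price": "129.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock",
    "url": "https://www.example.com/products/trail-runner-x"
  }
}
```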
12. FAQPage Schema Implementation
Check: Verify FAQPage schema on product pages, category pages, and informational pages that contain FAQ sections.
Why it matters: Pages with FAQPage schema achieve a 41% citation rate versus 15% without — approximately 2.7 times higher. FAQPage schema is the single highest-impact structured data type for AI citation.
Fix: Add FAQPage schema to every page with FAQ content. Ensure the schema matches the visible FAQ content exactly.
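A minimal FAQPage sketch with one hypothetical question-answer pair. Each visible FAQ entry on the page gets a matching Question object in mainEntity, with the answer text mirroring the on-page copy exactly:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Are these shoes waterproof?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes. The upper uses a waterproof membrane rated for sustained rain."
      }
    }
  ]
}
```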
13. Organization Schema
Check: Verify Organization schema on your homepage and about page. Check that name, url, logo, contactPoint, sameAs (social profiles), and address are populated.
Why it matters: Organization schema helps AI engines identify and attribute your brand as a known entity. This entity recognition increases citation trust.
Fix: Implement Organization schema with all available attributes. Include sameAs links to verified social profiles.
14. BreadcrumbList Schema
Check: Verify BreadcrumbList schema on all pages with visible breadcrumb navigation.
Why it matters: BreadcrumbList schema communicates your site hierarchy to AI engines. It helps them understand where each page sits within your site structure and how topics relate.
Fix: Add BreadcrumbList schema that matches your visible breadcrumb navigation. Ensure the schema reflects the actual page hierarchy.
15. Article Schema for Content Pages
Check: Verify Article schema on all blog posts and editorial content. Check that headline, author, datePublished, dateModified, and publisher are populated.
Why it matters: Article schema provides AI engines with authorship, publication date, and update signals that factor into content freshness and authority evaluation.
Fix: Add Article schema to all content pages. Keep dateModified current with actual content updates.
16. Review and AggregateRating Schema
Check: Verify AggregateRating schema on product pages with customer reviews. Verify individual Review schema when displaying full review content.
Why it matters: Review schema provides social proof signals that AI engines factor into citation confidence. Products with strong review schema are more likely to be recommended in AI answers.
Fix: Implement AggregateRating with accurate ratingValue, reviewCount, and bestRating. If displaying individual reviews, add Review schema for each.
17. Schema Validation
Check: Run all pages through Google's Rich Results Test and Schema.org Validator. Identify errors and warnings.
Why it matters: Invalid schema is worse than no schema. AI engines that encounter schema errors may penalize the page's trust score.
Fix: Resolve all errors. Address warnings where possible. Revalidate after fixes.
18. Schema-Content Consistency
Check: Verify that structured data matches visible page content. Product names, prices, descriptions, and ratings in schema should exactly match what users see.
Why it matters: AI engines cross-reference schema data against visible content. Mismatches between schema and visible content reduce trust and may trigger spam filters.
Fix: Implement dynamic schema generation that pulls from the same data source as visible content. Never hardcode schema values that may diverge from displayed content.
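The single-data-source principle can be sketched as a template helper that generates the JSON-LD from the same record your page template renders. The catalog record and field names below are hypothetical:

```python
import json

def product_jsonld(product: dict) -> str:
    """Generate Product JSON-LD from the same record that renders the page,
    so schema and visible content cannot diverge."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["name"],
        "description": product["description"],
        "sku": product["sku"],
        "offers": {
            "@type": "Offer",
            "price": product["price"],
            "priceCurrency": product["currency"],
        },
    }
    return f'<script type="application/ld+json">{json.dumps(data)}</script>'

# Hypothetical catalog record — the single source of truth for both the
# visible template and the structured data.
record = {"name": "Trail Runner X", "description": "Waterproof trail shoe.",
          "sku": "TRX-001", "price": "129.00", "currency": "USD"}
tag = product_jsonld(record)
```

If a price changes in the catalog, the schema changes with it on the next render; there is no hardcoded value to drift out of sync.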
19. LocalBusiness Schema (If Applicable)
Check: For stores with physical locations, verify LocalBusiness schema with address, hours, phone, and geo coordinates.
Why it matters: LocalBusiness schema helps AI engines surface your store in location-relevant queries. "Where can I buy X near me" queries are increasingly handled by AI search.
Fix: Add LocalBusiness schema to location pages and your homepage. Include all available attributes.
20. HowTo Schema for Instructional Content
Check: For guides, tutorials, and how-to content, verify HowTo schema with step, name, text, and image attributes.
Why it matters: HowTo schema provides AI engines with explicitly structured instructional content that can be extracted as step-by-step answers.
Fix: Add HowTo schema to all step-based instructional content. Each step should have a name and text at minimum.
Section 3: Performance (Items 21-30)
Performance determines whether AI crawlers can access your content within their timeout windows. Sites with TTFB below 200ms see a 22% increase in citation density.
21. Time to First Byte
Check: Measure TTFB from multiple locations, particularly US East Coast (where many AI crawlers originate). Test product pages, collection pages, and content pages.
Why it matters: TTFB below 200ms is the target for AI visibility. Above 600ms requires immediate investigation. AI crawlers will time out on consistently slow pages.
Fix: Implement server-side caching, optimize database queries, and use a CDN with edge nodes near AI crawler origins.
22. Largest Contentful Paint
Check: Measure LCP using Lighthouse, PageSpeed Insights, or CrUX data. Target 2.5 seconds or less.
Why it matters: Sites with LCP at or below 2.5 seconds are almost 50% more likely to appear in AI results.
Fix: Optimize hero images, eliminate render-blocking resources, preload critical assets, and reduce DOM size.
23. Cumulative Layout Shift
Check: Measure CLS. Target 0.1 or less.
Why it matters: Sites with CLS at or below 0.1 are included 29.8% more often in AI summaries.
Fix: Set explicit width and height on images and embeds. Avoid inserting content above existing content. Use CSS containment for dynamic elements.
24. Total Page Weight
Check: Measure the total transfer size of each page type. Target under 1 MB.
Why it matters: 18% of pages larger than 1 MB are abandoned by AI crawlers.
Fix: Compress images to WebP/AVIF. Minify CSS and JavaScript. Remove unused code. Defer non-critical resources.
25. Image Format Optimization
Check: Audit image formats across your site. Identify JPEG and PNG images that should be WebP or AVIF.
Why it matters: Modern formats reduce image file size by 25-50% compared to legacy formats, directly reducing page weight and improving AI crawler success rates.
Fix: Convert product images and content images to WebP with AVIF as progressive enhancement. Use the <picture> element for format fallback.
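A sketch of the fallback pattern, with hypothetical image paths. The browser picks the first source format it supports and falls back to JPEG; the explicit width and height also prevent the layout shift flagged in item 23:

```html
<picture>
  <source srcset="/images/trail-runner-x.avif" type="image/avif">
  <source srcset="/images/trail-runner-x.webp" type="image/webp">
  <img src="/images/trail-runner-x.jpg"
       alt="Trail Runner X waterproof running shoe, side view"
       width="800" height="600" loading="lazy">
</picture>
```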
26. JavaScript Bundle Size
Check: Analyze JavaScript bundle sizes. Identify bundles over 100 KB that could be split or deferred.
Why it matters: Large JavaScript bundles increase page weight and parsing time. For AI crawlers that do not execute JavaScript, the bundle is pure waste.
Fix: Code-split JavaScript. Defer non-critical scripts. Remove unused libraries. Implement tree-shaking in your build process.
27. CDN Configuration
Check: Verify CDN coverage, cache hit rates, and edge node locations relative to known AI crawler origins.
Why it matters: CDN edge serving reduces TTFB to sub-100ms for cached content regardless of AI crawler location.
Fix: Implement a CDN with edge nodes in US East Coast, US West Coast, and European locations. Configure appropriate cache TTLs for each page type.
28. Server Response Under Load
Check: Load test your origin server to determine at what traffic level response times degrade.
Why it matters: AI crawlers can generate traffic patterns that resemble DDoS attacks — multiple concurrent requests hitting the same pages. If your server buckles under AI crawler load, both crawlers and real users suffer.
Fix: Configure rate limiting specifically for AI crawler user agents. Ensure CDN caching absorbs the majority of crawler requests before they reach your origin.
29. Font Loading Optimization
Check: Audit web font loading. Identify fonts that block rendering or cause layout shifts.
Why it matters: Render-blocking fonts delay content visibility and increase CLS. While AI crawlers do not render fonts, font loading still adds to page weight and degrades load-time metrics.
Fix: Use font-display: swap or font-display: optional. Preload critical fonts. Consider system font stacks for body text.
30. Third-Party Script Impact
Check: Audit all third-party scripts (analytics, chat widgets, social embeds, ad scripts). Measure their impact on page load time.
Why it matters: Third-party scripts frequently add 500ms+ to page load times. They increase page weight, add render-blocking resources, and may inject content that interferes with AI content extraction.
Fix: Defer all non-critical third-party scripts. Load analytics asynchronously. Consider removing chat widgets from product pages if they significantly impact performance.
Section 4: Content Structure (Items 31-38)
Content structure determines whether AI engines can extract clean, citable passages from your pages.
31. Single H1 Per Page
Check: Verify every page has exactly one H1 tag that describes the page's primary topic.
Why it matters: 87% of pages cited by AI engines use a single H1. Multiple H1 tags create ambiguity about the page's primary topic.
32. Heading Hierarchy Validation
Check: Verify that headings follow a logical hierarchy without skipped levels (H1 followed by H3, etc.).
Why it matters: Pages with clear H1-H2-H3 hierarchy are 2.8 times more likely to be cited by AI engines.
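Both checks — exactly one H1, no skipped levels — can be run with the standard library's HTML parser. A sketch, with a hypothetical page fragment:

```python
from html.parser import HTMLParser

class HeadingAudit(HTMLParser):
    """Collect heading levels (h1-h6) in document order."""
    def __init__(self):
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            self.levels.append(int(tag[1]))

def heading_issues(html: str) -> list[str]:
    parser = HeadingAudit()
    parser.feed(html)
    issues = []
    h1_count = parser.levels.count(1)
    if h1_count != 1:
        issues.append(f"expected exactly one <h1>, found {h1_count}")
    for prev, cur in zip(parser.levels, parser.levels[1:]):
        if cur > prev + 1:
            issues.append(f"skipped level: h{prev} followed by h{cur}")
    return issues

page = "<h1>Guide</h1><h3>Details</h3>"  # h2 is missing
print(heading_issues(page))  # ['skipped level: h1 followed by h3']
```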
33. Heading Frequency
Check: Verify heading frequency of approximately one heading every 150-300 words.
Why it matters: AI engines segment content into passages at heading boundaries. Sections longer than 300 words without subheadings produce oversized passages that are difficult to extract cleanly.
34. First-Sentence Answers
Check: Verify that the first sentence after each heading provides a direct, complete answer to the question implied by the heading.
Why it matters: 44.2% of all LLM citations come from the first 30% of text on a page. First-sentence answers dramatically increase extraction probability.
35. Semantic HTML Elements
Check: Verify use of <main>, <article>, <section>, <nav>, <aside>, <header>, and <footer> elements.
Why it matters: Semantic HTML reduces the computational cost for AI to understand content structure. It eliminates the need for heuristic content extraction.
36. Alt Text Completeness
Check: Audit all images for descriptive alt text. Identify images with empty, missing, or generic alt attributes.
Why it matters: AI engines cannot see images. Alt text is the only way they understand visual content. Missing alt text means missing context for AI citation.
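A sketch of an automated alt-text audit using the standard library parser. The sample markup and the generic-word list are illustrative; extend the list with whatever filler terms appear in your own content:

```python
from html.parser import HTMLParser

GENERIC = {"image", "photo", "picture", "img", "logo"}

class AltAudit(HTMLParser):
    """Flag <img> tags with missing, empty, or generic alt attributes."""
    def __init__(self):
        super().__init__()
        self.flagged: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        alt = (attrs.get("alt") or "").strip()
        if not alt or alt.lower() in GENERIC:
            self.flagged.append(attrs.get("src", "(no src)"))

audit = AltAudit()
audit.feed('<img src="/a.jpg" alt="Trail shoe outsole tread">'
           '<img src="/b.jpg" alt="image">'
           '<img src="/c.jpg">')
print(audit.flagged)  # ['/b.jpg', '/c.jpg']
```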
37. Content Readability
Check: Measure Flesch-Kincaid readability grade level. Target Grade 6-8 for maximum AI citation probability.
Why it matters: Content at Grade 6-8 readability earns 4.6 AI citations on average versus 4.0 for Grade 11+. Simpler language is easier for AI to extract and cite confidently.
38. Table and List Markup
Check: Verify that comparison data, specifications, and enumerated items use proper HTML tables and lists rather than styled divs.
Why it matters: AI engines can directly parse <table>, <ul>, and <ol> elements to extract structured information. Div-based visual tables require heuristic interpretation.
Section 5: Accessibility (Items 39-44)
Accessibility and AI optimizations overlap significantly. Elements that help assistive technologies navigate your site also help AI engines extract your content.
39. ARIA Landmark Roles
Check: Verify that <main>, <nav>, <header>, and <footer> elements are present and serve as ARIA landmarks.
Why it matters: ARIA landmarks help AI crawlers identify content regions the same way they help screen readers. They define where primary content starts and where navigation and boilerplate begin.
40. Skip Navigation Links
Check: Verify that a "skip to content" link exists and points to the <main> element.
Why it matters: While AI crawlers do not click links, the presence of a skip navigation link and its target indicate where primary content begins — a signal that complements the <main> element.
41. Keyboard Navigation
Check: Verify that all interactive elements are accessible via keyboard and that focus order follows a logical sequence.
Why it matters: Keyboard-navigable sites typically have cleaner DOM structure and more semantic HTML, which directly benefits AI content extraction.
42. Color Contrast Ratios
Check: Verify WCAG 2.1 AA contrast ratios for all text content.
Why it matters: While AI crawlers do not evaluate visual contrast, sites meeting accessibility standards tend to have cleaner markup and better-organized content that is easier for AI to parse.
43. Form Label Associations
Check: Verify that all form inputs have associated labels using the for attribute.
Why it matters: Properly labeled forms indicate attention to semantic markup quality, which correlates with overall site quality signals that AI engines evaluate.
44. Language Attributes
Check: Verify that the <html> element has a lang attribute matching the page's language. Verify lang attributes on any content blocks in different languages.
Why it matters: The lang attribute helps AI engines understand the language of your content, which affects language-specific retrieval and citation. Missing lang attributes force AI systems to detect language from content, which may produce incorrect results for pages with mixed-language content.
Section 6: Architecture (Items 45-50)
Site architecture determines how AI engines understand the relationships between your pages and the topical authority of your content.
45. Hub-and-Spoke Content Clusters
Check: Identify whether your content is organized into topic clusters with hub pages linking to spoke pages and vice versa.
Why it matters: Hub-and-spoke clusters demonstrate topical authority that AI engines use to determine citation worthiness. Disconnected content lacks the authority context that drives citations.
46. Internal Link Equity Distribution
Check: Map internal link flow to identify where authority accumulates and where it leaks. Verify that your most important pages receive the most internal links.
Why it matters: AI engines evaluate page authority partly through internal link signals. Pages with many internal links from topically relevant pages are stronger citation candidates.
47. Orphan Page Detection
Check: Identify pages with no internal links pointing to them.
Why it matters: Orphan pages are discoverable only through sitemaps, not through link-following. AI crawlers place higher confidence in pages discovered through links than in pages found only through sitemaps.
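Orphan detection is a set difference between sitemap URLs and link targets. A sketch, assuming you have both a sitemap URL list and an internal link graph from a crawl (the data below is hypothetical):

```python
def orphan_pages(sitemap_urls: set[str], links: dict[str, list[str]]) -> set[str]:
    """Pages listed in the sitemap that no other page links to.

    The homepage is excluded since it needs no inbound internal link.
    """
    linked = {target for targets in links.values() for target in targets}
    return sitemap_urls - linked - {"/"}

site_links = {"/": ["/shoes"], "/shoes": ["/shoes/trail"]}
sitemap = {"/", "/shoes", "/shoes/trail", "/old-landing-page"}
print(orphan_pages(sitemap, site_links))  # {'/old-landing-page'}
```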
48. URL Structure Consistency
Check: Verify that URL patterns are consistent across page types. Products should follow one pattern, collections another, blog posts another.
Why it matters: Consistent URL patterns help AI engines classify page types and apply appropriate extraction logic. Inconsistent URL patterns force heuristic classification.
49. Hreflang Implementation (International Sites)
Check: For international sites, verify hreflang annotations with bidirectional return links, correct ISO codes, self-referencing tags, and x-default declarations.
Why it matters: 67% of websites have hreflang errors. Correct implementation is a competitive advantage for international AI visibility. AI engines increasingly surface localized content but require correct signals.
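The two most common hreflang errors — missing return links and missing self-references — can be checked mechanically once annotations are extracted into a mapping. A sketch with a hypothetical two-locale setup where the German page omits its return link:

```python
def hreflang_issues(annotations: dict[str, dict[str, str]]) -> list[str]:
    """Check that every hreflang annotation has a bidirectional return link
    and that each page references itself.

    `annotations` maps each URL to its {hreflang_code: target_url} entries.
    """
    issues = []
    for page, langs in annotations.items():
        if page not in langs.values():
            issues.append(f"{page}: missing self-referencing hreflang")
        for code, target in langs.items():
            return_links = annotations.get(target, {})
            if page not in return_links.values():
                issues.append(f"{page} -> {target} ({code}): no return link")
    return issues

pages = {
    "https://example.com/en/": {"en": "https://example.com/en/",
                                "de": "https://example.com/de/"},
    "https://example.com/de/": {"de": "https://example.com/de/"},
}
for issue in hreflang_issues(pages):
    print(issue)
```

Checks for correct ISO codes and x-default declarations would layer on top of this structure.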
50. Mobile and Desktop Parity
Check: Verify that mobile and desktop versions of your pages serve identical content. Google's mobile-first indexing means the mobile version is what gets indexed.
Why it matters: If your mobile pages serve less content than desktop (hidden sections, collapsed accordions that require JavaScript to expand), AI crawlers may index an incomplete version of your content. Ensure content parity between mobile and desktop rendered output.
Running the Audit
Work through this checklist systematically, starting with crawlability. A page that cannot be crawled fails the entire audit regardless of its schema, performance, or content quality. After crawlability, prioritize schema and performance — these two categories have the largest measurable impact on AI citation rates. Content structure and accessibility improvements compound over time. Architecture changes provide the deepest long-term benefit but require the most effort.
Schedule this audit quarterly. The AI search landscape changes rapidly — new crawlers appear, existing crawlers change their behavior, and platform-specific requirements evolve. A site that passes today's audit may fail in three months if it is not actively maintained.