- The one qualifying question for any generative engine optimization agency sales call: “Walk me through your retrieval-test workflow. What prompts, what cadence, what LLM surfaces, and how do you reconcile that against scrape logs?” Real shops answer in operator terms within 60 seconds.
- Ask for a redacted scrape-to-cite diff from a current client. The diff compares the URLs that GPTBot, PerplexityBot, ClaudeBot, and Googlebot actually fetched against the URLs that showed up as citations in ChatGPT, Perplexity, and Google AI Overviews answers. No diff, no engagement.
- Schema markup is a prerequisite for retrieval eligibility, not a citation lever. Adding Schema.org Organization or FAQ markup does not, by itself, change whether you get cited. The lever is passage efficiency and source authority.
- First decision is in-house vs. agency vs. hybrid, not which agency. If you have a content engineer and editorial ops, you can run GEO in-house with a quarterly audit. If you have neither, hire. If you have one but not the other, build a hybrid pod.
- Day 30 deliverables from a real GEO retainer: a documented prompt corpus tied to buyer stages, a baseline citation log across at least three LLM surfaces, and a scrape-log pull. Day 60: first scrape-to-cite diff with remediation priorities. Day 90: re-test showing the citation-rate delta.
Questions this article answers:
- How do I tell if a GEO agency actually does retrieval testing or is just running an SEO playbook with AI in the H1?
- What should a generative engine optimization agency retainer cost in 2026 and what should be in the SOW?
- Why does my page get scraped by GPTBot but Reddit gets cited in the ChatGPT answer instead?
- Should I hire a GEO agency, use my existing SEO agency, or build this in-house?
- How do GEO agencies measure progress when ChatGPT and Perplexity don’t expose rank-tracker APIs?
- What are the red flags in a GEO agency pitch deck?
Every “Best GEO Agency” List in 2026 Is a Vendor Directory. None of Them Teach You How to Evaluate One.
Search “generative engine optimization agency” right now. The top results are ranked listicles, and most of them happen to feature the publishing agency in the top three slots. You are being asked to trust the list.
This article gives you the opposite: a buyer’s-side workflow you can run on any sales call in 30 minutes. Including ours.
Generative engine optimization (GEO) is the practice of getting your brand and pages cited inside AI answer surfaces: ChatGPT, Perplexity, Google AI Overviews, Gemini, and Claude. Some shops call it AEO (answer engine optimization). Same category, contested vocabulary. We use GEO.
Here is the problem. Most agencies selling GEO services in 2026 are running a 2019 SEO playbook with “AI” bolted onto the H1. They talk about entities and schema and “AI-friendly content.” They cannot show you a single artifact that proves they test against actual LLM surfaces.
The artifact that separates real work from a rebranded deck is the scrape-to-cite diff: a side-by-side of which of your URLs the LLM crawlers fetched versus which ones actually appeared as citations in the answers. If an agency on your shortlist can’t describe that diff in operator terms, you’re paying for SEO with new vocabulary.
Before You Shop: Decide Whether GEO Belongs In-House, At an Agency, or in a Hybrid Pod
The first decision is not which agency. It is whether to hire one at all.
None of the directory listicles acknowledge this decision exists, because they need you shopping. Here are the actual criteria.
Do you have someone who can read crawler logs, hand-write JSON-LD schema, and stand up a weekly prompt re-test? Do you have editorial ops who can maintain and refresh a buyer-stage prompt corpus?
- Yes to both: GEO can run in-house with a quarterly external audit.
- No to both: hire an agency.
- Yes to one: build a hybrid pod.
When in-house GEO actually works
In-house works when you already have a strong technical SEO function and a content team that ships against briefs on a real cadence. The work is unglamorous: weekly prompt runs, citation logging, scrape-log pulls, entity reconciliation in Wikidata. You need someone who will actually do it on a Tuesday afternoon when nothing is on fire.
What a hybrid pod looks like in practice
In a hybrid pod, the agency owns retrieval testing, scrape-log analysis, and the monthly diff. Your in-house team owns content production against the diff’s remediation list.
This is the most common shape for B2B SaaS teams with a content marketer but no content engineer. It splits the work cleanly along the line where the specialized infrastructure sits.
For rough budget planning: in-house adds a headcount line (a content engineer, or a senior technical SEO with GEO scope). Agency retainers vary widely by scope. Project audits run shorter: a scoped 30 to 60 day baseline engagement.
What a Real GEO Engagement Delivers Month-to-Month (And What “GEO Services” on a 2019 SEO Deck Doesn’t)
A real GEO retainer ships four artifacts every month: a maintained prompt corpus, a citation log across multiple LLM surfaces, a scrape-log pull, and a scrape-to-cite diff with remediation priorities.
A rebranded SEO deck ships keyword rankings, “AI-optimized” blog posts, and a schema audit with no retrieval test attached to it.
The difference is not branding. It is whether the shop has logging infrastructure for LLM crawlers and a method for testing citation outcomes on a fixed prompt set.
The 30/60/90 deliverables a real shop hands over
Day 30. A documented prompt corpus segmented by buyer stage: problem aware, solution aware, vendor comparison, branded. A baseline citation log across at least three surfaces: ChatGPT, Perplexity, and Google AI Overviews. A scrape-log pull showing which of your URLs GPTBot, PerplexityBot, ClaudeBot, and Googlebot have actually fetched. (Google-Extended is a robots.txt control token, not a separate crawler, so it never appears in server logs.)
Day 60. The first scrape-to-cite diff. This is the artifact. It sorts your URLs into three buckets: fetched and cited, fetched and not cited, never fetched. The remediation plan flows from which bucket the priority URLs sit in.
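The bucketing step itself is simple set arithmetic. A minimal Python sketch, not any agency's tooling: it assumes you already have three URL sets (the pages you track, the pages LLM crawlers fetched in the log window, and the URLs cited in your answer logs), all normalized the same way. The class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapeToCiteDiff:
    fetched_and_cited: set[str]  # converting: keep and monitor
    fetched_not_cited: set[str]  # authority / passage-efficiency problem
    never_fetched: set[str]      # discovery problem


def scrape_to_cite_diff(
    tracked_urls: set[str], fetched_urls: set[str], cited_urls: set[str]
) -> ScrapeToCiteDiff:
    """Sort tracked URLs into the three diff buckets.

    Assumes URLs are already normalized identically (scheme, host, trailing
    slash, query strings). Citations of other domains, such as Reddit threads,
    drop out here because they are not in tracked_urls.
    """
    fetched = fetched_urls & tracked_urls
    cited = cited_urls & tracked_urls
    return ScrapeToCiteDiff(
        fetched_and_cited=fetched & cited,
        fetched_not_cited=fetched - cited,
        # A URL cited with no logged fetch also lands here; worth checking
        # whether the log window or the crawler list in the pull is incomplete.
        never_fetched=tracked_urls - fetched,
    )
```

Each bucket then becomes its own remediation queue: never-fetched URLs go to technical discovery fixes, fetched-but-not-cited URLs go to passage and authority work.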
Day 90. A re-test against the same prompt corpus. The reporting metric is the delta in share of citation, not a rank-tracker number, because there is no rank tracker for ChatGPT.
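Because the Day 90 number is a delta on a fixed corpus, the comparison should refuse to run if the prompt set changed between baseline and re-test. A minimal sketch, assuming each run is stored as a mapping from prompt ID to whether your domain was cited, plus a separate mapping from prompt ID to buyer stage; all names are illustrative.

```python
from collections import defaultdict


def share_of_citation_by_stage(
    stage_of: dict[str, str], cited: dict[str, bool]
) -> dict[str, float]:
    """Share of citation per buyer stage: cited prompts / prompts in that stage."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for prompt_id, stage in stage_of.items():
        totals[stage] += 1
        if cited[prompt_id]:
            hits[stage] += 1
    return {stage: hits[stage] / totals[stage] for stage in totals}


def citation_delta(
    stage_of: dict[str, str], baseline: dict[str, bool], retest: dict[str, bool]
) -> dict[str, float]:
    """Re-test share of citation minus baseline share, per buyer stage."""
    # A changed corpus makes the delta meaningless, so fail loudly.
    if baseline.keys() != stage_of.keys() or retest.keys() != stage_of.keys():
        raise ValueError("baseline and re-test must cover the same prompt corpus")
    before = share_of_citation_by_stage(stage_of, baseline)
    after = share_of_citation_by_stage(stage_of, retest)
    return {stage: after[stage] - before[stage] for stage in before}
```

Splitting the delta by stage shows whether gains come from branded prompts, which are usually easiest to win, or from the problem-aware and comparison prompts that drive pipeline.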
Schema is a prerequisite, not a lever
Schema work is necessary. Schema work alone does not move citation.
The pattern shows up constantly. An agency ships a clean Organization, Article, and FAQ schema implementation, sends a victory report, and citation rates do not move. Schema makes your content eligible for retrieval. The citation decision happens at a different layer: is your page the most efficient passage to answer the prompt, and is your brand a resolved entity the model treats as authoritative?
Recent third-party analysis (Ahrefs reviewed the schema-to-citation relationship in 2026) suggests the correlation between schema and AI citation is weaker than most SEO decks imply. Treat schema as table stakes. The work that moves citation comes after.
For a deeper walk through the diagnostic itself, see our four-pass GEO content audit and our Perplexity-specific retrieval fix.

The Qualifying Question: “Walk Me Through Your Retrieval-Test Workflow” And What the Next 60 Seconds Tells You
The single qualifying question for any generative engine optimization agency sales call is this: “Walk me through your retrieval-test workflow. What prompts, what cadence, what LLM surfaces, and how do you reconcile that against scrape logs?”
What happens in the next 60 seconds tells you everything.
Operator answers vs. abstraction answers
A real GEO operator gets specific about three things, fast.
Prompt corpus. How many prompts, segmented how, refreshed how often. The right answer sounds like: “We maintain a buyer-stage corpus per client, refreshed monthly against actual support tickets and sales call transcripts.” A tight corpus built from real buyer language beats a 500-prompt keyword-tool dump, because LLMs answer intent, not strings. Prompts generated from a keyword tool produce queries no real buyer asks.
Cadence. A 7 to 14 day re-run window on the same prompt corpus. Answer drift is real: ChatGPT can give a different answer to the same prompt on Tuesday than it gave the previous Thursday. Without a fixed cadence on a fixed corpus, you cannot tell signal from noise.
The diff. They pull crawler logs for GPTBot, PerplexityBot, ClaudeBot, and Googlebot (Google-Extended is a robots.txt token with no user agent of its own), then compare the URLs fetched against the URLs cited in the answer logs.
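A minimal sketch of the fetch side of that diff, assuming your server writes the common Apache/Nginx "combined" log format. The crawler list is an assumption to adapt: OpenAI also documents OAI-SearchBot as the crawler behind ChatGPT search, and Google's fetches arrive as Googlebot. User-agent strings can be spoofed, so a production pull should also verify requests against the IP ranges or reverse-DNS checks that vendors such as OpenAI and Google document.

```python
import re
from collections import defaultdict
from collections.abc import Iterable

# Substrings matched against the User-Agent header; adjust to your stack.
# Googlebot fetches serve all of Google Search, not only AI Overviews.
LLM_CRAWLERS = ("GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Googlebot")

# Request, status, bytes, referrer and user agent from a "combined" log line.
COMBINED = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def fetched_urls_by_crawler(log_lines: Iterable[str]) -> dict[str, set[str]]:
    """Map each crawler name to the set of paths it fetched with a 2xx status."""
    fetched: dict[str, set[str]] = defaultdict(set)
    for line in log_lines:
        match = COMBINED.search(line)
        if match is None or not match.group("status").startswith("2"):
            continue  # unparseable line or a failed fetch
        path = match.group("path").split("?", 1)[0]  # drop query strings
        for bot in LLM_CRAWLERS:
            if bot in match.group("ua"):
                fetched[bot].add(path)
    return dict(fetched)
```

The union of these sets is the "fetched" input to the diff; keeping them per crawler shows which surface is failing to discover a page.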
A rebranded SEO shop pivots to abstraction. You will hear “we make sure your content is AI-friendly,” “we structure for entities,” “we optimize for the AI Overview,” “we follow best practices for E-E-A-T.” None of those sentences describe a workflow. They describe a vibe.
The redacted-diff request that ends the call
Follow-up question: “Can you show me a redacted scrape-to-cite diff from a current client?”
If the answer is yes, ask them to walk you through what they did with it. The conversation that follows will tell you whether they did the diagnostic work or built a slide.
If the answer is no, or some version of “that’s proprietary” without offering even a redacted sample, the engagement will be SEO work with new vocabulary. Move on.
Why Reddit Got Cited and Your Page Got Scraped: Reading the Diff Like a GEO Operator
The scrape-to-cite diff reveals two distinct failure modes. They have completely different fixes.
Fetched but not cited: the authority problem
Your URL appears in the crawler logs. It does not appear in the citation log. The model saw your page and chose somebody else’s. Usually Reddit, G2, or a listicle that aggregates multiple options.
Three common causes.
- Passage inefficiency. Your answer is buried under an H3 four scrolls down, while the Reddit thread answers the prompt in the first comment.
- Source authority. Your brand entity is not resolved in Wikidata or the Google Knowledge Graph, so the model has no signal that you are an authoritative source on the topic.
- Prompt shape. The prompt is comparative by nature (“best X for Y”) and the model prefers an aggregator that lists multiple brands over a single brand’s own page.
Fixes here are content and authority work: passage rewrites that move the answer to the top of the page, entity reconciliation work, and earning third-party mentions on the sources the model is already citing.
Never fetched: the discovery problem
Your URL does not appear in the crawler logs at all. The LLM crawlers never picked it up. This is a discovery problem, not an authority problem.
Common causes: robots.txt blocks one or more of the LLM crawlers (a surprisingly frequent finding when companies copy-paste a 2022 robots.txt), missing or broken sitemap entries, internal links so thin the crawler does not find the page, or the page being too new with no inbound link surface.
Fixes are technical SEO basics with an LLM-crawler twist: explicitly allow GPTBot, PerplexityBot, ClaudeBot, and Google-Extended in robots.txt if you want to be in the answers, then make sure your sitemap and internal link structure surface the page.
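A quick way to catch the copy-pasted robots.txt problem is Python's standard-library `urllib.robotparser`. A minimal sketch: both robots.txt bodies are illustrative, and this parser implements the classic rules (first matching group wins, plain prefix paths) rather than every extension individual vendors support, so treat it as a smoke test, not a ruling on how each crawler reads your file.

```python
from urllib.robotparser import RobotFileParser

# A 2022-era file copied forward: it blocks GPTBot outright.
LEGACY_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

# Explicitly allows the LLM crawler tokens named in the text.
FIXED_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: PerplexityBot
User-agent: ClaudeBot
User-agent: Google-Extended
Allow: /

User-agent: *
Disallow: /admin/
"""

LLM_AGENTS = ("GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended")


def blocked_llm_agents(robots_txt: str, url: str) -> list[str]:
    """Return the LLM crawler tokens that robots_txt disallows for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    return [agent for agent in LLM_AGENTS if not parser.can_fetch(agent, url)]
```

Running it against your live file for each priority URL in the never-fetched bucket separates a robots.txt block from a sitemap or internal-linking gap.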
The four ratios that belong in every monthly report
The four numbers a real shop reports every month:
- Share of citation = prompts where your domain is cited / total prompts in corpus
- Scrape-to-cite ratio = citations of your domain / fetches of your domain by LLM crawlers, over the same window
- Prompt coverage rate = prompts where any owned URL appears / total tracked prompts
- Citation-eligible page rate = URLs fetched by GPTBot/PerplexityBot/ClaudeBot / total indexable URLs
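The four definitions translate directly into arithmetic. A minimal sketch, assuming the monthly counts have already been pulled from your citation log and crawler logs for the same window; the field names are hypothetical, and an empty denominator returns 0.0 rather than raising, so a client with no logs yet still gets a report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlyCounts:
    prompts_in_corpus: int             # total prompts in the tracked corpus
    prompts_citing_domain: int         # prompts where your domain is cited
    prompts_with_owned_url: int        # prompts where any owned URL appears
    citations_of_domain: int           # citations of your domain in the window
    llm_crawler_fetches: int           # LLM-crawler fetches of your domain, same window
    urls_fetched_by_llm_crawlers: int  # distinct URLs fetched by GPTBot/PerplexityBot/ClaudeBot
    indexable_urls: int                # total indexable URLs


def _ratio(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else 0.0


def monthly_ratios(c: MonthlyCounts) -> dict[str, float]:
    """The four monthly report ratios, as defined in the list above."""
    return {
        "share_of_citation": _ratio(c.prompts_citing_domain, c.prompts_in_corpus),
        "scrape_to_cite_ratio": _ratio(c.citations_of_domain, c.llm_crawler_fetches),
        "prompt_coverage_rate": _ratio(c.prompts_with_owned_url, c.prompts_in_corpus),
        "citation_eligible_page_rate": _ratio(
            c.urls_fetched_by_llm_crawlers, c.indexable_urls
        ),
    }
```

Reporting all four side by side is what separates a discovery problem (low citation-eligible page rate) from an authority problem (healthy fetches, low scrape-to-cite ratio).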
Share of citation tells you the headline result. Scrape-to-cite ratio tells you whether your eligible pages are converting. Prompt coverage rate tells you whether you show up at all. Citation-eligible page rate tells you whether your discovery is healthy.
Pricing, SOW, and Red Flags: What a GEO Retainer Should Cost and What Should Be in Writing
Three engagement shapes are common in 2026.
| Engagement shape | What you get | When it fits |
|---|---|---|
| Project audit | Baseline corpus, first scrape-log pull, first scrape-to-cite diff | You want to verify methodology before signing, or you have in-house capacity to execute remediation |
| Monthly retainer | Ongoing corpus maintenance, monthly diff and priorities, entity work, content briefs informed by the diff | Most B2B and mid-market teams |
| Performance-tied | Fee structured against a share-of-citation threshold on an agreed prompt set | Rare. Only when the prompt set is designed with as much rigor as the work itself |
For monthly retainer pricing, ranges vary widely by domain size, prompt corpus complexity, and how much content production sits inside scope versus on your side. Get itemized quotes from at least three shops and compare what’s actually being delivered, not the headline number.
Clauses that belong in every GEO SOW
- Named LLM surfaces tracked (ChatGPT, Perplexity, Google AI Overviews at minimum; Gemini and Claude depending on your audience).
- Prompt corpus size, segmentation method, and refresh cadence.
- Crawler log access: which crawlers (GPTBot, PerplexityBot, ClaudeBot, and Googlebot for Google surfaces) and how the logs get to the agency.
- Reporting cadence and the exact definition of “visibility” being reported. Share of citation on the agreed prompt corpus is the right answer. “AI rankings” is not a thing.
- Exit clause that returns the prompt corpus, the citation logs, and the remediation backlog to the client. You should own these artifacts.
Red flags in the pitch deck
- “We optimize for AI search” with no methodology behind it.
- No named LLM surfaces. “AI” as the only specificity.
- No prompt corpus as a deliverable.
- No crawler log access in scope.
- Vague visibility metrics that can’t be tied back to specific prompts.
- Refusal to show a redacted diff from any current client.
- The agency’s own website does not appear as a citation for its own brand queries in ChatGPT or Perplexity. Run this check. It takes 90 seconds.
Frequently Asked Questions
How do I tell if a GEO agency actually does retrieval testing or is just running an SEO playbook with AI in the H1?
Ask one question on the sales call: “Walk me through your retrieval-test workflow. What prompts, what cadence, what LLM surfaces, and how do you reconcile that against scrape logs?” Within 60 seconds, a real GEO shop will get specific about prompt corpus size, a 7 to 14 day re-test cadence, named LLM surfaces, and how they diff crawler logs against citation logs. A rebranded SEO shop pivots to abstraction about “AI-friendly content” and “entity optimization,” because they don’t have the logging infrastructure to do the actual work.
What should a generative engine optimization agency retainer cost in 2026 and what should be in the SOW?
Pricing varies by scope, but the SOW should name the LLM surfaces tracked, the prompt corpus size and refresh cadence, which crawler logs the agency will access (GPTBot, PerplexityBot, ClaudeBot, Googlebot), the definition of visibility being reported (share of citation, not “AI rankings”), and an exit clause that returns the prompt corpus and citation logs to you. If any of those are missing, push back before signing. Get itemized quotes from at least three shops and compare deliverables, not headline numbers.
Why does my page get scraped by GPTBot but Reddit gets cited in the ChatGPT answer instead?
Your page is eligible for retrieval but losing the citation decision to a more efficient or more authoritative source. Three common causes: your answer is buried mid-page while the Reddit thread answers in the first comment, your brand entity isn’t resolved in Wikidata or the Knowledge Graph so the model lacks an authority signal, or the prompt is comparative by nature and the model prefers an aggregator that lists multiple options. The fix is passage rewrites, entity reconciliation, and earning mentions on the sources the model already cites, not more schema markup.
Should I hire a GEO agency, use my existing SEO agency, or build this in-house?
Run GEO in-house only if you have both a content engineer who can read crawler logs and write JSON-LD by hand and editorial ops who can maintain a buyer-stage prompt corpus on a weekly cadence. If you have neither, hire a specialist agency. If you have one of the two, build a hybrid pod: the agency owns retrieval testing and the scrape-to-cite diff, and you own content production against the remediation backlog. Your existing SEO agency is the right call only if they can answer the qualifying question above with operator-level specificity.
How do GEO agencies measure progress when ChatGPT and Perplexity don’t expose rank-tracker APIs?
They run a fixed prompt corpus against the LLM surfaces on a 7 to 14 day cadence and report share of citation, scrape-to-cite ratio, prompt coverage rate, and citation-eligible page rate. Share of citation is prompts where your domain is cited divided by total prompts in the corpus, and it’s the closest thing GEO has to a headline number. The reason cadence and corpus stability matter so much is answer drift: the same prompt produces different answers across days, so progress is only legible against a stable test set.
What are the red flags in a GEO agency pitch deck?
The biggest red flag is no named methodology: “we optimize for AI search” with no LLM surfaces, no prompt corpus, no crawler log access, and no willingness to share a redacted scrape-to-cite diff from a current client. Other tells include vague visibility metrics that can’t be tied back to specific prompts, a refusal to define share of citation as the reporting metric, and the most diagnostic check of all: the agency’s own website doesn’t appear as a citation in ChatGPT or Perplexity for its own brand queries. Run that last check before the first call.
Run This Workflow on Elevarus Too, Then Decide
The point of this article is to make you a harder buyer. Run the qualifying question on every shop on your shortlist, including ours. Ask for the retrieval-test workflow, the prompt corpus approach, the named LLM surfaces, and a redacted scrape-to-cite diff. If we can’t answer in operator terms, take us off the list.
If you want a working session rather than a pitch, book a free consultation. Bring a domain, and we’ll do a live read on whether your citation problem is discovery (never fetched) or authority (fetched but not cited). That single distinction usually decides the next 90 days of work. Beats leaving a call with a deck.