Field Note

Measuring what AI says about your brand: a quarterly practice

If you cannot measure your AI-search visibility, you cannot manage it. Most marketing departments are flying blind here. The audit doesn't take long once you know what to ask — and the absence of one is the single biggest reason AEO programs underperform.

Luke LaFave · Founder, LaFave Consulting
6 min read

If you cannot measure your AI-search visibility, you cannot manage it. This sounds obvious. It’s also the single most-skipped step in marketing departments that have committed budget to “AI search optimization” without ever defining what they’re trying to move.

The audit isn’t complicated. It does require discipline. I want to lay out what a quarterly AI-search audit actually involves, what it produces, and what to do with the output.

The audit’s premise

The buyer who is going to hire you in 2026 will ask an AI assistant a question before they reach out. We don’t know exactly which question. We can guess. The guess is the foundation of the audit.

The audit consists of running those guessed questions — call them prompts — against the AI surfaces your buyers actually use, and recording what comes back. Specifically: did the response name your brand? Did it name your competitors? What did it say about each? Did it include a citation back to your website?

This is the entire instrument. The discipline is in running it on a defined prompt set, against the same five platforms, every quarter, and recording the results in a way you can compare across time.
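In sketch form, the instrument is three nested loops: prompts, platforms, shots. Here is a minimal Python sketch, assuming each surface can be queried for a response and its cited URLs. The `query_platform` function is a hypothetical placeholder for however you actually reach each surface (an API where one exists, a manual browser session where one doesn't); nothing here is a real platform API.

```python
# A minimal sketch of the quarterly run. query_platform is a placeholder
# you replace with your own access method per surface.
PLATFORMS = ["chatgpt", "claude", "perplexity", "gemini", "ai-overviews"]
RUNS_PER_PROMPT = 2  # two-shot: same prompt, fresh state, averaged later


def query_platform(platform: str, prompt: str) -> tuple[str, list[str]]:
    """Placeholder. Return (response_text, cited_urls), using a fresh
    session per call so earlier prompts cannot bias the response."""
    raise NotImplementedError


def run_audit(prompt_set: list[str]) -> list[dict]:
    records = []
    for prompt in prompt_set:
        for platform in PLATFORMS:
            for run in range(1, RUNS_PER_PROMPT + 1):
                text, urls = query_platform(platform, prompt)
                records.append({"prompt": prompt, "platform": platform,
                                "run": run, "response": text,
                                "cited_urls": urls})
    return records
```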

Building the prompt set

A prompt set for a typical service business runs between eighty and a hundred and twenty prompts. They fall into three categories.

Category-defining prompts — “who are the leading [category] firms in [your city]”, “best [service] near me”, “how do I find a reliable [profession] in [region]”. These are the prompts most buyers will use as their starting point. Twenty to thirty prompts cover most of the variants.

Comparison prompts — “[your brand] vs [competitor 1]”, “is [competitor 2] better than [competitor 3]”, “what’s the difference between [your service tier A] and [tier B]”. These are the prompts a buyer asks after they’ve narrowed to a short list. Twenty to forty prompts cover the comparison space.

Specific-question prompts — “how much does [service] cost in [your region]”, “what should I expect from a [service] consultation”, “how do I know if a [service provider] is reputable”. These are the buyer-journey prompts that map to long-tail intent. Forty to sixty prompts cover this layer.

The prompt set should be built once, frozen, and re-run each quarter without modification. The whole point of the audit is to measure change over time. If the prompt set drifts, the comparison is meaningless.
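To make the three categories concrete, here is one hedged way to assemble and freeze a set from templates. Every fill-in value below is a placeholder of mine, not a recommendation; swap in your own category, geography, brand, and competitors before freezing.

```python
from itertools import product

# Hypothetical fill-ins; replace with your own values before freezing.
FILLS = {
    "category": ["roofing"],
    "city": ["Green Bay"],
    "region": ["Wisconsin"],
    "service": ["roof replacement", "roof repair"],
    "brand": ["Acme Roofing"],
    "competitor": ["Rival One", "Rival Two", "Rival Three"],
}

TEMPLATES = [
    # Category-defining
    "who are the leading {category} firms in {city}",
    "how do I find a reliable {category} contractor in {region}",
    # Comparison
    "{brand} vs {competitor}",
    # Specific-question
    "how much does {service} cost in {region}",
    "what should I expect from a {service} consultation",
]


def expand(template: str) -> list[str]:
    """Fill every placeholder combination for one template."""
    keys = [k for k in FILLS if "{" + k + "}" in template]
    return [template.format(**dict(zip(keys, combo)))
            for combo in product(*(FILLS[k] for k in keys))]


PROMPT_SET = sorted({p for t in TEMPLATES for p in expand(t)})
# Freeze it: write PROMPT_SET to a file once, re-read it every quarter,
# and never edit the file mid-program.
```

Generating from templates rather than hand-writing each prompt keeps the frozen file reproducible, which is what makes the quarter-over-quarter comparison trustworthy.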

The five platforms

ChatGPT — by far the largest surface. The same prompt can give different responses depending on whether the user has browsing enabled, memory enabled, or is signed in. Standardize on one configuration and use it for every run.

Claude — second most important for many B2B categories. Default browsing-enabled mode.

Perplexity — almost entirely live retrieval. Cites sources inline. The cleanest platform to audit because the citations are visible.

Google Gemini — uses Google’s own ranking signals more heavily than the others. Worth tracking because alignment with Gemini predicts AI Overview performance.

Google AI Overviews — not a chat surface but the response generated above the SERP. The audit method is to type each prompt into Google directly and screenshot the Overview block.

Two notes on methodology. One: clear browsing state between prompts. A model that remembers your last prompt will bias the next response. Two: run each prompt twice and average. The same prompt, on the same platform, in the same configuration, will sometimes produce different responses. Single-shot audits are noisy. Two-shot audits are tolerable. Five-shot audits are the gold standard if the budget allows.

What you record

For each prompt-platform pair, three pieces of data.

Did the response name our brand? Yes or no, plus position (first, middle, last in the list).

Did the response name each of our three named competitors? Same yes/no plus position.

Did the response include a citation back to a specific URL on our domain? If yes, which URL.

That’s the audit. Three data points per prompt-platform pair. For a hundred prompts on five platforms with two-shot averaging, that’s three thousand data points per audit run. A spreadsheet handles it cleanly.
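If you script it rather than keying rows by hand, the scoring step looks like this sketch. It assumes each response arrives as plain text plus a list of cited URLs (the shape the earlier loop collects), with placeholder brand and competitor names that are obviously mine, not yours.

```python
from dataclasses import dataclass

# Placeholder names; substitute your own brand, competitors, and domain.
BRAND = "Acme Roofing"
COMPETITORS = ["Rival One", "Rival Two", "Rival Three"]
OUR_DOMAIN = "acmeroofing.example"


@dataclass
class AuditRecord:
    prompt: str
    platform: str
    run: int
    brand_named: bool
    brand_position: str | None   # "first" | "middle" | "last"
    competitors_named: dict      # competitor name -> named? (bool)
    cited_url: str | None        # first URL on our domain, if any


def position_of(name: str, names_in_order: list[str]) -> str | None:
    """Bucket a name's slot in the recommendation list."""
    if name not in names_in_order:
        return None
    i = names_in_order.index(name)
    if i == 0:
        return "first"
    return "last" if i == len(names_in_order) - 1 else "middle"


def score(prompt: str, platform: str, run: int,
          response_text: str, cited_urls: list[str]) -> AuditRecord:
    # Naive substring matching; a real pass needs case folding and aliases.
    mentioned = sorted((b for b in [BRAND, *COMPETITORS]
                        if b in response_text),
                       key=response_text.index)
    return AuditRecord(
        prompt=prompt, platform=platform, run=run,
        brand_named=BRAND in mentioned,
        brand_position=position_of(BRAND, mentioned),
        competitors_named={c: c in mentioned for c in COMPETITORS},
        cited_url=next((u for u in cited_urls if OUR_DOMAIN in u), None),
    )
```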

What the output tells you

Three numbers come out of every audit.

Citation rate — the percentage of prompt-platform pairs in which our brand was named. The single most important metric. Most pre-program audits show a citation rate between three and twelve percent. Programs that are working push it into the twenty-to-forty-percent range over twelve to eighteen months.

Share of voice against named competitors — the percentage of prompt-platform pairs where our brand was named and a specific competitor was not. This is the head-to-head metric. It tells you whether you’re displacing the competitor or just adding to the recommendation list.

Source distribution — when our brand is cited, which pages on our domain are the citation sources. This tells you which content is doing the work. Often a surprising mix. The pillar pages you thought were carrying the program turn out to be cited rarely; obscure cluster pages turn out to be the workhorses.

These three numbers, tracked over time, are the program’s report card. Movement on citation rate is the headline number. Share of voice tells you whether the movement is at your competitors’ expense. Source distribution tells you where to double down.
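For those scripting the audit, each of the three numbers falls out of the scored records in a few lines. Again a sketch, reusing the hypothetical AuditRecord rows from above.

```python
from collections import Counter


def citation_rate(records) -> float:
    """Brand-named rate per prompt-platform pair, averaged across runs."""
    pairs: dict = {}
    for r in records:
        pairs.setdefault((r.prompt, r.platform), []).append(r.brand_named)
    return sum(sum(runs) / len(runs) for runs in pairs.values()) / len(pairs)


def share_of_voice(records, competitor: str) -> float:
    """Fraction of records where we are named and the competitor is not."""
    wins = [r.brand_named and not r.competitors_named[competitor]
            for r in records]
    return sum(wins) / len(wins)


def source_distribution(records) -> Counter:
    """Which of our URLs do the citing, when we are cited at all."""
    return Counter(r.cited_url for r in records if r.cited_url)
```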

The mistake to avoid

The mistake I see most often is teams running the audit once, getting a baseline, and then never running it again — or running it sporadically when someone in the C-suite asks about AI search. The whole value of the audit is in the comparison across quarters. A one-time baseline tells you nothing actionable.

The fix is to put the audit on a fixed calendar. Last week of the quarter, every quarter, no exceptions. Same prompt set, same platforms, same recording method. The first audit is a baseline. The second is the first signal. The third is when you can start drawing meaningful conclusions about whether the program is working.

After four audits — a full year — you have a defensible business case for either doubling down, recalibrating, or shutting down the program. Without the audits, every quarter of “AI search work” is an opinion. With them, you have an instrument.


Luke LaFave is the founder of LaFave Consulting. The studio’s monthly program includes the quarterly audit described here, on a hundred-and-twenty-prompt set, across the five platforms above.

Tagged
  • Measurement
  • Audit
Engage

If this piece resonated, the work is the next step.

The studio works with four brands per month. The discovery call is twenty minutes, includes a live audit of your current AI-search footprint, and you leave with a written plan whether you sign or not.

Become the answer
Call Luke: (920) 505-0775 · Text Luke · Replies in minutes