Abstract
Large language models with integrated web search are becoming a primary channel for consumer product research, and major providers expose the same underlying model through multiple access surfaces (paid authenticated chat, free unauthenticated chat, and programmatic APIs) that may differ in undocumented ways. Whether the same model produces systematically different outputs across these surfaces given an identical prompt has not been investigated empirically. The answer has direct consequences for research reproducibility and for any entity whose commercial visibility depends on AI-mediated discovery. We tested this question for OpenAI’s GPT-5.2 across 900 trials conducted on a single day, submitting the same multidimensional product research prompt 300 times to each of three surfaces (Logged-In ChatGPT, Logged-Out ChatGPT, and the Responses API) and analyzing the resulting search queries, cited sources, and recommended brands. The three surfaces diverged on every dimension measured. The two chat surfaces behaved more like each other than either behaved like the API (brand-visibility correlation of r = 0.891 between chat surfaces vs. 0.711–0.815 against the API), though they still differed meaningfully in which prompt-derived nuance words they retained and which specific products they drilled into. The API issued more search queries per trial (4.08 vs. 2.75–2.95) but reused the exact same query wording across trials 70% of the time (vs. 5.6–9.5% for the chat surfaces), drew on rigid templated constructions rather than varied natural language, and retained nearly every source it consulted (5% filtered vs. 73–84% for the chat surfaces). Its brand recommendations over- or under-represented specific brands by as much as 32 percentage points relative to the chat surfaces, and one brand present in 15–18% of chat-surface trials was entirely absent from the API's recommendations.
These findings suggest that a single access surface cannot serve as a general proxy for model behavior, and that commercial visibility in AI-generated recommendations depends materially on which surface a user queries.
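For readers who want to run a similar comparison on their own trial logs, the following is a minimal sketch of how two of the reported metrics might be computed. It is not the report's actual pipeline. The function names, the toy data, and the metric definitions are all assumptions made for illustration: brand visibility is taken to be the share of trials recommending a brand, and exact reuse is taken to be the share of queries whose wording occurs more than once.

```python
from collections import Counter
from math import sqrt

def brand_visibility(trials, brands):
    """Share of trials in which each brand appears (assumed definition)."""
    n = len(trials)
    return [sum(1 for t in trials if b in t) / n for b in brands]

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def exact_reuse_rate(queries):
    """Share of queries whose exact wording occurs more than once (assumed definition)."""
    counts = Counter(queries)
    return sum(c for c in counts.values() if c > 1) / len(queries)

# Hypothetical toy data: each trial is the set of brands recommended in it.
brands = ["A", "B", "C"]
chat_trials = [{"A", "B"}, {"A"}, {"A", "C"}, {"B"}]
api_trials = [{"A"}, {"A"}, {"A", "B"}, {"A"}]

chat_vis = brand_visibility(chat_trials, brands)
api_vis = brand_visibility(api_trials, brands)
r = pearson_r(chat_vis, api_vis)

# Hypothetical query log for one surface, pooled across trials.
reuse = exact_reuse_rate(["best x", "best x", "best y", "best x"])
```

With real data, each surface would contribute one visibility vector over a shared brand list, and the correlation would be computed for each pair of surfaces.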
Download
Download the full report here.

