TL;DR Summary of Investigating AI Consistency in Brand and Product Recommendation Tracking
Optimixed’s Overview: Unveiling the Complexities of AI-Driven Brand Visibility Metrics
Background and Research Motivation
Companies invest over $100 million yearly in AI-based brand and product tracking tools, yet no solid research existed to verify whether these AI-driven recommendations are consistent enough to generate valid visibility metrics. Motivated by skepticism about AI tracking’s reliability, the researchers behind this study partnered with the AI tracking startup Gumshoe.ai to rigorously test how consistently popular AI models generate brand/product lists.
Methodology and Experiment Setup
- Participants: 600 volunteers ran 12 distinct prompts across ChatGPT, Claude, and Google AI, totaling 2,961 runs.
- Data Collection: Responses were normalized to ordered lists of brand/product recommendations (a minimal parsing sketch follows this list).
- Focus: Examined diversity, randomness, and repeatability of brand recommendations across sectors including consumer products, B2B services, and healthcare.
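To make the normalization step concrete, here is a minimal sketch of how a raw model response might be parsed into an ordered brand list. The regex and cleanup rules are illustrative assumptions, not the study’s actual pipeline.

```python
import re

def normalize_response(text: str) -> list[str]:
    """Parse raw model output into an ordered list of brand/product names.

    Handles numbered ("1. Acme") and bulleted ("- Acme") lines and drops
    trailing descriptions. These parsing rules are assumptions made for
    illustration, not the study's actual normalization logic.
    """
    brands = []
    for line in text.splitlines():
        # Match lines opening with a list marker: "1.", "2)", "-", or "*"
        m = re.match(r"\s*(?:\d+[.)]|[-*])\s+(.*)", line)
        if not m:
            continue
        item = m.group(1)
        # Keep only the name, dropping anything after ": " or " - "
        item = re.split(r"\s*(?::|-)\s+", item, maxsplit=1)[0]
        # Remove markdown bold markers and stray whitespace
        item = item.strip("* ").strip()
        if item:
            brands.append(item)
    return brands

print(normalize_response("1. **Acme Corp** - budget pick\n2. Globex: premium option"))
# -> ['Acme Corp', 'Globex']
```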
Key Findings on AI List Consistency
- There is less than a 1% chance that two runs of the same prompt on ChatGPT or Google AI produce the same set of recommended brands.
- Ordering consistency is even lower, with identically ordered lists occurring in fewer than 1 in 1,000 run pairs (the pairwise comparison behind both figures is sketched after this list).
- AI-generated lists vary in length, composition, and order because language models generate output probabilistically, one token at a time, rather than retrieving a fixed answer.
- Human-generated prompts show vast semantic diversity, further complicating consistent tracking.
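The "less than 1%" and "less than 1 in 1,000" figures above describe exact-match rates across pairs of runs. Here is a small sketch of that comparison, assuming each run has already been normalized to a brand list; the function name and input data are hypothetical, and this is not the study’s actual analysis code.

```python
from itertools import combinations

def exact_match_rate(runs: list[list[str]], ordered: bool = False) -> float:
    """Fraction of run pairs whose recommendation lists match exactly.

    With ordered=False, two runs match if they name the same brands in any
    order; with ordered=True, the sequence must match as well.
    """
    keyed = [tuple(run) if ordered else frozenset(run) for run in runs]
    pairs = list(combinations(keyed, 2))
    matches = sum(a == b for a, b in pairs)
    return matches / len(pairs)

runs = [["Acme", "Globex", "Initech"],
        ["Globex", "Acme", "Initech"],
        ["Acme", "Globex"]]
print(exact_match_rate(runs))                # 0.33: one unordered match in three pairs
print(exact_match_rate(runs, ordered=True))  # 0.0: no pair matches in order
```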
Implications for AI Visibility Metrics
Despite the randomness of individual outputs, aggregated visibility percentages (how often a brand appears across many repeated prompt runs) provide meaningful insight into a brand’s prominence within AI-generated recommendations; a small computation sketch follows the list below.
- For example, a hospital appearing in 97% of cancer care recommendations demonstrates high visibility despite fluctuating rank positions.
- Visibility correlates more with the AI’s underlying training corpus and the size of the consideration set than with precise ranking.
- The study deems tracking of exact ranking positions unreliable and misleading.
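The visibility percentage itself is a simple aggregation: the share of runs in which a brand appears at all, regardless of rank. A minimal sketch, assuming normalized per-run brand lists as input (the hospital names and data are hypothetical):

```python
from collections import Counter

def visibility(runs: list[list[str]]) -> dict[str, float]:
    """Percentage of runs in which each brand appears at least once."""
    # Count each brand once per run, even if a run mentions it twice
    counts = Counter(brand for run in runs for brand in set(run))
    return {brand: 100 * n / len(runs) for brand, n in counts.most_common()}

runs = [["Hospital A", "Hospital B", "Hospital C"],
        ["Hospital B", "Hospital A"],
        ["Hospital A", "Hospital D"]]
print(visibility(runs))
# Hospital A: 100.0, Hospital B: ~66.7, Hospital C and D: ~33.3 (rounded)
```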
Challenges and Open Questions
- The extreme variability of natural human prompts raises the question of how many prompts, and which ones, are needed for statistically meaningful visibility data (a rough sample-size sketch follows this list).
- Further research is needed to confirm whether the variability observed in API calls matches that of real user interactions.
- Large-scale, transparent, and peer-reviewed studies are required to refine AI tracking methodologies.
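One way to put a floor under "how many runs are enough": treat each run as an independent draw and apply the standard sample-size formula for a binomial proportion. This is a back-of-envelope sketch under a strong independence assumption, not a methodology endorsed by the study.

```python
import math

def runs_needed(p: float, margin: float, z: float = 1.96) -> int:
    """Runs needed so a visibility estimate near p has the given margin of
    error at ~95% confidence (normal approximation, independent runs)."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Pinning down a ~50% visibility score to within +/-5 points:
print(runs_needed(0.50, 0.05))  # 385
# An extreme score like the 97% hospital example needs far fewer runs:
print(runs_needed(0.97, 0.05))  # 45
```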
Final Recommendations for Marketers and AI Tracking Providers
- Marketers should be cautious and require transparent, statistically validated reports before investing in AI tracking tools.
- Visibility percentage across multiple prompts and runs can serve as a useful proxy metric, but ranking positions should be disregarded.
- Providers must publicly disclose their methodologies and data to build trust and accountability.
Conclusion
This research partially disproves the initial skepticism that AI brand recommendation tracking is useless: while individual AI outputs are effectively random, aggregated visibility metrics hold promise. However, AI randomness and prompt diversity together demand careful, transparent approaches to AI-driven brand visibility measurement going forward.