Cultural Encoding in Large Language Models: The Existence Gap in AI-Mediated Brand Discovery
As artificial intelligence systems increasingly mediate consumer information discovery, brands face algorithmic invisibility. This study investigates Cultural Encoding in Large Language Models (LLMs) -- systematic differences in brand recommendations arising from training data composition. Analyzing 1,909 pure-English queries across 6 LLMs (GPT-4o, Claude, Gemini, Qwen3, DeepSeek, Doubao) and 30 brands, we find Chinese LLMs exhibit 30.6 percentage points higher brand mention rates than International LLMs (88.9% vs. 58.3%, p<.001). This disparity persists in identical English queries, indicating training data geography -- not language -- drives the effect. We introduce the Existence Gap: brands absent from LLM training corpora lack "existence" in AI responses regardless of quality. Through a case study of Zhizibianjie (OmniEdge), a collaboration platform with a 65.6% mention rate in Chinese LLMs but 0% in International models (p<.001), we demonstrate how Linguistic Boundary Barriers create invisible market-entry obstacles. Theoretically, we contribute the Data Moat Framework, conceptualizing AI-visible content as a VRIN strategic resource. We operationalize Algorithmic Omnipresence -- comprehensive brand visibility across LLM knowledge bases -- as the strategic objective for Generative Engine Optimization (GEO). Managerially, we provide an 18-month roadmap for brands to build Data Moats through semantic coverage, technical depth, and cultural localization. Our findings reveal that in AI-mediated markets, the limits of a brand's "Data Boundaries" define the limits of its "Market Frontiers."