ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models

27 February 2025
Haibin Chen
Kangtao Lv
Chengwei Hu
Yanshi Li
Yujin Yuan
Yancheng He
Xingyao Zhang
Langming Liu
Shilei Liu
Wenbo Su
Bo Zheng
Abstract

With the increasing use of Large Language Models (LLMs) in fields such as e-commerce, domain-specific concept evaluation benchmarks are crucial for assessing their domain capabilities. Existing LLMs may generate factually incorrect information in complex e-commerce applications, so an e-commerce concept benchmark is needed. Existing benchmarks face two primary challenges: (1) handling the heterogeneous and diverse nature of tasks, and (2) distinguishing between generality and specificity within the e-commerce field. To address these problems, we propose ChineseEcomQA, a scalable question-answering benchmark focused on fundamental e-commerce concepts. ChineseEcomQA is built on three core characteristics: Focus on Fundamental Concept, E-commerce Generality, and E-commerce Expertise. Fundamental concepts are designed to be applicable across a diverse array of e-commerce tasks, thus addressing the challenge of heterogeneity and diversity. Additionally, by carefully balancing generality and specificity, ChineseEcomQA effectively differentiates between broad e-commerce concepts, allowing for precise validation of domain capabilities. We achieve this through a scalable benchmark construction process that combines LLM validation, Retrieval-Augmented Generation (RAG) validation, and rigorous manual annotation. Based on ChineseEcomQA, we conduct extensive evaluations of mainstream LLMs and provide some valuable insights. We hope that ChineseEcomQA can guide future domain-specific evaluations and facilitate broader LLM adoption in e-commerce applications.

@article{chen2025_2502.20196,
  title={ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models},
  author={Haibin Chen and Kangtao Lv and Chengwei Hu and Yanshi Li and Yujin Yuan and Yancheng He and Xingyao Zhang and Langming Liu and Shilei Liu and Wenbo Su and Bo Zheng},
  journal={arXiv preprint arXiv:2502.20196},
  year={2025}
}