Auditing Pay-Per-Token in Large Language Models

5 October 2025
Ander Artola Velasco
Stratis Tsirtsis
Manuel Gomez Rodriguez
    MLAU
Main:9 Pages
6 Figures
Bibliography:4 Pages
Appendix:12 Pages
Abstract

Millions of users rely on a market of cloud-based services to obtain access to state-of-the-art large language models. However, it has very recently been shown that the de facto pay-per-token pricing mechanism used by providers creates a financial incentive for them to strategize and misreport the (number of) tokens a model used to generate an output. In this paper, we develop an auditing framework based on martingale theory that enables a trusted third-party auditor, who sequentially queries a provider, to detect token misreporting. Crucially, we show that our framework is guaranteed to always detect token misreporting, regardless of the provider's (mis-)reporting policy, and, with high probability, not to falsely flag a faithful provider as unfaithful. To validate our auditing framework, we conduct experiments across a wide range of (mis-)reporting policies using several large language models from the Llama, Gemma and Ministral families, and input prompts from a popular crowdsourced benchmarking platform. The results show that our framework detects an unfaithful provider after observing fewer than ∼70 reported outputs, while keeping the probability of falsely flagging a faithful provider below α = 0.05.
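
The abstract does not spell out the test itself, so the following is only a generic "testing by betting" sketch of how a martingale-based sequential audit can cap the false-flag probability at α. It is not the authors' algorithm. It assumes, hypothetically, that for each queried output the auditor can compute a score x_t with |x_t| ≤ B whose conditional mean is at most 0 when the provider reports faithfully and positive when it overcharges. The function name `sequential_audit` and the betting rule are illustrative assumptions. Under faithful reporting the auditor's "wealth" is a nonnegative supermartingale starting at 1, so by Ville's inequality it ever reaches 1/α with probability at most α.

```python
from typing import Iterable, Optional


def sequential_audit(scores: Iterable[float], bound: float,
                     alpha: float = 0.05) -> Optional[int]:
    """Hypothetical sequential audit via a test supermartingale.

    Assumes each score x_t lies in [-bound, bound] and has conditional
    mean <= 0 when the provider is faithful. Returns the 1-based index of
    the output at which the provider is flagged, or None if never flagged.
    """
    if bound <= 0 or not 0 < alpha < 1:
        raise ValueError("need bound > 0 and 0 < alpha < 1")
    threshold = 1.0 / alpha  # Ville's inequality: P(sup wealth >= 1/alpha) <= alpha
    wealth = 1.0
    total, n = 0.0, 0
    for t, x in enumerate(scores, start=1):
        if abs(x) > bound:
            raise ValueError("score outside [-bound, bound]")
        # Predictable bet: uses only scores observed before step t.
        # Clipping to [0, 0.5] keeps every factor within [0.5, 1.5] > 0.
        mean_past = total / n if n > 0 else 0.0
        lam = min(max(mean_past / bound, 0.0), 0.5)
        # Under the faithful null, E[1 + lam * x / bound | past] <= 1.
        wealth *= 1.0 + lam * x / bound
        total += x
        n += 1
        if wealth >= threshold:
            return t
    return None
```

For example, a stream of constant scores of 1.0 with `bound=2.0` first bets at step 2, after which the wealth grows by a factor of 1.25 per output and crosses 1/0.05 = 20 at output 15. A stream of zero or negative scores never grows the wealth and is never flagged.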
