Distribution Prompting: Understanding the Expressivity of Language Models Through the Next-Token Distributions They Can Produce

18 May 2025
Haojin Wang
Zining Zhu
Freda Shi
Abstract

Autoregressive neural language models (LMs) generate a probability distribution over tokens at each time step given a prompt. In this work, we attempt to systematically understand the probability distributions that LMs can produce, showing that some distributions are significantly harder to elicit than others. Specifically, for any target next-token distribution over the vocabulary, we attempt to find a prompt that induces the LM to output a distribution as close as possible to the target, using either soft or hard gradient-based prompt tuning. We find that (1) in general, distributions with very low or very high entropy are easier to approximate than those with moderate entropy; (2) among distributions with the same entropy, those containing "outlier tokens" are easier to approximate; (3) target distributions generated by LMs -- even LMs with different tokenizers -- are easier to approximate than randomly chosen targets. These results offer insights into the expressiveness of LMs and the challenges of using them as probability distribution proposers.
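The soft prompt-tuning setup the abstract describes can be sketched as follows: freeze the LM, treat a short sequence of continuous prompt embeddings as the only trainable parameters, and minimize the KL divergence between the target distribution and the LM's next-token distribution. The snippet below is a minimal illustrative sketch, not the authors' code; it stands in a tiny random projection for the frozen LM (a real experiment would use an actual pretrained model), and all sizes (`V`, `D`, `PROMPT_LEN`) and the optimizer settings are assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
V, D, PROMPT_LEN = 50, 16, 4  # toy vocab size, embedding dim, soft-prompt length

# Stand-in for a frozen LM head: pool the prompt embeddings and project
# to vocabulary logits. (Hypothetical; a real run would call a pretrained LM.)
W = torch.randn(D, V)

def lm_next_token_logits(prompt_embeds: torch.Tensor) -> torch.Tensor:
    return prompt_embeds.mean(dim=0) @ W

# Target next-token distribution. Here it is random; the paper also
# studies low/high-entropy targets and targets generated by other LMs.
target = torch.softmax(torch.randn(V), dim=0)

# Soft prompt: continuous embeddings optimized directly by gradient descent.
soft_prompt = torch.randn(PROMPT_LEN, D, requires_grad=True)
opt = torch.optim.Adam([soft_prompt], lr=0.05)

with torch.no_grad():
    initial_kl = F.kl_div(
        F.log_softmax(lm_next_token_logits(soft_prompt), dim=0),
        target, reduction="sum",
    ).item()

for step in range(500):
    opt.zero_grad()
    log_probs = F.log_softmax(lm_next_token_logits(soft_prompt), dim=0)
    # KL(target || model): how far the induced distribution is from the target.
    loss = F.kl_div(log_probs, target, reduction="sum")
    loss.backward()
    opt.step()

final_kl = loss.item()
print(f"KL divergence: {initial_kl:.4f} -> {final_kl:.4f}")
```

The residual KL after optimization is exactly the quantity the paper uses to call a target "easy" or "hard" to elicit; hard prompt tuning differs in that the optimized embeddings must be projected back onto real vocabulary tokens.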

View on arXiv
@article{wang2025_2505.12244,
  title={Distribution Prompting: Understanding the Expressivity of Language Models Through the Next-Token Distributions They Can Produce},
  author={Haojin Wang and Zining Zhu and Freda Shi},
  journal={arXiv preprint arXiv:2505.12244},
  year={2025}
}