ResearchTrend.AI
Identifying Non-Replicable Social Science Studies with Language Models

10 March 2025
Denitsa Saynova
Kajsa Hansson
Bastiaan Bruinsma
Annika Fredén
Moa Johansson
Abstract

In this study, we investigate whether LLMs can be used to indicate if a study in the behavioural social sciences is replicable. Using a dataset of 14 previously replicated studies (9 successful, 5 unsuccessful), we evaluate the ability of both open-source (Llama 3 8B, Qwen 2 7B, Mistral 7B) and proprietary (GPT-4o) instruction-tuned LLMs to discriminate between replicable and non-replicable findings. We use LLMs to generate synthetic samples of responses from behavioural studies and estimate whether the measured effects support the original findings. When compared with human replication results for these studies, we achieve F1 values of up to 77% with Mistral 7B, 67% with GPT-4o and Llama 3 8B, and 55% with Qwen 2 7B, suggesting their potential for this task. We also analyse how effect size calculations are affected by sampling temperature and find that low variance (due to temperature) leads to biased effect estimates.
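A minimal sketch of how the reported F1 values could be computed, comparing model-derived replication verdicts against the human replication outcomes. The per-study labels below are illustrative placeholders, not the paper's actual data, and the choice of "did not replicate" as the positive class is an assumption.

```python
# Hedged sketch: scoring model replication verdicts against human outcomes.
# Labels: 1 = study replicated, 0 = study did not replicate.

def f1_score(y_true, y_pred, positive=0):
    """F1 for the chosen positive class (here 0 = 'did not replicate')."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 14 studies as in the paper: 9 replicated, 5 did not.
human = [1] * 9 + [0] * 5
# Hypothetical model verdicts for illustration only.
model = [1, 1, 1, 1, 1, 1, 1, 0, 0] + [0, 0, 0, 1, 1]

print(round(f1_score(human, model), 2))  # → 0.6
```

With such a small sample (14 studies), each misclassified study shifts F1 substantially, which is consistent with the wide spread of scores (55%–77%) across the four models.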

@article{saynova2025_2503.10671,
  title={Identifying Non-Replicable Social Science Studies with Language Models},
  author={Denitsa Saynova and Kajsa Hansson and Bastiaan Bruinsma and Annika Fredén and Moa Johansson},
  journal={arXiv preprint arXiv:2503.10671},
  year={2025}
}