ResearchTrend.AI
arXiv: 2311.15930
WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models

27 November 2023
Youssef Benchekroun
Megi Dervishi
Mark Ibrahim
Jean-Baptiste Gaya
Xavier Martinet
Grégoire Mialon
Thomas Scialom
Emmanuel Dupoux
Dieuwke Hupkes
Pascal Vincent
    LRM

Papers citing "WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models"

3 / 3 papers shown
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
45
55
0
18 Jun 2024
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
105
136
0
03 Nov 2023
Unbiased Math Word Problems Benchmark for Mitigating Solving Bias
Zhicheng YANG
Jinghui Qin
Jiaqi Chen
Xiaodan Liang
59
12
0
17 May 2022