ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.01830
  4. Cited By
PiCO: Peer Review in LLMs based on the Consistency Optimization

PiCO: Peer Review in LLMs based on the Consistency Optimization

24 February 2025
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
    ALM
ArXivPDFHTML

Papers citing "PiCO: Peer Review in LLMs based on the Consistency Optimization"

10 / 10 papers shown
Title
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
0
0
26 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM
LRM
49
1
0
01 Apr 2025
AI Predicts AGI: Leveraging AGI Forecasting and Peer Review to Explore LLMs' Complex Reasoning Capabilities
AI Predicts AGI: Leveraging AGI Forecasting and Peer Review to Explore LLMs' Complex Reasoning Capabilities
Fabrizio Davide
Pietro Torre
Andrea Gaggioli
Andrea Gaggioli
ELM
92
0
0
12 Dec 2024
HEALTH-PARIKSHA: Assessing RAG Models for Health Chatbots in Real-World
  Multilingual Settings
HEALTH-PARIKSHA: Assessing RAG Models for Health Chatbots in Real-World Multilingual Settings
Varun Gumma
Anandhita Raghunath
Mohit Jain
Sunayana Sitaram
LM&MA
32
1
0
17 Oct 2024
OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large
  Language Models
OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models
Yang Janet Liu
Meng Xu
Shuo Wang
Liner Yang
Haoyu Wang
...
Cunliang Kong
Yun-Nung Chen
Yang Liu
Maosong Sun
Erhong Yang
ELM
LRM
36
1
0
21 Feb 2024
Don't Make Your LLM an Evaluation Benchmark Cheater
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
105
136
0
03 Nov 2023
Can we trust the evaluation on ChatGPT?
Can we trust the evaluation on ChatGPT?
Rachith Aiyappa
Jisun An
Haewoon Kwak
Yong-Yeol Ahn
ELM
ALM
LLMAG
AI4MH
LRM
106
76
0
22 Mar 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
206
2,232
0
22 Mar 2023
GLM-130B: An Open Bilingual Pre-trained Model
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
240
1,070
0
05 Oct 2022
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
1