ResearchTrend.AI

RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models

16 June 2024
Yuqing Wang
Yun Zhao
    LRM
    AAML
    ELM

Papers citing "RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models"

4 / 4 papers shown
Towards Making the Most of ChatGPT for Machine Translation
Keqin Peng
Liang Ding
Qihuang Zhong
Li Shen
Xuebo Liu
Min Zhang
Y. Ouyang
Dacheng Tao
LRM
83
203
0
24 Mar 2023
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge
Bill Yuchen Lin
Ziyi Wu
Yichi Yang
Dong-Ho Lee
Xiang Ren
ReLM
LRM
227
62
0
02 Jan 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018