ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.09107
  4. Cited By
GLoRE: Evaluating Logical Reasoning of Large Language Models

GLoRE: Evaluating Logical Reasoning of Large Language Models

13 October 2023
Hanmeng Liu
Zhiyang Teng
Ruoxi Ning
Jian Liu
Qiji Zhou
Yuexin Zhang
Yue Zhang
    ReLM
    ELM
    LRM
ArXivPDFHTML

Papers citing "GLoRE: Evaluating Logical Reasoning of Large Language Models"

13 / 13 papers shown
Title
Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework
Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework
Yuan Xia
Akanksha Atrey
Fadoua Khmaissia
Kedar S. Namjoshi
LRM
ELM
32
0
0
28 Apr 2025
AI Benchmarks and Datasets for LLM Evaluation
AI Benchmarks and Datasets for LLM Evaluation
Todor Ivanov
Valeri Penchev
93
0
0
02 Dec 2024
Leaving the barn door open for Clever Hans: Simple features predict LLM
  benchmark answers
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers
Lorenzo Pacchiardi
Marko Tesic
Lucy G. Cheke
José Hernández Orallo
28
3
0
15 Oct 2024
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical
  Reasoning Capabilities of Language Models
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models
Man Luo
Shrinidhi Kumbhar
Ming shen
Mihir Parmar
Neeraj Varshney
Pratyay Banerjee
Somak Aditya
Chitta Baral
ReLM
ELM
LRM
29
23
0
02 Oct 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
197
2,953
0
22 Mar 2023
Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal
  Negation
Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation
Thinh Hung Truong
Yulia Otmakhova
Tim Baldwin
Trevor Cohn
Jey Han Lau
Karin Verspoor
47
21
0
06 Oct 2022
Large Language Models are Zero-Shot Reasoners
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
ContractNLI: A Dataset for Document-level Natural Language Inference for
  Contracts
ContractNLI: A Dataset for Document-level Natural Language Inference for Contracts
Yuta Koreeda
Christopher D. Manning
AILaw
84
96
0
05 Oct 2021
What Makes Good In-Context Examples for GPT-$3$?
What Makes Good In-Context Examples for GPT-333?
Jiachang Liu
Dinghan Shen
Yizhe Zhang
Bill Dolan
Lawrence Carin
Weizhu Chen
AAML
RALM
275
1,296
0
17 Jan 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
1