ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.09395
  4. Cited By
Evaluating LLMs' Mathematical and Coding Competency through
  Ontology-guided Interventions

Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions

17 January 2024
Pengfei Hong
Navonil Majumder
Deepanway Ghosal
Somak Aditya
Rada Mihalcea
Soujanya Poria
    LRM
ArXivPDFHTML

Papers citing "Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions"

5 / 5 papers shown
Title
Evaluating Mathematical Reasoning of Large Language Models: A Focus on
  Error Identification and Correction
Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction
Xiaoyuan Li
Wenjie Wang
Moxin Li
Junrong Guo
Yang Zhang
Fuli Feng
ELM
LRM
25
15
0
02 Jun 2024
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language
  Models -- A Survey
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
Philipp Mondorf
Barbara Plank
ELM
LRM
LM&MA
24
34
0
02 Apr 2024
Hallucination is Inevitable: An Innate Limitation of Large Language Models
Hallucination is Inevitable: An Innate Limitation of Large Language Models
Ziwei Xu
Sanjay Jain
Mohan S. Kankanhalli
HILM
LRM
22
192
0
22 Jan 2024
RobustLR: Evaluating Robustness to Logical Perturbation in Deductive
  Reasoning
RobustLR: Evaluating Robustness to Logical Perturbation in Deductive Reasoning
Soumya Sanyal
Zeyi Liao
Xiang Ren
ELM
ReLM
LRM
39
19
0
25 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
1