ResearchTrend.AI
Are Large Language Models Reliable AI Scientists? Assessing Reverse-Engineering of Black-Box Systems

23 May 2025
Jiayi Geng
Howard Chen
Dilip Arumugam
Thomas L. Griffiths

Papers citing "Are Large Language Models Reliable AI Scientists? Assessing Reverse-Engineering of Black-Box Systems"

18 / 18 papers shown
Toward Efficient Exploration by Large Language Model Agents
Dilip Arumugam
Thomas L. Griffiths
LLMAG
212
4
0
29 Apr 2025
Sparks of Science: Hypothesis Generation Using Structured Paper Data
Charles O'Neill
Tirthankar Ghosal
Roberta Răileanu
Mike Walmsley
Thang Bui
Kevin Schawinski
I. Ciucă
LRM
109
4
0
17 Apr 2025
PaperBench: Evaluating AI's Ability to Replicate AI Research
Giulio Starace
Oliver Jaffe
Dane Sherburn
James Aung
Jun Shern Chan
...
Benjamin Kinsella
Wyatt Thompson
Johannes Heidecke
Amelia Glaese
Tejal Patwardhan
ALM ELM
965
23
0
02 Apr 2025
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui
Yu-Neng Chuang
Guanchu Wang
Jiamu Zhang
Tianyi Zhang
...
Hongyi Liu
Andrew Wen
Shaochen Zhong
Hanjie Chen
OffRL ReLM LRM
204
101
0
20 Mar 2025
Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
Yijia Luo
Yulin Song
Xingyao Zhang
Jiaheng Liu
Weixun Wang
Gengru Chen
Wenbo Su
Bo Zheng
LRM
114
11
0
20 Mar 2025
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri
Melissa Z. Pan
Shuyi Yang
Lakshya A Agrawal
Bhavya Chopra
...
Dan Klein
Kannan Ramchandran
Matei A. Zaharia
Joseph E. Gonzalez
Ion Stoica
LLMAG
Presented at ResearchTrend Connect | LLMAG on 23 Apr 2025
234
39
0
17 Mar 2025
On Benchmarking Human-Like Intelligence in Machines
Lance Ying
Katherine M. Collins
L. Wong
Ilia Sucholutsky
Ryan Liu
Adrian Weller
Tianmin Shu
Thomas Griffiths
Joshua B. Tenenbaum
ALM ELM
434
10
0
27 Feb 2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Yancheng He
Shilong Li
Jing Liu
Weixun Wang
Xingyuan Bu
...
Zhongyuan Peng
Zhenru Zhang
Zhicheng Zheng
Wenbo Su
Bo Zheng
ELM LRM
166
17
0
26 Feb 2025
Towards an AI co-scientist
Juraj Gottweis
W. Weng
Alexander Daryin
T. Tu
Anil Palepu
...
Gary Peltz
Yunhan Xu
Annalisa Pawlosky
Alan Karthikesalingam
Vivek Natarajan
LLMAG
136
51
0
26 Feb 2025
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Alejandro Cuadron
Dacheng Li
Wenjie Ma
Xingyao Wang
Yichuan Wang
...
Aditya Desai
Ion Stoica
Ana Klimovic
Graham Neubig
Joseph E. Gonzalez
LRM AI4CE
306
54
0
12 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM VLM OffRL AI4TS LRM
390
2,024
0
22 Jan 2025
Agent Laboratory: Using LLM Agents as Research Assistants
Samuel Schmidgall
Yusheng Su
Zihan Wang
Xingwu Sun
Jialian Wu
Xiaodong Yu
Jiang Liu
Michael Moor
Zicheng Liu
Emad Barsoum
LLMAG
94
68
2
08 Jan 2025
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery
Kanishk Gandhi
Michael Y. Li
Lyle Goodyear
Louise Li
Aditi Bhaskar
Mohammed Zaman
Noah D. Goodman
65
2
0
02 Jan 2025
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Zayne Sprague
Fangcong Yin
Juan Diego Rodriguez
Dongwei Jiang
Manya Wadhwa
Prasann Singhal
Xinyu Zhao
Xi Ye
Kyle Mahowald
Greg Durrett
ReLM LRM
243
132
0
18 Sep 2024
People use fast, goal-directed simulation to reason about novel games
Cedegao E. Zhang
Katherine M. Collins
L. Wong
Adrian Weller
Joshua B. Tenenbaum
LRM
61
1
0
19 Jul 2024
Large Language Models Assume People are More Rational than We Really are
Ryan Liu
Jiayi Geng
Joshua C. Peterson
Ilia Sucholutsky
Thomas Griffiths
162
20
0
24 Jun 2024
LLM-SR: Scientific Equation Discovery via Programming with Large Language Models
Parshin Shojaee
Kazem Meidani
Shashank Gupta
A. Farimani
Chandan K. Reddy
217
23
0
29 Apr 2024
Incoherent Probability Judgments in Large Language Models
Jian-Qiao Zhu
Thomas Griffiths
164
8
0
30 Jan 2024