Language Models can Solve Computer Tasks

30 March 2023
Geunwoo Kim
Pierre Baldi
Stephen Marcus McAleer
    LLMAG
    LM&Ro

Papers citing "Language Models can Solve Computer Tasks"

50 / 256 papers shown
Enhancing Q-Learning with Large Language Model Heuristics
Xiefeng Wu
LRM
32
0
0
06 May 2024
Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning
Lucas-Andrei Thil
Mirela Popa
Gerasimos Spanakis
LLMAG
25
2
0
01 May 2024
Small Language Models Need Strong Verifiers to Self-Correct Reasoning
Yunxiang Zhang
Muhammad Khalifa
Lajanugen Logeswaran
Jaekyeom Kim
Moontae Lee
Honglak Lee
Lu Wang
LRM
KELM
ReLM
23
31
0
26 Apr 2024
Benchmarking Mobile Device Control Agents across Diverse Configurations
Juyong Lee
Taywon Min
Minyong An
Changyeon Kim
Kimin Lee
23
8
0
25 Apr 2024
A Survey on Self-Evolution of Large Language Models
Zhengwei Tao
Ting-En Lin
Xiancai Chen
Hangyu Li
Yuchuan Wu
Yongbin Li
Zhi Jin
Fei Huang
Dacheng Tao
Jingren Zhou
LRM
LM&Ro
46
21
0
22 Apr 2024
When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models
Yanhong Li
Chenghao Yang
Allyson Ettinger
ReLM
LRM
LLMAG
26
6
0
14 Apr 2024
Can Feedback Enhance Semantic Grounding in Large Vision-Language Models?
Yuan-Hong Liao
Rafid Mahmood
Sanja Fidler
David Acuna
VLM
44
7
0
09 Apr 2024
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents
Michael Lutz
Arth Bohra
Manvel Saroyan
Artem Harutyunyan
Giovanni Campagna
LLMAG
22
13
0
08 Apr 2024
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You
Haotian Zhang
E. Schoop
Floris Weers
Amanda Swearngin
Jeffrey Nichols
Yinfei Yang
Zhe Gan
MLLM
39
82
0
08 Apr 2024
Evaluating LLMs at Detecting Errors in LLM Responses
Ryo Kamoi
Sarkar Snigdha Sarathi Das
Renze Lou
Jihyun Janice Ahn
Yilun Zhao
...
Salika Dave
Shaobo Qin
Arman Cohan
Wenpeng Yin
Rui Zhang
42
19
0
04 Apr 2024
A Survey on Large Language Model-Based Game Agents
Sihao Hu
Tiansheng Huang
Gaowen Liu
Ramana Rao Kompella
Selim Furkan Tekin
Yichang Xu
Zachary Yahn
Ling Liu
LLMAG
LM&Ro
AI4CE
LM&MA
66
49
0
02 Apr 2024
Learning to Plan for Language Modeling from Unlabeled Data
Nathan Cornille
Marie-Francine Moens
Florian Mai
30
7
0
31 Mar 2024
Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World
Guande Wu
Chen Zhao
Claudio Silva
He He
LLMAG
14
4
0
30 Mar 2024
Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning
Qinhao Zhou
Zihan Zhang
Xiang Xiang
Ke Wang
Yuchuan Wu
Yongbin Li
LLMAG
LRM
36
5
0
29 Mar 2024
The New Agronomists: Language Models are Experts in Crop Management
Jing Wu
Zhixin Lai
Suiyao Chen
Ran Tao
Pan Zhao
N. Hovakimyan
24
19
0
28 Mar 2024
Learning From Correctness Without Prompting Makes LLM Efficient Reasoner
Yuxuan Yao
Han Wu
Zhijiang Guo
Biyan Zhou
Jiahui Gao
Sichun Luo
Hanxu Hou
Xiaojin Fu
Linqi Song
LLMAG
LRM
40
9
0
28 Mar 2024
AIOS: LLM Agent Operating System
Kai Mei
Zelong Li
Wujiang Xu
Wenyue Hua
Mingyu Jin
Yongfeng Zhang
Shuyuan Xu
Ruosong Ye
Yingqiang Ge
LLMAG
26
17
0
25 Mar 2024
Can Language Models Pretend Solvers? Logic Code Simulation with LLMs
Minyu Chen
Guoqiang Li
Ling-I Wu
Ruibang Liu
Yuxin Su
Xi Chang
Jianxin Xue
LLMAG
ELM
LRM
21
0
0
24 Mar 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
Ahmad A Mahmood
Ashmal Vayani
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
LRM
49
7
0
21 Mar 2024
Dr3: Ask Large Language Models Not to Give Off-Topic Answers in Open Domain Multi-Hop Question Answering
Yuan Gao
Yiheng Zhu
Yuanbin Cao
Yinzhi Zhou
Zhen Wu
Yujie Chen
Shenglan Wu
Haoyuan Hu
Xinyu Dai
LRM
49
2
0
19 Mar 2024
Tur[k]ingBench: A Challenge Benchmark for Web Agents
Kevin Xu
Yeganeh Kordi
Kate Sanders
Yizhong Wang
Adam Byerly
Jingyu Zhang
Benjamin Van Durme
Daniel Khashabi
LLMAG
64
6
0
18 Mar 2024
StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows
Yiran Wu
Tianwei Yue
Shaokun Zhang
Chi Wang
Qingyun Wu
40
21
0
17 Mar 2024
Computer User Interface Understanding. A New Dataset and a Learning Framework
Andrés Munoz
Daniel Borrajo
21
0
0
15 Mar 2024
AutoGuide: Automated Generation and Selection of State-Aware Guidelines for Large Language Model Agents
Yao Fu
Dong-Ki Kim
Jaekyeom Kim
Sungryull Sohn
Lajanugen Logeswaran
Kyunghoon Bae
Honglak Lee
LLMAG
57
7
0
13 Mar 2024
Scaling Instructable Agents Across Many Simulated Worlds
Sima Team
Maria Abi Raad
Arun Ahuja
Catarina Barros
F. Besse
...
Daan Wierstra
Duncan Williams
Nathaniel Wong
Sarah York
Nick Young
LM&Ro
107
35
0
13 Mar 2024
Large Language Models are Contrastive Reasoners
Liang Yao
ReLM
ELM
LRM
32
2
0
13 Mar 2024
BAGEL: Bootstrapping Agents by Guiding Exploration with Language
Shikhar Murty
Christopher D. Manning
Peter Shaw
Mandar Joshi
Kenton Lee
LM&Ro
LLMAG
21
14
0
12 Mar 2024
WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?
Alexandre Drouin
Maxime Gasse
Massimo Caccia
I. Laradji
Manuel Del Verme
...
Megh Thakkar
Quentin Cappart
David Vazquez
Nicolas Chapados
Alexandre Lacoste
LLMAG
51
51
0
12 Mar 2024
TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision
Ruiwen Zhou
Yingxuan Yang
Kangrui Chen
Ying Wen
Wenhao Wang
Chunling Xi
Guoqiang Xu
Jiliang Tang
Lingjuan Lyu
LLMAG
21
8
0
10 Mar 2024
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
Boshi Wang
Hao Fang
Jason Eisner
Benjamin Van Durme
Yu-Chuan Su
CLL
27
7
0
07 Mar 2024
PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion
Zekai Zhang
Yiduo Guo
Yaobo Liang
Dongyan Zhao
Nan Duan
33
1
0
06 Mar 2024
Learning to Use Tools via Cooperative and Interactive Agents
Zhengliang Shi
Shen Gao
Xiuyi Chen
Zhumin Chen
Lingyong Yan
Haibo Shi
Dawei Yin
Pengjie Ren
Suzan Verberne
Zhaochun Ren
LLMAG
24
24
0
05 Mar 2024
OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following
Haochen Shi
Zhiyuan Sun
Xingdi Yuan
Marc-Alexandre Côté
Bang Liu
LLMAG
19
10
0
05 Mar 2024
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
Raghav Kapoor
Y. Butala
M. Russak
Jing Yu Koh
Kiran Kamble
Waseem Alshikh
Ruslan Salakhutdinov
LLMAG
51
44
0
27 Feb 2024
DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning
Siyuan Guo
Cheng Deng
Ying Wen
Hechang Chen
Yi-Ju Chang
Jun Wang
ELM
LM&Ro
LLMAG
AI4CE
37
26
0
27 Feb 2024
Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models
Haoran Liao
Jidong Tian
Shaohua Hu
Hao He
Yaohui Jin
ReLM
LRM
33
1
0
24 Feb 2024
On the Multi-turn Instruction Following for Conversational Web Agents
Yang Deng
Xuan Zhang
Wenxuan Zhang
Yifei Yuan
See-Kiong Ng
Tat-Seng Chua
LLMAG
LM&Ro
23
21
0
23 Feb 2024
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Zicheng Lin
Zhibin Gou
Tian Liang
Ruilin Luo
Haowei Liu
Yujiu Yang
LRM
34
43
0
22 Feb 2024
Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning
Debjit Paul
Robert West
Antoine Bosselut
Boi Faltings
ReLM
LRM
27
20
0
21 Feb 2024
Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models
Che Zhang
Zhenyang Xiao
Chengcheng Han
Yixin Lian
Yuejian Fang
LRM
25
0
0
20 Feb 2024
Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models
Loka Li
Zhenhao Chen
Guan-Hong Chen
Yixuan Zhang
Yusheng Su
Eric P. Xing
Kun Zhang
LRM
36
15
0
19 Feb 2024
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Kuang-Huei Lee
Xinyun Chen
Hiroki Furuta
John F. Canny
Ian S. Fischer
RALM
53
29
0
15 Feb 2024
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lù
Zdeněk Kasner
Siva Reddy
22
59
0
08 Feb 2024
Dual-View Visual Contextualization for Web Navigation
Jihyung Kil
Chan Hee Song
Boyuan Zheng
Xiang Deng
Yu-Chuan Su
Wei-Lun Chao
EgoV
14
12
0
06 Feb 2024
Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills
Kolby Nottingham
Bodhisattwa Prasad Majumder
Bhavana Dalvi
Sameer Singh
Peter Clark
Roy Fox
32
7
0
05 Feb 2024
Understanding the planning of LLM agents: A survey
Xu Huang
Weiwen Liu
Xiaolong Chen
Xingmei Wang
Hao Wang
Defu Lian
Yasheng Wang
Ruiming Tang
Enhong Chen
LLMAG
LM&Ro
19
126
0
05 Feb 2024
Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision
Zihan Wang
Yunxuan Li
Yuexin Wu
Liangchen Luo
Le Hou
Hongkun Yu
Jingbo Shang
LRM
29
18
0
05 Feb 2024
Executable Code Actions Elicit Better LLM Agents
Xingyao Wang
Yangyi Chen
Lifan Yuan
Yizhe Zhang
Yunzhu Li
Hao Peng
Heng Ji
ELM
LLMAG
LM&Ro
24
127
0
01 Feb 2024
Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
Shangbin Feng
Weijia Shi
Yike Wang
Wenxuan Ding
Vidhisha Balachandran
Yulia Tsvetkov
18
77
0
01 Feb 2024
WSC+: Enhancing The Winograd Schema Challenge Using Tree-of-Experts
Pardis Sadat Zahraei
Ali Emami
16
6
0
31 Jan 2024