ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Language Models can Solve Computer Tasks


30 March 2023
Geunwoo Kim
Pierre Baldi
Stephen Marcus McAleer
    LLMAG
    LM&Ro

Papers citing "Language Models can Solve Computer Tasks"

50 / 256 papers shown
Title
A Framework for Collaborating a Large Language Model Tool in Brainstorming for Triggering Creative Thoughts
Hung-Fu Chang
Tong Li
KELM
LLMAG
34
2
0
10 Oct 2024
AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories
Yifan Song
Weimin Xiong
Xiutian Zhao
Dawei Zhu
Wenhao Wu
Ke Wang
Cheng Li
Wei Peng
Sujian Li
LLMAG
24
9
0
10 Oct 2024
On the Modeling Capabilities of Large Language Models for Sequential Decision Making
Martin Klissarov
Devon Hjelm
Alexander Toshev
Bogdan Mazoure
LM&Ro
ELM
OffRL
LRM
29
2
0
08 Oct 2024
Mirror-Consistency: Harnessing Inconsistency in Majority Voting
Siyuan Huang
Zhiyuan Ma
Jintao Du
Changhua Meng
Weiqiang Wang
Zhouhan Lin
LRM
24
3
0
07 Oct 2024
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Boyu Gou
Ruohan Wang
Boyuan Zheng
Yanan Xie
Cheng Chang
Yiheng Shu
Huan Sun
Yu Su
LM&Ro
LLMAG
76
48
0
07 Oct 2024
From Reward Shaping to Q-Shaping: Achieving Unbiased Learning with LLM-Guided Knowledge
Xiefeng Wu
OffRL
29
1
0
02 Oct 2024
Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling
Jinghan Li
Zhicheng Sun
Fei Li
80
1
0
02 Oct 2024
'Simulacrum of Stories': Examining Large Language Models as Qualitative Research Participants
Shivani Kapania
William Agnew
Motahhare Eslami
Hoda Heidari
Sarah E Fox
34
4
0
28 Sep 2024
Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale
Tianyue Ou
Frank F. Xu
Aman Madaan
J. Liu
Robert Lo
Abishek Sridhar
Sudipta Sengupta
Dan Roth
Graham Neubig
Shuyan Zhou
OffRL
25
9
0
24 Sep 2024
SiSCo: Signal Synthesis for Effective Human-Robot Communication Via Large Language Models
Shubham D. Sonawani
F. Weigend
H. B. Amor
13
0
0
20 Sep 2024
Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation
Chen Liang
Zhifan Feng
Zihe Liu
Wenbin Jiang
Jinan Xu
Yufeng Chen
Yong Wang
LLMAG
LRM
18
0
0
19 Sep 2024
100 instances is all you need: predicting the success of a new LLM on unseen data by testing on a few instances
Lorenzo Pacchiardi
Lucy G. Cheke
José Hernández Orallo
ALM
LRM
ELM
36
3
0
05 Sep 2024
From Grounding to Planning: Benchmarking Bottlenecks in Web Agents
Segev Shlomov
Ben Wiesel
Aviad Sela
Ido Levy
Liane Galanti
Roy Abitbol
LLMAG
30
3
0
03 Sep 2024
Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic
Xin Zheng
Jie Lou
Boxi Cao
Xueru Wen
Yuqiu Ji
Hongyu Lin
Y. Lu
Xianpei Han
Debing Zhang
Le Sun
LLMAG
OffRL
LRM
ReLM
KELM
28
13
1
29 Aug 2024
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration
Yao Zhang
Zijian Ma
Yunpu Ma
Zhen Han
Yu Wu
Volker Tresp
LLMAG
38
22
0
28 Aug 2024
Critique-out-Loud Reward Models
Zachary Ankner
Mansheej Paul
Brandon Cui
Jonathan D. Chang
Prithviraj Ammanabrolu
ALM
LRM
25
25
0
21 Aug 2024
What's Wrong? Refining Meeting Summaries with LLM Feedback
Frederic Kirstein
Terry Ruas
Bela Gipp
47
6
0
16 Jul 2024
Source Code Summarization in the Era of Large Language Models
Weisong Sun
Yun Miao
Yuekang Li
Hongyu Zhang
Chunrong Fang
Yi Liu
Gelei Deng
Yang Liu
Zhenyu Chen
ELM
39
14
0
09 Jul 2024
Prompting Techniques for Secure Code Generation: A Systematic Investigation
Catherine Tony
Nicolás E. Díaz Ferreyra
Markus Mutas
Salem Dhiff
Riccardo Scandariato
SILM
62
9
0
09 Jul 2024
Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models
Haritz Puerto
Tilek Chubakov
Xiaodan Zhu
Harish Tayyar Madabushi
Iryna Gurevych
ReLM
LRM
39
9
1
03 Jul 2024
Tree Search for Language Model Agents
Jing Yu Koh
Stephen Marcus McAleer
Daniel Fried
Ruslan Salakhutdinov
LM&Ro
LLMAG
LRM
46
56
0
01 Jul 2024
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
Enshu Liu
Junyi Zhu
Zinan Lin
Xuefei Ning
Matthew B. Blaschko
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MoE
52
5
0
01 Jul 2024
LLM Critics Help Catch LLM Bugs
Nat McAleese
Rai Michael Pokorny
Juan Felipe Cerón Uribe
Evgenia Nitishinskaya
Maja Trebacz
Jan Leike
ALM
LRM
27
58
0
28 Jun 2024
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
Sean Welleck
Amanda Bertsch
Matthew Finlayson
Hailey Schoelkopf
Alex Xie
Graham Neubig
Ilia Kulikov
Zaid Harchaoui
33
45
0
24 Jun 2024
Teaching LLMs to Abstain across Languages via Multilingual Feedback
Shangbin Feng
Weijia Shi
Yike Wang
Wenxuan Ding
Orevaoghene Ahia
Shuyue Stella Li
Vidhisha Balachandran
Sunayana Sitaram
Yulia Tsvetkov
62
4
0
22 Jun 2024
Large Language Models have Intrinsic Self-Correction Ability
Dancheng Liu
Amir Nassereldine
Ziming Yang
Chenhui Xu
Yuting Hu
Jiajie Li
Utkarsh Kumar
Changjae Lee
Jinjun Xiong
KELM
ReLM
LRM
23
9
0
21 Jun 2024
E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion
Ke Wang
Tianyu Xia
Zhangxuan Gu
Yi Zhao
Shuheng Shen
Changhua Meng
Weiqiang Wang
Ke Xu
18
0
0
20 Jun 2024
GUI Action Narrator: Where and When Did That Action Take Place?
Qinchen Wu
Difei Gao
Kevin Qinghong Lin
Zhuoyu Wu
Xiangwu Guo
Peiran Li
Weichen Zhang
Hengxu Wang
Mike Zheng Shou
29
3
0
19 Jun 2024
AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
Edoardo Debenedetti
Jie Zhang
Mislav Balunović
Luca Beurer-Kellner
Marc Fischer
Florian Tramèr
LLMAG
AAML
43
25
1
19 Jun 2024
WebCanvas: Benchmarking Web Agents in Online Environments
Yichen Pan
Dehan Kong
Sida Zhou
Cheng Cui
Yifei Leng
...
Hangyu Liu
Yanyi Shang
Shuyan Zhou
Tongshuang Wu
Zhengyang Wu
24
26
0
18 Jun 2024
GUICourse: From General Vision Language Models to Versatile GUI Agents
Wentong Chen
Junbo Cui
Jinyi Hu
Yujia Qin
Junjie Fang
...
Yupeng Huo
Yuan Yao
Yankai Lin
Zhiyuan Liu
Maosong Sun
LLMAG
31
31
0
17 Jun 2024
Security of AI Agents
Yifeng He
Ethan Wang
Yuyang Rong
Zifei Cheng
Hao Chen
LLMAG
29
7
0
12 Jun 2024
CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only
Junhee Cho
Jihoon Kim
Daseul Bae
Jinho Choo
Youngjune Gwon
Yeong-Dae Kwon
LLMAG
21
1
0
11 Jun 2024
Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies
Junlin Wang
Siddhartha Jain
Dejiao Zhang
Baishakhi Ray
Varun Kumar
Ben Athiwaratkun
30
19
0
10 Jun 2024
AICoderEval: Improving AI Domain Code Generation of Large Language Models
Yinghui Xia
Yuyan Chen
Tianyu Shi
Jun Wang
Jinsong Yang
34
2
0
07 Jun 2024
BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents
Yifei Wang
Dizhan Xue
Shengjie Zhang
Shengsheng Qian
AAML
LLMAG
29
19
0
05 Jun 2024
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
Ryo Kamoi
Yusen Zhang
Nan Zhang
Jiawei Han
Rui Zhang
LRM
40
57
0
03 Jun 2024
Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization
Yu-Min Tseng
Yu-Chao Huang
Teng-Yun Hsiao
Yu-Ching Hsu
Chao-Wei Huang
Jia-Yin Foo
Yun-Nung Chen
LLMAG
246
63
0
03 Jun 2024
Re-ReST: Reflection-Reinforced Self-Training for Language Agents
Zi-Yi Dou
Cheng-Fu Yang
Xueqing Wu
Kai-Wei Chang
Nanyun Peng
LRM
81
7
0
03 Jun 2024
WebSuite: Systematically Evaluating Why Web Agents Fail
Eric Li
Jim Waldo
LLMAG
15
4
0
01 Jun 2024
A Theoretical Understanding of Self-Correction through In-context Alignment
Yifei Wang
Yuyang Wu
Zeming Wei
Stefanie Jegelka
Yisen Wang
LRM
28
13
0
28 May 2024
Tool Learning with Large Language Models: A Survey
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jun Xu
Jirong Wen
LLMAG
31
77
0
28 May 2024
RLSF: Reinforcement Learning via Symbolic Feedback
Piyush Jha
Prithwish Jana
Arnav Arora
Vijay Ganesh
LRM
36
3
0
26 May 2024
AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning
Minghao Chen
Yihang Li
Yanting Yang
Shiyu Yu
Binbin Lin
Xiaofei He
LLMAG
31
1
0
25 May 2024
Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study
Karl Tamberg
Hayretdin Bahsi
39
5
0
24 May 2024
Large Language Models Can Self-Correct with Minimal Effort
Zhenyu Wu
Qingkai Zeng
Zhihan Zhang
Zhaoxuan Tan
Chao Shen
Meng-Long Jiang
KELM
LRM
ReLM
24
3
0
23 May 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
57
44
0
23 May 2024
Latent State Estimation Helps UI Agents to Reason
Will Bishop
Alice Li
Christopher Rawles
Oriana Riva
LRM
LLMAG
19
3
0
17 May 2024
METAREFLECTION: Learning Instructions for Language Agents using Past Reflections
Priyanshu Gupta
Shashank Kirtania
Ananya Singha
Sumit Gulwani
Arjun Radhakrishna
Sherry Shi
Gustavo Soares
LLMAG
27
4
0
13 May 2024
LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought
Zhuoxuan Jiang
Haoyuan Peng
Shanshan Feng
Fan Li
Dongsheng Li
LRM
KELM
35
12
0
09 May 2024