
Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models
Himangi Mittal, Nakul Agarwal, Shao-Yuan Lo, Kwonjoon Lee
30 May 2024 · arXiv:2405.20305

Papers citing "Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models"

21 / 21 papers shown
1. Vision and Intention Boost Large Language Model in Long-Term Action Anticipation — Congqi Cao, Lanshu Hu, Yating Yu, Y. Zhang [VLM] — 03 May 2025
2. SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model — Zongcan Ding, H. Zhang, Peng Wu, Guansong Pang, Zhiwei Yang, Peng Wang, Y. Zhang — 14 Apr 2025
3. Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks — Wei-Jin Huang, Yuan-Ming Li, Zhi-Wei Xia, Yu-Ming Tang, Kun-Yu Lin, Jian-Fang Hu, Wei-Shi Zheng — 28 Mar 2025
4. egoPPG: Heart Rate Estimation from Eye-Tracking Cameras in Egocentric Systems to Benefit Downstream Vision Tasks — Björn Braun, Rayan Armani, Manuel Meier, Max Moebus, Christian Holz [EgoV] — 28 Feb 2025
5. Interact with me: Joint Egocentric Forecasting of Intent to Interact, Attitude and Social Actions — Tongfei Bian, Yiming Ma, Mathieu Chollet, Victor Sanchez, T. Guha [EgoV] — 21 Dec 2024
6. Egocentric and Exocentric Methods: A Short Survey — Anirudh Thatipelli, Shao-Yuan Lo, Amit K. Roy-Chowdhury [EgoV] — 27 Oct 2024
7. Human Action Anticipation: A Survey — Bolin Lai, Sam Toyer, Tushar Nagarajan, Rohit Girdhar, S. Zha, James M. Rehg, Kris M. Kitani, Kristen Grauman, Ruta Desai, Miao Liu [AI4TS] — 17 Oct 2024
8. TR-LLM: Integrating Trajectory Data for Scene-Aware LLM-Based Human Action Prediction — Kojiro Takeyama, Yimeng Liu, Misha Sra — 05 Oct 2024
9. StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models — Y. Guo, Faizan Siddiqui, Yang Zhao, Rama Chellappa, Shao-Yuan Lo [LRM] — 31 Aug 2024
10. Bridging Compressed Image Latents and Multimodal Large Language Models — Chia-Hao Kao, Cheng Chien, Yu-Jen Tseng, Yi-Hsin Chen, Alessandro Gnutti, Shao-Yuan Lo, Wen-Hsiao Peng, Riccardo Leonardi — 29 Jul 2024
11. Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models — Yuchen Yang, Kwonjoon Lee, Behzad Dariush, Yinzhi Cao, Shao-Yuan Lo [LRM] — 14 Jul 2024
12. LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization — Akshita Gupta, Gaurav Mittal, Ahmed Magooda, Ye Yu, Graham W. Taylor, Mei Chen — 01 Apr 2024
13. VideoLLM: Modeling Video Sequence with Large Language Models — Guo Chen, Yin-Dong Zheng, Jiahao Wang, Jilan Xu, Yifei Huang, ..., Yi Wang, Yali Wang, Yu Qiao, Tong Lu, Limin Wang [MLLM] — 22 May 2023
14. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models — Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi [VLM, MLLM] — 30 Jan 2023
15. Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation — Zeyun Zhong, David Schneider, Michael Voit, Rainer Stiefelhagen, Jürgen Beyerer — 23 Oct 2022
16. Learning State-Aware Visual Representations from Audible Interactions — Himangi Mittal, Pedro Morgado, Unnat Jain, Abhinav Gupta — 27 Sep 2022
17. Training language models to follow instructions with human feedback — Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe [OSLM, ALM] — 04 Mar 2022
18. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models — Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou [LM&Ro, LRM, AI4CE, ReLM] — 28 Jan 2022
19. CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP — Andreas Fürst, Elisabeth Rumetshofer, Johannes Lehner, Viet-Hung Tran, Fei Tang, ..., David P. Kreil, Michael K Kopp, G. Klambauer, Angela Bitto-Nemling, Sepp Hochreiter [VLM, CLIP] — 21 Oct 2021
20. Ego4D: Around the World in 3,000 Hours of Egocentric Video — Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, ..., Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik [EgoV] — 13 Oct 2021
21. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision — Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu H. Pham, Quoc V. Le, Yun-hsuan Sung, Zhen Li, Tom Duerig [VLM, CLIP] — 11 Feb 2021