ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.02858
  4. Cited By
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video
  Understanding

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

5 June 2023
Hang Zhang
Xin Li
Lidong Bing
    MLLM
ArXivPDFHTML

Papers citing "Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding"

50 / 697 papers shown
Title
A Misleading Gallery of Fluid Motion by Generative Artificial
  Intelligence
A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence
Ali Kashefi
VGen
43
5
0
24 May 2024
Continuously Learning, Adapting, and Improving: A Dual-Process Approach
  to Autonomous Driving
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving
Jianbiao Mei
Yukai Ma
Xuemeng Yang
Licheng Wen
Xinyu Cai
...
Min Dou
Botian Shi
Liang He
Yong-Jin Liu
Yu Qiao
35
9
0
24 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
67
41
0
23 May 2024
Dense Connector for MLLMs
Dense Connector for MLLMs
Huanjin Yao
Wenhao Wu
Taojiannan Yang
Yuxin Song
Mengxi Zhang
Haocheng Feng
Yifan Sun
Zhiheng Li
Wanli Ouyang
Jingdong Wang
MLLM
VLM
32
16
0
22 May 2024
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation
  Models
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models
Guangzhi Sun
Potsawee Manakul
Adian Liusie
Kunat Pipatanakul
Chao Zhang
P. Woodland
Mark J. F. Gales
HILM
MLLM
16
7
0
22 May 2024
An Empirical Study and Analysis of Text-to-Image Generation Using Large
  Language Model-Powered Textual Representation
An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation
Zhiyu Tan
Mengping Yang
Luozheng Qin
Hao Yang
Ye Qian
Qiang-feng Zhou
Cheng Zhang
Hao Li
63
3
0
21 May 2024
ProtT3: Protein-to-Text Generation for Text-based Protein Understanding
ProtT3: Protein-to-Text Generation for Text-based Protein Understanding
Zhiyuan Liu
An Zhang
Hao Fei
Enzhi Zhang
Xiang Wang
Kenji Kawaguchi
Tat-Seng Chua
52
16
0
21 May 2024
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Zhenwei Shao
Zhou Yu
Jun Yu
Xuecheng Ouyang
Lihao Zheng
Zhenbiao Gai
Mingyang Wang
Jiajun Ding
21
10
0
20 May 2024
SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations
SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations
Fanfan Wang
Heqing Ma
Jianfei Yu
Rui Xia
Erik Cambria
32
22
0
19 May 2024
Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
Zeyu Zhang
Yiran Wang
Biao Wu
Shuo Chen
Zhiyuan Zhang
Shiya Huang
Wenbo Zhang
Meng Fang
Ling-Hao Chen
Yang Zhao
VGen
32
6
0
18 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Wenhan Luo
Lin Ma
Min-Ling Zhang
MoE
34
28
0
18 May 2024
Efficient Multimodal Large Language Models: A Survey
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
39
45
0
17 May 2024
Listen Again and Choose the Right Answer: A New Paradigm for Automatic
  Speech Recognition with Large Language Models
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
Yuchen Hu
Chen Chen
Chengwei Qin
Qiushi Zhu
E. Chng
Ruizhe Li
AuLLM
KELM
36
5
0
16 May 2024
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World
  Knowledge
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Andong Wang
Bo Wu
Sunli Chen
Zhenfang Chen
Haotian Guan
Wei-Ning Lee
Li Erran Li
Chuang Gan
LRM
RALM
24
16
0
15 May 2024
FreeVA: Offline MLLM as Training-Free Video Assistant
FreeVA: Offline MLLM as Training-Free Video Assistant
Wenhao Wu
VLM
OffRL
32
19
0
13 May 2024
Sakuga-42M Dataset: Scaling Up Cartoon Research
Sakuga-42M Dataset: Scaling Up Cartoon Research
Zhenglin Pan
Yu Zhu
Yuxuan Mu
30
6
0
13 May 2024
A Survey of Large Language Models for Graphs
A Survey of Large Language Models for Graphs
Xubin Ren
Jiabin Tang
Dawei Yin
Nitesh V. Chawla
Chao Huang
26
30
0
10 May 2024
Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language
  Translation
Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation
Ryan Wong
Necati Cihan Camgöz
Richard Bowden
SLR
40
21
0
07 May 2024
How Good is my Video LMM? Complex Video Reasoning and Robustness
  Evaluation Suite for Video-LMMs
How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs
Muhammad Uzair Khattak
Muhammad Ferjad Naeem
Jameel Hassan
Muzammal Naseer
Federico Tombari
Fahad Shahbaz Khan
Salman Khan
LRM
ELM
32
10
0
06 May 2024
WorldQA: Multimodal World Knowledge in Videos through Long-Chain
  Reasoning
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning
Yuanhan Zhang
Kaichen Zhang
Bo-wen Li
Fanyi Pu
Christopher Arif Setiadharma
Jingkang Yang
Ziwei Liu
VGen
47
7
0
06 May 2024
Octopi: Object Property Reasoning with Large Tactile-Language Models
Octopi: Object Property Reasoning with Large Tactile-Language Models
Samson Yu
Kelvin Lin
Anxing Xiao
Jiafei Duan
Harold Soh
LRM
31
23
0
05 May 2024
MANTIS: Interleaved Multi-Image Instruction Tuning
MANTIS: Interleaved Multi-Image Instruction Tuning
Dongfu Jiang
Xuan He
Huaye Zeng
Cong Wei
Max W.F. Ku
Qian Liu
Wenhu Chen
VLM
MLLM
28
100
0
02 May 2024
EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model
EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model
Deng Li
Xin Liu
Bohao Xing
Baiqiang Xia
Yuan Zong
Bihan Wen
H. Kalviainen
23
3
0
01 May 2024
MileBench: Benchmarking MLLMs in Long Context
MileBench: Benchmarking MLLMs in Long Context
Dingjie Song
Shunian Chen
Guiming Hardy Chen
Fei Yu
Xiang Wan
Benyou Wang
VLM
76
34
0
29 Apr 2024
MovieChat+: Question-aware Sparse Memory for Long Video Question
  Answering
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
Enxin Song
Wenhao Chai
Tianbo Ye
Jenq-Neng Hwang
Xi Li
Gaoang Wang
VLM
MLLM
24
28
0
26 Apr 2024
MER 2024: Semi-Supervised Learning, Noise Robustness, and
  Open-Vocabulary Multimodal Emotion Recognition
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition
Zheng Lian
Haiyang Sun
Licai Sun
Zhuofan Wen
Siyuan Zhang
...
Bin Liu
Erik Cambria
Guoying Zhao
Björn W. Schuller
Jianhua Tao
VLM
31
11
0
26 Apr 2024
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video
  Dense Captioning
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Lin Xu
Yilin Zhao
Daquan Zhou
Zhijie Lin
See Kiong Ng
Jiashi Feng
MLLM
VLM
34
156
0
25 Apr 2024
Energy-Latency Manipulation of Multi-modal Large Language Models via
  Verbose Samples
Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples
Kuofeng Gao
Jindong Gu
Yang Bai
Shu-Tao Xia
Philip H. S. Torr
Wei Liu
Zhifeng Li
59
11
0
25 Apr 2024
Samsung Research China-Beijing at SemEval-2024 Task 3: A multi-stage
  framework for Emotion-Cause Pair Extraction in Conversations
Samsung Research China-Beijing at SemEval-2024 Task 3: A multi-stage framework for Emotion-Cause Pair Extraction in Conversations
Shen Zhang
Haojie Zhang
Jing Zhang
Xudong Zhang
Yimeng Zhuang
Jinting Wu
33
2
0
25 Apr 2024
Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of
  Theories, Detection Methods, and Opportunities
Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of Theories, Detection Methods, and Opportunities
Xiaomin Yu
Yezhaohui Wang
Yanfang Chen
Zhen Tao
Dinghao Xi
Shichao Song
Simin Niu
Zhiyu Li
62
7
0
25 Apr 2024
Step Differences in Instructional Video
Step Differences in Instructional Video
Tushar Nagarajan
Lorenzo Torresani
VGen
27
5
0
24 Apr 2024
Pegasus-v1 Technical Report
Pegasus-v1 Technical Report
Raehyuk Jung
Hyojun Go
Jaehyuk Yi
Jiho Jang
Daniel Kim
...
Maninder Saini
Meredith Sanders
Soyoung Lee
Sue Kim
Travis Couture
MLLM
VLM
29
5
0
23 Apr 2024
AutoAD III: The Prequel -- Back to the Pixels
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
36
20
0
22 Apr 2024
TAVGBench: Benchmarking Text to Audible-Video Generation
TAVGBench: Benchmarking Text to Audible-Video Generation
Yuxin Mao
Xuyang Shen
Jing Zhang
Zhen Qin
Jinxing Zhou
Mochu Xiang
Yiran Zhong
Yuchao Dai
40
11
0
22 Apr 2024
Graphic Design with Large Multimodal Model
Graphic Design with Large Multimodal Model
Yutao Cheng
Zhao Zhang
Maoke Yang
Hui Nie
Chunyuan Li
Xinglong Wu
Jie Shao
36
10
0
22 Apr 2024
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt
  Instruction Tuning
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua
Yunlong Tang
Chenliang Xu
Jiebo Luo
VGen
60
25
0
18 Apr 2024
AccidentBlip: Agent of Accident Warning based on MA-former
AccidentBlip: Agent of Accident Warning based on MA-former
Yihua Shao
Hongyi Cai
Xinwei Long
Weiyi Lang
Ziyang Yan
Haoran Wu
Yan Wang
Jiayi Yin
Yang Yang
Yisheng Lv
26
2
0
18 Apr 2024
From Image to Video, what do we need in multimodal LLMs?
From Image to Video, what do we need in multimodal LLMs?
Suyuan Huang
Haoxin Zhang
Yan Gao
Yao Hu
Zengchang Qin
VLM
36
8
0
18 Apr 2024
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
Jie Ma
Min Hu
Pinghui Wang
Wangchun Sun
Lingyun Song
Hongbin Pei
Jun Liu
Youtian Du
32
4
0
18 Apr 2024
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Kanchana Ranasinghe
Satya Narayan Shukla
Omid Poursaeed
Michael S. Ryoo
Tsung-Yu Lin
LRM
38
22
0
11 Apr 2024
HRVDA: High-Resolution Visual Document Assistant
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
35
23
0
10 Apr 2024
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
Xincan Feng
A. Yoshimoto
32
2
0
10 Apr 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
34
20
0
09 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video
  Understanding
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
81
88
0
08 Apr 2024
DLoRA: Distributed Parameter-Efficient Fine-Tuning Solution for Large
  Language Model
DLoRA: Distributed Parameter-Efficient Fine-Tuning Solution for Large Language Model
Chao Gao
Sai Qian Zhang
ALM
118
7
0
08 Apr 2024
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context
  and Dynamics of Human Interactions Within Social Groups
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
Simindokht Jahangard
Zhixi Cai
Shiki Wen
Hamid Rezatofighi
26
6
0
06 Apr 2024
Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from
  Interleaved Multimodal Inputs
Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs
Junhao Chen
Xiang Li
Xiaojun Ye
Chao Li
Zhaoxin Fan
Hao Zhao
VGen
3DV
200
4
0
05 Apr 2024
Koala: Key frame-conditioned long video-LLM
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
Kailin Li
Jingbo Wang
Lixin Yang
Cewu Lu
Bo Dai
38
15
0
04 Apr 2024
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with
  Interleaved Visual-Textual Tokens
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Kirolos Ataallah
Xiaoqian Shen
Eslam Abdelrahman
Essam Sleiman
Deyao Zhu
Jian Ding
Mohamed Elhoseiny
VLM
39
66
0
04 Apr 2024
Previous
123...10111213149
Next