ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.02858
  4. Cited By
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video
  Understanding

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

5 June 2023
Hang Zhang
Xin Li
Lidong Bing
    MLLM
ArXivPDFHTML

Papers citing "Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding"

50 / 694 papers shown
Title
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding
Yizhou Wang
Ruiyi Zhang
Haoliang Wang
Uttaran Bhattacharya
Yun Fu
Gang Wu
MLLM
24
10
0
04 Dec 2023
TimeChat: A Time-sensitive Multimodal Large Language Model for Long
  Video Understanding
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren
Linli Yao
Shicheng Li
Xu Sun
Lu Hou
VLM
MLLM
10
174
0
04 Dec 2023
Video Summarization: Towards Entity-Aware Captions
Video Summarization: Towards Entity-Aware Captions
Hammad A. Ayyubi
Tianqi Liu
Arsha Nagrani
Xudong Lin
Mingda Zhang
Anurag Arnab
Feng Han
Yukun Zhu
Jialu Liu
Shih-Fu Chang
26
1
0
01 Dec 2023
Dolphins: Multimodal Language Model for Driving
Dolphins: Multimodal Language Model for Driving
Yingzi Ma
Yulong Cao
Jiachen Sun
Marco Pavone
Chaowei Xiao
MLLM
21
49
0
01 Dec 2023
ChatPose: Chatting about 3D Human Pose
ChatPose: Chatting about 3D Human Pose
Yao Feng
Jing Lin
Sai Kumar Dwivedi
Yu Sun
Priyanka Patel
Michael J. Black
3DH
26
34
0
30 Nov 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware
  representations to LLMs and Emergent Cross-modal Reasoning
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Shafiq R. Joty
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
28
45
0
30 Nov 2023
VTimeLLM: Empower LLM to Grasp Video Moments
VTimeLLM: Empower LLM to Grasp Video Moments
Bin Huang
Xin Wang
Hong Chen
Zihan Song
Wenwu Zhu
MLLM
87
113
0
30 Nov 2023
M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image
  Generation
M2^{2}2Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation
Xiaowei Chi
Rongyu Zhang
Zhengkai Jiang
Yijiang Liu
Ziyi Lin
...
Chaoyou Fu
Peng Gao
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
MLLM
33
1
0
29 Nov 2023
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of
  Video-Language Models
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
Shicheng Li
Lei Li
Shuhuai Ren
Yuanxin Liu
Yi Liu
Rundong Gao
Xu Sun
Lu Hou
27
28
0
29 Nov 2023
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Yanwei Li
Chengyao Wang
Jiaya Jia
VLM
MLLM
26
259
0
28 Nov 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
46
398
0
28 Nov 2023
Mitigating Object Hallucinations in Large Vision-Language Models through
  Visual Contrastive Decoding
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Sicong Leng
Hang Zhang
Guanzheng Chen
Xin Li
Shijian Lu
Chunyan Miao
Li Bing
VLM
MLLM
85
196
0
28 Nov 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems
A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
49
20
0
28 Nov 2023
Graph Prompt Learning: A Comprehensive Survey and Beyond
Graph Prompt Learning: A Comprehensive Survey and Beyond
Xiangguo Sun
Jiawen Zhang
Xixi Wu
Hong Cheng
Yun Xiong
Jia Li
25
51
0
28 Nov 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating
  Video-based Large Language Models
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM
MLLM
23
58
0
27 Nov 2023
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
Zhixi Cai
Shreya Ghosh
Aman Pankaj Adatia
Munawar Hayat
Abhinav Dhall
Kalin Stefanov
14
26
0
26 Nov 2023
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision
  Language Models in Open-Ended Video Question Answering
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
Xiuyuan Chen
Yuan Lin
Yuchen Zhang
Weiran Huang
ELM
MLLM
18
26
0
25 Nov 2023
ADriver-I: A General World Model for Autonomous Driving
ADriver-I: A General World Model for Autonomous Driving
Fan Jia
Weixin Mao
Yingfei Liu
Yucheng Zhao
Yuqing Wen
Chi Zhang
Xiangyu Zhang
Tiancai Wang
22
63
0
22 Nov 2023
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
Shehan Munasinghe
Rusiru Thushara
Muhammad Maaz
H. Rasheed
Salman Khan
Mubarak Shah
Fahad Khan
VLM
MLLM
17
34
0
22 Nov 2023
Beyond Text: Unveiling Multimodal Proficiency of Large Language Models
  with MultiAPI Benchmark
Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark
Xiao Liu
Jianfeng Lin
Jiawei Zhang
19
2
0
21 Nov 2023
A Survey of Graph Meets Large Language Model: Progress and Future
  Directions
A Survey of Graph Meets Large Language Model: Progress and Future Directions
Yuhan Li
Zhixun Li
Peisong Wang
Jia Li
Xiangguo Sun
Hongtao Cheng
Jeffrey Xu Yu
28
53
0
21 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
...
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
36
248
0
21 Nov 2023
VLM-Eval: A General Evaluation on Video Large Language Models
VLM-Eval: A General Evaluation on Video Large Language Models
Shuailin Li
Yuang Zhang
Yucheng Zhao
Qiuyue Wang
Fan Jia
Yingfei Liu
Tiancai Wang
MLLM
ELM
10
2
0
20 Nov 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before
  Projection
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
194
586
0
16 Nov 2023
GRASP: A novel benchmark for evaluating language GRounding And Situated
  Physics understanding in multimodal language models
GRASP: A novel benchmark for evaluating language GRounding And Situated Physics understanding in multimodal language models
Serwan Jassim
Mario S. Holubar
Annika Richter
Cornelius Wolff
Xenia Ohmer
Elia Bruni
ELM
13
9
0
15 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
28
12
0
14 Nov 2023
Chat-UniVi: Unified Visual Representation Empowers Large Language Models
  with Image and Video Understanding
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin
Ryuichi Takanobu
Caiwan Zhang
Xiaochun Cao
Li-ming Yuan
MLLM
34
217
0
14 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
66
4
0
10 Nov 2023
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Jinjin Xu
Liwu Xu
Yuzhe Yang
Xiang Li
Fanyi Wang
Yanchun Xie
Yi-Jie Huang
Yaqian Li
MoE
MLLM
VLM
24
12
0
09 Nov 2023
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Zhen Yang
Yingxue Zhang
Fandong Meng
Jie Zhou
VLM
MLLM
37
3
0
08 Nov 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with
  Modality Collaboration
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
116
375
0
07 Nov 2023
LLM4Drive: A Survey of Large Language Models for Autonomous Driving
LLM4Drive: A Survey of Large Language Models for Autonomous Driving
Zhenjie Yang
Xiaosong Jia
Hongyang Li
Junchi Yan
ELM
24
92
0
02 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
21
62
0
30 Oct 2023
Transformation vs Tradition: Artificial General Intelligence (AGI) for
  Arts and Humanities
Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities
Zheng Liu
Yiwei Li
Qian Cao
Junwen Chen
Tianze Yang
...
John Gibbs
Khaled Rasheed
Ninghao Liu
Gengchen Mai
Tianming Liu
AI4CE
36
10
0
30 Oct 2023
Large Language Models are Temporal and Causal Reasoners for Video
  Question Answering
Large Language Models are Temporal and Causal Reasoners for Video Question Answering
Dohwan Ko
Ji Soo Lee
Wooyoung Kang
Byungseok Roh
Hyunwoo J. Kim
LRM
33
31
0
24 Oct 2023
Vision Language Models in Autonomous Driving: A Survey and Outlook
Vision Language Models in Autonomous Driving: A Survey and Outlook
Xingcheng Zhou
Mingyu Liu
Ekim Yurtsever
B. L. Žagar
Walter Zimmer
Hu Cao
Alois C. Knoll
VLM
20
33
0
22 Oct 2023
MarineGPT: Unlocking Secrets of Ocean to the Public
MarineGPT: Unlocking Secrets of Ocean to the Public
Ziqiang Zheng
Jipeng Zhang
Tuan-Anh Vu
Shizhe Diao
Yue Him Wong Tim
Sai-Kit Yeung
28
11
0
20 Oct 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
LM&MA
AuLLM
35
195
0
20 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and
  Outlook
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
31
116
0
16 Oct 2023
Extending Multi-modal Contrastive Representations
Extending Multi-modal Contrastive Representations
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
19
5
0
13 Oct 2023
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language
  Models
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
Dongsheng Jiang
Yuchen Liu
Songlin Liu
Jiné Zhao
Hao Zhang
Zhen Gao
Xiaopeng Zhang
Jin Li
Hongkai Xiong
MLLM
VLM
31
34
0
13 Oct 2023
KwaiYiiMath: Technical Report
KwaiYiiMath: Technical Report
Jia-Yi Fu
Lei Lin
Xiaoyang Gao
Pengli Liu
Zhengzong Chen
...
Zijia Lin
Fuzheng Zhang
Zhongyuan Wang
Di Zhang
Kun Gai
LRM
ReLM
RALM
38
2
0
11 Oct 2023
Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog
Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog
Haoyu Zhang
Meng Liu
Yaowei Wang
Da Cao
Weili Guan
Liqiang Nie
28
0
0
11 Oct 2023
MuseChat: A Conversational Music Recommendation System for Videos
MuseChat: A Conversational Music Recommendation System for Videos
Zhikang Dong
Bin Chen
Xiulong Liu
Paweł Polak
Peng Zhang
LRM
37
25
0
10 Oct 2023
FireAct: Toward Language Agent Fine-tuning
FireAct: Toward Language Agent Fine-tuning
Baian Chen
Chang Shu
Ehsan Shareghi
Nigel Collier
Karthik Narasimhan
Shunyu Yao
ALM
LLMAG
99
97
0
09 Oct 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large
  Language Models
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Guangzhi Sun
Wenyi Yu
Changli Tang
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
26
12
0
09 Oct 2023
Video-Teller: Enhancing Cross-Modal Generation with Fusion and
  Decoupling
Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling
Haogeng Liu
Qihang Fan
Tingkai Liu
Linjie Yang
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
VGen
21
12
0
08 Oct 2023
Expedited Training of Visual Conditioned Language Generation via
  Redundancy Reduction
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Yiren Jian
Tingkai Liu
Yunzhe Tao
Chunhui Zhang
Soroush Vosoughi
HX Yang
VLM
15
7
0
05 Oct 2023
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
Hao Sha
Yao Mu
Yuxuan Jiang
Li Chen
Chenfeng Xu
Ping Luo
Shengbo Eben Li
Masayoshi Tomizuka
Wei Zhan
Mingyu Ding
107
159
0
04 Oct 2023
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large
  Language Model
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
Zhenhua Xu
Yujia Zhang
Enze Xie
Zhen Zhao
Yong Guo
Kwan-Yee. K. Wong
Zhenguo Li
Hengshuang Zhao
MLLM
18
250
0
02 Oct 2023
Previous
123...121314
Next