ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.07207
  4. Cited By
Valley: Video Assistant with Large Language model Enhanced abilitY

Valley: Video Assistant with Large Language model Enhanced abilitY

12 June 2023
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
    MLLM
ArXivPDFHTML

Papers citing "Valley: Video Assistant with Large Language model Enhanced abilitY"

47 / 147 papers shown
Title
InternVideo2: Scaling Video Foundation Models for Multimodal Video
  Understanding
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Yi Wang
Kunchang Li
Xinhao Li
Jiashuo Yu
Yinan He
...
Hongjie Zhang
Yifei Huang
Yu Qiao
Yali Wang
Limin Wang
19
104
0
22 Mar 2024
HawkEye: Training Video-Text LLMs for Grounding Text in Videos
HawkEye: Training Video-Text LLMs for Grounding Text in Videos
Yueqian Wang
Xiaojun Meng
Jianxin Liang
Yuxuan Wang
Qun Liu
Dongyan Zhao
18
30
0
15 Mar 2024
GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual
  Affective Computing
GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing
Hao Lu
Xuesong Niu
Jiyao Wang
Yin Wang
Qingyong Hu
...
Dengbo He
Shuiguang Deng
Hao Chen
Ying Chen
Shiguang Shan
MLLM
33
9
0
09 Mar 2024
CAT: Enhancing Multimodal Large Language Model to Answer Questions in
  Dynamic Audio-Visual Scenarios
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
Qilang Ye
Zitong Yu
Rui Shao
Xinyu Xie
Philip H. S. Torr
Xiaochun Cao
MLLM
27
24
0
07 Mar 2024
LLMs Meet Long Video: Advancing Long Video Comprehension with An
  Interactive Visual Adapter in LLMs
LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs
Yunxin Li
Xinyu Chen
Baotain Hu
Min-Ling Zhang
27
9
0
21 Feb 2024
A Survey on Knowledge Distillation of Large Language Models
A Survey on Knowledge Distillation of Large Language Models
Xiaohan Xu
Ming Li
Chongyang Tao
Tao Shen
Reynold Cheng
Jinyang Li
Can Xu
Dacheng Tao
Tianyi Zhou
KELM
VLM
34
94
0
20 Feb 2024
SoMeLVLM: A Large Vision Language Model for Social Media Processing
SoMeLVLM: A Large Vision Language Model for Social Media Processing
Xinnong Zhang
Haoyu Kuang
Xinyi Mou
Hanjia Lyu
Kun Wu
Siming Chen
Jiebo Luo
Xuanjing Huang
Zhongyu Wei
MLLM
33
2
0
20 Feb 2024
Momentor: Advancing Video Large Language Model with Fine-Grained
  Temporal Reasoning
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
Long Qian
Juncheng Billy Li
Yu-hao Wu
Yaobo Ye
Hao Fei
Tat-Seng Chua
Yueting Zhuang
Siliang Tang
MLLM
LRM
57
9
0
18 Feb 2024
RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented
  In-Context Learning in Multi-Modal Large Language Model
RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model
Jianhao Yuan
Shuyang Sun
Daniel Omeiza
Bo-Lu Zhao
Paul Newman
Lars Kunze
Matthew Gadd
LRM
14
47
0
16 Feb 2024
World Model on Million-Length Video And Language With Blockwise RingAttention
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu
Wilson Yan
Matei A. Zaharia
Pieter Abbeel
VGen
27
56
0
13 Feb 2024
A Survey on Generative AI and LLM for Video Generation, Understanding,
  and Streaming
A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming
Pengyuan Zhou
Lin Wang
Zhi Liu
Yanbin Hao
Pan Hui
Sasu Tarkoma
J. Kangasharju
VGen
24
24
0
30 Jan 2024
GroundingGPT:Language Enhanced Multi-modal Grounding Model
GroundingGPT:Language Enhanced Multi-modal Grounding Model
Zhaowei Li
Qi Xu
Dong Zhang
Hang Song
Yiqing Cai
...
Junting Pan
Zefeng Li
Van Tu Vu
Zhida Huang
Tao Wang
18
36
0
11 Jan 2024
Video Understanding with Large Language Models: A Survey
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
47
76
0
29 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision,
  Language, Audio, and Action
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
27
143
0
28 Dec 2023
Visual Instruction Tuning towards General-Purpose Multimodal Model: A
  Survey
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Jiaxing Huang
Jingyi Zhang
Kai Jiang
Han Qiu
Shijian Lu
28
22
0
27 Dec 2023
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for
  Human-Level Planning
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
Yi Chen
Yuying Ge
Yixiao Ge
Mingyu Ding
Bohao Li
Rui Wang
Rui-Lan Xu
Ying Shan
Xihui Liu
LLMAG
ELM
LRM
13
8
0
11 Dec 2023
Audio-Visual LLM for Video Understanding
Audio-Visual LLM for Video Understanding
Fangxun Shu
Lei Zhang
Hao Jiang
Cihang Xie
VLM
MLLM
17
36
0
11 Dec 2023
TimeChat: A Time-sensitive Multimodal Large Language Model for Long
  Video Understanding
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren
Linli Yao
Shicheng Li
Xu Sun
Lu Hou
VLM
MLLM
10
142
0
04 Dec 2023
Dolphins: Multimodal Language Model for Driving
Dolphins: Multimodal Language Model for Driving
Yingzi Ma
Yulong Cao
Jiachen Sun
Marco Pavone
Chaowei Xiao
MLLM
16
49
0
01 Dec 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware
  representations to LLMs and Emergent Cross-modal Reasoning
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Shafiq R. Joty
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
25
45
0
30 Nov 2023
VTimeLLM: Empower LLM to Grasp Video Moments
VTimeLLM: Empower LLM to Grasp Video Moments
Bin Huang
Xin Wang
Hong Chen
Zihan Song
Wenwu Zhu
MLLM
72
80
0
30 Nov 2023
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context
  Learning
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang
K. Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
16
28
0
29 Nov 2023
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Yanwei Li
Chengyao Wang
Jiaya Jia
VLM
MLLM
26
259
0
28 Nov 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
22
398
0
28 Nov 2023
SEED-Bench-2: Benchmarking Multimodal Large Language Models
SEED-Bench-2: Benchmarking Multimodal Large Language Models
Bohao Li
Yuying Ge
Yixiao Ge
Guangzhi Wang
Rui Wang
Ruimao Zhang
Ying Shan
MLLM
VLM
15
58
0
28 Nov 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating
  Video-based Large Language Models
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM
MLLM
15
58
0
27 Nov 2023
ADriver-I: A General World Model for Autonomous Driving
ADriver-I: A General World Model for Autonomous Driving
Fan Jia
Weixin Mao
Yingfei Liu
Yucheng Zhao
Yuqing Wen
Chi Zhang
Xiangyu Zhang
Tiancai Wang
13
63
0
22 Nov 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before
  Projection
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
182
576
0
16 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
25
12
0
14 Nov 2023
LLM4Drive: A Survey of Large Language Models for Autonomous Driving
LLM4Drive: A Survey of Large Language Models for Autonomous Driving
Zhenjie Yang
Xiaosong Jia
Hongyang Li
Junchi Yan
ELM
24
92
0
02 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
10
62
0
30 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and
  Outlook
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
28
116
0
16 Oct 2023
Expedited Training of Visual Conditioned Language Generation via
  Redundancy Reduction
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Yiren Jian
Tingkai Liu
Yunzhe Tao
Chunhui Zhang
Soroush Vosoughi
HX Yang
VLM
13
7
0
05 Oct 2023
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large
  Language Model
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
Zhenhua Xu
Yujia Zhang
Enze Xie
Zhen Zhao
Yong Guo
Kwan-Yee. K. Wong
Zhenguo Li
Hengshuang Zhao
MLLM
18
250
0
02 Oct 2023
BT-Adapter: Video Conversation is Feasible Without Video Instruction
  Tuning
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Ruyang Liu
Chen Li
Yixiao Ge
Ying Shan
Thomas H. Li
Ge Li
17
29
0
27 Sep 2023
FashionLOGO: Prompting Multimodal Large Language Models for Fashion Logo
  Embeddings
FashionLOGO: Prompting Multimodal Large Language Models for Fashion Logo Embeddings
Yulin Su
Min Yang
Minghui Qiu
Jing Wang
Tao Wang
VLM
14
0
0
17 Aug 2023
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
Bohao Li
Rui Wang
Guangzhi Wang
Yuying Ge
Yixiao Ge
Ying Shan
MLLM
ELM
8
493
0
30 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
13
116
0
25 Jul 2023
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Yang Zhao
Zhijie Lin
Daquan Zhou
Zilong Huang
Jiashi Feng
Bingyi Kang
MLLM
23
106
0
17 Jul 2023
What Matters in Training a GPT4-Style Language Model with Multimodal
  Inputs?
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
Yan Zeng
Hanbo Zhang
Jiani Zheng
Jiangnan Xia
Guoqiang Wei
Yang Wei
Yuchen Zhang
Tao Kong
MLLM
9
71
0
05 Jul 2023
Look, Remember and Reason: Grounded reasoning in videos with language
  models
Look, Remember and Reason: Grounded reasoning in videos with language models
Apratim Bhattacharyya
Sunny Panchal
Mingu Lee
Reza Pourreza
Pulkit Madan
Roland Memisevic
LRM
22
7
0
30 Jun 2023
Large Multimodal Models: Notes on CVPR 2023 Tutorial
Large Multimodal Models: Notes on CVPR 2023 Tutorial
Chunyuan Li
MLLM
VLM
6
20
0
26 Jun 2023
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video
  Understanding
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Hang Zhang
Xin Li
Lidong Bing
MLLM
6
944
0
05 Jun 2023
Tag2Text: Guiding Vision-Language Model via Image Tagging
Tag2Text: Guiding Vision-Language Model via Image Tagging
Xinyu Huang
Youcai Zhang
Jinyu Ma
Weiwei Tian
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
Lei Zhang
CLIP
MLLM
VLM
3DV
59
73
0
10 Mar 2023
GLM-130B: An Open Bilingual Pre-trained Model
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
240
1,070
0
05 Oct 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Previous
123