1904.01766
v1
v2 (latest)
VideoBERT: A Joint Model for Video and Language Representation Learning
3 April 2019
Chen Sun
Austin Myers
Carl Vondrick
Kevin Patrick Murphy
Cordelia Schmid
VLM
SSL
Papers citing
"VideoBERT: A Joint Model for Video and Language Representation Learning"
50 / 803 papers shown
Can LLMs Understand Time Series Anomalies?
International Conference on Learning Representations (ICLR), 2024
Zihao Zhou
Rose Yu
AI4TS
390
31
0
13 Mar 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Shehreen Azad
Vibhav Vineet
Yogesh S Rawat
VLM
1.1K
11
0
11 Mar 2025
A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery
Yiheng Zhu
Mingyang Li
Junlong Liu
Kun Fu
Jian Wu
Yue Liu
Mingze Yin
Jieping Ye
Jian Wu
Xiping Hu
334
0
0
06 Mar 2025
Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation
Jie Xu
Na Zhao
Gang Niu
Masashi Sugiyama
Xiaofeng Zhu
536
3
0
06 Mar 2025
RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems
Proceedings of the VLDB Endowment (PVLDB), 2024
Biao Ouyang
Yingying Zhang
Hanyin Cheng
Yang Shu
Chenjuan Guo
Bin Yang
Qingsong Wen
L. Fan
Christian S. Jensen
200
6
0
06 Mar 2025
EgoLife: Towards Egocentric Life Assistant
Computer Vision and Pattern Recognition (CVPR), 2025
Jingkang Yang
Shuai Liu
Hongming Guo
Yuhao Dong
Xinyu Zhang
...
Joerg Widmer
Francesco Gringoli
Lei Yang
Bo Li
Ziwei Liu
EgoV
254
12
0
05 Mar 2025
Vision Language Models in Medicine
Beria Chingnabe Kalpelbe
Angel Gabriel Adaambiik
Wei Peng
VLM
LM&MA
384
5
0
24 Feb 2025
Understanding the Emergence of Multimodal Representation Alignment
Megan Tjandrasuwita
Chanakya Ekbote
Liu Ziyin
Paul Pu Liang
329
14
0
22 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
326
0
0
11 Feb 2025
A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions
Elisa Negrini
Yuxuan Liu
Liu Yang
Stanley Osher
Hayden Schaeffer
AI4CE
324
3
0
09 Feb 2025
BRIDLE: Generalized Self-supervised Learning with Quantization
Hoang M. Nguyen
Satya Narayan Shukla
Qiang Zhang
Hanchao Yu
Sreya D. Roy
Taipeng Tian
Lingjiong Zhu
Yuchen Liu
SSL
MQ
328
0
0
04 Feb 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
Anupam Pandey
Deepjyoti Bodo
Arpan Phukan
Asif Ekbal
428
2
0
13 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Jiayi Zhang
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
461
33
0
06 Jan 2025
Do Language Models Understand Time?
The Web Conference (WWW), 2024
Xi Ding
Lei Wang
919
10
0
18 Dec 2024
BioBridge: Unified Bio-Embedding with Bridging Modality in Code-Switched EMR
IEEE Access (IEEE Access), 2024
Jangyeong Jeon
Sangyeon Cho
Dongjoon Lee
Changhee Lee
Junyeong Kim
221
0
0
16 Dec 2024
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai
Nik Bear Brown
AI4CE
373
5
0
13 Dec 2024
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Joey Tianyi Zhou
Gedas Bertasius
David J. Crandall
484
6
0
12 Dec 2024
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Yanjie Wang
Zhikang Zhang
Jue Wang
D. Fan
Zhenlin Xu
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
VLM
290
1
0
10 Dec 2024
TechCoach: Towards Technical-Point-Aware Descriptive Action Coaching
Yuan-Ming Li
An-Lan Wang
Kun-Yu Lin
Yu-Ming Tang
Ling-an Zeng
Jian-Fang Hu
Wei-Shi Zheng
542
6
0
26 Nov 2024
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu
Kun Yuan
Yaling Shen
Feilong Tang
Xiaohao Xu
...
Jin Ye
N. Padoy
Nassir Navab
Junjun He
Zongyuan Ge
VLM
CLIP
434
23
0
23 Nov 2024
Multi-Modal interpretable automatic video captioning
Antoine Hanna-Asaad
Decky Aspandi
Titus Zaharia
255
1
0
11 Nov 2024
Video Token Merging for Long-form Video Understanding
Seon-Ho Lee
Jue Wang
Zhikang Zhang
D. Fan
Xinyu Li
290
15
0
31 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
Applied Soft Computing (Appl. Soft Comput.), 2024
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
231
3
0
24 Oct 2024
Masked Differential Privacy
David Schneider
Sina Sajadmanesh
Vikash Sehwag
Saquib Sarfraz
Rainer Stiefelhagen
Lingjuan Lyu
Vivek Sharma
227
0
0
22 Oct 2024
Reducing Hallucinations in Vision-Language Models via Latent Space Steering
Sheng Liu
Haotian Ye
Lei Xing
James Zou
VLM
LLMSV
370
36
0
21 Oct 2024
Multimodal Learning for Embryo Viability Prediction in Clinical IVF
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024
Junsik Kim
Zhiyi Shi
Davin Jeong
Johannes Knittel
H. Yang
...
Wanhua Li
Yicong Li
D. Ben-Yosef
D. Needleman
Hanspeter Pfister
231
3
0
21 Oct 2024
SEA: State-Exchange Attention for High-Fidelity Physics Based Transformers
Neural Information Processing Systems (NeurIPS), 2024
Parsa Esmati
Amirhossein Dadashzadeh
Vahid Goodarzi
Nicolas Larrosa
Nicolo Grilli
320
0
0
20 Oct 2024
A Theoretical Survey on Foundation Models
Shi Fu
Yuzhu Chen
Yingjie Wang
Dacheng Tao
287
0
0
15 Oct 2024
Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering
Ting Yu
Kunhao Fu
Shuhui Wang
Qingming Huang
Jun Yu
287
6
0
12 Oct 2024
Multi-granularity Contrastive Cross-modal Collaborative Generation for End-to-End Long-term Video Question Answering
IEEE Transactions on Image Processing (TIP), 2024
Ting Yu
Kunhao Fu
Jian Zhang
Qingming Huang
Jun Yu
218
6
0
12 Oct 2024
nach0-pc: Multi-task Language Model with Molecular Point Cloud Encoder
AAAI Conference on Artificial Intelligence (AAAI), 2024
Maksim Kuznetsov
Airat Valiev
Alex Aliper
Daniil Polykovskiy
E. Tutubalina
Rim Shayakhmetov
Z. Miftahutdinov
238
3
0
11 Oct 2024
Exploring Efficient Foundational Multi-modal Models for Video Summarization
Karan Samel
Apoorva Beedu
Nitish Sontakke
Irfan Essa
132
2
0
09 Oct 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Neural Information Processing Systems (NeurIPS), 2024
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
399
23
0
30 Sep 2024
MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
Neural Information Processing Systems (NeurIPS), 2024
Yun Xu
Huabin Liu
Tianyao He
Yihang Chen
Chaofan Gan
...
Cheng Zhong
Yang Zhang
Yingxue Wang
Hui Lin
Weiyao Lin
VGen
CML
408
20
0
26 Sep 2024
A Diagonal Structured State Space Model on Loihi 2 for Efficient Streaming Sequence Processing
Neuro Inspired Computational Elements Workshop (NICE), 2024
Svea Marie Meyer
Philipp Weidel
Philipp Plank
L. Campos-Macias
Sumit Bam Shrestha
Philipp Stratmann
M. R
219
14
0
23 Sep 2024
Mamba-ST: State Space Model for Efficient Style Transfer
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Filippo Botti
Alex Ergasti
Leonardo Rossi
Tomaso Fontanini
Claudio Ferrari
Massimo Bertozzi
Andrea Prati
Mamba
218
11
0
16 Sep 2024
Resolving Inconsistent Semantics in Multi-Dataset Image Segmentation
Qilong Zhangli
Di Liu
Abhishek Aich
Dimitris Metaxas
S. Schulter
203
1
0
15 Sep 2024
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics
Yuxuan Liu
Jingmin Sun
Xinjie He
Griffin Pinney
Zecheng Zhang
Hayden Schaeffer
AI4CE
244
20
0
15 Sep 2024
What to align in multimodal contrastive learning?
International Conference on Learning Representations (ICLR), 2024
Benoit Dufumier
J. Castillo-Navarro
D. Tuia
Jean-Philippe Thiran
333
28
0
11 Sep 2024
T3M: Text Guided 3D Human Motion Synthesis from Speech
Wenshuo Peng
Kaipeng Zhang
Sai Qian Zhang
164
4
0
23 Aug 2024
VideoQA in the Era of LLMs: An Empirical Study
International Journal of Computer Vision (IJCV), 2024
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
344
24
0
08 Aug 2024
AdapMTL: Adaptive Pruning Framework for Multitask Learning Model
ACM Multimedia (MM), 2024
Mingcan Xiang
Steven Jiaxun Tang
Qizheng Yang
Hui Guan
Tongping Liu
VLM
226
3
0
07 Aug 2024
Dual-path Collaborative Generation Network for Emotional Video Captioning
ACM Multimedia (MM), 2024
Cheng Ye
Weidong Chen
Jingyu Li
Li Zhang
Zhendong Mao
255
8
0
06 Aug 2024
Towards Coarse-grained Visual Language Navigation Task Planning Enhanced by Event Knowledge Graph
International Conference on Information and Knowledge Management (CIKM), 2024
Zhao Kaichen
Song Yaoxian
Zhao Haiquan
Liu Haoyu
Li Tiefeng
Li Zhixu
218
1
0
05 Aug 2024
FlexAttention for Efficient High-Resolution Vision-Language Models
European Conference on Computer Vision (ECCV), 2024
Junyan Li
Delin Chen
Tianle Cai
Peihao Chen
Yining Hong
Zhenfang Chen
Yikang Shen
Chuang Gan
VLM
259
7
0
29 Jul 2024
MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili
ACM Multimedia (MM), 2024
Han Wang
Tan Rui Yang
Usman Naseem
Roy Ka-wei Lee
261
23
0
28 Jul 2024
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Tz-Ying Wu
Kyle Min
Subarna Tripathi
Nuno Vasconcelos
EgoV
492
0
0
28 Jul 2024
LoFormer: Local Frequency Transformer for Image Deblurring
Xintian Mao
Jiansheng Wang
Xingran Xie
Qingli Li
Yan Wang
194
36
0
24 Jul 2024
Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval
Xiaowan Hu
Yiyi Chen
Yan Li
Minquan Wang
Haoqian Wang
Quan Chen
Han Li
Peng Jiang
AI4TS
290
0
0
23 Jul 2024
Causal Understanding For Video Question Answering
Bhanu Prakash Reddy Guda
Tanmay Kulkarni
Adithya Sampath
Swarnashree Mysore Sathyendra
CML
275
0
0
23 Jul 2024