2409.11513
Mamba Fusion: Learning Actions Through Questioning
17 September 2024
Zhikang Dong
Apoorva Beedu
Jason Sheinkopf
Irfan Essa
Mamba
Github (5★)
Papers citing
"Mamba Fusion: Learning Actions Through Questioning"
ChromouVQA: Benchmarking Vision-Language Models under Chromatic Camouflaged Images
Yunfei Zhang
Yizhuo He
Yuanxun Shao
Zhengtao Yao
Haoyan Xu
Junhao Dong
Zhen Yao
Zhikang Dong
CoGe
193
0
0
30 Nov 2025
Empathetic Cascading Networks: A Multi-Stage Prompting Technique for Reducing Social Biases in Large Language Models
Wangjiaxuan Xin
LLMAG
298
0
0
24 Nov 2025
Text-conditioned State Space Model For Domain-generalized Change Detection Visual Question Answering
Elman Ghazaei
Erchan Aptoula
271
0
0
12 Aug 2025
HierSum: A Global and Local Attention Mechanism for Video Summarization
Apoorva Beedu
Irfan Essa
918
1
0
25 Apr 2025
Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition -- And Ways to Overcome Them
AAAI Conference on Artificial Intelligence (AAAI), 2024
H. Haresamudram
Apoorva Beedu
Mashfiqui Rabbi
Sankalita Saha
Irfan Essa
Thomas Ploetz
317
10
0
21 Aug 2024
Fusion-Mamba for Cross-modality Object Detection
Wenhao Dong
Haodong Zhu
Shaohui Lin
Xiaoyan Luo
Chunjiang Ge
Xuhui Liu
Juan Zhang
Guodong Guo
Baochang Zhang
Mamba
373
90
0
14 Apr 2024
MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion
Zhe Li
Haiwei Pan
Kejia Zhang
Yuhua Wang
Feng Yu
Mamba
222
65
0
12 Apr 2024
VideoMamba: State Space Model for Efficient Video Understanding
European Conference on Computer Vision (ECCV), 2024
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
349
459
0
11 Mar 2024
On the Efficacy of Text-Based Input Modalities for Action Anticipation
Apoorva Beedu
Karan Samel
Irfan Essa
457
4
0
23 Jan 2024
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
International Conference on Machine Learning (ICML), 2024
Lianghui Zhu
Bencheng Liao
Qian Zhang
Xinlong Wang
Wenyu Liu
Xinggang Wang
Mamba
547
1,627
0
17 Jan 2024
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu
Tri Dao
Mamba
809
6,333
0
01 Dec 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
309
24
0
28 Sep 2023
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
IEEE International Conference on Computer Vision (ICCV), 2023
Shraman Pramanick
Yale Song
Sayan Nag
Kevin Qinghong Lin
Hardik Shah
Mike Zheng Shou
Ramalingam Chellappa
Pengchuan Zhang
VLM
427
149
0
11 Jul 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
International Conference on Machine Learning (ICML), 2023
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
445
366
0
01 Jun 2023
Learning Video Representations from Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2022
Yue Zhao
Ishan Misra
Philipp Krahenbuhl
Rohit Girdhar
VLM
AI4TS
442
245
0
08 Dec 2022
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Zineng Tang
Jaemin Cho
Jie Lei
Joey Tianyi Zhou
VLM
227
10
0
21 Nov 2022
End-to-End Multimodal Representation Learning for Video Dialog
Huda AlAmri
Anthony Bilic
Michael Hu
Apoorva Beedu
Irfan Essa
251
7
0
26 Oct 2022
Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Zeyun Zhong
David Schneider
Michael Voit
Rainer Stiefelhagen
Jürgen Beyerer
240
66
0
23 Oct 2022
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval
ACM Multimedia (ACM MM), 2022
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Ming Yan
Ji Zhang
Rongrong Ji
CLIP
VLM
309
432
0
15 Jul 2022
Long Movie Clip Classification with State-Space Video Models
European Conference on Computer Vision (ECCV), 2022
Md. Mohaiminul Islam
Gedas Bertasius
VLM
478
145
0
04 Apr 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Computer Vision and Pattern Recognition (CVPR), 2022
Chao-Yuan Wu
Yanghao Li
K. Mangalam
Haoqi Fan
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
522
261
0
20 Jan 2022
Omnivore: A Single Model for Many Visual Modalities
Computer Vision and Pattern Recognition (CVPR), 2022
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
704
299
0
20 Jan 2022
Efficiently Modeling Long Sequences with Structured State Spaces
International Conference on Learning Representations (ICLR), 2021
Albert Gu
Karan Goel
Christopher Ré
1.2K
3,295
0
31 Oct 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP
VLM
407
336
0
21 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
1.2K
1,646
0
13 Oct 2021
Object-Region Video Transformers
Roei Herzig
Elad Ben-Avraham
K. Mangalam
Amir Bar
Gal Chechik
Anna Rohrbach
Trevor Darrell
Amir Globerson
ViT
414
100
0
13 Oct 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Neural Information Processing Systems (NeurIPS), 2021
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
379
349
0
09 Jun 2021
Learning Transferable Visual Models From Natural Language Supervision
International Conference on Machine Learning (ICML), 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
2.2K
46,392
0
26 Feb 2021
Rescaling Egocentric Vision
International Journal of Computer Vision (IJCV), 2020
Dima Damen
Hazel Doughty
G. Farinella
Antonino Furnari
Evangelos Kazakos
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
638
629
0
23 Jun 2020
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
6.0K
28,988
0
26 Jul 2019
Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
8.3K
171,167
0
12 Jun 2017