Mamba Fusion: Learning Actions Through Questioning
17 September 2024
Zhikang Dong
Apoorva Beedu
Jason Sheinkopf
Irfan Essa
    Mamba

Papers citing "Mamba Fusion: Learning Actions Through Questioning"

31 / 31 papers shown
ChromouVQA: Benchmarking Vision-Language Models under Chromatic Camouflaged Images
Yunfei Zhang
Yizhuo He
Yuanxun Shao
Zhengtao Yao
Haoyan Xu
Junhao Dong
Zhen Yao
Zhikang Dong
CoGe
193
0
0
30 Nov 2025
Empathetic Cascading Networks: A Multi-Stage Prompting Technique for Reducing Social Biases in Large Language Models
Wangjiaxuan Xin
LLMAG
298
0
0
24 Nov 2025
Text-conditioned State Space Model For Domain-generalized Change Detection Visual Question Answering
Elman Ghazaei
Erchan Aptoula
271
0
0
12 Aug 2025
HierSum: A Global and Local Attention Mechanism for Video Summarization
Apoorva Beedu
Irfan Essa
918
1
0
25 Apr 2025
Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition -- And Ways to Overcome Them
AAAI Conference on Artificial Intelligence (AAAI), 2024
H. Haresamudram
Apoorva Beedu
Mashfiqui Rabbi
Sankalita Saha
Irfan Essa
Thomas Ploetz
317
10
0
21 Aug 2024
Fusion-Mamba for Cross-modality Object Detection
Wenhao Dong
Haodong Zhu
Shaohui Lin
Xiaoyan Luo
Chunjiang Ge
Xuhui Liu
Juan Zhang
Guodong Guo
Baochang Zhang
Mamba
373
90
0
14 Apr 2024
MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion
Zhe Li
Haiwei Pan
Kejia Zhang
Yuhua Wang
Feng Yu
Mamba
222
65
0
12 Apr 2024
VideoMamba: State Space Model for Efficient Video Understanding
European Conference on Computer Vision (ECCV), 2024
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
349
459
0
11 Mar 2024
On the Efficacy of Text-Based Input Modalities for Action Anticipation
Apoorva Beedu
Karan Samel
Irfan Essa
457
4
0
23 Jan 2024
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
International Conference on Machine Learning (ICML), 2024
Lianghui Zhu
Bencheng Liao
Qian Zhang
Xinlong Wang
Wenyu Liu
Xinggang Wang
Mamba
547
1,627
0
17 Jan 2024
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu
Tri Dao
Mamba
809
6,333
0
01 Dec 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
309
24
0
28 Sep 2023
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
IEEE International Conference on Computer Vision (ICCV), 2023
Shraman Pramanick
Yale Song
Sayan Nag
Kevin Qinghong Lin
Hardik Shah
Mike Zheng Shou
Ramalingam Chellappa
Pengchuan Zhang
VLM
427
149
0
11 Jul 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
International Conference on Machine Learning (ICML), 2023
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
445
366
0
01 Jun 2023
Learning Video Representations from Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2022
Yue Zhao
Ishan Misra
Philipp Krahenbuhl
Rohit Girdhar
VLM
AI4TS
442
245
0
08 Dec 2022
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Zineng Tang
Jaemin Cho
Jie Lei
Joey Tianyi Zhou
VLM
227
10
0
21 Nov 2022
End-to-End Multimodal Representation Learning for Video Dialog
Huda AlAmri
Anthony Bilic
Michael Hu
Apoorva Beedu
Irfan Essa
251
7
0
26 Oct 2022
Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Zeyun Zhong
David Schneider
Michael Voit
Rainer Stiefelhagen
Jürgen Beyerer
240
66
0
23 Oct 2022
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval
ACM Multimedia (ACM MM), 2022
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Ming Yan
Ji Zhang
Rongrong Ji
CLIP
VLM
309
432
0
15 Jul 2022
Long Movie Clip Classification with State-Space Video Models
European Conference on Computer Vision (ECCV), 2022
Md. Mohaiminul Islam
Gedas Bertasius
VLM
478
145
0
04 Apr 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Computer Vision and Pattern Recognition (CVPR), 2022
Chao-Yuan Wu
Yanghao Li
K. Mangalam
Haoqi Fan
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
522
261
0
20 Jan 2022
Omnivore: A Single Model for Many Visual Modalities
Computer Vision and Pattern Recognition (CVPR), 2022
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
704
299
0
20 Jan 2022
Efficiently Modeling Long Sequences with Structured State Spaces
International Conference on Learning Representations (ICLR), 2021
Albert Gu
Karan Goel
Christopher Ré
1.2K
3,295
0
31 Oct 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP
VLM
407
336
0
21 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
1.2K
1,646
0
13 Oct 2021
Object-Region Video Transformers
Roei Herzig
Elad Ben-Avraham
K. Mangalam
Amir Bar
Gal Chechik
Anna Rohrbach
Trevor Darrell
Amir Globerson
ViT
414
100
0
13 Oct 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Neural Information Processing Systems (NeurIPS), 2021
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
379
349
0
09 Jun 2021
Learning Transferable Visual Models From Natural Language Supervision
International Conference on Machine Learning (ICML), 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
2.2K
46,392
0
26 Feb 2021
Rescaling Egocentric Vision
International Journal of Computer Vision (IJCV), 2020
Dima Damen
Hazel Doughty
G. Farinella
Antonino Furnari
Evangelos Kazakos
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
638
629
0
23 Jun 2020
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
6.0K
28,988
0
26 Jul 2019
Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
8.3K
171,167
0
12 Jun 2017