2103.03206
Cited By
Perceiver: General Perception with Iterative Attention
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
Papers citing
"Perceiver: General Perception with Iterative Attention"
50 / 680 papers shown
Title
Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation
Shun Qian
Bingquan Liu
Chengjie Sun
Zhen Xu
Baoxun Wang
26
0
0
14 Oct 2024
Gridded Transformer Neural Processes for Large Unstructured Spatio-Temporal Data
Matthew Ashman
Cristiana-Diana Diaconu
Eric Langezaal
Adrian Weller
Richard E. Turner
AI4TS
36
1
0
09 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai
Li-Wei Li
Yiran Chen
29
5
0
08 Oct 2024
STNet: Deep Audio-Visual Fusion Network for Robust Speaker Tracking
Yidi Li
Hong Liu
Bing Yang
27
4
0
08 Oct 2024
Text2PDE: Latent Diffusion Models for Accessible Physics Simulation
Anthony Y. Zhou
Zijie Li
Michael Schneier
John R Buchanan Jr
Amir Barati Farimani
AI4CE
DiffM
52
5
0
02 Oct 2024
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su
Xiulong Liu
Eli Shlizerman
VGen
21
6
0
27 Sep 2024
Show and Guide: Instructional-Plan Grounded Vision and Language Model
Diogo Glória-Silva
David Semedo
João Magalhães
13
0
0
27 Sep 2024
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
Heqing Zou
Tianze Luo
Guiyang Xie
Victor Zhang
...
Guangcong Wang
Juanyang Chen
Zhuochen Wang
Hansheng Zhang
Huaijian Zhang
VLM
34
6
0
27 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
31
15
0
26 Sep 2024
UNICORN: A Deep Learning Model for Integrating Multi-Stain Data in Histopathology
Valentin Koch
Sabine Bauer
Valerio Luppberger
Michael Joner
Heribert Schunkert
Julia A. Schnabel
Moritz von Scheidt
Carsten Marr
MedIm
ViT
25
0
0
26 Sep 2024
CSPS: A Communication-Efficient Sequence-Parallelism based Serving System for Transformer based Models with Long Prompts
Zeyu Zhang
Haiying Shen
VLM
19
0
0
23 Sep 2024
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning
Yinpei Dai
Jayjun Lee
Nima Fazeli
Joyce Chai
39
10
0
23 Sep 2024
Observe Then Act: Asynchronous Active Vision-Action Model for Robotic Manipulation
Guokang Wang
Hang Li
Shuyuan Zhang
Di Guo
Huaping Liu
45
1
0
23 Sep 2024
Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence
Alessandro Riva
Alessandro Raganato
Simone Melzi
3DPC
18
0
0
20 Sep 2024
Generating Visual Stories with Grounded and Coreferent Characters
Danyang Liu
Mirella Lapata
Frank Keller
15
2
0
20 Sep 2024
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Xiaotian Han
Yiren Jian
Xuefeng Hu
Haogeng Liu
Yiqi Wang
...
Yuang Ai
Huaibo Huang
Ran He
Zhenheng Yang
Quanzeng You
LRM
AI4CE
23
11
0
19 Sep 2024
DETECLAP: Enhancing Audio-Visual Representation Learning with Object Information
Shota Nakada
Taichi Nishimura
Hokuto Munakata
Masayoshi Kondo
Tatsuya Komatsu
CLIP
VLM
18
0
0
18 Sep 2024
FLARE: Fusing Language Models and Collaborative Architectures for Recommender Enhancement
Liam Hebert
Marialena Kyriakidi
Hubert Pham
Krishna Sayana
James Pine
Sukhdeep S. Sodhi
Ambarish Jash
VLM
53
4
0
18 Sep 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Boxin Wang
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
M. Shoeybi
Bryan Catanzaro
Wei Ping
MLLM
VLM
LRM
40
50
0
17 Sep 2024
FSL-LVLM: Friction-Aware Safety Locomotion using Large Vision Language Model in Wheeled Robots
Bo Peng
D. Baek
Qijie Wang
Joao Ramos
19
0
0
15 Sep 2024
SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality
Chenyang Lei
Liyi Chen
Jun Cen
Xiao Chen
Zhen Lei
Felix Heide
Ziwei Liu
Qifeng Chen
Zhaoxiang Zhang
26
0
0
12 Sep 2024
Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding
Bram Willemsen
Gabriel Skantze
18
0
0
09 Sep 2024
Unleashing the Power of Generic Segmentation Models: A Simple Baseline for Infrared Small Target Detection
Mingjin Zhang
Chi Zhang
Qiming Zhang
Yunsong Li
Xinbo Gao
Jing Zhang
VLM
30
3
0
07 Sep 2024
Segmenting Object Affordances: Reproducibility and Sensitivity to Scale
Tommaso Apicella
Alessio Xompero
Paolo Gastaldo
Andrea Cavallaro
32
0
0
03 Sep 2024
ReSpike: Residual Frames-based Hybrid Spiking Neural Networks for Efficient Action Recognition
Shiting Xiao
Yuhang Li
Youngeun Kim
Donghyun Lee
Priyadarshini Panda
21
1
0
03 Sep 2024
Large-Scale Multi-omic Biosequence Transformers for Modeling Protein-Nucleic Acid Interactions
Sully F. Chen
Robert J. Steele
Beakal Lemeneh
S. Lad
Eric K. Oermann
AI4CE
26
0
0
29 Aug 2024
μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context
Fabio Quattrini
Carmine Zaccagnino
Silvia Cascianelli
Laura Righi
Rita Cucchiara
28
1
0
28 Aug 2024
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models
Qihang Ge
Wei Sun
Yu Zhang
Yunhao Li
Zhongpeng Ji
Fengyu Sun
Shangling Jui
Xiongkuo Min
Guangtao Zhai
41
4
0
26 Aug 2024
A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models
Dibaloke Chanda
Milan Aryal
Nasim Yahya Soltani
Masoud Ganji
AI4CE
VLM
34
7
0
23 Aug 2024
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
34
60
0
22 Aug 2024
Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition
Bozheng Li
Mushui Liu
Gaoang Wang
Yunlong Yu
13
5
0
22 Aug 2024
Variable Assignment Invariant Neural Networks for Learning Logic Programs
Yin Jun Phua
Katsumi Inoue
17
0
0
20 Aug 2024
End-to-end Semantic-centric Video-based Multimodal Affective Computing
Ronghao Lin
Ying Zeng
Sijie Mai
Haifeng Hu
VGen
33
0
0
14 Aug 2024
Implicit Neural Representation For Accurate CFD Flow Field Prediction
L. D. Vito
Nils Pinnau
Simone Dey
AI4CE
35
1
0
12 Aug 2024
PERSOMA: PERsonalized SOft ProMpt Adapter Architecture for Personalized Language Prompting
Liam Hebert
Krishna Sayana
Ambarish Jash
Alexandros Karatzoglou
Geordie Williamson
Sumanth Doddapaneni
Yanli Cai
Dima Kuzmin
25
3
0
02 Aug 2024
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Gagan Jain
Nidhi Hegde
Aditya Kusupati
Arsha Nagrani
Shyamal Buch
Prateek Jain
Anurag Arnab
Sujoy Paul
MoE
33
7
0
29 Jul 2024
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Zuyan Liu
Benlin Liu
Jiahui Wang
Yuhao Dong
Guangyi Chen
Yongming Rao
Ranjay Krishna
Jiwen Lu
VLM
32
8
0
25 Jul 2024
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Ziyuan Huang
Kaixiang Ji
Biao Gong
Zhiwu Qing
Qinglong Zhang
Kecheng Zheng
Jian Wang
Jingdong Chen
Ming Yang
LRM
29
1
0
22 Jul 2024
Reinforcement Learning Meets Visual Odometry
Nico Messikommer
Giovanni Cioffi
Mathias Gehrig
Davide Scaramuzza
34
2
0
22 Jul 2024
VideoGameBunny: Towards vision assistants for video games
Mohammad Reza Taesiri
C. Bezemer
VLM
MLLM
33
2
0
21 Jul 2024
Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation
Yu Zhang
Ruijie Yu
Kaipeng Zeng
Ding Li
Feng Zhu
Xiaokang Yang
Yaohui Jin
Yanyan Xu
25
2
0
21 Jul 2024
Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning
Chen Shen
Chunfeng Lian
Wanqing Zhang
Fan Wang
Jianhua Zhang
...
Hongshu Mu
Hao Wu
Xinggong Liang
Jianhua Ma
Zhenyuan Wang
26
0
0
20 Jul 2024
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
S. Swetha
Jinyu Yang
T. Neiman
Mamshad Nayeem Rizve
Son Tran
Benjamin Z. Yao
Trishul M. Chilimbi
Mubarak Shah
47
2
0
18 Jul 2024
Audio-visual Generalized Zero-shot Learning the Easy Way
Shentong Mo
Pedro Morgado
25
5
0
18 Jul 2024
MetaSumPerceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking
Ting-Chih Chen
Chia-Wei Tang
Chris Thomas
29
3
0
18 Jul 2024
IoT-LM: Large Multisensory Language Models for the Internet of Things
Shentong Mo
Russ Salakhutdinov
Louis-Philippe Morency
Paul Pu Liang
MLLM
16
6
0
13 Jul 2024
Paving the way toward foundation models for irregular and unaligned Satellite Image Time Series
Iris Dumeur
Silvia Valero
Jordi Inglada
24
3
0
11 Jul 2024
YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation
Sungkyun Chang
Emmanouil Benetos
Holger Kirchhoff
Simon Dixon
24
2
0
05 Jul 2024
LaRa: Efficient Large-Baseline Radiance Fields
Anpei Chen
Haofei Xu
Stefano Esposito
Siyu Tang
Andreas Geiger
AI4CE
29
22
0
05 Jul 2024
ADAPT: Multimodal Learning for Detecting Physiological Changes under Missing Modalities
Julie Mordacq
Léo Milecki
Maria Vakalopoulou
Steve Oudot
Vicky Kalogeiton
OffRL
MedIm
28
3
0
04 Jul 2024