ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Perceiver: General Perception with Iterative Attention


4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
    VLM
    ViT
    MDE

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 680 papers shown
Topology-Aware Latent Diffusion for 3D Shape Generation
Jiangbei Hu
Ben Fei
Baixin Xu
Fei Hou
Weidong Yang
Shengfa Wang
Na Lei
Chen Qian
Ying He
32
7
0
31 Jan 2024
Triple Disentangled Representation Learning for Multimodal Affective Analysis
Ying Zhou
Xuefeng Liang
Han Chen
Yin Zhao
Xin Chen
Lida Yu
43
3
0
29 Jan 2024
On the generalization capacity of neural networks during generic multimodal reasoning
Takuya Ito
Soham Dan
Mattia Rigotti
James Kozloski
Murray Campbell
LRM
30
2
0
26 Jan 2024
Jump Cut Smoothing for Talking Heads
Xiaojuan Wang
Taesung Park
Yang Zhou
Eli Shechtman
Richard Zhang
VGen
12
1
0
09 Jan 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
25
6
0
08 Jan 2024
Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
17
5
0
08 Jan 2024
Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
19
4
0
08 Jan 2024
PIXAR: Auto-Regressive Language Modeling in Pixel Space
Yintao Tai
Xiyang Liao
Alessandro Suglia
Antonio Vergari
MLLM
19
7
0
06 Jan 2024
CaMML: Context-Aware Multimodal Learner for Large Models
Yixin Chen
Shuai Zhang
Boran Han
Tong He
Bo Li
VLM
16
4
0
06 Jan 2024
Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues
David Gimeno-Gómez
Ana-Maria Bucur
Adrian Cosma
Carlos David Martínez Hinarejos
Paolo Rosso
30
11
0
05 Jan 2024
AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis
Qiuhui Chen
Yi Hong
MedIm
15
1
0
02 Jan 2024
Saliency-Aware Regularized Graph Neural Network
Wenjie Pei
Weina Xu
Zongze Wu
Weichao Li
Jinfan Wang
Guangming Lu
Xiangrong Wang
17
4
0
01 Jan 2024
SVFAP: Self-supervised Video Facial Affect Perceiver
Licai Sun
Zheng Lian
Kexin Wang
Yu He
Ming Xu
Haiyang Sun
Bin Liu
Jianhua Tao
38
14
0
31 Dec 2023
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu
Limeng Qiao
Xinyang Lin
Shuang Xu
Yang Yang
...
Fei Wei
Xinyu Zhang
Bo-Wen Zhang
Xiaolin Wei
Chunhua Shen
MLLM
26
32
0
28 Dec 2023
Deformable Audio Transformer for Audio Event Detection
Wentao Zhu
20
0
0
24 Dec 2023
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation
Hongtao Wu
Ya Jing
Chi-Hou Cheang
Guangzeng Chen
Jiafeng Xu
Xinghang Li
Minghuan Liu
Hang Li
Tao Kong
21
92
0
20 Dec 2023
Inducing Point Operator Transformer: A Flexible and Scalable Architecture for Solving PDEs
Seungjun Lee
Taeil Oh
13
5
0
18 Dec 2023
Reconstruction of Fields from Sparse Sensing: Differentiable Sensor Placement Enhances Generalization
Agnese Marcato
Dan O’Malley
Hari S. Viswanathan
E. Guiltinan
Javier E. Santos
16
1
0
14 Dec 2023
Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers
Zi-Xin Zou
Zhipeng Yu
Yuanchen Guo
Yangguang Li
Ding Liang
Yan-Pei Cao
Song-Hai Zhang
3DGS
26
168
0
14 Dec 2023
A Foundational Multimodal Vision Language AI Assistant for Human Pathology
Ming Y. Lu
Bowen Chen
Drew F. K. Williamson
Richard J. Chen
Kenji Ikamura
...
Ivy Liang
L. Le
Tong Ding
Anil V. Parwani
Faisal Mahmood
MedIm
LM&MA
20
19
0
13 Dec 2023
NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image
Yoonwoo Jeong
Jinwoo Lee
Chiheon Kim
Minsu Cho
Doyup Lee
19
3
0
12 Dec 2023
DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
Federico Landini
Mireia Díez
Themos Stafylakis
Lukáš Burget
25
11
0
07 Dec 2023
UPOCR: Towards Unified Pixel-Level OCR Interface
Dezhi Peng
Zhenhua Yang
Jiaxin Zhang
Chongyu Liu
Yongxin Shi
Kai Ding
Fengjun Guo
Lianwen Jin
21
10
0
05 Dec 2023
Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models
Andrés Villa
Juan Carlos León Alcázar
Alvaro Soto
Bernard Ghanem
MLLM
VLM
15
9
0
03 Dec 2023
Learning to Compose SuperWeights for Neural Parameter Allocation Search
Piotr Teterwak
Soren Nelson
Nikoli Dryden
D. Bashkirova
Kate Saenko
Bryan A. Plummer
10
1
0
03 Dec 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Shafiq R. Joty
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
28
45
0
30 Nov 2023
GeoDeformer: Geometric Deformable Transformer for Action Recognition
Jinhui Ye
Jiaming Zhou
Hui Xiong
Junwei Liang
ViT
13
1
0
29 Nov 2023
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner
Lizhao Liu
Xinyu Sun
Tianhang Xiang
Zhuangwei Zhuang
Liuren Yin
Mingkui Tan
VLM
9
2
0
29 Nov 2023
ViT-Lens: Towards Omni-modal Representations
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
21
18
0
27 Nov 2023
Unlearning via Sparse Representations
Vedant Shah
Frederik Träuble
Ashish Malik
Hugo Larochelle
Michael C. Mozer
Sanjeev Arora
Yoshua Bengio
Anirudh Goyal
MU
11
9
0
26 Nov 2023
Looped Transformers are Better at Learning Learning Algorithms
Liu Yang
Kangwook Lee
Robert D. Nowak
Dimitris Papailiopoulos
11
24
0
21 Nov 2023
Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Chenglu Zhu
Jiatong Cai
Sunyi Zheng
Lin Yang
VLM
22
4
0
21 Nov 2023
InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Xiaotian Han
Quanzeng You
Yongfei Liu
Wentao Chen
Huangjie Zheng
...
Yiqi Wang
Bohan Zhai
Jianbo Yuan
Heng Wang
Hongxia Yang
ReLM
LRM
ELM
39
9
0
20 Nov 2023
Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
Yating Xu
Conghui Hu
Gim Hee Lee
9
1
0
14 Nov 2023
Towards A Unified Neural Architecture for Visual Recognition and Reasoning
Calvin Luo
Boqing Gong
Ting Chen
Chen Sun
OCL
ObjD
19
1
0
10 Nov 2023
Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems
Huan Gui
Ruoxi Wang
Ke Yin
Long Jin
Maciej Kula
Taibai Xu
Lichan Hong
Ed H. Chi
38
2
0
10 Nov 2023
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
A. Piergiovanni
Isaac Noble
Dahun Kim
Michael S. Ryoo
Victor Gomes
A. Angelova
30
19
0
09 Nov 2023
On the Behavior of Audio-Visual Fusion Architectures in Identity Verification Tasks
Daniel Claborne
Eric Slyman
Karl Pazdernik
6
0
0
09 Nov 2023
A Hierarchical Spatial Transformer for Massive Point Samples in Continuous Space
Wenchong He
Zhe Jiang
Tingsong Xiao
Zelin Xu
Shigang Chen
Ronald Fick
Miles Medina
Christine Angelini
8
10
0
08 Nov 2023
LRM: Large Reconstruction Model for Single Image to 3D
Yicong Hong
Kai Zhang
Jiuxiang Gu
Sai Bi
Yang Zhou
Difan Liu
Feng Liu
Kalyan Sunkavalli
Trung Bui
Hao Tan
3DV
3DH
28
411
0
08 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
16
64
0
07 Nov 2023
Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review
Mingze Yuan
Peng Bao
Jiajia Yuan
Yunhao Shen
Zi Chen
...
Jie Zhao
Yang Chen
Li Zhang
Lin Shen
Bin Dong
ELM
LM&MA
41
13
0
03 Nov 2023
OpenForest: A data catalogue for machine learning in forest monitoring
Arthur Ouaknine
T. Kattenborn
Etienne Laliberté
David Rolnick
36
5
0
01 Nov 2023
Adaptive Latent Diffusion Model for 3D Medical Image to Image Translation: Multi-modal Magnetic Resonance Imaging Study
Jonghun Kim
Hyunjin Park
MedIm
11
30
0
01 Nov 2023
Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data
Antonis Antoniades
Yiyi Yu
Joseph Canzano
William Wang
Spencer L. Smith
AI4CE
40
9
0
31 Oct 2023
Circuit as Set of Points
Jialv Zou
Xinggang Wang
Jiahao Guo
Wenyu Liu
Qian Zhang
Chang Huang
GNN
3DV
3DPC
15
0
0
26 Oct 2023
A Unified, Scalable Framework for Neural Population Decoding
Mehdi Azabou
Vinam Arora
Venkataramana Ganesh
Ximeng Mao
Santosh Nachimuthu
Michael J. Mendelson
Blake A. Richards
M. Perich
Guillaume Lajoie
Eva L. Dyer
HAI
AI4TS
16
35
0
24 Oct 2023
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Tong Bill Xu
Hao Wang
Dianbo Sui
Yunhang Shen
Ke Li
Xingguo Sun
Enhong Chen
VLM
MLLM
30
112
0
24 Oct 2023
Accented Speech Recognition With Accent-specific Codebooks
Darshan Prabhu
P. Jyothi
Sriram Ganapathy
Vinit Unni
29
7
0
24 Oct 2023
Frozen Transformers in Language Models Are Effective Visual Encoder Layers
Ziqi Pang
Ziyang Xie
Yunze Man
Yu-xiong Wang
38
25
0
19 Oct 2023