AssembleNet++: Assembling Modality Representations via Attention Connections

18 August 2020
Michael S. Ryoo
A. Piergiovanni
Juhana Kangaspunta
A. Angelova

Papers citing "AssembleNet++: Assembling Modality Representations via Attention Connections"

31 / 31 papers shown
Vision Language Models for Dynamic Human Activity Recognition in Healthcare Settings
Abderrazek Abid
Thanh-Cong Ho
Fakhri Karray
VLM
144
2
0
24 Oct 2025
Language-driven Description Generation and Common Sense Reasoning for Video Action Recognition
Xiaodan Hu
Chuhang Zou
Suchen Wang
Jaechul Kim
Narendra Ahuja
LRM
190
0
0
20 Jun 2025
Salient Temporal Encoding for Dynamic Scene Graph Generation
Zhihao Zhu
278
0
0
15 Mar 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
355
0
0
11 Feb 2025
Fusion Matters: Learning Fusion in Deep Click-through Rate Prediction Models
Web Search and Data Mining (WSDM), 2024
Kexin Zhang
Fuyuan Lyu
Xing Tang
Dugang Liu
Chen Ma
Kaize Ding
Xiuqiang He
Xue Liu
284
7
0
24 Nov 2024
AM Flow: Adapters for Temporal Processing in Action Recognition
Tanay Agrawal
Abid Ali
A. Dantcheva
François Brémond
277
0
0
04 Nov 2024
Just Add $π$! Pose Induced Video Transformers for Understanding Activities of Daily Living
Computer Vision and Pattern Recognition (CVPR), 2023
Dominick Reilly
Srijan Das
ViT
325
31
0
30 Nov 2023
Flow Dynamics Correction for Action Recognition
Lei Wang
Piotr Koniusz
276
15
0
16 Oct 2023
LAC: Latent Action Composition for Skeleton-based Action Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Di Yang
Yaohui Wang
A. Dantcheva
Quan Kong
Lorenzo Garattoni
Gianpiero Francesca
Francois Bremond
693
18
0
28 Aug 2023
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
Dominick Reilly
Vasu Sharma
Srijan Das
ViT
294
4
0
15 Jun 2023
Self-Supervised Video Representation Learning via Latent Time Navigation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Di Yang
Yaohui Wang
Quan Kong
A. Dantcheva
Lorenzo Garattoni
Gianpiero Francesca
Francois Bremond
SSL
AI4TS
288
17
0
10 May 2023
VicTR: Video-conditioned Text Representations for Activity Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Kumara Kahatapitiya
Anurag Arnab
Arsha Nagrani
Michael S. Ryoo
367
39
0
05 Apr 2023
Transformers in Action Recognition: A Review on Temporal Modeling
Elham Shabaninia
Hossein Nezamabadi-pour
Fatemeh Shafizadegan
ViT
222
14
0
29 Dec 2022
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
Computer Vision and Pattern Recognition (CVPR), 2022
A. Piergiovanni
Weicheng Kuo
A. Angelova
ViT
255
71
0
06 Dec 2022
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
British Machine Vision Conference (BMVC), 2022
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
369
2
0
08 Oct 2022
ViA: View-invariant Skeleton Action Representation Learning via Motion Retargeting
Di Yang
Yaohui Wang
A. Dantcheva
Lorenzo Garattoni
Gianpiero Francesca
Francois Bremond
269
9
0
31 Aug 2022
Cross-modal Representation Learning for Zero-shot Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2022
Chung-Ching Lin
Kevin Qinghong Lin
Linjie Li
Lijuan Wang
Zicheng Liu
ViT
209
31
0
03 May 2022
Gate-Shift-Fuse for Video Action Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
311
35
0
16 Mar 2022
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search
Lezhi Li
Xinyu Gong
Junru Wu
Humphrey Shi
Zhicheng Yan
Zinan Lin
VGen
166
2
0
09 Dec 2021
4D-Net for Learned Multi-Modal Alignment
A. Piergiovanni
Vincent Casser
Michael S. Ryoo
A. Angelova
3DPC
273
68
0
02 Sep 2021
Searching for Two-Stream Models in Multivariate Space for Video Recognition
IEEE International Conference on Computer Vision (ICCV), 2021
Xinyu Gong
Heng Wang
Zheng Shou
Matt Feiszli
Zinan Lin
Zhicheng Yan
216
9
0
30 Aug 2021
UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition
British Machine Vision Conference (BMVC), 2021
Di Yang
Yaohui Wang
A. Dantcheva
Lorenzo Garattoni
Gianpiero Francesca
Francois Bremond
212
60
0
19 Jul 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
724
162
0
21 Jun 2021
VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Srijan Das
Rui Dai
Di Yang
Francois Bremond
ViT
460
90
0
17 May 2021
Visionary: Vision architecture discovery for robot learning
IEEE International Conference on Robotics and Automation (ICRA), 2021
Iretiayo Akinola
A. Angelova
Yao Lu
Yevgen Chebotar
Dmitry Kalashnikov
Jacob Varley
Julian Ibarz
Michael S. Ryoo
221
10
0
26 Mar 2021
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi-Li Zhang
Joseph Tighe
R. Manmatha
Mu Li
VLM
AI4TS
309
214
0
11 Dec 2020
Selective Spatio-Temporal Aggregation Based Pose Refinement System: Towards Understanding Human Activities in Real-World Videos
Di Yang
Rui Dai
Yaohui Wang
Rupayan Mallick
Luca Minciullo
Gianpiero Francesca
Francois Bremond
243
16
0
10 Nov 2020
Multi-Label Activity Recognition using Activity-specific Features and Activity Correlations
Computer Vision and Pattern Recognition (CVPR), 2020
Yanyi Zhang
Xinyu Li
I. Marsic
HAI
188
28
0
16 Sep 2020
Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition
Zitong Yu
Benjia Zhou
Jun Wan
Pichao Wang
Zhaodong Sun
Xin Liu
Stan Z. Li
Guoying Zhao
3DPC
266
113
0
21 Aug 2020
Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors
ACM Multimedia (ACM MM), 2020
Lei Wang
Piotr Koniusz
305
57
0
14 Jan 2020
Tiny Video Networks
Applied AI Letters (AA), 2019
A. Piergiovanni
A. Angelova
Michael S. Ryoo
444
52
0
15 Oct 2019