Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering

16 April 2016

Papers citing "Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering"

17 / 17 papers shown

Title | Authors | Tags | Counts | Date
Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery | Lianhao Yin, Yutong Ban, J. Eckhoff, O. Meireles, Daniela Rus, Guy Rosman | - | 39 1 0 | 03 Feb 2024
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection | C. Nwoye, Tong Yu, Saurav Sharma, Aditya Murali, Deepak Alapatt, ..., Pietro Mascagni, B. Seeliger, Cristians Gonzalez, Didier Mutter, N. Padoy | - | 30 17 0 | 13 Feb 2023
RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning | Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar | ViT, LRM | 22 19 0 | 24 Apr 2022
The Overlooked Classifier in Human-Object Interaction Recognition | Ying Jin, Yinpeng Chen, Lijuan Wang, Jianfeng Wang, Pei Yu, Lin Liang, Jenq-Neng Hwang, Zicheng Liu | VLM | 41 8 0 | 10 Mar 2022
Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding | Lalithkumar Seenivasan, Sai Mitheran, Mobarakol Islam, Hongliang Ren | - | 32 32 0 | 28 Jan 2022
Attend and Guide (AG-Net): A Keypoints-driven Attention-based Deep Network for Image Recognition | Asish Bera, Zachary Wharton, Yonghuai Liu, Nikolaos Bessis, Ardhendu Behera | - | 26 41 0 | 23 Oct 2021
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation | Zih-Siou Hung, Arun Mallya, Svetlana Lazebnik | ViT | 10 14 0 | 28 May 2019
Interaction-aware Spatio-temporal Pyramid Attention Networks for Action Classification | Yang Du, Chunfen Yuan, Bing Li, Lili Zhao, Yangxi Li, Weiming Hu | - | 58 79 0 | 03 Aug 2018
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence | Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach | - | 35 418 0 | 15 Feb 2018
Attentional Pooling for Action Recognition | Rohit Girdhar, Deva Ramanan | - | 16 318 0 | 04 Nov 2017
Detecting and Recognizing Human-Object Interactions | Georgia Gkioxari, Ross B. Girshick, Piotr Dollár, Kaiming He | - | 15 570 0 | 24 Apr 2017
An Analysis of Action Recognition Datasets for Language and Vision Tasks | Spandana Gella, Frank Keller | ObjD | 12 11 0 | 24 Apr 2017
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions | Peng Wang, Qi Wu, Chunhua Shen, A. Hengel | OOD | 18 86 0 | 16 Dec 2016
Attentive Explanations: Justifying Decisions and Pointing to the Evidence | Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, Marcus Rohrbach | AAML | 16 79 0 | 14 Dec 2016
Solving Visual Madlibs with Multiple Cues | Tatiana Tommasi, Arun Mallya, Bryan A. Plummer, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg | - | 23 18 0 | 11 Aug 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding | Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach | - | 144 1,465 0 | 06 Jun 2016
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics | Yunchao Gong, Qifa Ke, Michael Isard, Svetlana Lazebnik | 3DV | 60 584 0 | 18 Dec 2012