Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering

16 April 2016
Arun Mallya
Svetlana Lazebnik

Papers citing "Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering"

17 / 17 papers shown
Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery
Lianhao Yin
Yutong Ban
J. Eckhoff
O. Meireles
Daniela Rus
Guy Rosman
39
1
0
03 Feb 2024
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection
C. Nwoye
Tong Yu
Saurav Sharma
Aditya Murali
Deepak Alapatt
...
Pietro Mascagni
B. Seeliger
Cristians Gonzalez
Didier Mutter
N. Padoy
30
17
0
13 Feb 2023
RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
Xiaojian Ma
Weili Nie
Zhiding Yu
Huaizu Jiang
Chaowei Xiao
Yuke Zhu
Song-Chun Zhu
Anima Anandkumar
ViT
LRM
22
19
0
24 Apr 2022
The Overlooked Classifier in Human-Object Interaction Recognition
Ying Jin
Yinpeng Chen
Lijuan Wang
Jianfeng Wang
Pei Yu
Lin Liang
Jenq-Neng Hwang
Zicheng Liu
VLM
41
8
0
10 Mar 2022
Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding
Lalithkumar Seenivasan
Sai Mitheran
Mobarakol Islam
Hongliang Ren
32
32
0
28 Jan 2022
Attend and Guide (AG-Net): A Keypoints-driven Attention-based Deep Network for Image Recognition
Asish Bera
Zachary Wharton
Yonghuai Liu
Nikolaos Bessis
Ardhendu Behera
26
41
0
23 Oct 2021
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation
Zih-Siou Hung
Arun Mallya
Svetlana Lazebnik
ViT
10
14
0
28 May 2019
Interaction-aware Spatio-temporal Pyramid Attention Networks for Action Classification
Yang Du
Chunfen Yuan
Bing Li
Lili Zhao
Yangxi Li
Weiming Hu
58
79
0
03 Aug 2018
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
Dong Huk Park
Lisa Anne Hendricks
Zeynep Akata
Anna Rohrbach
Bernt Schiele
Trevor Darrell
Marcus Rohrbach
35
418
0
15 Feb 2018
Attentional Pooling for Action Recognition
Rohit Girdhar
Deva Ramanan
16
318
0
04 Nov 2017
Detecting and Recognizing Human-Object Interactions
Georgia Gkioxari
Ross B. Girshick
Piotr Dollár
Kaiming He
15
570
0
24 Apr 2017
An Analysis of Action Recognition Datasets for Language and Vision Tasks
Spandana Gella
Frank Keller
ObjD
12
11
0
24 Apr 2017
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
Peng Wang
Qi Wu
Chunhua Shen
A. Hengel
OOD
18
86
0
16 Dec 2016
Attentive Explanations: Justifying Decisions and Pointing to the Evidence
Dong Huk Park
Lisa Anne Hendricks
Zeynep Akata
Bernt Schiele
Trevor Darrell
Marcus Rohrbach
AAML
16
79
0
14 Dec 2016
Solving Visual Madlibs with Multiple Cues
Tatiana Tommasi
Arun Mallya
Bryan A. Plummer
Svetlana Lazebnik
Alexander C. Berg
Tamara L. Berg
23
18
0
11 Aug 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
144
1,465
0
06 Jun 2016
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics
Yunchao Gong
Qifa Ke
Michael Isard
Svetlana Lazebnik
3DV
60
584
0
18 Dec 2012