Captioning Images Taken by People Who Are Blind

20 February 2020

Papers citing "Captioning Images Taken by People Who Are Blind"

47 / 97 papers shown

Toucha11y: Making Inaccessible Public Touchscreens Accessible Jiasheng Li Zeyu Yan Arushi Shah J. Lazar Huaishu Peng 11 9 0 06 May 2023
Quality-agnostic Image Captioning to Safely Assist People with Vision Impairment Lu Yu Malvina Nikandrou Jiali Jin Verena Rieser 42 5 0 28 Apr 2023
CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval Yang Yang Zhongtian Fu Xiangyu Wu Wenjie Li VLM 15 1 0 15 Apr 2023
Self-Supervised Multimodal Learning: A Survey Yongshuo Zong Oisin Mac Aodha Timothy M. Hospedales SSL 19 43 0 31 Mar 2023
Fine-grained Audible Video Description Xuyang Shen Dong Li Jinxing Zhou Zhen Qin Bowen He ... Yuchao Dai Lingpeng Kong Meng Wang Yu Qiao Yiran Zhong VGen 36 11 0 27 Mar 2023
PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning Yongil Kim Yerin Hwang Hyeongu Yun Seunghyun Yoon Trung Bui Kyomin Jung 19 6 0 15 Mar 2023
Salient Object Detection for Images Taken by People With Vision Impairments Jarek Reynolds Chandra Kanth Nagesh Danna Gurari 22 10 0 12 Jan 2023
Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning Ukyo Honda Taro Watanabe Yuji Matsumoto 8 9 0 06 Dec 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision Sophia Gu Christopher Clark Aniruddha Kembhavi VLM 14 24 0 17 Nov 2022
Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired Kazuya Ohata Shunsuke Kitada Hitoshi Iyatomi 14 0 0 17 Nov 2022
Low-resource Neural Machine Translation with Cross-modal Alignment Zhe Yang Qingkai Fang Yang Feng VLM 29 9 0 13 Oct 2022
SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation R. Ramos Bruno Martins Desmond Elliott Yova Kementchedjhieva VLM 30 86 0 30 Sep 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model Xi Chen Xiao Wang Soravit Changpinyo A. Piergiovanni Piotr Padlewski ... Andreas Steiner A. Angelova Xiaohua Zhai N. Houlsby Radu Soricut MLLM VLM 26 682 0 14 Sep 2022
PreSTU: Pre-Training for Scene-Text Understanding Jihyung Kil Soravit Changpinyo Xi Chen Hexiang Hu Sebastian Goodman Wei-Lun Chao Radu Soricut VLM 135 29 0 12 Sep 2022
VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments Yu-Yun Tseng Alexander Bell Danna Gurari 19 8 0 24 Jul 2022
LineCap: Line Charts for Data Visualization Captioning Models Anita Mahinpei Zona Kostic Christy Tanner VLM 16 17 0 15 Jul 2022
DALL-E for Detection: Language-driven Compositional Image Synthesis for Object Detection Yunhao Ge Jiashu Xu Brian Nlong Zhao Neel Joshi Laurent Itti Vibhav Vineet DiffM ObjD 25 16 0 20 Jun 2022
UPB at SemEval-2022 Task 5: Enhancing UNITER with Image Sentiment and Graph Convolutional Networks for Multimedia Automatic Misogyny Identification Andrei Paraschiv M. Dascalu Dumitru-Clementin Cercel 19 3 0 29 May 2022
GIT: A Generative Image-to-text Transformer for Vision and Language Jianfeng Wang Zhengyuan Yang Xiaowei Hu Linjie Li Kevin Qinghong Lin Zhe Gan Zicheng Liu Ce Liu Lijuan Wang VLM 27 526 0 27 May 2022
Prompt-based Learning for Unpaired Image Captioning Peipei Zhu Xiao Wang Lin Zhu Zhenglong Sun Weishi Zheng Yaowei Wang C. L. P. Chen VLM 19 31 0 26 May 2022
Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics Elisa Kreiss Cynthia L. Bennett Shayan Hooshmand E. Zelikman Meredith Ringel Morris Christopher Potts 40 27 0 21 May 2022
CapOnImage: Context-driven Dense-Captioning on Image Yiqi Gao Xinglin Hou Yuanmeng Zhang T. Ge Yuning Jiang Peifeng Wang 25 10 0 27 Apr 2022
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text Pinaki Nath Chowdhury A. Bhunia Aneeshan Sain Subhadeep Koley Tao Xiang Yi-Zhe Song 34 29 0 25 Apr 2022
Multi-modal Misinformation Detection: Approaches, Challenges and Opportunities S. Abdali Sina Shaham Bhaskar Krishnamachari 13 18 0 25 Mar 2022
Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs Daniel Louzada Fernandes Marcos Henrique Fonseca Ribeiro F. Cerqueira Michel Melo Silva 12 6 0 10 Feb 2022
Grounding Answers for Visual Questions Asked by Visually Impaired People Chongyan Chen Samreen Anjum Danna Gurari 23 50 0 04 Feb 2022
Deep Learning Approaches on Image Captioning: A Review Taraneh Ghandi H. Pourreza H. Mahyar VLM 8 88 0 31 Jan 2022
Interactive Attention AI to translate low light photos to captions for night scene understanding in women safety A. Rajagopal V. Nirmala Arun Muthuraj Vedamanickam 11 0 0 04 Jan 2022
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets Marcella Cornia Lorenzo Baraldi G. Fiameni Rita Cucchiara 20 12 0 24 Nov 2021
CIDEr-R: Robust Consensus-based Image Description Evaluation G. O. D. Santos Esther Luna Colombini Sandra Avila 40 30 0 28 Sep 2021
Question-controlled Text-aware Image Captioning Anwen Hu Shizhe Chen Qin Jin 11 15 0 04 Aug 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning Matteo Stefanini Marcella Cornia Lorenzo Baraldi S. Cascianelli G. Fiameni Rita Cucchiara 3DV VLM MLLM 53 254 0 14 Jul 2021
Data augmentation to improve robustness of image captioning solutions Shashank Bujimalla Mahesh Subedar Omesh Tickoo 8 2 0 10 Jun 2021
Multi-Modal Image Captioning for the Visually Impaired Hiba Ahsan Nikita Bhalla Daivat Bhatt Kaivankumar Shah 17 20 0 17 May 2021
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text Amanpreet Singh Guan Pang Mandy Toh Jing Huang Wojciech Galuba Tal Hassner 4 163 0 12 May 2021
Concadia: Towards Image-Based Text Generation with a Purpose Elisa Kreiss Fei Fang Noah D. Goodman Christopher Potts 14 23 0 16 Apr 2021
#PraCegoVer: A Large Dataset for Image Captioning in Portuguese G. O. D. Santos Esther Luna Colombini Sandra Avila 23 10 0 21 Mar 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts Soravit Changpinyo P. Sharma Nan Ding Radu Soricut VLM 273 1,081 0 17 Feb 2021
Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels Xiaoyi Zhang Lilian de Greef Amanda Swearngin Samuel White Kyle I. Murray ... Jeffrey Nichols Jason Wu Chris Fleizach Aaron Everitt Jeffrey P. Bigham 183 167 0 13 Jan 2021
Detecting Hate Speech in Multi-modal Memes Abhishek Das Japsimar Singh Wahi Siyao Li 17 60 0 29 Dec 2020
Vision Skills Needed to Answer Visual Questions Xiaoyu Zeng Yanan Wang Tai-Yin Chiu Nilavra Bhattacharya Danna Gurari 6 17 0 07 Oct 2020
A Multimodal Memes Classification: A Survey and Open Research Issues Tariq Habib Afridi A. Alam Muhammad Numan Khan Jawad Khan Young-Koo Lee 21 35 0 17 Sep 2020
On the use of human reference data for evaluating automatic image descriptions Emiel van Miltenburg 3DH 8 2 0 15 Jun 2020
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes Douwe Kiela Hamed Firooz Aravind Mohan Vedanuj Goswami Amanpreet Singh Pratik Ringshia Davide Testuggine 23 577 0 10 May 2020
B-SCST: Bayesian Self-Critical Sequence Training for Image Captioning Shashank Bujimalla Mahesh Subedar Omesh Tickoo BDL UQCV 9 10 0 06 Apr 2020
Assessing Image Quality Issues for Real-World Problems Tai-Yin Chiu Yinan Zhao Danna Gurari 49 54 0 27 Mar 2020
TextCaps: a Dataset for Image Captioning with Reading Comprehension Oleksii Sidorov Ronghang Hu Marcus Rohrbach Amanpreet Singh 20 386 0 24 Mar 2020