Captioning Images Taken by People Who Are Blind

20 February 2020

Papers citing "Captioning Images Taken by People Who Are Blind"

50 / 97 papers shown

Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users Antonia Karamolegkou Malvina Nikandrou Georgios Pantazopoulos Danae Sanchez Villegas Phillip Rust Ruchira Dhar Daniel Hershcovich Anders Søgaard 34 0 0 28 Mar 2025
ImageSet2Text: Describing Sets of Images through Text Piera Riccio F. Galati Kajetan Schweighofer Noa Garcia Nuria Oliver VLM CoGe 72 0 0 25 Mar 2025
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models Zhaoyi Liu Huan Zhang AAML 72 0 0 25 Feb 2025
Rebalanced Vision-Language Retrieval Considering Structure-Aware Distillation Yang Yang Wenjuan Xi Luping Zhou Jinhui Tang 74 0 0 14 Dec 2024
Anomaly Detection for People with Visual Impairments Using an Egocentric 360-Degree Camera Inpyo Song Sanghyeon Lee Minjun Joo Jangwon Lee 41 0 0 17 Nov 2024
Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies Yingqiang Gao Lukas Fischer Alexa Lintner Sarah Ebling 31 0 0 11 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training Sara Sarto Nicholas Moratelli Marcella Cornia Lorenzo Baraldi Rita Cucchiara 28 3 0 09 Oct 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning Kazuki Matsuda Yuiga Wada Komei Sugiura 26 1 0 28 Sep 2024
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology Xin Jiang Junwei Zheng Ruiping Liu Jiahang Li Jiaming Zhang Sven Matthiesen Rainer Stiefelhagen VLM 21 0 0 21 Sep 2024
SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information Jiashuo Sun Jihai Zhang Yucheng Zhou Zhaochen Su Xiaoye Qu Yu Cheng 43 11 0 21 Sep 2024
Pixels to Prose: Understanding the art of Image Captioning Hrishikesh Singh Aarti Sharma Millie Pant 3DV VLM 25 0 0 28 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization Nicholas Moratelli Davide Caffagni Marcella Cornia Lorenzo Baraldi Rita Cucchiara CLIP 29 3 0 26 Aug 2024
Audio Description Customization Rosiana Natalie Ruei-Che Chang Smitha Sheshadri Anhong Guo Kotaro Hara 19 4 0 21 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis Uri Berger Gabriel Stanovsky Omri Abend Lea Frermann 27 0 0 09 Aug 2024
AccessShare: Co-designing Data Access and Sharing with Blind People Rie Kamikubo Farnaz Zamiri Zeraati Kyungjun Lee Hernisa Kacorri 40 1 0 27 Jul 2024
BIV-Priv-Seg: Locating Private Content in Images Taken by People With Visual Impairments Yu-Yun Tseng Tanusree Sharma Lotus Zhang Abigale Stangl Leah Findlater Yang Wang Danna Gurari 64 0 0 25 Jul 2024
Vision-Language Models under Cultural and Inclusive Considerations Antonia Karamolegkou Phillip Rust Yong Cao Ruixiang Cui Anders Søgaard Daniel Hershcovich VLM 49 7 0 08 Jul 2024
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding Tiancheng Zhao Qianqian Zhang Kyusong Lee Peng Liu Lu Zhang Chunxin Fang Jiajia Liao Kelei Jiang Yibo Ma Ruochen Xu MLLM VLM 49 5 0 06 Jul 2024
From Pixels to Prose: A Large Dataset of Dense Image Captions Vasu Singla Kaiyu Yue Sukriti Paul Reza Shirkavand Mayuka Jayawardhana Alireza Ganjdanesh Heng Huang A. Bhatele Gowthami Somepalli Tom Goldstein 3DV VLM 28 22 0 14 Jun 2024
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning Wenyan Li Jiaang Li R. Ramos Raphael Tang Desmond Elliott VLM 36 3 0 04 Jun 2024
Selectively Answering Visual Questions Julian Martin Eisenschlos Hernán Maina Guido Ivetta Luciana Benotti 40 0 0 03 Jun 2024
The Evolution of Multimodal Model Architectures S. Wadekar Abhishek Chaurasia Aman Chadha Eugenio Culurciello 41 14 0 28 May 2024
Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition Yuchen Zhou Linkai Liu Chao Gou 23 3 0 16 May 2024
"We are at the mercy of others' opinion": Supporting Blind People in Recreational Window Shopping with AI-infused Technology Rie Kamikubo Hernisa Kacorri Chieko Asakawa 24 4 0 10 May 2024
FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction Hang Hua Jing Shi Kushal Kafle Simon Jenni Daoan Zhang John Collomosse Scott D. Cohen Jiebo Luo CoGe VLM 42 9 0 23 Apr 2024
Task-Oriented Paraphrase Analytics Marcel Gohsen Matthias Hagen Martin Potthast Benno Stein 31 0 0 26 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes Ting Yu Xiaojun Lin Shuhui Wang Weiguo Sheng Qingming Huang Jun-chen Yu 3DV 37 10 0 12 Mar 2024
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding Haogeng Liu Quanzeng You Xiaotian Han Yiqi Wang Bohan Zhai Yongfei Liu Yunzhe Tao Huaibo Huang Ran He Hongxia Yang MLLM 44 2 0 03 Mar 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning Yuiga Wada Kanta Kaneda Daichi Saito Komei Sugiura 34 24 0 28 Feb 2024
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation Kohei Uehara Nabarun Goswami Hanqin Wang Toshiaki Baba Kohtaro Tanaka ... Takagi Naoya Ryo Umagami Yingyi Wen Tanachai Anakewat Tatsuya Harada LRM 21 2 0 18 Jan 2024
p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models Haoyuan Wu Xinyun Zhang Peng Xu Peiyu Liao Xufeng Yao Bei Yu VLM 19 0 0 17 Dec 2023
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts Jialin Wu Xia Hu Yaqing Wang Bo Pang Radu Soricut MoE 19 14 0 01 Dec 2023
Fully Authentic Visual Question Answering Dataset from Online Communities Chongyan Chen Mengchen Liu Noel Codella Yunsheng Li Lu Yuan Danna Gurari 30 5 0 27 Nov 2023
Multimodal Large Language Models: A Survey Jiayang Wu Wensheng Gan Zefeng Chen Shicheng Wan Philip S. Yu 22 168 0 22 Nov 2023
JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models Yuiga Wada Kanta Kaneda Komei Sugiura 23 4 0 07 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities Md Farhan Ishmam Md Sakib Hossain Shovon M. F. Mridha Nilanjan Dey 35 36 0 01 Nov 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description Tengda Han Max Bain Arsha Nagrani Gül Varol Weidi Xie Andrew Zisserman VGen DiffM 19 36 0 10 Oct 2023
Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation Yunhao Ge Jiashu Xu Brian Nlong Zhao Neel Joshi Laurent Itti Vibhav Vineet DiffM 30 14 0 12 Sep 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning Manuele Barraco Sara Sarto Marcella Cornia Lorenzo Baraldi Rita Cucchiara VLM 51 19 0 23 Aug 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments Anwen Hu Shizhe Chen Liang Zhang Qin Jin LM&Ro 30 2 0 21 Aug 2023
Interactive Segmentation for Diverse Gesture Types Without Context Josh Myers-Dean Yifei Fan Brian L. Price Wilson Chan Danna Gurari 11 2 0 20 Jul 2023
Linear Alignment of Vision-language Models for Image Captioning Fabian Paischer M. Hofmarcher Sepp Hochreiter Thomas Adler CLIP VLM 42 0 0 10 Jul 2023
Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training Alyssa Huang Peihan Liu Ryumei Nakada Linjun Zhang Wanrong Zhang VLM 65 5 0 13 Jun 2023
Dealing with Semantic Underspecification in Multimodal NLP Sandro Pezzelle 14 9 0 08 Jun 2023
Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory Aliki Anagnostopoulou Mareike Hartmann Daniel Sonntag CLL VLM 10 0 0 06 Jun 2023
Putting Humans in the Image Captioning Loop Aliki Anagnostopoulou Mareike Hartmann Daniel Sonntag VLM 24 1 0 06 Jun 2023
CapText: Large Language Model-based Caption Generation From Image Context and Description Shinjini Ghosh Sagnik Anupam VLM 20 3 0 01 Jun 2023
PaLI-X: On Scaling up a Multilingual Vision and Language Model Xi Chen Josip Djolonga Piotr Padlewski Basil Mustafa Soravit Changpinyo ... Mojtaba Seyedhosseini A. Angelova Xiaohua Zhai N. Houlsby Radu Soricut VLM 44 187 0 29 May 2023
Text encoders bottleneck compositionality in contrastive vision-language models Amita Kamath Jack Hessel Kai-Wei Chang CoGe CLIP VLM 25 19 0 24 May 2023
Preconditioned Visual Language Inference with Weak Supervision Ehsan Qasemi Amani Maina-Kilaas Devadutta Dash Khalid Alsaggaf Muhao Chen 17 0 0 22 May 2023