Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images

11 April 2025

Papers cited by "Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images"

Organizing Unstructured Image Collections using Natural Language. Mingxuan Liu, Zhun Zhong, Jun Li, Gianni Franchi, Subhankar Roy, Elisa Ricci. 07 Oct 2024.
Explaining Datasets in Words: Statistical Models with Natural Language Parameters. Neural Information Processing Systems (NeurIPS), 2024. Ruiqi Zhong, Heng Wang, Dan Klein, Jacob Steinhardt. 13 Sep 2024.
Diffusion Models as Data Mining Tools. Ioannis Siglidis, Aleksander Holynski, Alexei A. Efros, Mathieu Aubry, Shiry Ginosar. 20 Jul 2024.
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion. Boyang Deng, Richard Tucker, Zhengqi Li, Leonidas Guibas, Noah Snavely, Gordon Wetzstein. 18 Jul 2024.
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models. Chankyu Lee, Rajarshi Roy, Mengyao Xu, Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, Ming-Yu Liu. 27 May 2024.
Discover and Mitigate Multiple Biased Subgroups in Image Classifiers. Zeliang Zhang, Mingqian Feng, Zhiheng Li, Chenliang Xu. 19 Mar 2024.
Rethinking Interpretability in the Era of Large Language Models. Chandan Singh, J. Inala, Michel Galley, Rich Caruana, Jianfeng Gao. 30 Jan 2024.
Describing Differences in Image Sets with Natural Language. Computer Vision and Pattern Recognition (CVPR), 2023. Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy. 05 Dec 2023.
Can large language models provide useful feedback on research papers? A large-scale empirical analysis. Weixin Liang, Yuhui Zhang, Hancheng Cao, Binglu Wang, Daisy Ding, ..., Siyu He, D. Smith, Yian Yin, Daniel A. McFarland, James Y. Zou. 03 Oct 2023.
Prototype-based Dataset Comparison. IEEE International Conference on Computer Vision (ICCV), 2023. Nanne van Noord. 05 Sep 2023.
Changes to Captions: An Attentive Network for Remote Sensing Change Captioning. IEEE Transactions on Image Processing (IEEE TIP), 2023. Shizhen Chang, Pedram Ghamisi. 03 Apr 2023.
GPT-4 Technical Report. OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, ..., Shengjia Zhao, Tianhao Zheng, Juntang Zhuang, William Zhuk, Barret Zoph. 15 Mar 2023.
Goal Driven Discovery of Distributional Differences via Language Descriptions. Neural Information Processing Systems (NeurIPS), 2023. Ruiqi Zhong, Peter Zhang, Steve Li, Jinwoo Ahn, Dan Klein, Jacob Steinhardt. 28 Feb 2023.
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. International Conference on Machine Learning (ICML), 2023. Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi. 30 Jan 2023.
What's in a Decade? Transforming Faces Through Time. Eric Chen, Jin Sun, Apoorv Khandelwal, Dani Lischinski, Noah Snavely, Hadar Averbuch-Elor. 13 Oct 2022.
PaLI: A Jointly-Scaled Multilingual Language-Image Model. International Conference on Learning Representations (ICLR), 2022. Xi Chen, Tianlin Li, Soravit Changpinyo, A. Piergiovanni, Piotr Padlewski, ..., Andreas Steiner, A. Angelova, Xiaohua Zhai, N. Houlsby, Radu Soricut. 14 Sep 2022.
GSCLIP: A Framework for Explaining Distribution Shifts in Natural Language. Zhiying Zhu, Weixin Liang, James Zou. 30 Jun 2022.
Flamingo: a Visual Language Model for Few-Shot Learning. Neural Information Processing Systems (NeurIPS), 2022. Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, ..., Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan. 29 Apr 2022.
Image Difference Captioning with Pre-training and Contrastive Learning. AAAI Conference on Artificial Intelligence (AAAI), 2022. Linli Yao, Weiying Wang, Qin Jin. 09 Feb 2022.
Learning Transferable Visual Models From Natural Language Supervision. International Conference on Machine Learning (ICML), 2021. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya A. Ramesh, Gabriel Goh, ..., Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. 26 Feb 2021.
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision. International Conference on Machine Learning (ICML), 2021. Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu H. Pham, Quoc V. Le, Yun-hsuan Sung, Zhen Li, Tom Duerig. 11 Feb 2021.
Discovering Visual Patterns in Art Collections with Spatially-consistent Feature Learning. Xi Shen, Alexei A. Efros, Mathieu Aubry. 07 Mar 2019.
A Closer Look at Spatiotemporal Convolutions for Action Recognition. Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri. 30 Nov 2017.
StreetStyle: Exploring world-wide clothing styles from millions of photos. Kevin Blackburn-Matzen, Kavita Bala, Noah Snavely. 06 Jun 2017.
Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US. Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2017. Timnit Gebru, J. Krause, Yilun Wang, Duyun Chen, Gaowen Liu, Erez Aiden Lieberman, Li Fei-Fei. 22 Feb 2017.
3D Time-lapse Reconstruction from Internet Photos. Ricardo Martín Brualla, D. Gallup, S. M. Seitz. 10 Nov 2015.
A Century of Portraits: A Visual Historical Record of American High School Yearbooks. Shiry Ginosar, Kate Rakelly, Sarah Sachs, Brian Yin, Crystal Lee, Philipp Krahenbuhl, Alexei A. Efros. 09 Nov 2015.
Transfer Learning from Deep Features for Remote Sensing and Poverty Mapping. Sang Michael Xie, Neal Jean, Marshall Burke, David B. Lobell, Stefano Ermon. 01 Oct 2015.
Deep Visual-Semantic Alignments for Generating Image Descriptions. Computer Vision and Pattern Recognition (CVPR), 2014. A. Karpathy, Li Fei-Fei. 07 Dec 2014.
Show and Tell: A Neural Image Caption Generator. Computer Vision and Pattern Recognition (CVPR), 2014. Oriol Vinyals, Alexander Toshev, Samy Bengio, D. Erhan. 17 Nov 2014.
Recognizing Image Style. British Machine Vision Conference (BMVC), 2013. Sergey Karayev, Matthew Trentacoste, Helen Han, A. Agarwala, Trevor Darrell, Aaron Hertzmann, Holger Winnemoeller. 15 Nov 2013.