ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.19474
  4. Cited By
Visual Riddles: a Commonsense and World Knowledge Challenge for Large
  Vision and Language Models

Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models

28 July 2024
Nitzan Bitton-Guetta
Aviv Slobodkin
Aviya Maimon
Eliya Habba
Royi Rassin
Yonatan Bitton
Idan Szpektor
Amir Globerson
Yuval Elovici
    ReLM
    VLM
    LRM
ArXivPDFHTML

Papers citing "Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models"

6 / 6 papers shown
Title
Vibe-Eval: A hard evaluation suite for measuring progress of multimodal
  language models
Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models
Piotr Padlewski
Max Bain
Matthew Henderson
Zhongkai Zhu
Nishant Relan
...
Che Zheng
Cyprien de Masson dÁutume
Dani Yogatama
Mikel Artetxe
Yi Tay
VLM
82
26
0
03 May 2024
MileBench: Benchmarking MLLMs in Long Context
MileBench: Benchmarking MLLMs in Long Context
Dingjie Song
Shunian Chen
Guiming Hardy Chen
Fei Yu
Xiang Wan
Benyou Wang
VLM
53
34
0
29 Apr 2024
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Simian Luo
Yiqin Tan
Suraj Patil
Daniel Gu
Patrick von Platen
Apolinário Passos
Longbo Huang
Jian Li
Hang Zhao
MoMe
100
139
0
09 Nov 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of
  Synthetic and Compositional Images
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
Nitzan Bitton-Guetta
Yonatan Bitton
Jack Hessel
Ludwig Schmidt
Yuval Elovici
Gabriel Stanovsky
Roy Schwartz
VLM
113
65
0
13 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
1