ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

When Do We Not Need Larger Vision Models?

19 March 2024
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
    VLM
    LRM
arXiv 2403.13043

Papers citing "When Do We Not Need Larger Vision Models?"

Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
Kung-Hsiang Huang
Can Qin
Haoyi Qiu
Philippe Laban
Shafiq R. Joty
Caiming Xiong
C. Wu
VLM
59
1
0
17 Feb 2025
An Efficient Sign Language Translation Using Spatial Configuration and Motion Dynamics with LLMs
Eui Jun Hwang
Sukmin Cho
Junmyeong Lee
Jong C. Park
SLR
43
4
0
20 Aug 2024
Visual Agents as Fast and Slow Thinkers
Guangyan Sun
Mingyu Jin
Zhenting Wang
Cheng-Long Wang
Siqi Ma
Qifan Wang
Ying Nian Wu
Dongfang Liu
LLMAG
LRM
63
11
0
16 Aug 2024
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
Yunfei Xie
Ce Zhou
Lang Gao
Juncheng Wu
Xianhang Li
...
Sheng Liu
Lei Xing
James Zou
Cihang Xie
Yuyin Zhou
LM&MA
MedIm
56
23
0
06 Aug 2024
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu
Saining Xie
LRM
42
105
0
21 Dec 2023
Unleashing Text-to-Image Diffusion Models for Visual Perception
Wenliang Zhao
Yongming Rao
Zuyan Liu
Benlin Liu
Jie Zhou
Jiwen Lu
ObjD
VLM
MDE
144
114
0
03 Mar 2023
Real-World Robot Learning with Masked Visual Pre-training
Ilija Radosavovic
Tete Xiao
Stephen James
Pieter Abbeel
Jitendra Malik
Trevor Darrell
SSL
141
238
0
06 Oct 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
255
7,337
0
11 Nov 2021
ResNet strikes back: An improved training procedure in timm
Ross Wightman
Hugo Touvron
Hervé Jégou
AI4TS
194
477
0
01 Oct 2021
Deep High-Resolution Representation Learning for Visual Recognition
Jingdong Wang
Ke Sun
Tianheng Cheng
Borui Jiang
Chaorui Deng
...
Yadong Mu
Mingkui Tan
Xinggang Wang
Wenyu Liu
Bin Xiao
170
3,480
0
20 Aug 2019
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
154
3,574
0
09 Dec 2016
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
226
74,467
0
18 May 2015
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
279
39,083
0
01 Sep 2014