ResearchTrend.AI
Does Visual Pretraining Help End-to-End Reasoning?
arXiv: 2307.08506

17 July 2023
Chen Sun
Calvin Luo
Xingyi Zhou
Anurag Arnab
Cordelia Schmid
    OCL
    LRM
    ViT

Papers citing "Does Visual Pretraining Help End-to-End Reasoning?"

Slot State Space Models
Jindong Jiang
Fei Deng
Gautam Singh
Minseung Lee
Sungjin Ahn
39
4
0
18 Jun 2024
Look, Remember and Reason: Grounded reasoning in videos with language models
Apratim Bhattacharyya
Sunny Panchal
Mingu Lee
Reza Pourreza
Pulkit Madan
Roland Memisevic
LRM
22
7
0
30 Jun 2023
What Do Self-Supervised Vision Transformers Learn?
Namuk Park
Wonjae Kim
Byeongho Heo
Taekyung Kim
Sangdoo Yun
SSL
62
76
1
01 May 2023
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
X. Wang
ViT
VLM
175
494
0
22 Feb 2022
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
209
222
0
20 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
231
341
0
22 Sep 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
283
5,723
0
29 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
229
573
0
22 Apr 2021
Learning Object Permanence from Video
Aviv Shamsian
Ofri Kleinfeld
Amir Globerson
Gal Chechik
SSL
24
31
0
23 Mar 2020