ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.17425
  4. Cited By
Data Filtering Networks

Data Filtering Networks

29 September 2023
Alex Fang
Albin Madappally Jose
Amit Jain
Ludwig Schmidt
Alexander Toshev
Vaishaal Shankar
    CLIP
ArXivPDFHTML

Papers citing "Data Filtering Networks"

50 / 101 papers shown
Title
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via $\mathbf{\texttt{D}}$ual-$\mathbf{\texttt{H}}$ead $\mathbf{\texttt{O}}$ptimization
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via D\mathbf{\texttt{D}}Dual-H\mathbf{\texttt{H}}Head O\mathbf{\texttt{O}}Optimization
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Sung Ju Hwang
VLM
38
0
0
12 May 2025
Model Steering: Learning with a Reference Model Improves Generalization Bounds and Scaling Laws
Model Steering: Learning with a Reference Model Improves Generalization Bounds and Scaling Laws
Xiyuan Wei
Ming Lin
Fanjiang Ye
Fengguang Song
Liangliang Cao
My T. Thai
Tianbao Yang
LLMSV
24
0
0
10 May 2025
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang
Sarah Monazam Erfani
Yige Li
Xingjun Ma
James Bailey
AAML
34
0
0
08 May 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Qingfu Zhang
Zhenan Sun
Ying Shan
MLLM
VLM
64
0
0
08 May 2025
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Xianhang Li
Y. Liu
Haoqin Tu
Hongru Zhu
Cihang Xie
VLM
76
0
0
07 May 2025
Using Knowledge Graphs to harvest datasets for efficient CLIP model training
Using Knowledge Graphs to harvest datasets for efficient CLIP model training
Simon Ging
Sebastian Walter
Jelena Bratulić
Johannes Dienert
Hannah Bast
Thomas Brox
CLIP
20
0
0
05 May 2025
Transferable Adversarial Attacks on Black-Box Vision-Language Models
Transferable Adversarial Attacks on Black-Box Vision-Language Models
Kai Hu
Weichen Yu
L. Zhang
Alexander Robey
Andy Zou
Chengming Xu
Haoqi Hu
Matt Fredrikson
AAML
VLM
52
0
0
02 May 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Shinýa Yamaguchi
Dewei Feng
Sekitoshi Kanai
Kazuki Adachi
Daiki Chijiwa
VLM
34
0
0
17 Apr 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong
Jun Hao Liew
Zilong Huang
Jiashi Feng
Xihui Liu
31
0
0
11 Apr 2025
Refining CLIP's Spatial Awareness: A Visual-Centric Perspective
Refining CLIP's Spatial Awareness: A Visual-Centric Perspective
Congpei Qiu
Yanhao Wu
Wei Ke
Xiuxiu Bai
Tong Zhang
VLM
44
0
0
03 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
50
0
0
02 Apr 2025
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
Weizhi Wang
Yu Tian
L. Yang
Heng Wang
Xifeng Yan
MLLM
VLM
74
0
0
01 Apr 2025
Scaling Language-Free Visual Representation Learning
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
56
2
0
01 Apr 2025
Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance
Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance
Jaywon Koo
J. Hernandez
Moayed Haji-Ali
Ziyan Yang
Vicente Ordonez
EGVM
67
0
0
27 Mar 2025
CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI
CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI
Siyuan Cheng
Lingjuan Lyu
Zhenting Wang
X. Zhang
Vikash Sehwag
40
0
0
24 Mar 2025
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
Qiao Liang
Yanjiang Liu
Ben He
Y. Lu
Hongyu Lin
Jia Zheng
Xianpei Han
Le Sun
Yingfei Sun
39
0
0
23 Mar 2025
Beyond Accuracy: What Matters in Designing Well-Behaved Models?
Beyond Accuracy: What Matters in Designing Well-Behaved Models?
Robin Hesse
Doğukan Bağcı
Bernt Schiele
Simone Schaub-Meyer
Stefan Roth
VLM
54
0
0
21 Mar 2025
Dynamic Relation Inference via Verb Embeddings
Dynamic Relation Inference via Verb Embeddings
Omri Suissa
Muhiim Ali
Ariana Azarbal
Hui Shen
Shekhar Pradhan
41
0
0
17 Mar 2025
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger
Nina Wenzel
David Griffiths
Haiming Gang
Justin Lazarow
...
Kai Kang
Marcin Eichner
Y. Yang
Afshin Dehghan
Peter Grasch
72
2
0
17 Mar 2025
Provenance Detection for AI-Generated Images: Combining Perceptual Hashing, Homomorphic Encryption, and AI Detection Models
Shree Singhi
Aayan Yadav
Aayush Gupta
Shariar Ebrahimi
Parisa Hassanizadeh
36
0
0
14 Mar 2025
The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation
Ho Kei Cheng
Alexander Schwing
OT
70
0
0
13 Mar 2025
LongProLIP: A Probabilistic Vision-Language Model with Long Context Text
Sanghyuk Chun
Sangdoo Yun
VLM
37
1
0
11 Mar 2025
Filter Like You Test: Data-Driven Data Filtering for CLIP Pretraining
Mikey Shechter
Yair Carmon
CLIP
42
0
0
11 Mar 2025
OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection
Adrian Chow
Evelien Riddell
Yimu Wang
Sean Sedwards
Krzysztof Czarnecki
3DPC
46
0
0
09 Mar 2025
Language-Assisted Feature Transformation for Anomaly Detection
EungGu Yun
Heonjin Ha
Yeongwoo Nam
Bryan Dongik Lee
61
0
0
03 Mar 2025
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
Guanqi Zhan
Yuanpei Liu
Kai Han
Weidi Xie
Andrew Zisserman
VLM
96
0
0
21 Feb 2025
Contrastive Localized Language-Image Pre-Training
Contrastive Localized Language-Image Pre-Training
Hong-You Chen
Zhengfeng Lai
H. Zhang
X. Wang
Marcin Eichner
Keen You
Meng Cao
Bowen Zhang
Y. Yang
Zhe Gan
CLIP
VLM
65
7
0
20 Feb 2025
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
Angelos Zavras
Dimitrios Michail
Xiao Xiang Zhu
Begum Demir
Ioannis Papoutsis
VLM
81
0
0
13 Feb 2025
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
Alejandro Lozano
M. W. Sun
James Burgess
Liangyu Chen
Jeffrey Nirschl
...
Xiaohan Wang
Yuhui Zhang
Alfred Seunghoon Song
Robert Tibshirani
Serena Yeung-Levy
LM&MA
VLM
MedIm
58
6
0
13 Jan 2025
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Scott Geng
Cheng-Yu Hsieh
Vivek Ramanujan
Matthew Wallingford
Chun-Liang Li
Pang Wei Koh
Ranjay Krishna
DiffM
60
6
0
03 Jan 2025
HyperCLIP: Adapting Vision-Language models with Hypernetworks
HyperCLIP: Adapting Vision-Language models with Hypernetworks
Victor Akinwande
Mohammad Sadegh Norouzzadeh
Devin Willmott
Anna Bair
Madan Ravi Ganesh
J. Zico Kolter
CLIP
VLM
84
0
0
21 Dec 2024
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level
  Vision-Language Alignment
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Cijo Jose
Théo Moutakanni
Dahyun Kang
Federico Baldassarre
Timothée Darcet
...
Maxime Oquab
Oriane Siméoni
Huy V. Vo
Patrick Labatut
Piotr Bojanowski
CLIP
VLM
88
6
0
20 Dec 2024
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision
  Language Models
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yu-Chiang Frank Wang
Y. Ro
Yueh-Hua Wu
VLM
81
0
0
02 Dec 2024
ROOT: VLM based System for Indoor Scene Understanding and Beyond
ROOT: VLM based System for Indoor Scene Understanding and Beyond
Yonghui Wang
Shi-Yong Chen
Zhenxing Zhou
Siyi Li
Haoran Li
Wengang Zhou
H. Li
VLM
67
3
0
24 Nov 2024
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic
  Vision-Language Negatives
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
Maitreya Patel
Abhiram Kusumba
Sheng Cheng
Changhoon Kim
Tejas Gokhale
Chitta Baral
Yezhou Yang
CoGe
CLIP
50
7
0
04 Nov 2024
INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
Edward Vendrow
Omiros Pantazis
Alexander Shepard
Gabriel J. Brostow
Kate E. Jones
Oisin Mac Aodha
Sara Beery
Grant Van Horn
VLM
36
3
0
04 Nov 2024
CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
Gi-Cheon Kang
Junghyun Kim
Kyuhwan Shim
Jun Ki Lee
Byoung-Tak Zhang
LM&Ro
93
1
1
01 Nov 2024
Learning Video Representations without Natural Videos
Learning Video Representations without Natural Videos
Xueyang Yu
Xinlei Chen
Yossi Gandelsman
VGen
AI4TS
44
0
0
31 Oct 2024
Frozen-DETR: Enhancing DETR with Image Understanding from Frozen
  Foundation Models
Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models
Shenghao Fu
Junkai Yan
Q. Yang
Xihan Wei
Xiaohua Xie
Wei-Shi Zheng
VLM
23
3
0
25 Oct 2024
Probabilistic Language-Image Pre-Training
Probabilistic Language-Image Pre-Training
Sanghyuk Chun
Wonjae Kim
Song Park
Sangdoo Yun
MLLM
VLM
CLIP
75
4
2
24 Oct 2024
Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM
  Pretraining
Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining
Han Huang
Yuqi Huo
Zijia Zhao
Haoyu Lu
Shu Wu
B. Wang
Qiang Liu
Weipeng Chen
Liang Wang
VLM
25
1
0
21 Oct 2024
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Yuxiang Lu
Shengcao Cao
Yu-xiong Wang
43
1
0
18 Oct 2024
Locality Alignment Improves Vision-Language Models
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Y. Zou
Tatsunori Hashimoto
VLM
64
3
0
14 Oct 2024
Visual Scratchpads: Enabling Global Reasoning in Vision
Visual Scratchpads: Enabling Global Reasoning in Vision
Aryo Lotfi
Enrico Fini
Samy Bengio
Moin Nabi
Emmanuel Abbe
LRM
32
0
0
10 Oct 2024
Investigating and Mitigating Object Hallucinations in Pretrained
  Vision-Language (CLIP) Models
Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models
Yufang Liu
Tao Ji
Changzhi Sun
Yuanbin Wu
Aimin Zhou
VLM
MLLM
38
1
0
04 Oct 2024
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal
  Foundation Models
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Zhengfeng Lai
Vasileios Saveris
C. L. P. Chen
Hong-You Chen
Haotian Zhang
...
Wenze Hu
Zhe Gan
Peter Grasch
Meng Cao
Yinfei Yang
VLM
33
3
0
03 Oct 2024
Toward a Holistic Evaluation of Robustness in CLIP Models
Toward a Holistic Evaluation of Robustness in CLIP Models
Weijie Tu
Weijian Deng
Tom Gedeon
VLM
34
5
0
02 Oct 2024
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced
  Mathematical Reasoning
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Xiaotian Han
Yiren Jian
Xuefeng Hu
Haogeng Liu
Yiqi Wang
...
Yuang Ai
Huaibo Huang
Ran He
Zhenheng Yang
Quanzeng You
LRM
AI4CE
23
11
0
19 Sep 2024
High-Performance Few-Shot Segmentation with Foundation Models: An
  Empirical Study
High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study
Shijie Chang
Lihe Zhang
Huchuan Lu
VLM
31
0
0
10 Sep 2024
123
Next