1907.01860
Cited By
v5 (latest)
Encoding high-cardinality string categorical variables
3 July 2019
Patricio Cerda
Gaël Varoquaux
Papers citing
"Encoding high-cardinality string categorical variables"
18 / 18 papers shown
Title
TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
Alan Arazi
Eilam Shapira
Roi Reichart
LMTD
205
0
0
23 May 2025
CLAMS: A System for Zero-Shot Model Selection for Clustering
Prabhant Singh
Pieter Gijsbers
Murat Onur Yildirim
Elif Ceren Gok
Joaquin Vanschoren
82
0
0
15 Jul 2024
CAVIAR: Categorical-Variable Embeddings for Accurate and Robust Inference
Anirban Mukherjee
H. Chang
61
0
0
07 Apr 2024
Automated data processing and feature engineering for deep learning and big data applications: a survey
A. Mumuni
F. Mumuni
TPM
84
60
0
18 Mar 2024
CARTE: Pretraining and Transfer for Tabular Learning
Myung Jun Kim
Léo Grinsztajn
Gaël Varoquaux
LMTD
147
23
0
26 Feb 2024
Comparative Study on the Performance of Categorical Variable Encoders in Classification and Regression Tasks
Wenbin Zhu
Runwen Qiu
Ying Fu
25
4
0
18 Jan 2024
Encoding categorical data: Is there yet anything 'hotter' than one-hot encoding?
Ekaterina Poslavskaya
Alexey Korolev
56
7
0
28 Dec 2023
Vectorizing string entries for data processing on tables: when are larger language models better?
Léo Grinsztajn
Edouard Oyallon
Myung Jun Kim
Gaël Varoquaux
71
3
0
15 Dec 2023
Predicting delays in Indian lower courts using AutoML and Decision Forests
M. Bhatnagar
Shivraj Huchhanavar
83
1
0
30 Jul 2023
A benchmark of categorical encoders for binary classification
Federico Matteucci
Vadim Arzamasov
Klemens Boehm
ELM
59
5
0
17 Jul 2023
Saibot: A Differentially Private Data Search Platform
Zezhou Huang
Jiaxiang Liu
Daniel Alabi
Raul Castro Fernandez
Eugene Wu
64
7
0
01 Jul 2023
Categorising Products in an Online Marketplace: An Ensemble Approach
Kieron Drumm
26
0
0
26 Apr 2023
Progressive Feature Upgrade in Semi-supervised Learning on Tabular Domain
Morteza Mohammady Gharasuie
Fenjiao Wang
70
0
0
01 Dec 2022
Predicting Treatment Adherence of Tuberculosis Patients at Scale
Mihir Kulkarni
Satvik Golechha
Rishi Raj
J. Sreedharan
Ankit Bhardwaj
...
Jayakrishna Kurada
S. Mattoo
R. Joshi
K. Rade
Alpa Raval
65
3
0
05 Nov 2022
URANUS: Radio Frequency Tracking, Classification and Identification of Unmanned Aircraft Vehicles
Domenico Lofú
Pietro Di Gennaro
Pietro Tedeschi
Tommaso Di Noia
E. Sciascio
72
14
0
13 Jul 2022
Fairness Implications of Encoding Protected Categorical Attributes
Carlos Mougan
J. Álvarez
Salvatore Ruggieri
Steffen Staab
FaML
67
16
0
27 Jan 2022
From Strings to Data Science: a Practical Framework for Automated String Handling
John W. van Lith
Joaquin Vanschoren
17
1
0
02 Nov 2021
Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features
F. Pargent
Florian Pfisterer
Janek Thomas
B. Bischl
49
88
0
01 Apr 2021