arXiv:1905.11232

Efficient posterior sampling for high-dimensional imbalanced logistic regression

27 May 2019
Deborshee Sen
Matthias Sachs
Jianfeng Lu
David B. Dunson
Abstract

High-dimensional data are routinely collected in many application areas. In this article, we are particularly interested in classification models in which one or more variables are imbalanced. This creates difficulties in estimation. To improve performance, one can apply a Bayesian approach with Markov chain Monte Carlo algorithms used for posterior computation. However, current algorithms can be inefficient as the sample size n and/or the number of covariates p increase, due to worsening time per step and mixing rates. One promising strategy is to use a gradient-based sampler to improve mixing while using data sub-samples to reduce the per-step computational complexity. However, usual sub-sampling breaks down when applied to imbalanced data. Instead, we generalize recent piecewise deterministic Markov chain Monte Carlo algorithms to include stratified and importance-weighted sub-sampling. We also propose a new sub-sampling algorithm based on sorting data points. These approaches maintain the correct stationary distribution with arbitrarily small sub-samples, and substantially outperform current competitors. We provide theoretical support and illustrate gains in simulated and real data applications.
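The point that plain uniform sub-sampling struggles on imbalanced data can be illustrated with the sub-sampled gradient estimates that gradient-based and piecewise deterministic samplers rely on. The sketch below is not the authors' algorithm; it is a minimal illustration under assumptions of our own. It simulates a hypothetical data set (intercept plus one standard-normal covariate, with roughly 1-2% positive labels), computes per-point gradients of the logistic negative log-likelihood at the data-generating coefficients, and compares the exact variance of two unbiased estimators that each cost two gradient evaluations: uniform sampling of two points with replacement, and stratified sampling of one point per class reweighted by class size. The names `per_point_grads`, `uniform_estimate`, `stratified_estimate`, `BETA` and the simulation settings are hypothetical.

```python
import numpy as np

# Hypothetical simulation settings (not from the paper): intercept + one
# standard-normal covariate, intercept chosen so roughly 1-2% of labels are 1.
N = 10_000
BETA = np.array([-4.6, 1.0])


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def per_point_grads(X, y, beta):
    """Row i: gradient of the i-th negative log-likelihood term,
    (sigmoid(x_i . beta) - y_i) * x_i. The rows sum to the full gradient."""
    residual = sigmoid(X @ beta) - y
    return residual[:, None] * X


def uniform_estimate(G, m, rng):
    """Unbiased: N times the mean of m rows drawn uniformly with replacement."""
    idx = rng.integers(0, G.shape[0], size=m)
    return G.shape[0] * G[idx].mean(axis=0)


def stratified_estimate(G, y, rng):
    """Unbiased: one row drawn from each class, weighted by that class's size."""
    est = np.zeros(G.shape[1])
    for label in (0.0, 1.0):
        stratum = np.flatnonzero(y == label)
        est += stratum.size * G[rng.choice(stratum)]
    return est


def uniform_variance(G, m):
    """Exact per-coordinate variance of uniform_estimate for this data set."""
    return G.shape[0] ** 2 * G.var(axis=0) / m


def stratified_variance(G, y):
    """Exact per-coordinate variance of stratified_estimate for this data set."""
    return sum(np.sum(y == c) ** 2 * G[y == c].var(axis=0) for c in (0.0, 1.0))


rng = np.random.default_rng(0)
z = rng.standard_normal(N)
X = np.column_stack([np.ones(N), z])
y = (rng.random(N) < sigmoid(X @ BETA)).astype(float)

G = per_point_grads(X, y, BETA)
full = G.sum(axis=0)

print("positives:", int(y.sum()), "of", N)
print("full gradient:", full)
print("uniform (m=2) variance:", uniform_variance(G, 2))
print("stratified (1 per class) variance:", stratified_variance(G, y))
```

Both estimators are unbiased for `full`, so any gain is pure variance reduction: under uniform sampling each draw lands on a positive with probability equal to the positive fraction (about 1-2% here), yet those rare draws dominate the estimate. The size of the gain depends on β and on how draws are allocated across strata; evaluating the same code at β = 0, for instance, makes the one-draw-per-class scheme worse than uniform sampling for the slope coordinate.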
