1708.02182
Cited By
Regularizing and Optimizing LSTM Language Models
7 August 2017
Stephen Merity
N. Keskar
R. Socher
ArXiv
Papers citing
"Regularizing and Optimizing LSTM Language Models"
50 / 508 papers shown
Title
Graph Laplacian Wavelet Transformer via Learnable Spectral Decomposition
Andrew Kiruluta
Eric Lundy
Priscilla Burity
24
0
0
09 May 2025
Smoothed Normalization for Efficient Distributed Private Optimization
Egor Shulgin
Sarit Khirirat
Peter Richtárik
FedML
82
0
0
20 Feb 2025
When, Where and Why to Average Weights?
Niccolò Ajroldi
Antonio Orvieto
Jonas Geiping
MoMe
93
0
0
10 Feb 2025
Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures
Gabriel Lindenmaier
Sean Papay
Sebastian Padó
51
0
0
02 Feb 2025
Optimizing Speech-Input Length for Speaker-Independent Depression Classification
Tomasz Rutowski
Amir Harati
Yang Lu
Elizabeth Shriberg
23
15
0
03 Jan 2025
Mask Factory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation
Haotian Qian
YD Chen
Shengtao Lou
F. Khan
Xiaogang Jin
Deng-Ping Fan
DiffM
37
6
0
26 Dec 2024
Robust Speech and Natural Language Processing Models for Depression Screening
Y. Lu
A. Harati
T. Rutowski
R. Oliveira
P. Chlebek
E. Shriberg
AI4MH
39
5
0
26 Dec 2024
Classification of residential and non-residential buildings based on satellite data using deep learning
Jai G Singla
18
0
0
11 Nov 2024
Don't Just Pay Attention, PLANT It: Transfer L2R Models to Fine-tune Attention in Extreme Multi-Label Text Classification
Debjyoti Saharoy
J. Aslam
Virgil Pavlu
VLM
34
0
0
30 Oct 2024
From Gradient Clipping to Normalization for Heavy Tailed SGD
Florian Hübler
Ilyas Fatkhullin
Niao He
40
5
0
17 Oct 2024
Financial Sentiment Analysis on News and Reports Using Large Language Models and FinBERT
Yanxin Shen
Pulin Kirin Zhang
AIFin
24
11
0
02 Oct 2024
Modelando procesos cognitivos de la lectura natural con GPT-2
Bruno Bianchi
Alfredo Umfurer
Juan Esteban Kamienkowski
26
0
0
30 Sep 2024
AsthmaBot: Multi-modal, Multi-Lingual Retrieval Augmented Generation For Asthma Patient Support
Adil Bahaj
Mounir Ghogho
38
2
0
24 Sep 2024
Explaining Datasets in Words: Statistical Models with Natural Language Parameters
Ruiqi Zhong
Heng Wang
Dan Klein
Jacob Steinhardt
35
6
0
13 Sep 2024
Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage
Md. Rafi Ur Rashid
Jing Liu
T. Koike-Akino
Shagufta Mehnaz
Ye Wang
MU
SILM
36
3
0
30 Aug 2024
Interactive Topic Models with Optimal Transport
Garima Dhanania
Sheshera Mysore
Chau Minh Pham
Mohit Iyyer
Hamed Zamani
Andrew McCallum
OT
27
1
0
28 Jun 2024
Hidden Holes: topological aspects of language models
Stephen Fitz
P. Romero
Jiyan Jonas Schneider
35
0
0
09 Jun 2024
Thinking Tokens for Language Modeling
David Herel
Tomáš Mikolov
LRM
19
2
0
14 May 2024
Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling
Yida Mu
Peizhen Bai
Kalina Bontcheva
Xingyi Song
33
6
0
01 May 2024
Weight Sparsity Complements Activity Sparsity in Neuromorphic Language Models
Rishav Mukherji
Mark Schöne
Khaleelulla Khan Nazeer
Christian Mayr
David Kappel
Anand Subramoney
35
2
0
01 May 2024
Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM
Michelle S. Lam
Janice Teoh
James A. Landay
Jeffrey Heer
Michael S. Bernstein
27
40
0
18 Apr 2024
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks
Matteo Tucat
Anirbit Mukherjee
Procheta Sen
Mingfei Sun
Omar Rivasplata
MLT
31
1
0
12 Apr 2024
Neural Optimizer Equation, Decay Function, and Learning Rate Schedule Joint Evolution
Brandon Morgan
Dean Frederick Hougen
ODL
23
0
0
10 Apr 2024
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models
Yuxin Wen
Leo Marchyok
Sanghyun Hong
Jonas Geiping
Tom Goldstein
Nicholas Carlini
SILM
AAML
26
9
0
01 Apr 2024
Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance
Qi Zhang
Yi Zhou
Shaofeng Zou
27
3
0
01 Apr 2024
A Stochastic Quasi-Newton Method for Non-convex Optimization with Non-uniform Smoothness
Zhenyu Sun
Ermin Wei
34
0
0
22 Mar 2024
Multi-Objective Evolutionary Neural Architecture Search for Recurrent Neural Networks
Reinhard Booysen
Anna Sergeevna Bosman
38
1
0
17 Mar 2024
Authorship Attribution in Bangla Literature (AABL) via Transfer Learning using ULMFiT
Aisha Khatun
Anisur Rahman
Md. Saiful Islam
Hemayet Ahmed Chowdhury
A. Tasnim
24
2
0
08 Mar 2024
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
Ruichen Ma
G. Qiao
Yián Liu
L. Meng
N. Ning
Yang Liu
Shaogang Hu
AAML
MQ
26
3
0
06 Mar 2024
Arabic Text Sentiment Analysis: Reinforcing Human-Performed Surveys with Wider Topic Analysis
Latifah Almurqren
Ryan Hodgson
A Ioana Cristea
39
3
0
04 Mar 2024
Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate
Can Jin
Tong Che
Hongwu Peng
Yiyuan Li
Dimitris N. Metaxas
Marco Pavone
44
43
0
05 Feb 2024
Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies
Bingshen Mu
Pengcheng Guo
Dake Guo
Pan Zhou
Wei-Neng Chen
Lei Xie
30
2
0
15 Dec 2023
Language Modeling on a SpiNNaker 2 Neuromorphic Chip
Khaleelulla Khan Nazeer
Mark Schöne
Rishav Mukherji
Bernhard Vogginger
Christian Mayr
David Kappel
Anand Subramoney
32
5
0
14 Dec 2023
A Unified Sampling Framework for Solver Searching of Diffusion Probabilistic Models
En-hao Liu
Xuefei Ning
Huazhong Yang
Yu Wang
DiffM
31
11
0
12 Dec 2023
Advancing State of the Art in Language Modeling
David Herel
Tomáš Mikolov
29
1
0
28 Nov 2023
BEND: Benchmarking DNA Language Models on biologically meaningful tasks
Frederikke Isa Marin
Felix Teufel
Marc Horlacher
Dennis Madsen
Dennis Pultz
Ole Winther
Wouter Boomsma
12
33
0
21 Nov 2023
Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference
Rishav Mukherji
Mark Schöne
Khaleelulla Khan Nazeer
Christian Mayr
Anand Subramoney
30
2
0
13 Nov 2023
Parameter-Agnostic Optimization under Relaxed Smoothness
Florian Hübler
Junchi Yang
Xiang Li
Niao He
26
12
0
06 Nov 2023
Longer Fixations, More Computation: Gaze-Guided Recurrent Neural Networks
Xinting Huang
Jiajing Wan
Ioannis Kritikos
Nora Hollenstein
9
3
0
31 Oct 2023
Out-of-distribution Object Detection through Bayesian Uncertainty Estimation
Tianhao Zhang
Shenglin Wang
N. Bouaynaya
R. Calinescu
Lyudmila Mihaylova
OODD
21
2
0
29 Oct 2023
Rethinking SIGN Training: Provable Nonconvex Acceleration without First- and Second-Order Gradient Lipschitz
Tao Sun
Congliang Chen
Peng Qiao
Li Shen
Xinwang Liu
Dongsheng Li
34
3
0
23 Oct 2023
Controlled Randomness Improves the Performance of Transformer Models
Tobias Deußer
Cong Zhao
Wolfgang Krämer
David Leonhard
Christian Bauckhage
R. Sifa
19
1
0
20 Oct 2023
Prototype of a robotic system to assist the learning process of English language with text-generation through DNN
Carlos Morales-Torres
Mario Campos Soberanis
Diego Campos-Sobrino
8
0
0
20 Sep 2023
Machine Learning Technique Based Fake News Detection
Biplob Kumar Sutradhar
Mohammad Zonaid
Nushrat Jahan Ria
S. R. H. Noori
22
2
0
18 Sep 2023
Differentiable Retrieval Augmentation via Generative Language Modeling for E-commerce Query Intent Classification
Chenyu Zhao
Yunjiang Jiang
Yiming Qiu
Han Zhang
Wen-Yun Yang
RALM
26
5
0
18 Aug 2023
Accurate Neural Network Pruning Requires Rethinking Sparse Optimization
Denis Kuznedelev
Eldar Kurtic
Eugenia Iofinova
Elias Frantar
Alexandra Peste
Dan Alistarh
VLM
21
11
0
03 Aug 2023
FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout
Jingjing Xue
Min Liu
Sheng Sun
Yuwei Wang
Hui Jiang
Xue Jiang
15
7
0
14 Jul 2023
Lookaround Optimizer: k steps around, 1 step average
Jiangtao Zhang
Shunyu Liu
Jie Song
Tongtian Zhu
Zhenxing Xu
Mingli Song
MoMe
29
6
0
13 Jun 2023
Revisiting Conversation Discourse for Dialogue Disentanglement
Bobo Li
Hao Fei
Fei Li
Shengqiong Wu
Lizi Liao
Yin-wei Wei
Tat-Seng Chua
Donghong Ji
35
1
0
06 Jun 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
VLM
27
128
0
23 May 2023