Gaussian Error Linear Units (GELUs)

27 June 2016

Papers citing "Gaussian Error Linear Units (GELUs)"

Title
4M: Massively Multimodal Masked Modeling David Mizrahi Roman Bachmann Oğuzhan Fatih Kar Teresa Yeo Mingfei Gao Afshin Dehghan Amir Zamir MLLM 44 63 0 11 Dec 2023
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion Yujie Wei Shiwei Zhang Zhiwu Qing Hangjie Yuan Zhiheng Liu Yu Liu Yingya Zhang Jingren Zhou Hongming Shan DiffM VGen 17 89 0 07 Dec 2023
Defense Against Adversarial Attacks using Convolutional Auto-Encoders Shreyasi Mandal AAML 23 1 0 06 Dec 2023
C3: High-performance and low-complexity neural compression from a single image or video Hyunjik Kim Matthias Bauer Lucas Theis Jonathan Richard Schwarz Emilien Dupont VGen 22 23 0 05 Dec 2023
Analyzing and Improving the Training Dynamics of Diffusion Models Tero Karras M. Aittala J. Lehtinen Janne Hellsten Timo Aila S. Laine 28 155 0 05 Dec 2023
HUGS: Human Gaussian Splats Muhammed Kocabas Jen-Hao Rick Chang J. Gabriel Oncel Tuzel Anurag Ranjan 3DGS 42 91 0 29 Nov 2023
Improving Feature Stability during Upsampling -- Spectral Artifacts and the Importance of Spatial Context Shashank Agnihotri Julia Grabinski M. Keuper 30 6 0 29 Nov 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames Shuming Liu Chen-Lin Zhang Chen Zhao Bernard Ghanem 33 25 0 28 Nov 2023
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks Rahul Ramesh Ekdeep Singh Lubana Mikail Khona Robert P. Dick Hidenori Tanaka CoGe 33 6 0 21 Nov 2023
Deep Learning-Based Real-Time Quality Control of Standard Video Compression for Live Streaming Matin Mortaheb M. A. Khojastepour S. Chakradhar S. Ulukus 13 1 0 21 Nov 2023
GRAM: An Interpretable Approach for Graph Anomaly Detection using Gradient Attention Maps Yifei Yang Peng Wang Xiaofan He Dongmian Zou 14 5 0 10 Nov 2023
Towards a Unified Framework of Contrastive Learning for Disentangled Representations Stefan Matthes Zhiwei Han Hao Shen 31 4 0 08 Nov 2023
OmniVec: Learning robust representations with cross modal sharing Siddharth Srivastava Gaurav Sharma SSL 27 64 0 07 Nov 2023
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion Lunjun Zhang Yuwen Xiong Ze Yang Sergio Casas Rui Hu R. Urtasun 39 50 0 02 Nov 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation Juan Pablo Zuluaga Zhaocheng Huang Xing Niu Rohit Paturi S. Srinivasan Prashant Mathur Brian Thompson Marcello Federico BDL 27 2 0 01 Nov 2023
Learn to Categorize or Categorize to Learn? Self-Coding for Generalized Category Discovery Sarah Rastegar Hazel Doughty Cees G. M. Snoek 30 15 0 30 Oct 2023
Video Frame Interpolation with Many-to-many Splatting and Spatial Selective Refinement Ping Hu Simon Niklaus Lu Zhang Stan Sclaroff Kate Saenko 25 6 0 29 Oct 2023
TorchDEQ: A Library for Deep Equilibrium Models Zhengyang Geng J. Zico Kolter VLM 54 12 0 28 Oct 2023
Understanding the Effects of Projectors in Knowledge Distillation Yudong Chen Sen Wang Jiajun Liu Xuwei Xu Frank de Hoog Brano Kusy Zi Huang 26 0 0 26 Oct 2023
Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps Sidi Wu Yizi Chen Konrad Schindler L. Hurni 21 2 0 19 Oct 2023
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport Quentin Bouniot I. Redko Anton Mallasto Charlotte Laclau Karol Arndt Oliver Struckmeier Markus Heinonen Ville Kyrki Samuel Kaski 54 2 0 17 Oct 2023
SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical Image Segmentation Tan-Hanh Pham Xianqi Li Kim-Doang Nguyen MedIm ViT 26 8 0 16 Oct 2023
Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers Hosein Mohebbi Grzegorz Chrupała Willem H. Zuidema A. Alishahi 28 12 0 15 Oct 2023
Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration Piyush Singh Pasi Karthikeya Battepati P. Jyothi Ganesh Ramakrishnan T. Mahapatra Manoj Singh 51 0 0 10 Oct 2023
Understanding the Feature Norm for Out-of-Distribution Detection Jaewoo Park Jacky Chen Long Chai Jaeho Yoon Andrew Beng Jin Teoh OODD 24 12 0 09 Oct 2023
Low-Resolution Self-Attention for Semantic Segmentation Yu-Huan Wu Shi-Chen Zhang Yun-Hai Liu Le Zhang Xin Zhan Daquan Zhou Jiashi Feng Ming-Ming Cheng Liangli Zhen ViT 45 3 0 08 Oct 2023
Deep Learning Based Uplink Multi-User SIMO Beamforming Design Cemil Vahapoglu Tim O'Shea Tamoghna Roy S. Ulukus 23 7 0 28 Sep 2023
Deep Learning-Based Real-Time Rate Control for Live Streaming on Wireless Networks Matin Mortaheb M. A. Khojastepour S. Chakradhar S. Ulukus 13 0 0 27 Sep 2023
Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification Hee-Soo Heo Ki-hyun Nam Bong-Jin Lee Youngki Kwon Min-Ji Lee You Jin Kim Joon Son Chung 26 1 0 26 Sep 2023
Introducing DictaLM -- A Large Generative Language Model for Modern Hebrew Shaltiel Shmidman Avi Shmidman Amir DN Cohen Moshe Koppel 25 0 0 25 Sep 2023
Small-scale proxies for large-scale Transformer training instabilities Mitchell Wortsman Peter J. Liu Lechao Xiao Katie Everett A. Alemi ... Jascha Narain Sohl-Dickstein Kelvin Xu Jaehoon Lee Justin Gilmer Simon Kornblith 35 81 0 25 Sep 2023
On the Posterior Distribution in Denoising: Application to Uncertainty Quantification Hila Manor T. Michaeli UQCV 23 17 0 24 Sep 2023
Large-scale Pretraining Improves Sample Efficiency of Active Learning based Molecule Virtual Screening Zhonglin Cao Simone Sciabola Ye Wang 32 1 0 20 Sep 2023
PDPCRN: Parallel Dual-Path CRN with Bi-directional Inter-Branch Interactions for Multi-Channel Speech Enhancement Jia-Yu Pan Shulin He Tianci Wu Hui Zhang Xueliang Zhang 19 0 0 19 Sep 2023
Limited-Angle Tomography Reconstruction via Deep End-To-End Learning on Synthetic Data Thomas Germer Jan Robine S. Konietzny Stefan Harmeling Tobias Uelwer MedIm 18 5 0 13 Sep 2023
Advancing Parsimonious Deep Learning Weather Prediction using the HEALPix Mesh Matthias Karlbauer Nathaniel Cresswell-Clay Dale Durran Raul A Moreno Thorsten Kurth Boris Bonev Noah D. Brenowitz Martin Volker Butz MDE 25 20 0 11 Sep 2023
ImageBind-LLM: Multi-modality Instruction Tuning Jiaming Han Renrui Zhang Wenqi Shao Peng Gao Peng-Tao Xu ... Yafei Wen Xiaoxin Chen Xiangyu Yue Hongsheng Li Yu Qiao MLLM 49 116 0 07 Sep 2023
3D Transformer based on deformable patch location for differential diagnosis between Alzheimer's disease and Frontotemporal dementia H. Nguyen Michael Clement Boris Mansencal Pierrick Coupé MedIm 28 0 0 06 Sep 2023
Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation Michael Jungo Beat Wolf Andrii Maksai C. Musat Andreas Fischer 24 2 0 06 Sep 2023
A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis Esteve Valls Mascaro Hyemin Ahn Dongheui Lee CVBM 37 4 0 14 Aug 2023
Large-kernel Attention for Efficient and Robust Brain Lesion Segmentation Liam Chalcroft Ruben Lourenço Pereira Mikael Brudfors Andrew S. Kayser M. D’Esposito Cathy J. Price Ioannis Pappas John Ashburner ViT 3DV MedIm 26 8 0 14 Aug 2023
Composable Function-preserving Expansions for Transformer Architectures Andrea Gesmundo Kaitlin Maile AI4CE 32 8 0 11 Aug 2023
Graph Embedding Dynamic Feature-based Supervised Contrastive Learning of Transient Stability for Changing Power Grid Topologies Zijian Lv X. Chen Zijian Feng 22 0 0 01 Aug 2023
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior? Ari Holtzman Peter West Luke Zettlemoyer AI4CE 30 14 0 31 Jul 2023
Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup Yan Sun Li Shen Hao Sun Liang Ding Dacheng Tao FedML 19 16 0 30 Jul 2023
BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering Khiem Vinh Tran Kiet Van Nguyen N. Nguyen ViT 23 2 0 28 Jul 2023
Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs Or Sharir Anima Anandkumar 27 0 0 27 Jul 2023
Unsupervised Deep Learning-based Pansharpening with Jointly-Enhanced Spectral and Spatial Fidelity Matteo Ciotola Giovanni Poggi G. Scarpa 23 22 0 26 Jul 2023
On the unreasonable vulnerability of transformers for image restoration -- and an easy fix Shashank Agnihotri Kanchana Vaishnavi Gandikota Julia Grabinski Paramanand Chandramouli M. Keuper 32 9 0 25 Jul 2023
Simultaneous temperature estimation and nonuniformity correction from multiple frames N. Oz O. Berman N. Sochen David Mendelovich I. Klapp 22 1 0 23 Jul 2023