Recursion in Recursion: Two-Level Nested Recursion for Length Generalization with Scalability
Jishnu Ray Chowdhury, Cornelia Caragea

What Algorithms Can Transformers Learn? A Study in Length Generalization. International Conference on Learning Representations (ICLR), 2023.
Sparse Universal Transformer. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns. International Conference on Learning Representations (ICLR), 2023.
Efficient Beam Tree Recursion. Neural Information Processing Systems (NeurIPS), 2023. Jishnu Ray Chowdhury, Cornelia Caragea.
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. International Conference on Learning Representations (ICLR), 2023.
Sparse Modular Activation for Efficient Sequence Modeling. Neural Information Processing Systems (NeurIPS), 2023.
Block-State Transformers. Neural Information Processing Systems (NeurIPS), 2023.
Exposing Attention Glitches with Flip-Flop Language Modeling. Neural Information Processing Systems (NeurIPS), 2023.
Beam Tree Recursive Cells. International Conference on Machine Learning (ICML), 2023. Jishnu Ray Chowdhury, Cornelia Caragea.
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective. Neural Information Processing Systems (NeurIPS), 2023.
RWKV: Reinventing RNNs for the Transformer Era. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
CoLT5: Faster Long-Range Transformers with Conditional Computation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontañón, Siddhartha Brahma, ..., Mandy Guo, James Lee-Thorp, Yi Tay, Yun-hsuan Sung, Sumit Sanghai.
Resurrecting Recurrent Neural Networks for Long Sequences. International Conference on Machine Learning (ICML), 2023.
Adaptive Computation with Elastic Input Sequence. International Conference on Machine Learning (ICML), 2023.
A Length-Extrapolatable Transformer. Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
Towards Reasoning in Large Language Models: A Survey. Annual Meeting of the Association for Computational Linguistics (ACL), 2022. Jie Huang, Kevin Chen-Chuan Chang.
Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions. Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
Transformers Learn Shortcuts to Automata. International Conference on Learning Representations (ICLR), 2022.
Neural Attentive Circuits. Neural Information Processing Systems (NeurIPS), 2022.
Mega: Moving Average Equipped Gated Attention. International Conference on Learning Representations (ICLR), 2022.
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Confident Adaptive Language Modeling. Neural Information Processing Systems (NeurIPS), 2022.
Recurrent Memory Transformer. Neural Information Processing Systems (NeurIPS), 2022.
Neural Networks and the Chomsky Hierarchy. International Conference on Learning Representations (ICLR), 2022.
The Parallelism Tradeoff: Limitations of Log-Precision Transformers. Transactions of the Association for Computational Linguistics (TACL), 2022.
Long Range Language Modeling via Gated State Spaces. International Conference on Learning Representations (ICLR), 2022.
On the Parameterization and Initialization of Diagonal State Space Models. Neural Information Processing Systems (NeurIPS), 2022.
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning. Neural Information Processing Systems (NeurIPS), 2022.
Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity. Transactions of the Association for Computational Linguistics (TACL), 2022.
Block-Recurrent Transformers. Neural Information Processing Systems (NeurIPS), 2022.
Transformer Quality in Linear Time. International Conference on Machine Learning (ICML), 2022.
Flowformer: Linearizing Transformers with Conservation Flows. International Conference on Machine Learning (ICML), 2022.
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Neural Information Processing Systems (NeurIPS), 2022.
Efficiently Modeling Long Sequences with Structured State Spaces. International Conference on Learning Representations (ICLR), 2021.