
- Recursion in Recursion: Two-Level Nested Recursion for Length Generalization with Scalability. Neural Information Processing Systems (NeurIPS), 2023. Jishnu Ray Chowdhury, Cornelia Caragea.
- What Algorithms can Transformers Learn? A Study in Length Generalization. International Conference on Learning Representations (ICLR), 2024.
- Sparse Universal Transformer. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
- Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns. International Conference on Learning Representations (ICLR), 2024.
- Efficient Beam Tree Recursion. Neural Information Processing Systems (NeurIPS), 2023. Jishnu Ray Chowdhury, Cornelia Caragea.
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. International Conference on Learning Representations (ICLR), 2024.
- Sparse Modular Activation for Efficient Sequence Modeling. Neural Information Processing Systems (NeurIPS), 2023.
- Block-State Transformers. Neural Information Processing Systems (NeurIPS), 2023.
- Exposing Attention Glitches with Flip-Flop Language Modeling. Neural Information Processing Systems (NeurIPS), 2023.
- Beam Tree Recursive Cells. International Conference on Machine Learning (ICML), 2023. Jishnu Ray Chowdhury, Cornelia Caragea.
- Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective. Neural Information Processing Systems (NeurIPS), 2023.
- RWKV: Reinventing RNNs for the Transformer Era. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
- Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
- CoLT5: Faster Long-Range Transformers with Conditional Computation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontañón, Siddhartha Brahma, ..., Mandy Guo, James Lee-Thorp, Yi Tay, Yun-hsuan Sung, Sumit Sanghai.
- Resurrecting Recurrent Neural Networks for Long Sequences. International Conference on Machine Learning (ICML), 2023.
- Adaptive Computation with Elastic Input Sequence. International Conference on Machine Learning (ICML), 2023.
- A Length-Extrapolatable Transformer. Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
- Towards Reasoning in Large Language Models: A Survey. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. Jie Huang, Kevin Chen-Chuan Chang.
- Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions. Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
- Transformers Learn Shortcuts to Automata. International Conference on Learning Representations (ICLR), 2023.
- Neural Attentive Circuits. Neural Information Processing Systems (NeurIPS), 2022.
- Mega: Moving Average Equipped Gated Attention. International Conference on Learning Representations (ICLR), 2023.
- Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
- Confident Adaptive Language Modeling. Neural Information Processing Systems (NeurIPS), 2022.
- Recurrent Memory Transformer. Neural Information Processing Systems (NeurIPS), 2022.
- Neural Networks and the Chomsky Hierarchy. International Conference on Learning Representations (ICLR), 2023.
- The Parallelism Tradeoff: Limitations of Log-Precision Transformers. Transactions of the Association for Computational Linguistics (TACL), 2023.
- Long Range Language Modeling via Gated State Spaces. International Conference on Learning Representations (ICLR), 2023.
- On the Parameterization and Initialization of Diagonal State Space Models. Neural Information Processing Systems (NeurIPS), 2022.
- Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning. Neural Information Processing Systems (NeurIPS), 2022.
- Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity. Transactions of the Association for Computational Linguistics (TACL), 2022.
- Block-Recurrent Transformers. Neural Information Processing Systems (NeurIPS), 2022.
- Transformer Quality in Linear Time. International Conference on Machine Learning (ICML), 2022.
- Flowformer: Linearizing Transformers with Conservation Flows. International Conference on Machine Learning (ICML), 2022.
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Neural Information Processing Systems (NeurIPS), 2022.
- Efficiently Modeling Long Sequences with Structured State Spaces. International Conference on Learning Representations (ICLR), 2022.