arXiv:2012.15688
ERNIE-Doc: A Retrospective Long-Document Modeling Transformer

Annual Meeting of the Association for Computational Linguistics (ACL), 2021
31 December 2020
Siyu Ding
Junyuan Shang
Shuohuan Wang
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang

Papers citing "ERNIE-Doc: A Retrospective Long-Document Modeling Transformer"

25 papers shown
AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2023
Run Shao
Cheng Yang
Qiujun Li
Qing Zhu
Yongjun Zhang
...
Yu Liu
Yong Tang
Dapeng Liu
Shizhong Yang
Haifeng Li
08 Jan 2025
Graph-tree Fusion Model with Bidirectional Information Propagation for Long Document Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Sudipta Singha Roy
Xindi Wang
Robert E. Mercer
Frank Rudzicz
03 Oct 2024
NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yilong Chen
Guoxia Wang
Junyuan Shang
Shiyao Cui
Zhenyu Zhang
Tingwen Liu
Shuohuan Wang
Yu Sun
Dianhai Yu
Hua Wu
07 Aug 2024
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
Yilong Chen
Linhao Zhang
Junyuan Shang
Ying Tai
Tingwen Liu
Shuohuan Wang
Yu Sun
03 Jun 2024
Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification
Jungmin Yun
Mihyeon Kim
Youngbin Kim
03 Jun 2024
TransformerFAM: Feedback attention is working memory
Dongseong Hwang
Weiran Wang
Zhuoyuan Huo
K. Sim
P. M. Mengibar
14 Apr 2024
Labels Need Prompts Too: Mask Matching for Natural Language Understanding Tasks
AAAI Conference on Artificial Intelligence (AAAI), 2023
Bo Li
Wei Ye
Quan-ding Wang
Wen Zhao
Shikun Zhang
14 Dec 2023
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Yunpeng Huang
Jingwei Xu
Junyu Lai
Zixu Jiang
Taolue Chen
...
Xiaoxing Ma
Lijuan Yang
Zhou Xin
Shupeng Li
Penghao Zhao
21 Nov 2023
Large Language Models are legal but they are not: Making the case for a powerful LegalLLM
Thanmay Jayakumar
Fauzan Farooqui
Luqman Farooqui
15 Nov 2023
Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs
Or Sharir
Anima Anandkumar
27 Jul 2023
S2vNTM: Semi-supervised vMF Neural Topic Modeling
Weijie Xu
Jay Desai
Srinivasan H. Sengamedu
Xiaoyu Jiang
Francis Iannacci
06 Jul 2023
Recurrent Action Transformer with Memory
A. Staroverov
A. Bessonov
Dmitry A. Yudin
A. Kovalev
Aleksandr I. Panov
15 Jun 2023
Neural Natural Language Processing for Long Texts: A Survey on Classification and Summarization
Engineering Applications of Artificial Intelligence (Eng. Appl. Artif. Intell.), 2023
Dimitrios Tsirmpas
Ioannis Gkionis
Georgios Th. Papadopoulos
Ioannis Mademlis
25 May 2023
A General-Purpose Multilingual Document Encoder
Onur Galoglu
Robert Litschko
Goran Glavaš
11 May 2023
Scaling Transformer to 1M tokens and beyond with RMT
Aydar Bulatov
Yuri Kuratov
Yermek Kapushev
Andrey Kravchenko
19 Apr 2023
A Survey on Long Text Modeling with Transformers
Zican Dong
Tianyi Tang
Lunyi Li
Wayne Xin Zhao
28 Feb 2023
Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer
Dimitris Mamakas
Petros Tsotsi
Ion Androutsopoulos
Ilias Chalkidis
02 Nov 2022
Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation
Neural Information Processing Systems (NeurIPS), 2022
Botao Yu
Peiling Lu
Rui Wang
Wei Hu
Xu Tan
Wei Ye
Shikun Zhang
Tao Qin
Tie-Yan Liu
19 Oct 2022
Recurrent Memory Transformer
Neural Information Processing Systems (NeurIPS), 2022
Aydar Bulatov
Yuri Kuratov
Andrey Kravchenko
14 Jul 2022
Revisiting Transformer-based Models for Long Document Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Xiang Dai
Ilias Chalkidis
Kenny Erleben
Desmond Elliott
14 Apr 2022
ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Shuohuan Wang
Yu Sun
Yang Xiang
Zhihua Wu
Siyu Ding
...
Tian Wu
Wei Zeng
Ge Li
Wen Gao
Haifeng Wang
23 Dec 2021
PoNet: Pooling Network for Efficient Token Mixing in Long Sequences
Chao-Hong Tan
Qian Chen
Wen Wang
Qinglin Zhang
Siqi Zheng
Zhenhua Ling
06 Oct 2021
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Yu Sun
Shuohuan Wang
Shikun Feng
Siyu Ding
Chao Pang
...
Ouyang Xuan
Dianhai Yu
Hao Tian
Hua Wu
Haifeng Wang
05 Jul 2021
A Survey of Transformers
AI Open (AO), 2021
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
08 Jun 2021
Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Dian Yu
Kai Sun
Dong Yu
Claire Cardie
01 Feb 2021