Recurrent Stacking of Layers for Compact Neural Machine Translation Models

14 July 2018
Raj Dabre
Atsushi Fujita

Papers citing "Recurrent Stacking of Layers for Compact Neural Machine Translation Models"

22 / 22 papers shown
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
84
5
0
28 Oct 2024
The Tunnel Effect: Building Data Representations in Deep Neural Networks
Wojciech Masarczyk
M. Ostaszewski
Ehsan Imani
Razvan Pascanu
Piotr Miłoś
Tomasz Trzciński
41
19
0
31 May 2023
An Overview on Language Models: Recent Developments and Outlook
Chengwei Wei
Yun Cheng Wang
Bin Wang
C.-C. Jay Kuo
35
42
0
10 Mar 2023
FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition
Xingcheng Song
Di Wu
Binbin Zhang
Zhiyong Wu
Wenpeng Li
...
Peng Zhang
Zhendong Peng
Fuping Pan
Changbao Zhu
Zhongqin Wu
29
2
0
31 Oct 2022
On the optimization and generalization of overparameterized implicit neural networks
Tianxiang Gao
Hongyang Gao
MLT
AI4CE
19
3
0
30 Sep 2022
Stable Invariant Models via Koopman Spectra
Takuya Konishi
Yoshinobu Kawahara
23
3
0
15 Jul 2022
Streaming parallel transducer beam search with fast-slow cascaded encoders
Jay Mahadeokar
Yangyang Shi
Ke Li
Duc Le
Jiedan Zhu
Vikas Chandra
Ozlem Kalinli
M. Seltzer
37
15
0
29 Mar 2022
A Survey on Model Compression and Acceleration for Pretrained Language Models
Canwen Xu
Julian McAuley
23
58
0
15 Feb 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
J. Tan
Y. Tan
C. Chan
Joon Huang Chuah
VLM
ViT
31
15
0
11 Feb 2022
Deep Equilibrium Models for Video Snapshot Compressive Imaging
Yaping Zhao
Siming Zheng
Xin Yuan
53
19
0
18 Jan 2022
A global convergence theory for deep ReLU implicit networks via over-parameterization
Tianxiang Gao
Hailiang Liu
Jia Liu
Hridesh Rajan
Hongyang Gao
MLT
36
16
0
11 Oct 2021
Speeding up Deep Model Training by Sharing Weights and Then Unsharing
Shuo Yang
Le Hou
Xiaodan Song
Qiang Liu
Denny Zhou
110
9
0
08 Oct 2021
IndicBART: A Pre-trained Model for Indic Natural Language Generation
Raj Dabre
Himani Shrotriya
Anoop Kunchukuttan
Ratish Puduppully
Mitesh M. Khapra
Pratyush Kumar
49
70
0
07 Sep 2021
Deep Equilibrium Architectures for Inverse Problems in Imaging
Davis Gilton
Greg Ongie
Rebecca Willett
49
181
0
16 Feb 2021
Implicit Feature Pyramid Network for Object Detection
Tiancai Wang
Xinming Zhang
Jian Sun
ObjD
13
27
0
25 Dec 2020
Cascaded encoders for unifying streaming and non-streaming ASR
A. Narayanan
Tara N. Sainath
Ruoming Pang
Jiahui Yu
Chung-Cheng Chiu
Rohit Prabhavalkar
Ehsan Variani
Trevor Strohman
AuLLM
8
85
0
27 Oct 2020
Softmax Tempering for Training Neural Machine Translation Models
Raj Dabre
Atsushi Fujita
28
11
0
20 Sep 2020
Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation
Jungo Kasai
Nikolaos Pappas
Hao Peng
James Cross
Noah A. Smith
41
134
0
18 Jun 2020
An Overview of Neural Network Compression
James O'Neill
AI4CE
45
98
0
05 Jun 2020
Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation
Po-Han Chi
Pei-Hung Chung
Tsung-Han Wu
Chun-Cheng Hsieh
Yen-Hao Chen
Shang-Wen Li
Hung-yi Lee
SSL
9
147
0
18 May 2020
Heart Sound Segmentation using Bidirectional LSTMs with Attention
Tharindu Fernando
H. Ghaemmaghami
Simon Denman
Sridha Sridharan
Nayyar Hussain
Clinton Fookes
26
63
0
02 Apr 2020
Sharing Attention Weights for Fast Transformer
Tong Xiao
Yinqiao Li
Jingbo Zhu
Zhengtao Yu
Tongran Liu
17
50
0
26 Jun 2019