ResearchTrend.AI
Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling
arXiv:1804.00857 · 3 April 2018
Tao Shen, Dinesh Manocha, Guodong Long, Jing Jiang, Chengqi Zhang · HAI

Papers citing "Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling"

20 / 20 papers shown
Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner · 27 Apr 2022 · 3DV

One Model Packs Thousands of Items with Recurrent Conditional Query Learning
Dongda Li, Zhaoquan Gu, Yuexuan Wang, Changwei Ren, F. Lau · 12 Nov 2021

IBERT: Idiom Cloze-style reading comprehension with Attention
Ruiyang Qin, Haozheng Luo, Zheheng Fan, Ziang Ren · 05 Nov 2021 · AIMat

FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao, Suvinay Subramanian, Gaurav Agrawal, Amir Yazdanbakhsh, T. Krishna · 13 Jul 2021

Improving Long-Tail Relation Extraction with Collaborating Relation-Augmented Attention
Yongqian Li, Tao Shen, Guodong Long, Jing Jiang, Dinesh Manocha, Chengqi Zhang · 08 Oct 2020

BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes
Xueping Peng, Guodong Long, Tao Shen, Sen Wang, Jing Jiang, Chengqi Zhang · 24 Sep 2020

Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler · 30 Jun 2020

Multi-Label Text Classification using Attention-based Graph Neural Network
Ankit Pal, M. Selvakumar, Malaikannan Sankarasubbu · 22 Mar 2020

Sparse Sinkhorn Attention
Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan · 26 Feb 2020

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Alessandro Raganato, Yves Scherrer, Jörg Tiedemann · 24 Feb 2020

Omni-Scale CNNs: a simple and effective kernel size configuration for time series classification
Wensi Tang, Guodong Long, Lu Liu, Dinesh Manocha, Michael Blumenstein, Jing Jiang · 24 Feb 2020 · AI4TS

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut · 26 Sep 2019 · SSL, AIMat

Convolutional Self-Attention Networks
Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu · 05 Apr 2019

Dialogue Act Classification with Context-Aware Self-Attention
Vipul Raheja, Joel R. Tetreault · 04 Apr 2019

Cross-lingual transfer learning for spoken language understanding
Q. Do, Judith Gaspers · 03 Apr 2019

Multi-Head Attention with Disagreement Regularization
Jian Li, Zhaopeng Tu, Baosong Yang, Michael R. Lyu, Tong Zhang · 24 Oct 2018

Dynamic Self-Attention: Computing Attention over Words Dynamically for Sentence Embedding
Deunsol Yoon, Dongbok Lee, SangKeun Lee · 22 Aug 2018

Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together
Tao Shen, Dinesh Manocha, Guodong Long, Jing Jiang, Chengqi Zhang · 02 May 2018

Neural Semantic Encoders
Tsendsuren Munkhdalai, Hong-ye Yu · 14 Jul 2016

Convolutional Neural Networks for Sentence Classification
Yoon Kim · 25 Aug 2014 · AILaw, VLM