Incorporating Context into Subword Vocabularies

13 October 2022
Shaked Yehezkel
Yuval Pinter

Papers citing "Incorporating Context into Subword Vocabularies"

12 / 12 papers shown
UniNet: A Unified Multi-granular Traffic Modeling Framework for Network Security
Binghui Wu
D. Divakaran
M. Gurusamy
57
0
0
06 Mar 2025
Linguistic Laws Meet Protein Sequences: A Comparative Analysis of Subword Tokenization Methods
Burak Suyunu
Enes Taylan
Arzucan Özgür
62
1
0
26 Nov 2024
From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan
Matanel Oren
Yuval Reif
Roy Schwartz
39
12
0
08 Oct 2024
Infusing clinical knowledge into tokenisers for language models
Abul Hasan
Jinge Wu
Quang Ngoc Nguyen
Salomé Andres
Imane Guellil
Huayu Zhang
Arlene Casey
Beatrice Alex
Bruce Guthrie
Honghan Wu
25
1
0
20 Jun 2024
PatternGPT: A Pattern-Driven Framework for Large Language Model Text Generation
Le Xiao
Xin Shan
19
4
0
02 Jul 2023
MaxMatch-Dropout: Subword Regularization for WordPiece
Tatsuya Hiraoka
27
8
0
09 Sep 2022
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson
M. Krass
Lucia Zheng
Neel Guha
Christopher D. Manning
Dan Jurafsky
Daniel E. Ho
AILaw
ELM
129
94
0
01 Jul 2022
Improving Tokenisation by Alternative Treatment of Spaces
Edward Gow-Smith
Harish Tayyar Madabushi
Carolina Scarton
Aline Villavicencio
19
20
0
08 Apr 2022
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
69
235
0
31 Dec 2020
Improving Multilingual Models with Language-Clustered Vocabularies
Hyung Won Chung
Dan Garrette
Kiat Chuan Tan
Jason Riesa
VLM
58
65
0
24 Oct 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,724
0
26 Sep 2016