Neural Machine Translation of Rare Words with Subword Units

31 August 2015
Rico Sennrich
Barry Haddow
Alexandra Birch

Papers citing "Neural Machine Translation of Rare Words with Subword Units"

50 / 3,808 papers shown
Tokenization Constraints in LLMs: A Study of Symbolic and Arithmetic Reasoning Limits
Xiang Zhang
Juntai Cao
Jiaqi Wei
Yiwei Xu
Chenyu You
LRM
14
0
0
20 May 2025
Byte Pair Encoding for Efficient Time Series Forecasting
Leon Götz
Marcel Kollovieh
Stephan Günnemann
Leo Schwinn
AI4TS
9
0
0
20 May 2025
FreeMesh: Boosting Mesh Generation with Coordinates Merging
Jian Liu
Haohan Weng
Biwen Lei
Xianghui Yang
Zibo Zhao
Zhuo Chen
Song Guo
Tao Han
Chunchao Guo
2
0
0
19 May 2025
ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models
Adrian Mirza
Nawaf Alampara
Martiño Ríos-García
Mohamed Abdelalim
Jack Butler
...
Mark Worrall
Adamo Young
Philippe Schwaller
Michael Pieler
Kevin Maik Jablonka
9
0
0
18 May 2025
Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning
Jingcheng Niu
Subhabrata Dutta
Ahmed Elshabrawy
Harish Tayyar Madabushi
Iryna Gurevych
24
0
0
16 May 2025
Rethinking Repetition Problems of LLMs in Code Generation
Yihong Dong
Yuchen Liu
Xue Jiang
Zhi Jin
Ge Li
24
0
0
15 May 2025
Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning
Shaurya Sharthak
Vinayak Pahalwan
Adithya Kamath
Adarsh Shirawalmath
CLL
VLM
45
0
0
14 May 2025
Qwen3 Technical Report
An Yang
A. Li
Baosong Yang
Beichen Zhang
Binyuan Hui
...
Zekun Wang
Zeyu Cui
Zhenru Zhang
Zhenhong Zhou
Zihan Qiu
LLMAG
OSLM
LRM
50
0
0
14 May 2025
Probability Consistency in Large Language Models: Theoretical Foundations Meet Empirical Discrepancies
Xiaoliang Luo
Xinyi Xu
Michael Ramscar
Bradley C. Love
30
0
0
13 May 2025
TiSpell: A Semi-Masked Methodology for Tibetan Spelling Correction covering Multi-Level Error with Data Augmentation
Yutong Liu
Feng Xiao
Ziyue Zhang
Yongbin Yu
Cheng Huang
...
Thupten Tsering
Cheng Huang
Gadeng Luosang
Renzeng Duojie
Nyima Tashi
31
0
0
12 May 2025
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments
Pranav Guruprasad
Yangyue Wang
Sudipta Chowdhury
Harshvardhan Sikka
LM&Ro
VLM
203
0
0
08 May 2025
DMRL: Data- and Model-aware Reward Learning for Data Extraction
Zhiqiang Wang
Ruoxi Cheng
31
0
0
07 May 2025
GIF: Generative Inspiration for Face Recognition at Scale
Saeed Ebrahimi
Sahar Rahimi
Ali Dabouei
Srinjoy Das
Jeremy M. Dawson
Nasser M. Nasrabadi
CVBM
198
0
0
05 May 2025
Data Augmentation With Back translation for Low Resource languages: A case of English and Luganda
Richard Kimera
DongNyeong Heo
Daniela N. Rim
Heeyoul Choi
167
0
0
05 May 2025
DNAZEN: Enhanced Gene Sequence Representations via Mixed Granularities of Coding Units
Lei Mao
Yuanhe Tian
Yan Song
23
0
0
04 May 2025
Parameter-Efficient Transformer Embeddings
Henry Ndubuaku
Mouad Talhi
29
0
0
04 May 2025
Demystifying optimized prompts in language models
Rimon Melamed
Lucas H. McCabe
H. H. Huang
44
0
0
04 May 2025
Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs
Dongxing Yu
34
0
0
03 May 2025
Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Haozheng Luo
Chenghao Qiu
Maojiang Su
Zhihan Zhou
Zoe Mehta
Guo Ye
Jerry Yao-Chieh Hu
Han Liu
AAML
55
1
0
01 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos
Róbert Csordás
Jürgen Schmidhuber
MoE
VLM
106
1
0
01 May 2025
Hierarchical Multi-Label Generation with Probabilistic Level-Constraint
Linqing Chen
Weilei Wang
Wentao Wu
Hanmeng Zhong
37
0
0
30 Apr 2025
Modes of Sequence Models and Learning Coefficients
Zhongtian Chen
Daniel Murfet
90
1
0
25 Apr 2025
Information Leakage of Sentence Embeddings via Generative Embedding Inversion Attacks
Antonios Tragoudaras
Theofanis Aslanidis
Emmanouil Georgios Lionis
Marina Orozco González
Panagiotis Eustratiadis
MIACV
SILM
66
0
0
23 Apr 2025
Tokenization Matters: Improving Zero-Shot NER for Indic Languages
Priyaranjan Pattnayak
Hitesh Laxmichand Patel
Amit Agarwal
30
0
0
23 Apr 2025
Compass-V2 Technical Report
Sophia Maria
MoE
LRM
41
0
0
22 Apr 2025
HYPEROFA: Expanding LLM Vocabulary to New Languages via Hypernetwork-Based Embedding Initialization
Enes Özeren
Yihong Liu
Hinrich Schütze
36
0
0
21 Apr 2025
Sparks of Science: Hypothesis Generation Using Structured Paper Data
Charles O'Neill
Tirthankar Ghosal
Roberta Răileanu
Mike Walmsley
Thang Bui
Kevin Schawinski
I. Ciucă
LRM
56
0
0
17 Apr 2025
FATE: A Prompt-Tuning-Based Semi-Supervised Learning Framework for Extremely Limited Labeled Data
Hezhao Liu
Yang Lu
Mengke Li
Yiqun Zhang
Shreyank N Gowda
Chen Gong
Hanzi Wang
34
0
0
14 Apr 2025
MorphTok: Morphologically Grounded Tokenization for Indian Languages
Maharaj Brahma
NJ Karthika
A. Singh
D. Adiga
Smruti Bhate
Ganesh Ramakrishnan
Rohit Saluja
Maunendra Sankar Desarkar
34
0
0
14 Apr 2025
UP-Person: Unified Parameter-Efficient Transfer Learning for Text-based Person Retrieval
Yating Liu
Yaowei Li
Xiangyuan Lan
Wenming Yang
Zimo Liu
Q. Liao
34
0
0
14 Apr 2025
Parameterized Synthetic Text Generation with SimpleStories
Lennart Finke
Chandan Sreedhara
Thomas Dooms
Mat Allen
Emerald Zhang
Juan Diego Rodriguez
Noa Nabeshima
Thomas Marshall
Dan Braun
SyDa
32
0
0
12 Apr 2025
SortBench: Benchmarking LLMs based on their ability to sort lists
Steffen Herbold
RALM
LRM
45
0
0
11 Apr 2025
Few-Shot Adaptation of Grounding DINO for Agricultural Domain
Rajhans Singh
Rafael Bidese Puhl
Kshitiz Dhakal
Sudhir Sornapudi
31
0
0
09 Apr 2025
RNN-Transducer-based Losses for Speech Recognition on Noisy Targets
Vladimir Bataev
35
0
0
09 Apr 2025
Llama-3-Nanda-10B-Chat: An Open Generative Large Language Model for Hindi
Monojit Choudhury
Shivam Chauhan
Rocktim Jyoti Das
Dhruv Sahnan
Xudong Han
...
Rituraj Joshi
Gurpreet Gosal
Avraham Sheinin
Natalia Vassilieva
Preslav Nakov
33
0
0
08 Apr 2025
High-Resource Translation: Turning Abundance into Accessibility
Abhiram Reddy Yanampally
24
0
0
08 Apr 2025
DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation
Xinglin Lyu
Wei Tang
Yong Li
X. Zhao
Ming Zhu
...
Yaojie Lu
Min Zhang
Daimeng Wei
Hao Yang
Min Zhang
78
0
0
07 Apr 2025
Generative Large Language Model usage in Smart Contract Vulnerability Detection
Peter Ince
Jiangshan Yu
Joseph K. Liu
Xiaoning Du
37
0
0
07 Apr 2025
Meta-DAN: towards an efficient prediction strategy for page-level handwritten text recognition
Denis Coquenet
AI4TS
41
0
0
04 Apr 2025
Limitations of Religious Data and the Importance of the Target Domain: Towards Machine Translation for Guinea-Bissau Creole
Jacqueline Rowe
Edward Gow-Smith
Mark Hepple
49
0
0
03 Apr 2025
Grammar-based Ordinary Differential Equation Discovery
Karin L. Yu
Eleni Chatzi
Georgios Kissas
45
0
0
03 Apr 2025
Enhancing Embedding Representation Stability in Recommendation Systems with Semantic ID
Carolina Zheng
Minhui Huang
Dmitrii Pedchenko
Kaushik Rangadurai
S. Wang
...
Yiping Han
Lin Yang
Hangjun Xu
Rong Jin
Shuang Yang
38
0
0
02 Apr 2025
From Smør-re-brød to Subwords: Training LLMs on Danish, One Morpheme at a Time
Mikkel Wildner Kildeberg
Emil Allerslev Schledermann
Nicolaj Larsen
Rob van der Goot
35
0
0
02 Apr 2025
Overcoming Vocabulary Constraints with Pixel-level Fallback
Jonas F. Lotz
Hendra Setiawan
Stephan Peitz
Yova Kementchedjhieva
43
0
0
02 Apr 2025
Tokenization of Gaze Data
Tim Rolff
Jurik Karimian
Niklas Hypki
S. Schmidt
Markus Lappe
Frank Steinicke
41
0
0
28 Mar 2025
UGen: Unified Autoregressive Multimodal Model with Progressive Vocabulary Learning
Hongxuan Tang
Hao Liu
Xinyan Xiao
45
1
0
27 Mar 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Yangyang Meng
Jinpeng Li
Guodong Lin
Yu Pu
G. Wang
Hu Du
Zhiming Shao
Yukai Huang
Ke Li
Wei-Qiang Zhang
ObjD
101
0
0
26 Mar 2025
Cross-Tokenizer Distillation via Approximate Likelihood Matching
Benjamin Minixhofer
Ivan Vulić
Edoardo Ponti
199
0
0
25 Mar 2025
Understanding and Improving Information Preservation in Prompt Compression for LLMs
Weronika Łajewska
Momchil Hardalov
Laura Aina
Neha Anna John
Hang Su
Lluís Marquez
65
0
0
24 Mar 2025
Payload-Aware Intrusion Detection with CMAE and Large Language Models
Yongcheol Kim
Chanjae Lee
Young Yoon
49
0
0
23 Mar 2025