Representative Literature
Enriching the Transfer Learning with Pre-Trained Lexicon Embedding for Low-Resource Neural Machine Translation
Abstract:
Most State-Of-The-Art (SOTA) Neural Machine Translation (NMT) systems today achieve outstanding results based only on large parallel corpora. Large-scale parallel corpora for high-resource languages are easily obtainable. However, the translation quality of NMT for morphologically rich languages is still unsatisfactory, mainly because of the data sparsity problem encountered in Low-Resource Languages (LRLs). In the low-resource NMT paradigm, Transfer Learning (TL) has developed into one of the most efficient methods. However, it is difficult for a model trained on high-resource languages to include the information of both parent and child models, since the initially trained model contains only the lexicon features and word embeddings of the parent model rather than those of the child languages. In this work, we aim to address this issue by proposing the language-independent Hybrid Transfer Learning (HTL) method for LRLs, which shares lexicon embeddings between parent and child languages without leveraging back translation or manually injecting noise. First, we train the parent model on the High-Resource Languages (HRLs) with their vocabularies. Then, we combine the parent and child language pairs using an oversampling method to train a hybrid model initialized by the previously trained parent model. Finally, we fine-tune the morphologically rich child model using the hybrid model. In addition, we report some interesting findings about the original TL approach. Experimental results show that our model consistently outperforms five SOTA methods on two low-resource languages, Azerbaijani (Az) and Uzbek (Uz). Meanwhile, our approach is practical and significantly better, achieving improvements of up to 4.94 and 4.84 BLEU points for the low-resource child language pairs Az → Zh and Uz → Zh, respectively.
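The abstract describes a three-step pipeline: train a parent model on the high-resource pair, train a hybrid model on the combined parent and oversampled child data initialized from the parent, then fine-tune on the child pair. Below is a minimal, runnable Python sketch of that data flow; the function names, stub bodies, and corpus sizes are hypothetical illustrations, not the authors' implementation.

import random

# Hedged sketch of the Hybrid Transfer Learning (HTL) pipeline from the
# abstract. train_nmt/fine_tune are placeholder stubs that only record
# the data flow; they are NOT the authors' code.

def oversample(corpus, target_size):
    # Step 2 helper: repeat sentence pairs from the small child corpus
    # until it roughly matches the parent corpus size.
    return [random.choice(corpus) for _ in range(target_size)]

def train_nmt(corpus, init=None):
    # Stub standing in for NMT training on `corpus`, optionally
    # initialized from a previously trained model (so lexicon embeddings
    # are shared when `init` is the parent model).
    return {"train_size": len(corpus), "init_from": init}

def fine_tune(model, corpus):
    # Stub standing in for fine-tuning `model` on the child pair.
    return {"train_size": len(corpus), "init_from": model}

# Hypothetical corpora of (source, target) sentence pairs.
parent_corpus = [("hrl-src", "zh-tgt")] * 100_000  # high-resource pair
child_corpus = [("az-src", "zh-tgt")] * 1_000      # low-resource pair

# Step 1: train the parent model on the high-resource pair.
parent_model = train_nmt(parent_corpus)

# Step 2: combine parent data with the oversampled child data and train
# the hybrid model, initialized by the previously trained parent model.
hybrid_corpus = parent_corpus + oversample(child_corpus, len(parent_corpus))
hybrid_model = train_nmt(hybrid_corpus, init=parent_model)

# Step 3: fine-tune on the child pair alone, starting from the hybrid
# model, to obtain the final low-resource child model.
child_model = fine_tune(hybrid_model, child_corpus)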
Keywords:
Authors:
Mieradilijiang Maimaiti; Yang Liu; Huanbo Luan; Maosong Sun
Author Affiliations:
Institute for Artificial Intelligence, Beijing National Research Center for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; Beijing Academy of Artificial Intelligence, Beijing Advanced Innovation Center for Language Resources, Beijing 100084, China
Citation:
[1] Mieradilijiang Maimaiti, Yang Liu, Huanbo Luan, Maosong Sun. Enriching the Transfer Learning with Pre-Trained Lexicon Embedding for Low-Resource Neural Machine Translation [J]. Tsinghua Science and Technology, 2022(01): 150-163.
Category A:
Enriching,Trained,LRLs,HRLs,vocabularies,Azerbaijani,Uzbek,Uz
Category B:
Transfer,Learning,Pre,Lexicon,Embedding,Low,Resource,Neural,Machine,Translation,Most,State,Of,Art,SOTA,NMT,systems,today,achieve,outstanding,results,only,large,parallel,corpora,scale,high,resource,languages,easily,obtainable,However,translation,quality,morphologically,still,unsatisfactory,mainly,because,data,sparsity,problem,encountered,Languages,In,low,paradigm,has,been,developed,into,one,most,efficient,methods,It,difficult,include,information,both,parent,child,models,well,initially,trained,that,contains,lexicon,features,word,embeddings,instead,this,work,aim,address,issue,by,proposing,independent,Hybrid,HTL,sharing,between,without,leveraging,back,manually,injecting,noises,First,High,its,Then,combine,pairs,using,oversampling,hybrid,initialized,previously,Finally,fine,tune,Besides,explore,some,exciting,discoveries,original,approach,Experimental,show,consistently,outperforms,five,two,Meanwhile,practical,significantly,better,achieving,improvements,up,BLEU,points,Zh,respectively
AB value:
0.489902
Similar Literature
Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach
Hui Yu; Xin-Yu Jiang; Jin Zhao; Hao Qi; Yu Zhang; Xiao-Fei Liao; Hai-Kun Liu; Fu-Bing Mao; Hai Jin - National Engineering Research Center for Big Data Technology and System, Huazhong University of Science and Technology, Wuhan 430074, China; Service Computing Technology and System Laboratory, Huazhong University of Science and Technology, Wuhan 430074, China; Cluster and Grid Computing Laboratory, Huazhong University of Science and Technology, Wuhan 430074, China; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China