Understanding recurrent neural network for texts using English-Korean corpora

Title
Understanding recurrent neural network for texts using English-Korean corpora
Authors
Lee H.; Song J.
Ewha Authors
송종우
SCOPUS Author ID
송종우
Issue Date
2020
Journal Title
Communications for Statistical Applications and Methods
ISSN
2287-7843
Citation
Communications for Statistical Applications and Methods vol. 27, no. 3, pp. 313 - 326
Keywords
Keras; Neural machine translation; NLP; RNN; Seq2Seq
Publisher
Korean Statistical Society
Indexed
SCOPUS; KCI
Document Type
Article
Abstract
Deep learning is a key driver of the development of artificial intelligence (AI). Neural networks come in several distinct architectures, such as MLP, CNN, and RNN. Among them, we study the recurrent neural network (RNN), which differs from the other networks in how it handles sequential data, including time series and texts. As one of the main recent tasks in natural language processing (NLP), we consider neural machine translation (NMT) using RNNs. We also summarize the fundamental structures of recurrent networks and some topics in representing natural-language words as numeric vectors. We organize these topics to follow the estimation procedure from representing input source sequences to predicting translated target sequences. In addition, we apply multiple translation models with Gated Recurrent Units (GRUs) in Keras to English-Korean sentences, about 26,000 sentence pairs in total from two different corpora, colloquial speech and news. We verified some crucial factors that influence the quality of training. We found that the loss decreases with more recurrent dimensions and with a bidirectional RNN in the encoder when dealing with short sequences. We also computed BLEU scores, the main measure of translation performance, and compared them with the scores of Google Translate on the same test sentences. We summarize some difficulties in training a proper translation model and in dealing with the Korean language. Using Keras in Python for all tasks, from processing raw texts to evaluating the translation model, also allows us to include some useful functions and vocabulary libraries. © 2020 The Korean Statistical Society, and Korean International Statistical Society.
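For readers unfamiliar with the BLEU score mentioned in the abstract, the following is a minimal sketch of a sentence-level, single-reference BLEU with uniform weights up to 4-grams and no smoothing. This record does not state which BLEU variant or tooling the authors used, so this is an illustration under those assumptions, not the paper's procedure; the function names `ngram_counts` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Single-reference BLEU with uniform weights and no smoothing (assumed variant)."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clipped matches: a candidate n-gram is credited at most as often
        # as it occurs in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or matched == 0:
            # Without smoothing, one zero precision makes the whole score zero.
            return 0.0
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
score = sentence_bleu(reference, candidate)
```

Corpus-level BLEU, as usually reported for translation systems, pools the n-gram counts over all test sentences before taking the geometric mean, so it generally differs from an average of sentence-level scores like the one above.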
DOI
10.29220/CSAM.2020.27.3.313
Appears in Collections:
College of Natural Sciences > Statistics > Journal papers