DSpace at EWHA: Understanding Deep Neural Network for images and texts from a statistical point of view

Title: Understanding Deep Neural Network for images and texts from a statistical point of view

Authors: 이하경

Issue Date: 2020

Department/Major: Graduate School, Department of Statistics

Publisher: Graduate School, Ewha Womans University

Degree: Master

Advisors: 송종우

Abstract: Deep learning is a machine learning (ML) approach that extracts features from large amounts of data through non-linear transformations. Several main architectures are now widely used for supervised and unsupervised learning in many fields. For example, the Convolutional Neural Network (CNN) has been the dominant technique for images since 2012, and the Recurrent Neural Network (RNN), another prominent architecture, is well suited to sequential data such as text and time series. For users considering deep learning models for real-world applications, Keras is a popular neural-network API written in Python that can also be used from R. In this research, we first examine the parameter estimation procedure of deep neural networks from a statistical point of view, from the basics to advanced techniques. We then apply the two main architectures to supervised learning tasks on real data. We describe the structure of a CNN for image classification and build models on two benchmark image datasets, MNIST and CIFAR10. We identify some steps that are crucial to CNN performance: stacking several convolutional layers and adding batch normalization improved the prediction accuracy on CIFAR10. We also compare the classification performance of the CNNs with that of other ML methods, including K-NN, Random Forest, and XGBoost, on both datasets.

Next, we study recurrent networks, which differ from other networks in how they handle sequential data. We consider Neural Machine Translation (NMT) with RNNs, one of the main recent tasks in Natural Language Processing. We summarize the fundamental structures of recurrent networks and methods for representing natural-language words as meaningful numeric vectors, and we combine these topics into the full estimation procedure, from representing the input source sequence to predicting the translated target sequence. Finally, we apply multiple translation models to English-Korean sentence pairs from two different corpora, Colloquialism and News.
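The abstract credits batch normalization with part of the CIFAR10 improvement. As a minimal sketch (not the thesis's own code), the NumPy function below shows what a batch-normalization layer computes in training mode on a channels-last convolutional feature map. The function name `batch_norm_train`, the toy tensor, and `eps=1e-3` (chosen to match the default epsilon of Keras's `BatchNormalization` layer) are illustrative assumptions.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization for a channels-last feature map.

    x has shape (N, H, W, C); gamma and beta have shape (C,).
    Each channel is standardized with the mean and (biased) variance
    computed over the batch and both spatial axes, then rescaled by the
    learnable gamma and shifted by the learnable beta.
    """
    axes = (0, 1, 2)                          # batch, height, width
    mean = x.mean(axis=axes)                  # per-channel mean, shape (C,)
    var = x.var(axis=axes)                    # per-channel biased variance, shape (C,)
    x_hat = (x - mean) / np.sqrt(var + eps)   # broadcasts over the channel axis
    return gamma * x_hat + beta


# Toy "convolution output": 8 images, 4x4 grid, 3 channels, far from zero mean / unit scale.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 4, 3))
y = batch_norm_train(x, gamma=np.ones(3), beta=np.zeros(3))

print(y.mean(axis=(0, 1, 2)))  # close to 0 for every channel
print(y.std(axis=(0, 1, 2)))   # close to 1 for every channel (slightly below, because of eps)
```

In a Keras model, a `BatchNormalization` layer placed after a `Conv2D` layer performs this computation during training; at inference it uses moving averages of the batch statistics instead. Keeping each channel on a stable scale is one commonly offered explanation for why deeper stacks of convolutional layers train more easily with it.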
The translation models are built with GRUs, a type of RNN, on about 26,000 sentences drawn from these corpora. We identified several factors that influence the quality of training: the loss decreased when we added more hidden units to the recurrent layers or used a bidirectional RNN in the encoder. We also computed BLEU scores to measure translation performance and, for comparison, report the scores of Google Translate and Naver Papago on the same test sentences. We summarize the difficulties of training a proper translation model and of handling the Korean language. Since we used Keras in Python for every step, from processing raw data to evaluating models, we also summarize some useful functions and libraries.
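The abstract reports BLEU scores but does not say which implementation or smoothing was used. As a hedged illustration of the standard definition, the function below computes unsmoothed sentence-level BLEU: clipped (modified) n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. The names `ngram_counts` and `sentence_bleu` and the toy sentences are hypothetical, not taken from the thesis.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(references, hypothesis, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights 1/max_n.

    references: list of reference token lists; hypothesis: token list.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hypothesis, n)
        # Clip each hypothesis n-gram by the largest count it has in any single reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if clipped == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += math.log(clipped / total) / max_n

    # Brevity penalty: compare with the reference length closest to the hypothesis length.
    c = len(hypothesis)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


reference = "the cat sat on the mat".split()
print(sentence_bleu([reference], reference))                          # 1.0
print(round(sentence_bleu([reference], "the cat sat on".split()), 4))  # 0.6065
```

The second call matches every n-gram but is two tokens short, so its score is just the brevity penalty exp(1 - 6/4). Scores depend strongly on tokenization, which matters for Korean targets (whitespace tokens versus morphemes yield different n-gram matches), and corpus-level BLEU, which pools n-gram counts over the whole test set, is not the same as averaging these sentence scores.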