
Full metadata record

dc.contributor.advisor: 강윤철
dc.contributor.author: Liu, Jing
dc.creator: Liu, Jing
dc.date.accessioned: 2023-02-24T16:31:20Z
dc.date.available: 2023-02-24T16:31:20Z
dc.date.issued: 2023
dc.identifier.other: OAK-000000201867
dc.identifier.uri: https://dcollection.ewha.ac.kr/common/orgView/000000201867 (en_US)
dc.identifier.uri: https://dspace.ewha.ac.kr/handle/2015.oak/264491
dc.description.abstract: Cryptocurrency is a digital asset that fluctuates drastically in a 24-hour market; although the risk is high, it has become a popular and potentially profitable trading instrument. Machine-learning-based artificial intelligence (AI) tools such as trading bots have recently received much attention, and several studies have applied machine learning approaches to financial market trend prediction or trading decision-making. Although these studies have achieved profitable performance, limitations remain because raw data alone is a weak basis for predicting a dynamic market, particularly the cryptocurrency market. To address these issues, this deep reinforcement learning ensemble approach uses multi-resolution candlestick images, which contain both temporal and spatial price information, as the input data. The multi-resolution candlestick image used in this study is composed of three candlestick charts with timespans of 15 minutes, 30 minutes, and 2 hours. The purpose of this study is to train trading models that generate optimal trading signals based on candlestick images and to compare the performance of raw data with candlestick image data.

Reinforcement learning is an approach in which one or more agents explore an environment, receiving observations and rewards; the goal of an agent is to maximize the cumulative reward over an episode. Deep Q-Networks (DQN) combine deep learning with reinforcement learning and can learn from nonlinear image data. Dueling Deep Q-Networks (Dueling-DQN) is an improved DQN architecture that also learns the advantage of the selected action. Proximal Policy Optimization (PPO) is a policy-based algorithm that optimizes the policy directly. In this study, the deep reinforcement learning algorithms DQN, Dueling-DQN, and PPO are applied to generate trading signals: open a long or short position, close a position, or stay idle. An ensemble approach that combines the weighted results of multiple agents is employed to enhance robustness. In the proposed approach, illegal actions are handled by action masking, which replaces the Q-values of illegal actions with a large negative value during action selection and learning to speed up model convergence. To extract more relevant features in both the spatial and channel dimensions, the CBAM attention mechanism is applied in the CNN backbone. In the agent ensemble, the trading signal is generated based on the Sortino ratio from the validation process: the agents whose Sortino ratios rank in the top three are selected, and the decision is made by weighted voting.

In the experiments, all three deep reinforcement learning models shared the same network architecture, activation function, and optimizer. We simulated the ensemble automated trading approach with OpenAI Gym and tested it on two BTC/USDT datasets, a 30-day bullish market and a 15-day bearish market. Two branches of baselines are used for comparison: models using single-vector or multi-resolution raw data, and the buy-and-hold, random-policy, and heuristic-policy strategies. The evaluation metrics are cumulative return, volatility, Sortino ratio, trading coverage, and maximum drawdown. The results show that models using candlestick image data outperform the raw-data models and the baselines even when transaction costs are considered. We conclude that PPO with candlestick image data attains the best average performance among the models. The proposed study indicates that, with candlestick image data, the results learned by deep reinforcement learning algorithms are explainable and convincing. This study is expected to inspire research on trading decision-making using candlestick image data, and it has the potential to be applied in a trading-signal recommendation application.

Abstract (in Korean): Cryptocurrency is a digital asset subject to drastic fluctuations; its risk is high, but so is its profitability. Recently, machine-learning-based artificial intelligence (AI) tools such as trading bots have drawn attention in financial markets, and many studies have applied various machine learning algorithms to market trend prediction or trading decision-making. However, while existing studies focus mainly on maximizing profitability, predicting the cryptocurrency market from raw input data alone is still limited by the dynamic nature of the market. To address this problem, this study proposes using multi-resolution candlestick images, which contain the spatio-temporal information of prices, as the input data for deep reinforcement learning algorithms. In particular, the candlestick image data themselves, rather than raw data, are used for learning, and the results are compared with existing methods. In addition, this study employs several deep reinforcement learning algorithms, namely DQN, Dueling-DQN, and PPO, and also considers the actions of closing a position and staying idle; this resolves the unrealistic action settings of earlier reinforcement learning studies, which either omitted the close action or closed positions in a rule-based manner. Furthermore, an ensemble method that combines the decisions of multiple agents is used to improve the robustness of the model. For the experiments, two datasets were drawn from Binance BTC/USDT historical data: a 30-day uptrend and a 15-day downtrend. The results show that models using candlestick image data achieve better performance than models using only raw data and the other baselines, even when transaction costs are considered. Finally, by using the attention mechanism to visualize the actions selected by the reinforcement learning algorithms on the candlestick charts, the explainability of the multi-agent decisions is improved.
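The action-masking step described in the abstract can be illustrated with a minimal sketch. The action encoding, the function name, and the mask construction below are illustrative assumptions rather than the thesis implementation; only the core idea of overwriting the Q-values of illegal actions with a large negative value before the argmax follows the text.

```python
import numpy as np

# Illustrative action indices; the actual encoding in the thesis may differ.
ACTIONS = {"open_long": 0, "open_short": 1, "close": 2, "idle": 3}
MASK_VALUE = -1e9  # large negative value standing in for an illegal action

def masked_greedy_action(q_values: np.ndarray, legal_mask: np.ndarray) -> int:
    """Pick the greedy action after overwriting Q-values of illegal actions.

    q_values   : shape (n_actions,), raw Q-value estimates from the network
    legal_mask : shape (n_actions,), True where the action is legal in the
                 current position state (e.g. 'close' is illegal when no
                 position is open)
    """
    masked_q = np.where(legal_mask, q_values, MASK_VALUE)
    return int(np.argmax(masked_q))

# Example: no open position, so only opening a position or staying idle is legal.
q = np.array([0.12, -0.05, 0.30, 0.01])
legal = np.array([True, True, False, True])
print(masked_greedy_action(q, legal))  # -> 0 (open_long); 'close' is masked out
```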
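The Sortino-ratio-weighted, top-3 voting used for the agent ensemble can likewise be sketched. The helper names, the zero target return, and the clipping of negative weights to zero are assumptions made for illustration, not details given in the abstract.

```python
import numpy as np

def sortino_ratio(returns: np.ndarray, target: float = 0.0) -> float:
    """Mean excess return over the target divided by downside deviation."""
    excess = returns - target
    downside = excess[excess < 0]
    downside_dev = np.sqrt(np.mean(downside ** 2)) if downside.size else 1e-8
    return float(np.mean(excess) / downside_dev)

def ensemble_action(agent_actions, validation_returns, n_actions=4, top_k=3):
    """Weighted vote over the top-k agents ranked by validation Sortino ratio."""
    scores = np.array([sortino_ratio(r) for r in validation_returns])
    top = np.argsort(scores)[::-1][:top_k]              # indices of best agents
    votes = np.zeros(n_actions)
    for i in top:
        votes[agent_actions[i]] += max(scores[i], 0.0)  # Sortino ratio as weight
    return int(np.argmax(votes))

# Example: three agents propose actions 0, 2, 0 with different validation returns.
acts = [0, 2, 0]
rets = [np.array([0.01, -0.004, 0.02]),
        np.array([0.005, -0.01, 0.003]),
        np.array([0.02, -0.002, 0.015])]
print(ensemble_action(acts, rets))  # action with the largest Sortino-weighted vote
```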
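One way the 15-minute, 30-minute, and 2-hour views of the same price series could be produced is by resampling finer-grained OHLCV bars with pandas; this is a sketch under that assumption, with toy data and column names invented for illustration, and it omits the step of rendering each frame as a candlestick chart and stacking the charts into the multi-resolution image.

```python
import pandas as pd

# Hypothetical 1-minute OHLCV frame indexed by timestamp.
ohlcv_1m = pd.DataFrame(
    {"open": [100, 101, 102, 103], "high": [101, 102, 103, 104],
     "low": [99, 100, 101, 102], "close": [101, 102, 103, 104],
     "volume": [5, 6, 4, 7]},
    index=pd.date_range("2022-01-01", periods=4, freq="1min"),
)

AGG = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}

def resample_ohlcv(df: pd.DataFrame, rule: str) -> pd.DataFrame:
    """Aggregate fine-grained OHLCV bars into a coarser timespan."""
    return df.resample(rule).agg(AGG).dropna()

# One frame per resolution; each would then be drawn as a candlestick chart.
frames = {rule: resample_ohlcv(ohlcv_1m, rule) for rule in ("15min", "30min", "2h")}
```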
dc.description.tableofcontents:
Ⅰ. Introduction 1
Ⅱ. Related Work 4
  A. Financial Data 4
  B. Machine Learning Approach 6
  C. Ensemble Approach 8
Ⅲ. Research Methodology 9
  A. Problem Description 9
  B. Input Data 10
    1. Candlesticks Image Data 10
    2. Raw Data 12
  C. Deep Reinforcement Learning 13
    1. Deep Q-Networks (DQN) 13
    2. Dueling Deep Q-Network (Dueling-DQN) 14
    3. Proximal Policy Optimization (PPO) 15
  D. Reward Function and Legal Actions 16
    1. Reward Function 16
    2. Action Masking 18
  E. Attention Mechanism 19
  F. Agent Ensemble 20
Ⅳ. Experiments and Results 23
  A. Experiments 23
    1. Datasets 23
    2. Network Architecture 24
    3. Evaluation Metrics 26
    4. Baselines 27
  B. Results and Discussion 28
    1. Results on 30-day Datasets (Bullish Market) 29
    2. Results on 15-day Datasets (Bearish Market) 35
    3. Discussions on the Issue of Choosing Buy and Hold 39
    4. Image Attention 42
Ⅴ. Conclusions 45
Reference 47
Appendix 1. Details of Balance Cumulative Returns for Different Models and Baselines 52
Abstract (in Korean) 53
dc.format: application/pdf
dc.format.extent: 2462002 bytes
dc.language: eng
dc.publisher: 이화여자대학교 대학원 (Ewha Womans University Graduate School)
dc.subject.ddc: 005.7
dc.title: An Automated Cryptocurrency Trading Approach Using Ensemble Deep Reinforcement Learning
dc.type: Master's Thesis
dc.title.subtitle: Learn to Understand Candlesticks
dc.title.translated: Candlestick 이미지 정보 및 심층강화학습을 이용한 가상화폐 자동매매 앙상블 기법
dc.format.page: vi, 54 p.
dc.identifier.thesisdegree: Master
dc.identifier.major: 대학원 빅데이터분석학협동과정 (Graduate School, Interdisciplinary Program in Big Data Analytics)
dc.date.awarded: 2023. 2
Appears in Collections:
Graduate School > Interdisciplinary Program in Big Data Analytics > Theses_Master