DSpace at EWHA: Development of machine learning-based algorithm for monitoring pre-convective environments using a geostationary imager

Title: Development of machine learning-based algorithm for monitoring pre-convective environments using a geostationary imager

Authors: 이연진

Issue Date: 2022

Department/Major: Graduate School, Department of Atmospheric Science and Engineering

Publisher: The Graduate School, Ewha Womans University

Degree: Doctor

Advisors: 안명환

Abstract: In remote sensing, total precipitable water (TPW) and convective available potential energy (CAPE) are commonly retrieved from geostationary satellite observations with a physical method based on a one-dimensional variational system. This method is accurate, but it depends heavily on information from numerical weather prediction models, and its high computational load limits the temporal and spatial resolution of the products. Machine learning methods are widely used for fast and reliable estimation. In this study, therefore, an algorithm based on an artificial neural network (ANN) was developed to retrieve clear-sky TPW and CAPE from radiances measured in the infrared channels of the Advanced Meteorological Imager (AMI) aboard Korea’s geostationary meteorological satellite. The method directly models the nonlinear relationship between the inputs and the retrieved parameters and, unlike the physical method, can fully exploit the much improved spatiotemporal resolution of the observations. The learning datasets use nine AMI infrared brightness temperatures (BTs), six dual-channel differences, temporal and geographic information, and the satellite zenith angle as input variables, with TPW and CAPE from the ECMWF reanalysis (ERA5) as the corresponding target values. To optimize the hyperparameters of the ANN model, sensitivity tests were conducted on the number of hidden layers, the number of hidden neurons, and the activation function. Other settings, including the optimizer, batch size, and learning rate, were tuned for the best performance on the learning dataset. The influence of each input variable on the model output was also investigated to quantify feature attribution.
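As an illustration of the input variables listed above, the following minimal sketch assembles one input vector in Python. It is not the thesis implementation. The function name `build_features`, the channel pairs in `DIFF_PAIRS`, and the cyclic encoding of time are all assumptions, since the abstract does not specify which channel differences are used or how the temporal information is encoded.

```python
import math
from datetime import datetime

# Hypothetical index pairs for the six dual-channel differences;
# the abstract does not say which channel pairs are used.
DIFF_PAIRS = [(0, 1), (1, 2), (3, 5), (5, 6), (6, 7), (7, 8)]


def build_features(bt, when, lat, lon, sza_deg):
    """Assemble one input vector from nine brightness temperatures (K)."""
    if len(bt) != 9:
        raise ValueError("expected nine infrared brightness temperatures")
    # Six dual-channel brightness temperature differences.
    diffs = [bt[i] - bt[j] for i, j in DIFF_PAIRS]
    # Cyclic encoding of day of year and hour of day (an assumed choice).
    doy = when.timetuple().tm_yday
    hour = when.hour + when.minute / 60.0
    time_feats = [
        math.sin(2 * math.pi * doy / 365.25),
        math.cos(2 * math.pi * doy / 365.25),
        math.sin(2 * math.pi * hour / 24.0),
        math.cos(2 * math.pi * hour / 24.0),
    ]
    # Order: 9 BTs, 6 differences, 4 time terms, latitude, longitude, zenith angle.
    return list(bt) + diffs + time_feats + [lat, lon, sza_deg]
```

Under these assumptions each pixel yields a 22-element vector; the actual number of inputs in the thesis may differ depending on how time and location are represented.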
In the initial development stage, AMI data were not yet available, so the retrieval algorithm was developed with data from the Advanced Himawari Imager (AHI) aboard Japan’s geostationary meteorological satellite, whose specifications are nearly identical to those of AMI. Shortly after the launch of the GK2A satellite, a traditional static (ST) learning approach, which uses learning data from a fixed period, was applied to the ANN model with AMI data. Its performance degraded over time because the learning datasets were not sufficiently representative and comprehensive. To overcome this limitation, we adopted an ANN model with incremental (INC) learning and compared the results of the ST and INC methods. When new samples arrive, the INC ANN trains on a dynamic dataset, starting from the weights transferred from the previously trained model. To prevent sudden changes in the distribution of the learning data, the method uses a sliding window that covers a small contiguous portion of the data and moves forward in time. Based on empirical tests, the update cycle and window size were set to one day and ten days, respectively. To validate the developed ANN model, we tested its accuracy against reference data (radiosonde observations) and analyzed the spatial and temporal error distributions over Northeast Asia for one year. Compared with the radiosonde observations, the INC method gives a correlation coefficient of 0.96 and 0.64, a mean bias of -0.20 mm and 235.54 J/kg, and a root mean square error (RMSE) of 4.31 mm and 617.84 J/kg for TPW and CAPE, respectively. The physical method gives a correlation coefficient of 0.97 and 0.64, a mean bias of 0.14 mm and 194.12 J/kg, and an RMSE of 3.69 mm and 657.48 J/kg for TPW and CAPE, respectively.
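The incremental scheme described above (a one-day update cycle, a ten-day sliding window, and training that starts from the previously learned weights) can be sketched as follows. This is only a sketch under stated assumptions: `TinyModel` is a hypothetical one-variable linear model standing in for the ANN, and `incremental_train` is an illustrative name, not a function from the thesis.

```python
from collections import deque

UPDATE_CYCLE_DAYS = 1  # update cycle stated in the abstract
WINDOW_DAYS = 10       # window size stated in the abstract


class TinyModel:
    """Stand-in for the ANN: y = w * x + b, trained by gradient descent."""

    def __init__(self):
        self.w, self.b = 0.0, 0.0

    def fit(self, samples, epochs=50, lr=0.01):
        # Training continues from the current weights (warm start),
        # so each update builds on the previously learned model.
        for _ in range(epochs):
            for x, y in samples:
                err = self.w * x + self.b - y
                self.w -= lr * err * x
                self.b -= lr * err
        return self


def incremental_train(model, daily_batches):
    """Update the model once per cycle on the most recent WINDOW_DAYS of data."""
    window = deque(maxlen=WINDOW_DAYS)  # the oldest day drops out automatically
    for day, batch in enumerate(daily_batches):
        window.append(batch)
        if day % UPDATE_CYCLE_DAYS == 0:
            samples = [s for b in window for s in b]
            model.fit(samples)
    return model
```

Because the window advances by one day while spanning ten, consecutive updates share nine days of data, which is one way to keep the training distribution from changing abruptly between updates.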
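The evaluation shows error statistics that are overall comparable to those of the physical method and error patterns that remain stable over time. Lastly, to evaluate the potential of the retrieved TPW and CAPE for short-term weather forecasting at a local scale, case studies were conducted on convective severe weather events that occurred over the Korean peninsula in the summers of 2020 and 2021.
Early forecasting of severe weather caused by rapidly developing convective systems is very important but remains a difficult task. For this purpose, meteorological satellites with high spatiotemporal resolution have been used to monitor variations in atmospheric water vapor together with trends in instability. Total precipitable water (TPW), which represents the amount of water vapor in the atmosphere, and indices of atmospheric instability such as convective available potential energy (CAPE) are effective indicators of severe weather in a pre-convective atmosphere. GeoKompsat-2A (GK2A), Korea's next-generation geostationary meteorological satellite, was launched in December 2018 and carries the Advanced Meteorological Imager (AMI). The imager has 16 channels from the visible to the infrared, observes the full disk every 10 minutes, and has a spatial resolution of 2 km at the sub-satellite point (infrared channels), an improvement in spatial, temporal, and spectral performance. An efficient algorithm is essential for deriving TPW and CAPE from such high-resolution satellite data in a timely manner. In remote sensing, a physical method based on one-dimensional variational analysis has been used to retrieve TPW and CAPE from the infrared observations of geostationary meteorological satellites. This method is accurate, but it depends heavily on numerical weather prediction model data, and its very high computational cost limits the temporal and spatial resolution it can provide. Machine learning methods have often been used for fast and reliable estimation of target values. In this study, therefore, a neural-network-based algorithm was developed to retrieve clear-sky TPW and CAPE from radiances measured in the infrared channels of AMI. The method can directly predict the nonlinear relationships among variables and, unlike the physical model, can make full use of observations with improved spatiotemporal resolution. To prepare the training data, AMI infrared brightness temperatures, six brightness temperature differences, temporal and geographic information, and the satellite zenith angle were used as input variables, and TPW and CAPE values computed from ECMWF reanalysis (ERA5) temperature and humidity profiles were used as the corresponding targets. To optimize the hyperparameters of the neural network, sensitivity tests were performed on the activation function, the number of hidden layers, and the number of neurons in the hidden layers. Other hyperparameters, including the weight optimization method, batch size, and learning rate, were also tuned to achieve the best performance on the prepared training data. In addition, to quantify and examine the relative importance of the variables after training, the contribution of each input feature to the model output was investigated. Because AMI data were not available early in development, observations from the Advanced Himawari Imager (AHI) aboard the Himawari-8 satellite, whose specifications are nearly identical to those of AMI, were used as a substitute. Subsequently, in the early period after the GK2A launch, the conventional static learning method, which uses training data from a fixed period, was applied to the neural network with the observed AMI data.
Because data accumulate only as time passes, the initial training data were not sufficiently representative or comprehensive, and the performance of the algorithm degraded over time. To overcome this limitation of static learning, an incremental learning method was introduced into the neural network. When new data become available, the incremental model trains on those data alone, starting from the parameters (weights and biases) transferred from the previously trained model. To prevent abrupt changes in the distribution of the training data, a sliding-window method is used in which a fixed span of data moves forward in time. Based on empirical tests, the update cycle and window size of the model were set to 1 day and 10 days, respectively, and the results of the static and incremental learning methods were compared. To validate the developed model, accuracy tests were carried out against reference data (radiosonde and ERA5 data) over Northeast Asia for one year, and the temporal stability and spatial distribution of the errors were analyzed. Compared with the radiosonde data, the incremental learning method gave correlation coefficients of 0.96 and 0.64, mean biases of -0.20 mm and 235.54 J/kg, and root mean square errors of 4.31 mm and 617.84 J/kg for TPW and CAPE, respectively. The physical method gave correlation coefficients of 0.97 and 0.64, mean biases of 0.14 mm and 194.12 J/kg, and root mean square errors of 3.69 mm and 657.48 J/kg for TPW and CAPE, respectively. The results of the incremental learning algorithm were comparable to those of the physical algorithm, and its errors remained stable over time. Its computation was also about 100 times faster or more than that of the physical method. Finally, the potential use of the retrieved variables in short-term weather forecasting is assessed through two case studies of severe weather events that occurred over the Korean peninsula in the summers of 2020 and 2021.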
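The three error statistics quoted in the abstract (correlation coefficient, mean bias, and RMSE) can be computed as in the minimal Python sketch below. The function name `error_stats` is illustrative, and the sign convention for the bias (retrieved minus reference) is an assumption, since the abstract does not state it.

```python
import math


def error_stats(retrieved, reference):
    """Return (correlation, mean bias, RMSE) of retrieved vs. reference values."""
    n = len(retrieved)
    if n != len(reference) or n < 2:
        raise ValueError("need two equal-length series with at least two values")
    # Bias is taken as retrieved minus reference (assumed convention).
    diffs = [r - o for r, o in zip(retrieved, reference)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Pearson correlation coefficient.
    mean_r = sum(retrieved) / n
    mean_o = sum(reference) / n
    cov = sum((r - mean_r) * (o - mean_o) for r, o in zip(retrieved, reference))
    var_r = sum((r - mean_r) ** 2 for r in retrieved)
    var_o = sum((o - mean_o) ** 2 for o in reference)
    corr = cov / math.sqrt(var_r * var_o)
    return corr, bias, rmse
```

For example, calling `error_stats` with retrieved TPW values and collocated radiosonde TPW values (both in mm) returns the bias and RMSE in mm and a unitless correlation.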