DSpace at EWHA: A Study on the Development of a Construction Disaster Classification Model Based on Convolutional Neural Networks

Title: A Study on the Development of a Construction Disaster Classification Model Based on Convolutional Neural Networks

Other Titles: A Study on the Characteristics and Categorization of Construction Disaster Data for Accuracy Improvement of Classification Model of Construction Cases Based on CNN Algorithm

Authors: 강현빈

Issue Date: 2019

Department/Major: Graduate School, Department of Architectural and Urban Systems Engineering

Publisher: Ewha Womans University Graduate School

Degree: Master

Advisors: 이준성, 손정욱

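Abstract: Construction safety is one of the fields of the construction industry in which data accumulates most actively, and one in which the text data accumulated as accident case reports is expected to be put to use. To learn from and exploit these data more fully, and to gain insights beyond the knowledge they already contain, document- and sentence-classification techniques based on natural language processing are needed. Much recent research has focused on selecting neural network models and optimizing their parameters to improve text classification performance, but little attention has been paid to the quality and processing of the usable data accumulated so far. Analysis of data recorded in Korean, in particular, lags behind research on English text data, owing to the particular characteristics of the language and the limited development of classification models for it. This thesis therefore aims 1) to implement a model that learns from the information within documents to classify unstructured Korean text data into specific categories, and 2) through experiments that classify construction disaster case data recorded in Korean by disaster type, to propose ways of improving classification performance from the standpoint of data management and analysis. In this study, text data on serious construction accident cases were classified with a CNN (Convolutional Neural Network) algorithm into five categories: falls, electric shock, struck by falling objects, collapse, and caught-in/between (crushing). The initial classification accuracy was 29.44%. After analyzing why accuracy was so low, experiments measured the change in accuracy under four conditions: 1) whether keywords and numbers were preprocessed, 2) finer sub-classification of the fall data, 3) classification of complex accidents by their primary cause, and 4) whether complex accidents were removed. Three of the four experiments, all except the sub-classification of fall data, improved performance, and removing complex accidents raised classification accuracy to 50.42%. The analysis showed that when complex accidents among serious construction accidents are recorded, attributes of several disaster types are mixed within the record of a single accident, which hinders accurate classification into the existing categories. These results suggest that classifying construction disaster cases into the desired categories, and ultimately extracting accurate information from them and applying the data where needed, requires prior analysis and processing of records in which several accidents occurred together, such as complex accidents. Therefore, to develop appropriate preventive measures, data-driven prediction systems, and hazard detection systems from publicly available construction disaster data, the government and agencies that record disaster cases should record each complex accident according to the primary accident that caused it. In addition, researchers analyzing Korean construction disaster case data can expect better research outcomes and usability if they take care in cleaning and processing complex accidents.

The construction safety sector is one of the areas of the construction industry where data accumulates most actively, and where that data is expected to be put to full use. Automatically classifying data into desired categories is the basis for extracting information efficiently from large volumes of data and for linking it with data from other domains.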
Recently, many researchers have studied neural network model selection and parameter optimization to improve classification performance, but the quality and processing of the data accumulated so far have received little attention. In this study, text data from serious construction accident cases are classified with a CNN (Convolutional Neural Network) algorithm into five categories: falls, electric shock, struck by falling objects, collapses, and caught-in/between (crushing). The initial classification accuracy was 29.44%. After analyzing why accuracy was so low, experiments measured the change in accuracy under four conditions: 1) keyword and number preprocessing, 2) detailed sub-classification of the fall data, 3) classification of complex accidents by their primary cause, and 4) removal of complex accidents. Three of the four experiments improved performance, the exception being the sub-classification of fall data, and accuracy rose to 50.42% when complex accidents were removed. When complex accidents during construction are recorded, attributes of several disaster types are mixed within a single accident record, which hinders accurate classification into the existing categories. The results of this study suggest that records in which several accidents occurred together, such as complex accidents, must be analyzed and processed in advance in order to classify construction disaster cases into the desired categories and extract accurate information from them. The study also presents a general-purpose text classification model applicable to documents across the construction industry, not only accident cases, which broadens the usability of the research.
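The abstract names the CNN algorithm but not the network's architecture. As an illustration only, the following is a minimal sketch of the forward pass of a widely used CNN design for sentence classification: word embeddings, one-dimensional convolutions of several widths, ReLU, max-over-time pooling, and a softmax over the five disaster categories. The class `TextCNN`, the label names in `CLASSES`, every hyperparameter, and the random weights are hypothetical, training is omitted, and the input is assumed to be integer token ids produced by an upstream Korean tokenizer. It is not the thesis's model.

```python
import math
import random

# Hypothetical label names for the five categories in the abstract.
CLASSES = ["fall", "electric_shock", "falling_object", "collapse", "caught_in_between"]


class TextCNN:
    """Forward pass of a CNN text classifier (illustrative sketch, untrained).

    Embedding size, filter widths, and filters per width are assumptions,
    not values reported in the thesis. Weights are random and fixed by `seed`.
    """

    def __init__(self, vocab_size, emb_dim=8, widths=(2, 3, 4), n_filters=4, seed=0):
        rng = random.Random(seed)
        self.widths = widths
        # One embedding vector per token id.
        self.emb = [[rng.uniform(-0.5, 0.5) for _ in range(emb_dim)]
                    for _ in range(vocab_size)]
        # For each width w: n_filters kernels, each covering w tokens x emb_dim values.
        self.filters = {
            w: [[[rng.uniform(-0.5, 0.5) for _ in range(emb_dim)] for _ in range(w)]
                for _ in range(n_filters)]
            for w in widths
        }
        n_features = n_filters * len(widths)
        # Fully connected layer mapping pooled features to one score per class.
        self.fc = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)] for _ in CLASSES]
        self.bias = [0.0] * len(CLASSES)

    def features(self, token_ids):
        x = [self.emb[t] for t in token_ids]
        pooled = []
        for w in self.widths:
            for kernel in self.filters[w]:
                activations = []
                # Slide the kernel over every window of w consecutive tokens.
                for i in range(len(x) - w + 1):
                    window = x[i:i + w]
                    s = sum(k * v for krow, xrow in zip(kernel, window)
                            for k, v in zip(krow, xrow))
                    activations.append(max(0.0, s))  # ReLU
                # Max-over-time pooling; 0.0 if the text is shorter than the filter.
                pooled.append(max(activations) if activations else 0.0)
        return pooled

    def predict_proba(self, token_ids):
        h = self.features(token_ids)
        scores = [sum(wi * hi for wi, hi in zip(row, h)) + b
                  for row, b in zip(self.fc, self.bias)]
        # Numerically stable softmax over the five classes.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, token_ids):
        probs = self.predict_proba(token_ids)
        return CLASSES[probs.index(max(probs))]
```

For example, `TextCNN(vocab_size=50).predict([3, 17, 42, 8, 21])` returns one of the five labels; with untrained weights the choice is arbitrary. Filters of several widths respond to word n-grams of different lengths, and max-over-time pooling makes the feature vector independent of document length.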
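The data-handling experiments described above can also be sketched in code. The following assumes a hypothetical record format in which each accident case has a `text` field and a `types` list of recorded disaster types, with the primary (initiating) accident listed first. The function names, the `<NUM>` placeholder, the caller-supplied keyword list, and the whitespace tokenizer (a stand-in for Korean morphological analysis) are all assumptions rather than the thesis's actual procedure. The sub-classification of fall data is omitted because the abstract does not list its sub-categories.

```python
import re

NUM_TOKEN = "<NUM>"  # hypothetical placeholder for masked numbers


def preprocess(text, drop_keywords=frozenset()):
    """Keyword and number preprocessing (sketch).

    Drops any whitespace-separated token found in `drop_keywords` and replaces
    every integer or decimal number inside the remaining tokens with NUM_TOKEN.
    """
    tokens = []
    for tok in text.split():
        if tok in drop_keywords:
            continue
        tokens.append(re.sub(r"\d+(?:\.\d+)?", NUM_TOKEN, tok))
    return " ".join(tokens)


def label_by_primary_cause(records):
    """Label every record, including complex accidents, by its first listed type.

    Assumes the primary (initiating) accident is listed first in `types`.
    """
    return [(r["text"], r["types"][0]) for r in records]


def drop_complex_accidents(records):
    """Keep only records with exactly one disaster type."""
    return [(r["text"], r["types"][0]) for r in records if len(r["types"]) == 1]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Given a single-type fall record and a complex record `{"text": "scaffold collapsed and worker fell", "types": ["collapse", "fall"]}`, `drop_complex_accidents` keeps only the fall record, while `label_by_primary_cause` keeps both and labels the complex one `"collapse"`. Removing complex accidents yields cleaner labels at the cost of a smaller dataset.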