DSpace at EWHA: Automatic Construction of a Training Precedent Dataset and Precedent Similarity Analysis Using BERT

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	강윤철	-
dc.contributor.author	조희진	-
dc.creator	조희진	-
dc.date.accessioned	2022-08-03T16:30:45Z	-
dc.date.available	2022-08-03T16:30:45Z	-
dc.date.issued	2022	-
dc.identifier.other	OAK-000000191929	-
dc.identifier.uri	https://dcollection.ewha.ac.kr/common/orgView/000000191929	en_US
dc.identifier.uri	https://dspace.ewha.ac.kr/handle/2015.oak/261735	-
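dc.description.abstract	The purpose of this study is to devise an automatic labeling method for building a training dataset of precedents, which is essential for stimulating legal artificial intelligence research, and to validate a precedent similarity analysis method that uses this training dataset. To improve semantic recognition of sentences, the study uses BERT, which can also learn word order. The training data consist of precedents (February 1962 to February 2021) concerning the termination of employment relationships, such as actions for confirmation of invalidity of dismissal. The full text of each judgment was used. To compare the performance of dataset construction approaches, specific parts of the judgment were compared against one another, and topics were classified using the facts extracted from the judgments. Accuracy was used as the evaluation metric. The precedent similarity experiment compared frequency-based Jaccard analysis, probability-based Doc2Vec, and the deep-network-based BERT model; cosine similarity was chosen as the quantitative evaluation method and semantic analysis as the qualitative method. The results show that, when building the training dataset, information on the parties and the disposition (order) of the judgment is effective for labeling judgment outcomes, and that topics can be classified from fact sentences extracted from the precedents. In addition, for precedent similarity analysis, the BERT model produced more effective results from a semantic standpoint than probability-based Doc2Vec. This study identified the types of information in a judgment that are needed to build a dataset and constructed a training dataset of precedents through topic classification. It also differs from prior work in that it investigates precedent similarity analysis based on the content (full text) of precedents. ;Unlike other countries, Korea has no publicly available training dataset of precedents, which is why legal artificial intelligence research there is less active. In the legal domain, the task of finding existing precedents similar to the factual findings is essential in predicting the expected legal issues and conclusions of a case. However, since the existing keyword-based precedent search system requires specialized expertise to select keywords and returns results based only on the presence or absence of a search term, it is difficult to determine whether a retrieved precedent has similar factual findings. In this study, we propose a framework for creating a training dataset and deriving similar precedents of labor legal documents with a BERT-based approach by considering the semantic similarity between factual findings and precedents.	-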
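dc.description.tableofcontents	Ⅰ. Research Background and Purpose 1 A. Building a Training Precedent Dataset 1 B. Precedent Similarity Analysis 3 Ⅱ. Prior Research 8 A. Prior Research on Automatic Document Classification 8 B. Prior Research on Document Similarity Analysis 10 Ⅲ. Research Methods 12 A. BERT 12 B. Doc2Vec 14 Ⅳ. Research Design 16 A. Proposed Model 16 B. Experimental Data 18 1. Data Collection 18 2. Components of a Judgment 20 C. Preprocessing and Experimental Procedure 21 1. Automatic Labeling of Training Precedent Data 21 2. Precedent Similarity Analysis 24 D. Experimental Evaluation 26 1. Quantitative Analysis 26 2. Qualitative Analysis 27 Ⅴ. Experimental Results and Interpretation 28 A. Automatic Labeling of Training Precedent Data 28 1. Judgment Outcome Labeling 28 2. Topic Classification 29 B. Precedent Similarity Analysis 30 1. Cosine Similarity 30 2. Semantic Analysis 32 Ⅵ. Conclusion and Future Research 40 References 42 ABSTRACT 47	-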
dc.format	application/pdf	-
dc.format.extent	1002091 bytes	-
dc.language	kor	-
dc.publisher	The Graduate School, Ewha Womans University	-
dc.subject.ddc	005.7	-
dc.title	Automatic Construction of a Training Precedent Dataset and Precedent Similarity Analysis Using BERT	-
dc.type	Master's Thesis	-
dc.title.translated	A study on Dataset for training and Similarities of Labor Legal Precedents using BERT-based Approach	-
dc.creator.othername	Jo, Hui Jin	-
dc.format.page	viii, 47 p.	-
dc.identifier.thesisdegree	Master	-
dc.identifier.major	Graduate School, Interdisciplinary Program in Big Data Analytics	-
dc.date.awarded	2022. 8	-