DSpace at EWHA: 디지털 환경 내 아동학대 발견을 위한 알고리즘 개발

Title: 디지털 환경 내 아동학대 발견을 위한 알고리즘 개발

Other Titles: Development of an Algorithm for the Detection of Child Abuse in Digital Environment

Authors: Kang Hee-ju (강희주)

Issue Date: 2023

Department/Major: Graduate School, Department of Social Welfare (대학원 사회복지학과)

Publisher: Ewha Womans University Graduate School (이화여자대학교 대학원)

Degree: Doctor

Advisors: Jeong Ick-joong (정익중)

Abstract: This study examines in depth the current state of child abuse in the digital environment, and develops and evaluates the 'YouTube Child Abuse Detection Algorithm', which is intended to detect such cases quickly. No work or response manual exists for child abuse in the digital environment, and even the first step of intervention, detecting the abuse, is not being carried out; the study is significant as the first to propose using an algorithm to find children affected by abuse in this setting. It is expected to help extend the reach of the existing child protection system, which cannot keep pace with rapid change in the digital environment. To recognize and classify child abuse, the author designed two models: the 'Eating Show Featuring Children/Harmful Foods Recognition Algorithm Model' and the 'Children's Facial Expression Recognition Algorithm Model'. The training data for deep learning consisted of 3,234 images of children's faces, 5,985 of adult faces, 4,803 of regular foods, 5,750 of harmful foods, 2,339 of negative emotional expressions, and 5,750 of children's facial expressions other than negative emotional expressions. To evaluate the algorithm's performance, 634 videos were collected as testing data, and the author monitored them directly and performed a content analysis. The testing data comprised 152 ground-truth videos from 19 channels that had been judged to involve child abuse in the earlier study by Kang Hee-ju and Jeong Ick-joong (2020), together with videos randomly collected, 100 per channel where available, from the six children's channels with the most subscribers. The random selection focused on eating-show, prank, and situational-setting content, which accounted for 92.8% of the videos in which child abuse occurred (Kang Hee-ju and Jeong Ick-joong, 2020). Because some channels had fewer than 100 videos uploaded to their playlists, the final testing data totaled 634 videos.
Each video was converted into quantitative data, coded 0 when no child abuse occurred and 1 when child abuse occurred. New types of child abuse or rights violations not covered by the content analysis of Kang Hee-ju and Jeong Ick-joong (2020) were transcribed in detail, including the situation and context, the actions and statements of the cast, and their reasons, and were analyzed for how the content could negatively affect the children in the videos. When it was unclear whether child abuse had occurred, the coding was settled through a consensus process with a professor in the Department of Social Welfare, bringing the final reliability to 1. For the final performance evaluation, a validity assessment compared the algorithm's child abuse detection results with the content analysis results for the 634 videos. The cases were divided into three types: videos in which child abuse occurred but the algorithm did not classify them as such, videos the algorithm classified as child abuse although none occurred, and videos in which child abuse occurred and the algorithm also classified them as child abuse. The results and discussion are as follows. First, child abuse occurred in 327 videos, 51.6% of the 634 videos monitored. Of these 327 videos, 175 (53.5%) were eating-show content, 6.8 percentage points higher than the 46.7% found in the previous study (Kang Hee-ju and Jeong Ick-joong, 2020). Daily-life content accounted for 84 of the 327 videos (25.7%), 11.9 percentage points higher than the 13.8% found in that study.
These figures indicate that child abuse in YouTube videos has increased compared with two years earlier. Although the media have repeatedly reported on the severity of child abuse in the digital environment, of which YouTube is representative, social sensitivity to the abuse of featured children remains low, parents, primary caregivers, and producers have made too little effort at self-reflection to eradicate it, and there is no adequate social safety net to protect children featured in the digital space. Second, sensitivity, the rate at which the algorithm classified videos in which child abuse occurred as child abuse, was 0.801; that is, the algorithm detected about 80% of those videos, well over half. Sensitivity was 0.983 for eating-show content, which had the most child abuse, and 0.765 and 0.702 for prank and daily-life content, respectively. The study thus showed that an algorithm can be used to detect YouTube videos in which child abuse occurs. Third, specificity, the rate at which the algorithm classified videos without child abuse as not involving child abuse, was 0.407. By content type, from lowest to highest, specificity was 0.105 for eating show, 0.375 for sponsored advertisement, 0.400 for situational setting, 0.429 for cooking, 0.438 for prank, 0.525 for daily life, and 0.927 for play. These results confirm the need for additional deep learning to improve the classification of harmful foods and children's facial expressions, and for expert monitoring to reassess the videos the algorithm flags as child abuse. Fourth, the algorithm predicted child abuse at a very high rate, but it did so through a mechanical process of classifying video images based on deep learning, whereas direct human monitoring was a comprehensive thinking process that took into account the varied contexts the children faced.
As a result, human monitors could identify not only child abuse in the digital environment but also violations of children's other rights, including protection of personal information and privacy, labor rights, the right to rest and play, the right to express opinions, and remedies for harm. This points to the need for measures that guarantee not only children's right to protection but also their other rights in the digital environment. The study makes the following suggestions. First, because humans alone cannot find child abuse videos when videos are being produced at an exponential rate, an algorithm should be developed and actively used as a primary screening tool. Children's channels with many subscribers attract social interest and so tend to develop their own scrutiny of child abuse, whereas channels with few subscribers or views escape public oversight, which may put the children featured in them at greater risk. Primary algorithmic filtering should therefore cover low-ranking as well as high-ranking channels. Second, while primary filtering of child abuse videos is delegated to the algorithm, enough monitoring staff, made up of child welfare experts who are sensitive to child abuse, must be secured to catch the algorithm's errors. To build the monitors' expertise, training should be provided to students majoring in social welfare and education and to working-level staff at child and youth institutions, and budgets and stable employment should be secured for training professional personnel who handle child abuse in the digital environment. Third, because mobile phones and the Internet are easily accessible, child abuse in the digital environment can be found anywhere in daily life, including homes, schools, and communities.
Preventive education and awareness-raising to increase sensitivity to such abuse should therefore continue. To that end, in addition to existing child abuse prevention education, education on child abuse in the digital environment should be included in digital content production and literacy education and in digital citizenship education, so that children are protected in the digital environment and respected as rights holders. Fourth, given the digital environment's impact on children, companies' responsibility for child protection should be emphasized, and companies should be obliged to monitor and act on videos involving child abuse or infringement of children's rights; in return, the government should provide technological support so that companies can fulfill this duty. In addition, child protection in the digital environment should be integrated into national child protection policy, a Child Rights Impact Assessment of the digital environment should be carried out, and its results should be made public to urge companies toward more responsible self-regulation and a child-friendly digital environment. Fifth, beyond child abuse in videos, domestic legislation does not cover featured children's labor rights, property rights, right to privacy, right to play and rest, or right to education, all of which may be infringed by the time spent filming. The fundamental law on children should therefore include explicit provisions protecting children's rights in the digital environment, so that the best interests of the child are guaranteed online just as they are offline.
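
The evaluation described in the abstract compares the algorithm's output with the human content analysis, both coded 0 (no child abuse) or 1 (child abuse), and reports sensitivity and specificity. The following is a minimal sketch of that comparison, not the thesis's implementation. The names `ScreeningCounts`, `tally`, and `review_queue` are hypothetical, and the ten-video sample is invented toy data rather than the study's 634 videos.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ScreeningCounts:
    """Outcome counts when algorithm output is compared with human coding."""
    true_pos: int   # abuse occurred and the algorithm flagged it
    false_neg: int  # abuse occurred but the algorithm missed it
    false_pos: int  # no abuse, but the algorithm flagged the video
    true_neg: int   # no abuse and the algorithm did not flag it

    @property
    def sensitivity(self) -> float:
        # Share of abuse videos that the algorithm flagged.
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        # Share of non-abuse videos that the algorithm left unflagged.
        return self.true_neg / (self.true_neg + self.false_pos)


def tally(human: Sequence[int], algorithm: Sequence[int]) -> ScreeningCounts:
    """Count the four outcomes from paired 0/1 codes, one pair per video."""
    if len(human) != len(algorithm):
        raise ValueError("both codings must cover the same videos")
    pairs = list(zip(human, algorithm))
    return ScreeningCounts(
        true_pos=pairs.count((1, 1)),
        false_neg=pairs.count((1, 0)),
        false_pos=pairs.count((0, 1)),
        true_neg=pairs.count((0, 0)),
    )


def review_queue(algorithm: Sequence[int]) -> List[int]:
    """Indices of flagged videos, to be rechecked by an expert monitor."""
    return [i for i, flag in enumerate(algorithm) if flag == 1]


# Toy data, invented for illustration only (not taken from the study).
human_codes = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
algo_codes = [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]

counts = tally(human_codes, algo_codes)
print(counts.sensitivity, counts.specificity)
print(review_queue(algo_codes))
```

With a pattern like this toy sample, high sensitivity combined with low specificity means most flagged videos still need a human second look, which is why the sketch returns every flagged video for expert review rather than treating a flag as a final decision. The properties assume at least one abuse and one non-abuse video in the sample; otherwise the divisions would fail.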