DSpace at EWHA: Study on the Discourse of Artificial Intelligence Speech Recognition Service

Title: 인공지능 음성인식 서비스의 담화 연구

Other Titles: Study on the Discourse of Artificial Intelligence Speech Recognition Service

Authors: 허태라

Issue Date: 2018

Department/Major: Graduate School, Department of Content Convergence

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 한혜원, 김명준

Abstract: The coexistence of artificial intelligence and humans, and the relationships between them, have become a reality. Alongside the discussion of the Fourth Industrial Revolution, innovation in science and technology is producing agents that do not merely repeat human speech but can generate discourse with humans. Language and discourse, once an exclusively human domain, are expanding to encompass both humans and non-humans. This trend raises the question of what relationships humans and non-humans can form within the broad frame of speech and language. That relationship begins not with whether artificial intelligence can speak, or how much human speech it can recognize, but with what it says to humans, why, and how.

This study assumes that discourse between artificial intelligence and humans imitates and learns from human-to-human conversation as its original. Its subjects are four currently released artificial intelligence speech recognition services: Apple's Siri, Google's Google Assistant, Naver's Clova, and Kakao's Kakao Mini. The study structures the discourse of these services, analyzes concrete discourse from each, and classifies the resulting discourse types. Its aim is to confirm the possibility that artificial intelligence can be both a participant in conversation and a subject that generates discourse.

Chapter 2 derives discourse components and a language function model adapted to artificial intelligence speech recognition services, based on the model of language functions structured by the linguist Roman Jakobson. The addresser (human) and the addressee (artificial intelligence) share a common code and context, exchange messages composed of voice and text, and make multisensory contact. The functions that each discourse component chiefly performs appear in grouped categories, which reflects the current stage of these services. The referential and conative functions, found in most current artificial intelligence, appear together as one integrated category. The phatic and emotive functions, and the metalinguistic function, which only some services perform, each appear as a separate category.

Chapter 3 confirms these functions in concrete discourse and selects the cultural code, among the discourse components, as the criterion for discourse analysis. Because a code is agreed on between discourse participants in order to deliver messages, it can serve as a framework for analyzing discourse both individually and as a whole. Because a code is also tied directly to a particular language and culture, it is a key to determining whether artificial intelligence can take that context into account and generate meaningful conversation. The chapter analyzes which types of discourse generation emerge when the code combines with the language functions revealed in concrete discourse. Three utterances were selected to probe the discourse components: one for the relationship between the subject and object of discourse, one for the importance of context in discourse, and one for linguistic taboos and their violation. The question "Who are you?" (“너 누구니?”), chosen to typify the addresser-addressee relationship that each service aims for, yields the subject-object separated type and the subject-object equivalent type. Responses to the utterance “저기” (roughly "there" or "excuse me"), which functions as both a demonstrative pronoun and a discourse marker, confirm that information-centered and relationship-forming discourse types emerge according to high-context and low-context culture. For the linguistic taboo "Die" (“죽어라”), each service generates a distinct discourse type: Siri the offense-denouncing type, Google Assistant the situation-changing type, Clova the emotion-expressing type, and Kakao Mini the relationship-maintaining type.

Chapter 4 examines the significance of these discourse types through the keyword of presence. Google Assistant, which forms multimodal discourse through its messages, shows a flat presence, whereas Clova and Kakao Mini, in which the outward multimodality of the discourse participant stands out, show a three-dimensional presence. In addition, the services from Korea's high-context culture adopt a friend persona and generate subject-object equivalent discourse in which the phatic and emotive functions are foregrounded; it is in these services that the possibility of a discourse-generating subject can be found. Artificial intelligence speech recognition services come to resemble humans as they heighten presence and generate discourse on their own initiative.

This study is significant as foundational research for the posthuman era: it analyzes the discourse of artificial intelligence, which is breaking down the boundary with humans, from the viewpoint of the coexistence and relationships of humans and non-humans.