DSpace at EWHA: String Matching Engine Using Hashing for Deep Packet Inspection

Title: String Matching Engine Using Hashing for Deep Packet Inspection

Authors: 계지연

Issue Date: 2011

Department/Major: Department of Electronics Engineering, Graduate School

Publisher: Graduate School, Ewha Womans University

Degree: Master

Advisors: 임혜숙

Abstract: Network security has become a more important issue because the Internet environment is increasingly complex and malicious attacks are growing. With the emergence of various applications and the rapidly increasing number of Internet users, various levels of service quality are also required. Deep packet inspection (DPI) examines the payload of each input packet as well as its header, and it is an essential function for providing both security and quality of service. The core of DPI is a string matching engine, which checks at high speed whether specific strings are contained in the payload of an input packet. The efficiency of DPI and its resource requirements depend heavily on the string matching algorithm.

The Aho-Corasick algorithm is one of the most representative string matching algorithms. It constructs a finite state machine from the given strings (keywords) and then uses that state machine to locate all occurrences of any keyword in an input text stream. Each character of the input stream is examined only once, and all matching keywords are returned. However, for ASCII-based input the algorithm must store 256 next states in each entry of the table that represents the state machine, which results in excessive memory usage. Reducing the memory usage of Aho-Corasick implementations is therefore an important research problem.

Hashing computes an index for each keyword, and that index is used to store the keyword in a table entry. If the number of keywords in a given set is much smaller than the number of possible keywords, hashing greatly reduces the memory needed to store them. The two algorithms proposed in this thesis are motivated by the goal of reducing the memory requirement of Aho-Corasick implementations.
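The memory problem described above comes from the dense table with 256 next-state columns. The following minimal Python sketch makes that concrete. It is a textbook Aho-Corasick construction, not the thesis's implementation. The function names `build_dense_ac` and `search` are hypothetical, and the sketch assumes one byte per character (Latin-1 encoding). Every state stores 256 next-state entries, so memory grows as (number of states) × 256.

```python
from collections import deque

ALPHABET = 256  # one column per possible byte value


def build_dense_ac(keywords):
    """Build an Aho-Corasick automaton stored as a dense table with 256 columns per state.

    Returns (delta, out): delta[s][c] is the next state from state s on byte c,
    and out[s] lists the keywords recognized when state s is entered.
    """
    goto = [{}]  # trie edges, used only while building
    out = [[]]
    for kw in keywords:
        s = 0
        for c in kw.encode("latin-1"):  # assumption: one byte per character
            if c not in goto[s]:
                goto.append({})
                out.append([])
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].append(kw)

    n = len(goto)
    delta = [[0] * ALPHABET for _ in range(n)]
    fail = [0] * n
    queue = deque()
    for c, t in goto[0].items():  # depth-1 states fail back to the root
        delta[0][c] = t
        queue.append(t)
    while queue:  # breadth-first, so fail[s] is finished before s
        s = queue.popleft()
        out[s] = out[s] + out[fail[s]]  # inherit keywords ending in a proper suffix
        for c in range(ALPHABET):
            t = goto[s].get(c)
            if t is None:
                delta[s][c] = delta[fail[s]][c]  # fill the missing transition
            else:
                fail[t] = delta[fail[s]][c]
                delta[s][c] = t
                queue.append(t)
    return delta, out


def search(delta, out, text):
    """Scan text once, one byte per step; return (start_offset, keyword) pairs."""
    s = 0
    hits = []
    for i, c in enumerate(text.encode("latin-1")):
        s = delta[s][c]
        for kw in out[s]:
            hits.append((i - len(kw) + 1, kw))
    return hits


delta, out = build_dense_ac(["he", "she", "his", "hers"])
print(len(delta), "states x", ALPHABET, "columns =", len(delta) * ALPHABET, "entries")
# prints: 10 states x 256 columns = 2560 entries
print(search(delta, out, "ushers"))
# prints: [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Even for four short keywords the table has 2,560 entries. Most of them only point back to the root or to shallow states, and that is the redundancy the proposed algorithms target.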
The first proposed algorithm applies hashing to reduce the number of next states. Each next state corresponds to a column of the table, so the resulting string matching table is highly condensed. The second proposed algorithm applies block-level hashing: the input to the hash is a block of multiple characters whose length equals that of the shortest keyword in the given set. This reduces the number of states, which equals the number of table entries, so the table built by the second algorithm is condensed further. For performance evaluation, state machines for the Aho-Corasick algorithm and the two proposed algorithms were constructed for three keyword sets of 50, 100, and 200 strings. An input stream of 1242 words was then applied to each state machine. The simulation results show that the two proposed algorithms reduce the memory requirement by reducing both the width of the table and the number of table entries.

Abstract (translated from Korean): Network environments are growing more complex and malicious attacks on networks are increasing, so security has become an important issue. In addition, the emergence of diverse applications and the rapid growth in the number of Internet users call for guaranteed service quality on network resources, such as QoS (Quality of Service). In this context, adopting Deep Packet Inspection (DPI) is essential, because it analyzes not only the packet header but also the packet data. The core of DPI is a string matching engine that can detect specific strings in the data stream of a packet at high speed. The efficiency and resource requirements of a security system that relies on DPI depend heavily on the string matching algorithm.

The Aho-Corasick algorithm is one of the representative string matching algorithms. It constructs a finite state machine from the given strings and uses this data structure to find the matching strings in a data stream. Its advantage is that all matching strings can be found by reading the data stream only once, one character at a time. However, with ASCII-based input every state needs a lookup table of 256 pointers to possible next states, so memory usage is very large. Various methods have therefore been proposed to use memory more efficiently.

Hashing applies an arithmetic operation to the search key of a data item to obtain an index, and uses that index to store and retrieve the data in an array. When the number of keys actually stored is small compared with the set of all possible search keys, hashing is a very memory-efficient alternative. The two algorithms proposed in this thesis apply hashing to the existing Aho-Corasick algorithm to reduce memory usage. Proposed Algorithm 1 applies hashing to individual characters, which reduces the number of possible state transitions. As a result, the lookup table becomes much more compact than in the existing algorithm, and unnecessary memory usage is reduced.
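Character-level hashing that shrinks the table width, as described for Algorithm 1, can be pictured with the Python sketch below. The hash h(c) = c mod m, the width m = 8, and the names `hash_char`, `build_hashed_ac`, and `search_hashed` are all hypothetical assumptions, not the thesis's design. Each state keeps m columns instead of 256. Distinct characters can share a column, so each reported keyword is checked against the original bytes before it is accepted.

```python
from collections import deque


def hash_char(c, m):
    """Hypothetical character hash: fold a byte value into one of m columns."""
    return c % m


def build_hashed_ac(keywords, m):
    """Aho-Corasick over hashed characters: each state has m columns instead of 256."""
    goto = [{}]
    out = [[]]
    for kw in keywords:
        s = 0
        for c in kw.encode("latin-1"):
            h = hash_char(c, m)
            if h not in goto[s]:
                goto.append({})
                out.append([])
                goto[s][h] = len(goto) - 1
            s = goto[s][h]
        out[s].append(kw)  # several keywords may share one hashed path

    n = len(goto)
    delta = [[0] * m for _ in range(n)]
    fail = [0] * n
    queue = deque()
    for h, t in goto[0].items():
        delta[0][h] = t
        queue.append(t)
    while queue:  # breadth-first failure-link construction
        s = queue.popleft()
        out[s] = out[s] + out[fail[s]]
        for h in range(m):
            t = goto[s].get(h)
            if t is None:
                delta[s][h] = delta[fail[s]][h]
            else:
                fail[t] = delta[fail[s]][h]
                delta[s][h] = t
                queue.append(t)
    return delta, out


def search_hashed(delta, out, text, m):
    """Run the hashed automaton, then verify each candidate against the real bytes."""
    data = text.encode("latin-1")
    s = 0
    hits = []
    for i, c in enumerate(data):
        s = delta[s][hash_char(c, m)]
        for kw in out[s]:
            start = i - len(kw) + 1
            if data[start:i + 1] == kw.encode("latin-1"):  # reject hash collisions
                hits.append((start, kw))
    return hits


M = 8  # hypothetical table width
delta, out = build_hashed_ac(["he", "she", "his", "hers"], M)
print(len(delta), "states x", M, "columns =", len(delta) * M, "entries")
# prints: 10 states x 8 columns = 80 entries
print(search_hashed(delta, out, "ushers", M))
# prints: [(1, 'she'), (2, 'he'), (2, 'hers')]
print(search_hashed(delta, out, "pe", M))
# prints: []
```

With m = 8, the characters of these keywords happen not to collide with each other. The automaton therefore keeps 10 states but needs only 80 entries instead of 2,560. The text "pe" still drives the automaton into the state for "he", because 'p' (112) and 'h' (104) both land in column 0. The verification step then rejects that candidate. This shows the trade-off of a narrower table: fewer entries in exchange for extra checks.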
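Proposed Algorithm 2 applies hashing block by block rather than to single characters, with the block size set to the length of the shortest string. This makes the lookup table even more compact than in Algorithm 1 and reduces memory usage further. However, if the construction and search procedures of the existing algorithm are applied unchanged, some of the given strings in the data stream are not found. To solve this, the way the functions are constructed and the search procedure were modified. The Aho-Corasick algorithm and proposed Algorithms 1 and 2 were compared on an input data string of total length 1242, with the number of strings increased to 50, 100, and 200. The results confirm that proposed Algorithms 1 and 2 use substantially less memory than the existing algorithm.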
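Algorithm 2 hashes blocks whose size B is the length of the shortest keyword. The abstract also notes that reusing the original construction and search unchanged can miss strings. The Python sketch below does not reproduce the thesis's block-level Aho-Corasick automaton. Instead it uses a simpler Rabin-Karp-style block filter to illustrate two points under stated assumptions. The polynomial hash, the slot count m = 1024, and the names `block_hash`, `build_block_table`, and `block_search` are hypothetical. First, choosing B as the shortest keyword length guarantees that every keyword has a complete first block to hash. Second, reading the text only in non-overlapping B-byte blocks misses keywords that do not start on a block boundary.

```python
from collections import defaultdict


def block_hash(block, m):
    """Hypothetical polynomial hash of a byte block, reduced to one of m slots."""
    h = 0
    for byte in block:
        h = (h * 31 + byte) % m
    return h


def build_block_table(keywords, m):
    """Index each keyword by the hash of its first B bytes (B = shortest keyword length)."""
    b = min(len(kw) for kw in keywords)
    table = defaultdict(list)
    for kw in keywords:
        raw = kw.encode("latin-1")
        table[block_hash(raw[:b], m)].append(raw)
    return b, table


def block_search(text, b, table, m, step):
    """Hash the B-byte block at every step-th offset and verify each candidate keyword.

    step == b reads the text as non-overlapping blocks; step == 1 slides one byte at a time.
    """
    data = text.encode("latin-1")
    hits = []
    for start in range(0, len(data) - b + 1, step):
        for kw in table.get(block_hash(data[start:start + b], m), []):
            if data.startswith(kw, start):  # the full keyword must match, not just the hash
                hits.append((start, kw.decode("latin-1")))
    return hits


M = 1024  # hypothetical number of hash slots
B, TABLE = build_block_table(["virus", "worm", "attack"], M)
print(B)
# prints: 4
print(block_search("xworm", B, TABLE, M, step=B))
# prints: []
print(block_search("xworm", B, TABLE, M, step=1))
# prints: [(1, 'worm')]
```

In "xworm" the only aligned 4-byte block is "xwor", so the block-aligned scan finds nothing. The one-byte sliding scan finds "worm" at offset 1. Recomputing each window hash from scratch costs O(B) per offset; a rolling hash would make each update O(1), which this sketch omits for clarity.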