DSpace at EWHA: Efficient set reconciliation algorithm using Bloom filters (블룸필터를 활용한 효과적인 조화 알고리즘)

Title: 블룸필터를 활용한 효과적인 조화 알고리즘

Other Titles: Efficient set reconciliation algorithm using Bloom filters

Authors: 이승은

Issue Date: 2022

Department/Major: Graduate School, Department of Electronic and Electrical Engineering

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 임혜숙, 박형곤

Abstract: In many distributed network applications, maintaining data consistency among hosts is essential. If the data held by two hosts differ greatly, one host can simply transmit its entire data set to the other. However, if most of the data is identical and the number of unique elements each host owns is very small, transmitting the entire data set is inefficient. Exchanging only the unique elements is desirable because it reduces both the amount and the complexity of communication. The set reconciliation problem is to maintain data consistency among hosts while minimizing the amount of communication. Set reconciliation applies to a variety of settings, such as peer-to-peer (P2P) networks, blockchain, link-state routing, and mobile database synchronization. Data consistency can also be maintained by recording the times at which communication and updates occur, but this approach depends on prior information held by each host. A Bloom filter is a simple bit-vector structure that can check whether an element belongs to a given set. Several variants, such as the counting Bloom filter (CBF) and the invertible Bloom filter (IBF), have been proposed to add deletion and the decoding of inserted elements. These two variants can be used for set reconciliation because, under the same conditions, identical elements of two sets are inserted at the same locations in the two IBFs (or CBFs). In this thesis, we propose a new set reconciliation algorithm that does not use existing information such as time-stamped update logs. The proposed algorithm consists of two parts. In the first part, each host sorts its elements and identifies the candidate ranges that may contain its own unique elements. The elements held by a host are sorted in order, and each subset of elements is programmed into a quaternary Bloom filter (QBF). The QBF serves as the signature of a subset.
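In other words, by comparing the two QBFs that the two hosts construct for subsets covering the same range, each host can determine whether its subset contains unique elements that the other host does not have. If the number of unique elements between the two hosts is extremely small, the number of subsets containing unique elements is also very small, and hence few elements need to be transmitted to the other host. The second part of the algorithm programs the elements of the candidate subsets into IBFs. By excluding the subsets that contain no unique elements and programming only the subsets that contain unique elements into IBFs for decoding at the other host, set reconciliation of two sets can be performed efficiently.

Abstract (translated from the Korean): In many distributed network applications, maintaining data consistency among hosts is essential. If the data stored by the hosts differ greatly, a host can send everything it holds to the other host. However, when most of the data is identical and the number of elements held by only one host, that is, unique elements, is very small, transmitting the entire data set to the other host is inefficient. To reduce the volume and complexity of communication, it is desirable to exchange only the unique elements. Set reconciliation means maintaining data consistency among hosts with minimal communication. It can be applied to a variety of applications, typical examples being peer-to-peer (P2P) networks, blockchain, link-state routing, and mobile data synchronization. Data consistency can also be maintained by recording the time at which hosts communicated or an update completed, but this approach depends on prior information held by each host. A Bloom filter is a simple bit-vector structure that can check whether an element belongs to a given set. Several variants, such as the counting Bloom filter (CBF) and the invertible Bloom filter (IBF), have been proposed to provide the improved functions of deleting and decoding inserted elements. These two filters can be used for set reconciliation by exploiting the fact that, under the same conditions, identical elements are inserted at the same positions in two IBFs (or CBFs). This thesis proposes a new set reconciliation algorithm that does not use prior information such as time-stamped update logs. The proposed algorithm consists of two steps. The first step sorts the elements and identifies the candidate ranges of unique elements. The data in a host is sorted in order, and the elements of each subset are inserted into a quaternary Bloom filter (QBF). Here the QBF serves as a digest representing the subset. That is, by comparing the two QBFs generated for subsets of the same range at the two hosts, each host can determine whether that subset contains unique elements absent from the other host. If the number of unique elements held by the two hosts is very small, the number of subsets containing unique elements will also be very small, so few elements need to be transmitted to the other host. The second step inserts into an IBF only the elements of the subsets that may contain unique elements. By excluding the subsets without unique elements and inserting only the subsets containing unique elements into IBFs for decoding at the other host, set reconciliation between the two sets can be achieved efficiently.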
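
The two-part procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not specify the internals of the QBF, so a plain Bloom filter bit vector stands in as the per-range signature. Elements are assumed to be non-negative integers below `key_space`, assigning them to fixed-width ranges stands in for sorting, and every name and parameter (`range_signatures`, `IBF`, `reconcile`, `num_ranges`, `m`, `k`) is hypothetical.

```python
import hashlib


def _hash(x: int, seed: int, mod: int) -> int:
    """Seeded hash of a non-negative 64-bit integer key, reduced modulo `mod`."""
    data = seed.to_bytes(4, "big") + x.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % mod


def range_signatures(elements, num_ranges, key_space, m=4096, k=3):
    """Part 1: one Bloom filter (an int used as an m-bit vector) per key range.

    A plain Bloom filter is an assumed stand-in for the QBF signature.
    Assumes key_space is divisible by num_ranges.
    """
    width = key_space // num_ranges
    sigs = [0] * num_ranges
    for x in elements:
        for seed in range(k):
            sigs[x // width] |= 1 << _hash(x, seed, m)
    return sigs


class IBF:
    """Invertible Bloom filter with count, key-XOR, and checksum-XOR cells."""

    def __init__(self, m=120, k=3):
        self.m, self.k = m, k
        self.count = [0] * m
        self.key_sum = [0] * m
        self.chk_sum = [0] * m

    def _cells(self, x):
        # One cell in each of k equal partitions, so the k cells are distinct.
        part = self.m // self.k
        return [i * part + _hash(x, i, part) for i in range(self.k)]

    def _update(self, x, delta):
        chk = _hash(x, 99, 1 << 32)
        for i in self._cells(x):
            self.count[i] += delta
            self.key_sum[i] ^= x
            self.chk_sum[i] ^= chk

    def insert(self, x):
        self._update(x, 1)

    def subtract(self, other):
        """Cell-wise difference; elements common to both filters cancel out."""
        diff = IBF(self.m, self.k)
        for i in range(self.m):
            diff.count[i] = self.count[i] - other.count[i]
            diff.key_sum[i] = self.key_sum[i] ^ other.key_sum[i]
            diff.chk_sum[i] = self.chk_sum[i] ^ other.chk_sum[i]
        return diff

    def decode(self):
        """Peel pure cells; returns (only_in_self, only_in_other, fully_decoded)."""
        only_self, only_other = set(), set()
        progress = True
        while progress:
            progress = False
            for i in range(self.m):
                sign = self.count[i]
                x = self.key_sum[i]
                if sign in (1, -1) and self.chk_sum[i] == _hash(x, 99, 1 << 32):
                    (only_self if sign == 1 else only_other).add(x)
                    self._update(x, -sign)  # remove x from all of its cells
                    progress = True
        done = not any(self.count) and not any(self.key_sum)
        return only_self, only_other, done


def reconcile(set_a, set_b, num_ranges=16, key_space=1 << 16):
    """Compare per-range signatures, then run IBFs over candidate ranges only."""
    sig_a = range_signatures(set_a, num_ranges, key_space)
    sig_b = range_signatures(set_b, num_ranges, key_space)
    # A range whose signatures match is assumed identical (Bloom filter
    # collisions can hide a difference, so this check is probabilistic).
    candidates = {r for r in range(num_ranges) if sig_a[r] != sig_b[r]}
    width = key_space // num_ranges
    ibf_a, ibf_b = IBF(), IBF()
    for x in set_a:
        if x // width in candidates:
            ibf_a.insert(x)
    for x in set_b:
        if x // width in candidates:
            ibf_b.insert(x)
    only_a, only_b, done = ibf_a.subtract(ibf_b).decode()
    return candidates, only_a, only_b, done


if __name__ == "__main__":
    shared = set(range(0, 60000, 97))
    print(reconcile(shared | {12345}, shared | {54321, 222}))
```

In this sketch both sets live in one process; in a real exchange the hosts would send each other the signatures and then the IBFs. The sizes chosen here (`m=4096` signature bits, 120 IBF cells) are illustrative only and are not taken from the thesis.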