DSpace at EWHA: Future Object Localization in Autonomous Driving through Multi-Modal Sensor Fusion

Title: Future Object Localization in Autonomous Driving through Multi-Modal Sensor Fusion

Other Titles: 자율 주행에서의 멀티모달 센서 합성을 통한 미래 객체 위치 예측 (Korean title)

Authors: 조서영

Issue Date: 2023

Department/Major: Graduate School, Department of Electronic and Electrical Engineering

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 강제원

Abstract: Future object localization (FOL) is an image processing and computer vision task that predicts the future locations of foreground objects in a video sequence. When FOL is applied to ego-centric video captured by an autonomous vehicle, it can help avoid collisions and protect nearby pedestrians and vehicles by accurately determining their future locations from the current state. In this thesis, we propose an accurate FOL method that uses multi-modal sensor fusion of ego-centric video and LiDAR data in an autonomous driving system. The proposed method builds on our previous stochastic framework, FOL using ego-centric images and motions (FOLe) [58]. FOLe consists of two-stage sub-networks for localization: a future candidate network (FCN) and a future decision network (FDN). The FCN produces several hypotheses indicating where an object is likely to appear, and the FDN predicts a multi-modal distribution and determines the final location by maximizing that probability distribution. In addition, LiDAR data is fused through a dedicated network so that the model can exploit 3D point clouds. Experimental results demonstrate that the proposed model outperforms state-of-the-art methods.
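The decision step described in the abstract (several hypotheses scored against a multi-modal distribution, with the final location taken at the maximum) can be illustrated with a minimal sketch. This is not the thesis's network. It assumes, purely for illustration, that the multi-modal distribution is an isotropic 2D Gaussian mixture and that the final location is chosen among the candidate hypotheses; the function names, parameters, and numbers below are all hypothetical.

```python
import numpy as np


def gaussian_mixture_density(points, weights, means, sigmas):
    """Evaluate an isotropic 2D Gaussian mixture at each point.

    Assumption for illustration only: the abstract does not state the
    form of the multi-modal distribution.
    """
    points = np.asarray(points, dtype=float)    # shape (N, 2)
    weights = np.asarray(weights, dtype=float)  # shape (K,)
    means = np.asarray(means, dtype=float)      # shape (K, 2)
    sigmas = np.asarray(sigmas, dtype=float)    # shape (K,)

    # Squared distance from every point to every mixture mean: (N, K).
    diff = points[:, None, :] - means[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)

    # Per-component 2D isotropic Gaussian density: (N, K).
    norm = 1.0 / (2.0 * np.pi * sigmas ** 2)
    components = norm * np.exp(-0.5 * sq_dist / sigmas ** 2)

    # Weighted sum over components gives the mixture density: (N,).
    return components @ weights


def select_future_location(hypotheses, weights, means, sigmas):
    """Return the index and coordinates of the highest-density hypothesis."""
    density = gaussian_mixture_density(hypotheses, weights, means, sigmas)
    best = int(np.argmax(density))
    return best, np.asarray(hypotheses, dtype=float)[best]


if __name__ == "__main__":
    # Hypothetical candidate object centers in pixels (stand-in for FCN output).
    hypotheses = [[100.0, 50.0], [120.0, 60.0], [300.0, 200.0]]
    # Hypothetical two-mode distribution (stand-in for FDN output).
    weights = [0.7, 0.3]
    means = [[118.0, 58.0], [290.0, 210.0]]
    sigmas = [10.0, 20.0]

    index, location = select_future_location(hypotheses, weights, means, sigmas)
    print(index, location)
```

With these made-up values, the second hypothesis lies closest to the dominant mode and is selected. A learned model would instead predict the hypotheses and the distribution parameters from the video and LiDAR inputs.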