DSpace at EWHA: Spectra-preserving Neural Representations for Videos

Title: Spectra-preserving Neural Representations for Videos

Authors: 이지후

Issue Date: 2024

Department/Major: Graduate School, Department of Electronic and Electrical Engineering

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 강제원

Abstract: Neural representations for videos (NeRV), which employ a neural network to parameterize video signals, introduce a novel methodology for video representation. However, existing NeRV-based methods have difficulty capturing fine spatial details and motion patterns because of spectral bias, in which a neural network learns high-frequency (HF) components more slowly than low-frequency (LF) components. In this paper, we propose spectra-preserving NeRV (SNeRV), a novel approach that enhances implicit video representations by efficiently handling various frequency components. SNeRV uses the 2D discrete wavelet transform (DWT) to decompose video into LF and HF features, preserving spatial structures and directly addressing the spectral bias issue. To keep the model compact, we encode only the LF components, while the HF components, which contain fine textures, are generated by a decoder. Specialized modules, including a multi-resolution fusion unit (MFU) and a high-frequency restorer (HFR), are integrated into a backbone to facilitate the representation. Experimental results demonstrate that SNeRV outperforms existing NeRV-based models in capturing fine details and achieves improved reconstruction quality, making it a promising approach in the field of implicit video representations.
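The LF/HF split that the abstract refers to can be illustrated with a single-level 2D DWT on one grayscale frame. The sketch below is not the thesis implementation: the abstract does not say which wavelet SNeRV uses, so the Haar wavelet is an assumption, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical. It only shows how a frame separates into one LF sub-band and three HF sub-bands, and that the split is invertible.

```python
# Illustrative sketch only: the Haar wavelet and every name below are
# assumptions made for this example, not details taken from the thesis.

def haar_dwt2(frame):
    """Single-level 2D Haar DWT of one grayscale frame (a list of equal-length rows).

    Returns the LF sub-band and a tuple of three HF sub-bands, each at half
    the input resolution.
    """
    height, width = len(frame), len(frame[0])
    if height % 2 or width % 2:
        raise ValueError("frame height and width must be even")
    low, d_rows, d_cols, d_diag = [], [], [], []
    for i in range(0, height, 2):
        r_low, r_rows, r_cols, r_diag = [], [], [], []
        for j in range(0, width, 2):
            # One 2x2 block: a b on the top row, c d on the bottom row.
            a, b = frame[i][j], frame[i][j + 1]
            c, d = frame[i + 1][j], frame[i + 1][j + 1]
            r_low.append((a + b + c + d) / 2)   # LF: scaled block average
            r_rows.append((a + b - c - d) / 2)  # HF: top row minus bottom row
            r_cols.append((a - b + c - d) / 2)  # HF: left column minus right column
            r_diag.append((a - b - c + d) / 2)  # HF: diagonal difference
        low.append(r_low)
        d_rows.append(r_rows)
        d_cols.append(r_cols)
        d_diag.append(r_diag)
    return low, (d_rows, d_cols, d_diag)


def haar_idwt2(low, details):
    """Inverse of haar_dwt2: rebuild the full-resolution frame from the sub-bands."""
    d_rows, d_cols, d_diag = details
    frame = []
    for i in range(len(low)):
        top, bottom = [], []
        for j in range(len(low[0])):
            l, r, c, g = low[i][j], d_rows[i][j], d_cols[i][j], d_diag[i][j]
            top += [(l + r + c + g) / 2, (l + r - c - g) / 2]
            bottom += [(l - r + c - g) / 2, (l - r - c + g) / 2]
        frame += [top, bottom]
    return frame


# Usage: decompose a tiny 2x2 frame and reconstruct it.
lf, hf = haar_dwt2([[1, 2], [3, 4]])
restored = haar_idwt2(lf, hf)
```

A flat frame yields all-zero HF sub-bands, so the HF sub-bands carry only edges and texture. According to the abstract, SNeRV encodes only the LF part and has the decoder generate the HF part; this sketch covers the decomposition step alone, not the encoder, the MFU, or the HFR.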