DSpace at EWHA: Advance Deep Learning Approaches for Detecting Objects, Tumors, and Landmarks in Diverse Domains

Title: Advance Deep Learning Approaches for Detecting Objects, Tumors, and Landmarks in Diverse Domains

Authors: Thanaporn Viriyasaranon

Issue Date: 2024

Department/Major: Graduate School, Division of Mechanical and Biomedical Engineering

Publisher: The Graduate School, Ewha Womans University

Degree: Doctor

Advisors: 최장환

Abstract: Detection is a vital aspect of computer vision, with applications spanning object recognition, image analysis, surveillance, and medical diagnosis. It relies primarily on machine learning algorithms, among which deep learning is a prominent and highly successful approach known for improving accuracy and efficiency. Each image domain presents unique challenges that can impede the performance of a deep learning framework: natural image datasets contain objects at multiple scales; X-ray security images contain overlapping objects and heavy clutter; and medical image datasets suffer from a scarcity of annotations. Therefore, this thesis proposes a deep learning-based strategy tailored to each image domain to achieve high performance in detection-based tasks. To address the challenge of multiscale objects and improve the performance of deep learning-based detectors, we introduce the NAS-gate convolutional module. This module incorporates convolutional operations with multiple kernel sizes and dilation rates using neural architecture search (NAS). Additionally, we introduce capsule attention modules, which can effectively encode spatial relationships between objects. The performance of the proposed modules and of their integration with state-of-the-art object detectors was evaluated using NASGC-CapANet on the MS COCO and PASCAL VOC datasets. The experimental results demonstrate that NASGC-CapANet significantly enhances detection performance compared to state-of-the-art baseline object detectors. For X-ray screening systems, we propose MFA-Net, an object detector designed to identify contraband items in both cargo and baggage security scans. It introduces a multiscale dilated convolutional module to address the object-scale variance issue in X-ray security scans.
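As a rough illustration of the multi-kernel, multi-dilation idea mentioned above, the following NumPy sketch runs several parallel convolution branches over one image and stacks the results. It is not the NAS-gate module or the MFA-Net module from the thesis: the function names `dilated_conv2d`, `multiscale_features`, and `effective_size` are hypothetical, the kernels are fixed rather than learned, and no architecture search is performed.

```python
import numpy as np

def effective_size(k, dilation):
    """Side length of the receptive field of a k x k kernel at a given dilation."""
    return k + (k - 1) * (dilation - 1)

def dilated_conv2d(image, kernel, dilation=1):
    """Zero-padded, same-size 2D cross-correlation with a dilated square kernel.

    Assumes an odd kernel size so that the output stays aligned with the input.
    """
    k = kernel.shape[0]
    pad = dilation * (k - 1) // 2
    padded = np.pad(image, pad, mode="constant")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # Each kernel tap reads the input at an offset scaled by the dilation.
            di, dj = i * dilation, j * dilation
            out += kernel[i, j] * padded[di:di + h, dj:dj + w]
    return out

def multiscale_features(image, branches):
    """Apply parallel (kernel, dilation) branches and stack them channel-first."""
    return np.stack([dilated_conv2d(image, k, d) for k, d in branches], axis=0)

# Illustrative branches: two kernel sizes and two dilation rates (assumed values).
image = np.zeros((9, 9))
image[4, 4] = 1.0
branches = [(np.ones((3, 3)), 1), (np.ones((3, 3)), 2), (np.ones((5, 5)), 1)]
features = multiscale_features(image, branches)
```

A 3 x 3 kernel with dilation 2 covers the same 5 x 5 area as a dense 5 x 5 kernel while using only nine weights, which is why dilation is a common way to widen the receptive field for objects of different scales.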
Furthermore, the fusion feature pyramid network combines attention and fusion modules to improve multiscale object recognition and mitigate object occlusion. Additionally, an auxiliary point detection head predicts new keypoints of bounding boxes, emphasizing localization without requiring additional ground-truth information. The performance of MFA-Net was evaluated on two large-scale X-ray security image datasets from different domains: the Security Inspection X-ray (SIXray) dataset in the baggage domain and our own dataset, named CargoX, in the cargo domain. Notably, MFA-Net outperformed state-of-the-art object detectors in both domains, highlighting the potential of the proposed modules for enhancing detection in X-ray security images. Advanced deep learning has been applied successfully in automated medical image analysis; however, the scarcity of annotations remains a challenge. To address this, a novel self-supervised pretext task called pseudo-shape segmentation (PSSeg) is introduced. It learns semantic features in a self-supervised manner by training transformer-based models to segment numerical signals representing geometric shapes inserted into original computed tomography (CT) images. Furthermore, a Convolutional Pyramid Vision Transformer (CPT) was developed, which uses multi-kernel convolutional patch embedding and local spatial reduction in each layer to generate multi-scale features, capture local information, and reduce computational costs. Combining PSSeg with CPT led to significant performance improvements, surpassing state-of-the-art deep learning-based methods in tasks such as classification, tumor segmentation, and early-stage cancer detection on pancreatic and liver cancer datasets. Additionally, the proposed method demonstrated high accuracy on MRI breast cancer datasets and enhanced robustness for small training datasets and external validation datasets.
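The pretext task described above can be pictured with a small sketch: a synthetic shape is pasted into an unlabeled slice, and the shape's own mask becomes a free segmentation target. The code below is only an assumption-laden illustration, not the PSSeg procedure from the thesis. The function name `insert_pseudo_shape`, the choice of a filled disc, and the intensity value are all hypothetical.

```python
import numpy as np

def insert_pseudo_shape(ct_slice, center, radius, intensity):
    """Paste a filled disc into a copy of a 2D slice.

    Returns the modified image and a binary mask of the inserted shape,
    which can serve as a label without any manual annotation.
    """
    h, w = ct_slice.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2
    image = ct_slice.astype(float).copy()
    image[mask] = intensity  # assumed constant signal; real signals may vary
    return image, mask.astype(np.uint8)

# Stand-in for an unlabeled CT slice (all zeros here purely for illustration).
ct_slice = np.zeros((32, 32))
image, mask = insert_pseudo_shape(ct_slice, center=(16, 16), radius=3, intensity=200.0)
```

A model trained to recover `mask` from `image` has to separate the inserted signal from the surrounding anatomy, which is the kind of self-supervised signal the abstract refers to.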
The last application explored in this thesis is anatomical landmark detection. Multiresolution heatmap learning and the hybrid transformer-CNN (HTC) architecture were developed to increase landmark localization accuracy and to balance the bias and variance of the predicted landmarks. Extensive experiments demonstrated that this approach outperforms state-of-the-art deep learning-based anatomical landmark localization networks on numerical XCAT 2D projection images and on two public X-ray landmark detection benchmark datasets. To further enhance landmark detection performance, TriForceNet was developed: a landmark detection framework featuring the Sequential Hybrid Transformer-CNN (SeqHTC), multiresolution heatmap learning, and multi-task learning with an auxiliary segmentation head for motion estimation. The experimental results demonstrate that TriForceNet outperforms state-of-the-art landmark detectors from both the natural and medical image domains in 2D projection images of the XCAT head numerical phantom and real patient CT scans (CQ500 dataset) that contain 6-DoF patient motion with amplitudes of 6 mm and 10 mm. In addition, the anatomical landmark positions were used to estimate patient motion parameters for motion artifact reduction in 3D cone-beam CT volumes. To estimate the motion parameters accurately, Dynamic Landmark Motion Estimation (DLME) was proposed; it reduces high-frequency noise and outliers in the motion sequence caused by landmark localization errors, thereby preventing image quality degradation. Consequently, the proposed motion artifact reduction method significantly enhances image quality in 3D reconstruction for the same datasets.
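To make the goal of suppressing outliers and high-frequency noise in a motion sequence concrete, here is a minimal sketch that applies a median filter followed by a moving average to one motion parameter over time. This is a generic smoothing scheme swapped in for illustration and is not the DLME method from the thesis. The function names and the window size are assumptions.

```python
import numpy as np

def median_filter_1d(x, window=5):
    """Sliding median with edge padding; removes isolated outliers."""
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    return np.array([np.median(padded[i:i + window]) for i in range(len(x))])

def moving_average_1d(x, window=5):
    """Sliding mean with edge padding; attenuates high-frequency noise."""
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

def smooth_motion(sequence, window=5):
    """Median filter first (outliers), then moving average (noise). Odd window assumed."""
    x = np.asarray(sequence, dtype=float)
    return moving_average_1d(median_filter_1d(x, window), window)

# Hypothetical translation estimates (mm) per projection, with one gross outlier
# standing in for a landmark localization failure.
clean = np.linspace(0.0, 1.0, 50)
noisy = clean.copy()
noisy[25] = 100.0
smoothed = smooth_motion(noisy)
```

The median step goes first because a mean filter would smear a single bad estimate across its neighbors instead of removing it.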