Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/33586
DC Field | Value | Language
dc.contributor.author | Dimitrovski, Ivica | en_US
dc.contributor.author | Spasev, Vlatko | en_US
dc.contributor.author | Kitanovski, Ivan | en_US
dc.date.accessioned | 2025-05-21T07:33:48Z | -
dc.date.available | 2025-05-21T07:33:48Z | -
dc.date.issued | 2024-10-01 | -
dc.identifier.uri | http://hdl.handle.net/20.500.12188/33586 | -
dc.description.abstract | Accurate semantic segmentation of remote sensing imagery is critical for various Earth observation applications, such as land cover mapping, urban planning, and environmental monitoring. However, individual data sources often present limitations for this task. Very High Resolution (VHR) aerial imagery provides rich spatial details but cannot capture temporal information about land cover changes. Conversely, Satellite Image Time Series (SITS) capture temporal dynamics, such as seasonal variations in vegetation, but with limited spatial resolution, making it difficult to distinguish fine-scale objects. This paper proposes a late fusion deep learning model (LF-DLM) for semantic segmentation that leverages the complementary strengths of both VHR aerial imagery and SITS. The proposed model consists of two independent deep learning branches. One branch captures detailed textures from the aerial imagery using UNetFormer with a Multi-Axis Vision Transformer (MaxViT) backbone. The other branch captures complex spatio-temporal dynamics from the Sentinel-2 satellite image time series using a U-Net with Temporal Attention Encoder (U-TAE). This approach leads to state-of-the-art results on the FLAIR dataset, a large-scale benchmark for land cover segmentation using multi-source optical imagery. The findings highlight the importance of multi-modality fusion in improving the accuracy and robustness of semantic segmentation in remote sensing applications. | en_US
dc.relation.ispartof | arXiv preprint arXiv:2410.00469 | en_US
dc.subject | Earth observation, semantic segmentation, remote sensing, multi-modality fusion, deep learning | en_US
dc.title | Deep Multimodal Fusion for Semantic Segmentation of Remote Sensing Earth Observation Data | en_US
dc.type | Preprint | en_US
item.grantfulltext | open | -
item.fulltext | With Fulltext | -
crisitem.author.dept | Faculty of Computer Science and Engineering | -
crisitem.author.dept | Faculty of Computer Science and Engineering | -
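
The abstract above describes a two-branch late-fusion architecture: a UNetFormer with a MaxViT backbone segments the VHR aerial image, a U-TAE segments the Sentinel-2 time series, and the two outputs are fused. The following Python (PyTorch) sketch illustrates only that general late-fusion pattern; the stand-in branch modules, the channel and class counts, and the logit-averaging fusion rule are illustrative assumptions, not the authors' implementation (see arXiv:2410.00469 for the actual models).

```python
# Minimal late-fusion sketch. The branch classes below are simplified
# stand-ins for UNetFormer+MaxViT and U-TAE; channel/class counts are
# illustrative, not the FLAIR/LF-DLM configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AerialBranch(nn.Module):
    """Stand-in for UNetFormer+MaxViT: VHR image -> per-class logits."""
    def __init__(self, in_ch=3, num_classes=13):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_classes, 1),
        )

    def forward(self, x):       # x: (B, C, H, W)
        return self.net(x)      # (B, num_classes, H, W)

class TemporalBranch(nn.Module):
    """Stand-in for U-TAE: SITS -> per-class logits at satellite resolution."""
    def __init__(self, in_ch=10, num_classes=13):
        super().__init__()
        self.encode = nn.Conv2d(in_ch, 32, 3, padding=1)
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):       # x: (B, T, C, h, w)
        b, t, c, h, w = x.shape
        feats = self.encode(x.flatten(0, 1)).view(b, t, 32, h, w)
        feats = feats.mean(dim=1)  # crude temporal pooling standing in for temporal attention
        return self.head(feats)    # (B, num_classes, h, w)

class LateFusionSegmenter(nn.Module):
    """Run both branches independently, resample the SITS logits to the VHR
    grid, and fuse by averaging per-class logits (one simple late-fusion
    rule; the paper's exact fusion step may differ)."""
    def __init__(self, num_classes=13):
        super().__init__()
        self.aerial = AerialBranch(num_classes=num_classes)
        self.temporal = TemporalBranch(num_classes=num_classes)

    def forward(self, vhr, sits):
        logits_a = self.aerial(vhr)
        logits_t = self.temporal(sits)
        logits_t = F.interpolate(logits_t, size=logits_a.shape[-2:],
                                 mode="bilinear", align_corners=False)
        return (logits_a + logits_t) / 2

model = LateFusionSegmenter(num_classes=13)
vhr = torch.randn(2, 3, 512, 512)      # VHR aerial patches
sits = torch.randn(2, 12, 10, 40, 40)  # 12 Sentinel-2 acquisitions, 10 bands
pred = model(vhr, sits).argmax(dim=1)  # (2, 512, 512) class map
```

Fusing at the logit level keeps the two branches fully independent, which is the point of a late-fusion design: each branch can be trained or swapped without touching the other, at the cost of not sharing intermediate features.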
Appears in Collections:Faculty of Computer Science and Engineering: Journal Articles
Files in This Item:
File | Size | Format
2410.00469v1.pdf | 2.13 MB | Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.