Multi-Head Encoder (MHEnc)-based Fusion Strategies for Land Cover and Land Use Classification Using Sentinel-1 and Sentinel-2
Abstract
This study aims to develop a fusion strategy, guided by data characteristics, that improves land use and land cover (LULC) classification from multi-source data. The Multi-Head Encoder (MHEnc) framework was designed as a backbone. It integrates parallel convolutional heads with different dilation rates through learned attention mechanisms, so that features are processed across multiple spatial scales, from fine local detail to broad landscape context. Building on this framework, fusion strategies were systematically investigated for heterogeneous sensor combinations (Sentinel-1 SAR with Sentinel-2 optical data) and for homogeneous temporal scenarios, establishing data-driven guidelines for choosing a fusion strategy. To understand the underlying mechanisms, Grad-CAM analysis was employed. It revealed that Early Fusion can let optical information dominate over SAR information, whereas Middle Fusion preserves modality-specific contributions. These insights guided the development of a hybrid fusion strategy that achieved an Overall Accuracy (OA) of 96.83% on the spatially independent test area. Patch boundary effects were also investigated, and a central-area approach was evaluated; it improved performance by up to 13% for attention-based SAR architectures. Together, multi-scale processing, systematic fusion evaluation, and interpretability analysis form an integrated framework for principled, data-driven design of LULC classifiers.
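The multi-scale idea behind parallel dilated heads, and the reason patch borders matter, can be illustrated with a small pure-Python sketch. It is not the MHEnc implementation: the abstract does not specify kernel sizes, the form of the attention, or how the central area is defined. The sketch assumes 3x3 kernels with dilation rates 1, 2 and 3, zero padding, and a Selective-Kernel-style softmax over per-head scores computed from each head's globally averaged response. The score weights are fixed, hypothetical stand-ins for learned parameters. The helper names `dilated_conv3x3`, `multi_head_block` and `central_area` are also hypothetical. The central-area helper keeps only pixels whose receptive field lies entirely inside the patch, which is one plausible reading of a central-area approach.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def dilated_conv3x3(x: Grid, kernel: Grid, dilation: int) -> Grid:
    """3x3 cross-correlation (as in CNN layers) with zero padding equal to the
    dilation rate, so the output has the same height and width as the input."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for ki in range(3):
                for kj in range(3):
                    # Taps sit at offsets -d, 0, +d along each axis.
                    r = i + (ki - 1) * dilation
                    c = j + (kj - 1) * dilation
                    if 0 <= r < h and 0 <= c < w:  # out of range acts as zero padding
                        acc += kernel[ki][kj] * x[r][c]
            out[i][j] = acc
    return out


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_block(x: Grid, kernels: Sequence[Grid], dilations: Sequence[int],
                     score_weights: Sequence[float]) -> Tuple[Grid, List[float]]:
    """Run parallel dilated heads and fuse them with softmax attention weights.

    Each head's score is its globally averaged response times a scalar weight
    (a hypothetical stand-in for a learned parameter)."""
    heads = [dilated_conv3x3(x, k, d) for k, d in zip(kernels, dilations)]
    h, w = len(x), len(x[0])
    scores = [s * sum(map(sum, y)) / (h * w) for y, s in zip(heads, score_weights)]
    alphas = softmax(scores)
    fused = [[sum(a * y[i][j] for a, y in zip(alphas, heads)) for j in range(w)]
             for i in range(h)]
    return fused, alphas


def central_area(y: Grid, margin: int) -> Grid:
    """Keep only pixels at least `margin` pixels away from every patch edge."""
    return [row[margin:len(row) - margin] for row in y[margin:len(y) - margin]]


if __name__ == "__main__":
    patch = [[float((i + j) % 4) for j in range(9)] for i in range(9)]
    mean3 = [[1 / 9] * 3 for _ in range(3)]  # simple averaging kernel
    dilations = [1, 2, 3]
    fused, alphas = multi_head_block(patch, [mean3] * 3, dilations, [0.5, 0.5, 0.5])
    # With a single 3x3 layer, a pixel's receptive field spans +/- d pixels,
    # so a margin of max(dilations) removes every pixel that saw padding.
    core = central_area(fused, margin=max(dilations))
    print("head weights:", [round(a, 3) for a in alphas])
    print("central area size:", len(core), "x", len(core[0]))
```

In this toy setting, zero padding dilutes responses near the patch edges, and more so for larger dilation rates, so border pixels are computed from partly missing context. With a stack of layers rather than a single one, the margin needed to exclude such pixels would grow to the sum of the per-layer receptive-field radii.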