TY - CONF
T1 - NeuroXVocal
T2 - Detection and explanation of Alzheimer’s disease through non-invasive analysis of picture-prompted speech
AU - Ntampakis, Nikolaos
AU - Diamantaras, Konstantinos
AU - Chouvarda, Ioanna
AU - Tsolaki, Magda
AU - Sarigiannidis, Panagiotis
AU - Argyriou, Vasileios
PY - 2025/9/20
Y1 - 2025/9/20
N2 - The early diagnosis of Alzheimer’s Disease (AD) through non-invasive methods remains a significant healthcare challenge. We present NeuroXVocal, the first end-to-end explainable AD classification system that achieves state-of-the-art performance while providing clinically interpretable explanations. Our novel dual-component architecture consists of: (1) Neuro, a multimodal classifier implementing a unique transformer-based fusion strategy that projects acoustic, textual, and speech embeddings into a common dimensional space for complex cross-modal interactions; and (2) XVocal, a specialized RAG-based explainer that retrieves relevant clinical literature to generate evidence-based explanations. Unlike previous approaches using late fusion or simple concatenation, our architecture enables both robust classification and meaningful clinical insights. Using the IS2021 ADReSSo Challenge benchmark dataset, NeuroXVocal achieved 95.77% accuracy, significantly outperforming previous state-of-the-art. Medical professionals validated the clinical relevance of XVocal’s explanations through structured evaluation. This work advances beyond pure classification to bridge the gap between machine learning predictions and clinical decision-making.
AB - The early diagnosis of Alzheimer’s Disease (AD) through non-invasive methods remains a significant healthcare challenge. We present NeuroXVocal, the first end-to-end explainable AD classification system that achieves state-of-the-art performance while providing clinically interpretable explanations. Our novel dual-component architecture consists of: (1) Neuro, a multimodal classifier implementing a unique transformer-based fusion strategy that projects acoustic, textual, and speech embeddings into a common dimensional space for complex cross-modal interactions; and (2) XVocal, a specialized RAG-based explainer that retrieves relevant clinical literature to generate evidence-based explanations. Unlike previous approaches using late fusion or simple concatenation, our architecture enables both robust classification and meaningful clinical insights. Using the IS2021 ADReSSo Challenge benchmark dataset, NeuroXVocal achieved 95.77% accuracy, significantly outperforming previous state-of-the-art. Medical professionals validated the clinical relevance of XVocal’s explanations through structured evaluation. This work advances beyond pure classification to bridge the gap between machine learning predictions and clinical decision-making.
U2 - 10.1007/978-3-032-05185-1_40
DO - 10.1007/978-3-032-05185-1_40
M3 - Conference contribution
SN - 9783032051844
VL - 15973
T3 - MICCAI: International Conference on Medical Image Computing and Computer-Assisted Intervention
SP - 410
EP - 419
BT - Medical Image Computing and Computer Assisted Intervention – MICCAI 2025
A2 - Gee, James C.
A2 - Alexander, Daniel C.
A2 - Hong, Jaesung
A2 - Iglesias, Juan Eugenio
A2 - Sudre, Carole H.
A2 - Venkataraman, Archana
A2 - Golland, Polina
A2 - Kim, Jong Hyo
A2 - Park, Jinah
PB - Springer
CY - Cham
ER -