TY - GEN
T1 - Enhancing sign language communication
T2 - advanced gesture recognition models for Indian Sign Language
AU - Kumar, Sujay Grama Suresh
AU - Abbass, Jad
PY - 2025/6/13
Y1 - 2025/6/13
N2 - Globally, millions of individuals experience varying degrees of hearing impairment, creating an urgent demand for effective communication solutions. The limited number of proficient sign language users exacerbates this challenge. Recent advancements in machine learning provide promising avenues to address this issue. This study introduces an innovative automated system that translates Indian Sign Language (ISL), one of the most widely used sign languages, into English text using a webcam. Our comprehensive dataset includes ~1M images across 36 categories, covering digits (0-9) and alphabet letters (A-Z). The dataset features diverse gestures captured from various angles and performed by 6 individuals with different characteristics, followed by data augmentation. We evaluated five models, three standard and two customized: (1) MobileNetV2, a pre-trained convolutional neural network (CNN) optimized for mobile applications; (2) VGG16, a well-established pre-trained deep learning model; (3) a standard CNN; (4) a custom-designed CNN tailored for ISL recognition, trained on 32x32 images for 20 epochs; and (5) a customized MobileNetV2 for ISL recognition, retrained on 128x128 images for 20 epochs. Both customized models achieved an F1-Score of 94, whilst the standard ones achieved an F1-Score of no more than 85. The comprehensive comparison underscores the enhanced accuracy and efficiency of our custom models, establishing them as a significant advancement in sign language recognition.
AB - Globally, millions of individuals experience varying degrees of hearing impairment, creating an urgent demand for effective communication solutions. The limited number of proficient sign language users exacerbates this challenge. Recent advancements in machine learning provide promising avenues to address this issue. This study introduces an innovative automated system that translates Indian Sign Language (ISL), one of the most widely used sign languages, into English text using a webcam. Our comprehensive dataset includes ~1M images across 36 categories, covering digits (0-9) and alphabet letters (A-Z). The dataset features diverse gestures captured from various angles and performed by 6 individuals with different characteristics, followed by data augmentation. We evaluated five models, three standard and two customized: (1) MobileNetV2, a pre-trained convolutional neural network (CNN) optimized for mobile applications; (2) VGG16, a well-established pre-trained deep learning model; (3) a standard CNN; (4) a custom-designed CNN tailored for ISL recognition, trained on 32x32 images for 20 epochs; and (5) a customized MobileNetV2 for ISL recognition, retrained on 128x128 images for 20 epochs. Both customized models achieved an F1-Score of 94, whilst the standard ones achieved an F1-Score of no more than 85. The comprehensive comparison underscores the enhanced accuracy and efficiency of our custom models, establishing them as a significant advancement in sign language recognition.
U2 - 10.1109/SCSE65633.2025.11031017
DO - 10.1109/SCSE65633.2025.11031017
M3 - Conference contribution
SN - 9798331523275
T3 - International Research Conference on Smart Computing and Systems Engineering
BT - 2025 International Research Conference on Smart Computing and Systems Engineering
PB - IEEE
CY - Piscataway, U.S.
ER -