TY - JOUR
T1 - Vision transformer-powered conversational agent for real-time Indian Sign Language e-governance accessibility
AU - Nedungadi, Prema
AU - Dileep, Gautham
AU - Geetha, M.
AU - Raman, Raghu
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
AB - This paper presents a vision transformer-based, context-aware, real-time Indian Sign Language (ISL) conversational agent designed to enhance digital accessibility for India’s Deaf and Hard-of-Hearing community within e-governance services. The system supports continuous sign language recognition, from isolated words to full ISL sentences, and delivers information and services directly in sign language. A domain-specific ISL video dataset, incorporating diverse signing styles and environments, addresses ISL’s low-resource challenges, enabling robust, scalable real-world deployment. The hybrid architecture, combining convolutional neural networks with vision transformer models, effectively handles ISL’s spatial–temporal complexities, maintaining reliability even with complex queries. A dynamic context-response mapping engine uses contextual data to increase accuracy, particularly for ambiguous inputs. The modular design ensures efficient scalability, facilitating seamless integration of new services. System evaluations, including stress testing and usability studies, confirmed its effectiveness in enabling real-time, inclusive digital interactions. Under optimized conditions, the system achieved 97.5% accuracy, a mean response time of 6.53 s, and an average system usability score of 71.83, significantly advancing digital inclusion in India.
UR - https://www.scopus.com/pages/publications/105017416875
U2 - 10.1038/s41598-025-12667-3
DO - 10.1038/s41598-025-12667-3
M3 - Article
C2 - 41006331
AN - SCOPUS:105017416875
SN - 2045-2322
VL - 15
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 33055
ER -