Echo – Restoring Voices Through AI
Overview
Echo is an AI-powered tool that gives a voice to people who can no longer speak. Using Visual Speech Recognition (VSR), the system tracks mouth movements and converts them into spoken words, allowing users to communicate naturally. Whether through a preset voice or a custom voice trained on past recordings, Echo helps users reclaim their ability to speak.
How It Works
- Sign Up & Setup – Users create an account with a single click.
- Webcam Input – The system tracks the user's mouth movements in real-time.
- AI Speech Recognition – Using mouth tracking from MediaPipe, the VSR model interprets the words being mouthed and outputs them as text.
- Voice Synthesis – ElevenLabs' generative AI converts text into speech using either a preset or a custom-trained voice.
- Seamless Communication – Users can broadcast their voice over calls, enabling "silent" phone conversations with only a webcam.
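The steps above can be read as a three-stage pipeline: mouth tracking, visual speech recognition, and voice synthesis. The sketch below is only an illustration of how such stages might be wired together, not Echo's actual code. The class name `EchoPipeline`, its field names, and the stand-in stages in `demo` are all hypothetical; in a real build the callables would wrap MediaPipe, the VSR model, and the ElevenLabs voice.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple

# A frame is left abstract here; in practice it would be a webcam image.
Frame = object
# Per-frame mouth features, e.g. lip landmark coordinates (an assumption).
MouthFeatures = Sequence[float]


@dataclass
class EchoPipeline:
    """Hypothetical wiring of the stages; each stage is an injected callable."""

    extract_mouth: Callable[[Frame], MouthFeatures]   # mouth tracking stage
    recognize: Callable[[List[MouthFeatures]], str]   # VSR stage: features -> text
    synthesize: Callable[[str], bytes]                # voice stage: text -> audio bytes

    def run(self, frames: Iterable[Frame]) -> Tuple[str, bytes]:
        # 1. Track the mouth in every frame.
        features = [self.extract_mouth(frame) for frame in frames]
        # 2. Interpret the mouthed words as text.
        text = self.recognize(features)
        # 3. Only synthesize audio when something was recognized.
        audio = self.synthesize(text) if text else b""
        return text, audio


def demo() -> Tuple[str, bytes]:
    """Run the pipeline with trivial stand-in stages (no camera or model)."""
    pipeline = EchoPipeline(
        extract_mouth=lambda frame: [float(frame)],
        recognize=lambda feats: "hello" if feats else "",
        synthesize=lambda text: text.encode("utf-8"),
    )
    return pipeline.run([1, 2, 3])
```

Passing the stages in as callables keeps each one replaceable, so a preset voice and a custom-trained voice would differ only in the `synthesize` function supplied.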
Technologies Used
- Frontend: React, Next.js
- Backend: Python, PyTorch, MediaPipe, Visual Speech Recognition (VSR) ML model
- AI & Speech Synthesis: ElevenLabs Generative AI
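As a rough illustration of what the MediaPipe layer might hand to the VSR model, the sketch below computes a mouth-openness ratio from 2D face landmarks. This README does not say which landmarks or features Echo uses, so everything here is an assumption: the function name `mouth_open_ratio` is hypothetical, and the indices assume MediaPipe Face Mesh numbering (13 and 14 for the inner upper and lower lip, 61 and 291 for the mouth corners). The function takes plain `(x, y)` pairs so that it runs without MediaPipe installed.

```python
from math import hypot
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Assumed Face Mesh indices: inner upper/lower lip and the two mouth corners.
UPPER_LIP, LOWER_LIP, LEFT_CORNER, RIGHT_CORNER = 13, 14, 61, 291


def mouth_open_ratio(landmarks: Sequence[Point]) -> float:
    """Return the lip gap divided by the mouth width.

    Dividing by the width makes the value roughly independent of how far
    the user sits from the camera. Returns 0.0 if the width is zero.
    """
    upper_x, upper_y = landmarks[UPPER_LIP]
    lower_x, lower_y = landmarks[LOWER_LIP]
    left_x, left_y = landmarks[LEFT_CORNER]
    right_x, right_y = landmarks[RIGHT_CORNER]

    width = hypot(right_x - left_x, right_y - left_y)
    if width == 0:
        return 0.0
    gap = hypot(lower_x - upper_x, lower_y - upper_y)
    return gap / width
```

A per-frame value like this could, for example, help decide when the user has started or stopped mouthing words. A full VSR model would normally consume richer input than a single ratio.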
Use Cases
Echo is designed to assist individuals who are affected by conditions that limit their ability to speak or produce sound. These include:
- Aphasia: A condition affecting speech and language, often caused by a stroke or brain injury.
- Laryngectomy: A surgical procedure that removes the voice box, leading to the loss of natural speech.
- ALS (Amyotrophic Lateral Sclerosis): A progressive neurological disease that can lead to the loss of speech.
- Speech Disabilities: Conditions like cerebral palsy that can cause difficulty in speech articulation.
- Post-Surgery Recovery: For individuals recovering from surgeries that impact speech temporarily or permanently.
Other Uses: This technology can also be applied in a variety of other situations, including:
- Emergency Response: When a person is unable to make sound but can mouth words, this technology can help transmit a generated voice over a call, potentially saving lives in dangerous situations.
- Remote Communication: In phone calls, virtual interviews, or video conferences where the user's voice is compromised or unavailable, or where the environment is too quiet or too noisy to speak aloud, users can rely on this system to participate.
- Assisting Deaf or Hard-of-Hearing Individuals: The application generates a live transcript as people speak, so users who are deaf or hard of hearing, including those who have lost the ability to speak, can communicate more effectively in social and professional settings.
- Telemedicine & Therapy Sessions: People with speech difficulties can use Echo to communicate more clearly during telemedicine consultations or speech therapy. Echo can also be an alternative to mechanical devices such as voice boxes, which can be expensive and may require risky, hard-to-access surgery, whereas our application is free to use.
Vision
Echo is designed to break communication barriers for those who have lost their voice, allowing them to express themselves effortlessly and even participate in phone conversations, something that was previously impossible. It could also be used in other situations, such as emergencies, where someone who cannot produce any sound needs to call for help.