Xiaomi Global has unveiled its latest application of advanced algorithms and self-developed speech technology to the accessibility field. The spontaneous style Text-To-Speech technology, developed by Xiaomi AI Lab, is used to generate a unique, customized voice for a user with speech disorders. For the “Own My Voice” project, the team invited a user with speech disorders to be the voice recipient. Zhu Xi, Technology Committee topic convener on Tech for Good, Xiaomi Corporation, said, “We are excited to explore multiple values that technology innovation brings to us, such as responding to users’ demands for self-identity and the construction of identity.”
The technology enables the user to communicate with others in “his own voice” instead of a typical monotonous electronic voice. As part of the “Own My Voice” pre-research project led by the Xiaomi Technical Committee, this successful attempt demonstrates Xiaomi’s commitment to “Tech for Good” and to its mission to “let everyone in the world enjoy a better life through innovative technology”.
How did Xiaomi carry out the project?
To generate the most suitable and personalized voice for the recipient, the project team recruited more than 200 volunteers within Xiaomi to donate their voices. They used a voiceprint matching algorithm to compare the features of the volunteers’ donated voices with those of the recipient’s voice. Through this approach, they found the most suitable voice to serve as the base reference for the recipient’s voice. For personalization and privacy protection, the chosen real voice was then altered through complex acoustic modification to form a new, original voice.
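Xiaomi has not published the details of its voiceprint matching algorithm. As a rough illustration only, the matching step can be sketched by assuming each voice has already been reduced to a fixed-length numeric feature vector (a "voiceprint") and that the closest donor is the one with the highest cosine similarity to the recipient. The function names, the volunteer IDs, and the three-dimensional toy vectors below are all hypothetical and are not taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def best_match(recipient, donors):
    """Return (donor_id, score) for the donor closest to the recipient.

    `donors` maps a hypothetical donor ID to that donor's feature vector.
    """
    if not donors:
        raise ValueError("at least one donor voiceprint is required")
    scored = ((donor_id, cosine_similarity(recipient, vec))
              for donor_id, vec in donors.items())
    return max(scored, key=lambda pair: pair[1])


# Toy data: real voiceprints would have many more dimensions and would
# come from a speaker-embedding model, which is assumed here, not shown.
recipient_voiceprint = [0.9, 0.1, 0.3]
donor_voiceprints = {
    "volunteer_a": [0.1, 0.9, 0.2],
    "volunteer_b": [0.8, 0.2, 0.4],
    "volunteer_c": [0.3, 0.3, 0.9],
}

donor_id, score = best_match(recipient_voiceprint, donor_voiceprints)
print(donor_id, round(score, 3))
```

In this sketch the selected donor voice would only be the starting point. The acoustic modification and model training described in the article are separate steps and are not shown.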
Next, they used spontaneous style Text-To-Speech technology to train AI models, so that the new voice gradually gained a natural rhythm and intonation that can faithfully express human emotion and tone.
The “Own My Voice” project combines a variety of advanced algorithms with Xiaomi’s self-developed speech technology to ensure that the synthesized voice is distinctive, safe, and highly lifelike, offering a new approach to customized speech synthesis for users with speech disorders.
The Project Is Significant
The backbone of this project is a group of speech technology experts from Xiaomi AI Lab. Since 2017, they have published 37 papers on speech in the proceedings of top international conferences, such as the International Conference on Acoustics, Speech, and Signal Processing (ICASSP). The success of “Own My Voice” depends mainly on the spontaneous style Text-To-Speech technology they developed.
The spontaneous style Text-To-Speech technology makes the synthesized voice resemble that of a real person in its intonation, pauses, speed, and other features, replacing the monotonous, unnatural feel of an electronic voice with a more natural one. Currently, this technology is used in many smart devices equipped with Xiaoai, Xiaomi’s AI voice assistant. The “Own My Voice” project shows that spontaneous style Text-To-Speech technology can also be widely adopted in accessibility applications to improve the user experience.
“If we notice and address the needs of minority groups at an early stage, the process of technology diffusion could be greatly shortened. This allows the benefits of new technologies to become accessible to users with special needs without delay.”