Direct Speech Reconstruction from Articulatory Sensor Data by Machine Learning
Gonzalez, Jose A.; Cheah, Lam A.; Gomez, Angel M.; Green, Phil D.; Gilbert, James M.; Ell, Stephen R.; Moore, Roger K.; Holdsworth, Ed
Authors
Jose A. Gonzalez
Lam A. Cheah
Angel M. Gomez
Phil D. Green
Professor James Gilbert J.M.Gilbert@hull.ac.uk
Professor of Engineering
Stephen R. Ell
Roger K. Moore
Ed Holdsworth
Abstract
© 2017 IEEE. This paper describes a technique that generates speech acoustics from articulator movements. Our motivation is to help people who can no longer speak following laryngectomy, a procedure that is carried out tens of thousands of times per year in the Western world. Our method for sensing articulator movement, permanent magnetic articulography, relies on small, unobtrusive magnets attached to the lips and tongue. Changes in magnetic field caused by magnet movements are sensed and form the input to a process that is trained to estimate speech acoustics. In the experiments reported here this 'Direct Synthesis' technique is developed for normal speakers, with glued-on magnets, allowing us to train with parallel sensor and acoustic data. We describe three machine learning techniques for this task, based on Gaussian mixture models, deep neural networks, and recurrent neural networks (RNNs). We evaluate our techniques with objective acoustic distortion measures and subjective listening tests over spoken sentences read from novels (the CMU Arctic corpus). Our results show that the best performing technique is a bidirectional RNN (BiRNN), which employs both past and future contexts to predict the acoustics from the sensor data. BiRNNs are not suitable for synthesis in real time but fixed-lag RNNs give similar results and, because they only look a little way into the future, overcome this problem. Listening tests show that the speech produced by this method has a natural quality that preserves the identity of the speaker. Furthermore, we obtain up to 92% intelligibility on the challenging CMU Arctic material. To our knowledge, these are the best results obtained for a silent-speech system without a restricted vocabulary and with an unobtrusive device that delivers audio in close to real time. This work promises to lead to a technology that truly will give people whose larynx has been removed their voices back.
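The fixed-lag idea in the abstract — predicting the acoustics for frame t from past context plus only a small, fixed number of future sensor frames — can be sketched in a few lines. Everything in this snippet (the dimensions, the `fixed_lag_windows` helper, and the linear map standing in for the trained RNN) is illustrative and not taken from the paper:

```python
import numpy as np

# Illustrative sketch, not the paper's actual PMA pipeline. It shows why a
# fixed lag enables near-real-time synthesis: the predictor for frame t sees
# sensor frames only up to t + LAG, so audio can be emitted with a fixed delay
# of LAG frames, unlike a full BiRNN, which must wait for the whole utterance.

PAST = 10        # past context frames (assumed value)
LAG = 5          # look-ahead ("fixed lag") frames (assumed value)
SENSOR_DIM = 9   # e.g. 3-axis magnetic-field readings from 3 sensors

def fixed_lag_windows(sensors: np.ndarray) -> np.ndarray:
    """Stack frames [t-PAST, ..., t+LAG] for each t, zero-padding the edges.

    sensors: (T, SENSOR_DIM) array of sensor frames.
    Returns: (T, (PAST + LAG + 1) * SENSOR_DIM) context matrix.
    """
    T, D = sensors.shape
    padded = np.vstack([np.zeros((PAST, D)), sensors, np.zeros((LAG, D))])
    return np.stack([padded[t:t + PAST + LAG + 1].ravel() for t in range(T)])

rng = np.random.default_rng(0)
x = rng.standard_normal((100, SENSOR_DIM))    # stand-in sensor stream
ctx = fixed_lag_windows(x)
W = rng.standard_normal((ctx.shape[1], 25))   # linear map standing in for
acoustics = ctx @ W                           # the trained RNN regressor
print(ctx.shape, acoustics.shape)             # (100, 144) (100, 25)
```

A BiRNN would instead condition every output frame on the entire input sequence in both directions, which is why it cannot start emitting audio until the utterance ends.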
Citation
Gonzalez, J. A., Cheah, L. A., Gomez, A. M., Green, P. D., Gilbert, J. M., Ell, S. R., Moore, R. K., & Holdsworth, E. (2017). Direct Speech Reconstruction from Articulatory Sensor Data by Machine Learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(12), 2362-2374. https://doi.org/10.1109/TASLP.2017.2757263
| Journal Article Type | Article |
|---|---|
| Acceptance Date | Sep 15, 2017 |
| Online Publication Date | Nov 23, 2017 |
| Publication Date | Dec 1, 2017 |
| Deposit Date | Dec 11, 2017 |
| Publicly Available Date | Dec 11, 2017 |
| Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
| Print ISSN | 2329-9290 |
| Electronic ISSN | 2329-9304 |
| Publisher | Institute of Electrical and Electronics Engineers |
| Peer Reviewed | Peer Reviewed |
| Volume | 25 |
| Issue | 12 |
| Pages | 2362-2374 |
| DOI | https://doi.org/10.1109/TASLP.2017.2757263 |
| Keywords | Speech and Hearing; Media Technology; Linguistics and Language; Signal Processing; Acoustics and Ultrasonics; Instrumentation; Electrical and Electronic Engineering |
| Public URL | https://hull-repository.worktribe.com/output/499609 |
| Publisher URL | http://ieeexplore.ieee.org/document/8114382/ |
| Contract Date | Dec 11, 2017 |
Files
Article
(2.8 MB)
PDF
Copyright Statement
© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.