End-to-End Conversion Speed Analysis of an FPT.AI-Based Text-to-Speech Application
Keywords:
FPT.AI, TTS, performance, analysis, Vietnamese, voice
Abstract
In this paper, an FPT.AI-based text-to-speech (TTS) application is developed that converts Vietnamese text into speech. The application is built on Django for Python in the form of an interactive website that connects to an FPT.AI server through its application programming interface (API). The application supports conversion of text to speech in seven different Vietnamese voices. Four of the seven voices can convert up to 500 characters in a single transaction, while the other three support up to 400 characters. Based on the results obtained, the first conversion of a 400-character text into speech takes up to 10 s, whereas subsequent conversions of the same text take under 1.8 s. This applies to all voices.
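The measurement described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the voice names, the `check_length` and `time_conversions` helpers, and the `fake_convert` stand-in are all hypothetical, and no FPT.AI endpoint is called. Only the 500- and 400-character limits come from the abstract. The sketch checks the per-voice limit and then records the elapsed time of each repeated conversion of the same text, so the first request can be compared with later ones.

```python
import time
from typing import Callable, List

# Per-request character limits reported in the abstract: four voices accept
# up to 500 characters, the other three up to 400. The voice names below are
# hypothetical placeholders, not FPT.AI identifiers.
VOICE_LIMITS = {
    "voice_a": 500, "voice_b": 500, "voice_c": 500, "voice_d": 500,
    "voice_e": 400, "voice_f": 400, "voice_g": 400,
}


def check_length(text: str, voice: str) -> None:
    """Raise ValueError if the text exceeds the limit for the chosen voice."""
    limit = VOICE_LIMITS[voice]
    if len(text) > limit:
        raise ValueError(
            f"{voice} accepts at most {limit} characters, got {len(text)}"
        )


def time_conversions(
    convert: Callable[[str, str], bytes],
    text: str,
    voice: str,
    repeats: int = 3,
) -> List[float]:
    """Call convert(text, voice) `repeats` times; return each elapsed time in seconds."""
    check_length(text, voice)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        convert(text, voice)
        timings.append(time.perf_counter() - start)
    return timings


def fake_convert(text: str, voice: str) -> bytes:
    """Stand-in for the real API call (assumption); does no network access."""
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = "a" * 400  # 400-character input, as in the reported experiment
    first, *later = time_conversions(fake_convert, sample, "voice_e")
    print(f"first: {first:.4f} s, later: {[round(t, 4) for t in later]}")
```

To time a real deployment, `fake_convert` would be replaced by a function that sends the text to the TTS service and waits for the audio. Passing the conversion function as a parameter keeps the timing logic independent of any particular API client.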