An FPT.AI-Based Text-to-Speech Application's End-to-End Conversion Speed Analysis

Authors

  • M Chaitanya Bharathi, Assistant Professor, Department of Information Technology, PACE Institute of Technology and Sciences, Ongole, India
  • A Seshagiri Rao, Professor, Department of Information Technology, PACE Institute of Technology and Sciences, Ongole, India
  • P Ramalingamma, Assistant Professor, Department of Information Technology, PACE Institute of Technology and Sciences, Ongole, India

Keywords:

FPT.AI, TTS, performance, analysis, Vietnamese, voice

Abstract

In this paper, an FPT.AI-based text-to-speech (TTS) application is developed that converts Vietnamese text into spoken words. The application is built on Django for Python as an interactive website connected to an FPT.AI server through its application programming interface (API). The application supports conversion of text to seven different Vietnamese voices. Four of the seven voices can convert up to 500 characters in a single transaction, whereas the other three support up to 400 characters. Based on the results obtained, the first conversion of a 400-character text into speech takes up to 10 s, whereas subsequent conversions of the same text take under 1.8 s. This applies to all voices.
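
The measurement described in the abstract (a first conversion followed by repeated conversions of the same text, within per-voice character limits) could be sketched roughly as follows. This is a minimal illustration and not the authors' code: the voice identifiers, the function names, and the stub converter are all hypothetical, and the stub only mimics the reported slow-then-fast pattern with artificial delays. The source does not say why later conversions are faster, and no real FPT.AI request is made here; an actual client call would be passed in place of the stub.

```python
import time
from typing import Callable, Dict, List

# Hypothetical voice identifiers. The abstract states only that four of the
# seven voices accept up to 500 characters and the other three up to 400.
MAX_CHARS: Dict[str, int] = {
    "voice_1": 500, "voice_2": 500, "voice_3": 500, "voice_4": 500,
    "voice_5": 400, "voice_6": 400, "voice_7": 400,
}


def check_length(text: str, voice: str) -> None:
    """Raise ValueError if the text exceeds the voice's per-request limit."""
    limit = MAX_CHARS[voice]
    if len(text) > limit:
        raise ValueError(
            f"{voice} accepts at most {limit} characters, got {len(text)}"
        )


def time_conversions(
    convert: Callable[[str, str], bytes],
    text: str,
    voice: str,
    repeats: int = 3,
) -> List[float]:
    """Return elapsed seconds for consecutive conversions of the same text."""
    check_length(text, voice)
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        convert(text, voice)
        timings.append(time.perf_counter() - start)
    return timings


def make_stub_converter(
    first_delay: float = 0.05, later_delay: float = 0.005
) -> Callable[[str, str], bytes]:
    """Stand-in for a real API call (assumption): slower the first time a
    given (text, voice) pair is requested, faster on later requests."""
    seen = set()

    def convert(text: str, voice: str) -> bytes:
        key = (text, voice)
        time.sleep(later_delay if key in seen else first_delay)
        seen.add(key)
        return b""  # a real converter would return or reference audio data

    return convert


if __name__ == "__main__":
    timings = time_conversions(make_stub_converter(), "a" * 400, "voice_5")
    print([f"{t:.3f} s" for t in timings])
```

Passing the converter in as a callable keeps the timing loop independent of any particular client library, so the same loop could wrap a Django view call or a direct HTTP request.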

Published

2022-10-30

How to Cite

An FPT.AI-Based Text-to-Speech Application’s End-to-End Conversion Speed Analysis. (2022). International Journal of Innovative Research in Engineering & Management, 9(5), 227–231. Retrieved from https://acspublisher.com/journals/index.php/ijirem/article/view/10757