Vocal Visage: Crafting Lifelike 3D Talking Faces from Static Images and Sound
Keywords:
Eye Blinking, Generative Models, Natural Lip Synchronization, Talking Face Animations

Abstract
In computer graphics and animation, generating lifelike, expressive talking-face animations has historically required extensive 3D data and complex facial motion-capture systems. This project presents an approach that sidesteps those requirements: its primary goal is to produce realistic 3D motion coefficients for stylized talking-face animation driven by a single reference image and a synchronized audio input. Leveraging state-of-the-art deep learning techniques, including generative models, image-to-image translation networks, and audio processing methods, the methodology bridges the gap between a static image and dynamic, emotionally rich facial animation. The aim is to synthesize talking-face animations that exhibit seamless lip synchronization and natural eye blinking, achieving a high degree of realism and expressiveness in computer-generated character interactions.
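To make the pipeline described above concrete, the following is a minimal sketch of its two stages: mapping per-frame audio windows to 3D expression and head-pose coefficients, then injecting periodic eye blinks into the coefficient sequence. The function names, coefficient dimensions (64 expression values, a 6-DoF pose, a blink coefficient at index 0), and the energy-based mapping are all hypothetical placeholders standing in for the learned audio-to-motion network; they illustrate the data flow, not the paper's actual model.

```python
import numpy as np

def audio_to_motion_coeffs(audio, sr=16000, fps=25, n_exp=64):
    """Toy stand-in for an audio-to-motion network: chunk the waveform into
    per-frame windows and map each to a vector of 3D expression coefficients
    plus a head-pose vector (shapes are assumptions, not the paper's)."""
    samples_per_frame = sr // fps
    n_frames = len(audio) // samples_per_frame
    frames = audio[: n_frames * samples_per_frame].reshape(n_frames, samples_per_frame)
    # Placeholder "network": per-frame signal energy modulates a fixed basis.
    energy = np.abs(frames).mean(axis=1, keepdims=True)       # (n_frames, 1)
    basis = np.random.default_rng(0).standard_normal((1, n_exp))
    exp_coeffs = energy * basis                               # (n_frames, n_exp)
    pose = np.zeros((n_frames, 6))                            # static head pose
    return exp_coeffs, pose

def add_blinks(exp_coeffs, fps=25, blink_every_s=4.0, blink_idx=0):
    """Inject natural-looking periodic blinks by pulsing one (assumed)
    blink coefficient with a short 3-frame close/open ramp."""
    out = exp_coeffs.copy()
    period = int(fps * blink_every_s)
    for start in range(0, len(out), period):
        for k, w in enumerate((0.5, 1.0, 0.5)):               # half, full, half closed
            if start + k < len(out):
                out[start + k, blink_idx] = w
    return out
```

In a full system, the resulting per-frame coefficient sequence would drive a renderer (e.g., an image-to-image translation network conditioned on the reference image) to produce the final video frames; here the sketch stops at the coefficient level.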