MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation

Kaisiyuan Wang*
Linsen Song*
Qianyi Wu*
Zhuoqian Yang

Wayne Wu
Chen Qian
Ran He
Yu Qiao
Chen Change Loy

SenseTime Research
Robotics Institute, Carnegie Mellon University

University of Chinese Academy of Sciences
Nanyang Technological University




The synthesis of natural emotional reactions is an essential criterion in vivid talking-face video generation. This criterion is nevertheless seldom taken into consideration in previous works due to the absence of a large-scale, high-quality emotional audio-visual dataset. To address this issue, we build the Multi-view Emotional Audio-visual Dataset (MEAD), a talking-face video corpus featuring 60 actors and actresses talking with eight different emotions at three different intensity levels. High-quality audio-visual clips are captured at seven different view angles in a strictly-controlled environment. Together with the dataset, we release an emotional talking-face generation baseline that enables the manipulation of both emotion and its intensity. Our dataset could benefit a number of different research fields including conditional generation, cross-modal understanding and expression recognition.




Paper

MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation

Kaisiyuan Wang*, Qianyi Wu*, Linsen Song*, Zhuoqian Yang, Wayne Wu, Chen Qian, Ran He, Yu Qiao, Chen Change Loy

European Conference on Computer Vision, ECCV 2020.

[PDF]
[Appendix]
[Bibtex]


Dataset



We build the Multi-view Emotional Audio-visual Dataset (MEAD), a talking-face video corpus featuring 60 actors talking with eight different emotions at three different intensity levels (except for neutral, which has a single level). The videos are recorded simultaneously from seven different view angles in a strictly-controlled environment to capture high-quality details of facial expressions. About 40 hours of audio-visual clips are recorded for each person and view. Part 0 and Part 1 each contain the data of 30 actors.
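The label space described above (eight emotions at three intensity levels, except neutral, across seven views) can be sketched as follows. Note that the emotion names, intensity encoding, and view names used here are illustrative assumptions for counting combinations, not the dataset's actual file naming.

```python
# Sketch of MEAD's per-actor label space as described on this page.
# Emotion names, intensity levels, and view names are assumptions for
# illustration; consult the released dataset for its actual layout.
EMOTIONS = ["neutral", "angry", "contempt", "disgusted",
            "fear", "happy", "sad", "surprised"]
INTENSITIES = [1, 2, 3]  # three intensity levels; neutral has only one
VIEWS = ["front", "left_30", "left_60", "right_30",
         "right_60", "up", "down"]  # hypothetical view names

def label_space():
    """Yield every (emotion, intensity, view) combination for one actor."""
    for emotion in EMOTIONS:
        levels = [1] if emotion == "neutral" else INTENSITIES
        for intensity in levels:
            for view in VIEWS:
                yield emotion, intensity, view

combos = list(label_space())
# 7 emotions x 3 levels + neutral x 1 level = 22, times 7 views = 154
print(len(combos))  # → 154
```

Under these assumptions, each of the 60 actors contributes 154 emotion-intensity-view combinations, recorded simultaneously across the seven cameras.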

[Download-Part0](code: bhhu)
[Download-Part1](coming soon)


Acknowledgements

This work is supported by the SenseTime-NTU Collaboration Project, Singapore MOE AcRF Tier 1 (2018-T1-002-056), NTU SUG, and NTU NAP.