Each individual perceives multimedia content differently, depending on their own interpretation. It is therefore challenging to predict the likability of multimedia based on its content alone. This paper presents a novel system that analyzes subjects' facial expressions in response to the multimedia content being evaluated. First, we developed a dataset by recording the facial expressions of subjects in an uncontrolled environment. The subjects are volunteers recruited to watch videos of different genres and to give feedback on how much they liked each one. Subject responses are divided into three categories: Like, Neutral and Dislike. A novel multimodal system is then developed using this dataset, and the model learns feature representations from the data based on these three categories. The proposed system is an ensemble of a time-distributed convolutional neural network, a 3D convolutional neural network, and long short-term memory (LSTM) networks. Each modality in the proposed architecture is evaluated independently as well as in different combinations. The paper also provides detailed insight into the learning behavior of the proposed system.
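The abstract does not say how the outputs of the three networks are combined, or how modality combinations are scored. The sketch below assumes simple late fusion: each modality's three-class logits are turned into probabilities with a softmax, the probabilities are averaged over a chosen subset of modalities, and the highest-scoring label among Like, Neutral and Dislike is predicted. The modality names (`td_cnn`, `cnn3d`, `lstm`) and the logit values are hypothetical placeholders, not the authors' implementation.

```python
from itertools import combinations
import math

# The three response categories named in the abstract.
LABELS = ("Like", "Neutral", "Dislike")


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(prob_lists):
    """Assumed late fusion: average per-class probabilities across modalities."""
    n = len(prob_lists)
    return [sum(p[i] for p in prob_lists) / n for i in range(len(LABELS))]


def predict(probs):
    """Return the label with the highest fused probability."""
    return LABELS[max(range(len(probs)), key=probs.__getitem__)]


def evaluate_combinations(modality_logits):
    """Fused prediction for every non-empty subset of modalities.

    Keys are tuples of modality names in sorted order, so single
    modalities and all of their combinations are covered.
    """
    names = sorted(modality_logits)
    results = {}
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            probs = fuse([softmax(modality_logits[m]) for m in subset])
            results[subset] = predict(probs)
    return results


if __name__ == "__main__":
    # Hypothetical per-modality logits for a single clip.
    example = {
        "td_cnn": [2.0, 0.5, 0.1],
        "cnn3d": [0.2, 1.5, 0.3],
        "lstm": [1.8, 0.4, 0.2],
    }
    for subset, label in evaluate_combinations(example).items():
        print("+".join(subset), "->", label)
```

Averaging probabilities is only one fusion choice; feature-level concatenation or a learned weighting over modalities would be equally plausible readings of "ensemble" and would change only the `fuse` step.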
Singh Bawa, Vivek
Sharma, Shailza
Usman, Mohammed
School of Engineering, Computing and Mathematics
Year of publication: 2021
Date of RADAR deposit: 2021-11-19
RADAR: Research Archive and Digital Asset Repository