Abstract: The emotion recognition performance of deep learning models is influenced by multiple factors, such as acoustic conditions, textual content, and the style of emotion expression (e.g., acted or natural).