1. School of Computer Science and Engineering, Northeastern University, Shenyang 110000, China
2. Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200082, China
3. School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100089, China
Music is the language of emotions. In recent years, music emotion recognition has attracted widespread attention in the academic and industrial communities, since it can be applied in fields such as recommendation systems, automatic music composition, psychotherapy, and music visualization. With the rapid development of artificial intelligence in particular, deep learning-based music emotion recognition is gradually becoming mainstream. This paper gives a detailed survey of music emotion recognition. Starting with preliminary knowledge of music emotion recognition, it first introduces commonly used evaluation metrics. It then puts forward a three-part research framework. Based on this framework, the knowledge and algorithms involved in each part are introduced with detailed analysis, including commonly used datasets, emotion models, feature extraction methods, and emotion recognition algorithms. After that, the challenging problems and development trends of music emotion recognition technology are discussed, and finally the whole paper is summarized.
Fig.1 Categorization of music emotion recognition tasks:

- Static MER
  - Categorical approach: predict the categorical emotion labels of music pieces
  - Dimensional approach: predict the numerical emotion values of music pieces
- MEVD (music emotion variation detection)
  - Categorical approach: predict the dynamic categorical emotion variation within a music piece
  - Dimensional approach: predict the dynamic dimensional emotion variation within a music piece

Tab.1

| Model name | Application domain | Emotion conceptualization | Number of classes/dimensions | Emotional definition |
| --- | --- | --- | --- | --- |
| Hevner affective ring [7] | Music | Categorical | 67 | Perceived |
| Russell's circumplex model of affect [8,9] | General | Dimensional | 2 | Perceived |
| GEMS [10] | Music | Categorical | 45 | Induced |
| Thayer [11] | General | Dimensional | 2 | Perceived |
Tab.2

| Dataset name | Emotion conceptualization | Number of songs | Data type | Genres | Research directions |
| --- | --- | --- | --- | --- | --- |
| MediaEval emotion in music [19] | Dimensional | 1000 | MP3 | Rock, pop, soul, blues, etc. | Dynamic |
| CAL500 [20] | Categorical | 500 | MP3 | ? | Static |
| CAL500exp [21] | Categorical | 3223 (segments) | MP3 | ? | Dynamic |
| AMG1608 [22] | Dimensional | 1608 | WAV | Rock, metal, country, jazz, etc. | Static |
| DEAM [23] | Dimensional | 1802 | MP3 | Rock, pop, electronic, etc. | Dynamic |
| MTurk [24] | Dimensional | 240 | ? | ? | Dynamic |
| Soundtracks [25] | Categorical and dimensional | 360 | MP3 | Rap, R&B, electronic, etc. | Static |
| Emotify music database [26] | Categorical | 400 | MP3 | Rock, classical, pop and electronic | Static |
Tab.3

| Data format | Preprocessing method | Preprocessing tools | Preprocessing result |
| --- | --- | --- | --- |
| WAV, MP3 | Framing, windowing, MFCC extraction, spectrogram extraction, etc. | Psysound (software), MIRtoolbox (MATLAB), Librosa (Python package), etc. | MFCC, spectrogram, etc. |
| MIDI | Main track extraction, etc. | pretty_midi, music21 (Python packages) | Key, BPM, melody, etc. |
| Text | Segmentation, cleaning, normalization, etc. | NLTK, Gensim, Jieba, Stanford NLP (Python packages), etc. | BOW, TF-IDF, word embeddings, etc. |
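The framing and windowing steps listed for WAV/MP3 data can be sketched in plain NumPy. The following illustrative example turns a waveform into a magnitude spectrogram; the frame length, hop size, and synthetic test tone are arbitrary choices for demonstration, and in practice a tool such as Librosa wraps these steps:

```python
import numpy as np

def spectrogram(signal, frame_len=1024, hop=512):
    """Frame the signal, apply a Hann window, and take the rFFT magnitude of each frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, frame_len // 2 + 1)

# demo on a synthetic one-second 440 Hz tone sampled at 22050 Hz
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
S = spectrogram(tone)
print(S.shape)  # (42, 513)
```

The MFCC step listed in the table then follows from such a spectrogram by mapping it onto a mel filter bank and taking a discrete cosine transform of the log energies.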
Tab.4

| First-level audio feature | Second-level audio features |
| --- | --- |
| Rhythmic features | Duration, pitch, energy, etc. [33] |
| Timbre features | MFCC, zero crossing rate, chroma, etc. [32] |
| Spectral features | Spectral flatness measure, spectral centroid, etc. [32] |
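Two of the second-level features above have simple closed forms: zero crossing rate is the fraction of sign changes within a frame, and spectral centroid is the magnitude-weighted mean frequency of the frame's spectrum. A hedged pure-NumPy sketch on synthetic tones (frame lengths and frequencies are illustrative):

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (a timbre feature)."""
    return np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:]))

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of the spectrum (a spectral feature)."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / np.sum(mag)

sr = 22050
t = np.arange(2048) / sr
low = np.sin(2 * np.pi * 220 * t)    # low synthetic tone
high = np.sin(2 * np.pi * 3520 * t)  # high synthetic tone
# both features grow with the dominant frequency of the signal
print(f"ZCR low/high: {zero_crossing_rate(low):.3f} / {zero_crossing_rate(high):.3f}")
print(f"centroid low/high: {spectral_centroid(low, sr):.0f} / {spectral_centroid(high, sr):.0f} Hz")
```

Both values rise with the dominant frequency, which is one reason such features correlate with perceived brightness and, in turn, with arousal.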
Tab.5

| Reference | Feature modalities | Machine learning model | Emotion model | Dataset |
| --- | --- | --- | --- | --- |
| [39] | Lyric | SVM | 18 classes | Self-built |
| [47] | Physiological signals | SVM, NB, KNN, DT | 3 classes | Peripheral physiological signals data |
| [49] | Audio | SVM | 13/6 classes | Self-built |
| [50] | Audio and lyric | SVM, RF | 4 classes | Self-built |
| [51] | Audio and lyric | SVM | 4 classes | Self-built |
| [52] | Audio | CLR | 6/18 classes | EMOTIONS, CAL500 |
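Most of the static categorical studies in the table above feed hand-crafted audio or lyric features to an SVM classifier. A minimal scikit-learn sketch of that pipeline is shown below; the random features and four-class labels are synthetic stand-ins for real extracted features and annotations, not data from any cited study:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_songs, n_feats, n_classes = 200, 20, 4   # e.g. MFCC statistics per song (illustrative)
X = rng.normal(size=(n_songs, n_feats))
y = rng.integers(0, n_classes, size=n_songs)  # one emotion class per song
X += y[:, None] * 0.8                          # shift class means so the toy task is learnable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.2f}")
```

Feature standardization before the SVM matters in practice because audio features (energy, MFCCs, centroid) live on very different scales.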
Tab.6

| Reference | Feature modalities | Machine learning model | Emotion model | Dataset |
| --- | --- | --- | --- | --- |
| [13] | Audio | SVR, MLR | VA model | Self-built |
| [42] | Lyric | SVM | VA model | Self-built |
| [53] | Audio | AEG | VA model | MTurk, MER60 |
| [54] | Audio | AEG | VA model | AMG1608 |
| [55] | Audio | LR | VA model | AMG240 |
| [56] | Audio | GPR | VA model | MediaEval emotion in music |
| [57] | Audio | SVR, MLR, GPR | VA model | Self-built |
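The dimensional studies in the table above map features to continuous valence and arousal (VA) values, commonly by training one regressor per dimension. A minimal scikit-learn sketch under that setup; the features and the linear VA annotations are synthetic and purely illustrative:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
n_songs, n_feats = 150, 12
X = rng.normal(size=(n_songs, n_feats))  # stand-in audio features
# synthetic ground truth: valence and arousal as noisy linear functions of the features
w_v, w_a = rng.normal(size=n_feats), rng.normal(size=n_feats)
valence = 0.3 * (X @ w_v) + 0.05 * rng.normal(size=n_songs)
arousal = 0.3 * (X @ w_a) + 0.05 * rng.normal(size=n_songs)

# one regressor per dimension: train on the first 100 songs, evaluate on the rest
val_model = SVR(kernel="linear").fit(X[:100], valence[:100])
aro_model = SVR(kernel="linear").fit(X[:100], arousal[:100])
print(f"valence R2: {val_model.score(X[100:], valence[100:]):.2f}")
print(f"arousal R2: {aro_model.score(X[100:], arousal[100:]):.2f}")
```

Treating the two dimensions independently is a simplification; several of the cited works instead model the joint VA distribution (e.g. the AEG model).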
Tab.7

| Reference | Feature modalities | Machine learning model | Emotion model | Dataset |
| --- | --- | --- | --- | --- |
| [13] | Audio | SVR | VA model | Self-built |
| [57] | Audio | SVR, MLR, GPR | VA model | Self-built |
| [58] | Audio | GMM | 4 classes | Self-built |
| [59] | Audio | SVM, SVR | VA model | Self-built |
| [60] | Audio | DS-SVR | VA model | MediaEval emotion in music |
Tab.8

| Reference | Year | Feature modalities | Learning model | Emotion model | Dataset |
| --- | --- | --- | --- | --- | --- |
| [61] | 2016 | Audio and lyric | DBM | 4 classes | MSD |
| [32] | 2017 | Audio | CNN | 18 classes | CAL500, CAL500exp |
| [62] | 2019 | EEG | CNN | 2 classes | EEG data collected from subjects |
| [63] | 2020 | Audio | VGGNet | 4 classes | Soundtracks, Bi-Modal |
| [64] | 2020 | Audio | CNN | 2 classes | EmoMusic |
Tab.9

| Reference | Year | Feature modalities | Learning model | Emotion model | Dataset |
| --- | --- | --- | --- | --- | --- |
| [65] | 2017 | Audio | LSTM, attention mechanism | VA model | Emotion in Music task at MediaEval 2015 |
| [18] | 2018 | Audio | BiLSTM | Based on VA model | DEAM |
| [66] | 2018 | Audio | GARN | GEMS | Emotify music database |
| [67] | 2018 | Audio and lyric | CNN, RNN, etc. | VA model | MSD |
| [68] | 2019 | Audio | BCRSN | Based on VA model | DEAM, MTurk |
| [69] | 2019 | Audio | VGG-based | 8 dimensions | Mid-level Perceptual Features dataset, Soundtracks |
Tab.10

| Reference | Year | Feature modalities | Learning model | Emotion model | Dataset |
| --- | --- | --- | --- | --- | --- |
| [12] | 2014 | Audio | LSTM, SVR | VA model | MediaEval Emotion in Music |
| [57] | 2014 | Audio | SVR, MLR, GPR | VA model | Self-built |
| [70] | 2016 | Audio | DBLSTM | VA model | MediaEval Emotion in Music |
| [71] | 2020 | Audio | Attentive LSTM | VA model | MediaEval Emotion in Music |
Tab.11

| Reference | Method | Dataset | Performance |
| --- | --- | --- | --- |
| [61] | Bi-modal deep Boltzmann machine | MSD | 78.5% (accuracy) |
| [67] | CNN, LSTM | MSD | 0.219 for valence, 0.232 for arousal (R²) |
| [65] | MCA | MediaEval dataset | 0.291 for valence, 0.241 for arousal (RMSE) |
| [70] | DBLSTM | MediaEval dataset | 0.285 for valence, 0.225 for arousal (RMSE) |
| [32] | CNN | CAL500 | 42.6% (macro average precision) |
| [52] | CLR | CAL500 | 48.8% (macro average precision) |
Tab.12

| Year | Method | Accuracy/% |
| --- | --- | --- |
| 2020 | Mel spectrogram + CNN | 69.5 |
| 2019 | - | 68 |
| 2018 | STFT + CNN | 61.17 |
| 2017 | Mel spectrogram + DCNN + SVM | 69.83 |
| 2016 | FFT, MFCC + CNN | 63.33 |
| 2015 | - | 66.17 |
| 2014 | MFCC + SVM | 66.33 |
| 2013 | Visual and acoustic features + SVM | 68.33 |
| 2012 | Audio features + SVM based models | 67.83 |
| 2011 | Audio features + SRC | 69.5 |
References

1. X Y Yang, Y Z Dong, J Li. Review of data features-based music emotion recognition methods. Multimedia Systems, 2018, 24(4): 365–389
2. Z Y Cheng, J L Shen, L Zhu, M Kankanhalli, L Q Nie. Exploiting music play sequence for music recommendation. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2017, 3654−3660
3. Z Y Cheng, J L Shen, L Q Nie, T S Chua, M Kankanhalli. Exploring user-specific information in music retrieval. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2017, 655–664
4. Y E Kim, E M Schmidt, R Migneco, B G Morton, P Richardson, J Scott, J A Speck, D Turnbull. Music emotion recognition: a state of the art review. In: Proceedings of the 11th International Society for Music Information Retrieval Conference. 2010, 255–266
5. Y H Yang, H H Chen. Machine recognition of music emotion: a review. ACM Transactions on Intelligent Systems and Technology, 2011, 3(3): 1−30
6. M Bartoszewski, H Kwasnicka, M U Kaczmar, P B Myszkowski. Extraction of emotional content from music data. In: Proceedings of the 7th International Conference on Computer Information Systems and Industrial Management Applications. 2008, 293–299
7. K Hevner. Experimental studies of the elements of expression in music. The American Journal of Psychology, 1936, 48(2): 246–268
8. J A Russell. A circumplex model of affect. Journal of Personality and Social Psychology, 1980, 39(6): 1161–1178
9. J Posner, J A Russell, B S Peterson. The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology. Development and Psychopathology, 2005, 17(3): 715–734
10. M Chełkowska-Zacharewicz, M Janowski. Polish adaptation of the Geneva Emotional Music Scale (GEMS): factor structure and reliability. Psychology of Music, 2020, 57(6): 427−438
11. R Thayer. The Biopsychology of Mood and Arousal. 1st ed. Oxford: Oxford University Press, 1989
12. F Weninger, F Eyben, B W Schuller. On-line continuous-time music mood regression with deep recurrent neural networks. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 2014, 5412−5416
13. Y H Yang, Y C Lin, Y F Su, H H Chen. A regression approach to music emotion recognition. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(2): 448–457
14. X X Li, H S Xianyu, J S Tian, W X Chen, F H Meng, M X Xu, L H Cai. A deep bidirectional long short-term memory based multi-scale approach for music dynamic emotion prediction. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 2016, 544–548
15. J Y Fan, K Tatar, M Thorogood, P Pasquier. Ranking-based emotion recognition for experimental music. In: Proceedings of the 18th International Society for Music Information Retrieval Conference. 2017, 368–375
16. N Thammasan, K I Fukui, M Numao. Multimodal fusion of EEG and musical features in music-emotion recognition. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence. 2017, 4991−4992
17. Y H Yang, H H Chen. Prediction of the distribution of perceived music emotions using discrete samples. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(7): 2184–2196
18. H P Liu, Y Fang, Q H Huang. Music emotion recognition using a variant of recurrent neural network. In: Proceedings of the International Conference on Mathematics, Modeling, Simulation and Statistics Application. 2018, 15−18
19. M Soleymani, M N Caro, E M Schmidt, C Y Sha, Y H Yang. 1000 songs for emotional analysis of music. In: Proceedings of the 2nd ACM International Workshop on Crowdsourcing for Multimedia. 2013, 1–6
20. D Turnbull, L Barrington, D Torres, G Lanckriet. Towards musical query-by-semantic-description using the CAL500 data set. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2007, 439–446
21. S Y Wang, J C Wang, Y H Yang, H M Wang. Towards time-varying music auto-tagging on CAL500 expansion. In: Proceedings of the IEEE International Conference on Multimedia and Expo. 2014, 1–6
22. Y A Chen, Y H Yang, J C Wang, H Chen. The AMG1608 dataset for music emotion recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 2015, 693–697
23. A Aljanaki, Y H Yang, M Soleymani. Developing a benchmark for emotional analysis of music. PLoS ONE, 2017, 12(3): e0173392
24. J A Speck, E M Schmidt, B G Morton, Y E Kim. A comparative study of collaborative vs. traditional musical mood annotation. In: Proceedings of the 12th International Society for Music Information Retrieval Conference. 2011, 549–554
25. T Eerola, J K Vuoskoski. A comparison of the discrete and dimensional models of emotion in music. Psychology of Music, 2011, 39(1): 18–49
26. M Zentner, D Grandjean, K R Scherer. Emotions evoked by the sound of music: characterization, classification, and measurement. Emotion, 2008, 8(4): 494–521
27. T B Mahieux, D P W Ellis, B Whitman, P Lamere. The million song dataset. In: Proceedings of the 12th International Society for Music Information Retrieval Conference. 2011, 591–596
28. G Tzanetakis, P Cook. MARSYAS: a framework for audio analysis. Organised Sound, 2000, 4(3): 169–175
29. B Mathieu, S Essid, T Fillon, J Prado, G Richard. YAAFE, an easy to use and efficient audio feature extraction software. In: Proceedings of the 11th International Society for Music Information Retrieval Conference. 2010, 441–446
30. O Lartillot, P Toiviainen. MIR in MATLAB (II): a toolbox for musical feature extraction from audio. In: Proceedings of the 8th International Conference on Music Information Retrieval. 2007, 127–130
31. D McEnnis, C McKay, I Fujinaga, P Depalle. jAudio: a feature extraction library. In: Proceedings of the 6th International Conference on Music Information Retrieval. 2005, 600–603
32. X Liu, Q C Chen, X P Wu, Y Liu, Y Liu. CNN based music emotion classification. 2017, arXiv preprint arXiv: 1704.05665
33. W J Han, H F Li, H B Ruan, L Ma. Review on speech emotion recognition (in Chinese). Journal of Software, 2014, 25(1): 37–50
34. M Barthet, G Fazekas, M Sandler. Multidisciplinary perspectives on music emotion recognition: implications for content and context-based models. In: Proceedings of the 9th International Symposium on Computer Music Modelling and Retrieval. 2012, 492–507
35. P L Chen, L Zhao, Z Y Xin, Y M Qiang, M Zhang, T M Li. A scheme of MIDI music emotion classification based on fuzzy theme extraction and neural network. In: Proceedings of the 12th International Conference on Computational Intelligence and Security. 2016, 323–326
36. P N Juslin, P Laukka. Expression, perception, and induction of musical emotions: a review and a questionnaire study of everyday listening. Journal of New Music Research, 2004, 33(3): 217–238
37. D Yang, W S Lee. Disambiguating music emotion using software agents. In: Proceedings of the 5th International Conference on Music Information Retrieval. 2004, 218–223
38. H He, J M Jin, Y H Xiong, B Chen, L Zhao. Language feature mining for music emotion classification via supervised learning from lyrics. In: Proceedings of the International Symposium on Intelligence Computation and Applications. 2008, 426–435
39. X Hu, J S Downie, A F Ehmann. Lyric text mining in music mood classification. In: Proceedings of the 10th International Society for Music Information Retrieval Conference. 2009, 411–416
40. M V Zaanen, P Kanters. Automatic mood classification using TF*IDF based on lyrics. In: Proceedings of the 11th International Society for Music Information Retrieval Conference. 2010, 75–80
41. X Wang, X O Chen, D S Yang, Y Q Wu. Music emotion classification of Chinese songs based on lyrics using TF*IDF and rhyme. In: Proceedings of the 12th International Society for Music Information Retrieval Conference. 2011, 765–770
42. R Malheiro, R Panda, P Gomes, R P Paiva. Emotionally-relevant features for classification and regression of music lyrics. IEEE Transactions on Affective Computing, 2018, 9(2): 240–254
43. Y J Hu, X O Chen, D S Yang. Lyric-based song emotion detection with affective lexicon and fuzzy clustering method. In: Proceedings of the 10th International Society for Music Information Retrieval Conference. 2009, 123–128
44. D Yang, W S Lee. Music emotion identification from lyrics. In: Proceedings of the 11th IEEE International Symposium on Multimedia. 2009, 624–629
45. K Dakshina, R Sridhar. LDA based emotion recognition from lyrics. Advanced Computing, Networking and Informatics, 2014, 27(1): 187–194
46. N Thammasan, K I Fukui, M Numao. Application of deep belief networks in EEG-based dynamic music-emotion recognition. In: Proceedings of the 2016 International Joint Conference on Neural Networks. 2016, 881–888
47. X Hu, F J Li, D T J Ng. On the relationships between music-induced emotion and physiological signals. In: Proceedings of the 19th International Society for Music Information Retrieval Conference. 2018, 362–369
48. N E Nawa, D E Callan, P Mokhtari, H Ando, J Iversen. Decoding music-induced experienced emotions using functional magnetic resonance imaging: preliminary results. In: Proceedings of the 2018 International Joint Conference on Neural Networks. 2018, 1–7
49. T Li, M Ogihara. Detecting emotion in music. In: Proceedings of the 4th International Conference on Music Information Retrieval. 2003, 239–240
50. C Laurier, J Grivolla, P Herrera. Multimodal music mood classification using audio and lyrics. In: Proceedings of the 7th International Conference on Machine Learning and Applications. 2008, 688–693
51. Y H Yang, Y C Lin, H T Cheng, I B Liao, Y C Ho, H Chen. Toward multi-modal music emotion classification. In: Proceedings of the 9th Pacific Rim Conference on Multimedia. 2008, 70–79
52. Y Liu, Y Liu, Y Zhao, K A Hua. What strikes the strings of your heart? Feature mining for music emotion analysis. IEEE Transactions on Affective Computing, 2015, 6(3): 247–260
53. J C Wang, Y H Yang, H M Wang, S K Jeng. The acoustic emotion Gaussians model for emotion-based music annotation and retrieval. In: Proceedings of the 20th ACM Multimedia Conference. 2012, 89–98
54. Y A Chen, J C Wang, Y H Yang, H Chen. Component tying for mixture model adaptation in personalization of music emotion recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25(7): 1409–1420
55. Y A Chen, J C Wang, Y H Yang, H Chen. Linear regression-based adaptation of music emotion recognition models for personalization. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 2014, 2149−2153
56. S Fukayama, M Goto. Music emotion recognition with adaptive aggregation of Gaussian process regressors. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 2016, 71–75
57. M Soleymani, A Aljanaki, Y H Yang, M N Caro, F Eyben, K Markov, B Schuller, R C Veltkamp, F Weninger, F Wiering. Emotional analysis of music: a comparison of methods. In: Proceedings of the ACM International Conference on Multimedia. 2014, 1161−1164
58. L Lu, D Liu, H J Zhang. Automatic mood detection and tracking of music audio signals. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(1): 5–18
59. E M Schmidt, D Turnbull, Y E Kim. Feature selection for content-based, time-varying musical emotion regression. In: Proceedings of the 11th ACM SIGMM International Conference on Multimedia Information Retrieval. 2010, 267–274
60. H S Xianyu, X X Li, W S Chen, F H Meng, J S Tian, M X Xu, L H Cai. SVR based double-scale regression for dynamic emotion prediction in music. In: Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing. 2016, 549–553
61. M Y Huang, W G Rong, T Arjannikov, J Nan, Z Xiong. Bi-modal deep Boltzmann machine based musical emotion classification. In: Proceedings of the 25th International Conference on Artificial Neural Networks. 2016, 199–207
62. P Keelawat, N Thammasan, B Kijsirikul, M Numao. Subject-independent emotion recognition during music listening based on EEG using deep convolutional neural networks. In: Proceedings of the 15th IEEE International Colloquium on Signal Processing & Its Applications. 2019, 21–26
63. R Sarkar, S Choudhury, S Dutta, A Roy, S K Saha. Recognition of emotion in music based on deep convolutional neural network. Multimedia Tools and Applications, 2020, 79(9): 765–783
64. P T Yang, S M Kuang, C C Wu, J L Hsu. Predicting music emotion by using convolutional neural network. In: Proceedings of the 22nd HCI International Conference. 2020, 266–275
65. Y Ma, X X Li, M X Xu, J Jia, L H Cai. Multi-scale context based attention for dynamic music emotion prediction. In: Proceedings of the 25th ACM International Conference on Multimedia. 2017, 1443−1450
66. W H Chang, J L Li, Y S Lin, C C Lee. A genre-affect relationship network with task-specific uncertainty weighting for recognizing induced emotion in music. In: Proceedings of the 2018 IEEE International Conference on Multimedia and Expo. 2018, 1–8
67. R Delbouys, R Hennequin, F Piccoli, J R Letelier, M Moussallam. Music mood detection based on audio and lyrics with deep neural net. In: Proceedings of the 19th International Society for Music Information Retrieval Conference. 2018, 370–375
68. Y Z Dong, X Y Yang, X Zhao, J Li. Bidirectional convolutional recurrent sparse network (BCRSN): an efficient model for music emotion recognition. IEEE Transactions on Multimedia, 2019, 21(12): 3150–3163
69. S Chowdhury, A Vall, V Haunschmid, G Widmer. Towards explainable music emotion recognition: the route via mid-level features. In: Proceedings of the 20th International Society for Music Information Retrieval Conference. 2019, 237–243
70. X X Li, J S Tian, M X Xu, Y S Ning, L H Cai. DBLSTM-based multi-scale fusion for dynamic emotion prediction in music. In: Proceedings of the IEEE International Conference on Multimedia and Expo. 2016, 1–6
71. S Chaki, P Doshi, P Patnaik, S Bhattacharya. Attentive RNNs for continuous-time emotion prediction in music clips. In: Proceedings of the 3rd Workshop on Affective Content Analysis co-located with the 34th AAAI Conference on Artificial Intelligence. 2020, 36–45
72. R Panda, R Malheiro, R P Paiva. Novel audio features for music emotion recognition. IEEE Transactions on Affective Computing, 2020, 11(4): 614–626
73. S G Deng, D J Wang, X T Li, G D Xu. Exploring user emotion in microblogs for music recommendation. Expert Systems with Applications, 2015, 42(1): 9284–9293
74. L N Ferreira, J Whitehead. Learning to generate music with sentiment. In: Proceedings of the 20th International Society for Music Information Retrieval Conference. 2019, 384–390