Lyra-SA Dataset
1.Dataset Name
Lyra Lab Singing Assessment Dataset (Lyra-SA Dataset)
2.Description
Automatic singing assessment is an important branch of Music Information Retrieval (MIR) with a wide range of applications, including music education, online karaoke, and offline singing competitions. Because it reduces personal bias, automatic evaluation can assess singing from a fairer and more objective perspective. This dataset provides singing voice recordings and labels collected in real-world conditions, to help researchers build and evaluate singing assessment models.
3.Contents
We provide a singing voice dataset containing 100 singing voice recordings for each of 10 songs, together with the corresponding MIDI file and lyrics file for each song. The data was collected from Wesing, an online karaoke app in China. There are no restrictions on the singers' region, age group, or gender. Each singer contributes exactly one recording, so no singer appears more than once.
Based on the premise that "the singing evaluations of ordinary listeners are relevant", we invited ordinary listeners to rate each recording and to label the singer's gender and age as judged from the singing timbre. These rough labels are for reference only; we plan to release more precise labels in the future.
5.Song Information
We selected 10 songs with high monthly singing volume on Wesing in June 2022. We hold the right of reproduction and the right of communication through information networks for these 10 songs. Each song has 100 complete singing voice recordings, so the dataset consists of 10 × 100 = 1000 recordings in total.
When selecting songs, we chose ones that differ in genre, rhythm, era, and singer gender, to support a wider range of singing voice research. We also provide MIDI and lyrics files for each song. The MIDI can serve as a reference for singing evaluation, and the lyric timestamps can help divide the singing voice into phrase-level segments. The table below shows the song details:
Index | Song | Singer | Gender | Released Year | BPM |
---|---|---|---|---|---|
1 | 粉红色的回忆 | 韩宝仪 | F | 1987 | 110 |
2 | 留什么给你 | 孙楠 | M | 1993 | 73 |
3 | 伤心太平洋 | 任贤齐 | M | 1998 | 75 |
4 | 十年 | 陈奕迅 | M | 2003 | 62 |
5 | 太想念 | 彭筝 | F | 2014 | 102 |
6 | 起风了 | 买辣椒也用券 | F | 2017 | 77 |
7 | 嘉宾 | 张远 | M | 2020 | 75 |
8 | 阿拉斯加海湾 | 蓝心羽 | F | 2020 | 58 |
9 | 永不失联的爱 | 单依纯 | F | 2020 | 81 |
10 | 在你的身边 | 盛哲 | M | 2022 | 81 |
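As a rough illustration of how the lyric timestamps could be used to cut a recording into phrases, the sketch below parses LRC text and derives phrase boundaries. It assumes the common `[mm:ss.xx]lyric` line format; the dataset's .lrc files may differ (for example, metadata tags or several timestamps per line), and the function names `parse_lrc` and `phrase_segments` are hypothetical.

```python
import re

# Assumed LRC line shape: "[mm:ss.xx]lyric text". This is the common
# convention, not a format confirmed by the dataset authors.
_LRC_LINE = re.compile(r"\[(\d+):(\d+(?:\.\d+)?)\](.*)")


def parse_lrc(text):
    """Return a list of (start_seconds, lyric) tuples sorted by start time."""
    entries = []
    for line in text.splitlines():
        match = _LRC_LINE.match(line.strip())
        if match is None:
            continue  # skip blank lines and non-timestamp tags such as [ti:...]
        minutes, seconds, lyric = match.groups()
        entries.append((int(minutes) * 60 + float(seconds), lyric.strip()))
    return sorted(entries)


def phrase_segments(entries, audio_duration):
    """Turn phrase start times into (start, end, lyric) segments.

    Each phrase ends where the next one starts; the last phrase ends at
    audio_duration (in seconds).
    """
    segments = []
    for i, (start, lyric) in enumerate(entries):
        end = entries[i + 1][0] if i + 1 < len(entries) else audio_duration
        if lyric:  # drop empty lines that only act as timing markers
            segments.append((start, end, lyric))
    return segments
```

Multiplying the segment boundaries by the sampling rate gives sample offsets for slicing the corresponding .wav file.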
6.Recording Conditions
Recordings were made with the built-in microphone of an iOS or Android phone. To unify the sampling rate and number of channels, all singing voice audio was resampled to 44,100 Hz, mono, 16-bit.
The audio was recorded in the ordinary environments of mobile karaoke users. Some singers wear headphones, while others do not. Singers without headphones may play the accompaniment or the original song aloud. When the phone's microphone does not have SAEC, the recording may also contain the accompaniment or the original song at a lower loudness, which degrades the quality of the audio file. These are the challenges that singing evaluation faces in real-world scenarios.
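A quick way to confirm that a downloaded file matches the stated format (44,100 Hz, mono, 16-bit) is to read its header with Python's standard `wave` module. This is a minimal sketch that assumes the files are uncompressed PCM WAV; the helper names are hypothetical.

```python
import wave

EXPECTED_RATE = 44100      # Hz, as stated in the recording conditions
EXPECTED_CHANNELS = 1      # mono
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit


def wav_format(path):
    """Return (sample_rate, channels, sample_width_bytes) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


def matches_dataset_format(path):
    """True if the file header matches 44,100 Hz, mono, 16-bit."""
    expected = (EXPECTED_RATE, EXPECTED_CHANNELS, EXPECTED_SAMPLE_WIDTH)
    return wav_format(path) == expected
```

The check only inspects the header; it does not detect leaked accompaniment or other quality issues described above.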
7. Splitting Dataset
The dataset can be divided into training, validation, and test sets by song. For example, songs 1 to 6 can form the training set, songs 7 and 8 the validation set, and songs 9 and 10 the test set.
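The example split can be written as a small lookup keyed by song index. This sketch simply encodes the suggestion above; the names `SPLIT_BY_SONG` and `split_for_song` are hypothetical, and other partitions are equally valid.

```python
# Song indices per split, following the example split suggested above.
SPLIT_BY_SONG = {
    "train": range(1, 7),   # songs 1 to 6
    "validation": (7, 8),
    "test": (9, 10),
}


def split_for_song(song_index):
    """Return the split name for a song index between 1 and 10."""
    for name, songs in SPLIT_BY_SONG.items():
        if song_index in songs:
            return name
    raise ValueError(f"unknown song index: {song_index}")
```

Because the split is made per song, all recordings of a given song fall in the same set, so validation and test songs are never seen during training.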
8.File Naming
├── lyric
│   ├── 1.lrc
│   ├── 2.lrc
├── MIDI
│   ├── 1.midi
│   ├── 2.midi
├── singing_voice
│   ├── 1
│   │   ├── 1_001.wav
│   │   ├── 1_002.wav
│   ├── 2
│   │   ├── 2_001.wav
│   │   ├── 2_002.wav
└── immature_label.csv
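The naming scheme above can be handled with a pair of small helpers. This is a sketch that assumes recordings are named `<song>_<sample>.wav` with a three-digit, zero-padded sample number, as inferred from `1_001.wav`; the function names are hypothetical.

```python
from pathlib import PurePosixPath


def parse_recording_path(path):
    """Map 'singing_voice/1/1_001.wav' to (song_index, sample_index)."""
    p = PurePosixPath(path)
    song_str, sample_str = p.stem.split("_")
    song, sample = int(song_str), int(sample_str)
    if p.parent.name != str(song):
        raise ValueError(f"folder and file name disagree: {path}")
    return song, sample


def recording_path(song, sample):
    """Build the relative path of a recording (padding width is an assumption)."""
    return f"singing_voice/{song}/{song}_{sample:03d}.wav"


def reference_paths(song):
    """Return the relative lyric and MIDI paths for a song index."""
    return f"lyric/{song}.lrc", f"MIDI/{song}.midi"
```

For example, `parse_recording_path("singing_voice/2/2_001.wav")` yields song 2, sample 1, and `reference_paths(2)` points to the matching lyric and MIDI files.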
4.Privacy Statements
Each singer has explicitly agreed to the "Instructions for Personal Information Use" and the "Informed Consent Statement" to complete the singing voice authorization.
5. License and Copyright
Lyra-SA is released under the CC BY-NC 4.0 license. For non-commercial use, please attach the source link and this notice.
Lyra-SA is compiled and authored by the Tencent Music Lyra Lab team. Copyright (c) 2023 Tencent Music Entertainment Group.
Lyra-SA may not be used commercially without permission. For commercial use, please contact Tencent Music Group.
10.Download
To download the dataset, click the application button, fill in your information, and agree to the "terms of use". We will email you the download link within 3 days.
11. Questions or Feedback
Update Reminder: On August 14, 2023, the MIDI files of the 10 songs were updated. The downloadable zip file has been renamed Lyra_SA_230814.zip.
If you have any questions, please email us: lyracobar@tencentmusic.com