Lyra-QBH Dataset
1. Dataset Name
Lyra-Query By Humming Dataset (Lyra-QBH Dataset)
2. Description
In Music Information Retrieval (MIR), Query by Humming (QBH) is an effective way to search a music database: the query is a melody hummed by the user, and the output is the original song that contains that melody. To support the development of QBH technology, Tencent Music Lyra Lab built a standard dataset for evaluating QBH algorithms.
The Lyra-QBH dataset was recorded by 97 subjects (38 male, 59 female) and contains 1005 recordings in total. These recordings cover 100 tracks, all of which meet the corresponding open-source conditions of the QQ Music library. The Lyra-QBH dataset is offered free of charge for non-commercial use only.
3. Collection
The dataset was collected through a Weixin (WeChat) mini program, which allowed subjects from different social circles to participate. Before taking part in the data collection, each subject was clearly informed about the private data and recording data involved, and about the purpose and use of the dataset. During recording, subjects were shown a list of songs, asked to select one or more that they knew, and asked to sing part of the melody without accompaniment. Subjects were also reminded to sing without lyrics as far as possible. When the same subject recorded the same song more than once, all non-duplicate samples were retained. All recordings were captured on the subjects' own mobile phones. Each recording lasts between 9 s and 10 s, with an average duration of 9.98 s.
4. Content
a. audio: query_list
size and format: 1005 WAV audio files, sampled at 8000 Hz, 16-bit, mono
Naming rule: song ID_user ID_gender_upload number.wav
E.g. s007_u000_1_2.wav means that the song ID is 007, the user ID is 000, and the gender is female (1 = female, 2 = male). The final number 2 means this is the second recording uploaded for that song ID.
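The naming rule above can be sketched in code. This is a minimal illustration, not part of the dataset tooling: the names `QueryInfo` and `parse_query_name` are hypothetical, and the pattern assumes that IDs are always written as `s` or `u` followed by digits, as in the examples shown here.

```python
import re
from typing import NamedTuple

# Assumed pattern for names such as "s007_u000_1_2.wav":
# song ID, user ID, gender code, upload number.
_NAME_RE = re.compile(r"^(s\d+)_(u\d+)_([12])_(\d+)\.wav$")

# Gender codes as stated in the naming rule (1 female, 2 male).
_GENDER = {"1": "female", "2": "male"}


class QueryInfo(NamedTuple):
    song_id: str
    user_id: str
    gender: str
    upload_number: int


def parse_query_name(path: str) -> QueryInfo:
    """Split a query file name (with or without a directory) into its fields."""
    name = path.replace("\\", "/").rsplit("/", 1)[-1]
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected query file name: {name!r}")
    song_id, user_id, gender_code, upload = match.groups()
    return QueryInfo(song_id, user_id, _GENDER[gender_code], int(upload))
```

For instance, `parse_query_name("s007_u000_1_2.wav")` yields song `s007`, user `u000`, gender `female`, and upload number 2, matching the example in the naming rule.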
The format of queries is as follows:
query_id | song_id | user_id |
---|---|---|
HummingWav/u020/s000_u020_2_1.wav | s000 | u020 |
HummingWav/u069/s000_u069_1_1.wav | s000 | u069 |
HummingWav/u006/s001_u006_2_1.wav | s001 | u006 |
HummingWav/u020/s001_u020_2_1.wav | s001 | u020 |
HummingWav/u021/s001_u021_1_1.wav | s001 | u021 |
HummingWav/u033/s001_u033_2_1.wav | s001 | u033 |
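After downloading, it may be useful to confirm that a query file matches the stated audio format (8000 Hz, 16-bit, mono). The following is a minimal sketch using Python's standard `wave` module. The function name `check_query_wav` is hypothetical, and the sketch assumes the files are plain PCM WAV files that `wave` can read.

```python
import wave


def check_query_wav(path: str) -> float:
    """Verify the stated format and return the duration in seconds."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 8000:
            raise ValueError(f"{path}: expected 8000 Hz, got {wav.getframerate()}")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            raise ValueError(f"{path}: expected 16-bit samples")
        if wav.getnchannels() != 1:
            raise ValueError(f"{path}: expected mono audio")
        # Duration = number of frames / frames per second.
        return wav.getnframes() / wav.getframerate()
```

The returned duration can then be compared with the 9 s to 10 s range given in the Collection section.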
b. midi: midi_list
size and format: 100 MIDI files, plus metadata including song names and singers
Naming rule: song ID.mid
midi | song_name | singers |
---|---|---|
MidiFile/s000.mid | 别叫我达芬奇 | Lil Ghost小鬼 |
MidiFile/s001.mid | 过火 | 张信哲 |
MidiFile/s002.mid | 千千万万 | 深海鱼子酱 |
MidiFile/s003.mid | 下一个天亮 | 郭静 |
MidiFile/s004.mid | 冰雨 | 刘德华 |
MidiFile/s005.mid | 该死的温柔 | 马天宇 |
MidiFile/s006.mid | 王妃 | 萧敬腾 |
MidiFile/s007.mid | 下雨天 | 南拳妈妈 |
MidiFile/s008.mid | 给我一首歌的时间 | 周杰伦 |
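Because query files and MIDI files share the same song ID, the reference MIDI for a query can be derived from the query path alone. The sketch below assumes the `HummingWav/...` and `MidiFile/...` layout shown in the two tables. The function name `reference_midi_path` is hypothetical.

```python
def reference_midi_path(query_id: str) -> str:
    """Return the MIDI path of the song that a query was recorded for."""
    # "HummingWav/u020/s000_u020_2_1.wav" -> "s000_u020_2_1.wav"
    file_name = query_id.rsplit("/", 1)[-1]
    # The song ID is the first underscore-separated field, e.g. "s000".
    song_id = file_name.split("_", 1)[0]
    return f"MidiFile/{song_id}.mid"
```

In an evaluation loop, this path would serve as the ground-truth answer against which a QBH system's retrieved song is compared.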
5. License and Copyright
Lyra-QBH is released under the CC BY-NC 4.0 license. For non-commercial use, please include the source link and this notice.
Lyra-QBH is compiled and authored by the Tencent Music Lyra Lab team. Copyright (c) 2023 Tencent Music Entertainment Group.
Lyra-QBH may not be used commercially without permission. For commercial use, please contact Tencent Music Group.
6. How to Download
To download the dataset, click the application button, fill in your information, and agree to the terms of use. We will email you the download link within 3 days.
7. Feedback
If you have any questions or feedback about Lyra-QBH, please contact: lyracobar@tencentmusic.com