Lyra-QBH Dataset

1. Dataset Name

Lyra-Query By Humming Dataset (Lyra-QBH Dataset)

2. Description

In Music Information Retrieval (MIR), Query by Humming (QBH) is an effective way to search a music database: the query is a melody hummed by the user, and the output is the original song containing that melody. To advance the development of QBH technology, Tencent Music Lyra Lab has constructed a standard dataset for evaluating QBH algorithms.

The Lyra-QBH dataset was recorded by 97 subjects (38 male and 59 female) and contains 1005 recordings in total. These recordings cover 100 tracks, all of which meet the corresponding open-source conditions of the QQ Music library. The Lyra-QBH dataset is offered free of charge for non-commercial use only.

3. Collection

The dataset was collected through a Weixin (WeChat) mini program, which allowed subjects from different social circles to participate. Before taking part in the data collection, each subject was clearly informed about the privacy data and recording data involved, as well as the purpose and use of the dataset. During recording, subjects were shown a list of songs, asked to select one or more that they knew, and asked to sing part of the melody without accompaniment. Subjects were also reminded to avoid singing the lyrics as far as possible. When a subject recorded the same song more than once, all non-duplicate samples were retained. All recordings were captured on the subjects' own mobile phones. Each recording lasts between 9 s and 10 s, with an average duration of 9.98 s.

4. Content

a. audio: query_list

Size and format: 1005 WAV audio files, sampled at 8000 Hz, 16-bit, mono
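After downloading, the stated audio format can be checked with Python's standard `wave` module. The following is a minimal sketch, assuming the files are uncompressed PCM WAV; the names `describe_wav`, `matches_expected_format`, and `EXPECTED` are illustrative and not part of the dataset.

```python
import wave

# Format stated in this README: 8000 Hz, 16-bit (2 bytes per sample), mono.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 8000}


def describe_wav(path):
    """Return the header fields and duration (in seconds) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": wav.getnframes() / rate,
        }


def matches_expected_format(info):
    """True if the header fields equal the format stated in the README."""
    return all(info[key] == value for key, value in EXPECTED.items())
```

Running `describe_wav` over every file and collecting those for which `matches_expected_format` is false gives a quick integrity check; the `duration_s` field can be compared against the 9 s to 10 s range given in the Collection section.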

Naming rule: song ID_user ID_gender_upload number.wav

e.g. s007_u000_1_2.wav means that the song ID is 007, the user ID is 000, and the gender is female (1 = female, 2 = male). The final 2 means that this is the second recording uploaded for that song ID.
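The naming rule can be decoded programmatically. The sketch below assumes that every file name has the shape shown in the example (an `s` prefix plus digits, a `u` prefix plus digits, a gender code of 1 or 2, and an integer upload number); `QueryName` and `parse_query_name` are hypothetical helper names.

```python
import re
from typing import NamedTuple

# Gender codes as documented in this README: 1 = female, 2 = male.
GENDER_LABELS = {1: "female", 2: "male"}

# Assumed shape: s<digits>_u<digits>_<1|2>_<upload number>.wav
NAME_PATTERN = re.compile(r"^(s\d+)_(u\d+)_([12])_(\d+)\.wav$")


class QueryName(NamedTuple):
    song_id: str
    user_id: str
    gender: str
    upload_number: int


def parse_query_name(path):
    """Split a query file name (with or without directories) into its fields."""
    filename = path.replace("\\", "/").rsplit("/", 1)[-1]
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected query file name: {filename!r}")
    song_id, user_id, gender_code, upload_number = match.groups()
    return QueryName(
        song_id, user_id, GENDER_LABELS[int(gender_code)], int(upload_number)
    )
```

For example, `parse_query_name("s007_u000_1_2.wav")` yields the song ID `s007`, the user ID `u000`, the gender `female`, and the upload number 2, matching the explanation above.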

The format of queries is as follows:

query_id                           song_id  user_id
HummingWav/u020/s000_u020_2_1.wav  s000     u020
HummingWav/u069/s000_u069_1_1.wav  s000     u069
HummingWav/u006/s001_u006_2_1.wav  s001     u006
HummingWav/u020/s001_u020_2_1.wav  s001     u020
HummingWav/u021/s001_u021_1_1.wav  s001     u021
HummingWav/u033/s001_u033_2_1.wav  s001     u033
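A possible way to load the query list is sketched below. It assumes that the list is plain text with whitespace-separated fields and the three-column header shown above; the function name `load_query_list` is illustrative. Because it reads tokens in groups of three rather than line by line, it does not depend on how the records are wrapped across lines.

```python
QUERY_HEADER = ["query_id", "song_id", "user_id"]


def load_query_list(text):
    """Parse the query list into a list of dicts keyed by the header names."""
    tokens = text.split()
    header, body = tokens[:3], tokens[3:]
    if header != QUERY_HEADER:
        raise ValueError(f"unexpected header: {header}")
    if len(body) % 3 != 0:
        raise ValueError("the query list does not divide into 3-field records")
    # Each record is (path to the WAV file, ground-truth song ID, user ID).
    return [dict(zip(QUERY_HEADER, body[i:i + 3])) for i in range(0, len(body), 3)]
```

Each returned record pairs a query recording with the song ID that a QBH system is expected to retrieve for it.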

b. midi: midi_list

Size and format: 100 MIDI files, plus metadata including song names and singers

Naming rule: song ID.mid

The format of the MIDI list is as follows:

midi song_name singers
MidiFile/s000.mid 别叫我达芬奇 Lil Ghost小鬼
MidiFile/s001.mid 过火 张信哲
MidiFile/s002.mid 千千万万 深海鱼子酱
MidiFile/s003.mid 下一个天亮 郭静
MidiFile/s004.mid 冰雨 刘德华
MidiFile/s005.mid 该死的温柔 马天宇
MidiFile/s006.mid 王妃 萧敬腾
MidiFile/s007.mid 下雨天 南拳妈妈
MidiFile/s008.mid 给我一首歌的时间 周杰伦
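The MIDI list can be read in a similar way. This sketch assumes one record per line with whitespace-separated fields, and that song names contain no spaces while the singers field may (as in "Lil Ghost小鬼"); if the distributed file uses tabs or song names with spaces, the splitting would need to be adjusted. `load_midi_list` is a hypothetical helper name.

```python
from pathlib import PurePosixPath

MIDI_HEADER = ["midi", "song_name", "singers"]


def load_midi_list(text):
    """Parse the MIDI list into dicts with midi, song_id, song_name, singers."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0].split() != MIDI_HEADER:
        raise ValueError("missing or unexpected header line")
    songs = []
    for line in lines[1:]:
        # Split at most twice so that spaces inside the singers field survive.
        midi, song_name, singers = line.split(maxsplit=2)
        songs.append({
            "midi": midi,
            # "MidiFile/s000.mid" -> "s000", per the naming rule song ID.mid
            "song_id": PurePosixPath(midi).stem,
            "song_name": song_name,
            "singers": singers,
        })
    return songs
```

The derived `song_id` field matches the song ID used in the query file names, so the two lists can be joined on it.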

5. License and Copyright

Lyra-QBH is released under the CC BY-NC 4.0 license. For non-commercial use, please include the source link and this notice.

Lyra-QBH is compiled and authored by the Tencent Music Lyra Lab team. Copyright (c) 2023 Tencent Music Entertainment Group.

Lyra-QBH may not be used commercially without permission. For commercial use, please contact Tencent Music Entertainment Group.

6. How to Download

To download the dataset, click the application button, fill in your information, and agree to the terms of use. We will email you the download link within 3 days.

7. Feedback

If you have any questions or feedback about Lyra-QBH, please contact: lyracobar@tencentmusic.com