Lyra-CS Dataset
1.Dataset Name
Lyra-CoverSegment Dataset (Lyra-CS Dataset)
2.Dataset Details
To advance research on music fingerprinting and to address the difficult problem of cover song recognition, Lyra lab has released an open-source dataset, Lyra-CS, which can be used for experiments such as music fingerprinting and cover song segment recognition. Lyra-CS is collected from authorized songs in the QQ Music library and covers different languages, genres, and singers, including original songs and segments from related cover or live versions of these songs. Lyra-CS is a 400-hour music segment corpus containing 539203 audio clips. Segment durations vary, and every segment is shorter than 15 s; all audio is in WAV format, 8 kHz, 16-bit. We also provide the related song names and singer names for reference. (Lyra-CS is free for academic research; commercial use is not allowed without permission.)
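The stated audio format can be checked after download. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_segment` is hypothetical and is not part of any official Lyra-CS tooling. It checks only the properties stated above (8 kHz, 16-bit, shorter than 15 s), since the channel count is not specified here.

```python
import wave

# Format stated in the dataset description.
EXPECTED_RATE_HZ = 8000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
MAX_DURATION_S = 15.0


def check_segment(path):
    """Return (ok, duration_seconds) for one WAV file.

    `ok` is True when the file is 8 kHz, 16-bit, and shorter than 15 s.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        duration = wav.getnframes() / rate
    ok = (
        rate == EXPECTED_RATE_HZ
        and width == EXPECTED_SAMPLE_WIDTH_BYTES
        and duration < MAX_DURATION_S
    )
    return ok, duration
```

For example, `check_segment("Lyra_CS/00001/00001_0.wav")` would report whether that clip conforms, assuming the file exists at that relative path.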
3.Dataset Split
We split Lyra-CS into training and test sets at an 8:2 ratio:
Dataset | Number of segments | Duration (hours) |
---|---|---|
Training set | 431376 | 321.7 |
Test set | 107827 | 78.0 |
The related file paths are listed in train_list and test_list, respectively.
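The list files can be loaded and compared with the counts in the table. This sketch assumes that train_list and test_list are plain-text files with one file path per line; that format is an assumption and is not confirmed by this document. The helper names `read_list` and `count_matches_table` are hypothetical.

```python
# Segment counts reported in the split table.
REPORTED_COUNTS = {"train_list": 431376, "test_list": 107827}


def read_list(path):
    """Read one file path per line, skipping blank lines (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def count_matches_table(list_name, paths):
    """Check a loaded list against the segment count reported for it."""
    return len(paths) == REPORTED_COUNTS[list_name]
```

The two reported counts sum to 539203, the total number of clips stated in the dataset details, so a mismatch after loading more likely points to an incomplete download or a different list format than to an error in the table.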
4.File Naming Rule
/<CORPUS>/<AnchorSongId>/<Songid_SegmentID>
e.g. Lyra_CS/00001/00001_0.wav. CORPUS denotes the dataset name, AnchorSongId is the ID of the anchor song, and Songid_SegmentID identifies a segment within the anchor song's group. Each group contains one or more segments from an anchor song or its cover songs. For example, 00002_0.wav and 00003_0.wav are the first cover-song segments corresponding to 00001_0.wav. In addition, if s or ss follows the Songid, the segment has been extended by one or two sentences beyond the raw sentence. For example, 00001s_0.wav and 00002s_0.wav extend 00001_0.wav and 00002_0.wav by one sentence each, forming a two-sentence pair.
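The naming rule can be turned into a small parser. The sketch below is one possible reading of the rule, not an official tool: the names `SegmentName`, `parse_segment_path`, and `is_anchor` are hypothetical, and it assumes that IDs are purely numeric, that files end in `.wav`, and that an anchor segment is one whose song ID equals the anchor song ID of its group.

```python
import re
from typing import NamedTuple


class SegmentName(NamedTuple):
    corpus: str
    anchor_song_id: str
    song_id: str
    extra_sentences: int  # 0 for a raw segment, 1 for "s", 2 for "ss"
    segment_id: int


# <CORPUS>/<AnchorSongId>/<Songid>[s|ss]_<SegmentID>.wav, leading "/" optional.
_PATTERN = re.compile(
    r"^/?(?P<corpus>[^/]+)/(?P<anchor>\d+)/"
    r"(?P<song>\d+)(?P<ext>s{0,2})_(?P<seg>\d+)\.wav$"
)


def parse_segment_path(path):
    """Split a segment path into its naming-rule components."""
    m = _PATTERN.match(path)
    if m is None:
        raise ValueError(f"unrecognized segment path: {path!r}")
    return SegmentName(
        corpus=m["corpus"],
        anchor_song_id=m["anchor"],
        song_id=m["song"],
        extra_sentences=len(m["ext"]),
        segment_id=int(m["seg"]),
    )


def is_anchor(name):
    """Assumption: the anchor segment carries the group's own song ID."""
    return name.song_id == name.anchor_song_id
```

Under these assumptions, `Lyra_CS/00001/00001_0.wav` parses as an anchor segment, while `Lyra_CS/00001/00002s_0.wav` parses as a cover segment extended by one sentence. IDs are kept as strings so that leading zeros are preserved.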
5.Attention
a. Lyra-CS consists of song segments cut from complete songs. We split the songs into segments according to timestamp files produced by automated and manual methods, with 95% accuracy in a sample inspection. Some minor deviations are inevitable.
b. Lyra-CS may contain some identical segments from different songs, because these songs come from different albums.
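If identical segments matter for an experiment, exact copies can be located before training. The sketch below groups files by a SHA-256 hash of their bytes; the function name `find_identical_files` is hypothetical. It detects only byte-identical files, so the same audio stored with different headers or slightly different cut points would not be caught. Whether the identical segments mentioned in note b are byte-identical is an assumption.

```python
import hashlib
from collections import defaultdict


def find_identical_files(paths):
    """Group paths whose file bytes are identical.

    Returns a list of groups, each holding two or more paths.
    """
    by_digest = defaultdict(list)
    for path in paths:
        # Segments are short (under 15 s at 8 kHz, 16-bit), so reading
        # each file fully into memory is acceptable here.
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        by_digest[digest].append(path)
    return [group for group in by_digest.values() if len(group) > 1]
```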
6.License and Copyright
Lyra-CS is released under the CC BY-NC 4.0 license. Please include the source link and this notice in any non-commercial use.
Lyra-CS is compiled and created by Tencent Music Tianqin lab. Copyright (c) 2023 Tencent Music Entertainment Group.
Lyra-CS may not be used commercially without permission. For commercial use, please contact Tencent Music Group.
7.How to Download
To request access, click the application button, fill in the required information, and agree to the terms of use. We will email you the download link within 3 days.
8.Feedback
If you have any questions about Lyra-CS, please email us at lyracobar@tencentmusic.com.