BEIJING, Dec. 21, 2019 /PRNewswire/ -- Magic Data Technology's Chinese Mandarin Conversational Speech corpus has been selected for the Linguistic Data Consortium (LDC) Catalog under catalog ID LDC2019S23 (see https://catalog.ldc.upenn.edu/LDC2019S23 for details). Earlier this month, LDC announced the release to its subscribers in its December newsletter.
New trends in conversational datasets
As leading companies such as Google and Amazon pay more attention to continuous conversation, conversational datasets are becoming more important. While recognition accuracy on read speech reaches 97-98%, accuracy on conversational speech is only about 50% (based on results of the CHiME-5 Challenge). This large gap shows that the challenge in automatic speech recognition (ASR) has entered a new phase.
This corpus is an excellent test set for conversational speech recognition models.
Three keywords describe this corpus: diversity, accuracy and variety. Diversity refers to data collection: the conversations were recorded across different accents and transmission channels, with speakers of different ages and genders, and with background noise matching each scenario. Details are below:
- Speakers: 60 speakers from different regions of China, aged 4 to 67
- Recording environment: 3 rooms with different reverberation characteristics
- Recording equipment: Android devices (9 models); iOS devices (8 models); voice recorders (2 models)
- Recording channels: single-channel and multi-channel
- Recording distance: both far-field and near-field speech
The second keyword, accuracy, refers to data annotation. Magic Data Technology has formulated a set of annotation rules to meet practical needs. Spontaneous conversation contains overlapping speech, pauses, coughs and clapping. These sounds can be meaningful in some situations, as they may indicate the speaker's state and mood and even hint at the speaker's mental activity. Under the company's advanced annotation specifications, these sounds are labeled so that AI systems can learn to recognize them.
The last keyword, variety, refers to data application. This corpus is valuable for at least four applications: conversational speech recognition, speaker separation, speaker verification and robustness testing.
- Accuracy testing of speech recognition models. For example, in a typical household scenario, the family members who use voice interaction may include elderly relatives, the wife (an adult female), the husband (an adult male) and children, each with different pronunciation patterns and speaking habits. The wide age range of the corpus can be used to test how well a speech recognition model performs for each age group.
- Accuracy testing of speaker separation. Scene recognition based on specific speakers has become a research hotspot. The collection contains both single-speaker and multi-speaker recording channels, so it can be used to test the accuracy of speaker separation systems.
- Accuracy testing of speaker verification. Annotators labeled the audio by speaker, so every audio segment is attributed to a specific speaker. Because the dataset was recorded with many different types of devices, segments captured on different devices can be used to check whether a model correctly verifies the speaker's identity, giving a measure of its speaker verification accuracy.
- Robustness testing of models. Because far-field and near-field speech were recorded simultaneously, the recordings contain different levels of reverberation and background noise, which makes the corpus valuable for researchers testing the robustness of their systems.
The accuracy of AI algorithms depends on large amounts of relevant data, and data quality has a decisive influence on an algorithm's accuracy and practicality. Of the factors that determine data quality, variety and relevance to real-world business are the two most important, which is why data collection, integration and application capabilities are a focus of the industry.
Magic Data Technology owns the largest conversational Mandarin Chinese databases. With its human-in-the-loop data processing platform and more than 300,000 flexible annotation resources, the company can provide high-quality data with accuracy of up to 99%. Through its professional data processing, the company has served top AI companies and Fortune 500 companies and has earned a good reputation.