Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

CHEN Yifan, CHENG Gaofeng, XU Ji, YAN Yonghong. Adversarial training-based imbalanced speaker diarization system using short-phrase prior[J]. ACTA ACUSTICA, 2026, 51(3): 992-1004. DOI: 10.12395/0371-0025.2024050

Adversarial training-based imbalanced speaker diarization system using short-phrase prior

  • The performance of recent speaker diarization systems degrades when speaker durations are imbalanced. To address this issue, a speaker diarization system is designed using adversarial learning and a short-phrase prior. On the data side, under the short-phrase prior, the proposed method applies imbalanced data sampling to speakers with different durations, minimizing the gap in speech duration among speakers. For speaker representation extraction and clustering, a training scheme is designed to enhance the separability of clusters after imbalanced data sampling while keeping the cluster distribution similar to that obtained with balanced data. To avoid the data sparsity problem, adversarial learning is used to transfer the optimization process to a lower-dimensional embedding space. During inference, the proposed method enforces consistency among the clustering results of replicas augmented in different acoustic environments. Compared with existing methods, the proposed approach achieves absolute diarization error rate (DER) reductions of 6.15% and 4.27% on the imbalanced-duration subsets of the VoxConverse and AISHELL-4 datasets, respectively (relative reductions of 22.2% and 21.7%). The results indicate that the proposed method is a practical approach for mitigating the performance degradation of speaker diarization systems in scenarios with imbalanced speaker durations.
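The abstract reports each DER improvement both as an absolute and as a relative reduction. As a rough sanity check on how the two figures relate, the sketch below backs out the baseline and resulting DER they imply. It assumes the relative reduction is defined as the absolute reduction divided by the baseline DER; the function name `implied_der` is hypothetical, and the derived values are approximations from rounded figures, not numbers stated in the paper.

```python
def implied_der(absolute_reduction, relative_reduction):
    """Return (baseline, proposed) DER in percent implied by the two reductions.

    Assumes relative_reduction = absolute_reduction / baseline (an assumption,
    not a definition given in the abstract).
    """
    baseline = absolute_reduction / relative_reduction
    proposed = baseline - absolute_reduction
    return baseline, proposed


# Figures quoted in the abstract: (absolute reduction in %, relative reduction)
reported = {
    "VoxConverse": (6.15, 0.222),
    "AISHELL-4": (4.27, 0.217),
}

for name, (abs_red, rel_red) in reported.items():
    baseline, proposed = implied_der(abs_red, rel_red)
    print(f"{name}: implied baseline DER ~{baseline:.1f}%, implied proposed DER ~{proposed:.1f}%")
```

Under this assumption the reported figures correspond to baseline DERs of roughly 27.7% and 19.7% on the two imbalanced subsets; the exact values depend on rounding in the reported percentages.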