This article shows how to run speaker diarization with Kaldi's callhome diarization recipe, a step that often causes confusion in speaker-recognition work. What follows is a simple, practical procedure.
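callhome diarization is the part of Kaldi built specifically for clustering a mixed recording by speaker, that is, working out who spoke when.
It is worth learning to read the command demos that ship with Kaldi for yourself.
Here is what I ran: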
# Speech activity detection: split the recording into speech segments
steps/segmentation/detect_speech_activity.sh --cmd 'run.pl' --nj 1 \
  --mfcc-config ./conf/mfcc_hires.conf \
  --extra-left-context 79 --extra-right-context 21 \
  --extra-left-context-initial 0 --extra-right-context-final 0 \
  --frames-per-chunk 150 \
  data/ljj exp/segmentation_1a/tdnn_stats_asr_sad_1a exp/mfcc_hires \
  exp/segmentation_sad_snr/nnet_tdnn_j_ljj data/ljj

# Extract MFCC features for the speech segments
steps/make_mfcc.sh --mfcc-config conf/mfcc.conf --nj 1 --cmd "run.pl" \
  --write-utt2num-frames true data/ljj_seg exp/make_mfcc mfcc
utils/fix_data_dir.sh data/ljj_seg

# Cepstral mean and variance normalization (CMVN)
local/nnet3/xvector/prepare_feats.sh --nj 1 --cmd "run.pl" \
  data/ljj_seg data/ljj_seg_cmn exp/ljj_seg_cmn
cp data/ljj_seg/segments data/ljj_seg_cmn/
utils/fix_data_dir.sh data/ljj_seg_cmn

# Extract x-vectors over sliding windows (1.5 s window, 0.75 s shift)
diarization/nnet3/xvector/extract_xvectors.sh --cmd "run.pl" --nj 1 \
  --window 1.5 --period 0.75 --apply-cmn false --min-segment 0.5 \
  exp/xvector_nnet_1a data/ljj_seg_cmn exp/xvectors_ljj_seg

# Score pairs of x-vectors with PLDA
diarization/nnet3/xvector/score_plda.sh --cmd "run.pl --mem 4G" --nj 1 \
  --target-energy 0.9 exp/xvector_nnet_1a/xvectors_callhome1 \
  exp/xvectors_ljj_seg exp/xvectors_ljj_seg/plda_scores

# If you know how many people are speaking, generate data/ljj_seg/reco2num_spk
# and pass it with --reco2num-spk
diarization/cluster.sh --cmd "run.pl --mem 4G" --nj 1 \
  --reco2num-spk data/ljj_seg/reco2num_spk \
  exp/xvectors_ljj_seg/plda_scores exp/xvectors_ljj_seg/plda_scores_num_speakers

# If you do not know how many people are speaking, cluster with a threshold instead
diarization/cluster.sh --cmd "run.pl --mem 4G" --nj 1 --threshold 0 \
  exp/xvectors_ljj_seg/plda_scores exp/xvectors_ljj_seg/plda_scores_threshold_0

The clustering output is in RTTM format. Column 2 is the file name, column 3 is the channel, column 4 is the segment's start time in seconds, column 5 is the segment's duration in seconds counted from that start time, and column 8 is the speaker label assigned to the segment.

Output when the number of speakers in the file is known:

SPEAKER 18642259056-liujinjie.wav 0 0.000 4.510 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 4.530 1.660 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 6.210 4.880 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 11.090 1.660 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 12.800 2.130 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 14.950 4.400 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 19.390 1.810 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 21.220 5.220 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 26.440 4.410 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 30.850 2.480 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 33.340 5.120 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 38.460 5.990 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 44.480 3.910 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 48.460 3.460 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 52.060 5.420 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 57.530 5.030 <NA> <NA> 1 <NA> <NA>

Output when the number of speakers in the file is unknown:

SPEAKER 18642259056-liujinjie.wav 0 0.000 4.510 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 4.530 1.660 <NA> <NA> 3 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 6.210 4.880 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 11.090 1.660 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 12.800 2.130 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 14.950 4.400 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 19.390 1.810 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 21.220 5.220 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 26.440 4.410 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 30.850 2.480 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 33.340 5.120 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 38.460 5.990 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 44.480 3.910 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 48.460 3.460 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 52.060 5.420 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 57.530 5.030 <NA> <NA> 1 <NA> <NA>

The next step is to use pydub to splice the speech segments together.
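The article stops before showing the pydub step, so here is a minimal sketch of one way it might look. It is not the author's code. It assumes the clustering output has been saved as an RTTM text file, that each speaker's segments should be concatenated in time order into one WAV file per speaker label, and that pydub is installed. The function names parse_rttm and splice_by_speaker and the output naming scheme are made up for illustration.

```python
from collections import defaultdict


def parse_rttm(lines):
    """Group RTTM SPEAKER records by speaker label.

    Returns {label: [(start_ms, end_ms), ...]} with times in milliseconds,
    which is the unit pydub uses for slicing.
    """
    segments = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        start_ms = round(float(fields[3]) * 1000)     # column 4: start time (s)
        duration_ms = round(float(fields[4]) * 1000)  # column 5: duration (s)
        label = fields[7]                             # column 8: speaker label
        segments[label].append((start_ms, start_ms + duration_ms))
    return dict(segments)


def splice_by_speaker(wav_path, rttm_path, out_prefix):
    """Write one WAV per speaker by concatenating that speaker's segments.

    Hypothetical helper: output files are named <out_prefix>_spk<label>.wav.
    """
    # Imported here so parse_rttm can be used without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_wav(wav_path)
    with open(rttm_path, encoding="utf-8") as f:
        segments = parse_rttm(f)

    for label, spans in segments.items():
        combined = AudioSegment.empty()
        for start_ms, end_ms in sorted(spans):
            # pydub slices audio by millisecond offsets.
            combined += audio[start_ms:end_ms]
        combined.export(f"{out_prefix}_spk{label}.wav", format="wav")
```

A call such as splice_by_speaker("call.wav", "rttm", "out") would then produce out_spk1.wav, out_spk2.wav and so on; the paths here are placeholders. Sorting the spans keeps each speaker's audio in its original order, and the small gaps between adjacent segments are simply dropped.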