At its core, speech editing amounts to manipulating arrays of numbers. Enhancement filters can remove both natural and intentional noise to a reasonable extent, and pitch and formant analysis can give a general idea of whether two speakers are the same person. Beyond speaker variability, other factors also challenge speaker recognition technology, for example acoustical noise and variations in the recording environment (e.g., a speaker using different telephone handsets). The defects, however, are obvious in the waveform comparison. While these approaches can give a rough estimate or aid human decisions about whether two voices are the same, programs like these are not advanced enough to be fully automated and foolproof. In other words, this is not a black box that returns an accurate answer for a given set of inputs without the user knowing anything about how the program works. Further topics we would like to explore include delta-cepstrum coefficients and perceptual linear predictive (PLP) coefficients, to see how much they could help with or replace pitch and formant analysis. A combination of all four might give a much higher confirmation percentage.
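To make the "arrays of numbers" view concrete, the following is a minimal sketch of pitch estimation on a single frame using autocorrelation. It is not the authors' implementation: the function names `estimate_pitch` and `same_speaker_hint`, the 50-500 Hz search range, and the 20 Hz tolerance are all assumptions chosen for illustration, and the comparison is only the kind of rough hint described above, not a reliable decision.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation.

    Hypothetical helper: fmin and fmax are assumed bounds for voiced speech.
    Returns None when no positive autocorrelation peak is found (e.g. silence).
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove any DC offset

    min_lag = int(sample_rate / fmax)               # shortest period searched
    max_lag = min(int(sample_rate / fmin), n - 1)   # longest period searched

    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score

    if best_lag == 0:
        return None
    return sample_rate / best_lag


def same_speaker_hint(pitch_a, pitch_b, tolerance_hz=20.0):
    """Rough hint only: True if two pitch estimates lie within an assumed tolerance."""
    return abs(pitch_a - pitch_b) <= tolerance_hz


# Usage: a synthetic 200 Hz tone sampled at 8 kHz stands in for a voiced frame.
rate = 8000
frame = [math.sin(2 * math.pi * 200 * i / rate) for i in range(400)]
pitch = estimate_pitch(frame, rate)
```

A real recording would need framing, voiced/unvoiced detection, and noise handling before such an estimate is meaningful, which is one reason the result should support a human decision rather than replace it.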
Author(s): Sumanta Karmakar, Pratik Dey
Publisher: Technical and Scientific Publisher
Year: 0
Language: English
Pages: 11
Tags: Speaker Recognition