Segmentation For A Speech: Evaluation Of Speech Activity Detection

Evaluation of Speech Activity Detection
The TIMES NOW dataset was used to tune the parameters for both silence removal and music removal. The same parameters were then used to obtain results on the CNBC AWAAZ dataset. The silence removal and music removal blocks are decoupled: the system parameters are chosen for the former first, and the output of that block is then fed to the music removal block. Hence the music removal is evaluated separately. In the first step, silence is removed from the whole recording using energy-based bootstrapping followed by iterative classification. In the second step, music and other audible non-speech are identified in the recording.
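The first step described above can be illustrated with a minimal sketch. It assumes non-overlapping frames, log-energy as the only feature, and a nearest-mean rule as the iterative classifier; none of these details are given in the text, and the names `frame_log_energy`, `detect_speech`, `frame_len` and `bootstrap_fraction` are hypothetical.

```python
import math


def frame_log_energy(samples, frame_len):
    """Split samples into non-overlapping frames and return the log-energy of each."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        energies.append(math.log(energy + 1e-10))  # small offset avoids log(0)
    return energies


def detect_speech(samples, frame_len=4, bootstrap_fraction=0.25, n_iter=10):
    """Return one boolean per frame: True for speech, False for silence."""
    energies = frame_log_energy(samples, frame_len)
    ranked = sorted(energies)
    n_boot = max(1, int(len(ranked) * bootstrap_fraction))

    # Bootstrap (assumption): the lowest-energy frames seed the silence class
    # and the highest-energy frames seed the speech class.
    silence_mean = sum(ranked[:n_boot]) / n_boot
    speech_mean = sum(ranked[-n_boot:]) / n_boot

    labels = []
    for _ in range(n_iter):
        # Classify every frame by the nearer class mean (stand-in classifier).
        new_labels = [abs(e - speech_mean) < abs(e - silence_mean) for e in energies]
        if new_labels == labels:
            break  # labels stopped changing
        labels = new_labels
        speech = [e for e, is_speech in zip(energies, labels) if is_speech]
        silence = [e for e, is_speech in zip(energies, labels) if not is_speech]
        if speech:
            speech_mean = sum(speech) / len(speech)
        if silence:
            silence_mean = sum(silence) / len(silence)
    return labels


# Toy signal: quiet, loud, quiet.
signal = [0.001] * 8 + [0.5, -0.5] * 4 + [0.001] * 8
speech_flags = detect_speech(signal)
```

A real system would likely use overlapping windows and richer features; the sketch only shows how an energy-based bootstrap can seed an iterative classifier.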

Evaluation on the Times Now dataset of duration
i-vector systems have become the state-of-the-art technique in speaker verification systems. They provide a way of reducing high-dimensional input data to a low-dimensional feature vector while retaining most of the relevant information.

i-vector extraction
To obtain relevant speaker features, the vectors are decomposed into characteristic factors by iteratively training a factor analysis model. The resulting features are referred to as identity vectors (i-vectors). To obtain the i-vectors, a speech Universal Background Model (UBM) is first trained on training data. The UBM is a Gaussian mixture model (GMM) with a large number of Gaussians, so that it captures the possible variability of speech in the feature space. In the proposed system, the TIMIT and TIFR datasets were used for UBM training.
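As a rough illustration of what training a GMM involves, the sketch below fits a one-dimensional mixture with expectation-maximization. It is a toy under stated assumptions: a real UBM is trained on multi-dimensional acoustic features with many more components, and the function name `fit_gmm_1d`, the evenly spaced initialization and the variance floor are choices made here, not details from the text.

```python
import math


def fit_gmm_1d(data, n_components=2, n_iter=50, var_floor=1e-3):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    lo, hi = min(data), max(data)
    # Deterministic initialization (assumption): spread means over the data range.
    if n_components == 1:
        means = [(lo + hi) / 2.0]
    else:
        means = [lo + (hi - lo) * k / (n_components - 1) for k in range(n_components)]
    mean_all = sum(data) / len(data)
    var_all = sum((x - mean_all) ** 2 for x in data) / len(data)
    variances = [max(var_all, var_floor)] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            likes = [
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(likes) or 1e-300  # guard against total underflow
            resp.append([like / total for like in likes])

        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, var_floor)
    return weights, means, variances


# Two well-separated clusters stand in for feature values.
toy_features = [0.0, 0.2, -0.1, 0.1, 10.0, 10.2, 9.9, 10.1]
ubm_weights, ubm_means, ubm_variances = fit_gmm_1d(toy_features)
```

The concatenated component means of such a model form the supervector that the following paragraphs refer to.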
The Total Variability space is a subspace of the GMM supervector space. It captures all the channel- and speaker-related information. T is a low-rank matrix that defines this subspace. For this system, T is trained using the speaker-labelled dataset used for UBM training. The i-vector of a segment is the projection of its GMM supervector onto the Total Variability subspace:

m = M + T x

where M is the UBM supervector, m is the mean-adapted GMM supervector of the segment, and x is the i-vector of the segment.
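To make the relation m = M + T x concrete, the sketch below recovers x from m, M and T by ordinary least squares. This is a simplification and an assumption: practical i-vector extractors estimate x as a posterior mean from Baum-Welch statistics and the UBM covariances, which the text does not detail. The helper names `solve_linear` and `project_supervector` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b for a square matrix a (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def project_supervector(m, M, T):
    """Least-squares estimate of x in m = M + T x.

    m, M: supervectors of length D; T: D x R matrix given as a list of rows.
    """
    rank = len(T[0])
    diff = [mi - Mi for mi, Mi in zip(m, M)]
    # Normal equations: (T^T T) x = T^T (m - M)
    gram = [[sum(row[i] * row[j] for row in T) for j in range(rank)]
            for i in range(rank)]
    rhs = [sum(row[i] * d for row, d in zip(T, diff)) for i in range(rank)]
    return solve_linear(gram, rhs)


# Toy example: a 3-dimensional supervector and a rank-2 matrix T.
M = [1.0, 2.0, 3.0]
T = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
m = [3.0, 1.0, 4.0]  # built as M + T x with x = [2, -1]
x = project_supervector(m, M, T)
```

In this toy case the supervector lies exactly in the subspace, so the least-squares solution reproduces the x used to build m.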

Figure 4: Extraction of i-vectors

Two distance metrics have been tested for measuring similarity between i-vectors: the cosine similarity metric (1.1) and the Mahalanobis
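The cosine similarity between two i-vectors can be sketched as follows. Equation (1.1) is not reproduced in the text, so the code uses the standard definition (dot product divided by the product of the Euclidean norms) as an assumption; the function name `cosine_similarity` is illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
```

The score depends only on the angle between the vectors, not on their lengths, so scaling an i-vector leaves it unchanged.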