Audio frequency classification by unsupervised learning and dimensionality reduction applied to Multi-Dimensional Scaling (MDS)

Hello World!

My name is Alberto (Edelberth here). I am passionate about sound. My formal background is in electronics, but not long ago I decided to change direction in my professional career.

Here are a couple of links to some personal projects:


Sound Design:

Soundtrack Movie:

So, last year (a couple of months ago) I finished a Master's in Business Intelligence and Big Data. One of the many things I learned in the program was that data not directly related to the business, such as the information contained in a sound file, can be used to help the business from another perspective.

Under this premise I could relate two worlds:

Data and audio.

The starting point:

There is a professional profile called sound designer. This is a professional, usually independent, who is dedicated to creating sound effects in order to narrate, personify, generate emotions, portray sound spaces… in short, to create a sound universe with a particular identity within the context in which they are working.

The result is methodical, handmade work in which the volume of sound files generated for a project is usually very large, so it's easy to lose “timbral perspective”: the more samples there are, the easier it is for the sounds to end up being too similar.

Therefore, a tool that analyzes the samples the professional is working with (only from the frequency perspective) would save a lot of time when making creative decisions. It would show which frequencies predominate in the set of samples and so give a clear view of the timbral character of the group.

The solution:

The solution to achieve this goal is to apply one of the analysis techniques used in Machine Learning, specifically multidimensional scaling (MDS).

The objective is to apply an organizational method from the field of unsupervised learning, in which models are inferred in order to extract knowledge from a set of data (in this case a set of audio samples). The technique applied is dimensionality reduction, since it is not necessary to know in advance whether there is a relationship between the elements of the set and time.

The classification is performed with multidimensional scaling (MDS), which lets us visualize the level of similarity between the individual elements of a set. It is usually classed as a form of nonlinear dimensionality reduction.

The multidimensional scaling is carried out after a prior frequency analysis of each element of the set, which gives the difference or similarity between the samples at the frequency level.

The MDS algorithm aims to place each object in an N-dimensional space so that the distances between the objects are preserved as well as possible, which allows us to plot the analyzed elements.
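To make the idea of preserving distances concrete, here is a minimal sketch in base R. The four points are invented for illustration and have nothing to do with the audio set; the sketch only shows that `cmdscale` returns coordinates whose pairwise distances match the ones it was given.

```r
# Four illustrative points on a 3 x 4 rectangle (not audio data)
pts <- matrix(c(0, 0,
                3, 0,
                3, 4,
                0, 4), ncol = 2, byrow = TRUE,
              dimnames = list(c("a", "b", "c", "d"), NULL))

d_original <- dist(pts)                 # pairwise Euclidean distances
coords     <- cmdscale(d_original, k = 2)  # classical MDS into 2 dimensions
d_embedded <- dist(coords)              # distances between the embedded points

# The layout may come back rotated or mirrored, but the distances agree
max_error <- max(abs(d_original - d_embedded))
print(max_error)
```

With real audio similarities the match is rarely exact, because a 2-dimensional plot can only approximate distances that live in more dimensions.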

Steps followed:

-1 Take a set of samples and analyze their frequency content.

-2 Define a model:

  • 2.1 Create the similarity matrix.
  • 2.2 Calculate the distances between elements.
  • 2.3 Execute a multidimensional scaling.

-3 Evaluate the model:

  • 3.1 Visualization in a coordinate system.
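The steps above can be sketched end to end in base R without any audio files. This is only an illustration under loud assumptions: the six "samples" are synthetic noisy tones, and the similarity is a plain correlation between coarse band spectra, not the MFCC cross-correlation used later in this post. All names (`band_profile`, `profiles`, and so on) are hypothetical.

```r
set.seed(42)
sr <- 8000                      # sampling rate in Hz (assumed)
n  <- 2000                      # samples per signal (0.25 s)
tt <- (0:(n - 1)) / sr          # time axis in seconds

# Six hypothetical "samples": three low tones and three high tones, plus noise
freqs <- c(low1 = 100, low2 = 200, low3 = 300,
           high1 = 2100, high2 = 2200, high3 = 2300)
signals <- sapply(freqs, function(f) sin(2 * pi * f * tt) + rnorm(n, sd = 0.1))

# Step 1: frequency content, summarized as mean magnitude in 10 equal bands
n_bins  <- n / 2
band_id <- rep(1:10, each = n_bins / 10)
band_profile <- function(s) {
  mag <- Mod(fft(s))[1:n_bins]
  as.vector(tapply(mag, band_id, mean))
}
profiles <- apply(signals, 2, band_profile)   # 10 bands x 6 samples

# Step 2.1: similarity matrix (correlation between band profiles)
similarity <- cor(profiles)

# Step 2.2: distances between the rows of the similarity matrix
distances <- dist(similarity, method = "euclidean")

# Step 2.3: classical multidimensional scaling into 2 dimensions
coords <- cmdscale(distances, k = 2)

# Step 3.1: coordinates ready to be plotted
print(round(coords, 2))
```

In this toy case the low tones and the high tones end up on opposite sides of the first axis, which is the kind of grouping the real analysis looks for.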

The methodology used is the comparison of a set of samples (.WAV). The set consisted of 172 audio samples, all of the same duration.

To carry out the project, the sound samples were used as the input to a multidimensional scaling example. I used warbleR (an R package for analyzing animal acoustic signals) to facilitate the analysis of the sound.


## Loading libraries
library(imager)
library(tuneR)
library(knitr)
library(NatureSounds)
library(seewave)
library(warbleR)
library(igraph)
library(ggplot2)
library(ggfortify)

1 Loading audio samples into a table.

## Load audio samples
alles_dir <- "PATH_WHERE_AUDIO_FILES_ARE"
wav_names <- list.files(alles_dir, pattern = "\\.wav$")
sound_design <- selection_table(whole.recs = TRUE, path = alles_dir, extended = TRUE)

Because the extended selection table is large (about 58 MB here), we will get a console message asking whether we want to continue.

Say “y”.

> sound_design <- selection_table(whole.recs = TRUE, path = alles_dir, extended = TRUE)
checking selections (step 1 of 2):
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=01s
all selections are OK
Expected 'extended_selection_table' size is ~58MB (~0.05669 GB)
 Do you want to proceed (y/n):

Samples will be loaded into the table:

saving wave objects into extended selection table (step 2 of 2):
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=03s

2 Defining the model:

In signal processing, cross-correlation (sometimes called “cross-covariance”) is a measure of the similarity between two signals as a function of the time lag applied to one of them. It is often used to find relevant features in an unknown signal by comparing it with a known one. In this case, the aim is to compare every sample in the set with every other sample.
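As a small illustration of the idea, here is a sketch in base R on one-dimensional signals. It is not what warbleR does internally (that works on spectrograms or MFCC matrices); `lagged_cor` is a hypothetical helper written only for this example, and the signals are random noise.

```r
# Hypothetical helper: correlation of x and y at integer lags.
# A positive lag k compares x[t] with y[t + k].
lagged_cor <- function(x, y, max_lag) {
  n <- length(x)
  lags <- seq(-max_lag, max_lag)
  vals <- sapply(lags, function(k) {
    if (k >= 0) {
      cor(x[1:(n - k)], y[(1 + k):n])
    } else {
      cor(x[(1 - k):n], y[1:(n + k)])
    }
  })
  data.frame(lag = lags, cor = vals)
}

set.seed(1)
n     <- 200
x     <- rnorm(n)                          # reference signal
delay <- 5
y     <- c(rep(0, delay), x[1:(n - delay)])  # the same signal, delayed by 5 samples

cc <- lagged_cor(x, y, max_lag = 20)
best_lag <- cc$lag[which.max(cc$cor)]      # lag with the highest similarity
print(best_lag)
```

The lag with the highest correlation recovers the delay that was introduced, and the height of that peak is what serves as a similarity score between two signals.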

The mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
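To give a feel for the nonlinear mel scale, the sketch below uses one widely used conversion formula, 2595 * log10(1 + f / 700). This is an assumption for illustration: other variants of the formula exist, and the packages used in this post may apply a different one. The function names are hypothetical.

```r
# Hz <-> mel, using the common 2595 * log10(1 + f / 700) convention
hz_to_mel <- function(f) 2595 * log10(1 + f / 700)
mel_to_hz <- function(m) 700 * (10^(m / 2595) - 1)

freqs_hz <- c(100, 500, 1000, 5000, 10000)
mel_table <- data.frame(hz = freqs_hz, mel = round(hz_to_mel(freqs_hz)))
print(mel_table)
```

The scale is roughly linear at low frequencies and compresses the high ones, so equal steps in mel correspond to progressively larger steps in hertz.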

  • 2.1 Creating the similarity matrix (xcor).
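# bp: band-pass limits in kHz; wl: window length in samples; ovlp: window overlap in %
# (in recent warbleR releases this function appears under the name cross_correlation())
xcor <- xcorr(sound_design, bp = c(0, 20), wl = 512, ovlp = 99, path = alles_dir,
              type = "mfcc", method = 1, na.rm = TRUE,
              parallel = 4)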

The matrices are created (keeping them internally as a list) and then the cross-correlation is calculated in the second step.

Note: parallel = 4 refers to the number of processor cores, so it is a parameter that can be modified in case we have more cores in the processor.

creating MFCC matrices (step 1 of 2):
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=11s
running cross-correlation (step 2 of 2):
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=04m 18s
  • 2.2 Calculating the distances between the elements.

dist() computes the distances between the rows of a data matrix using the specified distance measure, Euclidean in this case.

distancia <- dist(xcor, method = "euclidean")
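A short sketch of what this step does, on a made-up 3 x 3 similarity matrix standing in for `xcor` (the sample names and values are hypothetical). It also shows a common alternative, converting each similarity directly into a dissimilarity, which is not what this post uses.

```r
# Hypothetical similarity matrix (1 = identical), standing in for xcor
sim <- matrix(c(1.0, 0.9, 0.2,
                0.9, 1.0, 0.3,
                0.2, 0.3, 1.0), nrow = 3,
              dimnames = list(c("kick", "snare", "pad"),
                              c("kick", "snare", "pad")))

# Approach used in this post: Euclidean distance between rows, so two samples
# are close when they relate to all the other samples in a similar way
d_rows <- dist(sim, method = "euclidean")

# Alternative (an assumption, not the post's method): 1 - similarity
d_direct <- as.dist(1 - sim)

print(d_rows)
print(d_direct)
```

In this toy case both approaches agree on which pair is closest; on real data they can differ, so the choice is worth stating when sharing results.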

  • 2.3 Performing the classical multidimensional scaling (MDS) of the distance matrix, also known as principal coordinates analysis.

valores <- cmdscale(distancia, eig = TRUE)
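With `eig = TRUE`, `cmdscale` returns a list rather than a bare coordinate matrix. The sketch below shows how that list might be inspected; it uses the `eurodist` dataset that ships with R as a stand-in for `distancia`, so the numbers say nothing about the audio set.

```r
# eurodist (road distances between European cities) stands in for distancia
fit <- cmdscale(eurodist, k = 2, eig = TRUE)

head(fit$points)   # 2-D coordinates, one row per object
fit$GOF            # two goodness-of-fit values

# Share of the positive eigenvalue mass captured by the first two axes;
# this should match the second GOF value
pos_eig   <- fit$eig[fit$eig > 0]
explained <- sum(pos_eig[1:2]) / sum(pos_eig)
print(explained)
```

A low value here would suggest that a two-dimensional plot hides a good part of the structure in the distances.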

3 Evaluating the model:

Visualization in a coordinate system, using autoplot, which relies on ggplot2 to draw the appropriate plot for an object of a given class in a single command.

autoplot(cmdscale(distancia, eig = TRUE), label = TRUE, label.size = 3, frame = TRUE)

[Image: 8bits_prefabs_explosions]

8bits Collection

Again, I hope you found it interesting.

It’s a pleasure to be here, if you are also interested in sound and data…

See you here…