how to write yaml files used by the select module

As described in the introduction, a crucial step in using hybrid vocal classifier is selecting which models to use. This can be done in an automated way using the select module. Like the extract and predict modules, the select module works by parsing configuration files. Below the steps are outlined in writing the configuration files in yaml format.

what the select module gets out of the config file: models and data

There are two required elements in a select config file, that
correspond to the two main things that the select module needs to know:
  1. models: what models to test. A Python list of dictionaries, as described below.
  2. todo_list: where the data is to train and test those models. Another Python list of dictionaries, also described below.

The parser that parses the select config file is written so that you don’t have to repeat yourself. You can put one models list at the top of the file, and then for each dataset in the todo_list, the select module will train and test all the models that are specified in that top-level models list. Like so:

..include

However you can also define a models dictionary for each todo_list, in case you need to test different models for different datasets, and want to run them all from one script.

..include

the models list

To be parsed correctly, the models list needs to have the right structure. In yaml terminology, this is a list. Once parsed into Python, it becomes a list of dictionaries. For that reason the structure is described in terms of the keys and values required for each dictionary. Each dictionary in the list represents one model that the select module will test. There are a couple of required keys for each model dictionary.

required key 1: hyperparameters

These models are found using machine learning algorithms. A model can be thought of as a function with parameters, like the beta terms of a linear regression. To find these parameters, the algorithm must train on the data, and this training also has parameters, for example the number of neighbors used by the K-nearest neighbor algorithm. These parameters of the algorithm are known as hyperparameters to distinguish them from the parameters found by the algorithm.

the todo list