# spec for yaml config files

This document specifies the structure of HVC (hybrid-vocal-classifier) config files written in yaml. It is a painfully dry document that exists to guide the project code, not to teach someone how to write HVC config files. For a gentle introduction to writing config files, please see writing_config_files.

Essentially, each config file specifies a list of jobs. Each job in a list will typically correspond to data files from one bird.

Config files consist of three sections:

- global_config: parameters that apply to all jobs
- model_selection: list of jobs for selecting machine learning models
- prediction: list of jobs that apply models to unclassified data
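As a rough illustration of this top-level layout, the sketch below assumes the file has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`) and splits it into its three sections. The name `split_config`, the choice to treat a missing section as empty, and the choice to reject unknown top-level keys are all assumptions for illustration, not HVC behavior.

```python
from typing import Any

# Hypothetical helper, not part of HVC: split an already-parsed config dict
# into the three sections described above.
# Assumptions: a missing section is treated as empty, and any other
# top-level key is rejected as an error.
VALID_SECTIONS = ("global_config", "model_selection", "prediction")


def split_config(config: dict[str, Any]) -> tuple[dict, list, list]:
    unknown = set(config) - set(VALID_SECTIONS)
    if unknown:
        raise ValueError(f"unknown top-level section(s): {sorted(unknown)}")
    global_config = config.get("global_config", {})
    model_selection = config.get("model_selection", [])
    prediction = config.get("prediction", [])
    # global_config holds shared parameters; the other two sections are lists of jobs
    if not isinstance(global_config, dict):
        raise TypeError("global_config must be a dictionary")
    for name, jobs in (("model_selection", model_selection), ("prediction", prediction)):
        if not isinstance(jobs, list):
            raise TypeError(f"{name} must be a list of jobs")
    return global_config, model_selection, prediction


parsed = {
    "global_config": {"spect_params": {"samp_freq": 32000}},
    "prediction": [{"bird_ID": "gr41rd51", "model_file": "gr41rd51_svm.pkl"}],
}
g, ms, pred = split_config(parsed)
print(len(ms), len(pred))  # 0 1
```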

## global_config

As the name implies, parameters in the global_config section apply to all jobs. The global_config is a dictionary of dictionaries.

Example:

```yaml
global_config:
    spect_params :
        samp_freq : 32000 # Hz
        window_size : 512
        window_step : 32
        freq_cutoffs : [1000,8000]
    neural_net :
        syl_spect_width : 300
```
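To make the spect_params values above more concrete, the sketch below converts them into window length, step size and frequency resolution. It assumes that window_size and window_step are counts of audio samples and that freq_cutoffs is a [low, high] pair in Hz; the helper name `spect_resolution` is hypothetical and not part of HVC.

```python
# Hypothetical helper, not part of HVC: derive spectrogram resolution from spect_params.
# Assumptions: window_size and window_step are numbers of audio samples,
# and freq_cutoffs is [low, high] in Hz.
def spect_resolution(spect_params: dict) -> dict:
    fs = spect_params["samp_freq"]  # sampling frequency in Hz
    size = spect_params["window_size"]
    step = spect_params["window_step"]
    low, high = spect_params["freq_cutoffs"]
    # Cutoffs above the Nyquist frequency (fs / 2) cannot be represented in the spectrogram
    if not low < high <= fs / 2:
        raise ValueError("freq_cutoffs must be increasing and at or below fs / 2")
    return {
        "window_ms": 1000 * size / fs,  # duration of one analysis window
        "step_ms": 1000 * step / fs,  # time between successive windows
        "freq_bin_hz": fs / size,  # spacing between frequency bins
    }


example = {"samp_freq": 32000, "window_size": 512, "window_step": 32, "freq_cutoffs": [1000, 8000]}
print(spect_resolution(example))  # {'window_ms': 16.0, 'step_ms': 1.0, 'freq_bin_hz': 62.5}
```

Under these assumptions, the global_config example uses 16 ms windows advanced in 1 ms steps, with 62.5 Hz between frequency bins.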

## model_selection

model_selection is a list of jobs. Each job is a dictionary, so model_selection is a list of dictionaries.

Each job (i.e., each item in the list) is marked with an empty dash, that is, a dash with no key next to it. The keys and values that make up the job's dictionary appear below that dash.

A job in the `model_selection` section must include the following keys:

- bird_ID : string, alphanumeric, identifies bird
- train : dictionary with parameters for the training dataset
- test : dictionary with parameters for the testing dataset
    - both train and test contain a list named dirs. Each item in dirs is a string, and that string must be a path to a directory of audio files (expected to contain song from the bird bird_ID).
- output_dir : string, directory where output will be saved. HVC creates a new subfolder in the given directory.
- labelset : string, labels used for syllables. Only syllables with the labels in labelset will be included in the training and testing datasets.
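The required keys above can be checked mechanically. The sketch below is a hypothetical validator, not HVC's own code; it assumes the job has already been parsed from yaml into a Python dict, and the optional `check_dirs` flag (an addition for illustration) uses `os.path.isdir` to confirm the dirs entries exist.

```python
import os
from typing import Any

# Hypothetical validator, not part of HVC: checks one model_selection job
# against the required keys listed above.
REQUIRED_KEYS = ("bird_ID", "train", "test", "output_dir", "labelset")


def validate_model_selection_job(job: dict[str, Any], check_dirs: bool = False) -> None:
    missing = [key for key in REQUIRED_KEYS if key not in job]
    if missing:
        raise KeyError(f"model_selection job is missing required key(s): {missing}")
    bird_id = job["bird_ID"]
    if not (isinstance(bird_id, str) and bird_id.isalnum()):
        raise ValueError("bird_ID must be an alphanumeric string")
    for split in ("train", "test"):
        section = job[split]
        dirs = section.get("dirs") if isinstance(section, dict) else None
        if not isinstance(dirs, list) or not all(isinstance(d, str) for d in dirs):
            raise ValueError(f"{split} must be a dictionary with a list of path strings under 'dirs'")
        # Optional: confirm each entry is an existing directory on this machine
        if check_dirs:
            for d in dirs:
                if not os.path.isdir(d):
                    raise FileNotFoundError(f"{split} dir not found: {d}")
    for key in ("output_dir", "labelset"):
        if not isinstance(job[key], str):
            raise TypeError(f"{key} must be a string")


job = {
    "bird_ID": "gr41rd51",
    "train": {"dirs": ["data/train_day1"]},  # hypothetical paths
    "test": {"dirs": ["data/test_day1", "data/test_day2"]},
    "output_dir": "results",
    "labelset": "iabcdefgjkm",
}
validate_model_selection_job(job)  # passes silently; check_dirs is off by default
```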

**If a parameter is defined in global_config and then defined again in a job, the value defined in the job takes precedence over the global_config value, but only for that job.**
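The precedence rule can be sketched as a dictionary update applied per job. This is an illustration under an explicit assumption: precedence is applied key by key at the top level, so a job's spect_params replaces the whole global spect_params dictionary instead of being merged into it field by field. HVC may merge more finely; the helper name `effective_job` is hypothetical.

```python
from typing import Any


# Hypothetical helper, not part of HVC: build the effective settings for one job.
# Assumption: precedence works per top-level key, so a job's spect_params replaces
# the global spect_params dictionary as a whole.
def effective_job(global_config: dict[str, Any], job: dict[str, Any]) -> dict[str, Any]:
    merged = dict(global_config)  # shallow copy, so global_config itself is not modified
    merged.update(job)  # values defined in the job win
    return merged


global_config = {
    "spect_params": {"samp_freq": 32000, "freq_cutoffs": [1000, 8000]},
    "neural_net": {"syl_spect_width": 300},
}
job_1 = {"bird_ID": "gr41rd51", "spect_params": {"samp_freq": 32000, "freq_cutoffs": [1000, 10000]}}
job_2 = {"bird_ID": "gr41rd51"}  # no spect_params, so the global value applies

print(effective_job(global_config, job_1)["spect_params"]["freq_cutoffs"])  # [1000, 10000]
print(effective_job(global_config, job_2)["spect_params"]["freq_cutoffs"])  # [1000, 8000]
```

Copying before updating is what keeps the override local: job_1's spect_params never leaks into the settings used for job_2.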

Example:

```yaml
model_selection: # list of dictionaries; a dash with no key next to it is a list item, so each dictionary is an item in the list
    -   # i.e. this is dictionary 1
        bird_ID : gr41rd51
        train :
            dirs:
                - C:\DATA\gr41rd51\pre_surgery_baseline\06-21-12
        test :
            dirs:
                - C:\DATA\gr41rd51\pre_surgery_baseline\06-19-12
                - C:\DATA\gr41rd51\pre_surgery_baseline\06-20-12
                - C:\DATA\gr41rd51\pre_surgery_baseline\06-22-12
        output_dir: C:\DATA\gr41rd51
        labelset : iabcdefgjkm
        spect_params : # not required, but will take precedence over spect_params in global_config
            samp_freq : 32000 # Hz
            window_size : 512
            window_step : 32
            freq_cutoffs : [1000,10000]
```
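In the example above, labelset is iabcdefgjkm, which reads as one label per character. Assuming that reading, restricting a dataset to labelset could look like the sketch below. The representation of syllables as (label, onset, offset) tuples and the function name `filter_by_labelset` are hypothetical, not HVC's data structures.

```python
# Hypothetical sketch, not HVC's implementation.
# Assumption: each character of labelset is one single-character syllable label.
Syllable = tuple[str, float, float]  # (label, onset_s, offset_s), a hypothetical layout


def filter_by_labelset(syllables: list[Syllable], labelset: str) -> list[Syllable]:
    allowed = set(labelset)  # e.g. "iabcdefgjkm" -> {"i", "a", "b", ...}
    return [syl for syl in syllables if syl[0] in allowed]


syls = [("i", 0.10, 0.15), ("a", 0.20, 0.26), ("x", 0.30, 0.33), ("m", 0.40, 0.45)]
kept = filter_by_labelset(syls, "iabcdefgjkm")
print([syl[0] for syl in kept])  # ['i', 'a', 'm']
```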

## prediction

Like model_selection, the prediction section is a list of job dictionaries.

A job in the `prediction` section must include the following keys:

- bird_ID : string, alphanumeric, identifies bird
- model_file : string, a file name. Either a scikit-learn model that has been `pickle`d or `dump`ed by joblib, or an hdf5 model output by Keras.

Example:

```yaml
prediction:
    -   bird_ID : gr41rd51
        model_file : gr41rd51_svm.pkl
```
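Since model_file may hold either a scikit-learn model or a Keras hdf5 model, code consuming a prediction job has to tell the two apart. The sketch below checks the required keys and guesses the model type from the file extension; the extension sets and the name `prediction_model_kind` are assumptions for illustration, and HVC may decide the type differently.

```python
import os
from typing import Any

# Hypothetical validator, not part of HVC, for one prediction job.
# Assumption: the model type can be guessed from the file extension.
SKLEARN_EXTS = {".pkl", ".pickle", ".joblib"}
KERAS_EXTS = {".h5", ".hdf5"}


def prediction_model_kind(job: dict[str, Any]) -> str:
    for key in ("bird_ID", "model_file"):
        if key not in job:
            raise KeyError(f"prediction job is missing required key: {key}")
    if not (isinstance(job["bird_ID"], str) and job["bird_ID"].isalnum()):
        raise ValueError("bird_ID must be an alphanumeric string")
    ext = os.path.splitext(job["model_file"])[1].lower()
    if ext in SKLEARN_EXTS:
        return "scikit-learn"
    if ext in KERAS_EXTS:
        return "keras"
    raise ValueError(f"cannot tell model type from extension {ext!r}")


print(prediction_model_kind({"bird_ID": "gr41rd51", "model_file": "gr41rd51_svm.pkl"}))  # scikit-learn
```

Loading the file itself is left out to keep the sketch dependency-free; in practice a scikit-learn model would be read back with `pickle.load` or `joblib.load`, and a Keras model with `keras.models.load_model`.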

## parameters

The parameters listed below can appear in either global_config or a job.

- spect_params :
    - samp_freq : integer
    - window_size : integer
    - window_step : integer
    - freq_cutoffs : list
- num_train_songs :
    - start : integer
    - stop : integer
    - step : integer
- num_train_samples :
    - start : integer
    - stop : integer
    - step : integer
- models :
    - knn
    - linsvm
    - svm
    - neural_net
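One plausible reading of the start/stop/step entries is that they describe a sequence of training-set sizes to try, and models is a list drawn from the four names above. The sketch below checks both under that reading. It assumes Python `range()` semantics, where stop is excluded; HVC may treat stop as inclusive. Both helper names are hypothetical.

```python
# Hypothetical helpers, not part of HVC.
VALID_MODELS = {"knn", "linsvm", "svm", "neural_net"}


def expand_range(spec: dict) -> list[int]:
    # Assumption: start/stop/step follow Python's range() semantics (stop excluded)
    for key in ("start", "stop", "step"):
        value = spec.get(key)
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"'{key}' must be an integer")
    return list(range(spec["start"], spec["stop"], spec["step"]))


def check_models(models: list) -> None:
    unknown = [m for m in models if m not in VALID_MODELS]
    if unknown:
        raise ValueError(f"unknown model(s): {unknown}")


num_train_songs = {"start": 1, "stop": 11, "step": 5}
print(expand_range(num_train_songs))  # [1, 6]
check_models(["knn", "svm"])  # passes silently
```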
