spec for YAML files to configure model selection
This document specifies the structure of HVC config files written in YAML.
structure
Every select.config.yml file should be written in YAML as a dictionary with (key, value) pairs.
In other words, any YAML file that contains a configuration for model selection should define
a dictionary named select with keys as outlined below.
required key: todo_list
- Every select.config.yml file has exactly one required key at the top level:
  todo_list : list of dicts - list where each element is a dict. Each dict sets parameters for a ‘job’, typically the data associated with one set of vocalizations.
optional keys
select.config.yml files may optionally define other keys at the same level as todo_list.
Those keys are:
num_replicates : int - number of replicates, i.e. number of folds for cross-validation
num_test_samples : int - number of samples from feature file to put in testing set
num_train_samples : int - number of samples from feature file to put in training set
models : list - list of dictionaries that define models to be tested on features
When defined at the same level as todo_list, these keys are considered defaults.
If an element in todo_list defines a different value for any of these keys,
the value assigned in that element takes precedence over the default value.
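As a minimal sketch of how defaults and overrides fit together, the fragment below defines all four optional keys at the top level and overrides num_replicates in the second job only. The paths, the numeric values, and the hyperparameter name C are illustrative assumptions, not values taken from the HVC test files.

```yaml
select:
  # defaults, defined at the same level as todo_list
  num_replicates: 5
  num_train_samples: 400
  num_test_samples: 100
  models:
    - model_name: svm
      feature_group: svm
      hyperparameters:
        C: 1.0                                     # hypothetical hyperparameter name and value
  todo_list:
    - # first job: uses every default above
      feature_file: ./bird1/summary_feature_file   # hypothetical path
      output_dir: ./bird1/select_output/           # hypothetical path
    - # second job: overrides num_replicates, keeps the other defaults
      feature_file: ./bird2/summary_feature_file   # hypothetical path
      output_dir: ./bird2/select_output/           # hypothetical path
      num_replicates: 10
```

Under the precedence rule described above, the first job would run with 5 replicates and the second with 10.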
specification for dictionaries in todo_list
required keys
- Every dict in a todo_list has the following required keys:
  feature_file : str - path to the feature file, for example: C:\Data\gy6or6\extract_output_170711_0104\summary_feature_file_created_170711_0104
  output_dir : str - path to directory in which to save output. If it doesn’t exist, HVC will create it. For example, C:\DATA\bl26lb16\
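For illustration, a single todo_list entry that uses the two example paths above might look like the following fragment of the select dictionary. The paths are written as plain (unquoted) YAML scalars, in which backslashes are literal characters; inside double quotes they would be treated as the start of escape sequences. The sketch assumes that models is defined at the top level of the file, which is not shown here.

```yaml
todo_list:
  - feature_file: C:\Data\gy6or6\extract_output_170711_0104\summary_feature_file_created_170711_0104
    output_dir: C:\DATA\bl26lb16\
```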
optional keys
As stated above, these can all be defined at the top level of the file. If they are also defined
for any dict in a todo_list, then that definition will override the top-level definition.
models : list of dicts - list of dictionaries that define models, as specified below. Required if not defined at top level of file.
num_replicates : int - number of replicates, i.e. number of folds for cross-validation
num_test_samples : int - number of samples from feature file to put in testing set
num_train_samples : int - number of samples from feature file to put in training set
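When the file has no top-level models key, each job has to carry its own. A sketch of such a job follows; the paths, the numeric values, and the hyperparameter name k are assumptions made for illustration.

```yaml
select:
  todo_list:
    - feature_file: ./bird1/summary_feature_file   # hypothetical path
      output_dir: ./bird1/select_output/           # hypothetical path
      num_replicates: 5
      num_train_samples: 200
      num_test_samples: 50
      models:                # required here because there is no top-level models key
        - model_name: knn
          feature_group: knn
          hyperparameters:
            k: 4             # hypothetical hyperparameter name and value
```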
specification for models list of dicts
- Every dict in a models list has the following required keys:
  model_name : str - name of model, e.g. ‘svm’
  hyperparameters : dict - the hyperparameters defined for that model
Every dict in a models list must also specify the features with which to train the model.
One of the following is valid, as specified in validation.yml.
feature_list_indices : list of ints - indices corresponding to elements in the list of feature names in feature_file, e.g. [0,1,2,5,7]
feature_group : str - name of a feature group, one of {'knn','svm'}
neuralnet_input : str - name of input for an artificial neural net, one of {'flatwindow'}
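The three ways of specifying features can be sketched as a models list with one entry per option. The key names come from the specification above; the hyperparameter names and values, and the use of flatwindow as a model_name, are assumptions made for illustration.

```yaml
models:
  - model_name: svm
    feature_list_indices: [0, 1, 2, 5, 7]   # indices into the feature names in feature_file
    hyperparameters:
      C: 1.0                                # hypothetical
      gamma: 0.01                           # hypothetical
  - model_name: knn
    feature_group: knn                      # one of {'knn', 'svm'}
    hyperparameters:
      k: 4                                  # hypothetical
  - model_name: flatwindow                  # hypothetical model name for a neural net
    neuralnet_input: flatwindow             # one of {'flatwindow'}
    hyperparameters:
      batch_size: 32                        # hypothetical
      epochs: 100                           # hypothetical
```

Each entry uses a single feature key, following the "one of the following" wording above.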
example select.config.yml
These are some of the select.config.yml files used for testing, found in
hybrid-vocal-classifier/tests/data_for_tests/config.yml/: