hybrid-vocal-classifier (hvc)¶
a Python machine learning library for animal vocalizations and bioacoustics¶

the hybrid-vocal-classifier
library (hvc
for short)
makes it easier for researchers studying
animal vocalizations and bioacoustics
to apply machine learning algorithms to their data.
Its focus on automating the sort of annotations
often used by researchers studying
vocal learning
sets hvc
apart from more general software tools for bioacoustics.
In addition to automating annotation of data,
hvc
aims to make it easy for you to compare
different machine learning models that researchers have proposed,
using the data you have in your lab,
so you can see for yourself which one works best for your needs.
A related goal is to help you figure out
just how much data you have to label to get “good enough” accuracy for your analyses.
You can think of hvc
as a high-level wrapper around
the scikit-learn library,
plus built-in functionality for working with annotated animal sounds.
Running hvc
requires almost no coding.
Users write simple Python scripts, and most will have to only
adapt the examples from the documentation. Large batch jobs can be
run with configuration files written in YAML,
an easy-to-read format commonly used for configuration files.
Again, most users will only have to copy the example .yml
files
and then change a couple of options to work with their own datasets.
This code sample gives a high-level view of how you run hvc
:
import hvc
# extract features from audio to train machine learning models
hvc.extract('extract_config.yml') # using .yml config file
# train models/classifiers and select model with best accuracy
hvc.select('select_config.yml')
# use trained model to predict labels for unlabeled data
hvc.predict('predict_config.yml')
Advantages of hybrid-vocal-classifier¶
- frees up hundreds of hours spent annotating data by hand
- completely open source, free
- makes it easy to compare multiple machine learning algorithms
- almost no coding required, configurable with text files
- built on top of Python packages road-tested by the greater data-science community:
- numpy , scipy , matplotlib , scikit-learn , keras
Documentation¶
Tutorial¶
“autolabeling” with k-Nearest Neighbors¶
This tutorial will walk you through using hvc
to
automatically label Bengalese finch song with
the k-Nearest Neighbors algorithm.
We call this the “autolabel” workflow
(for more detail after going through this tutorial,
please see autolabel in the workflows: how to work with hvc
section of How-To Guides.)
There’s three main modules in hvc
that you will use in the
autolabel workflow: extract
to extract features, select
to
select a model, and predict
to predict labels for unlabeled data.
The steps below walk you through doing that.
A convenient way to work through this tutorial would be in iPython, so you might first start iPython from the commmand line, like this:
(my-hvc-environment) $ ipython
iPython is not installed automatically with hvc
so you’ll need to
install it. If you’re using the conda
package manager, this is as
easy as:
(my-hvc-environment) $ conda install ipython
First you import
the library so you can work with it.
0. Label a small set of songs to provide training data for the models, typically ~20-40 songs.¶
Here you would label your own song, using your software of choice (evsonganaly, Sound Analysis Pro, Praat) but for this example you can download some data that is already hand labeled from a repository.
1. Pick a machine learning algorithm/model and the features used to train the model.¶
In this case we use the k-Nearest Neighbors (k-NN) algorithm. This algorithm is quick to apply to data but at least one empirical study shows that it does not give the best accuracy on Bengalese finch song. You’ll use the features built into the library that have been tested with k-NN. These features are based in part on those developed by the Troyer lab (http://www.utsa.edu/troyerlab/software.html).
You specify the models and features in a configuration file (“config”
for short). More information about all the parameters in the config file
can be found on the page :ref:writing-extract-config
. For now you
can just copy the text below and save it in some file. The config is
written in YAML, a language for writing data structures (such as
different types of variables in a programming language).
extract:
spect_params:
ref: evsonganaly
segment_params:
threshold: 1500 # arbitrary units of amplitude
min_syl_dur: 0.01 # ms
min_silent_dur: 0.006 # ms
todo_list:
-
bird_ID : gy6or6
file_format: evtaf
feature_group:
- knn
data_dirs:
- .\gy6or6\032612
output_dir: .\gy6or6\
labelset: iabcdefghjk
2. Extract features for that model from song files that will be used to train the model.¶
You call the extract
module and pass it the name of the yaml
config file as an argument. In the example below, the config file was
saved as 'gy6or6_autolabel_example.knn.extract.config.yml'
.
3. Pick the hyperparameters used by the algorithm as it trains the model on the data.¶
Now we use a convenience function to get an estimate of what value for our hyperparameters will give us the best accuracy when we train our machine learning models. The k-Nearest Neighbors algorithm has one main hyperparameter, the number of neighbors \(k\) in feature space that we look at to determine the label for a new syllable we are trying to classify.
4. Train, i.e., fit the model to the data¶
5. Select the best model based on some measure of accuracy.¶
Again we use a config file. In the config file, we specify the name of
the feature file saved by hvc.extract
. Again you can just copy and
paste the text below.
The key things to modify here are the hyperparameter :math:`k` and the name of the feature file. You will choose the value for :math:`k` based on your results from running ``hvc.utils.find_best_k``. You will get the name of the feature file from the directory created when you ran ``hvc.extract``. The name of the directory will be something like ``extract_output_bird_ID_date``. Make sure that on the line that says ``feature_file:``, you paste the name of the feature file after the colon. The name will have a format like ``summary_file_bird_ID_date``.
select:
num_replicates: 10
num_train_samples:
start : 50
stop : 250
step : 50
num_test_samples: 500
models:
-
model_name: knn
feature_list_indices: [0,1,2,3,4,5,6,7,8]
hyperparameters:
k : 4
todo_list:
- #1
feature_file: .\gy6or6\extract_output_171031_214453\summary_feature_file_created_171031_214642
output_dir: .\gy6or6\
Now you can use hvc.select
to select the best model. hvc.select
takes the name of the config file as an argument, which in this example
is gy6or6_autolabel.example.select.knn.config.yml
.
6. Using the fit model, predict labels for unlabeled data.¶
Here you also use a config file.
** The key things to modify here is the model_meta_file
parameter.
hvc.select
will also have created a directory, and for each model it
fit, it will have saved two files, a .model
file and a .meta
file. The .meta
file contains all the metadata that hvc
needs to
be able to use the .model
file. You choose whichever .meta
file
gave you the best results according to the metric you’re using, e.g. the
default of average accuracy across syllable classes. You also need to
specify the directories with unlabeled data, under the data_dirs
section.**
predict:
todo_list:
-
bird_ID : gy6or6
file_format: evtaf
data_dirs:
- C:\Users\Seymour Snyder\Documents\example_song\032612
model_meta_file: .\gy6or6\select_output_171031_215004\knn_k4\knn_200samples_replicate9.meta
output_dir: .\gy6or6
predict_proba: True
convert: notmat
- In a text editor, open
- On the line that says
model_meta_file:
, after the colon, paste the name of a meta file from theselect
output. The name will have a format likesummary_file_bird_ID_date
. - Below the line that says
data_dirs:
, after the dash, add the path to the other folder of data that you downloaded.
Lastly you use the hvc.predict
module to predict labels for new
syllables. hvc.predict
also takes a config file name as an argument.
In this example the file name is
gy6or6_autolabel.example.knn.predict.config.yml
.
parsed predict config
Changing to data directory: C:/Data/gy6or6_all_files/032612
Processing audio file 1 of 39.
Processing audio file 2 of 39.
...
Processing audio file 39 of 39.
predicting labels for features in file: features_from_032612_created_171206_013759
converting to .not.mat files
Congratulations! You have auto-labeled an entire day’s worth of data in just a few minutes!
How-To Guides¶
This section provides more detailed “how-to” guides to consult.
workflows: how to work with hvc
¶
There are two main workflows for using hvc
.
Click on the links below for a high-level overview of each:
- autolabel: for researchers that want to automate labeling of vocalizations.
- autocompare workflow: for researchers that want to compare different machine learning algorithms.
how to write YAML configuration files¶
Pages here explain in detail how to write YAML configuration files.
how to write yaml files used by the select module¶
As described in the introduction, a crucial step in using hybrid vocal classifier is selecting which models to use. This can be done in an automated way using the select module. Like the extract and predict modules, the select module works by parsing configuration files. Below the steps are outlined in writing the configuration files in yaml format.
- There are two required elements in a select config file, that
- correspond to the two main things that the select module needs to know:
- models: what models to test. A Python list of dictionaries, as described below.
- todo_list: where the data is to train and test those models. Another Python list of dictionaries, also described below.
The parser that parses the select config file is written so that you don’t have to repeat yourself. You can put one models list at the top of the file, and then for each dataset in the todo_list, the select module will train and test all the models that are specified in that top-level models list. Like so:
..include
However you can also define a models dictionary for each todo_list, in case you need to test different models for different datasets, and want to run them all from one script.
..include
To be parsed correctly, the models list needs to have the right structure. In yaml terminology, this is a list. Once parsed into Python, it becomes a list of dictionaries. For that reason the structure is described in terms of the keys and values required for each dictionary. Each dictionary in the list represents one model that the select module will test. There are a couple of required keys for each model dictionary.
These models are found using machine learning algorithms. A model can be thought of as a function with parameters, like the beta terms of a linear regression. To find these parameters, the algorithm must train on the data, and this training also has parameters, for example the number of neighbors used by the K-nearest neighbor algorithm. These parameters of the algorithm are known as hyperparameters to distinguish them from the parameters found by the algorithm.
Reference¶
yaml config¶
spec for yaml config files¶
This document specifies the structure of HVC config files written in yaml. It is a painfully dry document that exists to guide the project code, not to teach someone how to write HVC config files. For a gentle introduction to writing config files, please see the writing_config_files.
Essentially, each config file specifies a list of jobs. Each job in a list will typically correspond to data files from one bird.
Config files consist of three sections:
- global_config: parameters that apply to all jobs
- model_selection: list of jobs for selecting machine learning models
- prediction: list of jobs that apply models to unclassified data
## global_config As the name implies, parameters in the global_config section apply to all jobs. The global_config is a dictionary of dictionaries.
Example: ``` yaml global_config:
- spect_params :
- samp_freq : 32000 # Hz window_size : 512 window_step : 32 freq_cutoffs : [1000,8000]
- neural_net :
- syl_spect_width : 300
- model_selection is a list of jobs. Each job is a dictionary.
- Hence model_selection is a list of dictionaries.
Each job, i.e. each item in the list, is marked with an empty dash. Below each empty dash appear the keys and values that make up the dictionary.
A job in the ‘model_selection` section must include the following keys:
bird_ID : string, alphanumeric, identifies bird
train : dictionary with parameters for training dataset
test : dictionary with parameters for testing dataset - both train and test contain a list dirs. Each item in dirs
is a string, and that string must be a path to a directory of audio files (expected to contain song from the bird bird_ID).
output_dir : string, directory where output will be saved. HVC
creates a new subfolder in the given directory. - labelset : string, labels used for syllabes. Only syllables with the labels in labelset will be included in the training and testing
datasets.
**If a parameter is defined in global_config and then defined again in a job, the value defined in the job takes precedence over the
global_config value, but only for that job.**
Example: ``` yaml model_selection: # list of dictionaries, dash without key next to is a list item so each dictionary is an item in the list
- # i.e. this is dictionary 1
bird_ID : gr41rd51
- train :
- dirs:
- C:DATAgr41rd51pre_surgery_baseline06-21-12
- test :
- dirs:
- C:DATAgr41rd51pre_surgery_baseline06-19-12
- C:DATAgr41rd51pre_surgery_baseline06-20-12
- C:DATAgr41rd51pre_surgery_baseline06-22-12
output_dir: C:DATAgr41rd51
labelset : iabcdefgjkm
- spect_params : # not required, but will take precedence over spect_params in global_config
samp_freq : 32000 # Hz window_size : 512 window_step : 32 freq_cutoffs : [1000,10000]
Like model_selection, the prediction section is a list of job dictionaries.
- A job in the ‘prediction` section must include the following keys:
- bird_ID : string, alphanumeric, identifies bird
- model_file : string, a file name. Either a scikit-learn model that
has been `pickle`d or `dump`ed by joblib, or an hdf5 model output by Keras.
``` yaml prediction:
- bird_ID : gr41rd51 model_file : gr41rd51_svm.pkl
- The parameters listed below can appear in either global_config or a job.
- spect_params :
- samp_freq : integer
- window_size : integer
- window_step : integer
- freq_cutoffs : list
- num_train_songs :
- start : integer
- stop : integer
- step : integer
- num_train_samples :
- start : integer
- stop : integer
- step : integer
- models :
- knn
- linsvm
- svm
- neural_net
spec for YAML files to configure feature extraction¶
This document specifies the structure of HVC config files written in YAML.
Every extract.config.yml
file should be written in YAML as a dictionary with
(key, value) pairs.
In other words, any YAML file that contains a configuration for feature extraction
should define a dictionary named extract
with keys as outlined below.
todo_list
¶- Every
extract.config.yml
file has exactly one required key at the top level: todo_list
: list of dicts- list where each element is a dict. each dict sets parameters for a ‘job’, typically data associated with one set of vocalizations.
extract.config.yml
files may optionally define two other keys at the same level as todo_list
.
Those keys are spect_params
and segment_params
. As might be expected, spect_params
is a dict
that contains parameters for making spectrograms. The segment_params
dict contains parameters for
segmenting song. Specifications for these dictionaries are given below.
When defined at the same level as todo_list
they are considered “default” and apply to all items in the list.
If an element in todo_list defines different values for any of these keys,
the value assigned in that element takes precedence over the default value.
todo_list
¶- Every dict in a
todo_list
has the following required keys: bird_ID
: str for example,bl26lb16
file_format
: str one of{'evtaf','koumura'}
data_dirs
: list of str directories containing data each str must be a valid directory that can be found on the path for example- C:\DATA\bl26lb16\pre_surgery_baseline\041912 - C:\DATA\bl26lb16\pre_surgery_baseline\042012
output_dir
: str directory in which to save output if it doesn’t exist, HVC will create it for example,C:\DATA\bl26lb16\
labelset
: str string of labels corresponding to labeled segments from which features should be extracted. Segments with labels not in this str will be ignored. Converted to a list but not necessary to enter as a list. For example,iabcdef
Finally, each dict in a todo_list
must define either
feature_list
or a feature_group
feature_list
: list- named features. See the list of named features here: named_features
feature_group
: str or list- named group of features, list if more than one group one of
{'knn','svm'}
- Note that a
todo_list
can define both afeature_list
and afeature_group
. In this case features from thefeature_group
are added to thefeature_list
.
Additional variables are added to the feature files that are output by
featureextract.extract
to keep track of which features belong to which
feature group.
spect_params
and segment_params
dictionaries¶
spect_params
: dictparameters to calculate spectrogram keys correspond to parameters/arguments passed to Spectrogram class for __init__. must have either a
ref
key or thenperseg
andnoverlap
keys as defined below:
ref
: strone of
{'tachibana','koumura'}
Use spectrogram parameters from a reference.'tachibana'
uses spectrogram parameters from [1],'koumura'
uses spectrogram parameters from [2].nperseg
: intnumper of samples per segment for FFT, e.g. 512
noverlap
: intnumber of overlapping samples in each segment
- the following keys are all optional for
spect_params
:
freq_cutoffs
: two-element list of integerslimits of frequency band to keep, e.g. [1000,8000]
Spectrogram.make
keeps the band:freq_cutoffs[0] >= spectrogram > freq_cutoffs[1]
window
: strwindow to apply to segments valid strings are
'Hann', 'dpss', None
Hann – Usesnp.Hanning
with parameterM
(window width) set to value ofnperseg
dpss – Discrete prolate spheroidal sequence AKA Slepian.Uses
scipy.signal.slepian
with M parameter equal tonperseg
and width parameter equal to4/nperseg
, as in [2].filter_func
: strfilter to apply to raw audio. valid strings are
'diff'
orNone
'diff'
– differential filter, literallynp.diff
applied to signal as in [1].None
– no filter, this is the defaultspect_func
: strwhich function to use for spectrogram. valid strings are ‘scipy’ or ‘mpl’.
'scipy'
usesscipy.signal.spectrogram
,'mpl'
usesmatplotlib.matlab.specgram
. Default is'scipy'
.log_transform_spect
: boolif True, applies np.log10 to spectrogram to increase range. Default is True.
segment_params
: dictparameters for dividing audio into segments, defined below with the following keys
threshold
: int- value above which amplitude is considered part of a segment. default is 5000.
min_syl_dur
: float- minimum duration of a segment. default is 0.02, i.e. 20 ms.
min_silent_dur
: float- minimum duration of silent gap between segment. default is 0.002, i.e. 2 ms.
extract.config.yml
files¶These are some of the extract.config.yml
files used for testing, found in
hybrid-vocal-classifier/tests//data_for_tests/config.yml/
:
extract:
spect_params:
ref: tachibana
segment_params:
threshold: 1500 # arbitrary units of amplitude
min_syl_dur: 0.01 # ms
min_silent_dur: 0.006 # ms
todo_list:
-
bird_ID : gy6or6
file_format: cbin
feature_group:
- knn
data_dirs:
- ../cbins/gy6or6/032312
- ../cbins/gy6or6/032412
output_dir: replace with tmp_output_dir
labels_to_use: iabcdefghjk
extract:
spect_params:
ref: tachibana
segment_params:
threshold: 1500 # arbitrary units of amplitude
min_syl_dur: 0.01 # ms
min_silent_dur: 0.006 # ms
todo_list:
-
bird_ID : gy6or6
file_format: cbin
feature_group:
- svm
data_dirs:
- ../cbins/gy6or6/032312
- ../cbins/gy6or6/032412
output_dir: replace with tmp_output_dir
labels_to_use: iabcdefghjk
extract:
spect_params:
ref: koumura
segment_params:
threshold: 1500 # arbitrary units of amplitude
min_syl_dur: 0.01 # ms
min_silent_dur: 0.006 # ms
todo_list:
-
bird_ID : gy6or6
file_format: cbin
feature_list:
- flatwindow
data_dirs:
- ../cbins/gy6or6/032312
- ../cbins/gy6or6/032412
output_dir: replace with tmp_output_dir
labels_to_use: iabcdefghjk
[1] | (1, 2) Tachibana, Ryosuke O., Naoya Oosugi, and Kazuo Okanoya. “Semi- |
- automatic classification of birdsong elements using a linear support vector
- machine.” PloS one 9.3 (2014): e92584.
[2] | (1, 2) Koumura, Takuya, and Kazuo Okanoya. “Automatic recognition of element |
classes and boundaries in the birdsong with variable sequences.” PloS one 11.7 (2016): e0159188.
spec for YAML files to configure model selection¶
This document specifies the structure of HVC config files written in YAML.
Every select.config.yml
file should be written in YAML as a dictionary with (key, value) pairs.
In other words, any YAML file that contains a configuration for model selection should define
a dictionary named select
with keys as outlined below.
todo_list
¶- Every
select.config.yml
file has exactly one required key at the top level: todo_list
: list of dicts- list where each element is a dict. each dict sets parameters for a ‘job’, typically data associated with one set of vocalizations.
select.config.yml
files may optionally define other keys at the same level as todo_list
.
Those keys are:
num_replicates
: int- number of replicates, i.e. number of folds for cross-validation
num_test_samples
: int- number of samples from feature file to put in testing set
num_train_samples
: int- number of samples from feature file to put in training set
models
: list- list of dictionaries that define models to be tested on features
When defined at the same level as todo_list
they are considered default
.
If an element in todo_list
defines different values for any of these keys,
the value assigned in that element takes precedence over the default
value.
- Every dict in a
todo_list
has the following required keys: feature_file
: str for example:C:\Data\gy6or6\extract_output_170711_0104\summary_feature_file_created_170711_0104
output_dir
: str path to directory in which to save output if it doesn’t exist, HVC will create it for example,C:\DATA\bl26lb16\
As stated above, these can all be defined at the top level of the file. If they are also defined
for any dict in a todo_list
, then that definition will override the top-level definition.
models
: list of dicts- dictionary of models, as defined below. Required if not defined at top level of file.
num_replicates
: int- number of replicates, i.e. number of folds for cross-validation
num_test_samples
: int- number of samples from feature file to put in testing set
num_train_samples
: int- number of samples from feature file to put in training set
- Every dict in a
models
list has the following required keys: model_name
: str- name of model, e.g. ‘svm’
hyperparameters
: dict- with hyperparameters defined for each model
Every dict in a models
list must also specify the features with which to train the model.
One of the following is valid, as specified in validation.yml
.
feature_list_indices
: list of ints- corresponding to elements in list of feature names in feature_file e.g.,
[0,1,2,5,7]
feature_group
: str- name of a feature group: one of
{'knn','svm'}
neuralnet_input
: str- name of input for am artificial neural net:
{'flatwindow'}
select_config.yml
¶These are some of the select.config.yml
files used for testing, found in
hybrid-vocal-classifier/tests/data_for_tests/config.yml/
:
spec for YAML files to configure label prediction¶
This document specifies the structure of HVC config files written in YAML.
Every predict.config.yml
file should be written in YAML as a dictionary with (key, value) pairs
In other words, any YAML file that contains a configuration for feature extraction
should define a dictionary named ‘predict` with keys as outlined below.
todo_list
¶- Every
predict.config.yml
file has exactly one required key at the top level: todo_list
: list of dicts- list where each element is a dict. each dict sets parameters for a ‘job’, typically data associated with one set of vocalizations.
todo_list
¶- Every dict in a todo_list has the following required keys:
bird_ID
: str for example, bl26lb16file_format
: str {‘evtaf’,’koumura’}data_dirs
: list of str directories containing data each str must be a valid directory that can be found on the path for example- C:\DATA\bl26lb16\pre_surgery_baseline\041912 - C:\DATA\bl26lb16\pre_surgery_baseline\042012
model_file
: str filename of machine learning model / neural network that will be used to predict labels for syllables example: somedir/select_output_170814_005430/knn/knn_100samples_replicate0output_dir
: str directory in which to save output if it doesn’t exist, HVC will create it for example,C:\DATA\bl26lb16\
predict_proba
: bool If True, calculate probabilities for predicted labels.
predict_config.yml
¶These are some of the predict.config.yml
files used for testing, found in
hybrid-vocal-classifier/tests/data_for_tests/config.yml/
:
Features¶
This section contains information on engineered features that hvc
provides
for machine learning models.
named features¶
These features are pre-defined and can be referred to by name in the feature_list of YAML files for extract.
- mean_spectrum
- mean_delta_spectrum : 5-order delta of spectrum
- mean_cepstrum
- mean_delta_cepstrum : 5-order delta of cepstrum
- dur : duration
- mA : additional subset of features listed below
- SpecCentroid
- SpecSpread
- SpecSkewness
- SpecKurtosis
- SpecFlatness
- SpecSlope
- Pitch
- PitchGoodness
- Amp
[1] | Tachibana, Ryosuke O., Naoya Oosugi, and Kazuo Okanoya. “Semi- |
- automatic classification of birdsong elements using a linear support vector
- machine.” PloS one 9.3 (2014): e92584.
API reference¶
hvc¶
Development¶
- This section provides more detail for developers, including:
- Descriptions of how the code base works under the hood
- a roadmap for the project
code base¶
These pages contain information on the code base and its design, as a reference for development, including pseudocode-like descriptions of functions to orient someone reading the code.
# HVC workflow in detail This document explains in detail how functions and modules work, mainly as a reference for developers.
Take the example code from the intro notes page: ```Python import hvc
hvc.extract(‘extract_config.yml’) hvc.select(‘select_config.yml’) hvc.predict(‘predict_config.yml’) ```
Here’s a step-by-step outline of what happens under the hood: - import hvc
- automatically imports featureextract, labelpredict, and
- modelselect modules
- specifically, the extract, predict, and select functions
from their respective modules
- hvc.extract(‘extract_config.yml’)
first parse.extract parses the config file
for each element in todo_list from config + for each data directory datadir in todo_list:
- change to that directory
- get all the audio files in that directory with glob
- for each audo file: + run features.extract.from_file + add extracted features to features_from_all_files
save all features in an output file
- hvc.select(‘select_config.yml’)
- hvc.predict(‘predict_config.yml’)
parse package, extract module¶
The extract module parses extract.config.yml files.
Here’s a rough outline of how it works.
parseextract.py contains the following functions: - validate_spect_params - validate_segment_params - _validate_feature_group_and_convert_to_list - _validate_todo_list_dict - validate_yaml
validate_yaml is the main function; it gets called by parse.
_validate_todo_list_dict is called by validate_yaml when it finds a key todo_list whose value is a list of dicts.
This function validates feature groups, which are validated differently than feature lists. Feature lists are validated by making sure every element in the list is a string found within a valid features list, which is a concatenation of all the features listend in feature_groups.yml in the parse module.
The parsing of a feature_group key is a little more complicated. The first step is to make sure the group or groups appear in the dictionary of valid feature groups in hvc/parse/feature_groups.yml’. The keys of the dictionary of valid feature groups are the valid feature group names, and the values of the dictionary are the actual lists of features. If each `str in `feature_group1 is s a valid feature group, then the list of features is taken from the dictionary
of valid feature groups. The list is then validated by comparing it to the list of all features in features.yml.
This is to make sure the developer didn’t make a typo. If feature_group is a list of feature group names, then feature_list will consist of all features from all groups in the list. A vector of the same length as the new feature list has values that indicate which feature group each element in the new feature list belongs to. This vector is named feature_list_group_ID. A dict named ftr_group_dict is also returned,
where each key is a name of a feature group and its
corresponding value is the ID number given to that feature group. Using this dict and the identity array, hvc/parse/select can pull the correct features out of a feature array given a feature group name.
Example: ```Python >>> ftr_tuple = _validate_feature_group_and_convert_to_list(feature_group=[‘knn’,’svm’])
>>> ftr_tuple[0]
[‘duration group’,’preceding syllable duration’ … ] # and so on
>>> ftr_tuple[1]
np.ndarray([0,0,0,0,0,1,1,1,1,1]) # some array with one of two ID numbers
>>> len(ftr_typle[0]) = ftr_tuple[1].shape[-1]
True
>>> ftr_tuple[2]
If a feature_list was passed to this function along with feature_group, the features from the feature groups are appended to the feature_list, and in the feature_group_ID vector, the original features from the original feature list have a value of None.
Installation¶
see Installation
Support¶
Contribute¶
- Issue Tracker: https://github.com/NickleDave/hybrid-vocal-classifier/issues
- Source Code: https://github.com/NickleDave/hybrid-vocal-classifier/
License¶
BSD license.
Code of Conduct¶
We welcome contributions to the codebase and the documentation, and are happy to help first-time contributors through the process. Project maintainers and contributors are expected to uphold the code of conduct described here: Contributor Covenant Code of Conduct
Backstory¶
hvc
was originally developed in the Sober lab
as a tool to automate annotation of birdsong (as shown in the picture above).
It grew out of a submission to the
SciPy 2016 conference
and later developed into a library,
as presented in this talk: https://youtu.be/BwNeVNou9-s