spec for YAML files to configure feature extraction
This document specifies the structure of HVC config files written in YAML.
structure
Every extract.config.yml file should be written in YAML as a dictionary of (key, value) pairs. In other words, any YAML file that contains a configuration for feature extraction should define a dictionary named extract, with keys as outlined below.
required key: todo_list
Every extract.config.yml file has exactly one required key at the top level:
todo_list : list of dicts
    list where each element is a dict. Each dict sets parameters for a ‘job’, typically data associated with one set of vocalizations.
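As a rough illustration of this layout, the sketch below checks a config that has already been parsed into a Python dict (for instance with yaml.safe_load from PyYAML). The helper name check_top_level is hypothetical and is not part of HVC.

```python
def check_top_level(config):
    """Return the 'extract' dict after checking the layout described above.

    `config` is assumed to be the dict produced by a YAML parser.
    Hypothetical helper for illustration; not part of HVC.
    """
    if not isinstance(config, dict) or "extract" not in config:
        raise ValueError("config must be a dict with an 'extract' key")
    extract = config["extract"]
    if not isinstance(extract, dict):
        raise ValueError("'extract' must be a dict")
    todo_list = extract.get("todo_list")
    # todo_list is the one required key, and must be a list of dicts
    if not isinstance(todo_list, list) or not all(
        isinstance(item, dict) for item in todo_list
    ):
        raise ValueError("'todo_list' is required and must be a list of dicts")
    return extract


example = {"extract": {"todo_list": [{"bird_ID": "gy6or6"}]}}
extract = check_top_level(example)
```

A config without a todo_list, such as {"extract": {}}, makes this sketch raise ValueError.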
optional keys
extract.config.yml files may optionally define two other keys at the same level as todo_list: spect_params and segment_params. As might be expected, spect_params is a dict that contains parameters for making spectrograms. The segment_params dict contains parameters for segmenting song. Specifications for these dictionaries are given below.
When defined at the same level as todo_list, they are considered “default” and apply to all items in the list. If an element in todo_list defines different values for any of these keys, the value assigned in that element takes precedence over the default value.
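The precedence rule can be sketched in a few lines of Python. This is only an illustration on an already-parsed config dict. The function name apply_defaults is hypothetical, and the sketch assumes that an item defining spect_params or segment_params replaces the whole default dict rather than merging individual parameters; HVC's actual behavior may differ.

```python
DEFAULT_KEYS = ("spect_params", "segment_params")


def apply_defaults(extract):
    """Return a new todo_list in which each item has the defaults filled in.

    Assumption: an item that defines one of DEFAULT_KEYS replaces the whole
    default dict for that key (no per-parameter merging).
    Hypothetical helper for illustration; not part of HVC.
    """
    resolved = []
    for item in extract["todo_list"]:
        item = dict(item)  # shallow copy so the input is left untouched
        for key in DEFAULT_KEYS:
            if key not in item and key in extract:
                item[key] = extract[key]
        resolved.append(item)
    return resolved


extract = {
    "spect_params": {"ref": "tachibana"},
    "todo_list": [
        {"bird_ID": "gy6or6"},
        {"bird_ID": "bl26lb16", "spect_params": {"ref": "koumura"}},
    ],
}
resolved = apply_defaults(extract)
print(resolved[0]["spect_params"])  # {'ref': 'tachibana'} (default applied)
print(resolved[1]["spect_params"])  # {'ref': 'koumura'} (item value wins)
```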
specification for dictionaries in todo_list
required keys
Every dict in a todo_list has the following required keys:
bird_ID : str
    for example, bl26lb16
file_format : str
    one of {'evtaf','koumura'}
data_dirs : list of str
    directories containing data. Each str must be a valid directory that can be found on the path, for example:
    - C:\DATA\bl26lb16\pre_surgery_baseline\041912
    - C:\DATA\bl26lb16\pre_surgery_baseline\042012
output_dir : str
    directory in which to save output. If it doesn’t exist, HVC will create it. For example, C:\DATA\bl26lb16\
labelset : str
    string of labels corresponding to labeled segments from which features should be extracted. Segments with labels not in this str will be ignored. Converted to a list, but it is not necessary to enter it as a list. For example, iabcdef
Finally, each dict in a todo_list must define either a feature_list or a feature_group:
feature_list : list
    named features. See the list of named features here: named_features
feature_group : str or list
    named group of features, given as a list if there is more than one group. Each group is one of {'knn','svm'}
Note that a dict in a todo_list can define both a feature_list and a feature_group. In this case features from the feature_group are added to the feature_list. Additional variables are added to the feature files that are output by featureextract.extract to keep track of which features belong to which feature group.
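The requirements for a todo_list dict can be summarized as a small validation sketch. It works on an already-parsed dict and only encodes the rules stated in this spec. The function name validate_todo_item and the wording of the messages are hypothetical and not part of HVC.

```python
REQUIRED_KEYS = ("bird_ID", "file_format", "data_dirs", "output_dir", "labelset")
VALID_FILE_FORMATS = {"evtaf", "koumura"}
VALID_FEATURE_GROUPS = {"knn", "svm"}


def validate_todo_item(item):
    """Return a list of problems found in one todo_list dict (empty if none).

    Hypothetical helper for illustration; not part of HVC.
    """
    problems = [
        f"missing required key: {key}" for key in REQUIRED_KEYS if key not in item
    ]
    if "file_format" in item and item["file_format"] not in VALID_FILE_FORMATS:
        problems.append(f"unknown file_format: {item['file_format']}")
    # at least one of feature_list / feature_group must be present
    if "feature_list" not in item and "feature_group" not in item:
        problems.append("need a feature_list, a feature_group, or both")
    groups = item.get("feature_group", [])
    if isinstance(groups, str):  # a single group may be given as a plain str
        groups = [groups]
    for group in groups:
        if group not in VALID_FEATURE_GROUPS:
            problems.append(f"unknown feature_group: {group}")
    return problems


item = {
    "bird_ID": "bl26lb16",
    "file_format": "evtaf",
    "data_dirs": ["C:\\DATA\\bl26lb16\\pre_surgery_baseline\\041912"],
    "output_dir": "C:\\DATA\\bl26lb16\\",
    "labelset": "iabcdef",
    "feature_group": "knn",
}
print(validate_todo_item(item))  # []
```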
specification for spect_params and segment_params dictionaries
spect_params : dict
    parameters to calculate spectrogram. Keys correspond to the parameters/arguments passed to __init__ of the Spectrogram class. Must have either a ref key or the nperseg and noverlap keys, as defined below:
    ref : str
        one of {'tachibana','koumura'}. Use spectrogram parameters from a reference. 'tachibana' uses spectrogram parameters from [1], 'koumura' uses spectrogram parameters from [2].
    nperseg : int
        number of samples per segment for FFT, e.g. 512
    noverlap : int
        number of overlapping samples in each segment
    the following keys are all optional for spect_params:
    freq_cutoffs : two-element list of integers
        limits of frequency band to keep, e.g. [1000,8000]. Spectrogram.make keeps the band: freq_cutoffs[0] <= frequency < freq_cutoffs[1]
    window : str
        window to apply to segments. Valid strings are 'Hann', 'dpss', None.
        Hann: uses np.hanning with parameter M (window width) set to value of nperseg.
        dpss: Discrete prolate spheroidal sequence, AKA Slepian. Uses scipy.signal.slepian with M parameter equal to nperseg and width parameter equal to 4/nperseg, as in [2].
    filter_func : str
        filter to apply to raw audio. Valid strings are 'diff' or None.
        'diff': differential filter, literally np.diff applied to signal, as in [1].
        None: no filter, this is the default.
    spect_func : str
        which function to use for spectrogram. Valid strings are 'scipy' or 'mpl'. 'scipy' uses scipy.signal.spectrogram, 'mpl' uses matplotlib.mlab.specgram. Default is 'scipy'.
    log_transform_spect : bool
        if True, applies np.log10 to spectrogram to increase range. Default is True.
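The rule that spect_params needs either a ref key or both nperseg and noverlap can be sketched as follows. The function name check_spect_params is hypothetical and not part of HVC. Giving ref precedence when all three keys are present is an assumption of this sketch, since the spec does not say how that case is handled.

```python
VALID_REFS = {"tachibana", "koumura"}


def check_spect_params(spect_params):
    """Return 'ref' or 'explicit' depending on how the parameters are given.

    Assumption: if 'ref' is present it takes precedence over 'nperseg'
    and 'noverlap'. Hypothetical helper for illustration; not part of HVC.
    """
    if "ref" in spect_params:
        if spect_params["ref"] not in VALID_REFS:
            raise ValueError(f"unknown ref: {spect_params['ref']}")
        return "ref"
    if "nperseg" in spect_params and "noverlap" in spect_params:
        return "explicit"
    raise ValueError("spect_params needs 'ref', or both 'nperseg' and 'noverlap'")


print(check_spect_params({"ref": "tachibana"}))  # ref
print(check_spect_params({"nperseg": 512, "noverlap": 480, "freq_cutoffs": [1000, 8000]}))  # explicit
```

The value 480 for noverlap is only an example and is not a default taken from HVC.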
segment_params : dict
    parameters for dividing audio into segments, with the following keys:
    threshold : int
        value above which amplitude is considered part of a segment. Default is 5000.
    min_syl_dur : float
        minimum duration of a segment, in seconds. Default is 0.02, i.e. 20 ms.
    min_silent_dur : float
        minimum duration of the silent gap between segments, in seconds. Default is 0.002, i.e. 2 ms.
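To make the roles of the three segmenting parameters concrete, here is a minimal sketch of threshold-based segmentation in plain Python. It is not HVC's implementation. The function name segment, the samp_freq argument, and the order of operations (merge short gaps first, then drop short segments) are assumptions made for illustration.

```python
def segment(amplitude, samp_freq, threshold=5000,
            min_syl_dur=0.02, min_silent_dur=0.002):
    """Return (onsets, offsets) in seconds for supra-threshold runs.

    Sketch only: merges runs separated by gaps shorter than min_silent_dur,
    then drops runs shorter than min_syl_dur. Not part of HVC.
    """
    # 1. find runs of consecutive samples above threshold as [start, stop)
    runs = []
    start = None
    for i, value in enumerate(amplitude):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(amplitude)])

    # 2. merge runs whose separating silent gap is too short
    merged = []
    for run in runs:
        if merged and (run[0] - merged[-1][1]) / samp_freq < min_silent_dur:
            merged[-1][1] = run[1]
        else:
            merged.append(run)

    # 3. drop segments that are too short
    kept = [r for r in merged if (r[1] - r[0]) / samp_freq >= min_syl_dur]
    onsets = [r[0] / samp_freq for r in kept]
    offsets = [r[1] / samp_freq for r in kept]
    return onsets, offsets


# toy amplitude envelope sampled at 1000 Hz: two bursts split by a 1 ms gap,
# then 20 ms of silence, then a 5 ms burst that is too short to keep
amp = [0] * 10 + [6000] * 30 + [0] + [6000] * 30 + [0] * 20 + [6000] * 5 + [0] * 10
onsets, offsets = segment(amp, samp_freq=1000)
print(onsets, offsets)  # [0.01] [0.071]
```

In this toy input the 1 ms gap is shorter than min_silent_dur, so the first two bursts are merged into one segment, while the final 5 ms burst is shorter than min_syl_dur and is discarded.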
example extract.config.yml files
These are some of the extract.config.yml files used for testing, found in hybrid-vocal-classifier/tests/data_for_tests/config.yml/:
extract:
  spect_params:
    ref: tachibana
  segment_params:
    threshold: 1500 # arbitrary units of amplitude
    min_syl_dur: 0.01 # seconds (10 ms)
    min_silent_dur: 0.006 # seconds (6 ms)
  todo_list:
    -
      bird_ID : gy6or6
      file_format: cbin
      feature_group:
        - knn
      data_dirs:
        - ../cbins/gy6or6/032312
        - ../cbins/gy6or6/032412
      output_dir: replace with tmp_output_dir
      labels_to_use: iabcdefghjk

extract:
  spect_params:
    ref: tachibana
  segment_params:
    threshold: 1500 # arbitrary units of amplitude
    min_syl_dur: 0.01 # seconds (10 ms)
    min_silent_dur: 0.006 # seconds (6 ms)
  todo_list:
    -
      bird_ID : gy6or6
      file_format: cbin
      feature_group:
        - svm
      data_dirs:
        - ../cbins/gy6or6/032312
        - ../cbins/gy6or6/032412
      output_dir: replace with tmp_output_dir
      labels_to_use: iabcdefghjk

extract:
  spect_params:
    ref: koumura
  segment_params:
    threshold: 1500 # arbitrary units of amplitude
    min_syl_dur: 0.01 # seconds (10 ms)
    min_silent_dur: 0.006 # seconds (6 ms)
  todo_list:
    -
      bird_ID : gy6or6
      file_format: cbin
      feature_list:
        - flatwindow
      data_dirs:
        - ../cbins/gy6or6/032312
        - ../cbins/gy6or6/032412
      output_dir: replace with tmp_output_dir
      labels_to_use: iabcdefghjk
references
[1] Tachibana, Ryosuke O., Naoya Oosugi, and Kazuo Okanoya. “Semi-automatic classification of birdsong elements using a linear support vector machine.” PloS one 9.3 (2014): e92584.
[2] Koumura, Takuya, and Kazuo Okanoya. “Automatic recognition of element classes and boundaries in the birdsong with variable sequences.” PloS one 11.7 (2016): e0159188.