spec for YAML files to configure feature extraction¶
This document specifies the structure of HVC config files written in YAML.
structure¶
Every extract.config.yml file should be written in YAML as a dictionary with
(key, value) pairs.
In other words, any YAML file that contains a configuration for feature extraction
should define a dictionary named extract with keys as outlined below.
required key: todo_list¶
Every extract.config.yml file has exactly one required key at the top level:
- todo_list: list of dicts
  list where each element is a dict. Each dict sets parameters for a ‘job’, typically data associated with one set of vocalizations.
optional keys¶
extract.config.yml files may optionally define two other keys at the same level as todo_list.
Those keys are spect_params and segment_params. As might be expected, spect_params is a dict
that contains parameters for making spectrograms. The segment_params dict contains parameters for
segmenting song. Specifications for these dictionaries are given below.
When defined at the same level as todo_list they are considered “default” and apply to all items in the list.
If an element in todo_list defines different values for any of these keys,
the value assigned in that element takes precedence over the default value.
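The precedence rule above can be sketched in Python. This is a minimal illustration, not HVC's actual parsing code: the helper name `resolve_params` is hypothetical, and it assumes that a spect_params or segment_params dict defined inside a todo_list element replaces the default dict wholesale rather than being merged key by key, a detail the text above does not spell out.

```python
def resolve_params(extract_config):
    """Return one (spect_params, segment_params) pair per todo_list element.

    Hypothetical helper. Assumption: a dict defined inside an element
    replaces the top-level default wholesale (no key-by-key merging).
    """
    resolved = []
    for job in extract_config["todo_list"]:
        # Fall back to the top-level "default" only when the element
        # does not define the key itself.
        spect = job.get("spect_params", extract_config.get("spect_params"))
        segment = job.get("segment_params", extract_config.get("segment_params"))
        resolved.append((spect, segment))
    return resolved


# Illustrative config, written as the dict a YAML parser would produce
# from the contents of the `extract` dictionary.
config = {
    "spect_params": {"ref": "tachibana"},
    "segment_params": {"threshold": 1500, "min_syl_dur": 0.01, "min_silent_dur": 0.006},
    "todo_list": [
        {"bird_ID": "gy6or6"},  # uses both defaults
        {"bird_ID": "bl26lb16", "spect_params": {"ref": "koumura"}},  # overrides one
    ],
}

for spect, segment in resolve_params(config):
    print(spect, segment)
```

The second element keeps the default segment_params but uses its own spect_params.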
specification for dictionaries in todo_list¶
required keys¶
Every dict in a todo_list has the following required keys:
- bird_ID: str
  for example, bl26lb16
- file_format: str
  one of {'evtaf','koumura'}
- data_dirs: list of str
  directories containing data. Each str must be a valid directory that can be found on the path. For example:
  - C:\DATA\bl26lb16\pre_surgery_baseline\041912
  - C:\DATA\bl26lb16\pre_surgery_baseline\042012
- output_dir: str
  directory in which to save output. If it doesn’t exist, HVC will create it. For example, C:\DATA\bl26lb16\
- labelset: str
  string of labels corresponding to labeled segments from which features should be extracted. Segments with labels not in this str will be ignored. Converted to a list, but it is not necessary to enter it as a list. For example, iabcdef
Finally, each dict in a todo_list must define either a feature_list or a feature_group:
- feature_list: list
  named features. See the list of named features here: named_features
- feature_group: str or list
  named group of features, or a list if more than one group. Each group is one of {'knn','svm'}.
Note that a dict in a todo_list can define both a feature_list and a feature_group. In this case features from the feature_group are added to the feature_list.
Additional variables are added to the feature files that are output by
featureextract.extract to keep track of which features belong to which
feature group.
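As a rough illustration of the rules in this section, the following Python sketch checks a single todo_list dict. The helper `check_todo_item` and the constant names are hypothetical and are not part of HVC. The sketch does not check that the directories exist and does not expand a feature_group into its features, since neither detail is given here.

```python
REQUIRED_KEYS = ("bird_ID", "file_format", "data_dirs", "output_dir", "labelset")
VALID_FILE_FORMATS = ("evtaf", "koumura")
VALID_FEATURE_GROUPS = ("knn", "svm")


def check_todo_item(item):
    """Return a list of problems found in one todo_list dict (empty if none).

    Hypothetical helper that only mirrors the rules listed in this section.
    """
    problems = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in item]
    if "file_format" in item and item["file_format"] not in VALID_FILE_FORMATS:
        problems.append(f"invalid file_format: {item['file_format']!r}")
    if "feature_list" not in item and "feature_group" not in item:
        problems.append("must define feature_list, feature_group, or both")
    groups = item.get("feature_group", [])
    if isinstance(groups, str):  # a single group may be given as a str
        groups = [groups]
    for group in groups:
        if group not in VALID_FEATURE_GROUPS:
            problems.append(f"invalid feature_group: {group!r}")
    return problems


# Illustrative item using the example values from this section.
item = {
    "bird_ID": "bl26lb16",
    "file_format": "evtaf",
    "data_dirs": ["C:\\DATA\\bl26lb16\\pre_surgery_baseline\\041912"],
    "output_dir": "C:\\DATA\\bl26lb16\\",
    "labelset": "iabcdef",
    "feature_group": "knn",
}
print(check_todo_item(item))   # an empty list means no problems were found
print(list(item["labelset"]))  # the labelset str converted to a list of labels
```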
specification for spect_params and segment_params dictionaries¶
spect_params: dict
parameters to calculate spectrogram. Keys correspond to the arguments passed to __init__ of the Spectrogram class. Must have either a ref key or the nperseg and noverlap keys, as defined below:
- ref: str
  one of {'tachibana','koumura'}. Use spectrogram parameters from a reference. 'tachibana' uses spectrogram parameters from [1], 'koumura' uses spectrogram parameters from [2].
- nperseg: int
  number of samples per segment for FFT, e.g. 512
- noverlap: int
  number of overlapping samples in each segment
The following keys are all optional for spect_params:
- freq_cutoffs: two-element list of integers
  limits of frequency band to keep, e.g. [1000,8000]. Spectrogram.make keeps the band: freq_cutoffs[0] <= frequency < freq_cutoffs[1]
- window: str
  window to apply to segments. Valid values are 'Hann', 'dpss', None.
  - Hann: uses np.hanning with parameter M (window width) set to value of nperseg
  - dpss: discrete prolate spheroidal sequence, AKA Slepian. Uses scipy.signal.slepian with M parameter equal to nperseg and width parameter equal to 4/nperseg, as in [2].
- filter_func: str
  filter to apply to raw audio. Valid values are 'diff' or None.
  - 'diff': differential filter, literally np.diff applied to signal as in [1].
  - None: no filter, this is the default
- spect_func: str
  which function to use for spectrogram. Valid strings are 'scipy' or 'mpl'. 'scipy' uses scipy.signal.spectrogram, 'mpl' uses matplotlib.mlab.specgram. Default is 'scipy'.
- log_transform_spect: bool
  if True, applies np.log10 to spectrogram to increase range. Default is True.
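The constraints on spect_params can be summarized with a small Python check. This is only a sketch of the rules as stated here: `check_spect_params` and the constant names are hypothetical, and the noverlap value in the usage lines is an arbitrary illustration rather than a recommended setting.

```python
VALID_REFS = ("tachibana", "koumura")
VALID_CHOICES = {
    "window": ("Hann", "dpss", None),
    "filter_func": ("diff", None),
    "spect_func": ("scipy", "mpl"),
}


def check_spect_params(params):
    """Return a list of problems found in a spect_params dict (empty if none).

    Hypothetical helper that only mirrors the rules listed in this section.
    """
    problems = []
    has_ref = "ref" in params
    has_fft = "nperseg" in params and "noverlap" in params
    if not (has_ref or has_fft):
        problems.append("need either 'ref' or both 'nperseg' and 'noverlap'")
    if has_ref and params["ref"] not in VALID_REFS:
        problems.append(f"invalid ref: {params['ref']!r}")
    for key, valid in VALID_CHOICES.items():
        if key in params and params[key] not in valid:
            problems.append(f"invalid {key}: {params[key]!r}")
    cutoffs = params.get("freq_cutoffs")
    if cutoffs is not None and not (
        isinstance(cutoffs, list)
        and len(cutoffs) == 2
        and all(isinstance(c, int) for c in cutoffs)
    ):
        problems.append("freq_cutoffs must be a two-element list of integers")
    return problems


print(check_spect_params({"ref": "tachibana"}))
print(check_spect_params(
    {"nperseg": 512, "noverlap": 480, "freq_cutoffs": [1000, 8000], "window": "Hann"}
))
print(check_spect_params({"nperseg": 512}))  # noverlap is missing, so one problem is reported
```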
segment_params: dict
parameters for dividing audio into segments, defined with the following keys:
- threshold: int
  value above which amplitude is considered part of a segment. Default is 5000.
- min_syl_dur: float
  minimum duration of a segment, in seconds. Default is 0.02, i.e. 20 ms.
- min_silent_dur: float
  minimum duration of silent gap between segments, in seconds. Default is 0.002, i.e. 2 ms.
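To make the roles of the three segment_params keys concrete, here is a pure-Python sketch of threshold-based segmentation. It is not HVC's implementation. The function name `segment`, the `samp_freq` argument, and the order of the steps (bridge short silent gaps first, then drop short segments) are assumptions made for illustration, and the input is assumed to be an already-smoothed amplitude envelope.

```python
def segment(amplitude, samp_freq, threshold=5000, min_syl_dur=0.02, min_silent_dur=0.002):
    """Return (onset, offset) times in seconds for segments of `amplitude`.

    Sketch only: `amplitude` is assumed to be a smoothed amplitude
    envelope sampled at `samp_freq` Hz. Durations are in seconds.
    """
    # 1. Find runs of consecutive samples above threshold, as
    #    (start index, index just past the end) pairs.
    runs = []
    start = None
    for i, value in enumerate(amplitude):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(amplitude)))

    # 2. Merge runs separated by a silent gap shorter than min_silent_dur.
    merged = []
    for onset, offset in runs:
        if merged and (onset - merged[-1][1]) / samp_freq < min_silent_dur:
            merged[-1] = (merged[-1][0], offset)
        else:
            merged.append((onset, offset))

    # 3. Drop segments shorter than min_syl_dur and convert samples to seconds.
    return [
        (onset / samp_freq, offset / samp_freq)
        for onset, offset in merged
        if (offset - onset) / samp_freq >= min_syl_dur
    ]


# Toy envelope at an illustrative 1000 Hz sampling rate (1 sample = 1 ms):
# 30 ms of sound, a 1 ms gap, 20 ms of sound, 50 ms of silence, a 5 ms blip.
envelope = [0] * 10 + [6000] * 30 + [0] * 1 + [6000] * 20 + [0] * 50 + [6000] * 5 + [0] * 10
print(segment(envelope, samp_freq=1000))
```

With the default values, the 1 ms gap is bridged because it is shorter than min_silent_dur, and the 5 ms blip is discarded because it is shorter than min_syl_dur, so a single segment remains.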
example extract.config.yml files¶
These are some of the extract.config.yml files used for testing, found in
hybrid-vocal-classifier/tests/data_for_tests/config.yml/:
extract:
  spect_params:
    ref: tachibana
  segment_params:
    threshold: 1500 # arbitrary units of amplitude
    min_syl_dur: 0.01 # seconds, i.e. 10 ms
    min_silent_dur: 0.006 # seconds, i.e. 6 ms
  todo_list:
    -
      bird_ID: gy6or6
      file_format: cbin
      feature_group:
        - knn
      data_dirs:
        - ../cbins/gy6or6/032312
        - ../cbins/gy6or6/032412
      output_dir: replace with tmp_output_dir
      labels_to_use: iabcdefghjk

extract:
  spect_params:
    ref: tachibana
  segment_params:
    threshold: 1500 # arbitrary units of amplitude
    min_syl_dur: 0.01 # seconds, i.e. 10 ms
    min_silent_dur: 0.006 # seconds, i.e. 6 ms
  todo_list:
    -
      bird_ID: gy6or6
      file_format: cbin
      feature_group:
        - svm
      data_dirs:
        - ../cbins/gy6or6/032312
        - ../cbins/gy6or6/032412
      output_dir: replace with tmp_output_dir
      labels_to_use: iabcdefghjk

extract:
  spect_params:
    ref: koumura
  segment_params:
    threshold: 1500 # arbitrary units of amplitude
    min_syl_dur: 0.01 # seconds, i.e. 10 ms
    min_silent_dur: 0.006 # seconds, i.e. 6 ms
  todo_list:
    -
      bird_ID: gy6or6
      file_format: cbin
      feature_list:
        - flatwindow
      data_dirs:
        - ../cbins/gy6or6/032312
        - ../cbins/gy6or6/032412
      output_dir: replace with tmp_output_dir
      labels_to_use: iabcdefghjk
references¶
[1] Tachibana, Ryosuke O., Naoya Oosugi, and Kazuo Okanoya. “Semi-automatic classification of birdsong elements using a linear support vector machine.” PloS one 9.3 (2014): e92584.
[2] Koumura, Takuya, and Kazuo Okanoya. “Automatic recognition of element classes and boundaries in the birdsong with variable sequences.” PloS one 11.7 (2016): e0159188.