Aggregation and merging
DatasetTagger
Bases: GlobalStep
Class to tag each example with the dataset name. Useful when running aggregations.
Source code in ragfit/processing/global_steps/aggregation.py
__init__(keyword='source', **kwargs)
Parameters:
- keyword (str, default: 'source') – The key to use for tagging.
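The tagging behaviour described above can be sketched in plain Python. This is only an illustration and does not call ragfit: the `tag_examples` helper, the dataset names, and the representation of datasets as lists of dicts are all assumptions made for the example.

```python
def tag_examples(datasets, keyword="source"):
    """Return a copy of `datasets` in which every example records its dataset name.

    Hypothetical stand-in for the tagging step: `datasets` maps a dataset
    name to a list of example dicts.
    """
    return {
        name: [{**example, keyword: name} for example in examples]
        for name, examples in datasets.items()
    }


# Illustrative data; the names "train_a" and "train_b" are made up.
datasets = {
    "train_a": [{"question": "q1"}],
    "train_b": [{"question": "q2"}, {"question": "q3"}],
}
tagged = tag_examples(datasets)
print(tagged["train_b"][0])
```

Tagging before an aggregation keeps the origin of each example recoverable once examples from several datasets end up in one place.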
FilterDataset
Bases: GlobalStep
Step for filtering a dataset.
Source code in ragfit/processing/global_steps/aggregation.py
__init__(filter_fn, **kwargs)
Parameters:
- filter_fn (function) – Function used to filter the dataset.
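As a rough illustration of what a filter function might look like, the sketch below applies a predicate to a list of example dicts. It assumes, as with Python's built-in `filter`, that the function returns a truthy value for examples that should be kept; the `has_answer` predicate, the `apply_filter` helper, and the field names are hypothetical and are not part of ragfit.

```python
def has_answer(example):
    """Hypothetical predicate: keep only examples with a non-empty answer."""
    return bool(example.get("answer"))


def apply_filter(examples, filter_fn):
    """Plain-Python stand-in for a filtering step over a list of dicts."""
    return [example for example in examples if filter_fn(example)]


# Illustrative data; the field names are made up.
examples = [
    {"question": "q1", "answer": "a1"},
    {"question": "q2", "answer": ""},
    {"question": "q3"},
]
kept = apply_filter(examples, has_answer)
print(kept)
```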
MergeDatasets
Bases: GlobalStep
Step for merging datasets.
Datasets are merged by concatenation. Shuffling is optional and is enabled by providing a seed.
Source code in ragfit/processing/global_steps/aggregation.py
__init__(output, shuffle=None, **kwargs)
Parameters:
- output (str) – Name of the output dataset. Should be unique.
- shuffle (int, default: None) – Seed for shuffling.
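Merging by concatenation with an optional seeded shuffle can be sketched as follows. This is a plain-Python illustration, not the ragfit implementation: the `merge` helper and the list-of-dicts representation are assumptions, and the sketch treats any seed other than `None` as a request to shuffle.

```python
import random


def merge(datasets, shuffle=None):
    """Concatenate several lists of examples, optionally shuffling with a seed.

    Hypothetical stand-in for the merging step. `shuffle` is a seed; when it
    is None the concatenation order is preserved.
    """
    merged = [example for examples in datasets for example in examples]
    if shuffle is not None:
        # A dedicated Random instance keeps the result reproducible
        # without touching the global random state.
        random.Random(shuffle).shuffle(merged)
    return merged


first = [{"id": 1}, {"id": 2}]
second = [{"id": 3}, {"id": 4}, {"id": 5}]

in_order = merge([first, second])
shuffled = merge([first, second], shuffle=42)
print(in_order)
print(shuffled)
```

Passing the same seed yields the same order on every run, which keeps downstream splits and evaluations reproducible.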
SelectColumns
Bases: GlobalStep
Step for selecting specified columns in a dataset.
Source code in ragfit/processing/global_steps/aggregation.py
__init__(columns: list[str], **kwargs)
Parameters:
- columns (list[str]) – List of keys to keep in the dataset.
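Column selection amounts to keeping only the listed keys of every example. The sketch below shows the idea on a list of dicts; the `select_columns` helper and the field names are hypothetical, and the sketch assumes every requested key is present in each example.

```python
def select_columns(examples, columns):
    """Keep only the keys listed in `columns` for every example.

    Hypothetical stand-in for the column-selection step. Raises KeyError
    if a requested key is missing from an example.
    """
    return [{key: example[key] for key in columns} for example in examples]


# Illustrative data; the field names are made up.
examples = [
    {"question": "q1", "answer": "a1", "source": "train_a"},
    {"question": "q2", "answer": "a2", "source": "train_b"},
]
selected = select_columns(examples, ["question", "answer"])
print(selected)
```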