Aggregation and merging
DatasetTagger
Bases: GlobalStep
Tags each example with the name of the dataset it belongs to. Useful when aggregating several datasets.
Source code in ragfit/processing/global_steps/aggregation.py
__init__(keyword='source', **kwargs)
Parameters:
- keyword (str, default: 'source') – The key to use for tagging. Default is "source".
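As a rough illustration of the tagging behavior described above, the sketch below works on plain Python lists of dicts rather than on the library itself. The helper name `tag_examples` and the sample dataset names are hypothetical; only the `keyword` parameter and its default come from the documentation.

```python
def tag_examples(datasets, keyword="source"):
    """Return copies of the datasets with each example tagged by dataset name.

    Hypothetical helper for illustration only; not part of ragfit.
    """
    return {
        name: [{**example, keyword: name} for example in examples]
        for name, examples in datasets.items()
    }


# Made-up sample data: two named datasets, one example each.
datasets = {
    "dataset_a": [{"query": "first question"}],
    "dataset_b": [{"query": "second question"}],
}

tagged = tag_examples(datasets)
custom = tag_examples(datasets, keyword="origin")
```

After tagging, every example carries the name of its dataset, so the origin of each row can still be recovered once the datasets are aggregated.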
FilterDataset
Bases: GlobalStep
Step for filtering a dataset.
Source code in ragfit/processing/global_steps/aggregation.py
__init__(filter_fn, **kwargs)
Parameters:
- filter_fn (function) – Function to filter the dataset.
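The documentation does not state the exact contract of `filter_fn`. The sketch below assumes it is a predicate that receives one example and returns a truthy value for examples to keep, in the manner of Python's built-in `filter`. The helper `filter_examples` and the predicate `has_answer` are hypothetical and do not use the library.

```python
def filter_examples(examples, filter_fn):
    """Keep only the examples for which filter_fn returns a truthy value.

    Hypothetical helper; assumes filter_fn is a per-example predicate.
    """
    return [example for example in examples if filter_fn(example)]


def has_answer(example):
    """Example predicate: keep rows with a non-empty 'answers' field."""
    return bool(example.get("answers"))


# Made-up sample data.
examples = [
    {"query": "q1", "answers": ["a1"]},
    {"query": "q2", "answers": []},
    {"query": "q3"},
]

kept = filter_examples(examples, has_answer)
```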
MergeDatasets
Bases: GlobalStep
Step for merging datasets.
Datasets are merged by concatenation. To shuffle the merged dataset, provide a seed.
Source code in ragfit/processing/global_steps/aggregation.py
__init__(output, shuffle=None, **kwargs)
Parameters:
- output (str) – Name of the output dataset. Should be unique.
- shuffle (int, default: None) – Seed for shuffling. Default is None.
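The merge semantics described above (concatenate, then optionally shuffle with a seed) can be sketched on plain lists of dicts. This is an illustration only: the helper `merge_examples` and its `inputs` argument are hypothetical, and `random.Random` stands in for whatever shuffling the library actually performs.

```python
import random


def merge_examples(datasets, inputs, shuffle=None):
    """Concatenate the named datasets; shuffle if a seed is given.

    Hypothetical helper for illustration only; not part of ragfit.
    """
    merged = [example for name in inputs for example in datasets[name]]
    if shuffle is not None:
        # A seeded generator makes the shuffled order reproducible.
        random.Random(shuffle).shuffle(merged)
    return merged


# Made-up sample data.
datasets = {
    "part_a": [{"id": 1}, {"id": 2}],
    "part_b": [{"id": 3}],
}

# Store the result under a new, unique name, mirroring the `output` parameter.
datasets["combined"] = merge_examples(datasets, ["part_a", "part_b"])
shuffled_once = merge_examples(datasets, ["part_a", "part_b"], shuffle=42)
shuffled_again = merge_examples(datasets, ["part_a", "part_b"], shuffle=42)
```

Without a seed the concatenation order is preserved. With the same seed, repeated runs produce the same shuffled order.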
SelectColumns
Bases: GlobalStep
Step for selecting specified columns in a dataset.
Source code in ragfit/processing/global_steps/aggregation.py
__init__(columns: list[str], **kwargs)
Parameters:
-
columns (list[str]) – List of keys to keep in the dataset.
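Column selection can be pictured as keeping only the listed keys in every example. The sketch below uses plain dicts and a hypothetical helper, `select_columns_sketch`. It assumes every listed column is present in each example and does not reflect how the library handles missing keys.

```python
def select_columns_sketch(examples, columns):
    """Return copies of the examples restricted to the given keys.

    Hypothetical helper; assumes every key in `columns` exists in each example.
    """
    return [{key: example[key] for key in columns} for example in examples]


# Made-up sample data.
examples = [
    {"query": "q1", "answers": ["a1"], "source": "dataset_a"},
    {"query": "q2", "answers": ["a2"], "source": "dataset_b"},
]

selected = select_columns_sketch(examples, ["query", "answers"])
```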