Sampling and Fewshot
FewShot
Bases: Sampler
Class to collect fewshot examples from the same or another dataset.
Source code in ragfit/processing/global_steps/sampling.py
__init__(k, output_key='fewshot', input_dataset=None, **kwargs)
Parameters:
- k (int) – Number of examples to collect.
- output_key (str, default: 'fewshot') – Output key under which the collected examples are stored.
- input_dataset (str, default: None) – Name of the dataset to take the examples from. Use None to take them from the same dataset.
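As a rough illustration of what collecting fewshot examples from the same dataset amounts to, here is a minimal pure-Python sketch. It is not the ragfit implementation: the helper add_fewshot, the list-of-dicts data layout, the seed argument, and the rule that an example is never sampled as its own fewshot example are all assumptions made for illustration.

```python
import random
from typing import Any


def add_fewshot(
    dataset: list[dict[str, Any]],
    k: int,
    output_key: str = "fewshot",
    seed: int = 0,
) -> list[dict[str, Any]]:
    """Hypothetical helper: attach k other examples from the same dataset."""
    rng = random.Random(seed)  # seeded for reproducible sampling
    augmented = []
    for i, example in enumerate(dataset):
        # Assumption: an example is never used as its own fewshot example.
        others = dataset[:i] + dataset[i + 1:]
        shots = rng.sample(others, k)
        # Store the full examples under output_key without mutating the input.
        augmented.append({**example, output_key: shots})
    return augmented


data = [{"query": f"q{i}", "answer": f"a{i}"} for i in range(5)]
result = add_fewshot(data, k=2)
print(result[0]["fewshot"])
```

Each augmented example keeps its original keys and gains a list of k complete examples under output_key, which matches the documented default of 'fewshot'.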
Sampler
Bases: GlobalStep
Class to augment a dataset with sampled examples from the same or another dataset.
Either full examples or individual keys of an example, such as query or documents, can be collected.
The step can be used to collect negative documents, negative queries, or fewshot examples.
For fewshot examples, use the dedicated FewShot class.
__init__(k, input_key=None, output_key='fewshot', input_dataset=None, **kwargs)
Parameters:
- k (int) – Number of examples to collect.
- input_key (str, default: None) – Key to extract from each collected example, or None to take the entire example.
- output_key (str, default: 'fewshot') – Output key under which the collected values are stored.
- input_dataset (str, default: None) – Name of the dataset to take the examples from. Use None to take them from the same dataset.
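The role of input_key can be sketched in plain Python, for example when collecting negative documents from a second dataset. This is an illustrative sketch, not ragfit's code: the helper sample_into, passing the source dataset as a list (the real step takes a dataset name), the seed argument, and skipping the example itself when sampling from the same dataset are all assumptions.

```python
import random
from typing import Any, Optional


def sample_into(
    dataset: list[dict[str, Any]],
    k: int,
    input_key: Optional[str] = None,
    output_key: str = "fewshot",
    source: Optional[list[dict[str, Any]]] = None,
    seed: int = 0,
) -> list[dict[str, Any]]:
    """Hypothetical helper mirroring the documented Sampler parameters."""
    rng = random.Random(seed)
    same_dataset = source is None
    pool = dataset if same_dataset else source
    augmented = []
    for i, example in enumerate(dataset):
        # Assumption: skip the example itself when sampling from the same dataset.
        indices = [j for j in range(len(pool)) if not (same_dataset and j == i)]
        picked = [pool[j] for j in rng.sample(indices, k)]
        # input_key=None keeps whole examples; otherwise keep only that field.
        values = picked if input_key is None else [p[input_key] for p in picked]
        augmented.append({**example, output_key: values})
    return augmented


train = [{"query": f"q{i}", "documents": [f"doc{i}"]} for i in range(4)]
corpus = [{"query": f"other{i}", "documents": [f"neg{i}"]} for i in range(6)]

# Collect negative documents from a second dataset.
with_negs = sample_into(train, k=3, input_key="documents",
                        output_key="negative_documents", source=corpus)
print(with_negs[0]["negative_documents"])
```

With input_key=None and source=None, this helper reduces to plain fewshot collection from the same dataset, which is the case the dedicated FewShot class covers.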
ShuffleSelect
Bases: GlobalStep
Class to optionally shuffle and select a subset of the dataset.
Based on the shuffle and select methods of the Hugging Face datasets.Dataset class.
__init__(shuffle=None, limit=None, **kwargs)
Parameters:
- shuffle (int, default: None) – Seed for shuffling the dataset.
- limit (int, default: None) – Number of items to select from the dataset.
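The optional shuffle-then-select behavior can be approximated with the standard library, as below. This is a hypothetical analogue, not the ragfit step. It assumes that shuffle=None skips shuffling, that limit=None keeps every item, and that selection keeps the first limit items (as Dataset.select(range(limit)) would). A seeded random.Random yields a different permutation than Hugging Face's Dataset.shuffle(seed=...), so only the semantics carry over, not the exact order.

```python
import random
from typing import Any, Optional


def shuffle_select(
    dataset: list[Any],
    shuffle: Optional[int] = None,
    limit: Optional[int] = None,
) -> list[Any]:
    """Hypothetical stdlib analogue of ShuffleSelect."""
    items = list(dataset)  # copy so the caller's list is untouched
    if shuffle is not None:
        # The seed makes the permutation reproducible across runs.
        random.Random(shuffle).shuffle(items)
    if limit is not None:
        # Keep the first `limit` items, like Dataset.select(range(limit)).
        items = items[:limit]
    return items


rows = list(range(10))
print(shuffle_select(rows, shuffle=42, limit=3))
print(shuffle_select(rows, limit=3))  # no shuffle: [0, 1, 2]
```

Shuffling before selecting gives a reproducible random subset of the data, while omitting the seed simply truncates the dataset.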