Sampling and Fewshot
FewShot
Bases: Sampler
Class to collect fewshot examples from the same or another dataset.
Source code in ragfit/processing/global_steps/sampling.py
__init__(k, output_key='fewshot', input_dataset=None, **kwargs)
Parameters:
- k (int) – Number of examples to collect.
- output_key (str, default: 'fewshot') – Output key to use for the collected examples.
- input_dataset (str, default: None) – Name of the dataset to take the examples from. To use the same dataset, use None.
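The effect of these parameters can be illustrated with a minimal pure-Python sketch. It is not the RAGFit implementation: the function collect_fewshot, its seed argument, and the use of plain lists of dicts in place of real datasets are all hypothetical. It also assumes that an item is not used as its own fewshot example when sampling from the same dataset, which the documentation above does not state.

```python
import random


def collect_fewshot(dataset, k, output_key="fewshot", source=None, seed=0):
    """Attach k randomly chosen examples to every item (illustrative only).

    `dataset` and `source` are plain lists of dicts standing in for datasets.
    Assumption: when sampling from the same dataset, the current item is skipped.
    """
    rng = random.Random(seed)
    same = source is None
    pool = dataset if same else source
    augmented = []
    for i, item in enumerate(dataset):
        # Candidate indices; drop the item itself when the pool is the same dataset.
        candidates = [j for j in range(len(pool)) if not (same and j == i)]
        picked = rng.sample(candidates, k)
        # Store the sampled examples under output_key, leaving other fields intact.
        augmented.append({**item, output_key: [pool[j] for j in picked]})
    return augmented


train = [{"query": f"q{i}", "answer": f"a{i}"} for i in range(5)]
result = collect_fewshot(train, k=2)
```

Each item in result keeps its original fields and gains a fewshot list holding two other examples from train.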
Sampler
Bases: GlobalStep
Class to augment a dataset with sampled examples from the same or another dataset.
Full examples can be collected, as well as individual example keys such as query, documents, etc.
The step can be used to collect negative documents, negative queries, and fewshot examples.
For fewshot examples, use the dedicated FewShot class.
Source code in ragfit/processing/global_steps/sampling.py
__init__(k, input_key=None, output_key='fewshot', input_dataset=None, **kwargs)
Parameters:
- k (int) – Number of examples to collect.
- input_key (str, default: None) – Key to extract from each sampled example, or None to take the entire example.
- output_key (str, default: 'fewshot') – Output key to use for the examples.
- input_dataset (str, default: None) – Name of the dataset to take the examples from. To use the same dataset, use None.
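The role of input_key is easiest to see in a small sketch. The following is a hedged illustration in plain Python, not the library's code: sample_into, its seed argument, and the list-of-dicts datasets are hypothetical stand-ins. It samples k examples from another dataset and keeps only one field from each, as one might do when collecting negative documents. This sketch makes no attempt to exclude the current item from the sample.

```python
import random


def sample_into(dataset, k, input_key=None, output_key="fewshot", source=None, seed=0):
    """Add k sampled examples (or one field of each) to every item (illustrative only)."""
    rng = random.Random(seed)
    # Sample from `source` when given, otherwise from the dataset itself.
    pool = dataset if source is None else source
    out = []
    for item in dataset:
        picked = rng.sample(pool, k)
        # Keep a single field per sampled example, or the whole example if input_key is None.
        values = [ex if input_key is None else ex[input_key] for ex in picked]
        out.append({**item, output_key: values})
    return out


main = [{"query": "q0"}, {"query": "q1"}]
corpus = [{"query": f"c{i}", "documents": [f"doc{i}"]} for i in range(4)]
with_negatives = sample_into(
    main, k=2, input_key="documents", output_key="negative_docs", source=corpus
)
```

With input_key set to "documents", each entry of negative_docs is the documents field of a sampled example; with input_key left as None, the full sampled examples would be stored instead.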
ShuffleSelect
Bases: GlobalStep
Class to optionally shuffle and select a subset of the dataset.
Based on the shuffle and select methods of HF Dataset.
Source code in ragfit/processing/global_steps/sampling.py
__init__(shuffle=None, limit=None, **kwargs)
Parameters:
- shuffle (int, default: None) – Seed for shuffling the dataset.
- limit (int, default: None) – Number of items to select from the dataset.
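The optional shuffle-then-select behavior can be sketched with plain Python lists. This is only an approximation under stated assumptions: shuffle_select is a hypothetical helper, it uses the standard random module rather than the HF Dataset methods, and it assumes that None skips the corresponding stage and that a limit larger than the dataset returns everything.

```python
import random


def shuffle_select(items, shuffle=None, limit=None):
    """Optionally shuffle with a seed, then keep the first `limit` items (illustrative only)."""
    rows = list(items)  # copy so the caller's data is not modified
    if shuffle is not None:
        # The shuffle argument is the seed, so the same value gives the same order.
        random.Random(shuffle).shuffle(rows)
    if limit is not None:
        # Slicing clamps automatically when limit exceeds the number of rows.
        rows = rows[:limit]
    return rows


data = list(range(10))
subset = shuffle_select(data, shuffle=42, limit=3)
```

Because shuffle is a seed, repeated calls with the same arguments return the same subset, which keeps a sampled evaluation set reproducible.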