Metrics
Source code for all classes and functions below: ragfit/evaluation/metrics.py
BERTScore
Bases: MetricBase
BERTScore metric, based on the BERTScore library.
__init__(key_names: dict, model='microsoft/deberta-large-mnli', **kwargs)
Initialize the BERTScore metric.
Parameters:
- key_names (dict) – A dictionary containing the field names.
- model (str, default: 'microsoft/deberta-large-mnli') – The name of the BERT model to use.
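A minimal usage sketch, assuming a measure() interface like HFEvaluate's below; the key_names entries ("generated", "label") and the example fields are illustrative assumptions, not documented defaults:

```python
from ragfit.evaluation.metrics import BERTScore

# Hypothetical field mapping: "generated" points to the model-output field,
# "label" to the gold-answer field in each example dict.
metric = BERTScore(key_names={"generated": "text", "label": "answer"})
example = {"text": "Paris is the capital of France.", "answer": ["Paris is France's capital."]}
print(metric.measure(example))  # e.g. a dict with the BERTScore value
```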
Classification
Bases: MetricBase
Metrics for classification answers: accuracy, precision, recall, and F1; macro-averaged.
Parameters:
- dict – mapping of labels to integers, e.g. {"true": 1, "false": 0, "maybe": 2}.
- else_value (int) – value to assign to labels not in the mapping.
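A minimal sketch of constructing the metric. The docstring above names else_value but gives only a type for the label dictionary, so the keyword mapping used here is an assumed name:

```python
from ragfit.evaluation.metrics import Classification

# "mapping" is a hypothetical keyword name for the label-to-integer dict.
metric = Classification(
    key_names={"generated": "text", "label": "answer"},
    mapping={"true": 1, "false": 0, "maybe": 2},
    else_value=2,
)
```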
EM
Bases: MetricBase
Implementing Exact Match, based on code from KILT.
__init__(key_names, **kwargs) -> None
Initialize the EM metric.
Parameters:
- key_names (dict) – A dictionary containing the field names.
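Exact Match gives credit only when the normalized prediction equals a normalized gold answer. A KILT-style sketch of the idea, using the module's normalize_text helper documented at the end of this page (illustrative, not the module's exact code):

```python
def exact_match(prediction: str, gold_answers: list[str]) -> float:
    # 1.0 if the normalized prediction equals any normalized gold answer, else 0.0.
    return float(any(normalize_text(prediction) == normalize_text(g) for g in gold_answers))

exact_match("The Eiffel Tower", ["Eiffel Tower"])  # 1.0: article and case are normalized away
```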
F1
Bases: MetricBase
Implementing F1, based on code from KILT.
__init__(key_names, **kwargs) -> None
Initialize the F1 metric.
Parameters:
- key_names (dict) – A dictionary containing the field names.
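Token-level F1 rewards partial overlap where Exact Match scores 0. A sketch of the standard KILT/SQuAD-style computation for a single prediction–gold pair (illustrative, not the module's exact code):

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize_text(prediction).split()
    gold_tokens = normalize_text(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```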
HFEvaluate
Bases: MetricBase
Wrapper class around the Hugging Face evaluate library; easy to use, requiring only the metric names.
__init__(key_names, metric_names: list[str], **kwargs)
Parameters:
- key_names (dict) – A dictionary containing the field names.
- metric_names (list[str]) – A list of metric names.
measure(example)
Measure the performance of the model on a given example.
Parameters:
- example (dict) – The example containing input and target values.
Returns:
- dict – The performance metric(s) computed for the example.
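A minimal sketch: metric names are looked up on the Hugging Face evaluate hub, and the key_names entries are the same illustrative assumptions as above:

```python
from ragfit.evaluation.metrics import HFEvaluate

metric = HFEvaluate(
    key_names={"generated": "text", "label": "answer"},
    metric_names=["rouge"],
)
result = metric.measure({"text": "Paris is the capital.",
                         "answer": ["Paris is the capital of France."]})
print(result)  # dict with the computed ROUGE values
```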
RecallEM
Bases: MetricBase
Implementing EM as in XRAG.
__init__(key_names, **kwargs) -> None
Initialize the RecallEM metric.
Parameters:
- key_names (dict) – A dictionary containing the field names.
has_answer(answers, text, tokenizer=SimpleTokenizer())
Check if a document contains an answer string.
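Recall-style EM asks whether any gold answer string occurs inside the generated text, rather than requiring full-string equality. A short sketch of the documented has_answer helper (values illustrative; in the source it may be bound to the RecallEM class):

```python
generated = "The tower was designed by Gustave Eiffel and opened in 1889."
# True: the tokenized answer span occurs inside the tokenized text.
has_answer(["Gustave Eiffel"], generated, tokenizer=SimpleTokenizer())
```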
Semantic
Bases: MetricBase
Semantic similarity between label and answer using a cross-encoder.
__init__(key_names: dict, model: str = 'vectara/hallucination_evaluation_model', **kwargs) -> None
Initialize the Semantic metric.
Parameters:
- key_names (dict) – A dictionary containing the field names.
- model (str, default: 'vectara/hallucination_evaluation_model') – The name of the cross-encoder model to use.
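A minimal sketch (same assumed key_names as above; the cross-encoder weights are fetched from the Hugging Face Hub on first use):

```python
from ragfit.evaluation.metrics import Semantic

metric = Semantic(key_names={"generated": "text", "label": "answer"})
score = metric.measure({"text": "The tower is 330 m tall.",
                        "answer": ["The Eiffel Tower is 330 metres high."]})
```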
SimpleTokenizer
Bases: object
__init__()
Parameters:
- annotators – None or empty set (only tokenizes).
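A usage sketch assuming the DrQA-style interface this tokenizer appears to be based on, where tokenize() returns a token container with a words() accessor; treat those method names as assumptions:

```python
tok = SimpleTokenizer()
tokens = tok.tokenize("The Eiffel Tower, in Paris.")
print(tokens.words(uncased=True))  # assumed accessor: lower-cased tokens
```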
StringEM
Bases: MetricBase
Implementing String Exact Match.
Used in ASQA to evaluate whether the annotated short answers appear in the generated answer as substrings.
__init__(key_names: dict, **kwargs) -> None
Initialize the StringEM metric.
Parameters:
- key_names (dict) – A dictionary containing the field names.
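A sketch of the substring check described above; averaging over the annotated short answers is an assumption about the aggregation, in the spirit of ASQA's STR-EM:

```python
generated = "the show premiered in 1999 and ended in 2004"
short_answers = ["1999", "2004"]
# Fraction of annotated short answers found as substrings of the generation.
score = sum(a in generated for a in short_answers) / len(short_answers)  # 1.0
```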
normalize_text(s)
Normalize the given text by lowercasing it and removing punctuation, articles, and extra whitespace.
Parameters:
- s (str) – The text to be normalized.
Returns:
- str – The normalized text.
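For example (a quick illustration of the normalization steps):

```python
normalize_text("The  Quick, Brown Fox!")  # -> "quick brown fox"
```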