flexeval.compute_metrics#
Utilities for determining which metric computations are needed and for actually invoking those computations.
Functions
add_all_metrics_to_objects(iterable_of_objects, metrics)
    Adds all metric instances in metrics to each instance of an evaluable object (e.g., Turn, Thread, Message, or ToolCall) in iterable_of_objects.

Returns the total number of rubric-type metrics in the metrics_to_evaluate field of each object.
- class flexeval.compute_metrics.MetricComputer(function_modules: list, evalsetrun: EvalSetRun | None = None)[source]#
Bases: object
Methods
compute_metrics(object)
    Loops over each entry in the object's metrics_to_evaluate list; each entry is a dict of the form {'name': 'string_length', 'type': 'function', 'kwargs': {}, 'depends_on': []}.
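For illustration, here is a minimal sketch of what looping over such a list might look like. The dispatch shown is a hypothetical stand-in mirroring the compute_function_metric / compute_rubric_metric split listed below, not the method's actual body:

```python
# Each metrics_to_evaluate entry is a plain dict in the shape shown above.
metrics_to_evaluate = [
    {"name": "string_length", "type": "function", "kwargs": {}, "depends_on": []},
    {"name": "is_polite", "type": "rubric", "kwargs": {}, "depends_on": []},
]

for metric in metrics_to_evaluate:
    # Hypothetical dispatch on the 'type' key; the real method's control
    # flow is not documented here.
    if metric["type"] == "function":
        print(f"function metric: {metric['name']} kwargs={metric['kwargs']}")
    elif metric["type"] == "rubric":
        print(f"rubric metric: {metric['name']}")
```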
load_rubrics(evalsetrun)
    Set the rubrics to be used by this MetricComputer from the given EvalSetRun.
compute_function_metric
compute_metric
compute_rubric_metric
find_function
from_evalrun
invoke_function
process_thread_dependency_graph
process_thread_dependency_graphs
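As a hedged usage sketch of the documented constructor: it assumes only that function_modules is a list of modules whose top-level functions can be looked up by name (cf. find_function); the toy module here is hypothetical, not part of the library.

```python
import types

from flexeval.compute_metrics import MetricComputer

# Build a throwaway module holding one toy function metric; in real use
# this would be an imported module of metric functions (assumption).
toy_metrics = types.ModuleType("toy_metrics")

def string_length(text: str) -> int:
    # Matches the 'string_length' entry shown in compute_metrics above.
    return len(text)

toy_metrics.string_length = string_length

# evalsetrun is optional per the documented signature and defaults to None.
computer = MetricComputer(function_modules=[toy_metrics])
```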
- class flexeval.compute_metrics.MetricGraphBuilder[source]#
Bases: object
Builds networkx.DiGraphs of ObjectMetric instances that reflect any computational dependencies between them.

Methods
find_object_metric_from_depends_on(...)
    If a Turn metric depends on a Message metric, this creates a dependency on ALL or ANY Messages meeting the criteria.
build_metric_structures
build_thread_task_graph
build_thread_task_graphs
get_index
get_or_create_object_metric
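To make the dependency-graph idea concrete, here is a minimal sketch, not the builder's actual internals, of turning depends_on lists into a networkx.DiGraph and recovering a valid evaluation order. Representing dependencies as bare name strings is a simplification, since the real depends_on entries can carry matching criteria:

```python
import networkx as nx

# Metric specs in the metrics_to_evaluate shape; 'depends_on' lists metrics
# that must be computed first (simplified here to plain names).
metrics = [
    {"name": "string_length", "depends_on": []},
    {"name": "flesch_score", "depends_on": []},
    {"name": "combined_flag", "depends_on": ["string_length", "flesch_score"]},
]

graph = nx.DiGraph()
for m in metrics:
    graph.add_node(m["name"])
    for dep in m["depends_on"]:
        # Edge points from prerequisite to dependent metric.
        graph.add_edge(dep, m["name"])

# A topological sort respects every dependency edge.
print(list(nx.topological_sort(graph)))
# e.g. ['string_length', 'flesch_score', 'combined_flag']
```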
- class flexeval.compute_metrics.ObjectMetric(object: Message | Turn | ToolCall | Thread, metric: dict)[source]#
Bases: object
- flexeval.compute_metrics.add_all_metrics_to_objects(iterable_of_objects, metrics)[source]#
Adds all metric instances in metrics to each instance of an evaluable object (e.g., Turn, Thread, Message, or ToolCall) in iterable_of_objects. The addition is done by appending to the metrics_to_evaluate field, which every instance in iterable_of_objects should have.
- Parameters:
iterable_of_objects – list of objects that have a metrics_to_evaluate field
metrics – list of metric instances to add to each object
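A short usage sketch under stated assumptions: the SimpleNamespace objects stand in for real Turn or Message instances, relying only on the documented requirement that each object carries a metrics_to_evaluate list, and the dict-shaped metric entries follow the format shown for compute_metrics above:

```python
from types import SimpleNamespace

from flexeval.compute_metrics import add_all_metrics_to_objects

# Stand-ins for evaluable objects; the only documented requirement is an
# appendable metrics_to_evaluate field.
turns = [SimpleNamespace(metrics_to_evaluate=[]) for _ in range(3)]

metrics = [
    {"name": "string_length", "type": "function", "kwargs": {}, "depends_on": []},
]

add_all_metrics_to_objects(turns, metrics)

# Every object should now carry the shared metric entry.
assert all(t.metrics_to_evaluate == metrics for t in turns)
```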