flexeval.compute_metrics

Utilities for determining which metric computations are needed and for invoking those computations.

Functions

add_all_metrics_to_objects(...)

Adds all metric instances in metrics to each instance of an evaluable object (e.g., Turn, Thread, Message, or ToolCall) in iterable_of_objects.

compute_metrics(evalrun, evalsetrun)

count_rubric_metrics(iterable_of_objects)

Returns the total number of rubric-type metrics across the metrics_to_evaluate fields of the given objects.

class flexeval.compute_metrics.MetricComputer(function_modules: list, evalsetrun: EvalSetRun | None = None)[source]

Bases: object

Methods

compute_metrics(object)

Compute each metric listed in the object's metrics_to_evaluate field. Each entry of that list is a dict of the form { 'name': 'string_length', 'type': 'function', 'kwargs': {}, 'depends_on': [] } (see the sketch after this methods list).

load_rubrics(evalsetrun)

Set the rubrics to be used by this MetricComputer from the given EvalSetRun.

compute_function_metric

compute_metric

compute_rubric_metric

find_function

from_evalrun

invoke_function

process_thread_dependency_graph

process_thread_dependency_graphs
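
A minimal sketch of the entry format that compute_metrics(object) loops over. The keys shown come from the docstring above; the loop body and the "rubric" type value are illustrative assumptions rather than the library's actual dispatch code::

    # Shape of one entry in an object's metrics_to_evaluate list
    # (keys taken from the compute_metrics docstring above).
    metric_entry = {
        "name": "string_length",
        "type": "function",   # "rubric" is assumed for rubric metrics
        "kwargs": {},
        "depends_on": [],
    }

    # Illustrative loop only; the real MetricComputer.compute_metrics also
    # resolves depends_on and dispatches rubric metrics separately.
    metrics_to_evaluate = [metric_entry]
    for entry in metrics_to_evaluate:
        if entry["type"] == "function":
            pass  # would dispatch to compute_function_metric
        elif entry["type"] == "rubric":
            pass  # would dispatch to compute_rubric_metric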

class flexeval.compute_metrics.MetricGraphBuilder[source]

Bases: object

Builds networkx.DiGraphs of ObjectMetric instances that reflect any computational dependencies between them.
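
As an illustration of the idea (not the builder's actual API), metric dependencies can be modeled as a plain networkx.DiGraph and evaluated in an order that computes dependencies first; the node labels and edge direction here are assumptions::

    import networkx as nx

    # Nodes stand in for ObjectMetric instances; an edge points from a
    # metric to the metric it depends on (direction assumed).
    g = nx.DiGraph()
    g.add_edge("turn:flagged", "message:string_length")

    # Reverse topological order computes dependencies before dependents.
    order = list(reversed(list(nx.topological_sort(g))))
    print(order)  # ['message:string_length', 'turn:flagged']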

Methods

find_object_metric_from_depends_on(...)

If a Turn metric depends on a Message metric, a dependency is created on ALL or ANY Message metrics that meet the criteria.

build_metric_structures

build_thread_task_graph

build_thread_task_graphs

get_index

get_or_create_object_metric

class flexeval.compute_metrics.ObjectMetric(object: Message | Turn | ToolCall | Thread, metric: dict)[source]

Bases: object
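
A construction sketch based only on the signature above; some_turn stands in for a real Turn (or Message, ToolCall, or Thread) instance, and the metric dict mirrors the entry format shown earlier::

    from flexeval.compute_metrics import ObjectMetric

    om = ObjectMetric(
        object=some_turn,  # hypothetical evaluable object
        metric={"name": "string_length", "type": "function",
                "kwargs": {}, "depends_on": []},
    )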

flexeval.compute_metrics.add_all_metrics_to_objects(iterable_of_objects, metrics)[source]

Adds all metric instances in metrics to each instance of an evaluable object (e.g., Turn, Thread, Message, or ToolCall) in iterable_of_objects. The metrics are appended to each object's metrics_to_evaluate field, which every instance in iterable_of_objects is expected to have.

Parameters:
  • iterable_of_objects – list of objects that have a metrics_to_evaluate field

  • metrics – list of metric instances to add to each object
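
A hedged usage sketch; evalsetrun.turns is an assumed iterable of objects that carry a metrics_to_evaluate list::

    from flexeval.compute_metrics import add_all_metrics_to_objects

    metrics = [
        {"name": "string_length", "type": "function",
         "kwargs": {}, "depends_on": []},
    ]

    # Appends every entry of `metrics` to each object's metrics_to_evaluate.
    add_all_metrics_to_objects(evalsetrun.turns, metrics)  # .turns is assumed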

flexeval.compute_metrics.compute_metrics(evalrun: EvalRun, evalsetrun: EvalSetRun) → list[dict][source]
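
A minimal call sketch following the signature above; what the returned dicts contain is not specified here::

    from flexeval.compute_metrics import compute_metrics

    # evalrun: EvalRun, evalsetrun: EvalSetRun, constructed elsewhere.
    results = compute_metrics(evalrun, evalsetrun)  # -> list[dict]
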
flexeval.compute_metrics.count_rubric_metrics(iterable_of_objects)[source]

Returns the total number of rubric-type metrics across the metrics_to_evaluate fields of the given objects.

Parameters:

iterable_of_objects – list of objects that have a metrics_to_evaluate field
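
An illustrative sketch of what this counts, assuming rubric metrics carry type == "rubric" in metrics_to_evaluate (only the "function" type value appears in the docs above, so "rubric" is inferred)::

    # Assumption: rubric metrics are marked with type == "rubric".
    def count_rubric_metrics_sketch(iterable_of_objects):
        return sum(
            1
            for obj in iterable_of_objects
            for entry in obj.metrics_to_evaluate
            if entry.get("type") == "rubric"
        )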