flexeval.compute_metrics

Utilities for determining which metric computations are needed and for invoking those computations.

Functions

add_all_metrics_to_objects(...)

Adds all metric instances in metrics to each instance of an evaluable object (e.g., Turn, Thread, Message, or ToolCall) in iterable_of_objects.

compute_metrics(evalrun, evalsetrun)

count_rubric_metrics(iterable_of_objects)

Returns the total number of rubric-type metrics across the metrics_to_evaluate fields of the given objects.

class flexeval.compute_metrics.MetricComputer(function_modules: list, evalsetrun: EvalSetRun | None = None)[source]

Bases: object

Methods

compute_metrics(object)

Compute each metric listed in the object's metrics_to_evaluate field. Each entry of that list is a dict of the form { 'name': 'string_length', 'type': 'function', 'kwargs': {}, 'depends_on': [] } (see the sketch after this methods list).

load_rubrics(evalsetrun)

Set the rubrics to be used by this MetricComputer from the given EvalSetRun.

compute_function_metric

compute_metric

compute_rubric_metric

find_function

from_evalrun

invoke_function

process_thread_dependency_graph

process_thread_dependency_graphs
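
A minimal sketch of the entry format that compute_metrics(object) loops over. The keys shown come from the docstring above; the loop body and the "rubric" type value are illustrative assumptions rather than the library's actual dispatch code::

    # Shape of one entry in an object's metrics_to_evaluate list
    # (keys taken from the compute_metrics docstring above).
    metric_entry = {
        "name": "string_length",
        "type": "function",   # "rubric" is assumed for rubric metrics
        "kwargs": {},
        "depends_on": [],
    }

    # Illustrative loop only; the real MetricComputer.compute_metrics also
    # resolves depends_on and dispatches rubric metrics separately.
    metrics_to_evaluate = [metric_entry]
    for entry in metrics_to_evaluate:
        if entry["type"] == "function":
            pass  # would dispatch to compute_function_metric
        elif entry["type"] == "rubric":
            pass  # would dispatch to compute_rubric_metric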

class flexeval.compute_metrics.MetricGraphBuilder[source]

Bases: object

Builds networkx.DiGraphs of ObjectMetric instances that reflect any computational dependencies between them.
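
As an illustration of the idea (not the builder's actual API), metric dependencies can be modeled as a plain networkx.DiGraph and evaluated in an order that computes dependencies first; the node labels and edge direction here are assumptions::

    import networkx as nx

    # Nodes stand in for ObjectMetric instances; an edge points from a
    # metric to the metric it depends on (direction assumed).
    g = nx.DiGraph()
    g.add_edge("turn:flagged", "message:string_length")

    # Reverse topological order computes dependencies before dependents.
    order = list(reversed(list(nx.topological_sort(g))))
    print(order)  # ['message:string_length', 'turn:flagged']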

Methods

find_object_metric_from_depends_on(...)

If a Turn metric depends on a Message metric, a dependency is created on ALL or ANY Message metrics that meet the criteria.

build_metric_structures

build_thread_task_graph

build_thread_task_graphs

get_index

get_or_create_object_metric

class flexeval.compute_metrics.ObjectMetric(object: Message | Turn | ToolCall | Thread, metric: dict)[source]

Bases: object
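
A construction sketch based only on the signature above; some_turn stands in for a real Turn (or Message, ToolCall, or Thread) instance, and the metric dict mirrors the entry format shown earlier::

    from flexeval.compute_metrics import ObjectMetric

    om = ObjectMetric(
        object=some_turn,  # hypothetical evaluable object
        metric={"name": "string_length", "type": "function",
                "kwargs": {}, "depends_on": []},
    )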

flexeval.compute_metrics.add_all_metrics_to_objects(iterable_of_objects, metrics)[source]

Adds all metric instances in metrics to each instance of an evaluable object (e.g., Turn, Thread, Message, or ToolCall) in iterable_of_objects. The metrics are appended to each object's metrics_to_evaluate field, which every instance in iterable_of_objects is expected to have.

Parameters:
  • iterable_of_objects – list of objects that have a metrics_to_evaluate field

  • metrics – list of metric instances to add to each object
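
A hedged usage sketch; evalsetrun.turns is an assumed iterable of objects that carry a metrics_to_evaluate list::

    from flexeval.compute_metrics import add_all_metrics_to_objects

    metrics = [
        {"name": "string_length", "type": "function",
         "kwargs": {}, "depends_on": []},
    ]

    # Appends every entry of `metrics` to each object's metrics_to_evaluate.
    add_all_metrics_to_objects(evalsetrun.turns, metrics)  # .turns is assumed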

flexeval.compute_metrics.compute_metrics(evalrun: EvalRun, evalsetrun: EvalSetRun) → list[dict][source]
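
A minimal call sketch following the signature above; what the returned dicts contain is not specified here::

    from flexeval.compute_metrics import compute_metrics

    # evalrun: EvalRun, evalsetrun: EvalSetRun, constructed elsewhere.
    results = compute_metrics(evalrun, evalsetrun)  # -> list[dict]
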
flexeval.compute_metrics.count_rubric_metrics(iterable_of_objects)[source]

Returns the total number of rubric-type metrics across the metrics_to_evaluate fields of the given objects.

Parameters:

iterable_of_objects – list of objects that have a metrics_to_evaluate field
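
An illustrative sketch of what this counts, assuming rubric metrics carry type == "rubric" in metrics_to_evaluate (only the "function" type value appears in the docs above, so "rubric" is inferred)::

    # Assumption: rubric metrics are marked with type == "rubric".
    def count_rubric_metrics_sketch(iterable_of_objects):
        return sum(
            1
            for obj in iterable_of_objects
            for entry in obj.metrics_to_evaluate
            if entry.get("type") == "rubric"
        )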