flexeval
FlexEval is a Python package for designing custom metrics, completion functions, and LLM-graded rubrics for evaluating the behavior of LLM-powered systems.
This top-level import exposes the run() function.
- flexeval.run(eval_run: EvalRun) → EvalRunner
Runs the evaluations.
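The call shape can be sketched with simplified stand-ins. EvalRun and EvalRunner below are hypothetical mock-ups for illustration only, not the real FlexEval classes (the real EvalRun is a Pydantic model defined in the schema module listed below); the field names are assumptions.

```python
from dataclasses import dataclass, field

# Stand-in sketch of the run() contract. These are NOT the real FlexEval
# classes; they only mirror the signature run(eval_run: EvalRun) -> EvalRunner.

@dataclass
class EvalRun:
    # Hypothetical configuration fields, for illustration only.
    dataset_paths: list[str] = field(default_factory=list)
    metrics: list[str] = field(default_factory=list)

@dataclass
class EvalRunner:
    eval_run: EvalRun
    completed: bool = False

def run(eval_run: EvalRun) -> EvalRunner:
    # The real flexeval.run loads datasets, resolves metric dependencies,
    # invokes completion functions and metrics, and saves results via Peewee.
    return EvalRunner(eval_run=eval_run, completed=True)

runner = run(EvalRun(dataset_paths=["conversations.jsonl"],
                     metrics=["string_length"]))
```

In actual use, you would construct the real EvalRun from flexeval's schema module and pass it to flexeval.run(), which returns the EvalRunner holding the run's results.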
Modules
- Peewee classes used for saving the results of a FlexEval run.
- CLI commands.
- Completing conversations using LLMs.
- Utilities for determining which metric computations are needed and for invoking them.
- Built-in completion functions, function metrics, and rubric metrics.
- Dataset loading functions.
- Peewee database utilities.
- Determines how configured metrics depend on each other.
- Inspection utilities that use type hints to determine the appropriate object to pass to a function metric.
- Generic utility functions.
- Input/output utilities, primarily for reading and writing the …
- Logging utilities.
- Utility functions for accessing metrics.
- Rubric metric IO utilities.
- Utilities for …
- Convenience functions for running an Eval Run.
- Pydantic schema for the core configuration options used for FlexEval.