.. FlexEval documentation master file, created by
   sphinx-quickstart on Thu Jul 3 12:21:33 2025.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

FlexEval documentation
======================

.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.12729993.svg
   :target: https://doi.org/10.5281/zenodo.12729993
   :alt: Zenodo DOI

.. image:: https://img.shields.io/github/license/DigitalHarborFoundation/FlexEval
   :target: https://github.com/DigitalHarborFoundation/FlexEval/blob/main/LICENSE
   :alt: FlexEval license

.. image:: /_static/flexeval_banner.svg
   :alt: FlexEval banner

FlexEval is a tool for designing custom metrics, completion functions, and LLM-graded rubrics for evaluating the behavior of LLM-powered systems.

Read about the motivation and design of FlexEval in our paper at *Educational Data Mining* 2024.

To get started, see :doc:`getting_started`; go deeper with the :ref:`user_guide`, or learn by example in the :doc:`vignettes`.

Basic Usage
-----------
Install using pip:

.. code-block:: bash

   pip install python-flexeval

Create and run an evaluation:

.. code-block:: python

   import flexeval
   from flexeval.schema import Eval, EvalRun, FileDataSource, Metrics, FunctionItem, Config

   # Conversations to evaluate, one JSON object per line
   data_sources = [FileDataSource(path="vignettes/conversations.jsonl")]
   # Compute a single function metric: Flesch reading ease
   eval = Eval(metrics=Metrics(function=[FunctionItem(name="flesch_reading_ease")]))
   config = Config(clear_tables=True)
   eval_run = EvalRun(
       data_sources=data_sources,
       database_path="eval_results.db",  # SQLite file that receives the metric values
       eval=eval,
       config=config,
   )
   flexeval.run(eval_run)


This example computes Flesch reading ease for every turn in a list of conversations provided in JSONL format. The metric values are stored in an SQLite database called ``eval_results.db``.

Read more in :doc:`getting_started` and see additional usage examples in the :doc:`vignettes`.
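
The results database can be inspected with Python's built-in ``sqlite3`` module. The sketch below is a minimal example that assumes only that ``eval_results.db`` is an ordinary SQLite file; it makes no assumption about FlexEval's table layout and simply lists whatever tables the file contains along with their row counts. The helper name ``summarize_tables`` is hypothetical and not part of FlexEval.

.. code-block:: python

   import sqlite3
   from pathlib import Path


   def summarize_tables(db_path: str) -> dict[str, int]:
       """Return {table_name: row_count} for every table in a SQLite file.

       Hypothetical helper for illustration; not part of the FlexEval API.
       """
       conn = sqlite3.connect(db_path)
       try:
           # sqlite_master is SQLite's built-in catalog of schema objects.
           names = [
               row[0]
               for row in conn.execute(
                   "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
               )
           ]
           # Table names come from the catalog itself, so quoting them is enough.
           return {
               name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
               for name in names
           }
       finally:
           conn.close()


   # Only inspect the file if a previous run has actually created it.
   if Path("eval_results.db").exists():
       for table, rows in summarize_tables("eval_results.db").items():
           print(f"{table}: {rows} rows")

From there, any table of interest can be queried with ordinary SQL.
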

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   getting_started
   user_guide/index
   vignettes
   api