Basic usage

This vignette demonstrates the basic usage of FlexEval. We configure a single FileDataSource, a JSONL file containing one conversation, and compute a single function metric: the index of each message within its thread. This trivial example shows the objects needed to run an Eval and access the resulting metrics. Other vignettes build on this approach to compute more complex metrics.

Python source: basic.py

import flexeval
from flexeval.metrics import access
from flexeval.schema import Config, Eval, EvalRun, FileDataSource, FunctionItem, Metrics

data_sources = [FileDataSource(path="vignettes/conversations.jsonl")]
eval = Eval(metrics=Metrics(function=[FunctionItem(name="index_in_thread")]))
config = Config(clear_tables=True)
eval_run = EvalRun(
    data_sources=data_sources,
    database_path="eval_results.db",
    eval=eval,
    config=config,
)
flexeval.run(eval_run)
for metric in access.get_all_metrics():
    print(
        f"{metric['thread']} {metric['turn']} {metric['metric_name']} {metric['metric_value']}"
    )

conversations.jsonl contents:

{"input": [{"role": "system", "content": "Be friendly and helpful."}, {"role": "user", "content": "I need help."}, {"role": "assistant", "content": "Help with what?"}, {"role": "user", "content": "My homework."}]}
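
To make the input format concrete, the following sketch parses one such JSONL line with the standard library and pairs each message with its position in the "input" list. It is only an illustration of the idea of a per-message index, not FlexEval's implementation: the helper message_positions is hypothetical, and zero-based counting that includes the system message is an assumption, so the values reported by index_in_thread may differ.

```python
import json

# The single conversation from conversations.jsonl, one JSON object per line.
SAMPLE_LINE = (
    '{"input": ['
    '{"role": "system", "content": "Be friendly and helpful."}, '
    '{"role": "user", "content": "I need help."}, '
    '{"role": "assistant", "content": "Help with what?"}, '
    '{"role": "user", "content": "My homework."}]}'
)


def message_positions(jsonl_line):
    """Return (position, role, content) for each message in one JSONL record.

    Hypothetical helper: positions are zero-based and count every message,
    which is an assumption and may not match FlexEval's index_in_thread.
    """
    record = json.loads(jsonl_line)
    return [
        (position, message["role"], message["content"])
        for position, message in enumerate(record["input"])
    ]


for position, role, content in message_positions(SAMPLE_LINE):
    print(position, role, content)
```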