Basic usage
This vignette demonstrates basic usage of FlexEval. We configure a single FileDataSource: a JSONL file containing one conversation. We then compute a single function metric, the index of each message within its thread. This trivial example shows the objects needed to run an Eval and to access the resulting metrics. Other vignettes build on this approach to compute more complex metrics.
Python source: basic.py
import flexeval
from flexeval.metrics import access
from flexeval.schema import Config, Eval, EvalRun, FileDataSource, FunctionItem, Metrics

data_sources = [FileDataSource(path="vignettes/conversations.jsonl")]
eval = Eval(metrics=Metrics(function=[FunctionItem(name="index_in_thread")]))
config = Config(clear_tables=True)
eval_run = EvalRun(
    data_sources=data_sources,
    database_path="eval_results.db",
    eval=eval,
    config=config,
)
flexeval.run(eval_run)
for metric in access.get_all_metrics():
    print(
        f"{metric['thread']} {metric['turn']} {metric['metric_name']} {metric['metric_value']}"
    )
Contents of conversations.jsonl:
{"input": [{"role": "system", "content": "Be friendly and helpful."}, {"role": "user", "content": "I need help."}, {"role": "assistant", "content": "Help with what?"}, {"role": "user", "content": "My homework."}]}