flexeval.data_loader

Dataset loading functions. These should perhaps move to io.

Functions

add_turns(thread)

get_turns(thread)

We're defining a turn as a list of 1 or more consecutive outputs by the same role, where the role is either 'user' or 'assistant/tool'.

load_file(dataset, data_source[, ...])

load_iterable(dataset, iterable)

load_jsonl(dataset, filename[, ...])

load_langgraph_sqlite(dataset, filename[, ...])

Load conversations from a LangGraph SQLite checkpoint database.

load_thread_to_dataset(thread_id, thread, ...)

verify_checkpoints_table_exists(cursor)

flexeval.data_loader.add_turns(thread: Thread)
flexeval.data_loader.get_turns(thread: Thread)

We're defining a turn as a list of 1 or more consecutive outputs by the same role, where the role is either 'user' or 'assistant/tool'. In other words, we would parse a conversation as follows:

TURN 1 - user
TURN 2 - assistant
TURN 3 - user
TURN 4 - assistant
TURN 4 - tool
TURN 4 - assistant
TURN 5 - user
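The grouping rule above can be illustrated with a minimal sketch. This is not the flexeval implementation: the helper names `turn_role` and `group_into_turns` are hypothetical, and the sketch works on a plain list of role strings rather than a `Thread` object. It only assumes that 'assistant' and 'tool' messages count as the same role when deciding where a turn ends.

```python
from itertools import groupby


def turn_role(role: str) -> str:
    # Hypothetical helper: 'assistant' and 'tool' are merged into one turn role.
    return "assistant/tool" if role in ("assistant", "tool") else role


def group_into_turns(roles):
    """Return (turn_number, role) pairs, numbering turns from 1."""
    numbered = []
    # groupby starts a new group each time the merged role changes,
    # so each group is one run of consecutive same-role messages.
    for turn_number, (_, group) in enumerate(groupby(roles, key=turn_role), start=1):
        for role in group:
            numbered.append((turn_number, role))
    return numbered


roles = ["user", "assistant", "user", "assistant", "tool", "assistant", "user"]
for turn_number, role in group_into_turns(roles):
    print(f"TURN {turn_number} - {role}")
```

Run on the role sequence from the example, the sketch prints the same numbering as the listing above, with the assistant, tool, and assistant messages all falling into turn 4.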

flexeval.data_loader.load_file(dataset: Dataset, data_source: FileDataSource, max_n_conversation_threads: int | None = None, nb_evaluations_per_thread: int | None = 1)
flexeval.data_loader.load_iterable(dataset: Dataset, iterable)
flexeval.data_loader.load_jsonl(dataset: Dataset, filename: str | Path, max_n_conversation_threads: int | None = None, nb_evaluations_per_thread: int | None = 1)
flexeval.data_loader.load_langgraph_sqlite(dataset: Dataset, filename: str, max_n_conversation_threads: int | None = None, nb_evaluations_per_thread: int | None = 1)

Load conversations from a LangGraph SQLite checkpoint database.

Reads the final checkpoint for each thread and extracts the cumulative message list from channel_values.messages. Compatible with langgraph >= 1.0.
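The idea of "final checkpoint per thread, then read channel_values.messages" can be sketched as follows. This is an illustration under stated assumptions, not the flexeval or LangGraph code: the table here is a simplified, hypothetical three-column `checkpoints` table, the payload is stored as JSON text (a real LangGraph database stores a serialized blob that needs LangGraph's own deserializer), and the latest checkpoint is assumed to be the one with the largest `checkpoint_id` within a thread. The helpers `table_exists` and `final_messages_by_thread` are hypothetical names.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified, hypothetical schema for illustration only.
cur.execute(
    "CREATE TABLE checkpoints (thread_id TEXT, checkpoint_id TEXT, checkpoint TEXT)"
)


def payload(messages):
    # Assumption: JSON stands in for the real serialized checkpoint blob.
    return json.dumps({"channel_values": {"messages": messages}})


cur.executemany(
    "INSERT INTO checkpoints VALUES (?, ?, ?)",
    [
        ("t1", "0001", payload(["hi"])),
        ("t1", "0002", payload(["hi", "hello!"])),
        ("t2", "0001", payload(["ping"])),
    ],
)


def table_exists(cursor, name="checkpoints"):
    # sqlite_master lists every table in the database file.
    cursor.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    )
    return cursor.fetchone() is not None


def final_messages_by_thread(cursor):
    # Keep only the row with the largest checkpoint_id for each thread.
    cursor.execute(
        """
        SELECT c.thread_id, c.checkpoint
        FROM checkpoints AS c
        JOIN (
            SELECT thread_id, MAX(checkpoint_id) AS last_id
            FROM checkpoints
            GROUP BY thread_id
        ) AS latest
          ON c.thread_id = latest.thread_id
         AND c.checkpoint_id = latest.last_id
        ORDER BY c.thread_id
        """
    )
    return {
        thread_id: json.loads(blob)["channel_values"]["messages"]
        for thread_id, blob in cursor.fetchall()
    }


messages = final_messages_by_thread(cur)
```

Because each checkpoint holds the cumulative message list, reading only the last one per thread is enough to recover the whole conversation, which is why earlier checkpoints are skipped.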

flexeval.data_loader.load_thread_to_dataset(thread_id: str | int, thread: dict, dataset: Dataset, eval_run_thread_id: str | None = None) → Thread
flexeval.data_loader.verify_checkpoints_table_exists(cursor)