Custom rubrics in FlexEval

In addition to the built-in rubrics, you can write your own.

The eval_run.yaml schema used in Basic CLI usage shows an example EvalRun using a custom rubric.

The easiest way to provide custom rubrics is in a YAML file; here’s vignettes/custom_rubrics.yaml:

assistant_asks_a_question:
  notes: |-
    Rubric to ensure the assistant asks follow-up questions.
    The contents of this note aren't part of the prompt in any way; this is just a convenient place to write documentation.
  prompt: |-
    Your Role:
      You are a helpful assistant. You have solid knowledge in K-12 math instruction.

    Context:
      A K-12 student learns math using an online tutoring system.
      During the session, the student (user) asks the tutor (assistant) for help with some math problems.

    Your Task:
      The tutor (assistant) is supposed to provide explicit follow-up or clarification questions in response to the student.
      Your job is to determine whether the tutor (assistant) asked a question in their response.

    Data:
      The following contains messages from the tutor (assistant).

      [BEGIN DATA]
      ***
      {content}
      ***
      [END DATA]

    __start rubric__
    YES: If the message(s) contain a question in response to the student.
    NO: If the message has no question.

    Note:
    If there is no question, then print "NO".
    __end rubric__

    Output:
      First, report your reasoning for your decision.
      Second, print your decision.
      IMPORTANT: After your reasoning, print the choice string of "YES" or "NO" on a separate line with NO OTHER TEXT on that line.

  choice_scores:
    "YES": 1
    "NO": 0

You then need to specify the path to that YAML file in your EvalRun configuration:

rubric_paths:
 - vignettes/custom_rubrics.yaml

You can then use the rubrics defined in that file (here, assistant_asks_a_question) in your rubrics definition.

Writing rubrics

Rubrics consist of a prompt and a set of choice_scores.

  • choice_scores maps each choice string the LLM may output to the numeric score it produces. In the example above, "YES" scores 1 and "NO" scores 0.

  • prompt is a template that will be formatted and passed to the rubric LLM (see Eval) for scoring.

See the Rubric Guide for additional information on writing rubrics.
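
To make the relationship between the prompt's output instructions and choice_scores concrete, here is a minimal Python sketch. It is not FlexEval's implementation: the function name score_from_completion and the parsing strategy (take the last line that exactly matches a choice string) are assumptions made for illustration, based only on the example prompt's instruction to print the choice on its own line.

```python
from typing import Dict, Optional


def score_from_completion(
    completion: str, choice_scores: Dict[str, float]
) -> Optional[float]:
    """Hypothetical helper: map a rubric LLM completion to a numeric score.

    Scans the completion from the last line upward and returns the score of
    the first line that exactly matches a choice string. Returns None when
    no line matches any choice.
    """
    for line in reversed(completion.splitlines()):
        candidate = line.strip().strip('"')  # tolerate a quoted "YES" / "NO"
        if candidate in choice_scores:
            return choice_scores[candidate]
    return None


# Same mapping as in the example rubric.
choice_scores = {"YES": 1, "NO": 0}

# A made-up completion: reasoning first, then the choice on its own line.
completion = "The tutor asked what the student had tried so far.\nYES"
print(score_from_completion(completion, choice_scores))  # prints 1
```

A completion with no line matching a choice string yields None in this sketch, which is one reason the example prompt insists on the choice appearing alone on its line.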

Supported template parameters

The following parameters can be used in a rubric prompt. Each one is replaced with its value when the prompt is formatted:

  • context

  • content

In the future, we hope to support templated inputs from other metrics.
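
As an illustration of the two supported parameters, the Python sketch below checks a prompt for unsupported placeholders and then fills it in. It assumes the {name} placeholders behave like Python's str.format fields, which matches the {content} syntax in the example rubric but is not confirmed here. The helper names and the sample context and content strings are hypothetical.

```python
from string import Formatter

# The two parameters listed above.
SUPPORTED_PARAMETERS = {"context", "content"}


def unsupported_parameters(prompt: str) -> set:
    """Return placeholder names used in the prompt that are not supported."""
    names = {field for _, field, _, _ in Formatter().parse(prompt) if field}
    return names - SUPPORTED_PARAMETERS


def fill_prompt(prompt: str, context: str, content: str) -> str:
    """Substitute both parameters; a prompt may use only one of them."""
    return prompt.format(context=context, content=content)


prompt = "Earlier messages:\n{context}\n\nMessage to grade:\n{content}"
assert unsupported_parameters(prompt) == set()

# Hypothetical values, for illustration only.
print(fill_prompt(
    prompt,
    context="user: How do I solve 2x + 3 = 7?",
    content="What have you tried so far?",
))
```

Running a check like unsupported_parameters before an evaluation can catch a misspelled placeholder such as {contents} early, instead of failing when the prompt is formatted.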