Simple Flexible Evaluation Example

With Flexible Evaluations, you can evaluate an application variant that has multiple inputs as well as multiple steps.

For example, imagine an application that takes in a date and lets users ask questions such as "What was the date 5 days ago?"

An evaluation dataset for this application would have two inputs:

  1. query: the question the user asked the application
  2. current_date: the current date of reference the application is using to answer the question

The dataset will look like the following:
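The exact dataset format is not reproduced here. As a rough sketch, assume each test case stores its inputs under an `input` key next to an `expected_output`, which is what the `data_loc` paths later in this guide suggest. Under that assumption, the test cases could be written as plain Python dictionaries. The sample values are made up for illustration, and this is not the platform's own call for creating a dataset.

```python
from datetime import date, timedelta

# Hypothetical test cases; the field names mirror the data_loc paths
# ["test_case_data", "input", ...] and ["test_case_data", "expected_output"].
test_cases = [
    {
        "input": {
            "query": "What was the date 5 days ago?",
            "current_date": "2024-03-10",
        },
        "expected_output": "2024-03-05",
    },
    {
        "input": {
            "query": "What will the date be in 7 days?",
            "current_date": "2024-12-28",
        },
        "expected_output": "2025-01-04",
    },
]

# Sanity-check the hand-written expected outputs with date arithmetic.
assert date(2024, 3, 10) - timedelta(days=5) == date(2024, 3, 5)
assert date(2024, 12, 28) + timedelta(days=7) == date(2025, 1, 4)
```

Keeping `current_date` as an explicit input makes each test case reproducible: the expected answer does not depend on the day the evaluation happens to run.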

Let's assume the application produces the following outputs:
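The variant's outputs are not reproduced here either. A minimal sketch, assuming each test case output holds a single `output` field (matching the `["test_case_output", "output"]` path used in the configuration in this guide), with invented sample answers:

```python
# Hypothetical outputs, one per test case; the "output" key mirrors
# the data_loc path ["test_case_output", "output"].
test_case_outputs = [
    {"output": "5 days before 2024-03-10 was 2024-03-05."},
    {"output": "7 days after 2024-12-28 is 2025-01-04."},
]

for item in test_case_outputs:
    print(item["output"])
```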

Because the dataset has two different inputs, we need to configure what the annotators can see. As part of the flexible evaluation process, you can use the `annotation_config` object to tell the platform how to render the annotation UI:

annotation_config={
    "annotation_config_type": "flexible",
    "direction": "row",
    # the components are laid out in a grid
    "components": [
        [
            # Let's put the query and the output side by side since that's what we care about most
            {
                "data_loc": ["test_case_data", "input", "query"],
                "label": "Query"
            },
            {
                "data_loc": ["test_case_output", "output"]
            }
        ],
        [
            {
                "data_loc": ["test_case_data", "input", "current_date"]
            }
        ],
        [
            {
                "data_loc": ["test_case_data", "expected_output"]
            }
        ]
    ]
}

Because the direction is set to row, each entry in the outer array of the components key is rendered as a row, and each item in an inner array is displayed as a column of that row. This configuration tells the UI to display three rows:

  • The first row will have two boxes. The first box will show Query as the label and the contents of query from the dataset in the example above. The second box will show the output value of the test_case_output from the variant above.
  • The second row will show current_date as the label and the contents of current_date from the dataset above.
  • The third row will show expected_output as the title and the contents of expected_output from the dataset above.
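To make the row and column mapping concrete, here is a small sketch that walks the configuration and resolves each `data_loc` path against one test case. It only illustrates the layout logic described above and is not the platform's rendering code. The `resolve` and `layout` helpers and the sample values are hypothetical, and falling back to the last path element when no `label` is given is an assumption based on the row descriptions.

```python
from typing import Any


def resolve(data_loc: list[str], context: dict[str, Any]) -> Any:
    """Follow a data_loc path key by key through nested dictionaries."""
    value: Any = context
    for key in data_loc:
        value = value[key]
    return value


def layout(config: dict[str, Any], context: dict[str, Any]) -> list[list[tuple[str, Any]]]:
    """Return the grid as rows, each row being a list of (label, value) boxes."""
    grid = []
    for group in config["components"]:  # outer array: one entry per row
        boxes = []
        for component in group:  # inner array: one entry per column
            # Assumption: an unlabeled component is titled by its last path element.
            label = component.get("label", component["data_loc"][-1])
            boxes.append((label, resolve(component["data_loc"], context)))
        grid.append(boxes)
    return grid


annotation_config = {
    "annotation_config_type": "flexible",
    "direction": "row",
    "components": [
        [
            {"data_loc": ["test_case_data", "input", "query"], "label": "Query"},
            {"data_loc": ["test_case_output", "output"]},
        ],
        [{"data_loc": ["test_case_data", "input", "current_date"]}],
        [{"data_loc": ["test_case_data", "expected_output"]}],
    ],
}

# Hypothetical data for a single test case and the variant's answer to it.
context = {
    "test_case_data": {
        "input": {
            "query": "What was the date 5 days ago?",
            "current_date": "2024-03-10",
        },
        "expected_output": "2024-03-05",
    },
    "test_case_output": {"output": "It was 2024-03-05."},
}

grid = layout(annotation_config, context)
for row in grid:
    print(" | ".join(f"{label}: {value}" for label, value in row))
```

Each printed line corresponds to one row of the annotation UI, with the boxes in that row separated by `|`.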

Here is an example of what the Annotation UI will look like based on this configuration:

Once the evaluation is complete, you will be able to see the results of the annotation just like any other evaluation run.

Flexible evaluation runs can, however, be significantly more customized and advanced. To learn more, continue to the full guide to flexible evaluation.