If you have an existing dataset that you want to use to evaluate your application inside the Scale Generative AI Platform, you can upload it to the platform manually, using either the UI or the SDK.

Through the SDK

SDK instructions for uploading datasets can be found here.

Through the UI

To upload datasets manually through the UI, navigate to Datasets under Evaluation. From there, select Create Dataset and choose the Manual Upload option.

This opens a modal where you can enter the Dataset Name, select the Dataset Type (only generation is supported at the moment), and view the dataset upload instructions.

Supported Formats

Currently, we support dataset uploads in JSON, JSONL, or CSV format. The requirements differ by format.

CSV

CSV files should have the following headers: input, expected output, schema_type, info.

Input: the input to the application to be tested.
Expected Output: the expected output for a given input.
Schema Type: currently, the only schema type supported is STRING.
Info: additional context that is displayed to annotators in annotation tasks for human evaluations. The text will be rendered as-is.
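
As an illustration, a CSV file with these headers could be generated with Python's standard csv module. This is a minimal sketch, not an official tool: the header names are copied from the list above, while the helper build_csv and the sample rows are hypothetical.

```python
import csv
import io

# Header names copied from the list above.
HEADERS = ["input", "expected output", "schema_type", "info"]


def build_csv(rows):
    """Serialize test cases to CSV text with the required header row.

    build_csv is a hypothetical helper, not part of the platform SDK.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=HEADERS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # STRING is the only schema type currently supported.
        writer.writerow({**row, "schema_type": "STRING"})
    return buffer.getvalue()


# Made-up sample data for illustration only.
sample_rows = [
    {
        "input": "What is the capital of France?",
        "expected output": "Paris",
        "info": "Geography, easy",
    },
    {
        "input": "Add 2 and 3.",
        "expected output": "5",
        "info": "Arithmetic, no units",
    },
]

csv_text = build_csv(sample_rows)
print(csv_text)
```

The csv module quotes any field that contains a comma, so free-form info text stays in a single column. Save csv_text to a .csv file before uploading.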

JSON and JSONL

For JSON, the root of the file should be an array of JSON objects.

For JSONL, each line of the "generation" JSONL data should be a JSON object.

JSON objects should have the following keys: input, expected_output, expected_extra_info.

input
- The input to the application to be tested.
- Format: text
expected_output
- The expected output for a given input.
- Format: text
expected_extra_info
- Additional context displayed to annotators in annotation tasks for human evaluations. The text will be rendered as-is.
- Format: JSON Object
  - ```
    {
      "schema_type": "STRING",
      "info": "<Insert content text here>"
    }
    ```
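
To make the two layouts concrete, the following sketch builds the same records once as a JSON array and once as JSONL using Python's standard json module. The helper names (make_record, to_json, to_jsonl) and the sample records are hypothetical, and the lowercase schema_type key is an assumption that mirrors the CSV header.

```python
import json


def make_record(input_text, expected_output, info):
    """Build one dataset entry with the three documented keys.

    Hypothetical helper; the lowercase "schema_type" key is an assumption.
    """
    return {
        "input": input_text,
        "expected_output": expected_output,
        "expected_extra_info": {"schema_type": "STRING", "info": info},
    }


def to_json(records):
    # JSON upload: the root of the file is an array of objects.
    return json.dumps(records, indent=2)


def to_jsonl(records):
    # JSONL upload: one compact JSON object per line.
    return "\n".join(json.dumps(record) for record in records) + "\n"


# Made-up sample data for illustration only.
records = [
    make_record("What is the capital of France?", "Paris", "Geography, easy"),
    make_record("Add 2 and 3.", "5", "Arithmetic, no units"),
]

json_text = to_json(records)
jsonl_text = to_jsonl(records)
print(jsonl_text)
```

The only difference between the two outputs is the container: the JSON file wraps all records in one array, while the JSONL file puts each record on its own line with no enclosing brackets or trailing commas.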