Create Completion

post https://api.egp.scale.com/v4/completions

Description

Given a user's input, runs LLM inference to produce the model's response.

Details

LLM completions have many use cases, such as content summarization, question-answering, and text generation.

The model parameter determines which LLM will be used to generate the completion. Keep in mind that different models have varying sizes, costs, and may perform differently across different tasks.

The user input, commonly referred to as the "prompt", is a required field in the request body. The quality of the model's response can vary greatly depending on the input prompt. Good prompt engineering can significantly enhance the response quality. If you encounter suboptimal results, consider writing more specific instructions or providing examples to the LLM before trying more expensive techniques such as swapping in other models or finetuning.

By default, the endpoint will return the entire response as one whole object. If you would prefer to stream the completion in real-time, you can achieve this by setting the stream flag to true.

Language

Credentials

Header

Click Try It! to start a request and see the response here!