API Reference

Generate chat completion

Description

Interact with an LLM using the specified model_deployment_id. You can include a list of messages as the conversation history; the conversation can contain multiple messages with the roles user, assistant, and system. If the chosen model does not support chat completion, the API falls back to simple completion and disregards the provided history. The endpoint handles context-length overflow optimistically: it estimates the token count of the provided history and prompt, and if the estimate exceeds the context window or comes within 80% of it, the exact token count is computed and the history is trimmed to fit the context.
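
The trimming heuristic can be pictured with a short sketch. This is illustrative only: the function names (estimate, count_exact), the 80% threshold placement, and the oldest-first trimming order are assumptions, not the service's actual implementation.

```python
def fit_history(prompt, chat_history, context_window, estimate, count_exact):
    # Cheap estimate first: prompt tokens plus estimated history tokens.
    est = estimate(prompt) + sum(estimate(m["content"]) for m in chat_history)
    if est < 0.8 * context_window:
        # Comfortably within budget: skip the exact count entirely.
        return chat_history
    # Near or over the limit: compute the exact token count and drop
    # history entries (oldest first, by assumption) until everything fits.
    while chat_history and count_exact(prompt, chat_history) > context_window:
        chat_history = chat_history[1:]
    return chat_history
```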

{ "prompt": "Generate 5 more", "chat_history": [ { "role": "system", "content": "You are a name generator. Do not generate anything else than names" }, { "role": "user", "content": "Generate 5 names" }, { "role": "assistant", "content": "1. Olivia Bennett\n2. Ethan Carter\n3. Sophia Ramirez\n4. Liam Thompson\n5. Ava Mitchell" } ], }
Path Params

model_deployment_id
string
required
Body Params
model_request_parameters
object
temperature
number
0 to 2
Defaults to 0.2

What sampling temperature to use, between [0, 2]. Higher values like 1.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. Setting temperature=0.0 enables fully deterministic (greedy) sampling. NOTE: the temperature range for some models is limited to [0, 1]; if the given value is above the available range, it defaults to the maximum value. Temperature is ignored for OpenAI reasoning models.

stop_sequences
array of strings
length ≤ 4

List of up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

max_tokens
integer

The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length. If not specified, max_tokens is determined based on the model used:
| Model API family | Model API default | EGP applied default |
| --- | --- | --- |
| OpenAI Completions | 16 | context window - prompt size |
| OpenAI Chat Completions | context window - prompt size | context window - prompt size |
| LLM Engine | max_new_tokens parameter is required | 100 |
| Anthropic Claude 2 | max_tokens_to_sample parameter is required | 10000 |
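
Assuming the generation controls nest under model_request_parameters as the parameter layout suggests, a body that caps output length and sets stop sequences could look like this (values are illustrative):

```python
body = {
    "prompt": "Generate 5 names",
    "chat_history": [],
    "model_request_parameters": {
        "max_tokens": 256,           # prompt tokens + max_tokens must fit the context window
        "stop_sequences": ["\n\n"],  # up to 4; the stop sequence itself is not returned
    },
}
```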

top_p
number

The cumulative probability cutoff for token selection. Lower values mean sampling from a smaller, more top-weighted nucleus. Available for models provided by Google, LLM Engine, and OpenAI.

top_k
number

Sample from the k most likely next tokens at each step. Lower k focuses on higher probability tokens. Available for models provided by Google and LLM Engine.

frequency_penalty
number

Penalize tokens based on how often they have already appeared in the text. Positive values encourage the model to generate new tokens; negative values encourage it to repeat tokens. Available for models provided by LLM Engine and OpenAI.

presence_penalty
number

Penalize tokens based on whether they have already appeared in the text. Positive values encourage the model to generate new tokens; negative values encourage it to repeat tokens. Available for models provided by LLM Engine and OpenAI.
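
The sampling controls compose in the same object. A sketch combining them, with illustrative values (note the provider restrictions listed above):

```python
body = {
    "prompt": "Write a tagline for a coffee shop",
    "chat_history": [],
    "model_request_parameters": {
        "temperature": 0.9,        # higher = more random output
        "top_p": 0.95,             # nucleus cutoff (Google, LLM Engine, OpenAI)
        "top_k": 40,               # k most likely tokens (Google, LLM Engine)
        "frequency_penalty": 0.5,  # penalize by repetition count (LLM Engine, OpenAI)
        "presence_penalty": 0.2,   # penalize any prior appearance (LLM Engine, OpenAI)
    },
}
```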

stream
boolean
Defaults to false

Flag indicating whether to stream the completion response.
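
When stream is true, the completion arrives incrementally. A minimal consumption sketch, assuming a line-delimited stream and reusing the placeholder URL and header from earlier (the actual wire format may differ):

```python
import requests

url = "https://api.example.com/v2/model-deployments/my-model-deployment/chat-completions"  # placeholder
headers = {"x-api-key": "YOUR_API_KEY"}  # auth header name is an assumption

body = {
    "prompt": "Generate 5 names",
    "chat_history": [],
    "model_request_parameters": {"stream": True},
}

with requests.post(url, headers=headers, json=body, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))  # each non-empty line carries one chunk
```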

logprobs
boolean

Whether to return logprobs. Currently only supported for llmengine chat models.

top_logprobs
integer

Number of top logprobs to return. Currently only supported for llmengine chat models.

chat_template
string

The chat template to use for the completion. Currently only supported for llmengine chat models.

chat_template_kwargs
object

Additional keyword arguments for the chat template. Currently only supported for llmengine chat models.

reasoning_effort
string

The reasoning effort to use for the completion. Currently only supported for openai models.
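
The model-family-specific options above could be combined as follows. The field values here are illustrative, and which fields a given deployment accepts depends on its model family:

```python
# llmengine chat model example (logprobs / chat template options):
llmengine_params = {
    "logprobs": True,
    "top_logprobs": 5,
    "chat_template_kwargs": {"add_generation_prompt": True},  # illustrative kwargs
}

# openai reasoning model example:
openai_params = {
    "reasoning_effort": "low",  # accepted values are an assumption
}
```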

chat_history
array
required

Chat history entries with roles and messages. If there's no history, pass an empty list.

prompt
string
required

New user prompt. This will be sent to the model with a user role.
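
Since the endpoint itself is stateless, the caller carries chat_history forward between turns. A minimal multi-turn sketch (the URL, header, and response field name are assumptions):

```python
import requests

url = "https://api.example.com/v2/model-deployments/my-model-deployment/chat-completions"  # placeholder
headers = {"x-api-key": "YOUR_API_KEY"}  # assumption

chat_history = []
for user_msg in ["Generate 5 names", "Generate 5 more"]:
    resp = requests.post(url, headers=headers,
                         json={"prompt": user_msg, "chat_history": chat_history})
    resp.raise_for_status()
    answer = resp.json().get("completion", "")  # response field name is an assumption
    # Append both turns so the next request carries the full conversation.
    chat_history.append({"role": "user", "content": user_msg})
    chat_history.append({"role": "assistant", "content": answer})
```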

Headers
string
Responses

application/json