Guides

Completions

Use LLMs to complete your prompts

Overview

What is a completion?

A completion is the output generated by a Large Language Model (LLM) in response to a user's input. Essentially, it extends the user's original input, using the model to respond to the request effectively.

LLMs, like GPT-3.5, are capable of understanding context, grammar, and semantics, enabling them to generate coherent and contextually relevant responses. When a user provides a query, the LLM processes the input and generates a completion, which can be in the form of text, code, or other structured output, depending on the task at hand. The user's input query to the LLM is also known as a prompt.

To generate the completion, the LLM leverages its vast pre-trained knowledge, gathered from a diverse range of sources up to its training cutoff. This knowledge allows the model to make informed decisions and generate responses that align with the input provided by the user.

Completions have found widespread applications across various domains, including natural language understanding, machine translation, chatbots, content generation, code writing, and much more. Their ability to handle a multitude of tasks has made them a powerful tool in enhancing user experiences and automating complex tasks.

Introductory Example

Let's make a simple request to the Completions API.

All we need to do is set our API key and endpoint URL, specify a model and a prompt, and submit the call with the EGP Python client.

from scale_egp.sdk.client import EGPClient

API_KEY = "<INSERT_API_KEY_HERE>"
ENDPOINT_URL = "https://api.<your_workspace_id>.workspace.egp.scale.com"

client = EGPClient(api_key=API_KEY, endpoint_url=ENDPOINT_URL)

completion = client.completions().create(model="gpt-4", prompt="Hello, what can you do?")
print(completion)

Output

{
  "completion": {
    "text": "As an AI developed by OpenAI, I'm designed to assist with various tasks such as providing information, answering questions, writing text, brainstorming ideas, generating creative content, learning new topics, translating languages, tutoring in various subjects, reading and summarizing text, and much more. My capabilities are constantly being expanded and improved as AI technology evolves. However, I don't have the ability to access or retrieve personal data unless it's shared with me in the course of the conversation. I'm designed to respect user privacy and confidentiality.",
    "finish_reason": null
  },
  "token_usage": {
    "prompt": 14,
    "completion": 107,
    "total": 121
  }
}
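The response is a plain dictionary-like structure, so pulling out the generated text and token counts is straightforward. A minimal sketch, using the output above as a literal dict:

```python
# Response shape as returned by the Completions API (see the output above).
response = {
    "completion": {
        "text": "As an AI developed by OpenAI, I'm designed to assist with various tasks such as providing information, answering questions, writing text, brainstorming ideas, generating creative content, learning new topics, translating languages, tutoring in various subjects, reading and summarizing text, and much more.",
        "finish_reason": None,
    },
    "token_usage": {"prompt": 14, "completion": 107, "total": 121},
}

text = response["completion"]["text"]
usage = response["token_usage"]

# The total token count is the prompt tokens plus the completion tokens.
assert usage["total"] == usage["prompt"] + usage["completion"]
print(f"{usage['total']} tokens used")
```

Token usage is worth tracking in practice, since most hosted models bill per token.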

Now that you know how to use this API, you can easily extend this to build more complex applications! 😄

Streaming Completions

By default, the completions endpoint responds synchronously: each request blocks until the completion is fully formed and is then returned in one piece. Since this can take several seconds, we also offer streaming responses, so that you receive parts of the response sequentially, as soon as they are ready.

With streaming, small batches of tokens (which can be thought of as short pieces of text: root words, prefixes, symbols, and so on) are sent over as soon as they are available from the LLM. This is especially useful when designing a chat app, where a user would rather watch tokens appear sequentially over several seconds than stare at a blank screen until the entire text pops up at once.
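The difference in perceived latency can be sketched with a toy generator that yields token batches as they become ready (the names here are illustrative, not part of the API):

```python
import time


def fake_token_stream(text, batch_size=3, delay=0.0):
    """Yield the words of `text` in small batches, simulating an LLM stream."""
    words = text.split()
    for i in range(0, len(words), batch_size):
        time.sleep(delay)  # stand-in for model generation time
        yield " ".join(words[i:i + batch_size]) + " "


# A blocking client waits for the whole string; a streaming client can
# render each chunk the moment it arrives.
chunks = list(fake_token_stream("Hello, what can you do for me today?"))
full_text = "".join(chunks).rstrip()
print(full_text)  # Hello, what can you do for me today?
```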

We can enable streaming completions with small additions to a standard request. The example below calls the completions endpoint directly over HTTP with the requests library. Streamed responses use the SSE (server-sent events) protocol, which means we'll need to adjust the way we parse each event into a proper Python dictionary.

import requests
import json  # <-- added
import sys  # <-- added


SPELLBOOK_API_KEY = "sampleapikey12345"

headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "x-api-key": SPELLBOOK_API_KEY,
}
payload = {
    "model": "gpt-4",
    "prompt": "Hello, what can you do?",
    "stream": True,  # <-- added
}

response = requests.post(
    url="https://api.spellbook.scale.com/egp/v1/completions", 
    json=payload, 
    headers=headers,
    stream=True,  # <-- added
)

# Parse the SSE stream: each event arrives as a "data: {...}" line <-- added
for raw_event in response.iter_lines():
    if raw_event:
        event_str = raw_event.decode()
        if event_str.startswith('data: '):
            event_json_str = event_str[len('data: '):]
            stream_chunk = json.loads(event_json_str)
            print(stream_chunk["completion"]["text"], end="")
            sys.stdout.flush()
print()
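The `data:`-prefix handling above can be factored into a small helper, which also makes it easy to exercise against canned lines (the helper name is ours, not part of the API):

```python
import json


def parse_sse_lines(lines):
    """Yield the JSON payload of each non-empty SSE `data:` line.

    `lines` is any iterable of byte strings, e.g. response.iter_lines().
    """
    for raw_event in lines:
        if not raw_event:
            continue  # blank lines separate SSE events
        event_str = raw_event.decode()
        if event_str.startswith("data: "):
            yield json.loads(event_str[len("data: "):])


# Canned lines in the shape emitted by the endpoint:
canned = [
    b'data: {"completion": {"text": "Hello"}}',
    b"",
    b'data: {"completion": {"text": ", world"}}',
]
text = "".join(chunk["completion"]["text"] for chunk in parse_sse_lines(canned))
print(text)  # Hello, world
```

Keeping the parsing in one place also makes it easier to add error handling later without touching the request code.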

Output

As an AI developed by OpenAI, I'm designed to assist with various tasks such as providing information, answering questions, writing text, brainstorming ideas, generating creative content, learning new topics, translating languages, tutoring in various subjects, reading and summarizing text, and much more. My capabilities are constantly being expanded and improved as AI technology evolves. However, I don't have the ability to access or retrieve personal data unless it's shared with me in the course of the conversation. I'm designed to respect user privacy and confidentiality.

Note that the code above prints only the streamed text, token batch by token batch, rather than the full JSON envelope of each event.