Get chat completions


Creates a chat completion for the provided prompt, taking into account the content you have trained your project on.

Request body

  • messages (array): A list of user and assistant messages to complete.
  • projectKey (string): The API key associated with your project. If shared publicly, use the production key from a whitelisted domain. Otherwise, for instance on localhost, use the development key.
  • model (gpt-4 | gpt-4-32k | gpt-4-1106-preview | gpt-4-turbo-preview | gpt-3.5-turbo | text-davinci-003 | text-davinci-002 | text-curie-001 | text-babbage-001 | text-ada-001 | davinci | curie | babbage | ada): The completions model to use.
  • systemPrompt (string): Custom system prompt that wraps the prompt and context.
  • stream (boolean): If true, return the response as a ReadableStream. Otherwise, return it as a plain JSON object. Default: true.
  • doNotInjectContext (boolean): If true, do not inject the context into the full prompt unless the context tag is present in the template. Default: false.
  • allowFollowUpQuestions (boolean): If true, the bot may encourage the user to ask a follow-up question, for instance to gather additional information. Default: true.
  • doNotInjectPrompt (boolean): If true, do not inject the prompt into the full prompt unless the prompt tag is present in the template. Default: false.
  • excludeFromInsights (boolean): If true, exclude the prompt from insights. Default: false.
  • redact (boolean): If true, redact sensitive data from the prompt and response. Default: false.
  • outputFormat ("markdown" | "slack"): Output format, e.g. Slack-flavored Markdown. Default: markdown.
  • threadId (string): If provided, the messages will be tracked as part of the same thread in the insights.
  • conversationId (string): Deprecated (replaced by threadId). If provided, the messages will be tracked as part of the same conversation in the insights.
  • threadMetadata (object): An arbitrary JSON payload to attach to a thread, available in the insights.
  • conversationMetadata (object): Deprecated (replaced by threadMetadata). An arbitrary JSON payload to attach to a conversation, available in the insights.
  • temperature (number): The model temperature. Default: 0.1.
  • topP (number): The model top P. Default: 1.
  • frequencyPenalty (number): The model frequency penalty. Default: 0.
  • presencePenalty (number): The model presence penalty. Default: 0.
  • maxTokens (number): The maximum number of tokens to include in the response.
  • maxContextTokens (number): The maximum number of tokens to include as part of the context. Default: 10000. Note that this value is automatically adjusted to fit within the context window allowed by the model.
  • sectionsMatchCount (number): The number of sections to include in the prompt context.
  • sectionsMatchThreshold (number): The similarity threshold between the input question and the selected sections. The higher the threshold, the more relevant the sections. If it is too high, some relevant sections may be missed.
  • tools (OpenAI.ChatCompletionTool[]): A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.
  • tool_choice (OpenAI.ChatCompletionToolChoiceOption): Controls which (if any) function is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. Specifying a particular function via {"type": "function", "function": {"name": "my_function"}} forces the model to call that function. none is the default when no functions are present; auto is the default when functions are present.
  • debug (boolean): If set to true, the response contains additional metadata, such as the full prompt, for debugging or other purposes.
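
For instance, a request body that forces a function call via tools and tool_choice might look as follows. This is an illustrative sketch: the get_weather function name and its parameter schema are hypothetical, not part of the API.

```javascript
// Sketch of a request body using `tools` and `tool_choice`.
// The function definition follows the OpenAI function-calling schema;
// the function name and parameters are hypothetical.
const body = {
  projectKey: 'YOUR-PROJECT-KEY',
  messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get the current weather for a city.',
        parameters: {
          type: 'object',
          properties: { city: { type: 'string' } },
          required: ['city'],
        },
      },
    },
  ],
  // Force the model to call get_weather instead of generating a message:
  tool_choice: { type: 'function', function: { name: 'get_weather' } },
};
```

Omitting tool_choice here would be equivalent to auto, letting the model decide between answering directly and calling the function.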

Example request

curl \
  -X POST \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  -H "X-Markprompt-API-Version: 2024-03-23" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is Markprompt?"
      },
      {
        "role": "assistant",
        "content": "Markprompt is ..."
      },
      {
        "role": "user",
        "content": "Explain this to me as if I was a 3 year old."
      }
    ],
    "model": "gpt-4"
  }'


By default, the response is returned as a ReadableStream of the form:

So imagine a robot that can answer all the questions you have...

In addition to the stream, the response includes a header named x-markprompt-data, which is an encoded (Uint8Array) JSON object of the form:

{
  references: [
    reference1,
    reference2,
    ...
  ],
  threadId: "...",
  messageId: "...",
  debugInfo: { ... }
}

It consists of the following:

  • The references (see below) used to generate the answer.
  • A messageId, which is a unique ID representing the response message. It can be used to subsequently attach metadata to the message, such as a CSAT score, via the /messages API.
  • A threadId, which is a unique ID that can be passed to subsequent requests and represents a multi-message thread. It can be used to subsequently attach metadata to the thread, such as user account info, via the /threads API.
  • If the debug parameter is set to true, a debugInfo object containing information about the query, such as the full prompt that was built for the query.

The reference object

A reference is an object of the form:

type FileSectionReference = {
  file: {
    title?: string;
    path: string;
    meta?: any;
    source: Source;
  };
  meta?: {
    leadHeading?: {
      id?: string;
      depth?: number;
      value?: string;
      slug?: string;
    };
  };
};

and is meant to provide enough information for the client to be able to generate descriptive links to cited sources, including section slugs.
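
For instance, a client could build such links as follows. The getHref and getLabel helpers are hypothetical: they assume your site serves the indexed files at their path and that section anchors match the heading slug.

```javascript
// Sketch: turning a FileSectionReference into a descriptive link.
// getHref and getLabel are hypothetical helpers, shown here only to
// illustrate how the reference fields can be combined.
function getHref(reference) {
  const { file, meta } = reference;
  const slug = meta?.leadHeading?.slug;
  // Link to the section anchor when a lead heading slug is available.
  return slug ? `${file.path}#${slug}` : file.path;
}

function getLabel(reference) {
  // Prefer the section heading, then the file title, then the path.
  return (
    reference.meta?.leadHeading?.value ??
    reference.file.title ??
    reference.file.path
  );
}

const reference = {
  file: {
    title: 'Getting started',
    path: '/docs/getting-started',
    source: { type: 'github' },
  },
  meta: { leadHeading: { value: 'Installation', slug: 'installation' } },
};

// getHref(reference) → '/docs/getting-started#installation'
// getLabel(reference) → 'Installation'
```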

Parsing the header

Here is some example code in JavaScript to decode the references from the response header:

const res = await fetch('', {
  /* ... */
});

// JSON payload
const encodedPayload = res.headers.get('x-markprompt-data');
const headerArray = new Uint8Array(encodedPayload.split(',').map(Number));
const decoder = new TextDecoder();
const decodedValue = decoder.decode(headerArray);
const payload = JSON.parse(decodedValue);
// ...

If the stream flag is set to false, the response is returned as a plain JSON object with a text field containing the completion, and a references field containing the list of references used to create the completion:

{
  "text": "Completion response...",
  "references": [reference1, reference2, ...]
}

where references are objects of the form described above.
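
A client might post-process this JSON payload as follows. The handleCompletion helper and the sample payload are illustrative only: they mirror the text and references fields documented above, not an additional API.

```javascript
// Sketch: post-processing a non-streaming completion response.
// handleCompletion is a hypothetical helper; the payload shape follows
// the "text" / "references" fields described above.
function handleCompletion(payload) {
  const { text, references } = payload;
  // Build a short citation list from the referenced file paths.
  const citations = references.map((ref) => ref.file.path);
  return { text, citations };
}

// Sample payload, as returned when `stream` is set to false:
const payload = {
  text: 'Completion response...',
  references: [
    { file: { path: '/docs/what-is-markprompt', source: { type: 'github' } } },
  ],
};

const { text, citations } = handleCompletion(payload);
// citations: ['/docs/what-is-markprompt']
```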

When querying chat completions, do not use the bearer token if the code is exposed publicly, for instance on a public website. Instead, use the project production key, and make the request from a whitelisted domain. Both the production key and the domain whitelist are managed in the project settings.

Here is a working example of how to consume the stream in JavaScript. Note the use of projectKey and no authorization header: this code can be shared publicly, and will work from a domain you have whitelisted in the project settings.

const res = await fetch('', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-Markprompt-API-Version': '2024-03-23',
  },
  body: JSON.stringify({
    messages: [
      {
        role: 'user',
        content: 'What is Markprompt?',
      },
    ],
    projectKey: 'YOUR-PROJECT-KEY',
    model: 'gpt-4',
  }),
});

if (!res.ok || !res.body) {
  console.error('Error:', await res.text());
  return;
}

// JSON payload
const encodedPayload = res.headers.get('x-markprompt-data');
const headerArray = new Uint8Array(encodedPayload.split(',').map(Number));
const decoder = new TextDecoder();
const decodedValue = decoder.decode(headerArray);
const { references } = JSON.parse(decodedValue);

const reader = res.body.getReader();
let response = '';

while (true) {
  const { value, done } = await reader.read();
  const chunk = decoder.decode(value);
  response = response + chunk;
  if (done) {
    break;
  }
}

console.log('Answer:', response);