Arcus Documentation

Large Language Models (Text)

Arcus Prompt Enrichment enriches prompts provided to Large Language Models (LLMs) with external data matched by the Arcus Data Platform. The platform automatically matches your prompt to high-value external data and signals, then composes and injects that data into your LLM prompt as context, preventing hallucinations and grounding your LLMs in the real world.

Arcus Prompt Enrichment works by using Data Augmented Generation, a technique which first retrieves relevant external data from the platform and then composes and injects this context into your LLM alongside your original prompt. Arcus Prompt Enrichment also returns a summary of the context that was supplied to your LLM, giving you transparency into the data used to enrich your prompt.

LLM Prompt Enrichment

Let’s walk through how to use Arcus Prompt Enrichment for LLMs using the Arcus Prompt SDK. Before we get started, you should create a prompt enrichment project on the Arcus platform (request early access here) and have your Project ID and API Key ready.


First, you configure your environment to connect the Prompt SDK to your Arcus Project and LLM provider. To do this, you wrap your LLM connection with an Arcus LLM object.

Let’s look at an example using OpenAI GPT-4. We’ll configure our environment by wrapping our OpenAI connection with an Arcus OpenAI object, which connects to the platform to discover and compose relevant external data for prompt enrichment, and which also handles calling the OpenAI model to generate text completions. Your OpenAI API Key should be specified in the OPENAI_API_KEY environment variable.
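For example, you could set the environment variable in your shell before running your script (the key value below is a placeholder):

```shell
export OPENAI_API_KEY="sk-your-key-here"
```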

In a few lines of code, you can initialize an Arcus Config object and use it to wrap your OpenAI connection. The Config object takes in your Arcus API Key and Project ID, which you can create in the Arcus UI.

import arcus.prompt.text

# Set the Config object with your Arcus API Key and Project ID
# (keyword names shown here are illustrative; see the Arcus SDK reference)
arcus_config = arcus.prompt.text.Config(
    api_key="<YOUR_ARCUS_API_KEY>",
    project_id="<YOUR_PROJECT_ID>",
)

# Initialize an Arcus OpenAI object that wraps the OpenAI connection
llm = arcus.prompt.text.OpenAI(
    config=arcus_config,
    model="gpt-4",
)

This new llm object accepts all of the arguments you would otherwise use to control text generation with OpenAI, such as temperature and frequency_penalty. You can reference the OpenAI API Reference for a full list of supported parameters; Arcus’ OpenAI module transparently supports the complete OpenAI API. The llm object retrieves relevant external data from the platform and queries OpenAI to generate text completions.
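A wrapper like this typically forwards provider parameters straight through to the underlying API. The sketch below illustrates that pass-through pattern with a stub provider; it is not the Arcus implementation, and the class and names are hypothetical:

```python
class PassThroughLLM:
    """Illustrates how a wrapper can forward provider kwargs unchanged."""

    def __init__(self, provider_call):
        self._provider_call = provider_call

    def generate(self, prompt, **provider_kwargs):
        # Any OpenAI-style parameter (temperature, frequency_penalty, ...)
        # is forwarded to the underlying provider untouched.
        return self._provider_call(prompt=prompt, **provider_kwargs)


# Stub provider that simply echoes back the parameters it received:
llm_stub = PassThroughLLM(lambda **kwargs: kwargs)
result = llm_stub.generate("Hello", temperature=0.2, frequency_penalty=0.5)
```

Because the wrapper never inspects the keyword arguments, any parameter the provider supports is automatically supported by the wrapper.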


Now that you’ve configured your connection to Arcus, you are ready to enrich text completions using OpenAI models with external data matched by the platform.

Let’s look at an example below.

prompt = "Who won the latest world series?"

response = llm.generate(prompt)

When you call llm.generate(prompt), the llm object performs the following steps:

  1. Queries the Arcus Data Platform to discover high-value data and signals for your prompt. Arcus’ matching algorithms rank external data candidates on the platform by their inherent quality and their relevance to your task.
  2. Retrieves this external data and composes it into your prompt as context. This allows GPT-4 to consume the matched context when generating the text completion.
  3. Queries OpenAI to generate the text completion using your now enriched prompt.
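The three steps above can be sketched in plain Python. The retrieval and model calls are stubbed out for illustration; the function and parameter names are hypothetical, not the Arcus API:

```python
def enrich_and_generate(prompt, retrieve, complete):
    # Step 1: discover relevant external data for the prompt.
    snippets = retrieve(prompt)
    # Step 2: compose the matched data into the prompt as grounding context.
    context = "\n".join(f"- {s}" for s in snippets)
    enriched = f"Context:\n{context}\n\nQuestion: {prompt}"
    # Step 3: generate the completion from the enriched prompt.
    return complete(enriched)


# Stubbed retrieval and model call for illustration:
answer = enrich_and_generate(
    "Who won the latest world series?",
    retrieve=lambda p: ["The Houston Astros won the 2022 World Series."],
    complete=lambda enriched_prompt: f"[completion for: {enriched_prompt!r}]",
)
```

The model only ever sees the enriched prompt, so up-to-date retrieved facts are available to it at generation time.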

llm.generate(prompt) returns an Arcus LLMOutput object which stores the generated response as well as summary information about the context provided by the platform. You can access the generated response as follows.

>>> print(response.get_generation())

According to the information you provided, the Houston Astros won the latest World Series in 2022, defeating
the Philadelphia Phillies in six games, with Dusty Baker as their manager.


The Arcus Prompt SDK also surfaces summary information about the context supplied to your LLM. This helps you understand what context the platform found most valuable and relevant to your prompt.

You can access this information using the get_context_summary() method on your LLMOutput object.

>>> print(response.get_context_summary())

The context lists the winners and the corresponding details of the World Series from 2018 to 2022. It
includes the teams, their managers, the scores, and the number of games played.

In this instance, we see that the platform provided valuable context in the form of live ground-truth data. This up-to-date context prevented your LLM from hallucinating and allowed it to answer with information beyond its training cutoff (i.e. data that it has never seen before).