Large Language Models (Text)
Arcus Prompt Enrichment enriches prompts provided to Large Language Models (LLMs) with external data matched by the Arcus Data Exchange. The exchange automatically matches your prompt to high-value external data and signals, then composes and injects this context into your LLM prompt. This grounds your LLM in real-world data and helps prevent hallucinations.
Arcus Prompt Enrichment uses Data Augmented Generation, a technique that first retrieves relevant external data from the exchange and then composes and injects this context into your LLM alongside your original prompt. Arcus Prompt Enrichment also returns a summary of the context that was provided to your LLM, giving you transparency into the data used to enrich your prompt.
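To make the retrieve-then-compose flow concrete, here is a minimal sketch. It is an illustration only and not how Arcus implements matching: the `retrieve` and `compose_prompt` helpers, the keyword-overlap scoring, and the in-memory `EXCHANGE` list are all hypothetical stand-ins for the hosted exchange.

```python
import re

# Hypothetical stand-in for external data; the real exchange is a hosted service.
EXCHANGE = [
    "The Houston Astros won the 2022 World Series in six games.",
    "GPT-4 is a large language model developed by OpenAI.",
    "The Philadelphia Phillies lost the 2022 World Series.",
]


def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(prompt, documents, top_k=2):
    """Return up to top_k documents that share the most words with the prompt."""
    query = tokenize(prompt)
    scored = [(len(query & tokenize(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated documents
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def compose_prompt(prompt, context):
    """Place the retrieved context ahead of the original prompt."""
    if not context:
        return prompt
    bullet_list = "\n".join(f"- {item}" for item in context)
    return f"Context:\n{bullet_list}\n\nQuestion: {prompt}"


question = "Who won the latest world series?"
enriched = compose_prompt(question, retrieve(question, EXCHANGE))
print(enriched)
```

The enriched string, rather than the bare question, is what would be sent to the model.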
Let’s walk through how to use Arcus Prompt Enrichment for LLMs using the Arcus Prompt SDK. Before we get started, you should create a prompt enrichment project on the Arcus platform (request early access here) and have your Project ID and API Key ready.
First, you configure your environment to connect the Prompt SDK to your Arcus Project and LLM provider. To do this, you wrap your LLM connection with an Arcus LLM object.
Let’s look at an example using OpenAI GPT-4. We’ll configure our environment by wrapping our OpenAI connection with an Arcus OpenAI object. This object connects to the exchange to discover and compose relevant external data to enrich our prompts, and it also handles calling the OpenAI model to generate text completions. Your OpenAI API Key should be specified in the OPENAI_API_KEY environment variable.
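Since the OpenAI key is read from the environment, it can help to check that it is set before creating any objects. The sketch below uses only the Python standard library; `require_env` is a hypothetical helper name and is not part of the Arcus Prompt SDK.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it before creating the LLM object."
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` at the top of your script surfaces a missing key immediately, instead of at the first generation request.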
In a few lines of code, you can initialize an Arcus Config object and use it to wrap your OpenAI connection. The Config object takes your Arcus API Key and Project ID, which you can create in the Arcus platform.
# Import implied by the module path used below
import arcus.prompt.text

# Set the Config object
arcus_config = arcus.prompt.text.Config(
    api_key='MY_API_KEY',
    project_id='MY_PROJECT_ID',
)

# Initialize an Arcus OpenAI object
llm = arcus.prompt.text.OpenAI(
    model_id='gpt-4',
    config=arcus_config,
)
This new llm object takes all of the relevant arguments that you use to control text generation with OpenAI, such as temperature and frequency_penalty. You can reference the OpenAI API Reference for a full list of supported parameters; Arcus’ OpenAI module transparently supports the complete OpenAI API. The llm object will retrieve relevant external data from the exchange and query OpenAI to generate text completions.
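This section does not show where these arguments are passed, so the sketch below only prepares and validates them. The ranges are the ones documented in the OpenAI API Reference; `check_generation_params` is a hypothetical helper and not part of the Arcus Prompt SDK.

```python
# Allowed ranges as documented in the OpenAI API Reference.
PARAM_RANGES = {
    "temperature": (0.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def check_generation_params(**params):
    """Raise ValueError if a known parameter is out of range; return the params unchanged."""
    for name, value in params.items():
        if name in PARAM_RANGES:
            low, high = PARAM_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return params


generation_params = check_generation_params(temperature=0.2, frequency_penalty=0.5)
print(generation_params)
```

Parameters not listed in `PARAM_RANGES` pass through unchecked, so the helper does not restrict which OpenAI options you can use.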
Now that you’ve configured your connection to Arcus, you are ready to enrich text completions using OpenAI models with external data matched by the exchange.
Let’s look at an example below.
prompt = "Who won the latest world series?"
response = llm.generate(prompt)
When you call llm.generate(prompt), the llm object performs the following steps:
- Queries the Arcus Data Exchange to discover high-value data and signals for your given prompt. Arcus’ matching algorithms rank external data candidates on the exchange by their inherent quality and their relevance to your given task.
- Retrieves this external data and composes it into your prompt as context. This allows GPT-4 to consume the matched context when generating the text completion.
- Queries OpenAI to generate the text completion using your now enriched prompt.
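The three steps above can be sketched as a small pipeline. This is a simplified illustration under stated assumptions: the `Candidate` class, the 0 to 1 scores, and ranking by the product of quality and relevance are hypothetical, since this section says only that candidates are ranked by both. The `echo_model` function stands in for the OpenAI call so the sketch runs offline.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    quality: float    # hypothetical 0-1 score for inherent data quality
    relevance: float  # hypothetical 0-1 score for relevance to the prompt


def rank(candidates, top_k=1):
    """Order candidates by quality * relevance, highest first, and keep top_k."""
    ordered = sorted(candidates, key=lambda c: c.quality * c.relevance, reverse=True)
    return ordered[:top_k]


def generate(prompt, candidates, complete):
    """Rank candidates, compose the enriched prompt, then call the model."""
    selected = rank(candidates)                                  # step 1: discover and rank
    context = "\n".join(c.text for c in selected)                # step 2: compose context
    enriched = f"{context}\n\n{prompt}" if context else prompt
    return complete(enriched)                                    # step 3: query the model


def echo_model(enriched_prompt):
    """Stand-in for the OpenAI call; returns the prompt it received."""
    return enriched_prompt


candidates = [
    Candidate("Astros won the 2022 World Series.", quality=0.9, relevance=0.8),
    Candidate("Unverified forum rumor.", quality=0.2, relevance=0.9),
]
result = generate("Who won the latest world series?", candidates, echo_model)
print(result)
```

Passing the model call in as a function keeps the ranking and composition logic testable without network access.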
llm.generate(prompt) returns an Arcus LLMOutput object, which stores the generated response as well as summary information about the context provided by the exchange. You can access the generated response as follows.
>>> print(response.get_generation())
According to the information you provided, the Houston Astros won the latest World Series in 2022, defeating
the Philadelphia Phillies in six games, with Dusty Baker as their manager.
The Arcus Prompt SDK also provides summary information about the context that was provided to your LLM. This helps you understand what context the exchange found most valuable and relevant to your prompt.
You can access this information using the get_context_summary() method on your LLMOutput object.
>>> print(response.get_context_summary())
The context lists the winners and the corresponding details of the World Series from 2018 to 2022. It
includes the teams, their managers, the scores, and the number of games played.
In this instance, the exchange provided valuable context in the form of live ground-truth data. This up-to-date context kept your LLM from hallucinating and allowed it to answer with information from after its training cutoff (i.e., data that it has never seen before).
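If you want to keep the generation and its context summary together, for example in logs, a small helper can format both. The sketch below relies only on the two methods named above, get_generation() and get_context_summary(). The `describe` helper and the `FakeOutput` class are hypothetical; `FakeOutput` exists only so the example runs without the SDK.

```python
def describe(output):
    """Format the generation together with the summary of the context behind it."""
    return (
        f"Generation: {output.get_generation()}\n"
        f"Context used: {output.get_context_summary()}"
    )


class FakeOutput:
    """Hypothetical stand-in for LLMOutput so the sketch runs without the SDK."""

    def get_generation(self):
        return "The Houston Astros won in 2022."

    def get_context_summary(self):
        return "World Series winners from 2018 to 2022."


report = describe(FakeOutput())
print(report)
```

In real use, you would pass the response returned by llm.generate(prompt) in place of `FakeOutput()`.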