Arcus Documentation

Trialing

A core part of using Arcus Model Enrichment is running trials. The Arcus Data Exchange matches your first-party data and model to the data candidates that are most likely to improve your model’s performance. Candidates are selected for data quality, compatibility, and relevance to your application.

During a trial, you validate each candidate by seeing how it performs empirically in your application, so you know the value of each data candidate for your task. While trialing, Arcus runs training loops for your model enriched with each data candidate and measures the model performance metrics that matter to your application, such as validation accuracy and loss. Based on these results, you evaluate the impact of each data candidate on your model’s performance and choose the best candidate for your application. The results of your trial are visible in the Arcus Platform and through the Arcus Model SDK.

Let’s walk through how to run a trial with the Arcus Model SDK. Before we get started, you should create a model enrichment project on the Arcus platform (request early access here) and have your Project ID and API Key ready.

Configuration

First, you configure your environment to connect the Model SDK to your Arcus Project. To do this, you wrap your existing model with an Arcus Model object. This Model object makes no changes to your original model, but configures it to consume external data which will be used during the trial.

Let’s look at an example using PyTorch. In a few lines of code, we’ll initialize an Arcus Config object and use it to wrap our existing model with an Arcus Model. The Config object takes your Project ID and API Key, which you can find in the Arcus platform.

import arcus

# Initialize your original PyTorch model
my_model = MyModel()

# Set the Config object
arcus_config = arcus.model.shared.Config(
  api_key='MY_API_KEY',
  project_id='MY_PROJECT_ID',
)

# Wrap the model with an Arcus Model
arcus_model = arcus.model.torch.Model(my_model, arcus_config)
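The example above passes the API Key and Project ID as string literals. If you prefer to keep credentials out of source code, one option is to read them from environment variables. The following is a minimal sketch of that idea; the helper name load_arcus_credentials and the variable names ARCUS_API_KEY and ARCUS_PROJECT_ID are assumptions made for illustration and are not defined by the Arcus SDK.

```python
import os


def load_arcus_credentials(env=None):
    """Return (api_key, project_id) read from environment variables.

    ARCUS_API_KEY and ARCUS_PROJECT_ID are hypothetical names chosen for
    this sketch; use whatever names fit your deployment.
    """
    env = os.environ if env is None else env
    required = ("ARCUS_API_KEY", "ARCUS_PROJECT_ID")
    # Collect every missing or empty variable so the error lists them all.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["ARCUS_API_KEY"], env["ARCUS_PROJECT_ID"]
```

The two returned values can then be passed as api_key and project_id to arcus.model.shared.Config in place of the literals.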

The arcus_model object created above contains your existing model and exposes its full underlying API, with the added ability to combine your first-party data with external data provided by the exchange.

Before running a trial, the arcus_model object will not contain any external data and will behave exactly as the original my_model object. This is because it has not yet been matched with any data candidates from the exchange. At the start of our trial, the exchange matches the model and first-party data to external data candidates, which are provided to the arcus_model to train and validate the model. The trialing process will then run training and validation loops to evaluate the performance of each data candidate.
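To build intuition for how a wrapper can expose the full API of the object it wraps while holding no external data until a trial starts, here is a toy sketch in plain Python. It is not the Arcus Model class and says nothing about how Arcus implements wrapping; EnrichedWrapper, TinyModel, and attach_external_data are hypothetical names used only for illustration.

```python
class EnrichedWrapper:
    """Toy stand-in for a wrapping object; NOT the Arcus Model class."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self._external_data = None  # nothing has been matched yet

    def attach_external_data(self, data):
        # Hypothetical hook standing in for "the exchange provided data".
        self._external_data = data

    @property
    def has_external_data(self):
        return self._external_data is not None

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails, so any
        # attribute the wrapper does not define is forwarded unchanged.
        wrapped = object.__getattribute__(self, "_wrapped")
        return getattr(wrapped, name)


class TinyModel:
    """Placeholder for an existing model."""

    def predict(self, x):
        return 2 * x


model = TinyModel()
wrapped = EnrichedWrapper(model)
print(wrapped.predict(3), wrapped.has_external_data)  # prints: 6 False
```

Because every unknown attribute is forwarded, code written against the original object keeps working against the wrapper, which mirrors the behavior described above for arcus_model before a trial.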

Running a Trial

Now that you’ve configured the model, you are ready to run a trial with Arcus. If you use PyTorch Lightning to train the model, the original training code might look like the following:

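import pytorch_lightning as pl

# Wrap the original model in your LightningModule
my_lightning_module = MyLightningModule(my_model)

my_trainer = pl.Trainer()

# Train and validate on first-party data only
my_trainer.fit(
  my_lightning_module,
  train_dataloader,
  val_dataloader
)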

This snippet uses a PyTorch Lightning Trainer to train the model. The trainer runs over the first-party data contained in the train_dataloader and val_dataloader objects and trains the model using the MyLightningModule class, which contains the model’s training and validation loops.

Now, using the wrapped arcus_model, you can modify the three statements above to run a trial with Arcus. During this trial, the exchange matches the model and first-party data to external data candidates, which are passed to the arcus_model to train and validate the model. The metrics reported in the MyLightningModule class are posted to Arcus, where they are used to evaluate and select the best data candidate.

arcus_module = MyLightningModule(arcus_model)

arcus_trainer = arcus.model.torch.Trainer()

trial = arcus_trainer.trial(
  arcus_module,
  train_dataloader,
  val_dataloader
)

The arcus.model.torch.Trainer class implements all of the existing functionality of the PyTorch Lightning Trainer class, plus the functionality needed to connect to the exchange and run a trial. This makes it easy to modify the original code to enrich the model with external data. You run a trial by calling arcus_trainer.trial().

Under the hood, trial() performs the following steps:

  1. The model connects to the Arcus Data Exchange to retrieve the matched data candidates.
  2. For each matched data candidate:
    • The Arcus Trainer merges the external data candidate with your first-party data and then trains and validates the enriched model.
    • During this process, the Arcus Trainer reports validation metrics back to Arcus, where they are used to evaluate and select the best data candidate.
    • For best results, MyLightningModule should report at least one of validation loss or validation accuracy using self.log().
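The per-candidate loop described in the steps above can be pictured with a small plain-Python sketch. This is a conceptual illustration only and not what trial() actually executes: run_toy_trial, the row and candidate layouts, and fake_train_and_validate are all hypothetical, and the "training" is a placeholder that returns a made-up loss.

```python
def run_toy_trial(first_party_rows, candidates, train_and_validate):
    """Evaluate each candidate and return {candidate_name: metrics}.

    `candidates` maps a name to extra columns keyed by row id.
    `train_and_validate` is any callable that takes the merged rows and
    returns a metrics dict such as {"val_loss": 0.4}.
    """
    results = {}
    for name, extra_by_id in candidates.items():
        # Merge: add the candidate's columns to each first-party row.
        merged = [
            {**row, **extra_by_id.get(row["id"], {})}
            for row in first_party_rows
        ]
        # Train and validate on the enriched rows, then record the metrics.
        results[name] = train_and_validate(merged)
    return results


def fake_train_and_validate(merged_rows):
    # Placeholder, not real training: more fields give a lower pretend loss.
    n_fields = sum(len(row) for row in merged_rows)
    return {"val_loss": round(1.0 / n_fields, 3)}


rows = [{"id": 1, "x": 0.5}, {"id": 2, "x": 1.5}]
candidates = {
    "candidate_a": {1: {"income": 40}, 2: {"income": 55}},
    "candidate_b": {1: {"region": "north"}},
}
trial_results = run_toy_trial(rows, candidates, fake_train_and_validate)
print(trial_results)
```

Each candidate is evaluated against the same first-party rows, so the recorded metrics are directly comparable across candidates.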

What’s Next: Evaluating and Selecting a Data Candidate

Once you run a trial, you can quantify empirically the impact each data candidate has on the model’s performance. You can use the metrics recorded during the trial to evaluate and select a data candidate for use in your ML application’s training and serving workflows.
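As a rough illustration of that selection step, the sketch below picks the candidate with the lowest validation loss and compares it with a baseline trained on first-party data alone. It does not use the Arcus SDK: the metrics dictionary is written by hand, and select_best_candidate, baseline_val_loss, and min_improvement are hypothetical names introduced for this example.

```python
def select_best_candidate(metrics_by_candidate, baseline_val_loss, min_improvement=0.0):
    """Return (name, improvement) for the lowest val_loss.

    Returns (None, 0.0) when there are no candidates or when the best one
    does not beat the baseline by more than `min_improvement`.
    """
    if not metrics_by_candidate:
        return None, 0.0
    best_name, best_loss = min(
        ((name, m["val_loss"]) for name, m in metrics_by_candidate.items()),
        key=lambda pair: pair[1],
    )
    improvement = baseline_val_loss - best_loss
    if improvement <= min_improvement:
        return None, 0.0
    return best_name, improvement


# Hand-written example values, not real trial results.
recorded = {
    "candidate_a": {"val_loss": 0.42},
    "candidate_b": {"val_loss": 0.47},
}
name, gain = select_best_candidate(recorded, baseline_val_loss=0.50)
print(f"best={name} improvement={gain:.2f}")
```

Comparing against a baseline guards against selecting a candidate that is the best of the set but still no better than training without external data.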