Skip to main content
You can import datasets previously uploaded to Adaptive directly within your recipes. Datasets stored on Adaptive can be loaded by specifying a parameter of type Dataset from adaptive_harmony.parameters in your recipe’s InputConfig class. You can then load it in your recipe as a list of StringThread objects by calling await dataset.load(ctx).

Load a dataset

First, define your dataset in your recipe’s input config:
from adaptive_harmony.runtime import InputConfig
from adaptive_harmony.parameters import Dataset

class MyConfig(InputConfig):
    dataset: Dataset
To load a dataset from Adaptive, you can use the load method on the dataset:
async def my_recipe(config: MyConfig, ctx: RecipeContext):
    dataset = await config.dataset.load(ctx)
If your dataset contains completions (assistant responses), with_weight_last_assistant_turn() is applied automatically: only the final assistant turn contributes to the training loss. See Turn weights for more details and how to override this behavior.
This utility can also load local files structured in the Adaptive-supported format, which you can leverage if you are testing a recipe locally. Load a local dataset with Dataset(dataset_key="local-file", local_file_path="your_file.jsonl").

Loading from Hugging Face

You can also load datasets directly from Hugging Face in your recipe. adaptive_harmony exposes helper methods to convert arbitrary datasets into a list of StringThread objects by allowing you to specify the column in the original dataset that contains chat messages.
from adaptive_harmony.core.dataset import load_from_hf, convert_sample_dict

def load_hf_dataset():
    # Helper function to convert HF dataset to Adaptive StringThread
    convert_sample_fn = convert_sample_dict(
        turns_key="messages",
        role_key="role",
        content_key="content"
    )

    # Load the dataset
    dataset = load_from_hf(
        "HuggingFaceH4/ultrachat_200k",
        "train_sft",
        convert_sample_fn
    )

    return dataset