Pydantic
Note
Install with pip install labox[pydantic]
The Pydantic integration for Labox provides a way to make your Pydantic models storable.
Basic Usage
Start by inheriting from the StorableModel class and declaring a
class_id. Then define your model as normal:
from datetime import datetime

from labox.extra.pydantic import StorableModel


class ExperimentData(StorableModel, class_id="..."):
    description: str
    started_at: datetime
    results: list[list[int]]
Storable Specs
You can define custom types that are annotated with a StorableSpec, which describes how
and where to store values.
from datetime import datetime
from typing import Annotated

from labox.builtin import CsvSerializer
from labox.builtin import FileStorage
from labox.extra.pydantic import StorableModel
from labox.extra.pydantic import StorableSpec


class ExperimentData(StorableModel, class_id="..."):
    description: str
    started_at: datetime
    results: Annotated[
        list[list[int]],
        StorableSpec(serializer=CsvSerializer, storage=FileStorage),
    ]
You can make these custom types generic with a TypeVar:
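from datetime import datetime
from typing import Annotated
from typing import TypeVar

from labox.builtin import CsvSerializer
from labox.builtin import FileStorage
from labox.extra.pydantic import StorableModel
from labox.extra.pydantic import StorableSpec

T = TypeVar("T")

SaveAsCsv = Annotated[T, StorableSpec(serializer=CsvSerializer)]
SaveInFile = Annotated[T, StorableSpec(storage=FileStorage)]


class ExperimentData(StorableModel, class_id="..."):
    description: str
    started_at: datetime
    results: SaveInFile[SaveAsCsv[list[list[int]]]]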
Storable specs can be applied to values nested within a field. For example, you might
have a list of images that you want to attach to your model. You could define a custom
type for the images and add the StorableSpec to it:
from datetime import datetime
from typing import Annotated

from labox.extra.imageio import Media
from labox.extra.imageio import MediaSerializer
from labox.extra.pydantic import StorableModel
from labox.extra.pydantic import StorableSpec


class ExperimentData(StorableModel, class_id="..."):
    description: str
    started_at: datetime
    images: list[Annotated[Media, StorableSpec(serializer=MediaSerializer)]]
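A spec nested inside a container type stays discoverable because Annotated metadata survives as a type argument of the container. The following is a minimal sketch of how such metadata could be found by walking a type's arguments. It uses only the standard library; Spec, Experiment, and find_specs are hypothetical names for illustration and are not part of Labox or Pydantic.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for StorableSpec (not the Labox class)."""

    serializer: str


class Experiment:
    """Plain class used only for its annotations."""

    description: str
    images: list[Annotated[bytes, Spec(serializer="media")]]


def find_specs(tp: object) -> list[Spec]:
    """Collect Spec metadata from a type and, recursively, its type arguments."""
    found: list[Spec] = []
    if get_origin(tp) is Annotated:
        inner, *metadata = get_args(tp)
        found.extend(m for m in metadata if isinstance(m, Spec))
        found.extend(find_specs(inner))
    else:
        for arg in get_args(tp):
            found.extend(find_specs(arg))
    return found


# include_extras=True keeps the Annotated wrappers instead of stripping them.
hints = get_type_hints(Experiment, include_extras=True)
specs_by_field = {name: find_specs(tp) for name, tp in hints.items()}
```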
Pydantic Unpacker
A StorableModel is saved under one or more ContentRecords. The majority of the model
is stored in a "body" record, which is nominally a JSON-serializable object. The
serializer and storage for this body record can be customized by overriding the
storable_body_serializer and/or storable_body_storage methods in the model class itself.
Fields within the model are stored within this same body record unless the field declares
a storage, or declares a serializer that is a StreamSerializer. In either case, those
fields are captured in separate ContentRecords.
To understand how this works in practice, here's what the unpacker would output for the model below:
from datetime import UTC
from datetime import datetime
from pprint import pprint
from typing import Annotated

from plotly import graph_objs as go

from labox.builtin import CsvSerializer
from labox.builtin import FileStorage
from labox.core import Registry
from labox.extra.plotly import FigureSerializer
from labox.extra.pydantic import StorableModel
from labox.extra.pydantic import StorableSpec


class ExperimentData(StorableModel, class_id="b2138434"):
    description: str
    started_at: datetime
    results: Annotated[
        list[list[int]],
        StorableSpec(serializer=CsvSerializer, storage=FileStorage),
    ]
    figure: Annotated[
        go.Figure,
        StorableSpec(serializer=FigureSerializer),
    ]


exp_data = ExperimentData(
    description="My experiment",
    started_at=datetime.now(UTC),
    results=[[1, 2, 3], [4, 5, 6]],
    figure=go.Figure(data=go.Scatter(y=[1, 3, 2])),
)

unpacker = ExperimentData.storable_config().unpacker
registry = Registry(
    modules=["labox.builtin", "labox.extra.pydantic", "labox.extra.plotly"], default_storage=True
)
unpacked_obj = unpacker.unpack_object(exp_data, registry)
pprint(unpacked_obj)
{
    "body": {
        "serializer": JsonSerializer("labox.json.value@v1"),
        "storage": MemoryStorage("labox.memory@v1"),
        "value": {
            "description": "My experiment",
            "figure": {
                "__labox__": "content",
                "content_base64": "eyJkYXRhIjpbeyJ5IjpbMSwzLDJ...",
                "content_encoding": "utf-8",
                "content_type": "application/vnd.plotly.v1+json",
                "serializer_name": "labox.plotly.value@v1",
            },
            "results": {"__labox__": "ref", "ref": "ref.ExperimentData.results.1"},
            "started_at": "2025-08-24T17:11:37.164923Z",
        },
    },
    "ref.ExperimentData.results.1": {
        "serializer": CsvSerializer("labox.csv@v1"),
        "storage": FileStorage("labox.file@v1"),
        "value": [[1, 2, 3], [4, 5, 6]],
    },
}
Each item in the resulting dict is an UnpackedValue that corresponds to a ContentRecord
in the database. As noted earlier, because the results field of the model declared a
dedicated storage, it is stored separately from the main body record.
Within the body record, the model has been dumped into a JSON-serializable dictionary
containing information about the class as well as its fields. Special __labox__ keys
within this dictionary store metadata about how each object or field was dumped. Notably,
the body contains a reference to ref.ExperimentData.results.1, which was unpacked
separately.
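To make the reference mechanism concrete, the sketch below resolves "ref" markers in a body against the other unpacked entries. It works on plain dicts that mirror a simplified part of the output above. Real UnpackedValue objects are not plain dicts, and resolve_refs is a hypothetical helper written for illustration, not a Labox function.

```python
from typing import Any

# Simplified mirror of the unpacker output: only the "value" parts are kept.
unpacked: dict[str, dict[str, Any]] = {
    "body": {
        "value": {
            "description": "My experiment",
            "results": {"__labox__": "ref", "ref": "ref.ExperimentData.results.1"},
        },
    },
    "ref.ExperimentData.results.1": {"value": [[1, 2, 3], [4, 5, 6]]},
}


def resolve_refs(node: Any, records: dict[str, dict[str, Any]]) -> Any:
    """Replace every {"__labox__": "ref", ...} marker with the referenced value."""
    if isinstance(node, dict):
        if node.get("__labox__") == "ref":
            return records[node["ref"]]["value"]
        return {key: resolve_refs(val, records) for key, val in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, records) for item in node]
    return node


resolved = resolve_refs(unpacked["body"]["value"], unpacked)
```

After this pass, resolved["results"] holds the nested list again, which is the kind of substitution a loader would need to perform before rebuilding the model.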
Fields like ExperimentData.figure with custom non-stream serializers are embedded
within the main body record to avoid sending a large number of small chunks of data
to storage backends. For cloud storage backends, a smaller number of larger requests
tends to be more efficient.
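Embedding works because serialized bytes can be carried inside JSON as base64 text. The sketch below builds an entry with the same key names as the "figure" entry above and decodes it again, using only the standard library. The payload is made up for illustration, and the assumption that Labox encodes and decodes embedded content in exactly this way is not confirmed by the text above.

```python
import base64
import json

# Made-up serialized payload standing in for a figure's JSON representation.
figure_json = json.dumps({"data": [{"y": [1, 3, 2]}]})

# Embed the payload as base64 text so the whole body stays JSON-serializable.
embedded = {
    "__labox__": "content",
    "content_base64": base64.b64encode(figure_json.encode("utf-8")).decode("ascii"),
    "content_encoding": "utf-8",
    "content_type": "application/vnd.plotly.v1+json",
    "serializer_name": "labox.plotly.value@v1",
}

# The body, including the embedded content, can be dumped as ordinary JSON.
body_text = json.dumps({"figure": embedded})

# Reverse the steps: base64 text -> bytes -> text -> parsed JSON.
raw = base64.b64decode(embedded["content_base64"])
decoded = json.loads(raw.decode(embedded["content_encoding"]))
```

The trade-off is size: base64 inflates the payload by roughly a third, which is acceptable for small values but is one reason large or streamed values are better kept in their own records.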