Builtin
Storables
Labox provides a handful of built-in storables, most notably for dataclasses.
Dataclasses
Labox provides a base StorableDataclass class that can be added to a dataclass to make
it storable.
Dataclass Usage
Start by inheriting from the StorableDataclass class and declaring a
class_id. Then define your dataclass as normal:
from dataclasses import dataclass
from dataclasses import field
from datetime import datetime

from labox.builtin import StorableDataclass


@dataclass
class ExperimentData(StorableDataclass, class_id="..."):
    description: str
    started_at: datetime
    results: list[list[int]]
Each field's metadata can be used to specify an explicit serializer and/or storage.
from dataclasses import dataclass
from dataclasses import field
from datetime import datetime

from labox.builtin import CsvSerializer
from labox.builtin import FileStorage
from labox.builtin import StorableDataclass


@dataclass
class ExperimentData(StorableDataclass, class_id="..."):
    description: str
    # automatic serializer
    started_at: datetime
    # explicit serializer and storage
    results: list[list[int]] = field(
        metadata={
            "serializer": CsvSerializer,
            "storage": FileStorage,
        }
    )
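To illustrate the mechanism this relies on, here is a minimal sketch of how per-field metadata can be read back with the standard library. The `FakeCsvSerializer` and `FakeFileStorage` classes and the `explicit_settings` helper are hypothetical stand-ins, and this is not necessarily how Labox itself inspects fields:

```python
from dataclasses import dataclass, field, fields


class FakeCsvSerializer:
    """Hypothetical stand-in for a serializer class."""


class FakeFileStorage:
    """Hypothetical stand-in for a storage class."""


@dataclass
class Example:
    description: str
    results: list[list[int]] = field(
        default_factory=list,
        metadata={"serializer": FakeCsvSerializer, "storage": FakeFileStorage},
    )


def explicit_settings(cls) -> dict[str, dict[str, type]]:
    """Map each field name to the serializer/storage named in its metadata."""
    settings = {}
    for f in fields(cls):
        # Fields declared without metadata expose an empty read-only mapping.
        chosen = {k: f.metadata[k] for k in ("serializer", "storage") if k in f.metadata}
        if chosen:
            settings[f.name] = chosen
    return settings


settings = explicit_settings(Example)
```

Only `results` appears in `settings`, since `description` declares no metadata and would fall back to whatever defaults apply.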
Dataclass Unpacker
A StorableDataclass is saved under one or more
ContentRecords. The majority of the
dataclass is stored in a "body" record, which is nominally a JSON serializable object.
The serializer and storage for this body record can be customized by overriding the
storable_body_serializer
and/or
storable_body_storage
methods in the dataclass itself. Fields within the dataclass are stored within this same
body record unless the field declares a storage, or declares a serializer that is a
StreamSerializer. In either case,
those fields are captured in separate ContentRecords.
To understand how this works in practice, here's what the dataclass's unpacker would output for the class above:
from datetime import UTC
from pprint import pprint

from labox.core import Registry

exp_data = ExperimentData(
    description="My experiment",
    started_at=datetime.now(UTC),
    results=[[1, 2, 3], [4, 5, 6]],
)
unpacker = ExperimentData.storable_config().unpacker
registry = Registry(modules=["labox.builtin"], default_storage=True)
unpacked_obj = unpacker.unpack_object(exp_data, registry)
pprint(unpacked_obj)
{
    "body": {
        "serializer": JsonSerializer("labox.json.value@v1"),
        "storage": MemoryStorage("labox.memory@v1"),
        "value": {
            "__labox__": "dataclass",
            "class_id": "...",
            "class_name": "__main__.ExperimentData",
            "fields": {
                "description": "My experiment",
                "results": {"__labox__": "ref", "ref": "ref/results"},
                "started_at": {
                    "__labox__": "content",
                    "content_base64": "MjAyNS0wOC0yMlQxNTowNzo1Ni4zODA3ODIrMDA6MDA=",
                    "content_encoding": "utf-8",
                    "content_type": "application/text",
                    "serializer_name": "labox.datetime.iso8601@v1",
                },
            },
        },
    },
    "ref/results": {
        "serializer": CsvSerializer("labox.csv@v1"),
        "storage": FileStorage("labox.file@v1"),
        "value": [[1, 2, 3], [4, 5, 6]],
    },
}
Each item in the resulting dict is an
UnpackedValue that would correspond to a
ContentRecord in the database. As indicated
earlier, because the results field of the dataclass declared a dedicated storage,
it is stored separately from the main body record.
Within the body record, the dataclass has been dumped into a JSON-serializable
dictionary containing information about the class as well as its fields. Special
__labox__ keys within this dictionary store metadata about how each object
and/or field was dumped. Notably, the body contains a reference to ref/results,
the key under which the results field was unpacked separately.
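As a rough illustration of how such references could be followed when reading the data back, the sketch below walks a trimmed copy of the output above and swaps each `{"__labox__": "ref"}` marker for the value stored under the referenced key. The `resolve_refs` helper is hypothetical and only mirrors the shape of the printed output; it is not Labox's packer:

```python
from typing import Any

# Data copied (and trimmed) from the unpacker output shown above.
unpacked = {
    "body": {
        "value": {
            "__labox__": "dataclass",
            "fields": {
                "description": "My experiment",
                "results": {"__labox__": "ref", "ref": "ref/results"},
            },
        },
    },
    "ref/results": {"value": [[1, 2, 3], [4, 5, 6]]},
}


def resolve_refs(node: Any, records: dict[str, dict[str, Any]]) -> Any:
    """Recursively replace ref markers with the value they point to."""
    if isinstance(node, dict):
        if node.get("__labox__") == "ref":
            return records[node["ref"]]["value"]
        return {key: resolve_refs(val, records) for key, val in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, records) for item in node]
    return node


resolved_fields = resolve_refs(unpacked["body"]["value"]["fields"], unpacked)
```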
Serialized fields are embedded within the main body record to avoid sending a large
number of small chunks of data to storage backends. For cloud storage backends, a
smaller number of larger requests tends to be more efficient.
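The embedded started_at entry from the output above can be decoded by hand with the standard library, which shows what the embedded form carries. This sketch assumes that `content_encoding` names the text encoding of the decoded bytes; the real deserialization path goes through the serializer named in `serializer_name`:

```python
import base64
from datetime import datetime, timezone

# Copied from the "started_at" entry in the unpacker output above.
embedded = {
    "__labox__": "content",
    "content_base64": "MjAyNS0wOC0yMlQxNTowNzo1Ni4zODA3ODIrMDA6MDA=",
    "content_encoding": "utf-8",
    "content_type": "application/text",
    "serializer_name": "labox.datetime.iso8601@v1",
}

# Undo the base64 wrapping, then decode the bytes as text.
raw_bytes = base64.b64decode(embedded["content_base64"])
text = raw_bytes.decode(embedded["content_encoding"])

# The text is an ISO 8601 timestamp that the stdlib can parse directly.
started_at = datetime.fromisoformat(text)
```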
Simple Values
If all you need to do is store a single value, you can do so using the
StorableValue class. In addition to
the value you want to save, you can manually specify its
serializer and storage:
from labox.builtin import JsonSerializer
from labox.builtin import StorableValue
storable = StorableValue(value="Hello, World!", serializer=JsonSerializer)
Simple Streams
If you want to store a stream of values instead, you can use the
StorableStream class. It works
much like StorableValue but is
designed for storing an asynchronous stream of values. You can also specify a serializer
and storage:
from labox.builtin import JsonStreamSerializer
from labox.builtin import StorableStream
storable = StorableStream(value="Hello, World!", serializer=JsonStreamSerializer)
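For readers unfamiliar with the term, the sketch below shows what an asynchronous stream of values looks like in plain Python: an async generator that yields one item at a time. The `generate_values` and `collect` names are hypothetical, and whether StorableStream accepts an async generator in exactly this form is an assumption not confirmed by this page:

```python
import asyncio
from collections.abc import AsyncIterator


async def generate_values() -> AsyncIterator[dict[str, int]]:
    """Hypothetical producer that yields one value at a time."""
    for step in range(3):
        await asyncio.sleep(0)  # hand control back to the event loop between items
        yield {"step": step}


async def collect(stream: AsyncIterator[dict[str, int]]) -> list[dict[str, int]]:
    """Drain the stream into a list, as any async consumer could."""
    return [item async for item in stream]


collected = asyncio.run(collect(generate_values()))
```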
Serializers
Labox provides built-in serializers for various stdlib data types.
JSON
Both JsonSerializer and
JsonStreamSerializer implementations
are available. Either can be configured with an optional
JSONEncoder and/or JSONDecoder.
import json

from labox.builtin import JsonSerializer
from labox.builtin import json_serializer
from labox.builtin import json_stream_serializer

custom_json_serializer = JsonSerializer(
    encoder=json.JSONEncoder(indent=2),
    decoder=json.JSONDecoder(object_hook=lambda d: d),
)
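As a sketch of what a non-trivial encoder/decoder pair might look like, the standard-library example below round-trips `date` values through a tagged dictionary. The `DateEncoder` class, the `decode_date` hook, and the `__date__` tag are all hypothetical; instances like these are the kind of objects the `encoder` and `decoder` arguments above would receive:

```python
import json
from datetime import date


class DateEncoder(json.JSONEncoder):
    """Hypothetical encoder that writes dates as tagged ISO strings."""

    def default(self, o):
        if isinstance(o, date):
            return {"__date__": o.isoformat()}
        return super().default(o)


def decode_date(obj: dict):
    """object_hook that reverses the tagging done by DateEncoder."""
    if set(obj) == {"__date__"}:
        return date.fromisoformat(obj["__date__"])
    return obj


encoder = DateEncoder(sort_keys=True)
decoder = json.JSONDecoder(object_hook=decode_date)

encoded = encoder.encode({"day": date(2025, 8, 22), "n": 1})
decoded = decoder.decode(encoded)
```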
CSV
A CsvSerializer implementation is
available. It can be configured with
CsvOptions that are similar to those passed to
csv.writer and csv.reader. Unlike with those functions, however, you
cannot pass a csv.Dialect directly. Instead, you may pass a dialect
name as a string. For custom dialects, first use
csv.register_dialect to register the dialect under a name you choose, and
then pass that name to the serializer.
from labox.builtin import CsvSerializer
from labox.builtin import csv_serializer
unix_csv_serializer = CsvSerializer(dialect="unix")
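The registration step uses only the standard library. The sketch below registers a pipe-delimited dialect under the arbitrary name "pipes" and checks that it can be looked up by name with csv.writer and csv.reader:

```python
import csv
import io

# Register a custom dialect under a name of our choosing ("pipes" is arbitrary).
csv.register_dialect("pipes", delimiter="|", lineterminator="\n", quoting=csv.QUOTE_MINIMAL)

# Write two rows using the dialect name rather than a Dialect object.
buffer = io.StringIO()
writer = csv.writer(buffer, dialect="pipes")
writer.writerows([[1, 2, 3], [4, 5, 6]])
written = buffer.getvalue()

# Read them back the same way; csv.reader returns every cell as a string.
rows = list(csv.reader(io.StringIO(written), dialect="pipes"))
```

Following the description above, the same name would then be passed to the serializer as `CsvSerializer(dialect="pipes")`.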
Datetime
Labox provides a basic
Iso8601Serializer for serializing
datetime objects to ISO 8601 strings.
Storages
A few built-in storage implementations are available in Labox.
Database Storage
The database storage is a built-in storage that saves content under the JSON (JSONB for
PostgreSQL) storage_config column of a
ContentRecord. This storage is best used
when the content is small and needs to be leveraged when querying the database.
Because the content is held in a JSON (or JSONB) column, this storage is limited to
saving JSON data. To enforce this, the storage checks that the content_type is set to
application/json or the same with an extension (e.g. application/json+x-labox). This
storage also enforces a maximum size for the content it saves, since direct storage in
the database is not recommended for large artifacts. By default, the maximum size is 100 KB
with a warning at 10 KB. You can configure these thresholds by passing a warn_size
and/or error_size to the
DatabaseStorage constructor.
from labox.builtin import DatabaseStorage
from labox.builtin import database_storage

db_storage = DatabaseStorage(
    warn_size=100 * 1024,  # 100 KB
    error_size=1000 * 1024,  # 1000 KB (about 1 MB)
)
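The two checks described above can be sketched in a few lines. This is an illustration of the rules only, not DatabaseStorage's actual code: the function names are hypothetical, and the byte values chosen for the 10 KB and 100 KB defaults are an assumption:

```python
import warnings

WARN_SIZE = 10 * 1024  # assumed byte value for the documented 10 KB warning
ERROR_SIZE = 100 * 1024  # assumed byte value for the documented 100 KB limit


def is_json_content_type(content_type: str) -> bool:
    """Accept application/json, with or without a "+extension" suffix."""
    return content_type == "application/json" or content_type.startswith("application/json+")


def check_content(content_type: str, size: int, warn_size: int = WARN_SIZE, error_size: int = ERROR_SIZE) -> None:
    """Raise for non-JSON or oversized content, and warn when it is getting large."""
    if not is_json_content_type(content_type):
        raise ValueError(f"expected a JSON content type, got {content_type!r}")
    if size > error_size:
        raise ValueError(f"content is {size} bytes, above the limit of {error_size}")
    if size > warn_size:
        warnings.warn(f"content is {size} bytes, above the warning size of {warn_size}", stacklevel=2)


check_content("application/json+x-labox", 1024)  # small JSON content passes silently
```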
File System
A file-based storage is available through the
FileStorage class. This storage saves
content to the file system under a directory. The default instance of this storage
(file_storage) uses a temporary directory that is deleted when the process exits,
making it suitable for testing purposes.
from labox.builtin import FileStorage
from labox.builtin import file_storage
my_file_storage = FileStorage("/path/to/storage")
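For tests that want the same throwaway behavior with a shorter lifetime, a temporary directory from the standard library is one option. The sketch below only demonstrates the cleanup; passing the resulting path to `FileStorage` in place of "/path/to/storage" would follow the constructor call shown above, and how the default instance manages its own directory is not covered here:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Anything written under root lives only as long as the context manager.
    target = root / "example.bin"
    target.write_bytes(b"payload")
    existed_inside = target.exists()

# The directory and its contents are removed when the block exits.
exists_after = root.exists()
```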
Memory Storage
A memory based storage is available through the
MemoryStorage class. This storage saves
content in memory and is best suited for testing purposes.