Skip to content

Database Schema

Artigraph defines a graph database within SQLAlchemy.

Linkages between nodes are stored in the artigraph_link table whose base set of columns are describe by the OrmLink class:

Column Type Description
id UUID The primary key of the link.
source_id UUID The primary key of the source node.
target_id UUID The primary key of the target node.
label String (nullable) The label of the link.

Note

Labels are not required, but if supplied must be unique for a given source node.

Node

Most data in Artigraph is stored in a single artigraph_node table whose base set of columns are describe by the OrmNode class:

Column Type Description
id UUID The primary key of the node.
node_type String The type of the node (comes from its polymorphic_identity)
node_created_at DateTime The time the node was created.
node_updated_at DateTime The time the node was last updated.

Subclasses of OrmNode utilize single table inheritance to extend the table. As a result of this inheritance strategy, and to avoid name collisions, OrmNode's columns are prefixed with node_. Subclasses of OrmNode ought to do the same. For example, the OrmArtifact class defines its columns with an artifact_ prefix.

Node Inheritance

Here's an example of what a different OrmNode subclass might look like:

import artigraph as ag
from sqlalchemy import UniqueConstraint
from sqlalchemy.declarative import Mapped, mapped_column


class MyOrmNode(ag.OrmNode):
    __mapper_args__ = {"polymorphic_identity": "my_node"}
    __table_args = (UniqueConstraint("node_source_id", "my_node_label"),)
    my_node_label: Mapped[str] = mapped_column(nullable=True)

Note that even though the type annotation on my_node_label is str, the column is marked as nullable. This is because all OrmNode subclasses are stored in the same table and, as such, not all columns will be populated by all rows. Making a column non-nullable will cause other OrmNode class instances to fail to save to the database since they lack a non-nullable column from another subclass.

Under the hood artigraph does several somewhat magical things. First, it inspects the __mapper_args__ for the polymorphic_identity and saves that as a class attribute (e.g. MyNode.polymorphic_identity = "my_node"). Second, since __table_args__ cannot typically be defined on subclasses without a __tablename__, as all subclasses of OrmNode must to use single table inheritance, Artigraph shuttles the __table_args__ to the OrmNode class. Lastly, artigraph looks at any foreign keys and tries to determine what order they should be created in to avoid foreign key constraint violations.

Note

Circular foreign keys are not supported at this time.

Single Table Inheritance

Artigraph uses single table inheritance (STI) to store all data in a single table. This comes with advantages and disadvantages compared to concrete table inheritance. The primary advantage of STI is that the database schema is drastically simplified since there's only one table to manage - queries can avoid joins and thus be more performant. The disadvantages of STI come from a lack of separation - making independent schema changes may be challenging.

It's worth keeping these tradeoffs in mind as you extend Artigraph. The main way to mitigate the disadvantages of STI is to keep the number of OrmNode subclasses to a minimum. Thankfully, the base primitives of Artigraph are powerful enough to support a wide variety of use cases. In general, if you find yourself needing to add a new

Note

Thankfully most modern databases do not suffer from size issues if a table is sparse. For example, in PostgreSQL a null bitmap is used to mark which columns are null for any row with at least one null value. As such, the size of a sparse row is identical to one that is well (but not completely) populated.

Artifact

OrmArtifact is a subclass of OrmNode that defines a set of columns that are shared by all artifacts. It does not contain data or describe where data may be found. Its columns are:

Column Type Description
artifact_serializer String The name of the serializer used to serialize the artifact.

Of note is the artifact_serializer which maps to a serializer by name.

Database Artifact

OrmDatabaseArtifact is a subclass of OrmArtifact that stores data directly in the database. It defines a single column for that purpose:

Column Type Description
database_artifact_data Bytes The data of the artifact.

Model Artifact

OrmModelArtifact is a subclass of DatabaseArtifact that stores the root node of a model.

Column Type Description
model_artifact_type_name str The name of the model type
model_artifact_version int The version of the model

Remote Artifact

OrmRemoteArtifact is a subclass of OrmArtifact that represents an artifact that is stored somewhere else other than the database. Since the data itself is stored elsewhere, all that is stored in the database is a pointer to the artifact. To do this it defines:

Column Type Description
remote_artifact_storage String The name of the storage backend.
remote_artifact_location String The location of the data in in the storage backend.

The remote_artifact_storage column maps to a storage backend by name.

Graph Models

The dataclass-like usage of GraphModel belies the fact that its underlying implementation builds atop database, remote and model artifacts. Under the hood, the hierarchy of a GraphModel and its fields are replicated in the database. So saving a GraphModel like the one below:

import artigraph as ag


@ag.dataclass
class MyDataModel(ag.GraphModel, version=1):
    some_value: int
    inner_model: MyDataModel | None = None


my_data = MyDataModel(some_value=1, inner_model=MyDataModel(some_value=2))
ag.write_one(my_data)

Will result in the following graph being created in the database

graph TB
    m1(["ModelArtifact(type='MyDataModel', version=1)"])
    f1(["DatabaseArtifact(data=1)"])
    m2(["DatabaseArtifact(type='MyDataModel', version=1)"])
    f2(["DatabaseArtifact(data=2)"])
    f3(["DatabaseArtifact(data=None)"])

    m1 --> |some_value| f1
    m1 --> |inner_model| m2
    m2 --> |some_value| f2
    m2 --> |inner_model| f3