How CModel maps Python models to C layouts

CModel combines two concerns that are usually kept apart:

Python-side data validation
binary layout for C-compatible data

That combination is why the API feels small: a single model class does both jobs.

A CModel is still a Pydantic model

CModel subclasses are normal Pydantic models first. Field validation, nested models, and model construction all happen in the usual Pydantic way.

The binary behavior is added by inspecting the Pydantic core schema and deriving a C layout from it.

That has two important consequences:

You describe your data with Python types, not by manually building a binary schema.
The binary layout only exists for shapes that CModel knows how to map from the Pydantic schema.

Field metadata defines the wire format

When you use aliases from cmodel.types, or an explicit CFormat, you are attaching binary format metadata to an otherwise ordinary Python type.

For example, Int means “validate this as an integer, but pack it with the i struct format”. Annotated[tuple[int, int, int], c_int(3)] means “validate this as a tuple of three integers, and pack it as three consecutive int values”.
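To see what those two layouts look like on the wire, here is a minimal sketch that uses only the standard library struct module, not CModel itself. The < prefix (little-endian, standard sizes) is an assumption chosen so the output is the same on every platform.

```python
import struct

# "i" is the struct code for a C int. "<" selects little-endian, standard sizes.
single = struct.pack("<i", 7)

# A repeat count packs several values of the same type back to back,
# which is what three consecutive int values amount to.
triple = struct.pack("<3i", 1, 2, 3)

print(single.hex())                  # 07000000
print(len(triple))                   # 12
print(struct.unpack("<3i", triple))  # (1, 2, 3)
```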

For variable-length arrays, Annotated[tuple[T, ...], CCountedBy(...)] ties the array length to another field in the same struct, while plain tuple[T, ...] defines an unbounded trailing array that reads to the end of the buffer.
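The difference between the two array styles can be sketched with plain struct calls. The helper names, the uint8 count field, and the uint16 element type below are all hypothetical; this shows the idea, not how CModel implements it.

```python
import struct


def unpack_counted(buf: bytes) -> tuple[int, tuple[int, ...]]:
    """Read a uint8 count, then exactly that many little-endian uint16 items."""
    (count,) = struct.unpack_from("<B", buf, 0)
    items = struct.unpack_from(f"<{count}H", buf, 1)
    return count, items


def unpack_trailing(buf: bytes, offset: int) -> tuple[int, ...]:
    """Read little-endian uint16 items from offset to the end of the buffer."""
    remaining = len(buf) - offset
    n, leftover = divmod(remaining, struct.calcsize("<H"))
    if leftover:
        raise ValueError("trailing bytes do not form a whole element")
    return struct.unpack_from(f"<{n}H", buf, offset)


counted = struct.pack("<B3H", 3, 10, 20, 30)
print(unpack_counted(counted))       # (3, (10, 20, 30))

trailing = struct.pack("<B2H", 0xFF, 5, 6)
print(unpack_trailing(trailing, 1))  # (5, 6)
```

The counted form can sit anywhere in a struct because its length is known before it is read. The trailing form only makes sense as the last field, since nothing after it could be located.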

This split is why the models stay readable. Python types communicate intent. Format metadata communicates layout.
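The following sketch shows the general mechanism of carrying layout metadata on a Python type with typing.Annotated. The Fmt class and the Sample class are hypothetical stand-ins, not CModel's CFormat or a CModel subclass, and the < prefix is an assumption.

```python
import struct
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Fmt:
    """Hypothetical format marker holding a struct format code."""
    code: str


class Sample:
    # The Python type states intent; the Fmt metadata states layout.
    count: Annotated[int, Fmt("i")]
    triple: Annotated[tuple[int, int, int], Fmt("3i")]


# include_extras=True keeps the Annotated metadata instead of stripping it.
hints = get_type_hints(Sample, include_extras=True)
fmt = "<" + "".join(hint.__metadata__[0].code for hint in hints.values())

print(fmt)                   # <i3i
print(struct.calcsize(fmt))  # 16
```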

CFormat is intentionally limited to one underlying data type per field format string. Formats such as BBB are fine, but mixed-type formats such as Bh are not. When one logical value needs mixed field types, represent it as a tuple or nested model instead.
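As an illustration of the single-type rule, the check below accepts BBB and rejects Bh. The function is a hypothetical helper written for this page, not part of CModel, and it ignores byte-order prefixes.

```python
import re


def is_single_type_format(fmt: str) -> bool:
    """Return True if every type code in a struct-style format is the same.

    Repeat counts such as "3B" are allowed; byte-order prefixes are not handled.
    """
    codes = re.findall(r"\d*([A-Za-z?])", fmt)
    return len(codes) > 0 and len(set(codes)) == 1


print(is_single_type_format("BBB"))  # True
print(is_single_type_format("3B"))   # True
print(is_single_type_format("Bh"))   # False
```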

Nested models become nested structs

Nested CModel subclasses map directly onto nested C structs. That is not just a convenience for code organization; it is how layout structure is preserved.

If your source layout has a nested header and payload, model them as separate classes. The resulting documentation and generated reference will reflect that structure too.
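A rough sketch of the header-and-payload idea, using dataclasses and struct in place of CModel. The field names and the little-endian, unpadded layouts are assumptions made for the example.

```python
import struct
from dataclasses import dataclass

# Hypothetical layouts: little-endian, no padding.
HEADER = struct.Struct("<HH")   # uint16 version, uint16 payload_size
PAYLOAD = struct.Struct("<Ii")  # uint32 sensor_id, int32 reading


@dataclass
class Header:
    version: int
    payload_size: int


@dataclass
class Payload:
    sensor_id: int
    reading: int


@dataclass
class Packet:
    header: Header
    payload: Payload

    def pack(self) -> bytes:
        # Each nested struct's bytes sit inline, in declaration order.
        return (
            HEADER.pack(self.header.version, self.header.payload_size)
            + PAYLOAD.pack(self.payload.sensor_id, self.payload.reading)
        )

    @classmethod
    def unpack(cls, buf: bytes) -> "Packet":
        header = Header(*HEADER.unpack_from(buf, 0))
        payload = Payload(*PAYLOAD.unpack_from(buf, HEADER.size))
        return cls(header, payload)


packet = Packet(Header(1, PAYLOAD.size), Payload(42, -7))
raw = packet.pack()
print(len(raw))                      # 12
print(Packet.unpack(raw) == packet)  # True
```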

Tagged unions select a variant by a tag field

When a field is annotated as a union of CModel subclasses with a Pydantic Discriminator, CModel treats it as a tagged union. On unpack, the tag field is read first to determine which variant layout to use. The selected variant is then unpacked from its own first byte, tag included, and the surrounding fields continue from where the variant ends.

All variants must share the same tag field name and format. The tag field uses a Literal type to pin each variant to one or more values. This mirrors the C pattern of a union inside a struct where one member indicates which union field is active.
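The read-the-tag-then-unpack flow can be sketched with struct alone. The two variants, their tag values, and the trailing uint16 field are hypothetical, and the sketch assumes the tag is the first field of every variant.

```python
import struct

TAG = struct.Struct("<B")  # every variant starts with the same uint8 tag

# Hypothetical variants, keyed by the tag value that selects them.
VARIANTS = {
    1: struct.Struct("<Bh"),  # tag, then an int16
    2: struct.Struct("<BI"),  # tag, then a uint32
}


def unpack_tagged(buf: bytes, offset: int = 0) -> tuple[tuple, int]:
    """Return the variant's fields and the offset where the next field starts."""
    (tag,) = TAG.unpack_from(buf, offset)     # peek at the tag first
    layout = VARIANTS[tag]                    # KeyError for an unknown tag
    fields = layout.unpack_from(buf, offset)  # re-read from the variant's start
    return fields, offset + layout.size


# A tag-1 variant followed by a surrounding uint16 field.
buf = struct.pack("<Bh", 1, -5) + struct.pack("<H", 0xBEEF)
fields, next_offset = unpack_tagged(buf)
(trailer,) = struct.unpack_from("<H", buf, next_offset)

print(fields)        # (1, -5)
print(next_offset)   # 3
print(hex(trailer))  # 0xbeef
```

Because the variants differ in size, the offset of the following field is only known after the tag has been read.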

Alignment is a struct-level decision

CFormat describes a field. c_alignment describes how a struct is laid out.

By default, CModel computes a struct alignment from the fields it contains. When you set c_alignment to 1, you are saying that this struct should be packed with no alignment padding.

This separation matters because layout bugs often come from conflating field format with struct packing. A field can still be an Int; what changes is where the next field is allowed to begin.
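To make that concrete, the sketch below computes field offsets for one set of fields twice, once with natural alignment and once packed. It is a simplified model of common C layout rules written for this page, not CModel's algorithm, and the field sizes and alignments are assumptions.

```python
def layout(fields, packed=False):
    """Return (offsets, total_size) for fields given as (name, size, alignment)."""
    offsets = {}
    offset = 0
    struct_align = 1
    for name, size, align in fields:
        if packed:
            align = 1  # packing changes placement, not the field's own format
        # Round the offset up to the next multiple of the field's alignment.
        offset = (offset + align - 1) // align * align
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, align)
    # Pad the tail so consecutive structs keep every field aligned.
    total = (offset + struct_align - 1) // struct_align * struct_align
    return offsets, total


fields = [("flag", 1, 1), ("count", 4, 4), ("kind", 2, 2)]
print(layout(fields))               # ({'flag': 0, 'count': 4, 'kind': 8}, 12)
print(layout(fields, packed=True))  # ({'flag': 0, 'count': 1, 'kind': 5}, 7)
```

The 4-byte count field is the same size in both runs. Only its starting offset moves, from 4 to 1.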

Byte order and data type sizes are set at class definition time

The byte order and data type sizes of a struct are determined by c_endian_type and c_size_type, which are set as keyword arguments on the class definition. These two values are combined into a struct format prefix that is baked into the schema when the class is created. This means the binary layout of a model is fully determined by its class definition, not by arguments passed at pack or unpack time.
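The general pattern of fixing a format prefix through class keyword arguments can be sketched with __init_subclass__. The Layout base class and its prefix and fields keywords are hypothetical; they stand in for the idea and are not CModel's c_endian_type or c_size_type.

```python
import struct


class Layout:
    """Hypothetical base class for illustration; not CModel's implementation."""

    fmt = None  # replaced on every subclass

    def __init_subclass__(cls, prefix="<", fields="", **kwargs):
        super().__init_subclass__(**kwargs)
        # The prefix is combined with the field codes once, at class creation.
        cls.fmt = struct.Struct(prefix + fields)


class LittleEndianPair(Layout, prefix="<", fields="HI"):
    pass


class BigEndianPair(Layout, prefix=">", fields="HI"):
    pass


print(LittleEndianPair.fmt.pack(1, 2).hex())  # 010002000000
print(BigEndianPair.fmt.pack(1, 2).hex())     # 000100000002
```

Nothing is passed at pack time, so two call sites can never disagree about byte order for the same class.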

CModel prefers explicitness over ABI magic

CModel is good at describing fixed layouts that you can reason about directly. It is not trying to infer every compiler- and platform-specific ABI rule for you.

That tradeoff is useful:

models stay small and inspectable
the binary contract lives next to the Python type definition
mismatches are easier to spot in tests

If you need confidence in a layout, round-trip representative bytes in tests. That is a better fit for this library than hoping the implementation guessed the same rules as a specific compiler.
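A round-trip test can be as small as the sketch below. It uses struct in place of a CModel class, and the sample bytes and field layout are hypothetical. In practice the bytes would come from the C side.

```python
import struct

# Hypothetical sample: uint8 kind, uint16 length, int32 value,
# little-endian with no padding.
KNOWN_BYTES = bytes.fromhex("01" "0400" "feffffff")
FMT = struct.Struct("<BHi")


def test_round_trip():
    fields = FMT.unpack(KNOWN_BYTES)
    # Unpacking known bytes must yield the values the C side wrote.
    assert fields == (1, 4, -2)
    # Packing those values must reproduce the original bytes exactly.
    assert FMT.pack(*fields) == KNOWN_BYTES


test_round_trip()
```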