
pyarrow

Attributes:

arrow_record_batch_stream_serializer module-attribute

arrow_record_batch_stream_serializer = (
    ArrowRecordBatchStreamSerializer()
)

ArrowRecordBatchStreamSerializer with default settings.

arrow_table_serializer module-attribute

arrow_table_serializer = ArrowTableSerializer()

ArrowTableSerializer with default settings.

parquet_record_batch_stream_serializer module-attribute

parquet_record_batch_stream_serializer = (
    ParquetRecordBatchStreamSerializer()
)

ParquetRecordBatchStreamSerializer with default settings.

parquet_table_serializer module-attribute

parquet_table_serializer = ParquetTableSerializer()

ParquetTableSerializer with default settings.

ArrowRecordBatchStreamSerializer

ArrowRecordBatchStreamSerializer(
    *,
    write_options: IpcWriteOptions | None = None,
    read_options: IpcReadOptions | None = None
)

Bases: _ArrowTableBase, StreamSerializer[RecordBatch]

Serialize a stream of PyArrow record batches to the Arrow IPC streaming format.

Methods:

Attributes:

content_types class-attribute instance-attribute

content_types: tuple[str, ...] = ()

The content types this serializer handles.

The registry uses them to look up serializers by content type.
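The registry itself is not documented on this page. The sketch below only illustrates the general idea of looking up a serializer by one of its declared content types; SerializerRegistry, FakeSerializer, and the media-type strings are hypothetical and are not this library's API. In this sketch, a serializer that keeps the default content_types of () is never reachable by lookup.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeSerializer:
    """Hypothetical stand-in for any serializer that declares content types."""

    name: str
    content_types: tuple[str, ...] = ()


@dataclass
class SerializerRegistry:
    """Hypothetical registry that indexes serializers by content type."""

    by_content_type: dict[str, FakeSerializer] = field(default_factory=dict)

    def register(self, serializer: FakeSerializer) -> None:
        # Index the serializer under every content type it declares; a serializer
        # with an empty content_types tuple adds nothing to the index.
        for content_type in serializer.content_types:
            if content_type in self.by_content_type:
                raise ValueError(f"content type already registered: {content_type}")
            self.by_content_type[content_type] = serializer

    def get(self, content_type: str) -> FakeSerializer:
        # A KeyError means no registered serializer declared this content type.
        return self.by_content_type[content_type]


registry = SerializerRegistry()
registry.register(FakeSerializer("arrow-stream", ("application/vnd.apache.arrow.stream",)))
registry.register(FakeSerializer("arrow-file", ("application/vnd.apache.arrow.file",)))
registry.register(FakeSerializer("unlabelled"))  # content_types == (): nothing indexed
print(registry.get("application/vnd.apache.arrow.file").name)  # arrow-file
```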

deserialize_config

deserialize_config(config: str) -> C

Deserialize the configuration from a JSON string.

deserialize_data_stream

deserialize_data_stream(
    content: SerializedDataStream,
) -> AsyncGenerator[RecordBatch]

Deserialize the given serialized content into a stream of Arrow record batches.

serialize_config

serialize_config(config: C) -> str

Serialize the configuration to a JSON string.

serialize_data_stream

serialize_data_stream(
    stream: AsyncIterable[RecordBatch],
) -> SerializedDataStream

Serialize the given stream of Arrow record batches.

ArrowTableSerializer

ArrowTableSerializer(
    *,
    write_options: IpcWriteOptions | None = None,
    read_options: IpcReadOptions | None = None
)

Bases: _ArrowTableBase, Serializer[Table]

Serialize a PyArrow table to the Arrow IPC file format.

Methods:

Attributes:

content_types class-attribute instance-attribute

content_types: tuple[str, ...] = ()

The content types this serializer handles.

The registry uses them to look up serializers by content type.

deserialize_config

deserialize_config(config: str) -> C

Deserialize the configuration from a JSON string.

deserialize_data

deserialize_data(content: SerializedData) -> Table

Deserialize the given serialized content into an Arrow table.

serialize_config

serialize_config(config: C) -> str

Serialize the configuration to a JSON string.

serialize_data

serialize_data(value: Table) -> SerializedData

Serialize the given Arrow table.

ParquetReadOptions

Bases: TypedDict

Constructor keyword arguments for pyarrow.parquet.ParquetFile.

ParquetRecordBatchStreamSerializer

ParquetRecordBatchStreamSerializer(
    *,
    write_options: ParquetWriteOptions | None = None,
    write_option_extras: Mapping[str, Any] | None = None,
    read_options: ParquetReadOptions | None = None
)

Bases: StreamSerializer[RecordBatch]

Serialize a stream of PyArrow record batches to the Parquet file format.

Methods:

Attributes:

content_types class-attribute instance-attribute

content_types: tuple[str, ...] = ()

The content types this serializer handles.

The registry uses them to look up serializers by content type.

deserialize_config

deserialize_config(config: str) -> C

Deserialize the configuration from a JSON string.

deserialize_data_stream

deserialize_data_stream(
    content: SerializedDataStream,
) -> AsyncGenerator[RecordBatch]

Deserialize the given serialized content into a stream of Arrow record batches.

serialize_config

serialize_config(config: C) -> str

Serialize the configuration to a JSON string.

serialize_data_stream

serialize_data_stream(
    stream: AsyncIterable[RecordBatch],
) -> SerializedDataStream

Serialize the given stream of Arrow record batches.

ParquetTableSerializer

ParquetTableSerializer(
    *,
    write_options: ParquetWriteOptions | None = None,
    write_option_extras: Mapping[str, Any] | None = None,
    read_options: ParquetReadOptions | None = None
)

Bases: Serializer[Table]

Serialize a PyArrow table to the Parquet file format.

Methods:

Attributes:

content_types class-attribute instance-attribute

content_types: tuple[str, ...] = ()

The content types this serializer handles.

The registry uses them to look up serializers by content type.

deserialize_config

deserialize_config(config: str) -> C

Deserialize the configuration from a JSON string.

deserialize_data

deserialize_data(content: SerializedData) -> Table

Deserialize the given serialized content into an Arrow table.

serialize_config

serialize_config(config: C) -> str

Serialize the configuration to a JSON string.

serialize_data

serialize_data(value: Table) -> SerializedData

Serialize the given Arrow table.

ParquetWriteOptions

Bases: TypedDict

Constructor keyword arguments for pyarrow.parquet.ParquetWriter.