pyarrow
Classes:
- ArrowRecordBatchStreamSerializer – Serialize a stream of PyArrow record batches to the Arrow stream format.
- ArrowTableSerializer – Serialize a PyArrow table to the Arrow file format.
- ParquetReadOptions – Constructor arguments for ParquetFile.
- ParquetRecordBatchStreamSerializer – Serialize a stream of PyArrow record batches to the Parquet file format.
- ParquetTableSerializer – Serialize a PyArrow table to the Parquet file format.
- ParquetWriteOptions – Constructor arguments for ParquetWriter.
Attributes:
- arrow_record_batch_stream_serializer – ArrowRecordBatchStreamSerializer with default settings.
- arrow_table_serializer – ArrowTableSerializer with default settings.
- parquet_record_batch_stream_serializer – ParquetRecordBatchStreamSerializer with default settings.
- parquet_table_serializer – ParquetTableSerializer with default settings.
arrow_record_batch_stream_serializer
module-attribute
arrow_record_batch_stream_serializer = (
    ArrowRecordBatchStreamSerializer()
)
ArrowRecordBatchStreamSerializer with default settings.
arrow_table_serializer
module-attribute
arrow_table_serializer = ArrowTableSerializer()
ArrowTableSerializer with default settings.
parquet_record_batch_stream_serializer
module-attribute
parquet_record_batch_stream_serializer = (
    ParquetRecordBatchStreamSerializer()
)
ParquetRecordBatchStreamSerializer with default settings.
parquet_table_serializer
module-attribute
parquet_table_serializer = ParquetTableSerializer()
ParquetTableSerializer with default settings.
ArrowRecordBatchStreamSerializer
ArrowRecordBatchStreamSerializer(
    *,
    write_options: IpcWriteOptions | None = None,
    read_options: IpcReadOptions | None = None
)
Bases: _ArrowTableBase, StreamSerializer[RecordBatch]
Serialize a stream of PyArrow record batches to the Arrow stream format.
Methods:
- deserialize_config – Deserialize the configuration from a JSON string.
- deserialize_data_stream – Deserialize the given stream of Arrow record batches.
- serialize_config – Serialize the configuration to a JSON string.
- serialize_data_stream – Serialize the given stream of Arrow record batches.
Attributes:
- content_types (tuple[str, ...]) – The content types that the serializer uses.
content_types
class-attribute
instance-attribute
The content types that the serializer uses.
Used to get serializers by content type in the registry.
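As a rough illustration of what a lookup by content type can look like, here is a minimal sketch. The `Registry` and `DummySerializer` classes, their method names, and the content type string are hypothetical stand-ins chosen for this example; they are not taken from this module.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DummySerializer:
    """Hypothetical stand-in for a serializer; only content_types matters here."""

    name: str
    content_types: tuple[str, ...]


@dataclass
class Registry:
    """Hypothetical registry that indexes serializers by content type."""

    _by_content_type: dict[str, DummySerializer] = field(default_factory=dict)

    def register(self, serializer: DummySerializer) -> None:
        # Index the serializer under every content type it declares.
        for content_type in serializer.content_types:
            self._by_content_type[content_type] = serializer

    def get_by_content_type(self, content_type: str) -> DummySerializer:
        # Raises KeyError if no serializer declared this content type.
        return self._by_content_type[content_type]


registry = Registry()
registry.register(
    DummySerializer("arrow-stream", ("application/vnd.apache.arrow.stream",))
)
found = registry.get_by_content_type("application/vnd.apache.arrow.stream")
```

Declaring `content_types` as a tuple lets one serializer be found under several content types.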
deserialize_config
deserialize_config(config: str) -> C
Deserialize the configuration from a JSON string.
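The shape of the configuration type `C` is not spelled out here. The sketch below only illustrates the general idea of round-tripping a configuration through a JSON string; the `IpcConfig` fields and both helper functions are assumptions made for this example, not the library's actual configuration.

```python
import json
from typing import Optional, TypedDict


class IpcConfig(TypedDict):
    """Hypothetical configuration shape, invented for illustration."""

    compression: Optional[str]
    use_threads: bool


def serialize_config(config: IpcConfig) -> str:
    # Sorting keys keeps the JSON text stable across runs.
    return json.dumps(config, sort_keys=True)


def deserialize_config(text: str) -> IpcConfig:
    raw = json.loads(text)
    # Rebuild the typed mapping explicitly so missing keys fail loudly.
    return IpcConfig(compression=raw["compression"], use_threads=raw["use_threads"])


original: IpcConfig = {"compression": "zstd", "use_threads": True}
text = serialize_config(original)
restored = deserialize_config(text)
```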
deserialize_data_stream
deserialize_data_stream(
    content: SerializedDataStream,
) -> AsyncGenerator[RecordBatch]
Deserialize the given stream of Arrow record batches.
serialize_data_stream
serialize_data_stream(
    stream: AsyncIterable[RecordBatch],
) -> SerializedDataStream
Serialize the given stream of Arrow record batches.
ArrowTableSerializer
ArrowTableSerializer(
    *,
    write_options: IpcWriteOptions | None = None,
    read_options: IpcReadOptions | None = None
)
Bases: _ArrowTableBase, Serializer[Table]
Serialize a PyArrow table to the Arrow file format.
Methods:
- deserialize_config – Deserialize the configuration from a JSON string.
- deserialize_data – Deserialize the given Arrow table.
- serialize_config – Serialize the configuration to a JSON string.
- serialize_data – Serialize the given Arrow table.
Attributes:
- content_types (tuple[str, ...]) – The content types that the serializer uses.
content_types
class-attribute
instance-attribute
The content types that the serializer uses.
Used to get serializers by content type in the registry.
deserialize_config
deserialize_config(config: str) -> C
Deserialize the configuration from a JSON string.
deserialize_data
deserialize_data(content: SerializedData) -> Table
Deserialize the given Arrow table.
ParquetReadOptions
Constructor arguments for ParquetFile.
ParquetRecordBatchStreamSerializer
ParquetRecordBatchStreamSerializer(
    *,
    write_options: ParquetWriteOptions | None = None,
    write_option_extras: Mapping[str, Any] | None = None,
    read_options: ParquetReadOptions | None = None
)
Bases: StreamSerializer[RecordBatch]
Serialize a stream of PyArrow record batches to the Parquet file format.
Methods:
- deserialize_config – Deserialize the configuration from a JSON string.
- deserialize_data_stream – Deserialize the given stream of Arrow record batches.
- serialize_config – Serialize the configuration to a JSON string.
- serialize_data_stream – Serialize the given stream of Arrow record batches.
Attributes:
- content_types (tuple[str, ...]) – The content types that the serializer uses.
content_types
class-attribute
instance-attribute
The content types that the serializer uses.
Used to get serializers by content type in the registry.
deserialize_config
deserialize_config(config: str) -> C
Deserialize the configuration from a JSON string.
deserialize_data_stream
deserialize_data_stream(
    content: SerializedDataStream,
) -> AsyncGenerator[RecordBatch]
Deserialize the given stream of Arrow record batches.
serialize_data_stream
serialize_data_stream(
    stream: AsyncIterable[RecordBatch],
) -> SerializedDataStream
Serialize the given stream of Arrow record batches.
ParquetTableSerializer
ParquetTableSerializer(
    *,
    write_options: ParquetWriteOptions | None = None,
    write_option_extras: Mapping[str, Any] | None = None,
    read_options: ParquetReadOptions | None = None
)
Bases: Serializer[Table]
Serialize a PyArrow table to the Parquet file format.
Methods:
- deserialize_config – Deserialize the configuration from a JSON string.
- deserialize_data – Deserialize the given Arrow table.
- serialize_config – Serialize the configuration to a JSON string.
- serialize_data – Serialize the given Arrow table.
Attributes:
- content_types (tuple[str, ...]) – The content types that the serializer uses.
content_types
class-attribute
instance-attribute
The content types that the serializer uses.
Used to get serializers by content type in the registry.
deserialize_config
deserialize_config(config: str) -> C
Deserialize the configuration from a JSON string.
deserialize_data
deserialize_data(content: SerializedData) -> Table
Deserialize the given Arrow table.