onnx¶
Attributes¶
Classes¶
ONNXRuntime ¶
ONNXRuntime(model_path: str | PathLike[str], device: str | Device, precision: Precision = FP32, warmup_iterations: int = 3, warmup_shape: tuple[int, ...] = (1, 3, 224, 224), providers: list[str] | None = None)
Bases: RuntimeConfigMixin, ONNXRuntimeMixin, BatchableRuntime[ndarray, Any]
ONNX Runtime for model inference (async version).
Asynchronous version of inferflow.runtime.onnx.ONNXRuntime.
All I/O operations (model loading, inference) are executed in a thread pool to avoid blocking the event loop. The API is identical to the sync version, but all methods are async.
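The thread-pool offloading described here can be illustrated with `asyncio.to_thread` (a generic sketch of the pattern, not the library's actual implementation; `blocking_inference` is a hypothetical stand-in for a synchronous `session.run()` call):

```python
import asyncio
import time


def blocking_inference(x: float) -> float:
    # Stands in for a synchronous ONNX Runtime session.run() call.
    time.sleep(0.01)
    return x * 2


async def infer_async(x: float) -> float:
    # Run the blocking call in the default thread pool,
    # so the event loop stays free to serve other tasks.
    return await asyncio.to_thread(blocking_inference, x)


result = asyncio.run(infer_async(21.0))
print(result)  # 42.0
```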
Supports
- ONNX (.onnx) models
- CPU, CUDA execution providers
- FP32, FP16 precision
- Batch inference
- Automatic warmup
Attributes:

| Name | Type | Description |
|---|---|---|
| session | InferenceSession \| None | Loaded ONNX inference session (None before load()). |
| input_name | str \| None | Name of the model's input tensor. |
| output_names | list[str] \| None | Names of the model's output tensors. |
| providers | list[str] | List of execution providers to use. |
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_path | str \| PathLike[str] | Path to ONNX model file. | required |
| device | str \| Device | Device to run inference on (e.g. "cpu", "cuda:0"). | required |
| precision | Precision | Model precision. | FP32 |
| warmup_iterations | int | Number of warmup iterations. | 3 |
| warmup_shape | tuple[int, ...] | Input shape for warmup. | (1, 3, 224, 224) |
| providers | list[str] \| None | ONNX execution providers (None = auto-detect). | None |
Raises:

| Type | Description |
|---|---|
| FileNotFoundError | If model file does not exist. |
| ImportError | If onnxruntime is not installed. |
Example

import inferflow.asyncio as iff
import numpy as np

# Initialize runtime
runtime = iff.ONNXRuntime(
    model_path="model.onnx",
    device="cuda:0",
    precision=iff.Precision.FP16,
)

# Single inference
async with runtime:
    input_array = np.random.randn(1, 3, 224, 224).astype(np.float32)
    output = await runtime.infer(input_array)

# Batch inference
async with runtime:
    batch = [
        np.random.randn(1, 3, 224, 224).astype(np.float32),
        np.random.randn(1, 3, 224, 224).astype(np.float32),
    ]
    outputs = await runtime.infer_batch(batch)
Source code in inferflow/asyncio/runtime/onnx.py
Attributes¶
Functions¶
load async ¶
Load ONNX model and prepare for inference (async).
Performs
- Configure session options
- Load model from disk (in thread pool)
- Extract input/output names
- Warmup inference (in thread pool)
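The warmup step can be sketched generically: run a few dummy inferences so lazy allocations and kernel selection happen before real traffic. This is a hypothetical illustration with a toy callable standing in for the session, not the library's actual code:

```python
import numpy as np


def warmup(run, iterations: int = 3, shape: tuple[int, ...] = (1, 3, 224, 224)) -> int:
    """Run `iterations` dummy inferences through `run`; returns the count performed."""
    dummy = np.zeros(shape, dtype=np.float32)
    for _ in range(iterations):
        run(dummy)  # outputs are discarded; only the warm caches/allocations matter
    return iterations


calls = []
toy_run = lambda x: calls.append(x.shape)  # toy stand-in for session.run
warmup(toy_run)
print(len(calls))  # 3
```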
Raises:

| Type | Description |
|---|---|
| FileNotFoundError | If model file does not exist. |
| RuntimeError | If ONNX Runtime fails to load model. |
Source code in inferflow/asyncio/runtime/onnx.py
infer async ¶
Run inference on a single input (async).
Automatically handles
- Converting to correct dtype
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input | ndarray | Input numpy array. | required |
Returns:

| Type | Description |
|---|---|
| Any | Output array or tuple of arrays (if multi-output model). |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If model is not loaded. |
Example
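The automatic dtype conversion mentioned above can be sketched in plain NumPy (a hypothetical illustration of the behavior, not the library's internal code; `to_model_dtype` is a made-up name):

```python
import numpy as np


def to_model_dtype(x: np.ndarray, dtype=np.float32) -> np.ndarray:
    """Convert input to the dtype the model expects (no copy if already correct)."""
    return np.asarray(x, dtype=dtype)


x = np.random.randn(1, 3, 224, 224)  # np.random.randn returns float64
y = to_model_dtype(x)
print(y.dtype)  # float32
```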
Source code in inferflow/asyncio/runtime/onnx.py
infer_batch async ¶
Run inference on a batch of inputs (async).
Concatenates inputs into a single batch array for efficient processing, then splits the output back into individual results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| inputs | list[ndarray] | List of input arrays. Each should have shape (1, C, H, W). | required |

Returns:

| Type | Description |
|---|---|
| list[Any] | List of outputs, one per input. Each maintains batch dimension. |
Raises:

| Type | Description |
|---|---|
| RuntimeError | If model is not loaded. |
Example
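The concatenate/split strategy described above can be sketched in plain NumPy (a hypothetical illustration with a toy model, not the library's internal code):

```python
import numpy as np


def infer_batch_sketch(inputs: list[np.ndarray], model) -> list[np.ndarray]:
    # Stack the (1, C, H, W) inputs into one (N, C, H, W) batch...
    batch = np.concatenate(inputs, axis=0)
    out = model(batch)  # one forward pass over the whole batch
    # ...then split back into per-input results, each keeping its batch dimension.
    return np.split(out, len(inputs), axis=0)


# Toy "model": global average pool over H and W
model = lambda x: x.mean(axis=(2, 3), keepdims=True)
inputs = [np.ones((1, 3, 4, 4)), 2 * np.ones((1, 3, 4, 4))]
outputs = infer_batch_sketch(inputs, model)
print(len(outputs), outputs[0].shape)  # 2 (1, 3, 1, 1)
```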
Source code in inferflow/asyncio/runtime/onnx.py
unload async ¶
Unload model and free resources (async).
Performs
- Release session from memory
Safe to call multiple times.
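"Safe to call multiple times" means unload is idempotent. A minimal sketch of that pattern (hypothetical class, not the library's code):

```python
import asyncio


class RuntimeSketch:
    def __init__(self):
        self.session = object()  # stands in for a loaded InferenceSession

    async def unload(self) -> None:
        # Dropping the reference releases the session; doing so again is a no-op.
        self.session = None


async def main():
    rt = RuntimeSketch()
    await rt.unload()
    await rt.unload()  # second call is harmless
    return rt.session


print(asyncio.run(main()))  # None
```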