onnx¶
Classes¶
ONNXRuntimeMixin ¶
Shared ONNX runtime logic for sync and async implementations.
This mixin provides common ONNX-specific logic that is shared between synchronous and asynchronous runtime implementations. It handles:
- Execution provider selection (CPU, CUDA)
- Input precision conversion
- Output parsing
- Batch output splitting
This mixin is pure logic with no I/O operations, making it safe to reuse across sync and async implementations.
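The provider-selection behavior listed above can be sketched in isolation. The class below is a simplified stand-in for illustration only, not the actual `ONNXRuntimeMixin` implementation; it assumes the mixin's `_get_onnx_providers` method chooses providers from the device string:

```python
class ProviderSelectionSketch:
    """Simplified stand-in for the mixin's provider selection (illustrative only)."""

    def __init__(self, device: str):
        self.device = device

    def _get_onnx_providers(self) -> list[str]:
        # Prefer CUDA when the device string asks for it,
        # always keeping CPU available as a fallback.
        if str(self.device).startswith("cuda"):
            return ["CUDAExecutionProvider", "CPUExecutionProvider"]
        return ["CPUExecutionProvider"]
```

Because this logic is pure (no I/O, no session objects), the same method can back both the sync and async runtimes.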
Attributes:
| Name | Type | Description |
|---|---|---|
device | Any | Device configuration (provided by subclass). |
precision | Precision | Precision configuration (provided by subclass). |
Example

```python
# In sync runtime
class ONNXRuntime(ONNXRuntimeMixin, RuntimeConfigMixin, BatchableRuntime):
    def load(self):
        providers = self._get_onnx_providers()  # Use mixin
        # ...


# In async runtime
class ONNXRuntime(ONNXRuntimeMixin, RuntimeConfigMixin, BatchableRuntime):
    async def load(self):
        providers = self._get_onnx_providers()  # Same mixin!
        # ...
```
ONNXRuntime ¶
ONNXRuntime(model_path: str | PathLike[str], device: str | Device, precision: Precision = FP32, warmup_iterations: int = 3, warmup_shape: tuple[int, ...] = (1, 3, 224, 224), providers: list[str] | None = None)
Bases: RuntimeConfigMixin, ONNXRuntimeMixin, BatchableRuntime[ndarray, Any]
ONNX Runtime for model inference (sync version).
Supports
- ONNX (.onnx) models
- CPU, CUDA execution providers
- FP32, FP16 precision
- Batch inference
- Automatic warmup
This is the synchronous version. For async support, see inferflow.asyncio.runtime.onnx.ONNXRuntime.
Attributes:
| Name | Type | Description |
|---|---|---|
session | InferenceSession | None | Loaded ONNX inference session (None before load()). |
input_name | str | None | Name of the model's input tensor. |
output_names | list[str] | None | Names of the model's output tensors. |
providers | list[str] | List of execution providers to use. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model_path | str | PathLike[str] | Path to ONNX model file. | required |
device | str | Device | Device to run inference on, e.g. "cpu" or "cuda:0". | required |
precision | Precision | Model precision (default: FP32). | FP32 |
warmup_iterations | int | Number of warmup iterations (default: 3). | 3 |
warmup_shape | tuple[int, ...] | Input shape for warmup (default: (1, 3, 224, 224)). | (1, 3, 224, 224) |
providers | list[str] | None | ONNX execution providers (default: auto-detect). | None |
Raises:
| Type | Description |
|---|---|
FileNotFoundError | If model file does not exist. |
ImportError | If onnxruntime is not installed. |
Example
```python
import inferflow as iff
import numpy as np

# Initialize runtime
runtime = iff.ONNXRuntime(
    model_path="model.onnx",
    device="cuda:0",
    precision=iff.Precision.FP16,
)

# Single inference
with runtime:
    input_array = np.random.randn(1, 3, 224, 224).astype(np.float32)
    output = runtime.infer(input_array)

# Batch inference
with runtime:
    batch = [
        np.random.randn(1, 3, 224, 224).astype(np.float32),
        np.random.randn(1, 3, 224, 224).astype(np.float32),
    ]
    outputs = runtime.infer_batch(batch)
```
Source code in inferflow/runtime/onnx.py
Functions¶
load ¶
Load ONNX model and prepare for inference.
Performs
- Configure session options
- Load model from disk
- Extract input/output names
- Warmup inference
Raises:
| Type | Description |
|---|---|
FileNotFoundError | If model file does not exist. |
RuntimeError | If ONNX Runtime fails to load model. |
Source code in inferflow/runtime/onnx.py
infer ¶
Run inference on a single input.
Automatically handles
- Converting to correct dtype
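The dtype conversion can be illustrated with plain NumPy. `to_model_dtype` below is a hypothetical helper, not part of the inferflow API; it shows the idea of casting inputs to match the session's expected precision:

```python
import numpy as np


def to_model_dtype(arr: np.ndarray, fp16: bool) -> np.ndarray:
    # Hypothetical helper: cast the input to the dtype the model
    # expects (float16 for FP16 sessions, float32 otherwise).
    # Returns the array unchanged if it already matches.
    target = np.float16 if fp16 else np.float32
    return arr if arr.dtype == target else arr.astype(target)
```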
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input | ndarray | Input numpy array. | required |
Returns:
| Type | Description |
|---|---|
Any | Output array or tuple of arrays (if multi-output model). |
Raises:
| Type | Description |
|---|---|
RuntimeError | If model is not loaded. |
Source code in inferflow/runtime/onnx.py
infer_batch ¶
Run inference on a batch of inputs.
Concatenates inputs into a single batch array for efficient processing, then splits the output back into individual results.
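The concatenate-then-split strategy just described can be demonstrated with plain NumPy (a sketch of the idea, not the actual inferflow implementation):

```python
import numpy as np

# Two single-item inputs, each with shape (1, C, H, W).
inputs = [np.ones((1, 3, 4, 4)), 2 * np.ones((1, 3, 4, 4))]

# Concatenate along the batch axis so the model runs once on shape (2, 3, 4, 4).
batch = np.concatenate(inputs, axis=0)

# Split the batched result back into one (1, 3, 4, 4) array per input,
# preserving the batch dimension of each.
outputs = np.split(batch, len(inputs), axis=0)
```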
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
inputs | list[ndarray] | List of input arrays. Each should have shape (1, C, H, W). | required |
Returns:
| Type | Description |
|---|---|
list[Any] | List of outputs, one per input. Each maintains batch dimension. |
Raises:
| Type | Description |
|---|---|
RuntimeError | If model is not loaded. |
Source code in inferflow/runtime/onnx.py
unload ¶
Unload model and free resources.
Performs
- Release session from memory
Safe to call multiple times.