onnx¶
Attributes¶
Classes¶
ONNXRuntime ¶
ONNXRuntime(model_path: str | PathLike[str], device: str | Device, precision: Precision = FP32, warmup_iterations: int = 3, warmup_shape: tuple[int, ...] = (1, 3, 224, 224), providers: list[str] | None = None)
Bases: RuntimeConfigMixin, ONNXRuntimeMixin, BatchableRuntime[ndarray, Any]
ONNX Runtime for model inference (async version).
Asynchronous version of inferflow.runtime.onnx.ONNXRuntime.
All I/O operations (model loading, inference) are executed in a thread pool to avoid blocking the event loop. The API is identical to the sync version, but all methods are async.
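The thread-pool offloading described here can be illustrated with `asyncio.to_thread` (a generic sketch of the pattern, not the library's actual implementation; `blocking_inference` is a hypothetical stand-in for a synchronous `session.run()` call):

```python
import asyncio
import time


def blocking_inference(x: float) -> float:
    # Stands in for a synchronous ONNX Runtime session.run() call.
    time.sleep(0.01)
    return x * 2


async def infer_async(x: float) -> float:
    # Run the blocking call in the default thread pool,
    # so the event loop stays free to serve other tasks.
    return await asyncio.to_thread(blocking_inference, x)


result = asyncio.run(infer_async(21.0))
print(result)  # 42.0
```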
Supports
- ONNX (.onnx) models
- CPU, CUDA execution providers
- FP32, FP16 precision
- Batch inference
- Automatic warmup
Attributes:

| Name | Type | Description |
|---|---|---|
| session | InferenceSession \| None | Loaded ONNX inference session (None before load()). |
| input_name | str \| None | Name of the model's input tensor. |
| output_names | list[str] \| None | Names of the model's output tensors. |
| providers | list[str] | List of execution providers to use. |
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_path | str \| PathLike[str] | Path to ONNX model file. | required |
| device | str \| Device | Device to run inference on (e.g. "cpu", "cuda:0"). | required |
| precision | Precision | Model precision. | FP32 |
| warmup_iterations | int | Number of warmup iterations. | 3 |
| warmup_shape | tuple[int, ...] | Input shape for warmup. | (1, 3, 224, 224) |
| providers | list[str] \| None | ONNX execution providers (None = auto-detect). | None |
Raises:

| Type | Description |
|---|---|
| FileNotFoundError | If model file does not exist. |
| ImportError | If onnxruntime is not installed. |
Example

import inferflow.asyncio as iff
import numpy as np

# Initialize runtime
runtime = iff.ONNXRuntime(
    model_path="model.onnx",
    device="cuda:0",
    precision=iff.Precision.FP16,
)

# Single inference
async with runtime:
    input_array = np.random.randn(1, 3, 224, 224).astype(np.float32)
    output = await runtime.infer(input_array)

# Batch inference
async with runtime:
    batch = [
        np.random.randn(1, 3, 224, 224).astype(np.float32),
        np.random.randn(1, 3, 224, 224).astype(np.float32),
    ]
    outputs = await runtime.infer_batch(batch)
Source code in inferflow/asyncio/runtime/onnx.py
Attributes¶
Functions¶
load async ¶
Load ONNX model and prepare for inference (async).
Performs
- Configure session options
- Load model from disk (in thread pool)
- Extract input/output names
- Warmup inference (in thread pool)
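The warmup step can be sketched generically: run a few dummy inferences so lazy allocations and kernel selection happen before real traffic. This is a hypothetical illustration with a toy callable standing in for the session, not the library's actual code:

```python
import numpy as np


def warmup(run, iterations: int = 3, shape: tuple[int, ...] = (1, 3, 224, 224)) -> int:
    """Run `iterations` dummy inferences through `run`; returns the count performed."""
    dummy = np.zeros(shape, dtype=np.float32)
    for _ in range(iterations):
        run(dummy)  # outputs are discarded; only the warm caches/allocations matter
    return iterations


calls = []
toy_run = lambda x: calls.append(x.shape)  # toy stand-in for session.run
warmup(toy_run)
print(len(calls))  # 3
```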
Raises:

| Type | Description |
|---|---|
| FileNotFoundError | If model file does not exist. |
| RuntimeError | If ONNX Runtime fails to load model. |
Source code in inferflow/asyncio/runtime/onnx.py
infer async ¶
Run inference on a single input (async).
Automatically handles
- Converting to correct dtype
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input | ndarray | Input numpy array. | required |
Returns:

| Type | Description |
|---|---|
| Any | Output array or tuple of arrays (if multi-output model). |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If model is not loaded. |
Example
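The automatic dtype conversion mentioned above can be sketched in plain NumPy (a hypothetical illustration of the behavior, not the library's internal code; `to_model_dtype` is a made-up name):

```python
import numpy as np


def to_model_dtype(x: np.ndarray, dtype=np.float32) -> np.ndarray:
    """Convert input to the dtype the model expects (no copy if already correct)."""
    return np.asarray(x, dtype=dtype)


x = np.random.randn(1, 3, 224, 224)  # np.random.randn returns float64
y = to_model_dtype(x)
print(y.dtype)  # float32
```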
Source code in inferflow/asyncio/runtime/onnx.py
infer_batch async ¶
Run inference on a batch of inputs (async).
Concatenates inputs into a single batch array for efficient processing, then splits the output back into individual results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| inputs | list[ndarray] | List of input arrays. Each should have shape (1, C, H, W). | required |

Returns:

| Type | Description |
|---|---|
| list[Any] | List of outputs, one per input. Each maintains batch dimension. |
Raises:

| Type | Description |
|---|---|
| RuntimeError | If model is not loaded. |
Example
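The concatenate/split strategy described above can be sketched in plain NumPy (a hypothetical illustration with a toy model, not the library's internal code):

```python
import numpy as np


def infer_batch_sketch(inputs: list[np.ndarray], model) -> list[np.ndarray]:
    # Stack the (1, C, H, W) inputs into one (N, C, H, W) batch...
    batch = np.concatenate(inputs, axis=0)
    out = model(batch)  # one forward pass over the whole batch
    # ...then split back into per-input results, each keeping its batch dimension.
    return np.split(out, len(inputs), axis=0)


# Toy "model": global average pool over H and W
model = lambda x: x.mean(axis=(2, 3), keepdims=True)
inputs = [np.ones((1, 3, 4, 4)), 2 * np.ones((1, 3, 4, 4))]
outputs = infer_batch_sketch(inputs, model)
print(len(outputs), outputs[0].shape)  # 2 (1, 3, 1, 1)
```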
Source code in inferflow/asyncio/runtime/onnx.py
unload async ¶
Unload model and free resources (async).
Performs
- Release session from memory
Safe to call multiple times.
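"Safe to call multiple times" means unload is idempotent. A minimal sketch of that pattern (hypothetical class, not the library's code):

```python
import asyncio


class RuntimeSketch:
    def __init__(self):
        self.session = object()  # stands in for a loaded InferenceSession

    async def unload(self) -> None:
        # Dropping the reference releases the session; doing so again is a no-op.
        self.session = None


async def main():
    rt = RuntimeSketch()
    await rt.unload()
    await rt.unload()  # second call is harmless
    return rt.session


print(asyncio.run(main()))  # None
```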