runtime¶
Attributes¶
__all__ module-attribute ¶
Classes¶
RuntimeConfigMixin ¶
RuntimeConfigMixin(model_path: str | PathLike[str], device: str | Device, precision: Precision = FP32, warmup_iterations: int = 3, warmup_shape: tuple[int, ...] = (1, 3, 224, 224))
Shared configuration and validation logic for all runtimes.
This mixin provides common configuration handling and validation that is shared across all runtime implementations (sync and async, all backends).
It handles:
- Model path validation
- Device configuration
- Precision settings
- Warmup configuration
- Input shape specification
Attributes:
| Name | Type | Description |
|---|---|---|
| model_path | | Path to the model file. |
| device | | Device to run inference on. |
| precision | | Model precision (FP32, FP16, etc.). |
| warmup_iterations | | Number of warmup iterations. |
| warmup_shape | | Input shape for warmup. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_path | `str \| PathLike[str]` | Path to the model file. | required |
| device | `str \| Device` | Device specification (e.g., "cpu", "cuda:0", "mps"). | required |
| precision | `Precision` | Model precision. | `FP32` |
| warmup_iterations | `int` | Number of warmup iterations. | `3` |
| warmup_shape | `tuple[int, ...]` | Input shape for warmup. | `(1, 3, 224, 224)` |
Raises:
| Type | Description |
|---|---|
FileNotFoundError | If model file does not exist. |
ValueError | If warmup_iterations is negative. |
Example
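A minimal sketch of the configuration handling and validation described above. The `ConfigDemo` class below is a hypothetical stand-in, not the real `RuntimeConfigMixin`; it only illustrates the documented checks (missing model file, negative `warmup_iterations`):

```python
from pathlib import Path

# Hedged sketch of the validation described above. ConfigDemo mirrors the
# documented parameters but is hypothetical, not the real RuntimeConfigMixin.
class ConfigDemo:
    def __init__(self, model_path, device, precision="FP32",
                 warmup_iterations=3, warmup_shape=(1, 3, 224, 224)):
        path = Path(model_path)
        if not path.exists():
            raise FileNotFoundError(f"Model file not found: {path}")
        if warmup_iterations < 0:
            raise ValueError("warmup_iterations must be non-negative")
        self.model_path = path
        self.device = device
        self.precision = precision
        self.warmup_iterations = warmup_iterations
        self.warmup_shape = warmup_shape
```

Construction fails fast, so a misconfigured runtime is rejected before any model loading or device allocation happens.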
Source code in inferflow/runtime/__init__.py
Runtime ¶
Abstract runtime for model inference (async version).
A runtime encapsulates:
- Model loading/unloading
- Device management
- Inference execution
- Memory management
This is the asynchronous version of the runtime. For synchronous support, see inferflow.runtime.Runtime.
Example
import inferflow.asyncio as iff
runtime = iff.TorchScriptRuntime(
    model_path="model.pt",
    device="cuda:0",
)
# Using async context manager
async with runtime:
    result = await runtime.infer(input_tensor)
# Manual lifecycle
await runtime.load()
try:
    result = await runtime.infer(input_tensor)
finally:
    await runtime.unload()
Functions¶
load abstractmethod async ¶
Load the model into memory and prepare it for inference.
infer abstractmethod async ¶
Run inference on preprocessed input.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input | P | Preprocessed input ready for model inference. Type depends on backend (e.g., torch.Tensor for PyTorch). | required |
Returns:
| Type | Description |
|---|---|
R | Raw model output. Type depends on model architecture. |
Raises:
| Type | Description |
|---|---|
RuntimeError | If model is not loaded. |
Source code in inferflow/asyncio/runtime/__init__.py
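The load/infer/unload contract above can be illustrated with a minimal self-contained sketch. `EchoRuntime` and its lambda "model" are hypothetical stand-ins, not part of inferflow:

```python
import asyncio

# Illustrative stand-in for the load/infer/unload contract described above.
# EchoRuntime and its lambda "model" are hypothetical, not inferflow internals.
class EchoRuntime:
    def __init__(self):
        self._model = None

    async def load(self):
        # Pretend to load a model; a real backend would read model_path here.
        self._model = lambda x: x * 2

    async def infer(self, input):
        if self._model is None:
            raise RuntimeError("Model is not loaded")
        return self._model(input)

    async def unload(self):
        # Release the "model" and any device memory.
        self._model = None

async def main():
    rt = EchoRuntime()
    await rt.load()
    try:
        return await rt.infer(21)
    finally:
        await rt.unload()

print(asyncio.run(main()))  # 42
```

Note that `infer()` raises RuntimeError when called before `load()`, matching the Raises table above.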
unload abstractmethod async ¶
Unload model and free resources.
This method should:
- Release model from memory
- Clear device cache
- Close any open handles
Source code in inferflow/asyncio/runtime/__init__.py
context async ¶
Async context manager for automatic lifecycle management.
Automatically calls load() on entry and unload() on exit, even if an exception occurs.
Yields:
| Name | Type | Description |
|---|---|---|
Self | AsyncIterator[Self] | The runtime instance. |
Example
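A hedged sketch of the lifecycle `context()` provides, built on `contextlib.asynccontextmanager`: `load()` on entry and `unload()` on exit, even when the body raises. `DemoRuntime` is a hypothetical stand-in:

```python
import asyncio
from contextlib import asynccontextmanager

# Hedged sketch of the lifecycle context() provides. DemoRuntime is
# hypothetical; only the load/unload ordering mirrors the documentation.
class DemoRuntime:
    def __init__(self):
        self.loaded = False

    async def load(self):
        self.loaded = True

    async def unload(self):
        self.loaded = False

    @asynccontextmanager
    async def context(self):
        await self.load()
        try:
            yield self
        finally:
            await self.unload()

async def main():
    rt = DemoRuntime()
    async with rt.context() as r:
        assert r.loaded  # model is loaded inside the block
    return rt.loaded     # False: unloaded on exit

print(asyncio.run(main()))
```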
Source code in inferflow/asyncio/runtime/__init__.py
__aenter__ async ¶
Async context manager entry.
Loads the model.
Returns:
| Name | Type | Description |
|---|---|---|
Self | Self | The runtime instance. |
Source code in inferflow/asyncio/runtime/__init__.py
__aexit__ async ¶
__aexit__(exc_type: type[BaseException] | None, exc_val: BaseException | None, exc_tb: TracebackType | None) -> None
Async context manager exit.
Unloads the model, even if an exception occurred.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| exc_type | `type[BaseException] \| None` | Exception type if an exception occurred. | required |
| exc_val | `BaseException \| None` | Exception value if an exception occurred. | required |
| exc_tb | `TracebackType \| None` | Exception traceback if an exception occurred. | required |
Source code in inferflow/asyncio/runtime/__init__.py
BatchableRuntime ¶
Runtime that supports batch inference natively (async version).
Some runtimes (like TorchScript, ONNX) can process multiple inputs simultaneously for better throughput. This base class provides a common interface for batch inference.
The default infer() implementation delegates to infer_batch(), so subclasses only need to implement batch inference.
Example
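A toy illustration of native batch inference: one "forward pass" over the whole batch. `DoublingBatchRuntime` is a stand-in, not the inferflow API:

```python
import asyncio

# Toy stand-in for a batch-capable runtime; doubles each input in a single
# pass over the batch. Not part of inferflow.
class DoublingBatchRuntime:
    async def infer_batch(self, inputs):
        # Process every input in one pass for better throughput.
        return [x * 2 for x in inputs]

async def main():
    rt = DoublingBatchRuntime()
    return await rt.infer_batch([1, 2, 3])

print(asyncio.run(main()))  # [2, 4, 6]
```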
Functions¶
infer_batch abstractmethod async ¶
Run inference on a batch of inputs asynchronously.
Process multiple inputs in a single forward pass for better throughput. Inputs should already have batch dimension.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
inputs | list[P] | List of preprocessed inputs. Each input should have shape (1, ...) for proper batching. | required |
Returns:
| Type | Description |
|---|---|
| list[R] | List of raw outputs, one per input. Each output maintains the batch dimension (1, ...). |
Raises:
| Type | Description |
|---|---|
RuntimeError | If model is not loaded. |
Example
Source code in inferflow/asyncio/runtime/__init__.py
infer async ¶
Single inference (delegates to batch inference).
Wraps the input in a list, calls infer_batch(), and returns the first result. This provides a convenient single-input API while reusing the batch implementation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input | P | Preprocessed input ready for model inference. | required |
Returns:
| Type | Description |
|---|---|
R | Raw model output. |
Raises:
| Type | Description |
|---|---|
RuntimeError | If model is not loaded. |
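The delegation described above can be sketched as follows: `infer()` wraps its input in a one-element list, calls `infer_batch()`, and returns the first result. `BatchOnlyRuntime` is a hypothetical stand-in implementing only `infer_batch()`:

```python
import asyncio

# Sketch of the single-input delegation described above. BatchOnlyRuntime is
# hypothetical; only the wrap/batch/unwrap pattern mirrors the documentation.
class BatchOnlyRuntime:
    async def infer_batch(self, inputs):
        return [x + 1 for x in inputs]

    async def infer(self, input):
        results = await self.infer_batch([input])  # wrap, batch, unwrap
        return results[0]

print(asyncio.run(BatchOnlyRuntime().infer(41)))  # 42
```

This is why subclasses only need to implement `infer_batch()`: the single-input path is derived from it.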