torch¶
Attributes¶
Classes¶
TorchRuntimeMixin ¶
Shared TorchScript runtime logic for sync and async implementations.
This mixin provides common TorchScript-specific logic that is shared between synchronous and asynchronous runtime implementations. It handles:
- Device setup (CUDA, CPU, MPS)
- Precision conversion (FP32, FP16)
- Input preparation and validation
- Batch dimension management
- Output post-processing
This mixin is pure logic with no I/O operations, making it safe to reuse across sync and async implementations.
Attributes:
| Name | Type | Description |
|---|---|---|
| device | Any | Device configuration (provided by subclass). |
| precision | Precision | Precision configuration (provided by subclass). |
Example
```python
# In sync runtime
class TorchScriptRuntime(
    TorchRuntimeMixin, RuntimeConfigMixin, BatchableRuntime
):
    def load(self):
        self._torch_device = self._setup_torch_device()  # Use mixin
        # ...


# In async runtime
class TorchScriptRuntime(
    TorchRuntimeMixin, RuntimeConfigMixin, BatchableRuntime
):
    async def load(self):
        self._torch_device = self._setup_torch_device()  # Same mixin!
        # ...
```
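The device-setup step the mixin describes can be sketched as follows. `setup_torch_device` is a hypothetical standalone version of the mixin's `_setup_torch_device` helper written for illustration; it is not the library's source, but it shows the availability checks implied by the supported-device list above:

```python
import torch


def setup_torch_device(device: str) -> torch.device:
    # Sketch: validate availability before constructing the device,
    # mirroring the CUDA/CPU/MPS handling described above.
    if device.startswith("cuda") and not torch.cuda.is_available():
        raise RuntimeError("CUDA requested but not available")
    if device == "mps" and not torch.backends.mps.is_available():
        raise RuntimeError("MPS requested but not available")
    return torch.device(device)


print(setup_torch_device("cpu"))  # -> cpu
```

Failing fast here is what lets the runtimes raise `RuntimeError` at load time rather than at the first inference call.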
TorchScriptRuntime ¶
TorchScriptRuntime(model_path: str | PathLike[str], device: str | Device, precision: Precision = FP32, warmup_iterations: int = 3, warmup_shape: tuple[int, ...] = (1, 3, 224, 224), auto_add_batch_dim: bool = False)
Bases: RuntimeConfigMixin, TorchRuntimeMixin, BatchableRuntime[Tensor, R]
TorchScript model runtime (sync version).
Supports:
- TorchScript (.pt, .pth) models
- CUDA, CPU, MPS devices
- FP32, FP16 precision
- Batch inference
- Automatic warmup
- Optional automatic batch dimension handling
Attributes:
| Name | Type | Description |
|---|---|---|
| model | ScriptModule \| None | Loaded TorchScript model (None before load()). |
| auto_add_batch_dim | bool | Whether to auto-add batch dimension for 3D inputs. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_path | str \| PathLike[str] | Path to TorchScript model file. | required |
| device | str \| Device | Device to run inference on. | required |
| precision | Precision | Model precision. | FP32 |
| warmup_iterations | int | Number of warmup iterations. | 3 |
| warmup_shape | tuple[int, ...] | Input shape for warmup. | (1, 3, 224, 224) |
| auto_add_batch_dim | bool | Whether to automatically add a batch dimension if the input is 3D. | False |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If model file does not exist. |
| RuntimeError | If CUDA/MPS is requested but not available. |
| ImportError | If torch is not installed. |
Example
```python
import inferflow as iff
import torch

# Initialize runtime
runtime = iff.TorchScriptRuntime(
    model_path="model.pt",
    device="cuda:0",
    precision=iff.Precision.FP16,
    auto_add_batch_dim=True,
)

# Single inference
with runtime:
    input_tensor = torch.randn(3, 224, 224)  # 3D input
    output = runtime.infer(input_tensor)  # Batch dim auto-added

# Batch inference
with runtime:
    batch = [
        torch.randn(1, 3, 224, 224),
        torch.randn(1, 3, 224, 224),
    ]
    outputs = runtime.infer_batch(batch)
```
Source code in inferflow/runtime/torch.py
Attributes¶
Functions¶
load ¶
Load TorchScript model and prepare for inference.
Performs:
- Load model from disk
- Set up device
- Move model to device
- Set evaluation mode
- Apply precision
- Warmup inference
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If model file does not exist. |
| RuntimeError | If device is not available. |
Source code in inferflow/runtime/torch.py
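The load sequence listed above can be sketched with raw PyTorch. The toy `Linear` module and temporary file path below are illustrative stand-ins (not part of inferflow) so the sequence is self-contained:

```python
import os
import tempfile

import torch

# Stand-in TorchScript file so the load steps below are runnable.
path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.jit.script(torch.nn.Linear(4, 2)).save(path)

# The steps load() performs, spelled out:
if not os.path.exists(path):                       # validate file
    raise FileNotFoundError(path)
device = torch.device("cpu")                       # set up device
model = torch.jit.load(path, map_location=device)  # load from disk
model = model.to(device).eval()                    # move to device, eval mode
with torch.no_grad():                              # warmup iterations
    for _ in range(3):
        model(torch.randn(1, 4))
```

Warmup matters mainly on CUDA, where the first few calls pay one-time kernel-launch and allocator costs; running them during `load()` keeps that latency out of the first real inference.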
infer ¶
infer(input: Tensor) -> R
Run inference on a single input.
Automatically handles:
- Moving input to correct device
- Converting to correct precision
- Adding batch dimension (if configured)
- Removing batch dimension (if added)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | Tensor | Input tensor. Can be 3D (C, H, W) if auto_add_batch_dim=True, or 4D (1, C, H, W) otherwise. | required |
Returns:
| Type | Description |
|---|---|
| R | Model output. Type depends on model architecture (tensor or tuple). |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If model is not loaded. |
Source code in inferflow/runtime/torch.py
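The batch-dimension handling described above can be sketched as follows. `infer_like` and the `Conv2d` stand-in model are hypothetical illustrations, not inferflow's implementation:

```python
import torch


def infer_like(model, x: torch.Tensor, auto_add_batch_dim: bool = True):
    # Sketch of infer()'s batch-dimension logic: add a leading batch
    # dim for 3D input, run the model, then strip the dim we added.
    added = False
    if auto_add_batch_dim and x.dim() == 3:  # (C, H, W) -> (1, C, H, W)
        x = x.unsqueeze(0)
        added = True
    with torch.no_grad():
        out = model(x)
    if added and isinstance(out, torch.Tensor):
        out = out.squeeze(0)  # drop the batch dim we added
    return out


# 3D input through a 4D-expecting stand-in model:
out = infer_like(torch.nn.Conv2d(3, 1, 1), torch.randn(3, 8, 8))
```

Tracking whether the dimension was added (rather than always squeezing) is what keeps genuinely 4D inputs untouched on the way out.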
infer_batch ¶
infer_batch(inputs: list[Tensor]) -> list[R]
Run inference on a batch of inputs.
Concatenates inputs into a single batch tensor for efficient processing, then splits the output back into individual results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| inputs | list[Tensor] | List of input tensors. Each should have shape (1, C, H, W). | required |
Returns:
| Type | Description |
|---|---|
| list[R] | List of outputs, one per input. Each maintains batch dimension. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If model is not loaded. |
Source code in inferflow/runtime/torch.py
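The concatenate-then-split strategy can be sketched as follows; `infer_batch_like` and the `Conv2d` stand-in are hypothetical, not the library's source:

```python
import torch


def infer_batch_like(model, inputs):
    # Sketch of infer_batch(): stack the (1, C, H, W) inputs into one
    # (N, C, H, W) batch, run the model once, then split the result
    # back into per-input tensors that keep their batch dim of 1.
    batch = torch.cat(inputs, dim=0)
    with torch.no_grad():
        out = model(batch)
    return list(torch.split(out, 1, dim=0))


outs = infer_batch_like(
    torch.nn.Conv2d(3, 2, 1),
    [torch.randn(1, 3, 8, 8), torch.randn(1, 3, 8, 8)],
)
```

A single forward pass over the concatenated batch is usually much faster on GPU than N separate calls, which is the point of this strategy.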
unload ¶
Unload model and free resources.
Performs:
- Release model from memory
- Clear CUDA cache (if using CUDA)
Safe to call multiple times.
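The cleanup semantics (idempotent, cache-clearing) can be sketched as follows; `unload_like` and the plain-dict stand-in for runtime state are hypothetical illustrations:

```python
import torch


def unload_like(state: dict) -> None:
    # Sketch of unload(): drop the model reference so it can be
    # garbage-collected, then return cached CUDA memory to the driver.
    # Tolerates an already-None model, so repeated calls are safe.
    state["model"] = None
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


state = {"model": torch.nn.Linear(2, 2)}
unload_like(state)
unload_like(state)  # second call is a no-op
```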