# torch
## Classes

### TorchScriptRuntime
```python
TorchScriptRuntime(
    model_path: str | PathLike[str],
    device: str | Device,
    precision: Precision = FP32,
    warmup_iterations: int = 3,
    warmup_shape: tuple[int, ...] = (1, 3, 224, 224),
    auto_add_batch_dim: bool = False,
)
```
Bases: RuntimeConfigMixin, TorchRuntimeMixin, BatchableRuntime[Tensor, Any]
TorchScript model runtime (async version).
Asynchronous version of inferflow.runtime.torch.TorchScriptRuntime.
Supports:
- TorchScript (.pt, .pth) models
- CUDA, CPU, MPS devices
- FP32, FP16 precision
- Batch inference
- Automatic warmup
- Optional automatic batch dimension handling
Attributes:

| Name | Type | Description |
|---|---|---|
| `model` | `ScriptModule \| None` | Loaded TorchScript model (`None` before `load()`). |
| `auto_add_batch_dim` | `bool` | Whether to auto-add a batch dimension for 3D inputs. |
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_path` | `str \| PathLike[str]` | Path to TorchScript model file. | required |
| `device` | `str \| Device` | Device to run inference on (e.g. `"cpu"`, `"cuda:0"`). | required |
| `precision` | `Precision` | Model precision. | `FP32` |
| `warmup_iterations` | `int` | Number of warmup iterations. | `3` |
| `warmup_shape` | `tuple[int, ...]` | Input shape for warmup. | `(1, 3, 224, 224)` |
| `auto_add_batch_dim` | `bool` | Whether to automatically add a batch dimension if input is 3D. | `False` |
Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If model file does not exist. |
| `RuntimeError` | If CUDA/MPS is requested but not available. |
| `ImportError` | If torch is not installed. |
Example

```python
import inferflow.asyncio as iff
import torch

# Initialize runtime
runtime = iff.TorchScriptRuntime(
    model_path="model.pt",
    device="cuda:0",
    precision=iff.Precision.FP16,
    auto_add_batch_dim=True,
)

# Single inference
async with runtime:
    input_tensor = torch.randn(3, 224, 224)  # 3D input
    output = await runtime.infer(input_tensor)  # Batch dim auto-added

# Batch inference
async with runtime:
    batch = [
        torch.randn(1, 3, 224, 224),
        torch.randn(1, 3, 224, 224),
    ]
    outputs = await runtime.infer_batch(batch)
```
Source code in inferflow/asyncio/runtime/torch.py
#### Functions

##### load (async)
Load TorchScript model and prepare for inference (async).
Performs:
- Load model from disk (in thread pool)
- Setup device
- Move model to device
- Set evaluation mode
- Apply precision
- Warmup inference (in thread pool)
Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If model file does not exist. |
| `RuntimeError` | If device is not available. |
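The load steps above can be sketched as follows. `load_sketch` and the tiny scripted model are hypothetical stand-ins, not part of the inferflow API; the real runtime additionally runs the disk I/O and warmup in a thread pool:

```python
import os
import tempfile

import torch

def load_sketch(model_path: str, device: str, use_fp16: bool = False):
    # Load TorchScript model from disk
    model = torch.jit.load(model_path, map_location=device)
    model.to(device)   # move model to device
    model.eval()       # set evaluation mode
    if use_fp16:
        model.half()   # apply FP16 precision
    return model

# Exercise the sketch with a tiny scripted model saved to a temp file.
scripted = torch.jit.script(torch.nn.Identity())
path = os.path.join(tempfile.mkdtemp(), "tiny.pt")
scripted.save(path)

model = load_sketch(path, "cpu")
print(model(torch.ones(2)).tolist())  # -> [1.0, 1.0]
```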
Source code in inferflow/asyncio/runtime/torch.py
##### infer (async)
Run inference on a single input (async).
Automatically handles:
- Moving input to correct device
- Converting to correct precision
- Adding batch dimension (if configured)
- Removing batch dimension (if added)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input` | `Tensor` | Input tensor. Can be 3D `(C, H, W)` if `auto_add_batch_dim=True`, or 4D `(1, C, H, W)` otherwise. | required |
Returns:

| Type | Description |
|---|---|
| `Any` | Model output. Type depends on model architecture (tensor or tuple). |
Raises:

| Type | Description |
|---|---|
| `RuntimeError` | If model is not loaded. |
Example
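A minimal sketch of the batch-dimension handling described above, using a hypothetical helper and `torch.nn.Identity` as a stand-in model; the real runtime performs these steps internally around the forward pass:

```python
import torch

def infer_with_auto_batch(model, x: torch.Tensor) -> torch.Tensor:
    added = False
    if x.dim() == 3:        # (C, H, W) -> (1, C, H, W)
        x = x.unsqueeze(0)
        added = True
    y = model(x)
    if added:               # remove the batch dimension that was added
        y = y.squeeze(0)
    return y

model = torch.nn.Identity()
out = infer_with_auto_batch(model, torch.randn(3, 224, 224))
print(tuple(out.shape))  # -> (3, 224, 224)
```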
Source code in inferflow/asyncio/runtime/torch.py
##### infer_batch (async)
Run inference on a batch of inputs (async).
Concatenates inputs into a single batch tensor for efficient processing, then splits the output back into individual results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `inputs` | `list[Tensor]` | List of input tensors. Each should have shape `(1, C, H, W)`. | required |
Returns:

| Type | Description |
|---|---|
| `list[Any]` | List of outputs, one per input. Each maintains batch dimension. |
Raises:

| Type | Description |
|---|---|
| `RuntimeError` | If model is not loaded. |
Example
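The concatenate-then-split strategy described above can be sketched like this; `infer_batch_sketch` is a hypothetical illustration, with `torch.nn.Identity` standing in for a real model:

```python
import torch

def infer_batch_sketch(model, inputs):
    batch = torch.cat(inputs, dim=0)     # (N, C, H, W) batch tensor
    output = model(batch)                # single forward pass over the batch
    return list(output.split(1, dim=0))  # back to N tensors of (1, C, H, W)

model = torch.nn.Identity()
inputs = [torch.randn(1, 3, 4, 4) for _ in range(3)]
outputs = infer_batch_sketch(model, inputs)
print(len(outputs), tuple(outputs[0].shape))  # -> 3 (1, 3, 4, 4)
```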
Source code in inferflow/asyncio/runtime/torch.py
##### unload (async)
Unload model and free resources (async).
Performs:
- Release model from memory
- Clear CUDA cache (if using CUDA)
Safe to call multiple times.
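The unload steps can be sketched as follows; `RuntimeStateSketch` is a hypothetical stand-in for the runtime's internal state, not inferflow API:

```python
import torch

class RuntimeStateSketch:
    # Hypothetical holder for a loaded model reference.
    def __init__(self):
        self.model = torch.nn.Identity()

    def unload(self):
        self.model = None              # release model from memory
        if torch.cuda.is_available():  # clear CUDA cache only when using CUDA
            torch.cuda.empty_cache()

state = RuntimeStateSketch()
state.unload()
state.unload()  # safe to call multiple times
print(state.model is None)  # -> True
```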