runtime

Attributes

__doctitle__ module-attribute

__doctitle__ = 'Inference Runtimes'

__all__ module-attribute

__all__ = ['Runtime', 'BatchableRuntime', 'RuntimeConfigMixin', 'onnx', 'tensorrt', 'torch']

Classes

RuntimeConfigMixin

RuntimeConfigMixin(model_path: str | PathLike[str], device: str | Device, precision: Precision = FP32, warmup_iterations: int = 3, warmup_shape: tuple[int, ...] = (1, 3, 224, 224))

Shared configuration and validation logic for all runtimes.

This mixin provides common configuration handling and validation that is shared across all runtime implementations (sync and async, all backends).

It handles:
  • Model path validation
  • Device configuration
  • Precision settings
  • Warmup configuration
  • Input shape specification

Attributes:

  • model_path: Path to the model file.
  • device: Device to run inference on.
  • precision: Model precision (FP32, FP16, etc.).
  • warmup_iterations: Number of warmup iterations.
  • warmup_shape: Input shape for warmup.

Parameters:

  • model_path (str | PathLike[str], required): Path to the model file.
  • device (str | Device, required): Device specification (e.g., "cpu", "cuda:0", "mps").
  • precision (Precision, default FP32): Model precision.
  • warmup_iterations (int, default 3): Number of warmup iterations.
  • warmup_shape (tuple[int, ...], default (1, 3, 224, 224)): Input shape for warmup.

Raises:

  • FileNotFoundError: If the model file does not exist.
  • ValueError: If warmup_iterations is negative.

Example
class MyRuntime(RuntimeConfigMixin, Runtime):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # self.model_path, self.device, etc. are now available
Source code in inferflow/runtime/__init__.py
def __init__(
    self,
    model_path: str | os.PathLike[str],
    device: str | Device,
    precision: Precision = Precision.FP32,
    warmup_iterations: int = 3,
    warmup_shape: tuple[int, ...] = (1, 3, 224, 224),
):
    self.model_path = pathlib.Path(model_path)
    self.device = Device(device) if isinstance(device, str) else device
    self.precision = precision
    self.warmup_iterations = warmup_iterations
    self.warmup_shape = warmup_shape

    self._validate_config()
Attributes
model_path instance-attribute
model_path = Path(model_path)
device instance-attribute
device = Device(device) if isinstance(device, str) else device
precision instance-attribute
precision = precision
warmup_iterations instance-attribute
warmup_iterations = warmup_iterations
warmup_shape instance-attribute
warmup_shape = warmup_shape
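The documented exceptions suggest the shape of the validation step. Below is a minimal, hypothetical sketch of such a check; the real `_validate_config` body is not shown on this page, so the function name and messages here are assumptions that mirror only the documented behaviour:

```python
import pathlib

def validate_config(model_path, warmup_iterations: int) -> None:
    """Hypothetical stand-in for the validation behaviour documented above."""
    path = pathlib.Path(model_path)
    if not path.exists():
        # Documented: FileNotFoundError if the model file does not exist.
        raise FileNotFoundError(f"model file not found: {path}")
    if warmup_iterations < 0:
        # Documented: ValueError if warmup_iterations is negative.
        raise ValueError("warmup_iterations must be non-negative")
```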

Runtime

Bases: ABC, Generic[P, R]

Abstract runtime for model inference (sync version).

A runtime encapsulates:
  • Model loading/unloading
  • Device management
  • Inference execution
  • Memory management

This is the synchronous version of the runtime. For async support, see inferflow.asyncio.runtime.Runtime.

Example
import inferflow as iff

runtime = iff.TorchScriptRuntime(
    model_path="model.pt",
    device="cuda:0",
)

# Using context manager
with runtime:
    result = runtime.infer(input_tensor)

# Manual lifecycle
runtime.load()
try:
    result = runtime.infer(input_tensor)
finally:
    runtime.unload()
Functions
load abstractmethod
load() -> None

Load model into memory and prepare for inference.

This method should:
  • Load model weights from disk
  • Move model to target device
  • Perform warmup inference
  • Set model to evaluation mode

Raises:

  • FileNotFoundError: If model file does not exist.
  • RuntimeError: If device is not available.

Source code in inferflow/runtime/__init__.py
@abc.abstractmethod
def load(self) -> None:
    """Load model into memory and prepare for inference.

    This method should:
        - Load model weights from disk
        - Move model to target device
        - Perform warmup inference
        - Set model to evaluation mode

    Raises:
        FileNotFoundError: If model file does not exist.
        RuntimeError: If device is not available.
    """
infer abstractmethod
infer(input: P) -> R

Run inference on preprocessed input.

Parameters:

  • input (P, required): Preprocessed input ready for model inference. Type depends on backend (e.g., torch.Tensor for PyTorch).

Returns:

  • R: Raw model output. Type depends on model architecture.

Raises:

  • RuntimeError: If model is not loaded.

Example
with runtime:
    output = runtime.infer(input_tensor)
Source code in inferflow/runtime/__init__.py
@abc.abstractmethod
def infer(self, input: P) -> R:
    """Run inference on preprocessed input.

    Args:
        input: Preprocessed input ready for model inference.
            Type depends on backend (e.g., torch.Tensor for PyTorch).

    Returns:
        Raw model output. Type depends on model architecture.

    Raises:
        RuntimeError: If model is not loaded.

    Example:
        ```python
        with runtime:
            output = runtime.infer(input_tensor)
        ```
    """
unload abstractmethod
unload() -> None

Unload model and free resources.

This method should:
  • Release model from memory
  • Clear device cache
  • Close any open handles
Example
runtime.load()
# ... do inference ...
runtime.unload()  # Free resources
Source code in inferflow/runtime/__init__.py
@abc.abstractmethod
def unload(self) -> None:
    """Unload model and free resources.

    This method should:
        - Release model from memory
        - Clear device cache
        - Close any open handles

    Example:
        ```python
        runtime.load()
        # ... do inference ...
        runtime.unload()  # Free resources
        ```
    """
context
context() -> Iterator[Self]

Context manager for automatic lifecycle management.

Automatically calls load() on entry and unload() on exit, even if an exception occurs.

Yields:

  • Self: The runtime instance.

Example
with runtime.context():
    result = runtime.infer(input)
# Model is automatically unloaded here
Source code in inferflow/runtime/__init__.py
@contextlib.contextmanager
def context(self) -> t.Iterator[t.Self]:
    """Context manager for automatic lifecycle management.

    Automatically calls `load()` on entry and `unload()` on exit,
    even if an exception occurs.

    Yields:
        Self: The runtime instance.

    Example:
        ```python
        with runtime.context():
            result = runtime.infer(input)
        # Model is automatically unloaded here
        ```
    """
    self.load()
    try:
        yield self
    finally:
        self.unload()
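The try/finally in `context()` is what guarantees `unload()` runs even when inference raises. The same ordering can be demonstrated with a standalone stub (a toy class, not the inferflow runtime):

```python
import contextlib

class StubRuntime:
    """Minimal stand-in used only to demonstrate lifecycle ordering."""

    def __init__(self):
        self.events = []

    def load(self):
        self.events.append("load")

    def unload(self):
        self.events.append("unload")

    @contextlib.contextmanager
    def context(self):
        self.load()
        try:
            yield self
        finally:
            self.unload()

rt = StubRuntime()
try:
    with rt.context():
        raise RuntimeError("inference failed")
except RuntimeError:
    pass

# unload still ran despite the exception
assert rt.events == ["load", "unload"]
```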
__enter__
__enter__() -> Self

Context manager entry.

Loads the model.

Returns:

  • Self: The runtime instance.

Example
with runtime:  # Calls __enter__
    result = runtime.infer(input)
Source code in inferflow/runtime/__init__.py
def __enter__(self) -> t.Self:
    """Context manager entry.

    Loads the model.

    Returns:
        Self: The runtime instance.

    Example:
        ```python
        with runtime:  # Calls __enter__
            result = runtime.infer(input)
        ```
    """
    self.load()
    return self
__exit__
__exit__(exc_type: type[BaseException] | None, exc_val: BaseException | None, exc_tb: TracebackType | None) -> None

Context manager exit.

Unloads the model, even if an exception occurred.

Parameters:

  • exc_type (type[BaseException] | None, required): Exception type if an exception occurred.
  • exc_val (BaseException | None, required): Exception value if an exception occurred.
  • exc_tb (TracebackType | None, required): Exception traceback if an exception occurred.
Source code in inferflow/runtime/__init__.py
def __exit__(
    self,
    exc_type: type[BaseException] | None,
    exc_val: BaseException | None,
    exc_tb: types.TracebackType | None,
) -> None:
    """Context manager exit.

    Unloads the model, even if an exception occurred.

    Args:
        exc_type: Exception type if an exception occurred.
        exc_val: Exception value if an exception occurred.
        exc_tb: Exception traceback if an exception occurred.
    """
    self.unload()

BatchableRuntime

Bases: Runtime[P, R], ABC

Runtime that supports batch inference natively (sync version).

Some runtimes (like TorchScript, ONNX) can process multiple inputs simultaneously for better throughput. This base class provides a common interface for batch inference.

The default infer() implementation delegates to infer_batch(), so subclasses only need to implement batch inference.

Example
with runtime:
    # Single inference (delegates to batch)
    result = runtime.infer(input)

    # Batch inference (more efficient)
    results = runtime.infer_batch([input1, input2, input3])
Functions
infer_batch abstractmethod
infer_batch(inputs: list[P]) -> list[R]

Run inference on a batch of inputs.

Process multiple inputs in a single forward pass for better throughput. Inputs should already have a batch dimension.

Parameters:

  • inputs (list[P], required): List of preprocessed inputs. Each input should have shape (1, ...) for proper batching.

Returns:

  • list[R]: List of raw outputs, one per input. Each output maintains the batch dimension (1, ...).

Raises:

  • RuntimeError: If model is not loaded.

Example
with runtime:
    # Prepare batch
    batch = [
        torch.randn(1, 3, 224, 224),
        torch.randn(1, 3, 224, 224),
        torch.randn(1, 3, 224, 224),
    ]

    # Batch inference
    results = runtime.infer_batch(batch)

    # results[0], results[1], results[2] correspond to inputs
Source code in inferflow/runtime/__init__.py
@abc.abstractmethod
def infer_batch(self, inputs: list[P]) -> list[R]:
    """Run inference on a batch of inputs.

    Process multiple inputs in a single forward pass for better
    throughput. Inputs should already have batch dimension.

    Args:
        inputs: List of preprocessed inputs. Each input should have
            shape (1, ...) for proper batching.

    Returns:
        List of raw outputs, one per input. Each output maintains
        the batch dimension (1, ...).

    Raises:
        RuntimeError: If model is not loaded.

    Example:
        ```python
        with runtime:
            # Prepare batch
            batch = [
                torch.randn(1, 3, 224, 224),
                torch.randn(1, 3, 224, 224),
                torch.randn(1, 3, 224, 224),
            ]

            # Batch inference
            results = runtime.infer_batch(batch)

            # results[0], results[1], results[2] correspond to inputs
        ```
    """
infer
infer(input: P) -> R

Single inference (delegates to batch inference).

Wraps the input in a list, calls infer_batch(), and returns the first result. This provides a convenient single-input API while reusing the batch implementation.

Parameters:

  • input (P, required): Preprocessed input ready for model inference.

Returns:

  • R: Raw model output.

Raises:

  • RuntimeError: If model is not loaded.

Example
with runtime:
    # These are equivalent:
    result = runtime.infer(input)
    result = runtime.infer_batch([input])[0]
Source code in inferflow/runtime/__init__.py
def infer(self, input: P) -> R:
    """Single inference (delegates to batch inference).

    Wraps the input in a list, calls `infer_batch()`, and returns
    the first result. This provides a convenient single-input API
    while reusing the batch implementation.

    Args:
        input: Preprocessed input ready for model inference.

    Returns:
        Raw model output.

    Raises:
        RuntimeError: If model is not loaded.

    Example:
        ```python
        with runtime:
            # These are equivalent:
            result = runtime.infer(input)
            result = runtime.infer_batch([input])[0]
        ```
    """
    results = self.infer_batch([input])
    return results[0]
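To make the delegation concrete, here is a self-contained sketch of the pattern described above. It re-creates a minimal stand-in for the documented interface rather than importing inferflow, so every class name below is illustrative:

```python
import abc

class BatchableStub(abc.ABC):
    """Mirrors the documented BatchableRuntime delegation pattern."""

    @abc.abstractmethod
    def infer_batch(self, inputs: list) -> list:
        ...

    def infer(self, input):
        # Single inference wraps the input and reuses the batch path.
        return self.infer_batch([input])[0]

class DoubleRuntime(BatchableStub):
    """Toy backend whose 'inference' just doubles each value."""

    def infer_batch(self, inputs: list) -> list:
        return [x * 2 for x in inputs]

rt = DoubleRuntime()
assert rt.infer(21) == 42
assert rt.infer_batch([1, 2, 3]) == [2, 4, 6]
```

Subclasses thus implement only `infer_batch()` and get the single-input `infer()` for free.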

Functions

__getattr__

__getattr__(name: str) -> Any
Source code in inferflow/runtime/__init__.py
def __getattr__(name: str) -> t.Any:
    if name in __all__:
        return importlib.import_module("." + name, __name__)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
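This module-level `__getattr__` implements PEP 562 lazy imports: `runtime.onnx`, `runtime.tensorrt`, and `runtime.torch` are only imported when first accessed. A self-contained demonstration of the same mechanism, using a synthetic module and the stdlib `json` module rather than inferflow's submodules:

```python
import importlib
import sys
import types

# Build a synthetic package module so the example is self-contained.
pkg = types.ModuleType("demo_pkg")
pkg.__all__ = ["json"]

def _lazy_getattr(name: str):
    # Same shape as the inferflow hook: import listed names on demand.
    if name in pkg.__all__:
        return importlib.import_module(name)
    raise AttributeError(f"module {pkg.__name__!r} has no attribute {name!r}")

pkg.__getattr__ = _lazy_getattr  # PEP 562 hook (Python >= 3.7)
sys.modules["demo_pkg"] = pkg

import demo_pkg

# json is imported only at first attribute access.
assert demo_pkg.json.dumps({"a": 1}) == '{"a": 1}'
```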

Submodules