runtime

Attributes

__doctitle__ module-attribute

__doctitle__ = 'Inference Runtimes'

__all__ module-attribute

__all__ = ['Runtime', 'BatchableRuntime', 'RuntimeConfigMixin', 'onnx', 'tensorrt', 'torch']

Classes

RuntimeConfigMixin

RuntimeConfigMixin(model_path: str | PathLike[str], device: str | Device, precision: Precision = FP32, warmup_iterations: int = 3, warmup_shape: tuple[int, ...] = (1, 3, 224, 224))

Shared configuration and validation logic for all runtimes.

This mixin provides common configuration handling and validation that is shared across all runtime implementations (sync and async, all backends).

It handles:
  • Model path validation
  • Device configuration
  • Precision settings
  • Warmup configuration
  • Input shape specification

Attributes:

  • model_path: Path to the model file.
  • device: Device to run inference on.
  • precision: Model precision (FP32, FP16, etc.).
  • warmup_iterations: Number of warmup iterations.
  • warmup_shape: Input shape for warmup.

Parameters:

  • model_path (str | PathLike[str], required): Path to the model file.
  • device (str | Device, required): Device specification (e.g., "cpu", "cuda:0", "mps").
  • precision (Precision, default FP32): Model precision.
  • warmup_iterations (int, default 3): Number of warmup iterations.
  • warmup_shape (tuple[int, ...], default (1, 3, 224, 224)): Input shape for warmup.

Raises:

  • FileNotFoundError: If the model file does not exist.
  • ValueError: If warmup_iterations is negative.

Example
class MyRuntime(RuntimeConfigMixin, Runtime):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # self.model_path, self.device, etc. are now available
Source code in inferflow/runtime/__init__.py
def __init__(
    self,
    model_path: str | os.PathLike[str],
    device: str | Device,
    precision: Precision = Precision.FP32,
    warmup_iterations: int = 3,
    warmup_shape: tuple[int, ...] = (1, 3, 224, 224),
):
    self.model_path = pathlib.Path(model_path)
    self.device = Device(device) if isinstance(device, str) else device
    self.precision = precision
    self.warmup_iterations = warmup_iterations
    self.warmup_shape = warmup_shape

    self._validate_config()
Attributes
model_path instance-attribute
model_path = Path(model_path)
device instance-attribute
device = Device(device) if isinstance(device, str) else device
precision instance-attribute
precision = precision
warmup_iterations instance-attribute
warmup_iterations = warmup_iterations
warmup_shape instance-attribute
warmup_shape = warmup_shape
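The documented exceptions suggest the shape of the validation step. Below is a minimal, hypothetical sketch of such a check; the real `_validate_config` body is not shown on this page, so the function name and messages here are assumptions that mirror only the documented behaviour:

```python
import pathlib

def validate_config(model_path, warmup_iterations: int) -> None:
    """Hypothetical stand-in for the validation behaviour documented above."""
    path = pathlib.Path(model_path)
    if not path.exists():
        # Documented: FileNotFoundError if the model file does not exist.
        raise FileNotFoundError(f"model file not found: {path}")
    if warmup_iterations < 0:
        # Documented: ValueError if warmup_iterations is negative.
        raise ValueError("warmup_iterations must be non-negative")
```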

Runtime

Bases: ABC, Generic[P, R]

Abstract runtime for model inference (sync version).

A runtime encapsulates:
  • Model loading/unloading
  • Device management
  • Inference execution
  • Memory management

This is the synchronous version of the runtime. For async support, see inferflow.asyncio.runtime.Runtime.

Example
import inferflow as iff

runtime = iff.TorchScriptRuntime(
    model_path="model.pt",
    device="cuda:0",
)

# Using context manager
with runtime:
    result = runtime.infer(input_tensor)

# Manual lifecycle
runtime.load()
try:
    result = runtime.infer(input_tensor)
finally:
    runtime.unload()
Functions
load abstractmethod
load() -> None

Load model into memory and prepare for inference.

This method should:
  • Load model weights from disk
  • Move model to target device
  • Perform warmup inference
  • Set model to evaluation mode

Raises:

  • FileNotFoundError: If model file does not exist.
  • RuntimeError: If device is not available.

Source code in inferflow/runtime/__init__.py
@abc.abstractmethod
def load(self) -> None:
    """Load model into memory and prepare for inference.

    This method should:
        - Load model weights from disk
        - Move model to target device
        - Perform warmup inference
        - Set model to evaluation mode

    Raises:
        FileNotFoundError: If model file does not exist.
        RuntimeError: If device is not available.
    """
infer abstractmethod
infer(input: P) -> R

Run inference on preprocessed input.

Parameters:

  • input (P, required): Preprocessed input ready for model inference. Type depends on backend (e.g., torch.Tensor for PyTorch).

Returns:

  • R: Raw model output. Type depends on model architecture.

Raises:

  • RuntimeError: If model is not loaded.

Example
with runtime:
    output = runtime.infer(input_tensor)
Source code in inferflow/runtime/__init__.py
@abc.abstractmethod
def infer(self, input: P) -> R:
    """Run inference on preprocessed input.

    Args:
        input: Preprocessed input ready for model inference.
            Type depends on backend (e.g., torch.Tensor for PyTorch).

    Returns:
        Raw model output. Type depends on model architecture.

    Raises:
        RuntimeError: If model is not loaded.

    Example:
        ```python
        with runtime:
            output = runtime.infer(input_tensor)
        ```
    """
unload abstractmethod
unload() -> None

Unload model and free resources.

This method should:
  • Release model from memory
  • Clear device cache
  • Close any open handles
Example
runtime.load()
# ... do inference ...
runtime.unload()  # Free resources
Source code in inferflow/runtime/__init__.py
@abc.abstractmethod
def unload(self) -> None:
    """Unload model and free resources.

    This method should:
        - Release model from memory
        - Clear device cache
        - Close any open handles

    Example:
        ```python
        runtime.load()
        # ... do inference ...
        runtime.unload()  # Free resources
        ```
    """
context
context() -> Iterator[Self]

Context manager for automatic lifecycle management.

Automatically calls load() on entry and unload() on exit, even if an exception occurs.

Yields:

  • Self: The runtime instance.

Example
with runtime.context():
    result = runtime.infer(input)
# Model is automatically unloaded here
Source code in inferflow/runtime/__init__.py
@contextlib.contextmanager
def context(self) -> t.Iterator[t.Self]:
    """Context manager for automatic lifecycle management.

    Automatically calls `load()` on entry and `unload()` on exit,
    even if an exception occurs.

    Yields:
        Self: The runtime instance.

    Example:
        ```python
        with runtime.context():
            result = runtime.infer(input)
        # Model is automatically unloaded here
        ```
    """
    self.load()
    try:
        yield self
    finally:
        self.unload()
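The try/finally in `context()` is what guarantees `unload()` runs even when inference raises. The same ordering can be demonstrated with a standalone stub (a toy class, not the inferflow runtime):

```python
import contextlib

class StubRuntime:
    """Minimal stand-in used only to demonstrate lifecycle ordering."""

    def __init__(self):
        self.events = []

    def load(self):
        self.events.append("load")

    def unload(self):
        self.events.append("unload")

    @contextlib.contextmanager
    def context(self):
        self.load()
        try:
            yield self
        finally:
            self.unload()

rt = StubRuntime()
try:
    with rt.context():
        raise RuntimeError("inference failed")
except RuntimeError:
    pass

# unload still ran despite the exception
assert rt.events == ["load", "unload"]
```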
__enter__
__enter__() -> Self

Context manager entry.

Loads the model.

Returns:

  • Self: The runtime instance.

Example
with runtime:  # Calls __enter__
    result = runtime.infer(input)
Source code in inferflow/runtime/__init__.py
def __enter__(self) -> t.Self:
    """Context manager entry.

    Loads the model.

    Returns:
        Self: The runtime instance.

    Example:
        ```python
        with runtime:  # Calls __enter__
            result = runtime.infer(input)
        ```
    """
    self.load()
    return self
__exit__
__exit__(exc_type: type[BaseException] | None, exc_val: BaseException | None, exc_tb: TracebackType | None) -> None

Context manager exit.

Unloads the model, even if an exception occurred.

Parameters:

  • exc_type (type[BaseException] | None, required): Exception type if an exception occurred.
  • exc_val (BaseException | None, required): Exception value if an exception occurred.
  • exc_tb (TracebackType | None, required): Exception traceback if an exception occurred.
Source code in inferflow/runtime/__init__.py
def __exit__(
    self,
    exc_type: type[BaseException] | None,
    exc_val: BaseException | None,
    exc_tb: types.TracebackType | None,
) -> None:
    """Context manager exit.

    Unloads the model, even if an exception occurred.

    Args:
        exc_type: Exception type if an exception occurred.
        exc_val: Exception value if an exception occurred.
        exc_tb: Exception traceback if an exception occurred.
    """
    self.unload()

BatchableRuntime

Bases: Runtime[P, R], ABC

Runtime that supports batch inference natively (sync version).

Some runtimes (like TorchScript, ONNX) can process multiple inputs simultaneously for better throughput. This base class provides a common interface for batch inference.

The default infer() implementation delegates to infer_batch(), so subclasses only need to implement batch inference.

Example
with runtime:
    # Single inference (delegates to batch)
    result = runtime.infer(input)

    # Batch inference (more efficient)
    results = runtime.infer_batch([input1, input2, input3])
Functions
infer_batch abstractmethod
infer_batch(inputs: list[P]) -> list[R]

Run inference on a batch of inputs.

Process multiple inputs in a single forward pass for better throughput. Inputs should already have a batch dimension.

Parameters:

  • inputs (list[P], required): List of preprocessed inputs. Each input should have shape (1, ...) for proper batching.

Returns:

  • list[R]: List of raw outputs, one per input. Each output maintains the batch dimension (1, ...).

Raises:

  • RuntimeError: If model is not loaded.

Example
with runtime:
    # Prepare batch
    batch = [
        torch.randn(1, 3, 224, 224),
        torch.randn(1, 3, 224, 224),
        torch.randn(1, 3, 224, 224),
    ]

    # Batch inference
    results = runtime.infer_batch(batch)

    # results[0], results[1], results[2] correspond to inputs
Source code in inferflow/runtime/__init__.py
@abc.abstractmethod
def infer_batch(self, inputs: list[P]) -> list[R]:
    """Run inference on a batch of inputs.

    Process multiple inputs in a single forward pass for better
    throughput. Inputs should already have batch dimension.

    Args:
        inputs: List of preprocessed inputs. Each input should have
            shape (1, ...) for proper batching.

    Returns:
        List of raw outputs, one per input. Each output maintains
        the batch dimension (1, ...).

    Raises:
        RuntimeError: If model is not loaded.

    Example:
        ```python
        with runtime:
            # Prepare batch
            batch = [
                torch.randn(1, 3, 224, 224),
                torch.randn(1, 3, 224, 224),
                torch.randn(1, 3, 224, 224),
            ]

            # Batch inference
            results = runtime.infer_batch(batch)

            # results[0], results[1], results[2] correspond to inputs
        ```
    """
infer
infer(input: P) -> R

Single inference (delegates to batch inference).

Wraps the input in a list, calls infer_batch(), and returns the first result. This provides a convenient single-input API while reusing the batch implementation.

Parameters:

  • input (P, required): Preprocessed input ready for model inference.

Returns:

  • R: Raw model output.

Raises:

  • RuntimeError: If model is not loaded.

Example
with runtime:
    # These are equivalent:
    result = runtime.infer(input)
    result = runtime.infer_batch([input])[0]
Source code in inferflow/runtime/__init__.py
def infer(self, input: P) -> R:
    """Single inference (delegates to batch inference).

    Wraps the input in a list, calls `infer_batch()`, and returns
    the first result. This provides a convenient single-input API
    while reusing the batch implementation.

    Args:
        input: Preprocessed input ready for model inference.

    Returns:
        Raw model output.

    Raises:
        RuntimeError: If model is not loaded.

    Example:
        ```python
        with runtime:
            # These are equivalent:
            result = runtime.infer(input)
            result = runtime.infer_batch([input])[0]
        ```
    """
    results = self.infer_batch([input])
    return results[0]
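To make the delegation concrete, here is a self-contained sketch of the pattern described above. It re-creates a minimal stand-in for the documented interface rather than importing inferflow, so every class name below is illustrative:

```python
import abc

class BatchableStub(abc.ABC):
    """Mirrors the documented BatchableRuntime delegation pattern."""

    @abc.abstractmethod
    def infer_batch(self, inputs: list) -> list:
        ...

    def infer(self, input):
        # Single inference wraps the input and reuses the batch path.
        return self.infer_batch([input])[0]

class DoubleRuntime(BatchableStub):
    """Toy backend whose 'inference' just doubles each value."""

    def infer_batch(self, inputs: list) -> list:
        return [x * 2 for x in inputs]

rt = DoubleRuntime()
assert rt.infer(21) == 42
assert rt.infer_batch([1, 2, 3]) == [2, 4, 6]
```

Subclasses thus implement only `infer_batch()` and get the single-input `infer()` for free.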

Functions

__getattr__

__getattr__(name: str) -> Any
Source code in inferflow/runtime/__init__.py
def __getattr__(name: str) -> t.Any:
    if name in __all__:
        return importlib.import_module("." + name, __name__)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
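This module-level `__getattr__` implements PEP 562 lazy imports: `runtime.onnx`, `runtime.tensorrt`, and `runtime.torch` are only imported when first accessed. A self-contained demonstration of the same mechanism, using a synthetic module and the stdlib `json` module rather than inferflow's submodules:

```python
import importlib
import sys
import types

# Build a synthetic package module so the example is self-contained.
pkg = types.ModuleType("demo_pkg")
pkg.__all__ = ["json"]

def _lazy_getattr(name: str):
    # Same shape as the inferflow hook: import listed names on demand.
    if name in pkg.__all__:
        return importlib.import_module(name)
    raise AttributeError(f"module {pkg.__name__!r} has no attribute {name!r}")

pkg.__getattr__ = _lazy_getattr  # PEP 562 hook (Python >= 3.7)
sys.modules["demo_pkg"] = pkg

import demo_pkg

# json is imported only at first attribute access.
assert demo_pkg.json.dumps({"a": 1}) == '{"a": 1}'
```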

Submodules