runtime¶
Attributes¶
__all__ module-attribute ¶
Classes¶
RuntimeConfigMixin ¶
RuntimeConfigMixin(model_path: str | PathLike[str], device: str | Device, precision: Precision = FP32, warmup_iterations: int = 3, warmup_shape: tuple[int, ...] = (1, 3, 224, 224))
Shared configuration and validation logic for all runtimes.
This mixin provides common configuration handling and validation that is shared across all runtime implementations (sync and async, all backends).
It handles:
- Model path validation
- Device configuration
- Precision settings
- Warmup configuration
- Input shape specification
Attributes:
| Name | Type | Description |
|---|---|---|
| model_path | | Path to the model file. |
| device | | Device to run inference on. |
| precision | | Model precision (FP32, FP16, etc.). |
| warmup_iterations | | Number of warmup iterations. |
| warmup_shape | | Input shape for warmup. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_path | `str \| PathLike[str]` | Path to the model file. | required |
| device | `str \| Device` | Device specification (e.g., "cpu", "cuda:0", "mps"). | required |
| precision | `Precision` | Model precision. | `FP32` |
| warmup_iterations | `int` | Number of warmup iterations. | `3` |
| warmup_shape | `tuple[int, ...]` | Input shape for warmup. | `(1, 3, 224, 224)` |
Raises:
| Type | Description |
|---|---|
FileNotFoundError | If model file does not exist. |
ValueError | If warmup_iterations is negative. |
Example
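A minimal sketch of the configuration handling and validation described above. The `ConfigDemo` class below is a hypothetical stand-in, not the real `RuntimeConfigMixin`; it only illustrates the documented checks (missing model file, negative `warmup_iterations`):

```python
from pathlib import Path

# Hedged sketch of the validation described above. ConfigDemo mirrors the
# documented parameters but is hypothetical, not the real RuntimeConfigMixin.
class ConfigDemo:
    def __init__(self, model_path, device, precision="FP32",
                 warmup_iterations=3, warmup_shape=(1, 3, 224, 224)):
        path = Path(model_path)
        if not path.exists():
            raise FileNotFoundError(f"Model file not found: {path}")
        if warmup_iterations < 0:
            raise ValueError("warmup_iterations must be non-negative")
        self.model_path = path
        self.device = device
        self.precision = precision
        self.warmup_iterations = warmup_iterations
        self.warmup_shape = warmup_shape
```

Construction fails fast, so a misconfigured runtime is rejected before any model loading or device allocation happens.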
Source code in inferflow/runtime/__init__.py
Runtime ¶
Abstract runtime for model inference (async version).
A runtime encapsulates:
- Model loading/unloading
- Device management
- Inference execution
- Memory management
This is the asynchronous version of the runtime. For synchronous support, see inferflow.runtime.Runtime.
Example
import inferflow.asyncio as iff
runtime = iff.TorchScriptRuntime(
    model_path="model.pt",
    device="cuda:0",
)
# Using async context manager
async with runtime:
    result = await runtime.infer(input_tensor)
# Manual lifecycle
await runtime.load()
try:
    result = await runtime.infer(input_tensor)
finally:
    await runtime.unload()
Functions¶
load abstractmethod async ¶
Load the model into memory and prepare it for inference.
infer abstractmethod async ¶
Run inference on preprocessed input.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input | P | Preprocessed input ready for model inference. Type depends on backend (e.g., torch.Tensor for PyTorch). | required |
Returns:
| Type | Description |
|---|---|
R | Raw model output. Type depends on model architecture. |
Raises:
| Type | Description |
|---|---|
RuntimeError | If model is not loaded. |
Source code in inferflow/asyncio/runtime/__init__.py
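The load/infer/unload contract above can be illustrated with a minimal self-contained sketch. `EchoRuntime` and its lambda "model" are hypothetical stand-ins, not part of inferflow:

```python
import asyncio

# Illustrative stand-in for the load/infer/unload contract described above.
# EchoRuntime and its lambda "model" are hypothetical, not inferflow internals.
class EchoRuntime:
    def __init__(self):
        self._model = None

    async def load(self):
        # Pretend to load a model; a real backend would read model_path here.
        self._model = lambda x: x * 2

    async def infer(self, input):
        if self._model is None:
            raise RuntimeError("Model is not loaded")
        return self._model(input)

    async def unload(self):
        # Release the "model" and any device memory.
        self._model = None

async def main():
    rt = EchoRuntime()
    await rt.load()
    try:
        return await rt.infer(21)
    finally:
        await rt.unload()

print(asyncio.run(main()))  # 42
```

Note that `infer()` raises RuntimeError when called before `load()`, matching the Raises table above.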
unload abstractmethod async ¶
Unload model and free resources.
This method should:
- Release model from memory
- Clear device cache
- Close any open handles
Source code in inferflow/asyncio/runtime/__init__.py
context async ¶
Async context manager for automatic lifecycle management.
Automatically calls load() on entry and unload() on exit, even if an exception occurs.
Yields:
| Name | Type | Description |
|---|---|---|
Self | AsyncIterator[Self] | The runtime instance. |
Example
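A hedged sketch of the lifecycle `context()` provides, built on `contextlib.asynccontextmanager`: `load()` on entry and `unload()` on exit, even when the body raises. `DemoRuntime` is a hypothetical stand-in:

```python
import asyncio
from contextlib import asynccontextmanager

# Hedged sketch of the lifecycle context() provides. DemoRuntime is
# hypothetical; only the load/unload ordering mirrors the documentation.
class DemoRuntime:
    def __init__(self):
        self.loaded = False

    async def load(self):
        self.loaded = True

    async def unload(self):
        self.loaded = False

    @asynccontextmanager
    async def context(self):
        await self.load()
        try:
            yield self
        finally:
            await self.unload()

async def main():
    rt = DemoRuntime()
    async with rt.context() as r:
        assert r.loaded  # model is loaded inside the block
    return rt.loaded     # False: unloaded on exit

print(asyncio.run(main()))
```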
Source code in inferflow/asyncio/runtime/__init__.py
__aenter__ async ¶
Async context manager entry.
Loads the model.
Returns:
| Name | Type | Description |
|---|---|---|
Self | Self | The runtime instance. |
Source code in inferflow/asyncio/runtime/__init__.py
__aexit__ async ¶
__aexit__(exc_type: type[BaseException] | None, exc_val: BaseException | None, exc_tb: TracebackType | None) -> None
Async context manager exit.
Unloads the model, even if an exception occurred.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| exc_type | `type[BaseException] \| None` | Exception type if an exception occurred. | required |
| exc_val | `BaseException \| None` | Exception value if an exception occurred. | required |
| exc_tb | `TracebackType \| None` | Exception traceback if an exception occurred. | required |
Source code in inferflow/asyncio/runtime/__init__.py
BatchableRuntime ¶
Runtime that supports batch inference natively (async version).
Some runtimes (like TorchScript, ONNX) can process multiple inputs simultaneously for better throughput. This base class provides a common interface for batch inference.
The default infer() implementation delegates to infer_batch(), so subclasses only need to implement batch inference.
Example
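A toy illustration of native batch inference: one "forward pass" over the whole batch. `DoublingBatchRuntime` is a stand-in, not the inferflow API:

```python
import asyncio

# Toy stand-in for a batch-capable runtime; doubles each input in a single
# pass over the batch. Not part of inferflow.
class DoublingBatchRuntime:
    async def infer_batch(self, inputs):
        # Process every input in one pass for better throughput.
        return [x * 2 for x in inputs]

async def main():
    rt = DoublingBatchRuntime()
    return await rt.infer_batch([1, 2, 3])

print(asyncio.run(main()))  # [2, 4, 6]
```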
Functions¶
infer_batch abstractmethod async ¶
Run inference on a batch of inputs asynchronously.
Process multiple inputs in a single forward pass for better throughput. Inputs should already have batch dimension.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
inputs | list[P] | List of preprocessed inputs. Each input should have shape (1, ...) for proper batching. | required |
Returns:
| Type | Description |
|---|---|
| list[R] | List of raw outputs, one per input. Each output maintains the batch dimension (1, ...). |
Raises:
| Type | Description |
|---|---|
RuntimeError | If model is not loaded. |
Example
Source code in inferflow/asyncio/runtime/__init__.py
infer async ¶
Single inference (delegates to batch inference).
Wraps the input in a list, calls infer_batch(), and returns the first result. This provides a convenient single-input API while reusing the batch implementation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input | P | Preprocessed input ready for model inference. | required |
Returns:
| Type | Description |
|---|---|
R | Raw model output. |
Raises:
| Type | Description |
|---|---|
RuntimeError | If model is not loaded. |
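The delegation described above can be sketched as follows: `infer()` wraps its input in a one-element list, calls `infer_batch()`, and returns the first result. `BatchOnlyRuntime` is a hypothetical stand-in implementing only `infer_batch()`:

```python
import asyncio

# Sketch of the single-input delegation described above. BatchOnlyRuntime is
# hypothetical; only the wrap/batch/unwrap pattern mirrors the documentation.
class BatchOnlyRuntime:
    async def infer_batch(self, inputs):
        return [x + 1 for x in inputs]

    async def infer(self, input):
        results = await self.infer_batch([input])  # wrap, batch, unwrap
        return results[0]

print(asyncio.run(BatchOnlyRuntime().infer(41)))  # 42
```

This is why subclasses only need to implement `infer_batch()`: the single-input path is derived from it.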