PyTorch Adapter¶
torchadapter ¶
Attributes¶
ImageTargetPair module-attribute ¶
ImageTargetPair: TypeAlias = tuple[Any, Target]
Type alias for a tuple of (image, target).
The image can be either a PIL Image or a torch.Tensor depending on whether transforms have been applied.
Classes¶
Target ¶
Bases: TypedDict
Target dictionary containing annotation information for an image.
This TypedDict defines the structure of target data returned by the dataset adapter, following torchvision's object detection format conventions.
Attributes:
| Name | Type | Description |
|---|---|---|
| boxes | Tensor | Bounding boxes tensor of shape (N, 4), where N is the number of objects. Format depends on the adapter's return_format setting. |
| labels | Tensor | Class label tensor of shape (N,) containing integer category IDs. |
| image_id | Tensor | Image identifier tensor of shape (1,). |
| area | Tensor | Area values tensor of shape (N,), one per bounding box. |
| iscrowd | Tensor | Crowd flag tensor of shape (N,) indicating whether each object is a crowd. |
TorchDatasetAdapter ¶
TorchDatasetAdapter(dataset: Dataset, transform: Callable[..., Tensor] | None = None, target_transform: Callable[[Target], Target] | None = None, return_format: Literal['xyxy', 'xywh', 'cxcywh'] = 'xyxy')
Bases: TorchDataset[ImageTargetPair]
Adapter to convert BoxLab datasets to PyTorch-compatible format.
This adapter wraps Dataset instances and provides a PyTorch Dataset interface suitable for use with DataLoader and torchvision transforms. It handles image loading, annotation formatting, and coordinate conversion.
The adapter follows torchvision's object detection conventions, making it compatible with models like Faster R-CNN, Mask R-CNN, and other detection architectures.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | Source BoxLab Dataset instance. | required |
| transform | Callable[..., Tensor] \| None | Optional torchvision transforms pipeline for images. Applied to PIL Images before returning. | None |
| target_transform | Callable[[Target], Target] \| None | Optional transforms for targets/annotations. Applied to the target dictionary. | None |
| return_format | Literal['xyxy', 'xywh', 'cxcywh'] | Format for bounding boxes. Options: "xyxy" = [x_min, y_min, x_max, y_max]; "xywh" = [x_min, y_min, width, height]; "cxcywh" = [center_x, center_y, width, height]. | 'xyxy' |
Attributes:
| Name | Description |
|---|---|
| dataset | The wrapped Dataset instance. |
| transform | Image transformation pipeline. |
| target_transform | Target transformation pipeline. |
| return_format | Bounding box format string. |
| image_ids | List of image IDs for indexing. |
Note
This adapter requires torch, torchvision, and pillow to be installed. Install with: pip install torch torchvision pillow
Raises:
| Type | Description |
|---|---|
| RequiredModuleNotFoundError | If torch, torchvision, or PIL are not installed. |
Example
from boxlab.dataset import Dataset
from boxlab.dataset.torchadapter import TorchDatasetAdapter
from torchvision import transforms as T
# Create dataset
dataset = Dataset(name="my_dataset")
# ... populate dataset ...
# Create adapter with transforms
transform = T.Compose([
    T.Resize((640, 640)),
    T.ToTensor(),
])
torch_dataset = TorchDatasetAdapter(
    dataset, transform=transform, return_format="xyxy"
)
# Use with DataLoader
from torch.utils.data import DataLoader
loader = DataLoader(
    torch_dataset,
    batch_size=4,
    collate_fn=torch_dataset.collate,
)
Source code in boxlab/dataset/torchadapter.py
Functions¶
__len__ ¶
Return the total number of samples in the dataset.
Returns:
| Type | Description |
|---|---|
| int | Number of images in the dataset. |
__getitem__ ¶
__getitem__(idx: int) -> ImageTargetPair
Get a sample by index.
Loads the image and its annotations, applies transforms, and returns them in PyTorch-compatible format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| idx | int | Sample index (0-based integer). | required |
Returns:
| Type | Description |
|---|---|
| ImageTargetPair | Tuple of (image, target). image: PIL Image, or torch.Tensor if a transform was applied. target: dictionary containing boxes (Tensor of shape (N, 4) with bounding boxes), labels (Tensor of shape (N,) with class labels, 1-indexed), image_id (Tensor with the image identifier), area (Tensor of shape (N,) with box areas), and iscrowd (Tensor of shape (N,) with crowd flags). |
Raises:
| Type | Description |
|---|---|
| DatasetError | If the image is not found in the dataset. |
| DatasetNotFoundError | If the image file does not exist on disk. |
collate ¶
collate(batch: list[ImageTargetPair]) -> tuple[list[Tensor], list[Target]]
Custom collate function for DataLoader.
This collate function is useful when images contain different numbers of objects, which is common in object detection. Instead of stacking tensors (which requires identical shapes), it returns a list of images and a list of targets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| batch | list[ImageTargetPair] | List of (image, target) tuples from \_\_getitem\_\_. | required |
Returns:
| Type | Description |
|---|---|
| tuple[list[Tensor], list[Target]] | Tuple of (images, targets). images: list of image tensors. targets: list of target dictionaries. |
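The list-based collation described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `collate_as_lists` is a hypothetical name, and strings and nested lists stand in for image tensors and box tensors.

```python
from typing import Any


def collate_as_lists(batch: list[tuple[Any, dict]]) -> tuple[list[Any], list[dict]]:
    """Split a batch of (image, target) pairs into two parallel lists.

    Nothing is stacked, so each target may hold a different number of boxes.
    """
    images = [image for image, _ in batch]
    targets = [target for _, target in batch]
    return images, targets


# Stand-in samples: strings replace image tensors, nested lists replace box tensors.
batch = [
    ("image_a", {"boxes": [[0, 0, 10, 10]]}),
    ("image_b", {"boxes": [[5, 5, 20, 20], [1, 1, 2, 2], [3, 3, 9, 9]]}),
]
images, targets = collate_as_lists(batch)
print(len(images), [len(t["boxes"]) for t in targets])  # 2 [1, 3]
```

PyTorch's default collation would try to stack the per-sample box tensors and fail when their first dimensions differ, which is why a list-returning function is passed as `collate_fn`.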
Example
from torch.utils.data import DataLoader
loader = DataLoader(
    torch_dataset,
    batch_size=4,
    collate_fn=torch_dataset.collate,
    shuffle=True,
)
for images, targets in loader:
    # images: list of 4 tensors
    # targets: list of 4 target dicts
    for img, tgt in zip(images, targets):
        print(f"Image: {img.shape}")
        print(f"Objects: {len(tgt['boxes'])}")
Functions¶
build_torchdataset ¶
build_torchdataset(dataset: Dataset, image_size: int | tuple[int, int] | None = None, augment: bool = False, normalize: bool = False, *transforms: Callable[..., Any], return_format: Literal['xyxy', 'xywh', 'cxcywh'] = 'xyxy') -> TorchDatasetAdapter
Create a PyTorch-compatible dataset with standard transforms.
This convenience function builds a TorchDatasetAdapter with commonly used transforms for object detection, including resizing, augmentation, and normalization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | Source BoxLab Dataset instance. | required |
| image_size | int \| tuple[int, int] \| None | Target image size. int: square resize (size, size); tuple: (height, width); None: no resizing. | None |
| augment | bool | Whether to apply data augmentation: random horizontal flip (p=0.5), color jitter (brightness, contrast, saturation, hue), and random affine (rotation, translation, scale). | False |
| normalize | bool | Whether to normalize images using ImageNet statistics: mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]. | False |
| *transforms | Callable[..., Any] | Additional user-defined transforms to append. | () |
| return_format | Literal['xyxy', 'xywh', 'cxcywh'] | Bounding box format ("xyxy", "xywh", or "cxcywh"). | 'xyxy' |
Returns:
| Type | Description |
|---|---|
| TorchDatasetAdapter | TorchDatasetAdapter instance with configured transforms. |
Note
This function requires torch, torchvision, and pillow to be installed. Install with: pip install torch torchvision pillow
Raises:
| Type | Description |
|---|---|
| RequiredModuleNotFoundError | If required packages are not installed. |
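To make the `normalize` option concrete, the per-channel arithmetic behind ImageNet normalization is `(value - mean) / std`, applied to values already scaled to [0, 1]. The sketch below works on a single RGB pixel in plain Python; the helper names are hypothetical and only the mean and std values come from the parameter table above.

```python
# ImageNet statistics as listed for the `normalize` parameter.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


def denormalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Invert normalize_pixel, e.g. before visualizing a normalized image."""
    return tuple(v * s + m for v, m, s in zip(rgb, mean, std))


# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel(IMAGENET_MEAN))  # (0.0, 0.0, 0.0)
```

Denormalizing is useful when plotting samples drawn from a dataset built with `normalize=True`, since normalized values fall outside the displayable [0, 1] range.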
Example
from boxlab.dataset import Dataset
from boxlab.dataset.torchadapter import build_torchdataset
from torch.utils.data import DataLoader
# Create dataset
dataset = Dataset(name="my_dataset")
# ... populate dataset ...
# Build training dataset with augmentation
train_dataset = build_torchdataset(
    dataset,
    image_size=640,
    augment=True,
    normalize=True,
    return_format="xyxy",
)
# Build validation dataset without augmentation
val_dataset = build_torchdataset(
    dataset,
    image_size=640,
    augment=False,
    normalize=True,
    return_format="xyxy",
)
# Create DataLoaders
train_loader = DataLoader(
    train_dataset,
    batch_size=16,
    shuffle=True,
    collate_fn=train_dataset.collate,
)
val_loader = DataLoader(
    val_dataset,
    batch_size=16,
    shuffle=False,
    collate_fn=val_dataset.collate,
)
Overview¶
The PyTorch adapter module connects BoxLab datasets to PyTorch. It converts BoxLab datasets into a PyTorch-compatible format for direct use with DataLoader, torchvision transforms, and torchvision detection models.
Installation Requirements¶
This module requires additional dependencies: torch, torchvision, and pillow. Install them with: pip install torch torchvision pillow
If these packages are not installed, a RequiredModuleNotFoundError with installation instructions is raised.
Key Components¶
TorchDatasetAdapter¶
Wraps BoxLab Dataset instances to provide a PyTorch Dataset interface. It handles:
- Image loading and format conversion
- Annotation format conversion
- Transform pipeline application
- Batch collation for variable-sized objects
Target Dictionary¶
The adapter returns targets in torchvision's standard object detection format:
{
    'boxes': Tensor,     # Shape: (N, 4) - bounding boxes
    'labels': Tensor,    # Shape: (N,) - class labels
    'image_id': Tensor,  # Shape: (1,) - image identifier
    'area': Tensor,      # Shape: (N,) - box areas
    'iscrowd': Tensor    # Shape: (N,) - crowd flags
}
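As a rough illustration of how the per-object fields line up, the sketch below builds a target-shaped dictionary from plain lists (standing in for tensors) and checks that every per-object field has the same length N. The helper names are hypothetical, and the area formula assumes boxes in xyxy format.

```python
def box_areas_xyxy(boxes):
    """Area of each [x_min, y_min, x_max, y_max] box: width * height."""
    return [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]


def per_object_lengths_match(target):
    """True when boxes, labels, area, and iscrowd all describe N objects."""
    n = len(target["boxes"])
    return all(len(target[key]) == n for key in ("labels", "area", "iscrowd"))


# Plain lists stand in for tensors; two objects, so N = 2.
boxes = [[10.0, 20.0, 50.0, 80.0], [0.0, 0.0, 5.0, 4.0]]
target = {
    "boxes": boxes,                   # shape (N, 4)
    "labels": [1, 3],                 # shape (N,)
    "image_id": [7],                  # shape (1,)
    "area": box_areas_xyxy(boxes),    # shape (N,)
    "iscrowd": [0, 0],                # shape (N,)
}
print(target["area"], per_object_lengths_match(target))  # [2400.0, 20.0] True
```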
Bounding Box Formats¶
Three formats are supported:
- xyxy: [x_min, y_min, x_max, y_max] - Top-left and bottom-right corners
- xywh: [x_min, y_min, width, height] - COCO format
- cxcywh: [center_x, center_y, width, height] - YOLO format
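The relationship between the three formats is simple arithmetic. The functions below are a minimal sketch with hypothetical names, operating on plain tuples in pixel coordinates; they are not the adapter's internal conversion code. Whether the adapter additionally normalizes cxcywh values by image size is not stated here, so no scaling is applied.

```python
from typing import Sequence

Box = tuple[float, float, float, float]


def xyxy_to_xywh(box: Sequence[float]) -> Box:
    """[x_min, y_min, x_max, y_max] -> [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = box
    return (x_min, y_min, x_max - x_min, y_max - y_min)


def xyxy_to_cxcywh(box: Sequence[float]) -> Box:
    """[x_min, y_min, x_max, y_max] -> [center_x, center_y, width, height]."""
    x_min, y_min, x_max, y_max = box
    width, height = x_max - x_min, y_max - y_min
    return (x_min + width / 2, y_min + height / 2, width, height)


def xywh_to_xyxy(box: Sequence[float]) -> Box:
    """[x_min, y_min, width, height] -> [x_min, y_min, x_max, y_max]."""
    x_min, y_min, width, height = box
    return (x_min, y_min, x_min + width, y_min + height)


box = (10.0, 20.0, 50.0, 80.0)
print(xyxy_to_xywh(box))    # (10.0, 20.0, 40.0, 60.0)
print(xyxy_to_cxcywh(box))  # (30.0, 50.0, 40.0, 60.0)
```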
Common Usage Patterns¶
Basic Training Setup¶
from boxlab.dataset import Dataset
from boxlab.dataset.torchadapter import build_torchdataset
from torch.utils.data import DataLoader
# Load dataset
dataset = Dataset(name="my_dataset")
# ... populate dataset ...
# Create training dataset with augmentation
train_ds = build_torchdataset(
    dataset,
    image_size=640,
    augment=True,
    normalize=True
)
# Create DataLoader
train_loader = DataLoader(
    train_ds,
    batch_size=16,
    shuffle=True,
    num_workers=4,
    collate_fn=train_ds.collate
)
# Training loop
for images, targets in train_loader:
    # images: list of tensors
    # targets: list of dicts
    ...
Train/Val Split¶
from boxlab.dataset import Dataset
from boxlab.dataset.types import SplitRatio
from boxlab.dataset.torchadapter import build_torchdataset
# Split dataset
dataset = Dataset(name="full_dataset")
splits = dataset.split(SplitRatio(train=0.8, val=0.2, test=0.0), seed=42)
# Create separate Dataset instances
train_dataset = Dataset(name="train")
val_dataset = Dataset(name="val")
# Populate split datasets
for img_id in splits['train']:
    img_info = dataset.get_image(img_id)
    train_dataset.add_image(img_info)
    for ann in dataset.get_annotations(img_id):
        train_dataset.add_annotation(ann)
for img_id in splits['val']:
    img_info = dataset.get_image(img_id)
    val_dataset.add_image(img_info)
    for ann in dataset.get_annotations(img_id):
        val_dataset.add_annotation(ann)
# Create PyTorch datasets
train_torch = build_torchdataset(train_dataset, image_size=640, augment=True, normalize=True)
val_torch = build_torchdataset(val_dataset, image_size=640, augment=False, normalize=True)
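The two population loops in the split example are identical apart from the split name, so they could be factored into one helper. `copy_subset` is a hypothetical function, not part of BoxLab; it relies only on the `get_image`, `get_annotations`, `add_image`, and `add_annotation` methods used in the example.

```python
from typing import Any, Iterable


def copy_subset(source: Any, target: Any, image_ids: Iterable[Any]) -> int:
    """Copy the given images and their annotations from source into target.

    Both arguments are expected to behave like the Dataset objects in the
    split example. Returns the number of images copied.
    """
    copied = 0
    for img_id in image_ids:
        target.add_image(source.get_image(img_id))
        for ann in source.get_annotations(img_id):
            target.add_annotation(ann)
        copied += 1
    return copied


# Usage with the names from the split example (requires BoxLab):
#   copy_subset(dataset, train_dataset, splits['train'])
#   copy_subset(dataset, val_dataset, splits['val'])
```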
Custom Transforms¶
from torchvision import transforms as T
from boxlab.dataset.torchadapter import TorchDatasetAdapter
# Define custom transform pipeline
transform = T.Compose(
    [
        T.Resize((640, 640)),
        T.RandomRotation(10),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ]
)
# Create adapter with custom transforms
adapter = TorchDatasetAdapter(
    dataset,
    transform=transform,
    return_format="xyxy"
)
Using with Detection Models¶
import torch
from torch.utils.data import DataLoader
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from boxlab.dataset.torchadapter import build_torchdataset
# Prepare dataset (assumes `dataset` is a populated BoxLab Dataset)
torch_dataset = build_torchdataset(
    dataset,
    image_size=800,
    augment=True,
    normalize=True,
    return_format="xyxy"  # Faster R-CNN expects xyxy
)
loader = DataLoader(
    torch_dataset,
    batch_size=4,
    collate_fn=torch_dataset.collate
)
# Select a device and load the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# `weights="DEFAULT"` replaces `pretrained=True`, deprecated since torchvision 0.13
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.to(device)
model.train()
# Training
optimizer = torch.optim.SGD(model.parameters(), lr=0.005)
for images, targets in loader:
    images = [img.to(device) for img in images]
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())
    optimizer.zero_grad()
    losses.backward()
    optimizer.step()
Transform Pipeline Order¶
When using build_torchdataset(), transforms are applied in this order:
- Resize (if image_size specified)
- Augmentation (if augment=True):
    - Random horizontal flip
    - Color jitter
    - Random affine transformations
- ToTensor (always applied)
- Normalization (if normalize=True)
- Custom transforms (additional args)
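The ordering above can be expressed as a small planning function. This sketch only returns step names so that it runs without torchvision; `plan_transforms` is a hypothetical helper that mirrors the documented order and the `image_size` handling, not the actual body of build_torchdataset().

```python
def plan_transforms(image_size=None, augment=False, normalize=False, extra=()):
    """Return the names of the pipeline steps in the documented order."""
    steps = []
    if image_size is not None:
        # An int means a square resize; a tuple is taken as (height, width).
        size = (image_size, image_size) if isinstance(image_size, int) else tuple(image_size)
        steps.append(f"Resize{size}")
    if augment:
        steps += ["RandomHorizontalFlip", "ColorJitter", "RandomAffine"]
    steps.append("ToTensor")  # always applied
    if normalize:
        steps.append("Normalize")
    steps.extend(extra)  # user-supplied transforms come last
    return steps


print(plan_transforms(image_size=640, augment=True, normalize=True))
# ['Resize(640, 640)', 'RandomHorizontalFlip', 'ColorJitter', 'RandomAffine', 'ToTensor', 'Normalize']
```

One consequence of this order is that custom transforms receive a tensor rather than a PIL Image, and that tensor is already normalized when `normalize=True`.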
Error Handling¶
Missing Dependencies¶
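# Assumption: RequiredModuleNotFoundError is importable from boxlab.exceptions,
# like DatasetNotFoundError in the next example.
from boxlab.exceptions import RequiredModuleNotFoundError
try:
    from boxlab.dataset.torchadapter import build_torchdataset
except RequiredModuleNotFoundError as e:
    print(f"Missing dependency: {e}")
    print("Install with: pip install torch torchvision pillow")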
Missing Images¶
from boxlab.exceptions import DatasetNotFoundError
try:
    image, target = torch_dataset[0]
except DatasetNotFoundError as e:
    print(f"Image file not found: {e}")
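Because a missing file only surfaces when its index is accessed, it can help to scan a dataset once before training starts. The helper below is a hypothetical sketch that works with any object supporting `len()` and integer indexing; the exception types to catch are passed in, for example `(DatasetNotFoundError,)`.

```python
def find_unloadable(dataset, errors=(Exception,)):
    """Try to load every sample and collect (index, exception) for failures."""
    failures = []
    for idx in range(len(dataset)):
        try:
            dataset[idx]
        except errors as exc:
            failures.append((idx, exc))
    return failures


# Usage with the adapter (requires BoxLab):
#   bad = find_unloadable(torch_dataset, errors=(DatasetNotFoundError,))
#   for idx, exc in bad:
#       print(f"Sample {idx} failed: {exc}")
```

The scan loads every image once, so on a large dataset it may be worth running it as a separate validation step rather than at the start of each training run.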
See Also¶
- Dataset Core - Core dataset management
- Plugin System - Extend dataset functionality
- Types - Data structures and type definitions
- I/O Operations - Loading and exporting datasets
- PyTorch DataLoader - Official docs
- Torchvision Transforms - Transform reference