YOLO Plugin¶

yolo ¶

Classes¶

YOLOLoader ¶

Bases: LoaderPlugin

YOLOv5/YOLOv8 format dataset loader.

This loader handles datasets in YOLO format, which consists of:

- A YAML configuration file (data.yaml) defining classes and paths
- Images organized in directories (typically images/train, images/val, images/test)
- Label files in TXT format with normalized coordinates (labels/train, etc.)

The loader supports both YOLOv5 and YOLOv8 format specifications, automatically handling different category naming conventions (dict or list format in YAML).

Label Format

Each line in a label file represents one object:

<class_id> <x_center> <y_center> <width> <height>

All coordinates are normalized to the [0, 1] range.

Attributes¶

name `property` ¶

name: str

Get the loader name.

Returns:

Type	Description
`str`	The string "yolo".

description `property` ¶

description: str

Get the loader description.

Returns:

Type	Description
`str`	Description string for YOLOv5/YOLOv8 format.

supported_extensions `property` ¶

supported_extensions: list[str]

Get supported file extensions.

Returns:

Type	Description
`list[str]`	List containing [".yaml", ".yml"].

Functions¶

load ¶

load(path: str | PathLike[str], name: str | None = None, splits: str | list[str] | None = None, **_kwargs: Any) -> Dataset

Load YOLO format dataset.

Loads a YOLO dataset from the given YAML configuration file. The images and labels subdirectories are resolved relative to the `path` entry in the YAML file if it is present, otherwise relative to the directory that contains the YAML file.

Parameters:

Name	Type	Description	Default
`path`	`str \| PathLike[str]`	Path to YOLO dataset YAML configuration file.	required
`name`	`str \| None`	Optional custom name for the dataset. If None, uses the name of the YAML file (`yaml_path.name` in the source below).	`None`
`splits`	`str \| list[str] \| None`	Which split(s) to load: None loads all splits (train, val, test); a str loads a single split (e.g., "train"); a list[str] loads specific splits (e.g., ["train", "val"]).	`None`
`**_kwargs`	`Any`	Additional parameters (currently unused, reserved for future extensions).	`{}`

Returns:

Type	Description
`Dataset`	Loaded Dataset instance containing all images, annotations, and categories.

Raises:

Type	Description
`FileNotFoundError`	If the YAML configuration file is not found.
`ValueError`	If YAML configuration is missing required 'names' field.

Source code in boxlab/dataset/plugins/yolo.py

def load(
    self,
    path: str | os.PathLike[str],
    name: str | None = None,
    splits: str | list[str] | None = None,
    **_kwargs: t.Any,
) -> Dataset:
    """Load YOLO format dataset.

    Loads a YOLO dataset from the specified directory. The directory should
    contain a YAML configuration file and subdirectories for images and
    labels.

    Args:
        path: Path to YOLO dataset YAML configuration file.
        name: Optional custom name for the dataset. If None, uses directory
            name.
        splits: Which split(s) to load. Can be:
            - None: Load all splits (train, val, test)
            - str: Load single split (e.g., "train")
            - list[str]: Load specific splits (e.g., ["train", "val"])
        **_kwargs: Additional parameters (currently unused, reserved for
            future extensions).

    Returns:
        Loaded Dataset instance containing all images, annotations, and
        categories.

    Raises:
        FileNotFoundError: If the YAML configuration file is not found.
        ValueError: If YAML configuration is missing required 'names' field.
    """
    yaml_path = pathlib.Path(path)

    if not yaml_path.exists():
        raise FileNotFoundError(f"YAML file not found: {yaml_path}")

    # Load YAML configuration
    with yaml_path.open(mode="r") as f:
        yaml_config = yaml.safe_load(f)

    dataset_name = name or yaml_path.name
    dataset = Dataset(name=dataset_name)
    dataset_dir = (
        pathlib.Path(yaml_config["path"]) if "path" in yaml_config else yaml_path.parent
    )

    logger.info(f"Loading YOLOv5 dataset from {dataset_dir}")

    # Load categories
    self._load_categories(yaml_config, dataset)

    logger.info(f"Loaded {len(dataset.categories)} categories")

    # Determine splits to load
    splits_to_load = self._determine_splits(splits)

    # Load each split
    total_images = 0
    total_annotations = 0

    for split in splits_to_load:
        images_dir = dataset_dir / "images" / split
        labels_dir = dataset_dir / "labels" / split

        if not images_dir.exists():
            logger.warning(f"Images directory not found for {split}: {images_dir}")
            continue

        split_images, split_annotations = self._load_split(
            dataset,
            images_dir,
            labels_dir,
            total_images,
            total_annotations,
        )

        total_images += split_images
        total_annotations += split_annotations

        logger.info(
            f"Loaded {split} split: {split_images} images, {split_annotations} annotations"
        )

    logger.info(f"Total loaded: {total_images} images, {total_annotations} annotations")

    return dataset

YOLOExporter ¶

Bases: ExporterPlugin

YOLOv5/YOLOv8 format dataset exporter.

This exporter converts datasets to YOLO format, creating:

- A data.yaml configuration file with class definitions and paths
- Image files organized in split subdirectories (images/train, etc.)
- Label files in TXT format with normalized coordinates (labels/train, etc.)

The exporter supports:

- Train/val/test splits or single dataset export
- Custom naming strategies for files
- Optional image copying (can export annotations only)
- Unified or standard directory structure

Output Structure (standard):

output_dir/
├── data.yaml
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── labels/
    ├── train/
    ├── val/
    └── test/

Output Structure (unified):

output_dir/
├── data.yaml
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── annotations/
    ├── train/
    ├── val/
    └── test/

Attributes¶

name `property` ¶

name: str

Get the exporter name.

Returns:

Type	Description
`str`	The string "yolo".

description `property` ¶

description: str

Get the exporter description.

Returns:

Type	Description
`str`	Description string for YOLOv5/YOLOv8 format.

default_extension `property` ¶

default_extension: str

Get default file extension for label files.

Returns:

Type	Description
`str`	The string ".txt".

Functions¶

export ¶

export(dataset: Dataset, output_dir: str | PathLike[str], split_ratio: SplitRatio | None = None, seed: int | None = None, naming_strategy: NamingStrategy | None = None, copy_images: bool = True, unified_structure: bool = False, **_kwargs: Any) -> None

Export dataset to YOLO format.

Creates a YOLO-compatible dataset with proper directory structure, label files, and YAML configuration.

Parameters:

Name	Type	Description	Default
`dataset`	`Dataset`	Dataset instance to export.	required
`output_dir`	`str \| PathLike[str]`	Output directory path. Will be created if it doesn't exist.	required
`split_ratio`	`SplitRatio \| None`	Optional SplitRatio for train/val/test division. If None, exports entire dataset as 'train' split.	`None`
`seed`	`int \| None`	Random seed for reproducible splits. Only used if split_ratio is provided.	`None`
`naming_strategy`	`NamingStrategy \| None`	Strategy for generating output file names. If None, uses OriginalNaming (preserves original filenames).	`None`
`copy_images`	`bool`	If True, copies image files to output directory. If False, only creates label files.	`True`
`unified_structure`	`bool`	If True, uses an 'annotations' directory instead of 'labels'. Useful for compatibility with some training frameworks.	`False`
`**_kwargs`	`Any`	Additional parameters (currently unused, reserved for future extensions).	`{}`

Note

Category IDs in label files are 0-indexed (YOLO convention), even though the Dataset uses 1-indexed IDs internally.

Source code in boxlab/dataset/plugins/yolo.py

def export(
    self,
    dataset: Dataset,
    output_dir: str | os.PathLike[str],
    split_ratio: SplitRatio | None = None,
    seed: int | None = None,
    naming_strategy: NamingStrategy | None = None,
    copy_images: bool = True,
    unified_structure: bool = False,
    **_kwargs: t.Any,
) -> None:
    """Export dataset to YOLO format.

    Creates a YOLO-compatible dataset with proper directory structure,
    label files, and YAML configuration.

    Args:
        dataset: Dataset instance to export.
        output_dir: Output directory path. Will be created if it doesn't
            exist.
        split_ratio: Optional SplitRatio for train/val/test division. If
            None, exports entire dataset as 'train' split.
        seed: Random seed for reproducible splits. Only used if split_ratio
            is provided.
        naming_strategy: Strategy for generating output file names. If None,
            uses OriginalNaming (preserves original filenames).
        copy_images: If True, copies image files to output directory. If
            False, only creates label files.
        unified_structure: If True, uses 'annotations' directory instead of
            'labels'. Useful for compatibility with some training frameworks.
        **_kwargs: Additional parameters (currently unused, reserved for
            future extensions).

    Note:
        Category IDs in label files are 0-indexed (YOLO convention), even
        though the Dataset uses 1-indexed IDs internally.
    """
    output_dir = pathlib.Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    naming_strategy = naming_strategy or OriginalNaming()

    logger.info(f"Exporting YOLOv5 dataset to {output_dir}")

    if split_ratio is None:
        all_image_ids = list(dataset.images.keys())
        self._export_split(
            dataset,
            output_dir,
            "train",
            all_image_ids,
            naming_strategy,
            copy_images,
            unified_structure,
        )
        splits_to_write = ["train"]
    else:
        splits = dataset.split(split_ratio, seed)
        splits_to_write = []
        for split_name, image_ids in splits.items():
            if image_ids:
                self._export_split(
                    dataset,
                    output_dir,
                    split_name,
                    image_ids,
                    naming_strategy,
                    copy_images,
                    unified_structure,
                )
                splits_to_write.append(split_name)

    # Create data.yaml
    self._create_yaml(dataset, output_dir, splits_to_write)

    logger.info(f"YOLOv5 dataset exported to: {output_dir}")

Overview¶

The YOLO plugin provides support for loading and exporting datasets in YOLOv5/YOLOv8 format. It handles YAML configuration files, normalized bounding box coordinates, and the standard YOLO directory structure.

Format Specification¶

Directory Structure¶

dataset/
├── data.yaml              # Configuration file
├── images/
│   ├── train/            # Training images
│   ├── val/              # Validation images
│   └── test/             # Test images
└── labels/
    ├── train/            # Training labels
    ├── val/              # Validation labels
    └── test/             # Test labels
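
Under this layout, each image is conventionally paired with a label file that has the same stem in the matching labels/ split directory. The sketch below illustrates that pairing; `label_path_for` is a hypothetical helper written for this page, not part of BoxLab, and it assumes the standard images/ and labels/ directory names shown above.

```python
import pathlib


def label_path_for(image_path, dataset_dir):
    """Map images/<split>/<stem>.<ext> to labels/<split>/<stem>.txt."""
    image_path = pathlib.Path(image_path)
    dataset_dir = pathlib.Path(dataset_dir)
    # Path of the image relative to the images/ root, e.g. train/0001.jpg
    relative = image_path.relative_to(dataset_dir / "images")
    # Same relative location under labels/, with a .txt extension
    return dataset_dir / "labels" / relative.with_suffix(".txt")


example = label_path_for("dataset/images/train/0001.jpg", "dataset")
```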

YAML Configuration¶

# data.yaml
path: /path/to/dataset
train: images/train
val: images/val
test: images/test

nc: 3  # Number of classes

names:
  0: person
  1: car
  2: bicycle
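
The `names` field may be written either as a mapping (as above) or as a plain list. One way to normalize both forms into a single index-to-name mapping is sketched here; `normalize_names` is a hypothetical helper for illustration and may differ from what the loader does internally.

```python
def normalize_names(names):
    """Return {class_index: class_name} for a dict- or list-style `names` field."""
    if isinstance(names, dict):
        # YAML keys may arrive as int or str, so coerce them to int.
        return {int(index): str(label) for index, label in names.items()}
    if isinstance(names, (list, tuple)):
        # List position is the class index.
        return {index: str(label) for index, label in enumerate(names)}
    raise ValueError("'names' must be a dict or a list")


as_dict = normalize_names({0: "person", 1: "car", 2: "bicycle"})
as_list = normalize_names(["person", "car", "bicycle"])
```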

Label Format¶

Each label file (.txt) contains one line per object:

<class_id> <x_center> <y_center> <width> <height>

All coordinates are normalized to [0, 1] range:

- x_center: Center X coordinate / image width
- y_center: Center Y coordinate / image height
- width: Bounding box width / image width
- height: Bounding box height / image height
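
As a rough illustration of the label format, the following sketch parses one line and checks that the coordinates lie in [0, 1]. `parse_label_line` is a hypothetical function; it raises on malformed input for simplicity, whereas the loader is documented as logging a warning and skipping the line.

```python
def parse_label_line(line):
    """Parse one YOLO label line into (class_id, cx, cy, w, h)."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    cx, cy, w, h = (float(value) for value in parts[1:])
    for value in (cx, cy, w, h):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"coordinate out of [0, 1] range: {value}")
    return class_id, cx, cy, w, h


parsed = parse_label_line("0 0.5 0.5 0.3 0.2")
```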

YOLOLoader¶

Load datasets from YOLO format.

Basic Usage¶

from boxlab.dataset.plugins.registry import get_loader

loader = get_loader("yolo")
# load() expects the path to the YAML configuration file, not the dataset directory
dataset = loader.load("path/to/yolo_dataset/data.yaml")

Load Specific Splits¶

# Load only training data
dataset = loader.load("path/to/yolo_dataset/data.yaml", splits="train")

# Load multiple splits
dataset = loader.load("path/to/yolo_dataset/data.yaml", splits=["train", "val"])

# Load all splits (default)
dataset = loader.load("path/to/yolo_dataset/data.yaml", splits=None)
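
The three accepted forms of `splits` can be reduced to a list of split names along these lines. This is a minimal sketch: `resolve_splits` and `DEFAULT_SPLITS` are hypothetical names, and the loader's private `_determine_splits` may behave differently in detail.

```python
DEFAULT_SPLITS = ["train", "val", "test"]


def resolve_splits(splits=None):
    """Turn the `splits` argument into a list of split names."""
    if splits is None:
        # No selection: load every known split.
        return list(DEFAULT_SPLITS)
    if isinstance(splits, str):
        # A single split name becomes a one-element list.
        return [splits]
    return list(splits)
```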

Custom YAML File¶

# load() takes the YAML file path as its first argument,
# so a non-default filename is passed there directly
dataset = loader.load("path/to/yolo_dataset/custom.yaml")

Features¶

- Supports both YOLOv5 and YOLOv8 formats
- Handles dict or list category definitions in YAML
- Converts normalized coordinates to absolute pixels
- Validates label file format
- Logs warnings for invalid annotations
- Supports multiple image formats (jpg, png, bmp, tiff, webp)

YOLOExporter¶

Export datasets to YOLO format.

Basic Usage¶

from boxlab.dataset.plugins.registry import get_exporter

exporter = get_exporter("yolo")
exporter.export(dataset, output_dir="output/yolo_format")

Export with Splits¶

from boxlab.dataset.types import SplitRatio

# Define split ratios
split_ratio = SplitRatio(train=0.7, val=0.2, test=0.1)

exporter.export(
    dataset,
    output_dir="output/yolo_format",
    split_ratio=split_ratio,
    seed=42  # For reproducibility
)
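
To show why a fixed seed makes the split reproducible, here is a generic seeded shuffle-and-slice sketch. It is not BoxLab's `Dataset.split` implementation; `split_ids` and its parameters are hypothetical and only illustrate the idea.

```python
import random


def split_ids(image_ids, train=0.7, val=0.2, seed=None):
    """Shuffle IDs with a seeded RNG and slice them into train/val/test."""
    ids = sorted(image_ids)  # fixed starting order, independent of input order
    random.Random(seed).shuffle(ids)  # same seed gives the same permutation
    n_train = round(len(ids) * train)
    n_val = round(len(ids) * val)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # remainder goes to test
    }


first = split_ids(range(10), seed=42)
second = split_ids(range(10), seed=42)
```

Calling the function twice with the same seed yields identical splits, which is the property the `seed` argument provides in the export call above.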

Export Options¶

from boxlab.dataset.plugins.naming import SequentialNaming

# Custom naming strategy
strategy = SequentialNaming(prefix="img", start=1, digits=6)

# Export with options
exporter.export(
    dataset,
    output_dir="output/yolo_format",
    split_ratio=split_ratio,
    seed=42,
    naming_strategy=strategy,
    copy_images=True,  # Copy image files
    unified_structure=False  # Use standard structure
)

Unified Structure¶

Use unified directory structure (annotations instead of labels):

exporter.export(
    dataset,
    output_dir="output/yolo_format",
    unified_structure=True  # Uses 'annotations' directory
)

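Output structure (val/ and test/ are created only when a split_ratio is passed; otherwise everything is exported as train/):

output/yolo_format/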
├── data.yaml
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── annotations/          # Instead of 'labels'
    ├── train/
    ├── val/
    └── test/

Features¶

- Generates compliant YAML configuration
- Converts absolute coordinates to normalized format
- Handles filename conflicts automatically
- Supports custom naming strategies
- Optional image copying
- 0-indexed class IDs in output (YOLO convention)
- Preserves annotation precision with 6 decimal places

Coordinate Conversion¶

Loading (Normalized → Absolute)¶

# YOLO label: 0 0.5 0.5 0.3 0.2
# Image size: 640x480

cx_norm, cy_norm = 0.5, 0.5
w_norm, h_norm = 0.3, 0.2

cx = cx_norm * 640  # 320.0
cy = cy_norm * 480  # 240.0
w = w_norm * 640    # 192.0
h = h_norm * 480    # 96.0
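
The same arithmetic can be wrapped in a function that also derives the corner coordinates. This is a sketch that assumes a corner-style (x_min, y_min, x_max, y_max) box as the target; `yolo_to_corners` is a hypothetical name, not a BoxLab function.

```python
def yolo_to_corners(cx_norm, cy_norm, w_norm, h_norm, img_w, img_h):
    """Convert a normalized center/size box to absolute corner coordinates."""
    # Scale the center and size back to pixels.
    cx, cy = cx_norm * img_w, cy_norm * img_h
    w, h = w_norm * img_w, h_norm * img_h
    # Corners sit half a width/height away from the center.
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


# Label "0 0.5 0.5 0.3 0.2" on a 640x480 image
corners = yolo_to_corners(0.5, 0.5, 0.3, 0.2, 640, 480)
```

For the example label this gives a box of roughly (224, 192) to (416, 288), up to floating-point rounding.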

Exporting (Absolute → Normalized)¶

# BBox: x_min=224, y_min=144, x_max=416, y_max=336
# Image size: 640x480
x_min, y_min, x_max, y_max = 224, 144, 416, 336

cx = (x_min + x_max) / 2  # 320.0
cy = (y_min + y_max) / 2  # 240.0
w = x_max - x_min          # 192
h = y_max - y_min          # 192

cx_norm = cx / 640  # 0.5
cy_norm = cy / 480  # 0.5
w_norm = w / 640    # 0.3
h_norm = h / 480    # 0.4
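
Combining this conversion with the six-decimal precision mentioned under Features, a label line could be produced as follows. `corners_to_yolo_line` is a hypothetical helper; the exporter's actual formatting code is not shown on this page and may differ.

```python
def corners_to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Format an absolute corner box as a normalized YOLO label line."""
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    # Six decimal places per coordinate, space-separated.
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


line = corners_to_yolo_line(0, 224, 144, 416, 336, 640, 480)
# line == "0 0.500000 0.500000 0.300000 0.400000"
```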

Category ID Handling¶

YOLO uses 0-indexed category IDs, while BoxLab's Dataset uses 1-indexed IDs internally.

During Loading¶

# YOLO label: class_id = 0
# Internal: category_id = 1
yolo_class_id = 0
category_id = yolo_class_id + 1

During Export¶

# Internal: category_id = 1
# YOLO label: class_id = 0
category_id = 1
yolo_class_id = category_id - 1

Error Handling¶

The YOLO plugin handles various error conditions:

- Missing YAML: Raises FileNotFoundError
- Invalid YAML: Raises ValueError if 'names' field is missing
- Missing directories: Logs warning and skips
- Invalid label format: Logs warning and skips line
- Unknown category: Logs warning and skips annotation
- Image read errors: Logs error and continues
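
Only the first two conditions above surface as exceptions, so calling code may want to catch them explicitly. A minimal sketch, in which `load_or_none` is a hypothetical wrapper and `loader` is any object with the `load` method documented on this page:

```python
import logging

logger = logging.getLogger(__name__)


def load_or_none(loader, yaml_path, **kwargs):
    """Call loader.load and return None instead of raising on documented errors."""
    try:
        return loader.load(yaml_path, **kwargs)
    except FileNotFoundError:
        # The YAML configuration file does not exist.
        logger.error("YAML configuration not found: %s", yaml_path)
    except ValueError as exc:
        # For example, the required 'names' field is missing.
        logger.error("Invalid YAML configuration %s: %s", yaml_path, exc)
    return None
```

Typical use would be `dataset = load_or_none(get_loader("yolo"), "path/to/yolo_dataset/data.yaml")`, followed by a check for `None`.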
