
YOLO Plugin

yolo

Classes

YOLOLoader

Bases: LoaderPlugin

YOLOv5/YOLOv8 format dataset loader.

This loader handles datasets in YOLO format, which consists of:

  • A YAML configuration file (data.yaml) defining classes and paths
  • Images organized in directories (typically images/train, images/val, images/test)
  • Label files in TXT format with normalized coordinates (labels/train, etc.)

The loader supports both YOLOv5 and YOLOv8 format specifications, automatically handling different category naming conventions (dict or list format in YAML).

Label Format

Each line in a label file represents one object:

<class_id> <x_center> <y_center> <width> <height>

All coordinates are normalized to the [0, 1] range.

Attributes
name property
name: str

Get the loader name.

Returns:

Type Description
str

The string "yolo".

description property
description: str

Get the loader description.

Returns:

Type Description
str

Description string for YOLOv5/YOLOv8 format.

supported_extensions property
supported_extensions: list[str]

Get supported file extensions.

Returns:

Type Description
list[str]

List containing [".yaml", ".yml"].

Functions
load
load(path: str | PathLike[str], name: str | None = None, splits: str | list[str] | None = None, **_kwargs: Any) -> Dataset

Load YOLO format dataset.

Loads a YOLO dataset described by the YAML configuration file at path. The dataset root (the YAML's path entry if present, otherwise the directory containing the YAML file) should contain images/<split> and labels/<split> subdirectories.

Parameters:

Name Type Description Default
path str | PathLike[str]

Path to YOLO dataset YAML configuration file.

required
name str | None

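Optional custom name for the dataset. If None, the source shown below falls back to yaml_path.name, i.e. the YAML file's name (for example data.yaml) rather than the directory name.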

None
splits str | list[str] | None

Which split(s) to load. Can be:

  • None: Load all splits (train, val, test)
  • str: Load single split (e.g., "train")
  • list[str]: Load specific splits (e.g., ["train", "val"])

None
**_kwargs Any

Additional parameters (currently unused, reserved for future extensions).

{}

Returns:

Type Description
Dataset

Loaded Dataset instance containing all images, annotations, and categories.

Raises:

Type Description
FileNotFoundError

If the YAML configuration file is not found.

ValueError

If YAML configuration is missing required 'names' field.

Source code in boxlab/dataset/plugins/yolo.py
def load(
    self,
    path: str | os.PathLike[str],
    name: str | None = None,
    splits: str | list[str] | None = None,
    **_kwargs: t.Any,
) -> Dataset:
    """Load YOLO format dataset.

    Loads a YOLO dataset from the specified directory. The directory should
    contain a YAML configuration file and subdirectories for images and
    labels.

    Args:
        path: Path to YOLO dataset YAML configuration file.
        name: Optional custom name for the dataset. If None, uses directory
            name.
        splits: Which split(s) to load. Can be:
            - None: Load all splits (train, val, test)
            - str: Load single split (e.g., "train")
            - list[str]: Load specific splits (e.g., ["train", "val"])
        **_kwargs: Additional parameters (currently unused, reserved for
            future extensions).

    Returns:
        Loaded Dataset instance containing all images, annotations, and
        categories.

    Raises:
        FileNotFoundError: If the YAML configuration file is not found.
        ValueError: If YAML configuration is missing required 'names' field.
    """
    yaml_path = pathlib.Path(path)

    if not yaml_path.exists():
        raise FileNotFoundError(f"YAML file not found: {yaml_path}")

    # Load YAML configuration
    with yaml_path.open(mode="r") as f:
        yaml_config = yaml.safe_load(f)

    dataset_name = name or yaml_path.name
    dataset = Dataset(name=dataset_name)
    dataset_dir = (
        pathlib.Path(yaml_config["path"]) if "path" in yaml_config else yaml_path.parent
    )

    logger.info(f"Loading YOLOv5 dataset from {dataset_dir}")

    # Load categories
    self._load_categories(yaml_config, dataset)

    logger.info(f"Loaded {len(dataset.categories)} categories")

    # Determine splits to load
    splits_to_load = self._determine_splits(splits)

    # Load each split
    total_images = 0
    total_annotations = 0

    for split in splits_to_load:
        images_dir = dataset_dir / "images" / split
        labels_dir = dataset_dir / "labels" / split

        if not images_dir.exists():
            logger.warning(f"Images directory not found for {split}: {images_dir}")
            continue

        split_images, split_annotations = self._load_split(
            dataset,
            images_dir,
            labels_dir,
            total_images,
            total_annotations,
        )

        total_images += split_images
        total_annotations += split_annotations

        logger.info(
            f"Loaded {split} split: {split_images} images, {split_annotations} annotations"
        )

    logger.info(f"Total loaded: {total_images} images, {total_annotations} annotations")

    return dataset

YOLOExporter

Bases: ExporterPlugin

YOLOv5/YOLOv8 format dataset exporter.

This exporter converts datasets to YOLO format, creating:

  • A data.yaml configuration file with class definitions and paths
  • Image files organized in split subdirectories (images/train, etc.)
  • Label files in TXT format with normalized coordinates (labels/train, etc.)

The exporter supports:

  • Train/val/test splits or single dataset export
  • Custom naming strategies for files
  • Optional image copying (can export annotations only)
  • Unified or standard directory structure

Output Structure (standard):

output_dir/
├── data.yaml
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── labels/
    ├── train/
    ├── val/
    └── test/

Output Structure (unified):

output_dir/
├── data.yaml
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── annotations/
    ├── train/
    ├── val/
    └── test/

Attributes
name property
name: str

Get the exporter name.

Returns:

Type Description
str

The string "yolo".

description property
description: str

Get the exporter description.

Returns:

Type Description
str

Description string for YOLOv5/YOLOv8 format.

default_extension property
default_extension: str

Get default file extension for label files.

Returns:

Type Description
str

The string ".txt".

Functions
export
export(dataset: Dataset, output_dir: str | PathLike[str], split_ratio: SplitRatio | None = None, seed: int | None = None, naming_strategy: NamingStrategy | None = None, copy_images: bool = True, unified_structure: bool = False, **_kwargs: Any) -> None

Export dataset to YOLO format.

Creates a YOLO-compatible dataset with proper directory structure, label files, and YAML configuration.

Parameters:

Name Type Description Default
dataset Dataset

Dataset instance to export.

required
output_dir str | PathLike[str]

Output directory path. Will be created if it doesn't exist.

required
split_ratio SplitRatio | None

Optional SplitRatio for train/val/test division. If None, exports entire dataset as 'train' split.

None
seed int | None

Random seed for reproducible splits. Only used if split_ratio is provided.

None
naming_strategy NamingStrategy | None

Strategy for generating output file names. If None, uses OriginalNaming (preserves original filenames).

None
copy_images bool

If True, copies image files to output directory. If False, only creates label files.

True
unified_structure bool

If True, uses the 'annotations' directory instead of 'labels'. Useful for compatibility with some training frameworks.

False
**_kwargs Any

Additional parameters (currently unused, reserved for future extensions).

{}
Note

Category IDs in label files are 0-indexed (YOLO convention), even though the Dataset uses 1-indexed IDs internally.

Source code in boxlab/dataset/plugins/yolo.py
def export(
    self,
    dataset: Dataset,
    output_dir: str | os.PathLike[str],
    split_ratio: SplitRatio | None = None,
    seed: int | None = None,
    naming_strategy: NamingStrategy | None = None,
    copy_images: bool = True,
    unified_structure: bool = False,
    **_kwargs: t.Any,
) -> None:
    """Export dataset to YOLO format.

    Creates a YOLO-compatible dataset with proper directory structure,
    label files, and YAML configuration.

    Args:
        dataset: Dataset instance to export.
        output_dir: Output directory path. Will be created if it doesn't
            exist.
        split_ratio: Optional SplitRatio for train/val/test division. If
            None, exports entire dataset as 'train' split.
        seed: Random seed for reproducible splits. Only used if split_ratio
            is provided.
        naming_strategy: Strategy for generating output file names. If None,
            uses OriginalNaming (preserves original filenames).
        copy_images: If True, copies image files to output directory. If
            False, only creates label files.
        unified_structure: If True, uses 'annotations' directory instead of
            'labels'. Useful for compatibility with some training frameworks.
        **_kwargs: Additional parameters (currently unused, reserved for
            future extensions).

    Note:
        Category IDs in label files are 0-indexed (YOLO convention), even
        though the Dataset uses 1-indexed IDs internally.
    """
    output_dir = pathlib.Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    naming_strategy = naming_strategy or OriginalNaming()

    logger.info(f"Exporting YOLOv5 dataset to {output_dir}")

    if split_ratio is None:
        all_image_ids = list(dataset.images.keys())
        self._export_split(
            dataset,
            output_dir,
            "train",
            all_image_ids,
            naming_strategy,
            copy_images,
            unified_structure,
        )
        splits_to_write = ["train"]
    else:
        splits = dataset.split(split_ratio, seed)
        splits_to_write = []
        for split_name, image_ids in splits.items():
            if image_ids:
                self._export_split(
                    dataset,
                    output_dir,
                    split_name,
                    image_ids,
                    naming_strategy,
                    copy_images,
                    unified_structure,
                )
                splits_to_write.append(split_name)

    # Create data.yaml
    self._create_yaml(dataset, output_dir, splits_to_write)

    logger.info(f"YOLOv5 dataset exported to: {output_dir}")



Overview

The YOLO plugin provides support for loading and exporting datasets in YOLOv5/YOLOv8 format. It handles YAML configuration files, normalized bounding box coordinates, and the standard YOLO directory structure.

Format Specification

Directory Structure

dataset/
├── data.yaml              # Configuration file
├── images/
│   ├── train/            # Training images
│   ├── val/              # Validation images
│   └── test/             # Test images
└── labels/
    ├── train/            # Training labels
    ├── val/              # Validation labels
    └── test/             # Test labels

YAML Configuration

# data.yaml
path: /path/to/dataset
train: images/train
val: images/val
test: images/test

nc: 3  # Number of classes

names:
  0: person
  1: car
  2: bicycle
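
The loader is described as accepting `names` either as a dict (as above) or as a list. The plugin's own `_load_categories` helper is not shown on this page, so the following is only a minimal sketch of how both forms could be normalized to the 1-indexed category IDs that the Dataset uses internally. The function name `normalize_names` is hypothetical.

```python
from typing import Any


def normalize_names(yaml_config: dict[str, Any]) -> dict[int, str]:
    """Return {internal_category_id: name} with 1-indexed IDs.

    Hypothetical helper: accepts `names: {0: person, ...}` and
    `names: [person, ...]` from an already-parsed YAML mapping.
    """
    if "names" not in yaml_config:
        raise ValueError("YAML configuration is missing required 'names' field")

    names = yaml_config["names"]
    if isinstance(names, dict):
        # Dict form: keys are 0-indexed YOLO class IDs (possibly parsed as str).
        pairs = sorted((int(class_id), str(label)) for class_id, label in names.items())
    else:
        # List form: the position in the list is the 0-indexed class ID.
        pairs = list(enumerate(str(label) for label in names))

    # Shift to 1-indexed internal IDs.
    return {class_id + 1: label for class_id, label in pairs}


dict_style = {"nc": 3, "names": {0: "person", 1: "car", 2: "bicycle"}}
list_style = {"nc": 3, "names": ["person", "car", "bicycle"]}

print(normalize_names(dict_style))  # {1: 'person', 2: 'car', 3: 'bicycle'}
print(normalize_names(list_style))  # {1: 'person', 2: 'car', 3: 'bicycle'}
```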

Label Format

Each label file (.txt) contains one line per object:

<class_id> <x_center> <y_center> <width> <height>

All coordinates are normalized to [0, 1] range:

  • x_center: Center X coordinate / image width
  • y_center: Center Y coordinate / image height
  • width: Bounding box width / image width
  • height: Bounding box height / image height
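
As an illustration of the five-field layout above, here is a small sketch of a parser for a single label line. It assumes detection labels only (exactly five fields) and rejects coordinates outside [0, 1]. `YoloLabel` and `parse_label_line` are illustrative names, not part of the plugin.

```python
from typing import NamedTuple


class YoloLabel(NamedTuple):
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_label_line(line: str) -> YoloLabel:
    """Parse one '<class_id> <x_center> <y_center> <width> <height>' line."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")

    class_id = int(fields[0])
    x_center, y_center, width, height = (float(value) for value in fields[1:])

    # All four coordinates are expected to be normalized.
    for value in (x_center, y_center, width, height):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"coordinate {value} is outside [0, 1]: {line!r}")

    return YoloLabel(class_id, x_center, y_center, width, height)


label = parse_label_line("0 0.5 0.5 0.3 0.2")
print(label.class_id, label.width, label.height)  # 0 0.3 0.2
```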

YOLOLoader

Load datasets from YOLO format.

Basic Usage

from boxlab.dataset.plugins.registry import get_loader

loader = get_loader("yolo")
# load() expects the path to the dataset's YAML configuration file
dataset = loader.load("path/to/yolo_dataset/data.yaml")

Load Specific Splits

# Load only training data
dataset = loader.load("path/to/yolo_dataset/data.yaml", splits="train")

# Load multiple splits
dataset = loader.load("path/to/yolo_dataset/data.yaml", splits=["train", "val"])

# Load all splits (default)
dataset = loader.load("path/to/yolo_dataset/data.yaml", splits=None)
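
The three accepted forms of `splits` can be reduced to a plain list before iterating. The plugin does this in a private `_determine_splits` method that is not shown on this page, so the following is only a sketch of the documented behavior, with a hypothetical function name.

```python
from __future__ import annotations

DEFAULT_SPLITS = ("train", "val", "test")


def normalize_splits(splits: str | list[str] | None) -> list[str]:
    """Turn None, a single name, or a list of names into a list of split names."""
    if splits is None:
        return list(DEFAULT_SPLITS)  # None means "load everything"
    if isinstance(splits, str):
        return [splits]  # a bare string is a single split
    return list(splits)


print(normalize_splits(None))              # ['train', 'val', 'test']
print(normalize_splits("train"))           # ['train']
print(normalize_splits(["train", "val"]))  # ['train', 'val']
```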

Custom YAML File

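# load() takes the YAML path itself, so a custom filename is passed directly.
# (load() has no yaml_file parameter; unknown keywords are absorbed by **_kwargs.)
dataset = loader.load("path/to/yolo_dataset/custom.yaml")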

Features

  • Supports both YOLOv5 and YOLOv8 formats
  • Handles dict or list category definitions in YAML
  • Converts normalized coordinates to absolute pixels
  • Validates label file format
  • Logs warnings for invalid annotations
  • Supports multiple image formats (jpg, png, bmp, tiff, webp)
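
To make the directory convention concrete, the sketch below pairs each image in images/<split> with the label file of the same stem in labels/<split>. It is an assumption about how such a lookup could work, not the plugin's `_load_split` implementation. The extension set simply mirrors the formats listed above, and treating a missing label file as "no annotations" is likewise an assumption.

```python
import pathlib
import tempfile

# Mirrors the formats listed above; the plugin's actual set may differ.
IMAGE_EXTENSIONS = {".jpg", ".png", ".bmp", ".tiff", ".webp"}


def pair_images_with_labels(images_dir: pathlib.Path, labels_dir: pathlib.Path):
    """Return (image_path, label_path_or_None) for every image file."""
    pairs = []
    for image_path in sorted(images_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # ignore non-image files
        label_path = labels_dir / f"{image_path.stem}.txt"
        pairs.append((image_path, label_path if label_path.exists() else None))
    return pairs


# Build a tiny throwaway dataset to demonstrate the pairing.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    images_dir = root / "images" / "train"
    labels_dir = root / "labels" / "train"
    images_dir.mkdir(parents=True)
    labels_dir.mkdir(parents=True)
    for filename in ("a.jpg", "b.png", "notes.md"):
        (images_dir / filename).touch()
    (labels_dir / "a.txt").write_text("0 0.5 0.5 0.3 0.2\n")

    found = [
        (image.name, label.name if label else None)
        for image, label in pair_images_with_labels(images_dir, labels_dir)
    ]

print(found)  # [('a.jpg', 'a.txt'), ('b.png', None)]
```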

YOLOExporter

Export datasets to YOLO format.

Basic Usage

from boxlab.dataset.plugins.registry import get_exporter

exporter = get_exporter("yolo")
exporter.export(dataset, output_dir="output/yolo_format")

Export with Splits

from boxlab.dataset.types import SplitRatio

# Define split ratios
split_ratio = SplitRatio(train=0.7, val=0.2, test=0.1)

exporter.export(
    dataset,
    output_dir="output/yolo_format",
    split_ratio=split_ratio,
    seed=42  # For reproducibility
)
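
The `seed` argument makes the split reproducible. `Dataset.split` itself is not shown on this page, so the sketch below is a hypothetical stand-in that illustrates the general idea: shuffle the image IDs with a seeded generator, then cut the list according to the ratios.

```python
from __future__ import annotations

import random


def split_image_ids(
    image_ids: list[str],
    train: float,
    val: float,
    test: float,
    seed: int | None = None,
) -> dict[str, list[str]]:
    """Shuffle IDs deterministically for a given seed and cut them by ratio."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("split ratios must sum to 1")

    shuffled = sorted(image_ids)            # start from a stable order
    random.Random(seed).shuffle(shuffled)   # private RNG, global state untouched

    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder goes to test
    }


ids = [f"img_{i:03d}" for i in range(10)]
first = split_image_ids(ids, 0.7, 0.2, 0.1, seed=42)
second = split_image_ids(ids, 0.7, 0.2, 0.1, seed=42)
print({name: len(members) for name, members in first.items()})
print(first == second)  # True: the same seed gives the same split
```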

Export Options

from boxlab.dataset.plugins.naming import SequentialNaming

# Custom naming strategy
strategy = SequentialNaming(prefix="img", start=1, digits=6)

# Export with options
exporter.export(
    dataset,
    output_dir="output/yolo_format",
    split_ratio=split_ratio,
    seed=42,
    naming_strategy=strategy,
    copy_images=True,  # Copy image files
    unified_structure=False  # Use standard structure
)

Unified Structure

Use unified directory structure (annotations instead of labels):

exporter.export(
    dataset,
    output_dir="output/yolo_format",
    unified_structure=True  # Uses 'annotations' directory
)

Output structure:

output/
├── data.yaml
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── annotations/          # Instead of 'labels'
    ├── train/
    ├── val/
    └── test/
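
The only difference between the two layouts is the name of the label root. A minimal sketch of that choice follows; the helper name `split_directories` is hypothetical, and the plugin's `_export_split` is not shown here.

```python
import pathlib


def split_directories(output_dir, split: str, unified_structure: bool = False):
    """Return (images_dir, labels_dir) for one split under output_dir."""
    root = pathlib.Path(output_dir)
    label_root = "annotations" if unified_structure else "labels"
    return root / "images" / split, root / label_root / split


standard = split_directories("output/yolo_format", "train")
unified = split_directories("output/yolo_format", "train", unified_structure=True)
print(standard[1].as_posix())  # output/yolo_format/labels/train
print(unified[1].as_posix())   # output/yolo_format/annotations/train
```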

Features

  • Generates compliant YAML configuration
  • Converts absolute coordinates to normalized format
  • Handles filename conflicts automatically
  • Supports custom naming strategies
  • Optional image copying
  • 0-indexed class IDs in output (YOLO convention)
  • Preserves annotation precision with 6 decimal places
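
For orientation, here is a rough sketch of what generating the data.yaml content could look like. It is not the plugin's `_create_yaml`, which is not shown on this page. It assumes contiguous 1-indexed internal category IDs, writes only the splits that were exported, and renders the text by hand. Real code would more likely use a YAML library so that unusual class names are quoted correctly.

```python
def build_data_yaml(categories: dict[int, str], splits: list[str]) -> str:
    """Render a minimal data.yaml for 1-indexed internal categories."""
    lines = [f"{split}: images/{split}" for split in splits]
    lines.append("")
    lines.append(f"nc: {len(categories)}")
    lines.append("")
    lines.append("names:")
    for category_id, label in sorted(categories.items()):
        # Shift internal 1-indexed IDs to YOLO's 0-indexed class IDs.
        lines.append(f"  {category_id - 1}: {label}")
    return "\n".join(lines) + "\n"


text = build_data_yaml({1: "person", 2: "car", 3: "bicycle"}, ["train", "val"])
print(text)
```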

Coordinate Conversion

Loading (Normalized → Absolute)

# YOLO label: 0 0.5 0.5 0.3 0.2
# Image size: 640x480

cx_norm, cy_norm = 0.5, 0.5
w_norm, h_norm = 0.3, 0.2

cx = cx_norm * 640  # 320.0
cy = cy_norm * 480  # 240.0
w = w_norm * 640    # 192.0
h = h_norm * 480    # 96.0
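
If the absolute box is needed in corner form (x_min, y_min, x_max, y_max), as the export example in this section suggests, the same arithmetic can be wrapped in a small function. This is a sketch with a hypothetical name, not the plugin's code.

```python
def yolo_to_pixel_box(cx_norm, cy_norm, w_norm, h_norm, img_w, img_h):
    """Convert a normalized center/size box to absolute corner coordinates."""
    cx, cy = cx_norm * img_w, cy_norm * img_h
    w, h = w_norm * img_w, h_norm * img_h
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


# Same label and image size as in the example above.
box = yolo_to_pixel_box(0.5, 0.5, 0.3, 0.2, 640, 480)
print(box)  # approximately (224.0, 192.0, 416.0, 288.0)
```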

Exporting (Absolute → Normalized)

# BBox in absolute pixels (corner format)
x_min, y_min, x_max, y_max = 224, 144, 416, 336
# Image size: 640x480

cx = (x_min + x_max) / 2  # 320.0
cy = (y_min + y_max) / 2  # 240.0
w = x_max - x_min          # 192
h = y_max - y_min          # 192

cx_norm = cx / 640  # 0.5
cy_norm = cy / 480  # 0.5
w_norm = w / 640    # 0.3
h_norm = h / 480    # 0.4
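
Putting the export direction together with the six-decimal precision mentioned in the exporter's feature list, a label line could be produced roughly as follows. The function name is hypothetical and the exact formatting used by the plugin is an assumption.

```python
def pixel_box_to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Format an absolute corner box as a normalized YOLO label line."""
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    # Six decimal places, matching the precision stated in the feature list.
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


line = pixel_box_to_yolo_line(0, 224, 144, 416, 336, 640, 480)
print(line)  # 0 0.500000 0.500000 0.300000 0.400000
```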

Category ID Handling

YOLO uses 0-indexed category IDs, while BoxLab's Dataset uses 1-indexed IDs internally.

During Loading

# YOLO label: class_id = 0
# Internal: category_id = 1
yolo_class_id = 0
category_id = yolo_class_id + 1

During Export

# Internal: category_id = 1
# YOLO label: class_id = 0
category_id = 1
yolo_class_id = category_id - 1

Error Handling

The YOLO plugin handles various error conditions:

  • Missing YAML: Raises FileNotFoundError
  • Invalid YAML: Raises ValueError if 'names' field is missing
  • Missing directories: Logs warning and skips
  • Invalid label format: Logs warning and skips line
  • Unknown category: Logs warning and skips annotation
  • Image read errors: Logs error and continues
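
The "log a warning and skip" behavior for label files can be sketched as below. This is an illustration of the policy described in the list above, not the plugin's implementation. The logger name, function name, and message wording are all assumptions.

```python
import logging

logger = logging.getLogger("yolo_sketch")  # hypothetical logger name


def read_labels(lines, known_class_ids):
    """Return [(internal_category_id, coords)] and skip lines that cannot be used."""
    annotations = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines carry no annotation
        try:
            if len(fields) != 5:
                raise ValueError(f"expected 5 fields, got {len(fields)}")
            class_id = int(fields[0])
            coords = tuple(float(value) for value in fields[1:])
        except ValueError as exc:
            # Invalid label format: warn and skip the line.
            logger.warning("Skipping line %d: %s", line_number, exc)
            continue
        if class_id not in known_class_ids:
            # Unknown category: warn and skip the annotation.
            logger.warning("Skipping line %d: unknown class %d", line_number, class_id)
            continue
        annotations.append((class_id + 1, coords))  # 0-indexed to 1-indexed
    return annotations


sample = [
    "0 0.5 0.5 0.3 0.2",
    "garbage",
    "7 0.1 0.1 0.1 0.1",
    "",
    "1 0.25 0.25 0.1 0.1",
]
parsed = read_labels(sample, known_class_ids={0, 1, 2})
print(parsed)  # [(1, (0.5, 0.5, 0.3, 0.2)), (2, (0.25, 0.25, 0.1, 0.1))]
```

The two exceptions that `load` does raise, FileNotFoundError and ValueError, are the ones a caller would need to catch around the call itself.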

See Also