
BoxLab Documentation

Welcome to BoxLab, a Python toolkit for managing, converting, and annotating object detection datasets in COCO and YOLO formats.

What is BoxLab?

BoxLab is a comprehensive solution for working with object detection datasets. It provides:

  • Dataset Management - Load, merge, split, and analyze datasets
  • Format Conversion - Seamlessly convert between COCO and YOLO formats
  • GUI Annotator - Interactive desktop application for viewing and editing annotations
  • CLI Tools - Powerful command-line interface for batch operations
  • PyTorch Integration - Direct integration with PyTorch training pipelines
  • Plugin System - Extensible architecture for custom formats

Installation

pip install boxlab

From Source

# Clone repository
git clone https://github.com/6ixGODD/boxlab.git
cd boxlab

# Install with Poetry
poetry install

# Or with pip
pip install -e .

See the Installation Guide for detailed instructions.

Quick Start

View Dataset Info

boxlab dataset info data/coco/annotations.json --format coco

Convert Format

# COCO to YOLO
boxlab dataset convert input.json -if coco output -of yolo

# YOLO to COCO
boxlab dataset convert data/yolo -if yolo output -of coco

Launch Annotator

boxlab annotator

Python API

from boxlab.dataset.io import load_dataset, export_dataset

# Load dataset
dataset = load_dataset("annotations.json", format="coco")

# Export to different format
export_dataset(dataset, "output/yolo", format="yolo")

See the Quick Start Guide for more examples.

Features

Dataset Management

  • Multi-format Support - COCO JSON and YOLO formats
  • Source Tracking - Track dataset origins in merged datasets
  • Statistics - Comprehensive dataset analysis
  • Visualization - Generate distribution plots and sample images
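As a rough illustration of the kind of statistics involved, the sketch below counts annotations per category directly from a COCO-style dictionary using only the standard library. It relies on the standard COCO keys (`categories`, `annotations`, `category_id`); the function name `count_per_category` is made up for this example and is not part of BoxLab's API.

```python
from collections import Counter


def count_per_category(coco: dict) -> dict:
    """Count annotations per category name in a COCO-style dict."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(a["category_id"] for a in coco["annotations"])
    # Report categories with zero annotations too, so gaps are visible.
    return {name: counts.get(cid, 0) for cid, name in id_to_name.items()}


# Tiny hand-written dataset for demonstration only.
coco = {
    "categories": [
        {"id": 1, "name": "cat"},
        {"id": 2, "name": "dog"},
        {"id": 3, "name": "bird"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [50, 60, 20, 20]},
        {"id": 3, "image_id": 2, "category_id": 1, "bbox": [0, 0, 15, 15]},
    ],
}
stats = count_per_category(coco)
print(stats)
```

A per-category count like this is usually the first check for class imbalance before training.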

Format Conversion

  • Bidirectional - Convert between COCO and YOLO
  • Flexible Splitting - Custom train/val/test ratios
  • Naming Strategies - Multiple file naming options
  • Validation - Automatic format validation
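At the heart of COCO/YOLO conversion is a change of box encoding: COCO stores `[x_min, y_min, width, height]` in pixels, while YOLO label files store `[x_center, y_center, width, height]` normalized by the image size. The following is a minimal sketch of that arithmetic. The helper names `coco_to_yolo` and `yolo_to_coco` are illustrative and are not BoxLab functions.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """COCO [x_min, y_min, w, h] in pixels -> YOLO [cx, cy, w, h] in 0..1."""
    x, y, w, h = bbox
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]


def yolo_to_coco(bbox, img_w, img_h):
    """YOLO [cx, cy, w, h] in 0..1 -> COCO [x_min, y_min, w, h] in pixels."""
    cx, cy, w, h = bbox
    box_w, box_h = w * img_w, h * img_h
    return [cx * img_w - box_w / 2, cy * img_h - box_h / 2, box_w, box_h]


# Round trip for one box in a 512x256 image.
yolo_box = coco_to_yolo([128, 64, 256, 64], img_w=512, img_h=256)
coco_box = yolo_to_coco(yolo_box, img_w=512, img_h=256)
print(yolo_box, coco_box)
```

Because YOLO coordinates are relative, the image width and height must be known to convert in either direction.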

Annotation Tools

  • Visual Editor - Interactive bounding box editing
  • Audit Workflow - Approve/reject images systematically
  • Tagging System - Organize images with custom tags
  • Workspace Persistence - Save and restore work sessions

Command Line Interface

  • Intuitive Commands - Easy-to-use CLI structure
  • Rich Output - Formatted tables and progress indicators
  • Batch Operations - Process multiple datasets
  • Scriptable - Integration with automation workflows

PyTorch Integration

  • Dataset Adapter - Direct PyTorch Dataset compatibility
  • Transform Support - Built-in augmentation pipelines
  • DataLoader Ready - Custom collate functions
  • Training Workflows - Seamless integration with training loops
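A custom collate function is needed for detection because each image can carry a different number of boxes, and PyTorch's default collation tries to stack per-sample tensors into one batch tensor, which fails when shapes differ. The sketch below shows the common pattern of keeping samples as parallel tuples instead. It is a generic illustration, not BoxLab's own collate implementation, and the name `detection_collate` is hypothetical.

```python
def detection_collate(batch):
    """Turn a list of (image, target) pairs into (images, targets) tuples.

    Nothing is stacked, so every image keeps its own target and the
    number of boxes may differ from sample to sample.
    """
    images, targets = zip(*batch)
    return images, targets


# Plain-Python stand-ins for image tensors and target dicts.
batch = [
    ("image_0", {"boxes": [[0, 0, 10, 10], [5, 5, 20, 20]], "labels": [1, 2]}),
    ("image_1", {"boxes": [], "labels": []}),
]
images, targets = detection_collate(batch)
print(len(images), [len(t["boxes"]) for t in targets])
```

A function of this shape can be passed to `torch.utils.data.DataLoader` through its `collate_fn` argument.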

Use Cases

Format Conversion

Convert your existing datasets to the format required by your training framework:

boxlab dataset convert coco_annotations.json -if coco yolo_output -of yolo

Dataset Merging

Combine multiple annotation sources into a single unified dataset:

boxlab dataset merge \
  -i manual_labels.json coco manual \
  -i auto_labels.json coco automatic \
  -o merged_dataset
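What merging involves can be sketched in plain Python: image and annotation ids from different files may collide, so they are reassigned, and categories are unified by name. This is an assumption-laden illustration rather than BoxLab's actual merge logic; the function `merge_coco` and the per-image `source` field are hypothetical.

```python
def merge_coco(sources):
    """Merge COCO-style dicts given as (source_name, coco_dict) pairs."""
    merged = {"images": [], "annotations": [], "categories": []}
    cat_ids = {}  # category name -> id in the merged dataset
    next_img, next_ann = 1, 1
    for name, coco in sources:
        # Map this file's category ids onto the shared, name-based ids.
        local_cat = {}
        for cat in coco["categories"]:
            if cat["name"] not in cat_ids:
                cat_ids[cat["name"]] = len(cat_ids) + 1
                merged["categories"].append(
                    {"id": cat_ids[cat["name"]], "name": cat["name"]}
                )
            local_cat[cat["id"]] = cat_ids[cat["name"]]
        # Give every image a fresh id and remember where it came from.
        local_img = {}
        for img in coco["images"]:
            local_img[img["id"]] = next_img
            merged["images"].append({**img, "id": next_img, "source": name})
            next_img += 1
        # Re-point annotations at the new image and category ids.
        for ann in coco["annotations"]:
            merged["annotations"].append({
                **ann,
                "id": next_ann,
                "image_id": local_img[ann["image_id"]],
                "category_id": local_cat[ann["category_id"]],
            })
            next_ann += 1
    return merged


manual = {
    "images": [{"id": 1, "file_name": "a.jpg"}],
    "categories": [{"id": 1, "name": "cat"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1, "bbox": [0, 0, 10, 10]}],
}
automatic = {
    "images": [{"id": 1, "file_name": "b.jpg"}],
    "categories": [{"id": 1, "name": "dog"}, {"id": 2, "name": "cat"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 2, "bbox": [5, 5, 20, 20]}],
}
merged = merge_coco([("manual", manual), ("automatic", automatic)])
print([(img["file_name"], img["source"]) for img in merged["images"]])
```

Both inputs use image id 1 and disagree on the id of "cat"; after merging, ids are unique and "cat" resolves to a single category.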

Quality Assurance

Use the annotator to review and audit dataset quality:

boxlab annotator
# Enable Audit Mode → Review images → Export report

Training Preparation

Prepare datasets for model training with PyTorch:

from boxlab.dataset.io import load_dataset
from boxlab.dataset.torchadapter import build_torchdataset
from torch.utils.data import DataLoader

dataset = load_dataset("train.json", format="coco")
torch_dataset = build_torchdataset(dataset, image_size=640, augment=True)
loader = DataLoader(torch_dataset, batch_size=16, collate_fn=torch_dataset.collate)

Documentation Structure

Guides

Step-by-step tutorials and conceptual guides.

API Reference

Complete technical documentation.

Examples

Convert COCO to YOLO with Split

boxlab dataset convert \
  annotations.json \
  -if coco \
  output/yolo \
  -of yolo \
  --train-ratio 0.7 \
  --val-ratio 0.2 \
  --test-ratio 0.1 \
  --seed 42
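The ratio and seed options above suggest a reproducible random split. One common way to implement such a split is shown below: shuffle with a seeded generator, then slice by the requested ratios. This is a sketch of the general technique, not necessarily how BoxLab splits data internally, and `split_items` is a hypothetical helper.

```python
import random


def split_items(items, train_ratio, val_ratio, test_ratio, seed=None):
    """Shuffle items with a seeded RNG and slice them into three subsets."""
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the global RNG state untouched.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    n_val = round(len(shuffled) * val_ratio)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        # The test split takes whatever remains, so no item is lost.
        "test": shuffled[n_train + n_val:],
    }


image_ids = list(range(100))
splits = split_items(image_ids, 0.7, 0.2, 0.1, seed=42)
print({name: len(ids) for name, ids in splits.items()})
```

Fixing the seed makes the assignment of images to subsets repeatable across runs, which keeps evaluation results comparable.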

Merge Three Datasets

boxlab dataset merge \
  -i dataset1/ann.json coco source1 \
  -i dataset2/ann.json coco source2 \
  -i dataset3 yolo source3 \
  -o merged_output \
  --output-format coco

Visualize Dataset

boxlab dataset visualize \
  data/yolo \
  --format yolo \
  -o visualizations \
  --samples 10 \
  --show-heatmap

PyTorch Training Loop

from boxlab.dataset.io import load_dataset
from boxlab.dataset.torchadapter import build_torchdataset
from torch.utils.data import DataLoader
import torch

# Prepare dataset
dataset = load_dataset("train_annotations.json", format="coco")
train_dataset = build_torchdataset(
    dataset,
    image_size=640,
    augment=True,
    normalize=True,
    return_format="xyxy"
)

# Create DataLoader
train_loader = DataLoader(
    train_dataset,
    batch_size=16,
    shuffle=True,
    num_workers=4,
    collate_fn=train_dataset.collate
)

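# Training loop
# YourDetectionModel is a placeholder for your own model. It is expected to
# return a dict of losses when called with (images, targets) in training
# mode, as torchvision detection models do.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
num_epochs = 10  # example value; tune for your task
model = YourDetectionModel().to(device)
model.train()
optimizer = torch.optim.Adam(model.parameters())

for epoch in range(num_epochs):
    for images, targets in train_loader:
        # Move each image and every tensor in each target dict to the device
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # Sum the individual loss terms into a single scalar
        loss_dict = model(images, targets)
        losses = sum(loss_dict.values())

        optimizer.zero_grad()
        losses.backward()
        optimizer.step()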
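The `return_format="xyxy"` argument in the example above presumably selects corner-format boxes `[x_min, y_min, x_max, y_max]`, which is the layout torchvision's detection models expect, whereas COCO annotations store `[x_min, y_min, width, height]`. The conversion between the two is sketched below. The helper names are illustrative only and do not come from BoxLab.

```python
def xywh_to_xyxy(box):
    """[x_min, y_min, width, height] -> [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box):
    """[x_min, y_min, x_max, y_max] -> [x_min, y_min, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


corner_box = xywh_to_xyxy([10, 20, 30, 40])
print(corner_box, xyxy_to_xywh(corner_box))
```

Mixing up the two layouts does not raise an error but silently produces wrong boxes, so it is worth checking which one a model expects.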

System Requirements

  • Python: 3.10 or higher
  • Operating System: Linux, macOS, Windows
  • RAM: 2GB minimum, 4GB recommended
  • Disk Space: 500MB for installation

Optional Dependencies

  • PyTorch: For training integration (pip install torch torchvision)
  • CUDA: For GPU acceleration (with PyTorch GPU version)

Project Information

Getting Help

Documentation

Community

Contributing

Contributions are welcome!

What's Next?