Plugins¶
plugins ¶
Classes¶
NamingStrategy ¶
Bases: Protocol
Protocol for file naming strategies.
This protocol defines the interface for generating file names when exporting datasets. Custom naming strategies can be implemented by creating classes that follow this protocol.
Example
Implementing a custom naming strategy:
class CustomNamingStrategy:
    def gen_name(
        self, origin: str, source: str | None, image_id: str
    ) -> str:
        # Generate name with source prefix
        if source:
            return f"{source}_{image_id}_{origin}"
        return f"{image_id}_{origin}"

# Use with an exporter (`exporter` and `dataset` are assumed to exist already)
strategy = CustomNamingStrategy()
exporter.export(
    dataset, output_dir="output/", naming_strategy=strategy
)
Example
Simple naming strategy that preserves original names:
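A minimal sketch of such a strategy, assuming only the `gen_name` signature documented on this page. The class name `OriginalNamingStrategy` is hypothetical and is not part of BoxLab:

```python
from __future__ import annotations


class OriginalNamingStrategy:
    """Hypothetical strategy that keeps every file's original name."""

    def gen_name(self, origin: str, source: str | None, image_id: str) -> str:
        # Ignore source and image_id and return the original name unchanged.
        return origin


strategy = OriginalNamingStrategy()
name = strategy.gen_name("image001.jpg", "camera1", "img_12345")
print(name)  # image001.jpg
```

Because every image keeps its original name, files from different sources that share a name would map to the same output name.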
Functions¶
gen_name ¶
Generate a new file name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `origin` | `str` | Original file name (e.g., "image001.jpg"). | required |
| `source` | `str \| None` | Source name if available (e.g., "camera1"), None otherwise. | required |
| `image_id` | `str` | Unique image identifier (e.g., "img_12345"). | required |
Returns:
| Type | Description |
|---|---|
| `str` | Generated file name as a string. |
Example
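As an illustrative sketch of a `gen_name` call, the hypothetical `IdNamingStrategy` below names each file after its `image_id` and keeps the original extension. It relies only on the signature documented above and is not part of BoxLab:

```python
from __future__ import annotations

from pathlib import PurePath


class IdNamingStrategy:
    """Hypothetical strategy: <image_id> plus the original file extension."""

    def gen_name(self, origin: str, source: str | None, image_id: str) -> str:
        # PurePath.suffix returns the last extension, including the dot.
        suffix = PurePath(origin).suffix
        return f"{image_id}{suffix}"


strategy = IdNamingStrategy()
renamed = strategy.gen_name("image001.jpg", None, "img_12345")
print(renamed)  # img_12345.jpg
```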
Source code in boxlab/dataset/plugins/__init__.py
LoaderPlugin ¶
Bases: ABC
Base class for dataset loaders.
LoaderPlugin provides the abstract interface for implementing dataset loaders that can read various object detection dataset formats. Subclasses must implement the abstract methods to support specific formats like COCO, YOLO, etc.
This class handles dataset loading with validation and format detection capabilities. Each loader plugin should focus on a specific dataset format and implement the necessary parsing logic.
Example
Implementing a custom loader:
import json

from boxlab.dataset import Dataset
from boxlab.dataset.plugins import LoaderPlugin

class CustomLoader(LoaderPlugin):
    @property
    def name(self) -> str:
        return "custom"

    @property
    def description(self) -> str:
        return "Custom JSON format loader"

    @property
    def supported_extensions(self) -> list[str]:
        return [".json", ".jsonl"]

    def load(self, path, name=None, **kwargs):
        # Honor the optional dataset name from the documented load() signature
        dataset = Dataset(name=name or "custom_dataset")
        with open(path, "r") as f:
            data = json.load(f)
        # Parse and populate dataset
        for item in data["images"]:
            # Add images, annotations, categories
            pass
        return dataset

# Use the loader
loader = CustomLoader()
dataset = loader.load("path/to/dataset.json")
Example
Using a loader with validation:
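One way to combine the two calls is sketched below. The helper `load_if_valid` and the `StubLoader` class are hypothetical stand-ins written for this illustration; a real `LoaderPlugin` subclass would take the stub's place:

```python
from pathlib import Path


def load_if_valid(loader, path, **kwargs):
    """Call loader.load(path) only when loader.validate(path) accepts it."""
    if not loader.validate(path):
        raise ValueError(f"{loader.name!r} loader cannot handle {path}")
    return loader.load(path, **kwargs)


class StubLoader:
    """Hypothetical stand-in for a LoaderPlugin subclass (illustration only)."""

    name = "stub"
    supported_extensions = [".json"]

    def validate(self, path):
        # Extension check only; a real loader may also check existence.
        return Path(path).suffix in self.supported_extensions

    def load(self, path, **kwargs):
        # Return a plain dict instead of a real Dataset.
        return {"path": str(path), "options": kwargs}


result = load_if_valid(StubLoader(), "data.json", strict=True)
print(result)  # {'path': 'data.json', 'options': {'strict': True}}
```

Checking `validate()` first gives an early, explicit error for unsupported paths, although it does not guarantee that `load()` will succeed.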
Attributes¶
name abstractmethod property ¶
description abstractmethod property ¶
supported_extensions property ¶
List of supported file extensions (e.g., ['.json', '.yaml']).
Returns:
| Type | Description |
|---|---|
| `list[str]` | List of file extensions this loader can handle, including the dot. Returns an empty list if not applicable. |
Functions¶
load abstractmethod ¶
load(path: str | PathLike[str], name: str | None = None, **kwargs: Any) -> Dataset
Load dataset from path.
This method should parse the dataset file(s) at the given path and construct a Dataset object with all images, annotations, and categories.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| PathLike[str]` | Path to dataset file or directory. Can be a JSON file, YAML file, or directory containing dataset files. | required |
| `name` | `str \| None` | Name to assign to the loaded Dataset instance. | None |
| `**kwargs` | `Any` | Additional loader-specific parameters. Common options: image_root (str), the root directory for image files; source_name (str), a name to tag this data source; strict (bool), whether to fail on parse errors. | {} |
Returns:
| Type | Description |
|---|---|
| `Dataset` | A populated Dataset instance containing all loaded data. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the specified path doesn't exist. |
| `ValueError` | If the dataset format is invalid or corrupted. |
| `PermissionError` | If files cannot be read due to permissions. |
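The exception types listed above can be produced by a small reading helper, sketched here under stated assumptions. The function name `read_annotations` is hypothetical, and the required top-level `images` key is borrowed from the examples on this page rather than from BoxLab's actual validation rules:

```python
import json
from pathlib import Path


def read_annotations(path):
    """Read a JSON annotation file, raising the documented exception types."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Dataset path does not exist: {path}")
    try:
        # open() raises PermissionError by itself when the file is unreadable.
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as exc:
        # Re-raise with the file path so the error message is actionable.
        raise ValueError(f"Invalid JSON in {path}: {exc}") from exc
    if not isinstance(data, dict) or "images" not in data:
        raise ValueError(f"{path} has no top-level 'images' key")
    return data
```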
Example
import json

from boxlab.dataset import Dataset
from boxlab.dataset.plugins import LoaderPlugin

class COCOLoader(LoaderPlugin):
    # name and description properties omitted for brevity
    def load(self, path, name=None, **kwargs):
        dataset = Dataset(name=name or "coco")
        image_root = kwargs.get("image_root", ".")
        with open(path) as f:
            data = json.load(f)
        # Load categories
        for cat in data["categories"]:
            dataset.add_category(cat["id"], cat["name"])
        # Load images and annotations
        # ... implementation details ...
        return dataset

# Usage
loader = COCOLoader()
dataset = loader.load(
    "annotations.json",
    image_root="/data/images",
    source_name="train2017",
)
validate ¶
Check if this loader can handle the given path.
Performs basic validation to determine if the file or directory at the given path appears to be in a format this loader can handle. This is typically used for automatic format detection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| PathLike[str]` | Path to validate. Can be a file or directory. | required |
Returns:
| Type | Description |
|---|---|
| `bool` | True if this loader can likely handle the path, False otherwise. |
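Automatic format detection built on `validate()` could look like the sketch below. `detect_loader` and `SuffixLoader` are hypothetical names introduced for illustration; BoxLab's own detection logic is not shown on this page and may differ:

```python
def detect_loader(loaders, path):
    """Return the first loader whose validate() accepts path, or None."""
    for loader in loaders:
        if loader.validate(path):
            return loader
    return None


class SuffixLoader:
    """Hypothetical stand-in for a LoaderPlugin; matches by file suffix only."""

    def __init__(self, name, suffixes):
        self.name = name
        self.suffixes = tuple(suffixes)

    def validate(self, path):
        # str.endswith accepts a tuple of candidate suffixes.
        return str(path).lower().endswith(self.suffixes)


loaders = [
    SuffixLoader("coco", [".json"]),
    SuffixLoader("yolo", [".yaml", ".yml"]),
]
chosen = detect_loader(loaders, "data/dataset.yml")
print(chosen.name)  # yolo
```

The order of `loaders` matters in this sketch: when several loaders accept the same path, the first one wins.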
Note
This method performs basic checks (existence, extension). It does not guarantee that load() will succeed, as it doesn't validate the full file contents.
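The basic checks named in this note (existence and extension) might be approximated as follows. This is a sketch of the idea only, not BoxLab's actual implementation; the function `basic_validate` and its treatment of directories and empty extension lists are assumptions:

```python
from pathlib import Path


def basic_validate(path, supported_extensions):
    """Approximate an existence-plus-extension check for a dataset path."""
    path = Path(path)
    if not path.exists():
        return False
    # Assumption: directories and loaders that declare no extensions
    # skip the suffix comparison.
    if path.is_dir() or not supported_extensions:
        return True
    return path.suffix.lower() in supported_extensions


# A path that does not exist is rejected before any extension check.
print(basic_validate("no/such/file.json", [".json"]))  # False
```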
Example
Custom validation logic:
import pathlib

import yaml

from boxlab.dataset.plugins import LoaderPlugin

class YOLOLoader(LoaderPlugin):
    def validate(self, path):
        # Call parent validation first
        if not super().validate(path):
            return False
        # Additional YOLO-specific checks
        path = pathlib.Path(path)
        if path.is_file() and path.suffix in [".yaml", ".yml"]:
            # Check for YOLO-specific keys
            with open(path) as f:
                data = yaml.safe_load(f)
            # safe_load returns None for an empty file, so check the type first
            return isinstance(data, dict) and "names" in data and "path" in data
        return False
ExporterPlugin ¶
Bases: ABC
Base class for dataset exporters.
ExporterPlugin provides the abstract interface for implementing dataset exporters that can write datasets to various object detection formats. Subclasses must implement the abstract methods to support specific formats like COCO, YOLO, etc.
This class handles dataset export with support for train/val/test splits, custom naming strategies, and optional image copying. Each exporter plugin should focus on a specific output format.
Example
Implementing a custom exporter:
import json
import shutil
from pathlib import Path

from boxlab.dataset import Dataset, SplitRatio
from boxlab.dataset.plugins import ExporterPlugin

class CustomExporter(ExporterPlugin):
    @property
    def name(self) -> str:
        return "custom"

    @property
    def description(self) -> str:
        return "Export to custom JSON format"

    @property
    def default_extension(self) -> str:
        return ".json"

    def export(
        self,
        dataset,
        output_dir,
        split_ratio=None,
        seed=None,
        naming_strategy=None,
        copy_images=True,
        **kwargs,
    ):
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        # Handle splits if requested
        if split_ratio:
            splits = dataset.split(split_ratio, seed=seed)
        else:
            splits = {"all": list(dataset.images.keys())}
        # Export each split
        for split_name, image_ids in splits.items():
            split_data = {"images": [], "annotations": []}
            # Export logic here
            # ...
            # Write JSON file
            output_file = output_dir / f"{split_name}.json"
            with open(output_file, "w") as f:
                json.dump(split_data, f, indent=2)

# Use the exporter (`dataset` is assumed to be a loaded Dataset)
exporter = CustomExporter()
exporter.export(
    dataset,
    output_dir="output/custom",
    split_ratio=SplitRatio(train=0.7, val=0.2, test=0.1),
    seed=42,
)
Example
Using an exporter with custom configuration:
from boxlab.dataset import Dataset
from boxlab.dataset.plugins import ExporterPlugin

# MyExporter stands for any concrete ExporterPlugin subclass,
# and my_dataset for a previously loaded Dataset
exporter = MyExporter()

# Get default configuration
config = exporter.get_default_config()
print(config)  # {'copy_images': True, 'naming_strategy': 'original'}

# Export with custom settings
exporter.export(
    dataset=my_dataset,
    output_dir="output/",
    copy_images=False,
    indent=4,  # Custom parameter
)
Attributes¶
name abstractmethod property ¶
description abstractmethod property ¶
default_extension property ¶
Default file extension for exported files.
Returns:
| Type | Description |
|---|---|
| `str` | File extension string including the dot (e.g., ".json", ".txt"). Returns an empty string if not applicable. |
Functions¶
export abstractmethod ¶
export(dataset: Dataset, output_dir: str | PathLike[str], split_ratio: SplitRatio | None = None, seed: int | None = None, naming_strategy: NamingStrategy | None = None, copy_images: bool = True, **kwargs: Any) -> None
Export dataset to output directory.
This method should write the dataset to disk in the format supported by this exporter. It should handle creating output directories, optionally splitting the dataset, copying images, and writing annotation files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset` | The Dataset instance to export. | required |
| `output_dir` | `str \| PathLike[str]` | Path to the output directory. Will be created if it doesn't exist. | required |
| `split_ratio` | `SplitRatio \| None` | Optional SplitRatio object defining train/val/test proportions. If None, exports entire dataset without splitting. | None |
| `seed` | `int \| None` | Random seed for reproducible splits. Only used if split_ratio is provided. | None |
| `naming_strategy` | `NamingStrategy \| None` | Optional NamingStrategy instance for generating image file names. If None, uses original file names. | None |
| `copy_images` | `bool` | If True, copies image files to output directory. If False, only writes annotation files. | True |
| `**kwargs` | `Any` | Additional exporter-specific parameters. Common options: indent (int), the JSON indentation level; include_metadata (bool), whether to include extra metadata; compress (bool), whether to compress output files. | {} |
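The `naming_strategy` fallback described in the table (original file names when the argument is None) can be sketched as a small helper. `resolve_file_name` and `PrefixStrategy` are hypothetical names for this illustration and are not part of BoxLab:

```python
from __future__ import annotations


def resolve_file_name(naming_strategy, origin: str, source: str | None, image_id: str) -> str:
    """Return the output file name, keeping the original when no strategy is given."""
    if naming_strategy is None:
        return origin
    return naming_strategy.gen_name(origin, source, image_id)


class PrefixStrategy:
    """Hypothetical strategy that prefixes the image id."""

    def gen_name(self, origin: str, source: str | None, image_id: str) -> str:
        return f"{image_id}_{origin}"


print(resolve_file_name(None, "image001.jpg", "camera1", "img_12345"))
# image001.jpg
print(resolve_file_name(PrefixStrategy(), "image001.jpg", "camera1", "img_12345"))
# img_12345_image001.jpg
```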
Raises:
| Type | Description |
|---|---|
| `ValueError` | If export parameters are invalid (e.g., invalid split_ratio). |
| `OSError` | If file operations fail (permission denied, disk full, etc.). |
| `DatasetError` | If dataset is empty or malformed. |
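To illustrate why the `seed` parameter makes splits reproducible, the sketch below shuffles image ids with a seeded random generator before cutting them into train/val/test lists. This is a generic technique shown under assumptions; `split_ids` is a hypothetical function and does not describe how `Dataset.split` works internally:

```python
import random


def split_ids(image_ids, train, val, seed=None):
    """Shuffle ids with a seeded RNG and cut them into train/val/test lists."""
    ids = sorted(image_ids)  # fixed starting order, independent of input order
    random.Random(seed).shuffle(ids)  # private RNG; global random state untouched
    n_train = round(len(ids) * train)
    n_val = round(len(ids) * val)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # remainder goes to test
    }


ids = [f"img_{i}" for i in range(10)]
first = split_ids(ids, train=0.7, val=0.2, seed=42)
second = split_ids(ids, train=0.7, val=0.2, seed=42)
print(first == second)  # True
print([len(first[k]) for k in ("train", "val", "test")])  # [7, 2, 1]
```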
Example
import json
from pathlib import Path

from boxlab.dataset import Dataset, SplitRatio
from boxlab.dataset.plugins import ExporterPlugin

class COCOExporter(ExporterPlugin):
    # name and description properties omitted for brevity
    def export(
        self,
        dataset,
        output_dir,
        split_ratio=None,
        seed=None,
        naming_strategy=None,
        copy_images=True,
        **kwargs,
    ):
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        # Honor the exporter-specific indent option, defaulting to 2
        indent = kwargs.get("indent", 2)
        # Handle splits
        if split_ratio:
            splits = dataset.split(split_ratio, seed=seed)
        else:
            splits = {"all": list(dataset.images.keys())}
        # Export each split
        for split_name, image_ids in splits.items():
            # Create COCO format dictionary
            coco_data = {
                "images": [],
                "annotations": [],
                "categories": [],
            }
            # Add categories
            for cat_id, cat_name in dataset.categories.items():
                coco_data["categories"].append({
                    "id": cat_id,
                    "name": cat_name,
                })
            # Add images and annotations
            # ... implementation ...
            # Write JSON
            output_file = output_dir / f"{split_name}.json"
            with open(output_file, "w") as f:
                json.dump(coco_data, f, indent=indent)
            # Copy images if requested
            if copy_images:
                # ... copy logic ...
                pass

# Usage (`my_dataset` is assumed to be a loaded Dataset)
exporter = COCOExporter()
exporter.export(
    dataset=my_dataset,
    output_dir="output/coco",
    split_ratio=SplitRatio(train=0.8, val=0.1, test=0.1),
    seed=42,
    copy_images=True,
    indent=4,
)
Example
Exporting without splits:
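A self-contained sketch of the no-split case follows. The function `export_json` is a hypothetical stand-in for an exporter's `export()` method, with a plain dict in place of a Dataset; the single `"all"` split name follows the convention used in the examples on this page:

```python
import json
import tempfile
from pathlib import Path


def export_json(images, output_dir, split_ratio=None):
    """Hypothetical stand-in for export(); images maps image id -> file name."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    if split_ratio is not None:
        raise NotImplementedError("splitting is outside the scope of this sketch")
    # With no split ratio, everything lands in one split named "all".
    splits = {"all": list(images)}
    written = []
    for split_name, image_ids in splits.items():
        output_file = output_dir / f"{split_name}.json"
        with output_file.open("w", encoding="utf-8") as f:
            json.dump({"images": [images[i] for i in image_ids]}, f, indent=2)
        written.append(output_file.name)
    return written


with tempfile.TemporaryDirectory() as tmp:
    files = export_json({"img_1": "a.jpg", "img_2": "b.jpg"}, tmp)
    print(files)  # ['all.json']
```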
get_default_config ¶
Get default configuration for this exporter.
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary of default configuration values that will be used if not overridden in the export() call. |
Note
Subclasses can override this to provide format-specific defaults.
Example
import typing as t

from boxlab.dataset.plugins import ExporterPlugin

class YOLOExporter(ExporterPlugin):
    # name, description, and export() omitted for brevity
    def get_default_config(self) -> dict[str, t.Any]:
        return {
            "copy_images": True,
            "naming_strategy": "original",
            "normalize_coords": True,
            "include_yaml": True,
        }

# Usage
exporter = YOLOExporter()
config = exporter.get_default_config()
print(config["normalize_coords"])  # True
Example
Using default config in export:
import typing as t

from boxlab.dataset.plugins import ExporterPlugin

class CustomExporter(ExporterPlugin):
    # name and description properties omitted for brevity
    def get_default_config(self) -> dict[str, t.Any]:
        return {
            "copy_images": True,
            "naming_strategy": "original",
            "compression": "zip",
        }

    def export(self, dataset, output_dir, **kwargs):
        # Merge with defaults; caller-supplied values take precedence
        config = self.get_default_config()
        config.update(kwargs)
        # Use configuration
        if config["compression"] == "zip":
            # ... compression logic ...
            pass
Overview¶
The plugin system provides extensible interfaces for loading and exporting datasets in various formats. BoxLab comes with built-in plugins for popular formats like COCO and YOLO, and allows custom plugin development.
Architecture¶
The plugin system consists of three main components:
- NamingStrategy: Protocol for generating file names during export
- LoaderPlugin: Abstract base class for dataset loaders
- ExporterPlugin: Abstract base class for dataset exporters
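The three components can be pictured as the interface skeleton below. It is reconstructed from the signatures documented on this page and re-declared locally for illustration; the real classes live in `boxlab.dataset.plugins`, and the non-abstract defaults shown here (empty list, empty string, empty dict) are assumptions:

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from os import PathLike
from typing import Any, Protocol


class NamingStrategy(Protocol):
    def gen_name(self, origin: str, source: str | None, image_id: str) -> str: ...


class LoaderPlugin(ABC):
    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def description(self) -> str: ...

    @property
    def supported_extensions(self) -> list[str]:
        return []  # assumed default: no extension restriction

    @abstractmethod
    def load(self, path: str | PathLike[str], name: str | None = None, **kwargs: Any) -> Any: ...


class ExporterPlugin(ABC):
    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def description(self) -> str: ...

    @property
    def default_extension(self) -> str:
        return ""  # assumed default: not applicable

    @abstractmethod
    def export(self, dataset: Any, output_dir: str | PathLike[str], **kwargs: Any) -> None: ...

    def get_default_config(self) -> dict[str, Any]:
        return {}  # placeholder; the real defaults are not shown here


# Abstract classes cannot be instantiated until every abstract member is implemented.
try:
    LoaderPlugin()
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```

A naming strategy needs no base class: any object with a matching `gen_name` method satisfies the protocol, while loaders and exporters must subclass their abstract base.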
NamingStrategy Protocol¶
Define custom file naming strategies when exporting datasets:
class CustomNamingStrategy:
    def gen_name(self, origin: str, source: str | None, image_id: str) -> str:
        if source:
            return f"{source}_{image_id}_{origin}"
        return f"{image_id}_{origin}"

# Use with an exporter (`exporter` and `dataset` are assumed to exist already)
strategy = CustomNamingStrategy()
exporter.export(dataset, output_dir="output/", naming_strategy=strategy)
LoaderPlugin¶
Create custom dataset loaders by implementing the LoaderPlugin abstract class:
import json

from boxlab.dataset import Dataset
from boxlab.dataset.plugins import LoaderPlugin
# get_loader is assumed to be exported by the same registry module as register_loader
from boxlab.dataset.plugins.registry import get_loader, register_loader

class CustomLoader(LoaderPlugin):
    @property
    def name(self) -> str:
        return "custom"

    @property
    def description(self) -> str:
        return "Custom JSON format loader"

    @property
    def supported_extensions(self) -> list[str]:
        return [".json", ".jsonl"]

    def load(self, path, name=None, **kwargs):
        dataset = Dataset(name=name or "custom_dataset")
        with open(path, "r") as f:
            data = json.load(f)
        # Parse and populate dataset
        for item in data["images"]:
            # Add images, annotations, categories
            pass
        return dataset

# Register and use the loader
register_loader("custom", CustomLoader)
loader = get_loader("custom")
dataset = loader.load("path/to/dataset.json")
ExporterPlugin¶
Create custom dataset exporters by implementing the ExporterPlugin abstract class:
import json
from pathlib import Path

from boxlab.dataset import Dataset, SplitRatio
from boxlab.dataset.plugins import ExporterPlugin
# get_exporter is assumed to be exported by the same registry module as register_exporter
from boxlab.dataset.plugins.registry import get_exporter, register_exporter

class CustomExporter(ExporterPlugin):
    @property
    def name(self) -> str:
        return "custom"

    @property
    def description(self) -> str:
        return "Export to custom JSON format"

    @property
    def default_extension(self) -> str:
        return ".json"

    def export(
        self,
        dataset,
        output_dir,
        split_ratio=None,
        seed=None,
        naming_strategy=None,
        copy_images=True,
        **kwargs,
    ):
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        # Handle splits if requested
        if split_ratio:
            splits = dataset.split(split_ratio, seed=seed)
        else:
            splits = {"all": list(dataset.images.keys())}
        # Export each split
        for split_name, image_ids in splits.items():
            split_data = {"images": [], "annotations": []}
            # Export logic here
            # ...
            # Write JSON file
            output_file = output_dir / f"{split_name}.json"
            with open(output_file, "w") as f:
                json.dump(split_data, f, indent=2)

# Register and use the exporter (`dataset` is assumed to be a loaded Dataset)
register_exporter("custom", CustomExporter)
exporter = get_exporter("custom")
exporter.export(
    dataset,
    output_dir="output/custom",
    split_ratio=SplitRatio(train=0.7, val=0.2, test=0.1),
    seed=42,
)
Built-in Plugins¶
BoxLab includes built-in plugins for the COCO and YOLO formats; each is documented on its own page (COCO Plugin and YOLO Plugin).
Plugin Registry¶
Manage plugins using the registry system:
- Registry: Register, retrieve, and discover plugins
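As a rough mental model, a name-based registry can be sketched with a dictionary. The function names mirror those used in the examples on this page, but the behavior shown here (rejecting duplicate names, instantiating the class on retrieval) is an assumption and not BoxLab's actual implementation:

```python
_LOADERS = {}


def register_loader(name, loader_cls):
    """Store a loader class under a unique name."""
    if name in _LOADERS:
        raise ValueError(f"A loader named {name!r} is already registered")
    _LOADERS[name] = loader_cls


def get_loader(name):
    """Look up a loader class by name and return a new instance."""
    try:
        loader_cls = _LOADERS[name]
    except KeyError:
        raise KeyError(f"No loader registered under {name!r}") from None
    return loader_cls()


def list_loaders():
    """Return the registered loader names in sorted order."""
    return sorted(_LOADERS)


class DummyLoader:
    """Hypothetical placeholder for a LoaderPlugin subclass."""


register_loader("dummy", DummyLoader)
print(list_loaders())  # ['dummy']
print(type(get_loader("dummy")).__name__)  # DummyLoader
```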
Key Methods¶
LoaderPlugin¶
- name: Unique plugin identifier
- description: Human-readable description
- supported_extensions: List of supported file extensions
- load(): Load dataset from path
- validate(): Check if loader can handle a path
ExporterPlugin¶
- name: Unique plugin identifier
- description: Human-readable description
- default_extension: Default file extension
- export(): Export dataset to directory
- get_default_config(): Get default configuration
See Also¶
- Registry: Plugin registration and discovery
- COCO Plugin: COCO format implementation
- YOLO Plugin: YOLO format implementation
- Dataset: Core dataset management