Dataset Statistics

Summary metrics, split guidance, and a class-level view of cyst, debris, and root annotations.

Published release

12,400

Total Images

Published across train, validation, and test partitions.

27,800

Total Annotations

Bounding-box instances carried through canonical, YOLO, and COCO exports.

3

Classes

Cyst, debris, and root remain explicit throughout conversion.

3

Published Splits

Separate partitions for fitting, tuning, and held-out reporting.

Release snapshot

Why these statistics matter

This page serves as a release-quality summary of the dataset rather than a placeholder dashboard. It frames the dataset around split-aware evaluation, canonical coordinates that are easy to inspect, and class context that stays intact when annotations are exported back into training formats.

  • Focused on soybean cyst nematode (SCN) cyst localization and counting rather than generic agricultural detection.
  • Canonical JSON keeps denormalized x1, y1, x2, y2 boxes for easier visual inspection and QA.
  • The same internal dataset representation is reused to generate YOLO and COCO-oriented exports.
  • Debris and root remain in the vocabulary so false positives and contextual errors can be studied directly.
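
As a rough illustration of how one canonical box can feed both export targets, the sketch below converts denormalized x1, y1, x2, y2 corners into the COCO convention ([x, y, width, height] in pixels) and the YOLO convention (normalized center x, center y, width, height). The function names are hypothetical and are not taken from the dataset tooling; only the two target conventions are standard.

```python
from typing import Tuple

# Canonical box: denormalized pixel corners (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


def to_coco_bbox(box: Box) -> Tuple[float, float, float, float]:
    """Convert corners to COCO's [x, y, width, height] in pixels."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def to_yolo_bbox(box: Box, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert corners to YOLO's normalized (cx, cy, w, h).

    YOLO values are fractions of the image size, so the image
    dimensions are required for this direction as well.
    """
    x1, y1, x2, y2 = box
    return (
        (x1 + x2) / 2 / img_w,  # normalized center x
        (y1 + y2) / 2 / img_h,  # normalized center y
        (x2 - x1) / img_w,      # normalized width
        (y2 - y1) / img_h,      # normalized height
    )


# Hypothetical box on a hypothetical 400 x 500 image.
example_box: Box = (100, 50, 300, 250)
coco_bbox = to_coco_bbox(example_box)
yolo_bbox = to_yolo_bbox(example_box, img_w=400, img_h=500)
```

Keeping pixel corners as the single internal form means each export is a one-way arithmetic step, which is easier to check by eye than chaining one training format into another.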

Split discipline

Split Distribution

Treat the published partitions as part of the benchmark definition so training, threshold tuning, and final reporting stay separated.

Train 70%

Primary fitting split for augmentation, model learning, and batch-level experimentation.

Validation 15%

Used for threshold tuning, failure review, and regression checks during development.

Test 15%

Held out for final reporting, cross-model comparison, and publication-ready results.
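
For orientation, the split percentages can be turned into approximate image counts from the published total. This is a minimal sketch that assumes the 70/15/15 figures apply exactly to the 12,400 images; the published percentages may be rounded, so the real partition sizes could differ slightly.

```python
# Figures taken from this page; the exact per-split counts are an assumption.
TOTAL_IMAGES = 12_400
SPLIT_PERCENT = {"train": 70, "validation": 15, "test": 15}

# Integer arithmetic avoids floating-point rounding surprises.
split_sizes = {
    name: TOTAL_IMAGES * percent // 100
    for name, percent in SPLIT_PERCENT.items()
}

for name, size in split_sizes.items():
    print(f"{name}: {size} images")
```

Under that assumption the counts sum back to the published total, which is a quick consistency check to repeat whenever the release numbers change.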

Technical profile

Image & Annotation Profile

These properties affect how the dataset is inspected, converted, and reused in downstream pipelines.

  • Primary image formats: JPG / PNG
  • Annotation type: bounding-box detection
  • Canonical JSON stores denormalized x1, y1, x2, y2 pixel coordinates
  • YOLO uploads require matching images or image_manifest.json for dimension recovery
  • Available export targets include Canonical JSON, YOLOv5-v10, and COCO bundles
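
The dimension requirement for YOLO uploads follows from the format itself: YOLO labels store normalized center and size values, so pixel corners cannot be recovered without the image width and height. The sketch below shows that recovery step. The manifest layout (file name mapped to width and height) and the function names are assumptions for illustration; the actual schema of image_manifest.json is not described on this page.

```python
from typing import Dict, Tuple

# Hypothetical manifest layout: file name -> pixel dimensions.
example_manifest: Dict[str, Dict[str, int]] = {
    "sample_0001.jpg": {"width": 640, "height": 480},
}


def load_dimensions(manifest: Dict[str, Dict[str, int]], file_name: str) -> Tuple[int, int]:
    """Look up (width, height) for an image; raises KeyError if it is missing."""
    entry = manifest[file_name]
    return entry["width"], entry["height"]


def yolo_to_corners(
    cx: float, cy: float, w: float, h: float, img_w: int, img_h: int
) -> Tuple[float, float, float, float]:
    """Convert normalized YOLO (cx, cy, w, h) to pixel corners (x1, y1, x2, y2)."""
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return (x1, y1, x2, y2)


img_w, img_h = load_dimensions(example_manifest, "sample_0001.jpg")
corners = yolo_to_corners(0.5, 0.5, 0.25, 0.5, img_w, img_h)
```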

Label structure

Class Composition

The class map is intentionally compact so detection and counting experiments stay interpretable while difficult non-target context remains visible.

Cyst

Primary target

1

Each annotation marks one countable SCN cyst instance and should be treated as the core measurement signal in detection and counting workflows.

Debris

Hard negative context

2

Debris captures visually confusing non-target material that can inflate false positives if the class boundary is not modeled explicitly.

Root

Scene structure

3

Root annotations preserve biological context so models can separate cyst targets from surrounding plant material instead of learning a flattened foreground/background view.

Class Reference

Class | Role | Why it matters
Cyst | Primary target | Each cyst row in the canonical annotations corresponds to one counted SCN cyst instance.
Debris | Confusing background | Helps document hard negatives and visually similar non-target material that can reduce precision.
Root | Scene context | Keeps plant structure visible in the label vocabulary so the dataset remains useful for robust detection analysis.
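
Because each cyst annotation stands for one counted instance, a per-image cyst count reduces to a tally over the annotation records. The following is a minimal sketch, assuming each canonical record carries an image name and a class label; the field names "image", "label", and "box" and the sample records are hypothetical and may not match the real schema.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Hypothetical canonical records; the real field names may differ.
sample_annotations = [
    {"image": "plate_01.jpg", "label": "cyst", "box": (120, 80, 160, 118)},
    {"image": "plate_01.jpg", "label": "cyst", "box": (300, 210, 342, 251)},
    {"image": "plate_01.jpg", "label": "debris", "box": (50, 400, 95, 430)},
    {"image": "plate_02.jpg", "label": "root", "box": (0, 10, 640, 60)},
    {"image": "plate_02.jpg", "label": "cyst", "box": (410, 95, 448, 130)},
]


def count_by_class(annotations: Iterable[Mapping]) -> Counter:
    """Tally annotations per class label."""
    return Counter(record["label"] for record in annotations)


def cysts_per_image(annotations: Iterable[Mapping]) -> Dict[str, int]:
    """Count cyst annotations per image.

    Images with no cyst annotations do not appear in the result.
    """
    counts = Counter(
        record["image"] for record in annotations if record["label"] == "cyst"
    )
    return dict(counts)


class_totals = count_by_class(sample_annotations)
cyst_counts = cysts_per_image(sample_annotations)
```

Keeping debris and root in the same tally makes it straightforward to compare how often non-target material appears alongside cysts in a given image.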

Release Checklist

  • Keep split reporting fixed so results remain comparable across YOLO- and COCO-based training runs.
  • Use canonical JSON as the inspection layer when validating coordinate fidelity or parser behavior.
  • Update only the metrics and release wording here when the archived publication bundle is finalized.
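
One way to use canonical JSON as an inspection layer is a simple sanity check on each box before export. The sketch below flags inverted or zero-area boxes and boxes that extend past the image bounds. The function name and the choice of checks are illustrative assumptions, not the project's actual validation rules.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def box_problems(box: Box, img_w: int, img_h: int) -> List[str]:
    """Return a list of human-readable problems found in one canonical box."""
    x1, y1, x2, y2 = box
    problems: List[str] = []
    # A valid box has its top-left corner strictly before its bottom-right.
    if not (x1 < x2 and y1 < y2):
        problems.append("degenerate or inverted box")
    # Pixel coordinates should stay within the image rectangle.
    if x1 < 0 or y1 < 0 or x2 > img_w or y2 > img_h:
        problems.append("box extends outside the image")
    return problems


# Hypothetical boxes on a hypothetical 100 x 100 image.
print(box_problems((10, 10, 50, 40), 100, 100))
print(box_problems((60, 10, 50, 40), 100, 100))
print(box_problems((10, 10, 120, 40), 100, 100))
```

Running a check like this on the canonical layer, rather than on each exported format, keeps a parser bug from being hidden by normalization.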

Interpretation

How to use this page

Treat this page as the public-facing summary of the release. The layout separates benchmark statistics, technical profile, and class interpretation, so final numbers can be updated in JSON without reworking the page template.