Benchmark CLI

e3r benchmark evaluates a method's predictions against a full dataset split and produces aggregate metrics (Chamfer, accuracy, completeness, F-score) plus a per-scene CSV/JSON breakdown. It is the multi-scene counterpart of e3r metric.

e3r benchmark <dataset> --pred-root <preds_root> --gt-root <gt_root> [...options]

<dataset> is a subcommand such as scannet, tanks-temples, or generic. --pred-root points at the predictions root, which holds one subdirectory per scene (or any other structure the prediction locator can resolve, e.g. a manifest).

Two modes

e3r benchmark decides how to locate ground truth, and where thresholds and sampling settings come from, based on the subcommand:

Mode	Trigger	GT layout	Thresholds & sampling
Registered dataset	`e3r benchmark <dataset>`	adapter class	dataset preset
Manual	`e3r benchmark generic`	path template	required on the CLI

Mode 1: registered dataset

Use this when the data sits in one of the layouts an adapter already understands (scannet, replica, dtu, eth3d, tum_rgbd, tanks_temples). The adapter handles file discovery; the matching preset supplies sensible defaults.

e3r benchmark scannet \
    --pred-root outputs/scannet \
    --gt-root /data/scannet \
    --split /data/scannet/splits/scannetv2_test.txt \
    --thresholds 0.05 --workers 8 \
    --out results.json --csv results.csv

To reshape an adapter's expected layout (different filenames or subdirs), pass -o key=value overrides. Run e3r benchmark <dataset> --help to see the available knobs for each adapter.

e3r benchmark eth3d \
    --pred-root outputs/eth3d \
    --gt-root /data/eth3d -o track=dslr

e3r benchmark tum-rgbd \
    --pred-root outputs/tum \
    --gt-root /data/tum -o intrinsics_fx=535.4 -o intrinsics_cx=320.1

Mode 2: generic layout

Use this when your data does not match any registered adapter. Select the generic subcommand and point at the ground truth via a {scene_id} template:

e3r benchmark generic \
    --pred-root outputs/mine \
    --gt-root /data/mydataset \
    --gt-path '{scene_id}/gt.ply' \
    --scenes-file splits/val.txt \
    --thresholds 0.05 --aligner none --unit m

The GT file may be a mesh (any trimesh-loadable format with faces) or a point cloud — eval3r probes the first scene to decide which.
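As a rough illustration of how a {scene_id} template could be resolved against --gt-root, here is a minimal Python sketch. The helper resolve_gt_path is hypothetical and not part of eval3r; it only mirrors the substitution described above, and the real resolver may behave differently (for example around absolute templates).

```python
from pathlib import PurePosixPath


def resolve_gt_path(gt_root: str, template: str, scene_id: str) -> PurePosixPath:
    """Substitute {scene_id} into the template and join the result onto gt_root.

    Hypothetical helper for illustration only; not an eval3r API.
    """
    if "{scene_id}" not in template:
        raise ValueError("template must contain the {scene_id} placeholder")
    # str.replace is used instead of str.format so that other braces in the
    # template are left untouched.
    return PurePosixPath(gt_root) / template.replace("{scene_id}", scene_id)


print(resolve_gt_path("/data/mydataset", "{scene_id}/gt.ply", "scene0001"))
# prints /data/mydataset/scene0001/gt.ply
```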

Required flags in manual mode

--gt-root: directory the --gt-path template is resolved against.
--gt-path: path template with {scene_id} substitution.
A scene source, exactly one of:
    --scenes 's1,s2,s3' (inline comma-separated list)
    --scenes-file <path> (one scene id per line)
    --split <path> (same format as --scenes-file)
--thresholds: must be given explicitly, because no preset is consulted in manual mode (e.g. --thresholds 0.05 for metres, --thresholds 1.0 2.0 5.0 for mm).
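The "exactly one scene source" rule can be sketched in Python as follows. The function names are hypothetical, and skipping blank lines and surrounding whitespace is an assumption about the file format rather than documented eval3r behaviour.

```python
from pathlib import Path
from typing import List, Optional


def parse_scene_lines(text: str) -> List[str]:
    """One scene id per line; blank lines are skipped (an assumption)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_scene_ids(
    scenes: Optional[str] = None,
    scenes_file: Optional[str] = None,
    split: Optional[str] = None,
) -> List[str]:
    """Hypothetical stand-in for the --scenes / --scenes-file / --split handling."""
    given = [src for src in (scenes, scenes_file, split) if src is not None]
    if len(given) != 1:
        raise ValueError("pass exactly one of --scenes, --scenes-file, --split")
    if scenes is not None:
        # Inline form: comma-separated ids.
        return [s.strip() for s in scenes.split(",") if s.strip()]
    # --split uses the same one-id-per-line format as --scenes-file.
    path = scenes_file if scenes_file is not None else split
    return parse_scene_lines(Path(path).read_text())


print(load_scene_ids(scenes="s1,s2,s3"))
# prints ['s1', 's2', 's3']
```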

Caveats

GenericAdapter only exposes ground-truth geometry. Trajectory-based alignment modes (traj_*) need pose data, which the generic adapter does not provide. If you need trajectory alignment, either pass poses externally with --pred-pose-dir or write a dedicated adapter.
Adapter -o overrides are rejected in manual mode: the generic adapter is configured solely through --gt-path and the scene source.

Presets

Each registered dataset ships built-in defaults for evaluation policy (thresholds, sampling, alignment, Chamfer variant, unit). Inspect the defaults for a dataset with:

e3r benchmark <dataset> --help

Any default can be overridden on the CLI (--thresholds, --aligner, --samples, --seed, --chamfer-variant).
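To make the role of --thresholds concrete, the sketch below computes threshold-based precision, recall, and F-score from two lists of nearest-neighbour distances, following the definition commonly used in 3D reconstruction benchmarks. This is an assumption about the general technique, not a description of eval3r's implementation: the function names are hypothetical, and details such as strict versus non-strict comparison may differ.

```python
from typing import Sequence


def fraction_within(distances: Sequence[float], threshold: float) -> float:
    """Share of distances strictly below the threshold (0.0 for empty input)."""
    if not distances:
        return 0.0
    return sum(d < threshold for d in distances) / len(distances)


def fscore(pred_to_gt: Sequence[float], gt_to_pred: Sequence[float], threshold: float) -> float:
    """Harmonic mean of precision (pred -> GT) and recall (GT -> pred)."""
    precision = fraction_within(pred_to_gt, threshold)
    recall = fraction_within(gt_to_pred, threshold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy distances in metres, evaluated at a 0.05 m threshold.
pred_to_gt = [0.01, 0.02, 0.10, 0.03]  # 3 of 4 within threshold -> precision 0.75
gt_to_pred = [0.01, 0.20, 0.04, 0.30]  # 2 of 4 within threshold -> recall 0.5
print(round(fscore(pred_to_gt, gt_to_pred, 0.05), 3))
# prints 0.6
```

Because the thresholds are expressed in the dataset's unit, the same geometry evaluated in millimetres needs correspondingly larger threshold values.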

Output

The benchmark reports two summaries plus coverage:

summary — mean / median / std / n over successful scenes only.
summary_all — mean / median / std / n over all scenes; missing or failed scenes are filled with the configured defaults (--missing-distance-default, --missing-fscore-default).
coverage — counts of n_total, n_evaluated, n_missing_pred, n_missing_gt, n_failed.
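The difference between summary and summary_all can be illustrated with a small Python sketch. The per-scene values and the fill value are made up, and the use of the population standard deviation is an assumption; eval3r may compute std differently.

```python
import statistics
from typing import Dict, List, Optional


def summarise(values: List[float]) -> Dict[str, float]:
    """Mean / median / std / n over a non-empty list of per-scene values."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),  # population std: an assumption
        "n": len(values),
    }


# Hypothetical per-scene F-scores; None marks a missing or failed scene.
per_scene: Dict[str, Optional[float]] = {"s1": 0.80, "s2": 0.60, "s3": None}
missing_default = 0.0  # stand-in for the value of --missing-fscore-default

# summary: successful scenes only.
summary = summarise([v for v in per_scene.values() if v is not None])
# summary_all: every scene, with gaps filled by the configured default.
summary_all = summarise([missing_default if v is None else v for v in per_scene.values()])

print(summary["n"], round(summary["mean"], 3))          # prints 2 0.7
print(summary_all["n"], round(summary_all["mean"], 3))  # prints 3 0.467
```

A large gap between the two means is a hint that missing or failed scenes, rather than reconstruction quality, are driving the aggregate.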

Use --out result.json for the full machine-readable payload (per-scene results, paths, errors) and --csv result.csv for a flat per-scene table.
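A script consuming the JSON payload might check coverage before trusting the aggregates. The sketch below assumes the payload has a top-level "coverage" object carrying the counts listed above; that layout is an assumption, so inspect a real result.json before relying on it.

```python
import json


def coverage_is_complete(payload: dict) -> bool:
    """True when every scene in the split was actually evaluated.

    Assumes a top-level "coverage" object with n_total and n_evaluated.
    """
    coverage = payload["coverage"]
    return coverage["n_evaluated"] == coverage["n_total"]


# Made-up payload fragment for illustration; in practice this would come from
# json.load() on the file written by --out.
example = json.loads(
    '{"coverage": {"n_total": 3, "n_evaluated": 2, '
    '"n_missing_pred": 1, "n_missing_gt": 0, "n_failed": 0}}'
)
print(coverage_is_complete(example))
# prints False
```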

Combining with other features

Occlusion masks: pass --mask-dir plus optional path patterns. See Occlusion Masks for the file-format and transform conventions.
Trajectory alignment: set --aligner traj_sim3 (or similar) and provide poses either via the prediction manifest or --pred-pose-dir. See Trajectory Formats for accepted file formats.
Cropping to GT bbox: pass --crop plus --crop-margin <metres>.

Quick reference

Flag	Purpose
`<dataset>`	Dataset subcommand, e.g. `scannet`, `generic`.
`--pred-root PATH`	Predictions root. Always required.
`--gt-root PATH`	Ground-truth filesystem root. Always required.
`--split PATH`	Scene-id list. Optional for adapters that auto-discover.
`--gt-path TEMPLATE`	Manual-mode GT template (e.g. `{scene_id}/gt.ply`).
`--scenes "id1,id2"`	Manual-mode inline scene list.
`--scenes-file PATH`	Manual-mode scene file (one id per line).
`-o key=value`	Adapter-specific override (registered mode only).
`--thresholds X [Y ...]`	F-score / precision / recall thresholds.
`--aligner {none,icp_se3,...}`	Alignment mode applied before metrics.
`--samples N`, `--seed N`	Override preset sampling defaults.
`--mask-dir PATH`	Root for mask path patterns. See `masks.md`.
`--out result.json`	Write full per-scene + summary JSON.
`--csv result.csv`	Write per-scene CSV.
`--json`	Emit the JSON payload to stdout (no rich table).
`--workers N`	Number of worker processes; defaults to `min(8, ncpu)`.
`--fail-on-missing`	Hard-error if any scene's prediction can't be located.