
Getting Started

This guide explains the end‑to‑end workflow with collections as the primary entry point. It focuses on "what" and "where" — not code.

For detailed methods and syntax examples, see the API reference.

For a runnable code pipeline, see the Examples page.


Core objects

The library is organised around three paired layers — a single-recording object and a collection that batches over many recordings. In normal use you operate on collections and let .each dispatch to individual recordings.

| Layer | Purpose | Single | Collection | Created via |
|---|---|---|---|---|
| Tracking | Raw keypoint coordinates — load, preprocess, QA | Tracking | TrackingCollection | loaders (from_dlc_folder, …) |
| Features | Per-frame derived signals — speeds, distances, boundaries, cluster labels | Features | FeaturesCollection | tc.to_features() |
| Summary | Per-recording scalars and statistics — time in zone, total distance, transition matrices | Summary | SummaryCollection | fc.to_summary() |

A fourth object, TrackingMV, wraps multiple views plus calibration for 3D / stereo setups.


Collection helpers

These behaviours are shared by all three collection types (TrackingCollection, FeaturesCollection, SummaryCollection).

.each batch dispatch

.each is the primary way to call a method on every element of a collection at once. coll.each.method(arg) runs element.method(arg) for each element and collects the results.

  • Inplace methods (most Tracking preprocessing) return a BatchResult — a handle-keyed dict of the return values.
  • Non-inplace methods (when inplace=False) return a new collection of the same type.
  • A BatchResult passed as an argument to a subsequent .each call is automatically mapped per-handle rather than broadcast.
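A minimal sketch of the dispatch behaviour described above, using method names from this guide (the 0.9 likelihood threshold and exact signatures are illustrative):

```python
# In-place preprocessing via .each returns a BatchResult —
# a dict of per-recording return values keyed by handle:
result = tc.each.filter_likelihood(0.9)

# With inplace=False, .each instead returns a new collection of the same type,
# leaving the original untouched:
tc_clean = tc.each.filter_likelihood(0.9, inplace=False)

# A BatchResult passed to a later .each call is mapped per handle,
# so each recording receives its own entry rather than a broadcast value:
speeds = fc.each.speed("bodycentre")
```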

Grouping

Collections can be grouped by any subset of tags. Groupings persist when you convert between layers (e.g. a grouped TrackingCollection produces a grouped FeaturesCollection).

  • .groupby(tags) — returns a grouped view keyed by (tag_value, …) tuples.
  • .flatten() — collapse back to a flat collection.
  • .regroup() — recompute grouping after tags have changed.
  • .get_group(key) — retrieve a sub-collection for one group.
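Putting the grouping helpers together — a sketch assuming .groupby accepts a list of tag names (the exact argument form may differ):

```python
# Group by one or more tags; group keys are (tag_value, …) tuples:
grouped = tc.groupby(["treatment"])

# Retrieve the sub-collection for a single group:
controls = grouped.get_group(("control",))

# Collapse back to a flat collection:
flat = grouped.flatten()

# After editing tags on elements, rebuild the grouping:
grouped.regroup()
```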

Indexing

  • coll["recording1"] — by handle string
  • coll[0] — by integer position
  • coll[0:3] — by slice
  • .loc[] / .iloc[] — label- or integer-based slicing applied in batch to each leaf

Metadata

  • .tags_info() — coverage and unique-value counts per tag across the collection.
  • .stored_info() — overview of what is stored in each element (tracked points / feature columns / summary metrics depending on layer).

Save and load

All objects and collections implement .save(path) / .load(path) that serialise the full object tree — data, metadata, boundaries, groupings, and tags. Parquet is the default format; CSV is available as a fallback.
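A minimal save/load round trip; whether .load is a classmethod (as sketched here) or takes other arguments depends on the actual signature:

```python
# Serialises data, metadata, boundaries, groupings, and tags (Parquet by default):
tc.save("preprocessed_tracking")

# Restore the full object tree later:
tc = TrackingCollection.load("preprocessed_tracking")
```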

copy()

All objects and collections implement .copy() for deep copies — useful when exploring parameters without altering the main pipeline object.


Typical workflow

The typical flow is: load → preprocess → features and clustering → summary stats → group comparisons and plots.

1. Load a TrackingCollection

Tracking holds the raw frame-by-frame keypoint coordinates for one recording, together with fps, rescaling metadata, and tags. A TrackingCollection is a keyed mapping of these, loaded from a folder in a single call — each file becomes a Tracking keyed by its filename stem.

Alternatively, build one from an explicit {handle: filepath} mapping with TrackingCollection.from_dlc.

If files are grouped into a few separate directories, each directory can be loaded into a separate TrackingCollection using the relevant from_folder method and then combined using TrackingCollection.merge.
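A loading sketch using the loaders named above (paths and the merge signature are illustrative):

```python
# One call per folder; each file becomes a Tracking keyed by its filename stem:
tc = TrackingCollection.from_dlc_folder("data/session1")

# Or from an explicit {handle: filepath} mapping:
tc_extra = TrackingCollection.from_dlc({"mouse01": "data/extra/mouse01.h5"})

# Directories loaded separately can be combined into one collection:
tc_a = TrackingCollection.from_dlc_folder("data/batch_a")
tc_b = TrackingCollection.from_dlc_folder("data/batch_b")
tc = TrackingCollection.merge(tc_a, tc_b)
```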


2. Add experimental tags

A tags CSV maps handles to experimental metadata (treatment, genotype, timepoint, …). The handle column must match the filename stems. Tags are preserved through all downstream conversions and drive grouping and CSV export.
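An illustrative tags CSV (the tag columns beyond handle are examples — use whatever metadata your experiment needs). The handle values must match the filename stems used as collection keys:

```python
import io

import pandas as pd

# Illustrative tags file: one row per recording handle.
tags_csv = """handle,treatment,genotype,timepoint
mouse01,control,wt,day1
mouse02,drug,wt,day1
mouse03,drug,ko,day2
"""

tags = pd.read_csv(io.StringIO(tags_csv))
print(tags.columns.tolist())  # ['handle', 'treatment', 'genotype', 'timepoint']
```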


3. Batch preprocessing

Standard preprocessing order — all run via .each:

| Step | Method |
|---|---|
| Remove low-confidence detections | Tracking.filter_likelihood |
| Interpolate short gaps | Tracking.interpolate |
| Smooth trajectories | Tracking.smooth_all |
| Rescale pixels → metres | Tracking.rescale_by_known_distance |
| Trim start/end | Tracking.trim |

Most preprocessing methods guard against re-application; use inplace=False on a copy when iterating on parameters.
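The full chain as a sketch — all argument values and the rescale signature below are illustrative, not the library's documented defaults:

```python
# Standard preprocessing order, applied to every recording at once:
tc.each.filter_likelihood(0.9)
tc.each.interpolate()
tc.each.smooth_all()
tc.each.rescale_by_known_distance("corner1", "corner2", 0.4)  # hypothetical args
tc.each.trim(start=0, end=9000)                               # hypothetical args

# When iterating on parameters, operate on a copy so the main
# pipeline object is untouched:
trial = tc.copy().each.smooth_all(inplace=False)
```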

QA checks:


4. Create a FeaturesCollection

Features wraps a Tracking and provides methods that compute per-frame derived signals — speeds, distances, boundary membership, cluster labels — stored as named columns in Features.data. Convert the whole preprocessed collection:

fc = tc.to_features()

TrackingCollection.to_features preserves grouped structure. The underlying classmethod is FeaturesCollection.from_tracking_collection.


5. Compute features

All feature methods return a FeaturesResult (a pd.Series subclass). Call .store() on the result to persist it with an auto-generated name in Features.data. Batch via .each: fc.each.speed("bodycentre").store().

Distance and movement

Orientation

Boundaries and locations

Define named boundary regions first, then query against them:

Speed thresholds

BatchResult composition

BatchResult objects returned by .each support element-wise operations, letting you build compound features without loops:

  • Logical: in_centre & above_speed, ~in_centre, a | b
  • Arithmetic: in_centre.astype("Int64") * dist_change
  • Comparison: (fc.each.speed("bodycentre") * 100) > 10.0

The composed BatchResult can be stored directly with .store("name").
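These compositions follow ordinary pandas element-wise semantics. As a self-contained illustration of the pattern (plain pandas Series standing in for per-frame feature results — this is not the library API itself):

```python
import pandas as pd

# Stand-ins for per-frame boolean/numeric feature series:
in_centre = pd.Series([True, True, False, False])
speed = pd.Series([0.02, 0.15, 0.30, 0.01])

# Comparison: rescale then threshold, frame by frame:
above_speed = (speed * 100) > 10.0

# Logical composition: frames that are both in the centre AND fast:
moving_in_centre = in_centre & above_speed

# Arithmetic composition: use a boolean as a 0/1 weight:
weighted = in_centre.astype("Int64") * speed

print(moving_in_centre.tolist())  # [False, True, False, False]
```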

State composition from booleans

Features.compose_state_from_booleans collapses multiple Boolean columns into a single categorical state column via a {label: series_or_column_name} dict. Frames that are unassigned, or that match more than one label, become NaN.
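The collapse rule can be illustrated in plain pandas (this sketches the described semantics, not the library's implementation):

```python
import numpy as np
import pandas as pd

# Boolean per-frame columns, as boundary/threshold queries would produce:
bools = {
    "centre":    pd.Series([True,  False, False, True]),
    "periphery": pd.Series([False, True,  False, True]),
}

frames = pd.concat(bools, axis=1)

# Exactly one True -> that label; zero (unassigned) or several
# (overlapping) Trues -> NaN:
exactly_one = frames.sum(axis=1) == 1
state = frames.idxmax(axis=1).where(exactly_one, np.nan)

print(state.tolist())  # ['centre', 'periphery', nan, nan]
```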


6. Embeddings and clustering

Build time-shifted embeddings and run k-means to obtain per-frame behavioural cluster labels, then store them as a feature column for downstream summary.

Use fc.stored_info() to inspect all stored feature columns.


7. Create a SummaryCollection

Summary collapses the per-frame feature time-series for one recording into scalars and Series — total distance, time in zone, state-transition matrices. SummaryCollection batches these and provides group-level analyses and export.

sc = fc.to_summary()

FeaturesCollection.to_summary preserves grouped structure. The underlying classmethod is SummaryCollection.from_features_collection.


8. Generate summary statistics

All summary methods return a SummaryResult. Call .store() to persist with a name. Batch via sc.each.method(...).store(...).

Scalar metrics

State / sequence metrics (return Series)

Pass all_states= to any of these to guarantee a fixed state domain (including states absent in a recording) — important for comparisons across recordings.

Per-state dispatch with .by_state

Summary.by_state(column) returns a proxy that re-runs an aggregate method separately for each value of a state column, returning a Series indexed by state. Supports mean_column, median_column, max_column, min_column.
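Conceptually, .by_state behaves like a pandas groupby on the state column. A self-contained analogy (not the library API):

```python
import pandas as pd

# Per-frame data with a categorical state column:
df = pd.DataFrame({
    "state": ["walk", "rest", "walk", "rest"],
    "speed": [0.30,   0.02,   0.50,   0.04],
})

# sc.by_state("state").mean_column("speed") corresponds roughly to:
per_state_mean = df.groupby("state")["speed"].mean()
print(per_state_mean.to_dict())  # ≈ {'rest': 0.03, 'walk': 0.40}
```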

Temporal binning


9. Export per-recording results

  • SummaryCollection.to_df — flatten stored scalar metrics + tag columns into one tidy DataFrame indexed by handle. Use series="separate" to also export Series metrics (time_in_state, …) as separate DataFrames.
  • SummaryCollection.stored_info — overview of stored metrics across the collection.
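An export sketch; how the Series DataFrames are returned with series="separate" depends on the actual API:

```python
# One tidy DataFrame: stored scalar metrics + tag columns, indexed by handle:
df = sc.to_df()
df.to_csv("summary_metrics.csv")

# Also export Series metrics (time_in_state, …) as separate DataFrames:
frames = sc.to_df(series="separate")
```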

10. Group analyses and plotting

Group-level analyses require a grouped SummaryCollection — i.e. one produced from a grouped FeaturesCollection, or grouped after the fact with .groupby(tags). Groupings control how data is split for comparisons, how plot colours are assigned, and which groups Behaviour Flow Analysis (BFA) contrasts.

seaborn wrapper plots

SummaryCollection provides a family of sns* convenience methods that turn stored metrics into publication-ready group comparison plots. Each accepts a metric name (or a BatchResult) and handles tidy-data preparation, group colouring, and axis layout automatically. All support annotate= for statistical annotations (see below).

Statistical annotations

All sns* methods accept an annotate= argument that passes through to statannotations. Pass "help" to print available tests and options, or a dict with pairs, test, correction, and other statannotations.Annotator.configure kwargs.
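A sketch of the annotate= argument — snsbarplot is a hypothetical member of the sns* family (the guide does not name the individual methods), and the metric name and group labels are illustrative:

```python
# Discover available tests and options:
sc.snsbarplot("total_distance", annotate="help")

# Or configure annotations directly; keys pass through to
# statannotations.Annotator.configure:
ax = sc.snsbarplot(
    "total_distance",
    annotate={
        "pairs": [("control", "drug")],
        "test": "Mann-Whitney",
        "correction": "Benjamini-Hochberg",
    },
)
```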

Chord diagrams (state transitions)

Transition UMAP

  • SummaryCollection.plot_transition_umap — UMAP embedding of per-recording transition matrices, coloured by group; useful for visualising behavioural similarity structure across the dataset.

Behaviour Flow Analysis (BFA)

BFA tests whether state-transition patterns differ between groups using a shuffle-based null distribution over Manhattan distances between group-level transition matrices.


Working with individual recordings

All methods are available directly on the leaf objects when you don't need the collection layer.