Run this example yourself
Download the oft_pipeline example — unzip and open oft_pipeline.ipynb in Jupyter.
Open Field Test (OFT) — Full analysis pipeline example¶
Setup¶
import json
import os
from pathlib import Path
import numpy as np
import pandas as pd
import py3r.behaviour as p3b
try:
    from IPython.display import display
except ImportError:
    def display(x):
        print(x)
# Skip heavy visualisation deps (pycirclize, umap-learn) in CI
SKIP_HEAVY_VIZ = os.environ.get("CI", "").lower() in ("true", "1", "yes")
# Paths
DATA_DIR = Path("data/tracking")
TAGS_CSV = Path("data/tags.csv")
OUT_DIR = Path(os.environ.get("NB_OUT_DIR", Path.cwd() / "_artifacts"))
OUT_DIR.mkdir(parents=True, exist_ok=True)
# Constants
FPS = 30
N_CLUSTERS = 25
Load & Preprocess¶
Load tracking data¶
Load a TrackingCollection from a folder of DeepLabCut CSV files.
Each CSV becomes one Tracking object keyed by its filename stem.
The provided fps is written into each leaf's metadata for downstream methods.
Return type here is TrackingCollection.
Alternative loaders with the same pattern: from_yolo3r_folder, from_dlcma_folder.
tc = p3b.TrackingCollection.from_dlc_folder(
    folder_path=DATA_DIR,
    fps=FPS,
)
print(tc)
# Main object types in `py3r.behaviour` implement `.copy()`.
# We'll keep an untouched copy for didactic examples in this notebook.
tc_raw_for_demo = tc.copy()
<TrackingCollection with 56 Tracking objects>
All Collection objects, such as TrackingCollection, implement stored_info(),
which gives a quick overview of their accessible contents.
tc.stored_info()
| point_name | attached_to | missing_from | dims |
|---|---|---|---|
| bcl | 56 | 0 | [x, y] |
| bcr | 56 | 0 | [x, y] |
| bl | 56 | 0 | [x, y] |
| bodycentre | 56 | 0 | [x, y] |
| br | 56 | 0 | [x, y] |
| earl | 56 | 0 | [x, y] |
| earr | 56 | 0 | [x, y] |
| headcentre | 56 | 0 | [x, y] |
| hipl | 56 | 0 | [x, y] |
| hipr | 56 | 0 | [x, y] |
| neck | 56 | 0 | [x, y] |
| nose | 56 | 0 | [x, y] |
| tailbase | 56 | 0 | [x, y] |
| tailcentre | 56 | 0 | [x, y] |
| tailtip | 56 | 0 | [x, y] |
| tl | 56 | 0 | [x, y] |
| tr | 56 | 0 | [x, y] |
Add experimental tags¶
A tags CSV maps recording handles to experimental metadata.
It must contain a handle column matching filename stems;
every other column becomes a tag key–value pair.
tags_info() is a quick schema check: coverage and cardinality per tag.
add_tags_from_csv(...) mutates each Tracking in-place and returns None.
tc.add_tags_from_csv(csv_path=TAGS_CSV)
tc.tags_info()
added 168 tags to 56 elements in collection.
| tag | attached_to | missing_from | unique_values |
|---|---|---|---|
| sex | 56 | 0 | 2 |
| timepoint | 56 | 0 | 2 |
| treatment | 56 | 0 | 2 |
Didactic: batch processing¶
With a TrackingCollection, .each delegates calls to each Tracking.
Think "batch call the same Tracking method for all recordings".
This .each batch processing pattern also applies to FeaturesCollection
and SummaryCollection, as we will see later.
Methods on Tracking are inplace=True by default, so .each returns a
BatchResult. If inplace=False, .each returns a TrackingCollection.
Passing a BatchResult back into .each maps values by handle.
demo_inplace = tc_raw_for_demo.copy().each.filter_likelihood(threshold=0.9)
demo_new_collection = tc_raw_for_demo.copy().each.filter_likelihood(
    threshold=0.9,
    inplace=False,
)
print(type(demo_inplace).__name__) # expected: BatchResult
print(type(demo_new_collection).__name__) # expected: TrackingCollection
BatchResult
TrackingCollection
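The per-handle mapping that .each relies on can be sketched with plain dicts (toy handles and values, not the library's internals):

```python
# A BatchResult behaves like a mapping {handle: value}; passing one back
# into a batch call applies each value to the recording with the same handle.
batch_result = {"rec_a": 1.0, "rec_b": 2.0}   # hypothetical per-recording values
collection = {"rec_a": 10.0, "rec_b": 20.0}   # hypothetical per-recording data

# Values are aligned by handle, never by position.
mapped = {handle: collection[handle] + batch_result[handle] for handle in collection}
# mapped == {"rec_a": 11.0, "rec_b": 22.0}
```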
Preprocess¶
Standard preprocessing chain: remove low-confidence detections, interpolate short gaps, smooth trajectories, and rescale coordinates to real-world units. This order is intentional: filter -> interpolate -> smooth -> rescale.
In this main path we use in-place behaviour (typical analysis workflow). Equivalent non-in-place variants are shown above in the didactic batch section.
tc.each.filter_likelihood(threshold=0.9)
tc.each.interpolate(limit=5)
tc.each.smooth_all(window=3, method="mean")
tc.each.rescale_by_known_distance(
    point1="tl",
    point2="br",
    distance_in_metres=0.64,
)
BatchResult: 56 items processed (in-place)
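The same filter → interpolate → smooth → rescale chain can be sketched in plain pandas on a toy single-point trajectory (this is an illustrative analogue, not the library's implementation; the column names and numbers are made up):

```python
import numpy as np
import pandas as pd

# Toy trajectory for one keypoint, with a DLC-style likelihood column.
df = pd.DataFrame({
    "x": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "likelihood": [0.95, 0.4, 0.92, 0.99, 0.3, 0.97],
})

# 1) Filter: mask coordinates where detection confidence is low.
low = df["likelihood"] < 0.9
df.loc[low, ["x", "y"]] = np.nan

# 2) Interpolate: fill gaps, but only up to `limit` consecutive frames.
df[["x", "y"]] = df[["x", "y"]].interpolate(limit=5, limit_area="inside")

# 3) Smooth: centred rolling mean over a 3-frame window.
df[["x", "y"]] = df[["x", "y"]].rolling(window=3, center=True, min_periods=1).mean()

# 4) Rescale: convert pixels to metres via a known real-world distance.
pixel_dist = 100.0   # measured pixel distance between two arena landmarks
metres = 0.64        # known physical distance between them
df[["x", "y"]] *= metres / pixel_dist
```

Running the steps in this order matters: interpolating before filtering would fill gaps from low-confidence points, and smoothing before interpolating would smear NaNs into neighbouring frames.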
Re-running preprocessing¶
Most preprocessing methods guard against re-application. For parameter tuning,
prefer inplace=False and work on a copy.
try:
    tc.each.interpolate(limit=5)
except Exception as e:
    print(e)
Error in collection 'None', object 'OFT2_11', method 'interpolate': data already interpolated. re-load the raw data to interpolate again
Quality check — trajectory plots¶
Save trajectory plots for every recording and display one inline for QC.
Pattern used here:
- batch save all (tc.each.plot(..., savedir=...))
- inspect one representative recording inline (tc[0].plot(...))
trajectories = ["bodycentre"]
static = ["tl", "tr", "bl", "br"]
lines = [("tr", "tl"), ("tl", "bl"), ("bl", "br"), ("br", "tr")]
tc.each.plot(trajectories=trajectories, static=static, lines=lines, show=False, savedir=OUT_DIR)
# Single inline plot for visual QC
tc[0].plot(trajectories=trajectories, static=static, lines=lines, show=True)
(<Figure size 500x500 with 1 Axes>,
<Axes: title={'center': 'OFT2_11'}, xlabel='x', ylabel='y'>)
Compute Features¶
Create FeaturesCollection¶
A FeaturesCollection wraps every recording's tracking data with
methods for computing time-series features.
Most feature methods return FeaturesResult; call .store() to persist
to Features.data and register metadata in Features.meta.
fc = p3b.FeaturesCollection.from_tracking_collection(tc)
Spatial features — boundaries¶
Define/store named boundaries on each Features leaf, then use either:
- mapped BatchResult boundary objects (smart per-handle passthrough), or
- boundary names (resolved from stored per-recording assets).
Here we use both and assert they match.
ordered_oft_corners = ["tl", "tr", "br", "bl"]
Define and store a centre boundary for each recording.
centre_boundary = fc.each.define_static_boundary(
    ordered_oft_corners,
    scale_dim1=0.5,
    scale_dim2=0.5,
    name="centre",
)
Compare boundary usage styles: pass boundary objects vs stored boundary names.
in_centre = fc.each.within_boundary(point="bodycentre", boundary=centre_boundary)
in_centre_by_name = fc.each.within_boundary(point="bodycentre", boundary="centre")
for handle in fc.keys():
    assert in_centre[handle].equals(in_centre_by_name[handle])
Store the result. Without a manual name, an automatic descriptive name is used;
.store() always returns the stored name.
in_centre.store()
'within_boundary_static_bodycentre_in_centre'
BatchResult supports logical composition (for example, arena periphery).
_ = fc.each.define_static_boundary(
    ordered_oft_corners,
    scale_dim1=0.8,
    scale_dim2=0.8,
    name="not_periphery",
)
_ = fc.each.define_static_boundary(
    ordered_oft_corners,
    name="oft",
)
(
    fc.each.within_boundary("bodycentre", "oft")
    & (~fc.each.within_boundary("bodycentre", "not_periphery"))
).store("in_periphery")
'in_periphery'
Corner occupancy can be represented as a single state feature instead of many independent booleans.
in_corners = dict()
for c in ordered_oft_corners:
    _ = fc.each.define_static_boundary(
        ordered_oft_corners,
        scale_dim1=0.2,
        scale_dim2=0.2,
        name=f"{c}_corner",
        anchor=c,
    )
    in_corners[c] = fc.each.within_boundary("bodycentre", boundary=f"{c}_corner")
# Store a convenience boolean for "in any corner".
(in_corners["tl"] | in_corners["tr"] | in_corners["bl"] | in_corners["br"]).store("in_corner")
'in_corner'
# Store a categorical corner-state feature for state-based analyses.
fc.each.compose_state_from_booleans(in_corners).store("corner_state")
'corner_state'
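The core idea of composing a categorical state from boolean indicators can be sketched in plain pandas (toy data, assuming the indicators are mutually exclusive per frame, as the corner zones are here):

```python
import pandas as pd

# Per-frame boolean indicators, one per zone (frames 2-3 are in no zone).
booleans = {
    "tl": pd.Series([True, False, False, False]),
    "tr": pd.Series([False, True, False, False]),
}

# One categorical state column: the name of whichever indicator is True
# on each frame, masked to missing when none of them is.
frame = pd.DataFrame(booleans)
state = frame.idxmax(axis=1).where(frame.any(axis=1))
# state: ["tl", "tr", NaN, NaN]
```

A single state column like this is what makes downstream state-based tools (by_state summaries, transition analyses) straightforward, compared with juggling many independent booleans.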
# Keep these existing columns out of clustering feature selection.
non_bfa_feats = fc[0].data.columns
BatchResult also supports element-wise arithmetic across handles.
dist_change = fc.each.distance_change("bodycentre")
dist_change_in_centre = in_centre.astype("Int64") * dist_change
dist_change_in_centre.store(name="dist_change_bodycentre_in_centre")
'dist_change_bodycentre_in_centre'
# `BatchResult` also supports general binary operations.
fast_outside_centre = ~in_centre & ((fc.each.speed("bodycentre") * 100) > 10.0)
# This is an example only; we do not store it.
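The Int64 multiplication trick above is worth spelling out in plain pandas: casting a nullable boolean mask to "Int64" and multiplying zeroes out values outside the mask while propagating missingness (toy numbers, not the notebook's data):

```python
import pandas as pd

in_zone = pd.Series([True, False, pd.NA], dtype="boolean")  # nullable boolean mask
dist_change = pd.Series([0.1, 0.2, 0.3])

# True -> 1, False -> 0, <NA> stays <NA>; the product keeps the value
# inside the zone, zeroes it outside, and stays missing where the mask is.
masked = in_zone.astype("Int64") * dist_change
# masked: [0.1, 0.0, <NA>]  (nullable Float64)
```

This also explains why the stored dist_change_bodycentre_in_centre feature shows up later with the nullable Float64 dtype rather than plain float64.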
Kinematic features for BFA¶
Speeds, angle deviations, inter-keypoint distances, body-part areas, and distance to the arena boundary — the standard feature set for behavioural flow analysis clustering. The loop pattern below intentionally stores each feature as a named column, so later clustering/summary code can reference columns deterministically.
Specific choices used here:
- For kinematic polygons, we define named dynamic boundaries, then compute
dynamic area (median=False) over their ordered points.
- For arena distance, we define named static boundaries and loop over points.
# Speeds
for pt in ["nose", "neck", "earr", "earl", "bodycentre", "hipl", "hipr", "tailbase"]:
    fc.each.speed(pt).store()
Compute angular features.
# Angle deviations
for basepoint, pointdirection1, pointdirection2 in [
    ("tailbase", "hipr", "hipl"),
    ("bodycentre", "tailbase", "neck"),
    ("neck", "bodycentre", "headcentre"),
    ("headcentre", "earr", "earl"),
]:
    fc.each.azimuth_deviation(basepoint, pointdirection1, pointdirection2).store()
Compute inter-keypoint distances.
# Inter-keypoint distances
for p1, p2 in [
    ("nose", "headcentre"),
    ("neck", "headcentre"),
    ("neck", "bodycentre"),
    ("bcr", "bodycentre"),
    ("bcl", "bodycentre"),
    ("tailbase", "bodycentre"),
    ("tailbase", "hipr"),
    ("tailbase", "hipl"),
    ("bcr", "hipr"),
    ("bcl", "hipl"),
    ("bcl", "earl"),
    ("bcr", "earr"),
    ("nose", "earr"),
    ("nose", "earl"),
]:
    fc.each.distance_between(p1, p2).store()
Define dynamic body boundaries and store per-boundary area features.
DYNAMIC_BODY_BOUNDARIES = [
    ("mouse_rear", ["tailbase", "hipr", "hipl"]),
    ("mouse_mid", ["hipr", "hipl", "bcl", "bcr"]),
    ("mouse_front", ["bcr", "earr", "earl", "bcl"]),
    ("mouse_face", ["earr", "nose", "earl"]),
]
for boundary_name, boundary_points in DYNAMIC_BODY_BOUNDARIES:
    fc.each.define_dynamic_boundary(boundary_points, name=boundary_name)
    fc.each.area_of_boundary(boundary_name).store()
Compute distance-to-boundary features for selected points.
STATIC_DISTANCE_TO_BOUNDARY_POINTS = ["nose", "neck", "bodycentre", "tailbase"]
for pt in STATIC_DISTANCE_TO_BOUNDARY_POINTS:
    fc.each.distance_to_boundary(pt, "oft").store()
Inspect stored boundary assets on one recording.
fc[0].list_boundaries()
| name | kind | n_points | has_vertices |
|---|---|---|---|
| centre | static | 4 | True |
| not_periphery | static | 4 | True |
| oft | static | 4 | True |
| tl_corner | static | 4 | True |
| tr_corner | static | 4 | True |
| br_corner | static | 4 | True |
| bl_corner | static | 4 | True |
| mouse_rear | dynamic | 3 | False |
| mouse_mid | dynamic | 4 | False |
| mouse_front | dynamic | 4 | False |
| mouse_face | dynamic | 3 | False |
K-means clustering¶
Embed the feature time-series with temporal offsets, then cluster
the embedded space with k-means.
cluster_embedding_stream returns (cluster_labels, centroids, scaling_factors), where:
- cluster_labels is a per-handle BatchResult of label series
- centroids is a DataFrame with n_clusters rows
- scaling_factors is not needed here, so we discard it below
Option notes:
- offset controls the temporal context window.
- cluster_embedding_stream also supports weighting/normalisation knobs for advanced runs.
cluster_features = list(set(fc[0].data.columns) - set(non_bfa_feats))
offset = list(np.arange(-15, 16, 1))
embedding_dict = {f: offset for f in cluster_features}
cluster_labels, centroids, _ = fc.cluster_embedding_stream(
    embedding_dict=embedding_dict, n_clusters=N_CLUSTERS
)
cluster_labels.store("kmeans_25", overwrite=True)
'kmeans_25'
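What "embedding with temporal offsets" means can be sketched with pandas shifts on one toy feature (the column naming is hypothetical; the library builds the equivalent matrix across all selected features before running k-means):

```python
import numpy as np
import pandas as pd

speed = pd.Series(np.arange(10, dtype=float))  # one toy feature over 10 frames
offsets = [-1, 0, 1]                           # context: previous, current, next frame

# Each offset becomes one embedded column: the feature shifted in time,
# so every row is a short temporal snippet centred on that frame.
embedded = pd.DataFrame(
    {f"speed[{o:+d}]": speed.shift(-o) for o in offsets}
).dropna()
# Row for frame 1 is [0.0, 1.0, 2.0]; edge frames without full context drop out.
```

With offsets of -15..15 as in the cell above, each row carries one second of context at 30 fps, which is what lets k-means separate temporally extended behaviours rather than single-frame poses.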
# A quick overview of the stored features
fc.stored_info()
| feature | attached_to | missing_from | type |
|---|---|---|---|
| area_of_boundary_mouse_face_dynamic | 56 | 0 | float64 |
| area_of_boundary_mouse_front_dynamic | 56 | 0 | float64 |
| area_of_boundary_mouse_mid_dynamic | 56 | 0 | float64 |
| area_of_boundary_mouse_rear_dynamic | 56 | 0 | float64 |
| azimuth_deviation_bodycentre_to_tailbase_and_neck | 56 | 0 | float64 |
| azimuth_deviation_headcentre_to_earr_and_earl | 56 | 0 | float64 |
| azimuth_deviation_neck_to_bodycentre_and_headcentre | 56 | 0 | float64 |
| azimuth_deviation_tailbase_to_hipr_and_hipl | 56 | 0 | float64 |
| corner_state | 56 | 0 | object |
| dist_change_bodycentre_in_centre | 56 | 0 | Float64 |
| distance_between_bcl_and_bodycentre_in_xy | 56 | 0 | float64 |
| distance_between_bcl_and_earl_in_xy | 56 | 0 | float64 |
| distance_between_bcl_and_hipl_in_xy | 56 | 0 | float64 |
| distance_between_bcr_and_bodycentre_in_xy | 56 | 0 | float64 |
| distance_between_bcr_and_earr_in_xy | 56 | 0 | float64 |
| distance_between_bcr_and_hipr_in_xy | 56 | 0 | float64 |
| distance_between_neck_and_bodycentre_in_xy | 56 | 0 | float64 |
| distance_between_neck_and_headcentre_in_xy | 56 | 0 | float64 |
| distance_between_nose_and_earl_in_xy | 56 | 0 | float64 |
| distance_between_nose_and_earr_in_xy | 56 | 0 | float64 |
| distance_between_nose_and_headcentre_in_xy | 56 | 0 | float64 |
| distance_between_tailbase_and_bodycentre_in_xy | 56 | 0 | float64 |
| distance_between_tailbase_and_hipl_in_xy | 56 | 0 | float64 |
| distance_between_tailbase_and_hipr_in_xy | 56 | 0 | float64 |
| distance_to_boundary_static_bodycentre_in_oft | 56 | 0 | float64 |
| distance_to_boundary_static_neck_in_oft | 56 | 0 | float64 |
| distance_to_boundary_static_nose_in_oft | 56 | 0 | float64 |
| distance_to_boundary_static_tailbase_in_oft | 56 | 0 | float64 |
| in_corner | 56 | 0 | boolean |
| in_periphery | 56 | 0 | boolean |
| kmeans_25 | 56 | 0 | Int64 |
| speed_of_bodycentre_in_xy | 56 | 0 | float64 |
| speed_of_earl_in_xy | 56 | 0 | float64 |
| speed_of_earr_in_xy | 56 | 0 | float64 |
| speed_of_hipl_in_xy | 56 | 0 | float64 |
| speed_of_hipr_in_xy | 56 | 0 | float64 |
| speed_of_neck_in_xy | 56 | 0 | float64 |
| speed_of_nose_in_xy | 56 | 0 | float64 |
| speed_of_tailbase_in_xy | 56 | 0 | float64 |
| within_boundary_static_bodycentre_in_centre | 56 | 0 | boolean |
Save features to disk¶
save() writes a collection manifest plus per-handle element folders.
This makes downstream loading deterministic and auditable.
Later you can reconstruct with p3b.FeaturesCollection.load(path).
fc.save(f"{OUT_DIR}/features", data_format="csv", overwrite=True)
Summarise¶
Create SummaryCollection¶
Each Summary object holds scalar (or Series) metrics computed from
a single recording's features.
Return type here is SummaryCollection.
sc = p3b.SummaryCollection.from_features_collection(fc)
Compute summary measures¶
Call summary methods and .store() the result to persist it.
Stored summary metrics become scalar columns in each Summary.data record.
Same pattern as features:
- compute result (sc.total_distance(...), etc.)
- then .store(...) to persist by metric name.
sc.each.total_distance("bodycentre").store()
sc.each.time_true("within_boundary_static_bodycentre_in_centre").store("time_in_centre")
sc.each.sum_column("dist_change_bodycentre_in_centre").store(name="distance_moved_in_centre")
# by_state API example: average speed by composed spatial zone.
sc.each.by_state(
    "corner_state",
    all_states=ordered_oft_corners,
).mean_column("speed_of_bodycentre_in_xy").store("mean_speed_corners")
# by_state + all_states API example: force explicit cluster domain (0-9),
# including states absent in a recording.
sc.each.by_state("kmeans_25", all_states=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]).mean_column(
    "speed_of_bodycentre_in_xy"
).store("mean_speed_bodycentre_by_kmeans_25")
'mean_speed_bodycentre_by_kmeans_25'
A quick overview of the stored summaries
sc.stored_info()
| summary | attached_to | missing_from | type |
|---|---|---|---|
| distance_moved_in_centre | 56 | 0 | float64 |
| mean_speed_bodycentre_by_kmeans_25 | 56 | 0 | Series |
| mean_speed_corners | 56 | 0 | Series |
| time_in_centre | 56 | 0 | float64 |
| total_distance_bodycentre | 56 | 0 | float64 |
Export results to CSV¶
to_df(include_tags=True) flattens summary metrics + selected tag columns
into one analysis-ready table (indexed by handle).
By default, series metrics (like time_in_state) are ignored (series="ignore").
With series="separate", each series metric is returned as its own DataFrame over the collection.
summary_df, series_dfs = sc.to_df(include_tags=True, series="separate")
summary_df.to_csv(f"{OUT_DIR}/OFT_results.csv")
display(summary_df.head())
for key, val in series_dfs.items():
    print(key)
    display(val.head())
| handle | total_distance_bodycentre | time_in_centre | distance_moved_in_centre | tag_timepoint | tag_treatment | tag_sex |
|---|---|---|---|---|---|---|
| OFT2_11 | 5.670102 | 4.300000 | 0.616402 | post | control | M |
| OFT1_6 | 10.727781 | 10.733333 | 1.647000 | pre | stressor | F |
| OFT1_7 | 11.464600 | 7.033333 | 1.098946 | pre | stressor | M |
| OFT2_10 | 5.981597 | 1.800000 | 0.467466 | post | stressor | F |
| OFT2_12 | 3.890576 | 1.333333 | 0.088791 | post | stressor | F |
mean_speed_corners
| handle | tl | tr | br | bl | tag_timepoint | tag_treatment | tag_sex |
|---|---|---|---|---|---|---|---|
| OFT2_11 | 0.042689 | 0.061581 | 0.046867 | 0.015505 | post | control | M |
| OFT1_6 | 0.079932 | 0.085120 | 0.055881 | 0.062372 | pre | stressor | F |
| OFT1_7 | 0.082124 | 0.079620 | 0.072632 | 0.073711 | pre | stressor | M |
| OFT2_10 | 0.036688 | 0.030134 | 0.065599 | 0.073706 | post | stressor | F |
| OFT2_12 | 0.017380 | 0.059851 | 0.025541 | 0.044923 | post | stressor | F |
mean_speed_bodycentre_by_kmeans_25
| handle | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | tag_timepoint | tag_treatment | tag_sex |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OFT2_11 | 0.025574 | 0.101113 | 0.099135 | 0.057878 | 0.143741 | 0.069975 | 0.081711 | 0.089199 | 0.112823 | 0.136072 | post | control | M |
| OFT1_6 | 0.136441 | 0.071986 | 0.147371 | 0.109992 | 0.130053 | 0.143835 | 0.090680 | 0.126470 | 0.149374 | 0.053129 | pre | stressor | F |
| OFT1_7 | 0.131161 | 0.117809 | 0.113949 | 0.135518 | 0.169374 | 0.166234 | 0.083589 | 0.061509 | 0.133972 | 0.117163 | pre | stressor | M |
| OFT2_10 | 0.029245 | 0.107414 | 0.172519 | 0.070635 | 0.151966 | 0.082346 | 0.052644 | 0.120902 | 0.156323 | 0.209330 | post | stressor | F |
| OFT2_12 | 0.020928 | 0.050408 | 0.029152 | 0.067833 | 0.135331 | 0.032628 | 0.028133 | 0.008224 | 0.074553 | 0.041765 | post | stressor | F |
Visualise¶
The sns* methods on SummaryCollection wrap seaborn categorical plots
with sensible defaults — auto titles, y-labels, filenames, and colour
palettes.
All sns* helpers return (fig, ax, tidy_df) to support both quick plotting
and explicit downstream checks/customization.
In practice:
- pass a stored metric name ("total_distance_bodycentre") for reuse
- or pass a live SummaryResult for one-off plotting.
Plot types compared (ungrouped)¶
Three views of the same metric (time_in_cluster) to compare what each
plot type looks like.
Also available: snsbox, snsviolin, snspoint, snsswarm.
sc.each.time_in_state("kmeans_25").store("time_in_cluster")
fig, ax, df_strip = sc.snsstrip(
    "time_in_cluster",
    random_state=42,  # optional, for point jitter
    show=True,
    savedir=OUT_DIR,
)
fig, ax, df_bar = sc.snsbar(
    "time_in_cluster",
    show=True,
    savedir=OUT_DIR,
)
fig, ax, df_super = sc.snssuperplot(
    "time_in_cluster",
    random_state=42,  # optional, for point jitter
    show=True,
    savedir=OUT_DIR,
)
Single Summary delegation¶
Individual Summary objects can call the same sns* methods.
They delegate to a 1-item SummaryCollection internally.
The auto filename is prefixed with the recording handle.
single = sc[list(sc.keys())[0]]
fig, ax, df_single = single.snsbar(
    single.time_in_state("within_boundary_static_bodycentre_in_centre"),
    show=True,
    savedir=OUT_DIR,
)
Grouped plots¶
Group by experimental tags with groupby() to compare conditions directly.
Use group_order to control x-axis arrangement.
groupby(...) returns a grouped SummaryCollection with the same plotting API.
sc_grouped = sc.groupby(tags=["treatment", "timepoint"])
# Keys = tag names (must match groupby tags), values = desired display order
GROUP_ORDER = {"treatment": ["control", "stressor"], "timepoint": ["pre", "post"]}
# Scalar metric — grouped superplot
fig, ax, df_gsup = sc_grouped.snssuperplot(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    random_state=42,  # optional, for point jitter
    show=True,
    savedir=str(OUT_DIR),
)
# Multi-component metric — 25 clusters × 4 groups
fig, ax, df_gbar = sc_grouped.snsbar(
    sc_grouped.each.time_in_state("kmeans_25"),
    group_order=GROUP_ORDER,
    show=True,
    savedir=str(OUT_DIR),
)
Even though the summary metric 'time_in_cluster' was stored before grouping, grouped plots work with it as expected (the auto-generated title differs because we stored it under a manual name).
# Multi-component metric — 25 clusters × 4 groups
fig, ax, df_gbar = sc_grouped.snsbar(
    "time_in_cluster",
    group_order=GROUP_ORDER,
    show=True,
    savedir=str(OUT_DIR),
)
sort_by — independent spatial ordering¶
sort_by overrides the spatial arrangement on the x-axis without changing
colour assignment. Here groupby(tags=["treatment", "timepoint"]) means
treatment drives the base colour (control=blue, stressor=orange). Adding
sort_by="timepoint" interleaves control/stressor within each timepoint.
# Interleaved superplot — timepoint as primary spatial axis, colours by treatment
fig, ax, df_interleaved = sc_grouped.snssuperplot(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    sort_by="timepoint",
    random_state=42,  # optional, for point jitter
    show=True,
    savedir=str(OUT_DIR),
    filename="total_distance_interleaved_superplot.png",
)
# Power-user workflow with prepare_plot — full seaborn control
import seaborn as sns
spec = sc_grouped.prepare_plot(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    sort_by=["timepoint", "treatment"],
)
sns.boxplot(**spec.sns_kwargs, width=0.6)
spec.ax.set_ylabel(spec.ylabel)
spec.ax.set_title("Custom: prepare_plot + boxplot")
import matplotlib.pyplot as plt
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
Statistical annotations¶
Use annotate="help" to discover available tests, corrections, and the
group labels in your data. Then pass annotate={...} with actual pairs.
# Discover labels and options (no annotation applied, just prints a guide)
fig_ann, ax_ann, df_ann = sc_grouped.snssuperplot(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    annotate="help",
    random_state=42,  # optional, for point jitter
    show=False,
)
=== Statistical Annotation Guide ===
annotate={
    "pairs": [("groupA", "groupB"), ...], # REQUIRED
    "test": "Mann-Whitney", # see below
    "correction": None, # see below
    "text_format": "star", # "star", "simple", "full"
    "headroom": None, # float multiplier, see below
}
Available tests:
Parametric: t-test_ind, t-test_welch, t-test_paired
Non-parametric: Mann-Whitney, Wilcoxon, Kruskal, Brunner-Munzel
Other: Levene (variance equality)
Tip: Mann-Whitney is a safe default for most behavioural data.
Use paired tests (t-test_paired, Wilcoxon) for repeated measures.
Use parametric tests only if data is normally distributed.
Multiple comparisons correction (recommended for >3 pairs):
FWER (conservative): bonferroni, holm
FDR (less conservative): fdr_bh (Benjamini-Hochberg), fdr_by
Headroom:
Extra vertical space for brackets, as a fraction of the y range.
E.g. headroom=0.3 adds 30% extra room above the data.
Your labels: ['control, post', 'control, pre', 'stressor, post', 'stressor, pre']
# Apply annotations
fig_ann, ax_ann, df_ann = sc_grouped.snsbox(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    annotate={
        "pairs": [("control, pre", "stressor, pre"), ("control, post", "stressor, post")],
        "test": "Mann-Whitney",
        "correction": None,
        "text_format": "star",
        "headroom": 0.0,  # add extra space for annotations if needed
    },
    savedir=str(OUT_DIR),
    filename="total_distance_annotated_superplot.png",
    show=True,
)
p-value annotation legend:
ns: 5.00e-02 < p <= 1.00e+00
*: 1.00e-02 < p <= 5.00e-02
**: 1.00e-03 < p <= 1.00e-02
***: 1.00e-04 < p <= 1.00e-03
****: p <= 1.00e-04
control, pre vs. stressor, pre: Mann-Whitney-Wilcoxon test two-sided, P_val:5.053e-01 U_stat=1.130e+02
control, post vs. stressor, post: Mann-Whitney-Wilcoxon test two-sided, P_val:1.926e-03 U_stat=1.660e+02
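Under the hood the annotation is a standard two-sided Mann-Whitney U test; the same comparison can be reproduced directly with scipy (toy numbers here, not the notebook's data):

```python
from scipy.stats import mannwhitneyu

# Hypothetical per-animal total distances for two groups.
control = [5.7, 6.0, 3.9, 4.4, 5.1]
stressor = [10.7, 11.5, 9.8, 12.0, 8.9]

# Two-sided Mann-Whitney U test, as used by the "Mann-Whitney" annotation.
stat, p = mannwhitneyu(control, stressor, alternative="two-sided")
```

With small samples and no ties, scipy computes the exact p-value; the star labels in the plot are just thresholded p-values, per the legend printed above.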
Metric input options¶
Two ways to pass a metric to any sns* method:
- String key — a previously stored metric name
- SummaryResult object — inline computation (not stored)
Both options can represent single- or multi-component metrics.
# 1. String key
fig, ax, _ = sc.snsstrip(
    "total_distance_bodycentre",
    random_state=42,  # optional, for point jitter
    show=False,
)
# 2. SummaryResult object (inline)
fig, ax, df_mc = sc.snsbar(
    sc.each.time_in_state("within_boundary_static_bodycentre_in_centre"),
    show=False,
)
Multi-metric plotting¶
sns* methods can accept multiple metrics via list input, or alias maps via dict input.
merge_by controls how metrics are combined (default: "metric").
When plotting multiple metrics together, they must share a common y-axis label.
# Ungrouped multi-metric demo combining two by_state metrics with the same y-axis
# (mean speed of bodycentre):
# - corners
# - kmeans clusters with explicit all_states=[0..9]
fig, ax, df_multi_flat = sc.snsbar(
    {
        "corners": "mean_speed_corners",
        "kmeans_0_to_9": "mean_speed_bodycentre_by_kmeans_25",
    },
    show=True,
    savedir=OUT_DIR,
    filename="demo_multi_metric_by_state_speed_barplot.png",
)
# Grouped multi-metric demo
fig, ax, df_multi_grouped = sc_grouped.snsbar(
    ["time_in_centre", "time_in_cluster"],
    merge_by=None,
    group_order=GROUP_ORDER,
    show=True,
    savedir=OUT_DIR,
    filename="demo_multi_metric_grouped_barplot.png",
)
Behaviour Flow Analysis (BFA)¶
Compute BFA results and statistics¶
bfa() returns a nested dict of observed/shuffled transition statistics,
and bfa_stats() derives effect-size-style summaries for reporting.
all_states=np.arange(0, N_CLUSTERS) makes the state space explicit.
bfa_results = sc_grouped.bfa(
    column="kmeans_25",
    all_states=np.arange(0, N_CLUSTERS),
    random_state=42,
)
bfa_stats = p3b.SummaryCollection.bfa_stats(bfa_results)
with open(f"{OUT_DIR}/bfa_results.json", "w") as f:
    json.dump(bfa_results, f, indent=4)
with open(f"{OUT_DIR}/bfa_stats.json", "w") as f:
    json.dump(bfa_stats, f, indent=4)
BFA histograms¶
Distribution of shuffled transition values vs observed, per group comparison. Useful as a quick sanity check before interpreting chord/UMAP views.
p3b.SummaryCollection.plot_bfa_results(
    bfa_results,
    add_stats=True,
    stats=bfa_stats,
    bins=20,
    figsize=(4, 3),
    save_dir=OUT_DIR,
    show=True,
)
{"('control', 'post')_vs_('stressor', 'pre')": (<Figure size 400x300 with 1 Axes>,
<Axes: title={'center': "('control', 'post')_vs_('stressor', 'pre')"}, xlabel='distance', ylabel='count'>),
"('control', 'post')_vs_('stressor', 'post')": (<Figure size 400x300 with 1 Axes>,
<Axes: title={'center': "('control', 'post')_vs_('stressor', 'post')"}, xlabel='distance', ylabel='count'>),
"('control', 'post')_vs_('control', 'pre')": (<Figure size 400x300 with 1 Axes>,
<Axes: title={'center': "('control', 'post')_vs_('control', 'pre')"}, xlabel='distance', ylabel='count'>),
"('stressor', 'pre')_vs_('stressor', 'post')": (<Figure size 400x300 with 1 Axes>,
<Axes: title={'center': "('stressor', 'pre')_vs_('stressor', 'post')"}, xlabel='distance', ylabel='count'>),
"('stressor', 'pre')_vs_('control', 'pre')": (<Figure size 400x300 with 1 Axes>,
<Axes: title={'center': "('stressor', 'pre')_vs_('control', 'pre')"}, xlabel='distance', ylabel='count'>),
"('stressor', 'post')_vs_('control', 'pre')": (<Figure size 400x300 with 1 Axes>,
<Axes: title={'center': "('stressor', 'post')_vs_('control', 'pre')"}, xlabel='distance', ylabel='count'>)}
Chord diagrams¶
Requires pycirclize — install with pip install py3r-behaviour[viz].
if not SKIP_HEAVY_VIZ:
    sc_grouped.plot_chord(
        column="kmeans_25",
        all_states=np.arange(0, N_CLUSTERS),
        save_dir=OUT_DIR,
        show=True,
        start=-265,
        end=95,
        space=5,
        r_lim=(93, 100),
        label_kws=dict(r=94, size=12, color="white"),
        link_kws=dict(ec="black", lw=0.5),
    )
UMAP embedding of transition matrices¶
Requires umap-learn — install with pip install py3r-behaviour[viz].
if not SKIP_HEAVY_VIZ:
    fig, ax = sc_grouped.plot_transition_umap(
        column="kmeans_25",
        all_states=np.arange(0, N_CLUSTERS),
        n_neighbors=15,
        min_dist=0.1,
        random_state=42,
        figsize=(6, 5),
        show=True,
        save_dir=str(OUT_DIR),
    )