Run this example yourself
Download the oft_pipeline example — unzip and open oft_pipeline.ipynb in Jupyter.
Open Field Test (OFT) — Full analysis pipeline example¶
Setup¶
import json
import os
from pathlib import Path
import numpy as np
import pandas as pd
import py3r.behaviour as p3b
try:
    from IPython.display import display
except ImportError:
    def display(x):
        print(x)
# Skip heavy visualisation deps (pycirclize, umap-learn) in CI
SKIP_HEAVY_VIZ = os.environ.get("CI", "").lower() in ("true", "1", "yes")
# Paths
DATA_DIR = Path("data/tracking")
TAGS_CSV = Path("data/tags.csv")
OUT_DIR = Path(os.environ.get("NB_OUT_DIR", Path.cwd() / "_artifacts"))
OUT_DIR.mkdir(parents=True, exist_ok=True)
# Constants
FPS = 30
N_CLUSTERS = 25
Load & Preprocess¶
Load tracking data¶
Load a TrackingCollection from a folder of DeepLabCut CSV files.
Each CSV becomes one Tracking object keyed by its filename stem.
The provided fps is written into each leaf's metadata for downstream methods.
Return type here is TrackingCollection.
Alternative loaders with the same pattern: from_yolo3r_folder, from_dlcma_folder.
tc = p3b.TrackingCollection.from_dlc_folder(
    folder_path=DATA_DIR,
    fps=FPS,
)
print(tc)
# Main object types in `py3r.behaviour` implement `.copy()`.
# We'll keep an untouched copy for didactic examples in this notebook.
tc_raw_for_demo = tc.copy()
<TrackingCollection with 56 Tracking objects>
All Collection objects, such as TrackingCollection, implement stored_info(),
which gives a quick overview of their accessible contents.
tc.stored_info()
| point_name | attached_to | missing_from | dims |
|---|---|---|---|
| bcl | 56 | 0 | [x, y] |
| bcr | 56 | 0 | [x, y] |
| bl | 56 | 0 | [x, y] |
| bodycentre | 56 | 0 | [x, y] |
| br | 56 | 0 | [x, y] |
| earl | 56 | 0 | [x, y] |
| earr | 56 | 0 | [x, y] |
| headcentre | 56 | 0 | [x, y] |
| hipl | 56 | 0 | [x, y] |
| hipr | 56 | 0 | [x, y] |
| neck | 56 | 0 | [x, y] |
| nose | 56 | 0 | [x, y] |
| tailbase | 56 | 0 | [x, y] |
| tailcentre | 56 | 0 | [x, y] |
| tailtip | 56 | 0 | [x, y] |
| tl | 56 | 0 | [x, y] |
| tr | 56 | 0 | [x, y] |
Add experimental tags¶
A tags CSV maps recording handles to experimental metadata.
It must contain a handle column matching filename stems;
every other column becomes a tag key–value pair.
tags_info() is a quick schema check: coverage and cardinality per tag.
add_tags_from_csv(...) mutates each Tracking in-place and returns None.
tc.add_tags_from_csv(csv_path=TAGS_CSV)
tc.tags_info()
added 168 tags to 56 elements in collection.
| tag | attached_to | missing_from | unique_values |
|---|---|---|---|
| sex | 56 | 0 | 2 |
| timepoint | 56 | 0 | 2 |
| treatment | 56 | 0 | 2 |
Didactic: batch processing¶
With a TrackingCollection, .each delegates calls to each Tracking.
Think "batch call the same Tracking method for all recordings".
This .each batch processing pattern also applies to FeaturesCollection
and SummaryCollection, as we will see later.
Methods on Tracking are inplace=True by default, so .each returns a
BatchResult. If inplace=False, .each returns a TrackingCollection.
Passing a BatchResult back into .each maps values by handle.
demo_inplace = tc_raw_for_demo.copy().each.filter_likelihood(threshold=0.9)
demo_new_collection = tc_raw_for_demo.copy().each.filter_likelihood(
    threshold=0.9,
    inplace=False,
)
print(type(demo_inplace).__name__) # expected: BatchResult
print(type(demo_new_collection).__name__) # expected: TrackingCollection
BatchResult
TrackingCollection
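The per-handle mapping that .each relies on can be sketched with plain dicts (toy handles and values, not the library's internals):

```python
# A BatchResult behaves like a mapping {handle: value}; passing one back
# into a batch call applies each value to the recording with the same handle.
batch_result = {"rec_a": 1.0, "rec_b": 2.0}   # hypothetical per-recording values
collection = {"rec_a": 10.0, "rec_b": 20.0}   # hypothetical per-recording data

# Values are aligned by handle, never by position.
mapped = {handle: collection[handle] + batch_result[handle] for handle in collection}
# mapped == {"rec_a": 11.0, "rec_b": 22.0}
```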
Preprocess¶
Standard preprocessing chain: remove low-confidence detections, interpolate short gaps, smooth trajectories, and rescale coordinates to real-world units. This order is intentional: filter -> interpolate -> smooth -> rescale.
In this main path we use in-place behaviour (typical analysis workflow). Equivalent non-in-place variants are shown above in the didactic batch section.
tc.each.filter_likelihood(threshold=0.9)
tc.each.interpolate(limit=5)
tc.each.smooth_all(window=3, method="mean")
tc.each.rescale_by_known_distance(
    point1="tl",
    point2="br",
    distance_in_metres=0.64,
)
BatchResult: 56 items processed (in-place)
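The same filter → interpolate → smooth → rescale chain can be sketched in plain pandas on a toy single-point trajectory (this is an illustrative analogue, not the library's implementation; the column names and numbers are made up):

```python
import numpy as np
import pandas as pd

# Toy trajectory for one keypoint, with a DLC-style likelihood column.
df = pd.DataFrame({
    "x": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "likelihood": [0.95, 0.4, 0.92, 0.99, 0.3, 0.97],
})

# 1) Filter: mask coordinates where detection confidence is low.
low = df["likelihood"] < 0.9
df.loc[low, ["x", "y"]] = np.nan

# 2) Interpolate: fill gaps, but only up to `limit` consecutive frames.
df[["x", "y"]] = df[["x", "y"]].interpolate(limit=5, limit_area="inside")

# 3) Smooth: centred rolling mean over a 3-frame window.
df[["x", "y"]] = df[["x", "y"]].rolling(window=3, center=True, min_periods=1).mean()

# 4) Rescale: convert pixels to metres via a known real-world distance.
pixel_dist = 100.0   # measured pixel distance between two arena landmarks
metres = 0.64        # known physical distance between them
df[["x", "y"]] *= metres / pixel_dist
```

Running the steps in this order matters: interpolating before filtering would fill gaps from low-confidence points, and smoothing before interpolating would smear NaNs into neighbouring frames.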
Re-running preprocessing¶
Most preprocessing methods guard against re-application. For parameter tuning,
prefer inplace=False and work on a copy.
try:
    tc.each.interpolate(limit=5)
except Exception as e:
    print(e)
Error in collection 'None', object 'OFT2_11', method 'interpolate': data already interpolated. re-load the raw data to interpolate again
Quality check — trajectory plots¶
Save trajectory plots for every recording and display one inline for QC.
Pattern used here:
- batch save all (tc.each.plot(..., savedir=...))
- inspect one representative recording inline (tc[0].plot(...))
trajectories = ["bodycentre"]
static = ["tl", "tr", "bl", "br"]
lines = [("tr", "tl"), ("tl", "bl"), ("bl", "br"), ("br", "tr")]
tc.each.plot(trajectories=trajectories, static=static, lines=lines, show=False, savedir=OUT_DIR)
# Single inline plot for visual QC
tc[0].plot(trajectories=trajectories, static=static, lines=lines, show=True)
(<Figure size 500x500 with 1 Axes>,
<Axes: title={'center': 'OFT2_11'}, xlabel='x', ylabel='y'>)
Compute Features¶
Create FeaturesCollection¶
A FeaturesCollection wraps every recording's tracking data with
methods for computing time-series features.
Most feature methods return FeaturesResult; call .store() to persist
to Features.data and register metadata in Features.meta.
fc = p3b.FeaturesCollection.from_tracking_collection(tc)
Spatial features — boundaries¶
Define/store named boundaries on each Features leaf, then use either:
- mapped BatchResult boundary objects (smart per-handle passthrough), or
- boundary names (resolved from stored per-recording assets).
Here we use both and assert they match.
ordered_oft_corners = ["tl", "tr", "br", "bl"]
Define and store a centre boundary for each recording.
centre_boundary = fc.each.define_static_boundary(
    ordered_oft_corners,
    scale_dim1=0.5,
    scale_dim2=0.5,
    name="centre",
)
Compare boundary usage styles: pass boundary objects vs stored boundary names.
in_centre = fc.each.within_boundary(point="bodycentre", boundary=centre_boundary)
in_centre_by_name = fc.each.within_boundary(point="bodycentre", boundary="centre")
for handle in fc.keys():
    assert in_centre[handle].equals(in_centre_by_name[handle])
Store the result. Without a manual name, an automatic descriptive name is used;
.store() always returns the stored name.
in_centre.store()
'within_boundary_static_bodycentre_in_centre'
BatchResult supports logical composition (for example, arena periphery).
_ = fc.each.define_static_boundary(
    ordered_oft_corners,
    scale_dim1=0.8,
    scale_dim2=0.8,
    name="not_periphery",
)
_ = fc.each.define_static_boundary(
    ordered_oft_corners,
    name="oft",
)
(
    fc.each.within_boundary("bodycentre", "oft")
    & (~fc.each.within_boundary("bodycentre", "not_periphery"))
).store("in_periphery")
'in_periphery'
Corner occupancy can be represented as a single state feature instead of many independent booleans.
in_corners = dict()
for c in ordered_oft_corners:
    _ = fc.each.define_static_boundary(
        ordered_oft_corners,
        scale_dim1=0.2,
        scale_dim2=0.2,
        name=f"{c}_corner",
        anchor=c,
    )
    in_corners[c] = fc.each.within_boundary("bodycentre", boundary=f"{c}_corner")
# Store a convenience boolean for "in any corner".
(in_corners["tl"] | in_corners["tr"] | in_corners["bl"] | in_corners["br"]).store("in_corner")
'in_corner'
# Store a categorical corner-state feature for state-based analyses.
fc.each.compose_state_from_booleans(in_corners).store("corner_state")
'corner_state'
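The core idea of composing a categorical state from boolean indicators can be sketched in plain pandas (toy data, assuming the indicators are mutually exclusive per frame, as the corner zones are here):

```python
import pandas as pd

# Per-frame boolean indicators, one per zone (frames 2-3 are in no zone).
booleans = {
    "tl": pd.Series([True, False, False, False]),
    "tr": pd.Series([False, True, False, False]),
}

# One categorical state column: the name of whichever indicator is True
# on each frame, masked to missing when none of them is.
frame = pd.DataFrame(booleans)
state = frame.idxmax(axis=1).where(frame.any(axis=1))
# state: ["tl", "tr", NaN, NaN]
```

A single state column like this is what makes downstream state-based tools (by_state summaries, transition analyses) straightforward, compared with juggling many independent booleans.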
# Keep these existing columns out of clustering feature selection.
non_bfa_feats = fc[0].data.columns
BatchResult also supports element-wise arithmetic across handles.
dist_change = fc.each.distance_change("bodycentre")
dist_change_in_centre = in_centre.astype("Int64") * dist_change
dist_change_in_centre.store(name="dist_change_bodycentre_in_centre")
'dist_change_bodycentre_in_centre'
# `BatchResult` also supports general binary operations.
fast_outside_centre = ~in_centre & ((fc.each.speed("bodycentre") * 100) > 10.0)
# This is an example only; we do not store it.
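The Int64 multiplication trick above is worth spelling out in plain pandas: casting a nullable boolean mask to "Int64" and multiplying zeroes out values outside the mask while propagating missingness (toy numbers, not the notebook's data):

```python
import pandas as pd

in_zone = pd.Series([True, False, pd.NA], dtype="boolean")  # nullable boolean mask
dist_change = pd.Series([0.1, 0.2, 0.3])

# True -> 1, False -> 0, <NA> stays <NA>; the product keeps the value
# inside the zone, zeroes it outside, and stays missing where the mask is.
masked = in_zone.astype("Int64") * dist_change
# masked: [0.1, 0.0, <NA>]  (nullable Float64)
```

This also explains why the stored dist_change_bodycentre_in_centre feature shows up later with the nullable Float64 dtype rather than plain float64.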
Kinematic features for BFA¶
Speeds, angle deviations, inter-keypoint distances, body-part areas, and distance to the arena boundary — the standard feature set for behavioural flow analysis clustering. The loop pattern below intentionally stores each feature as a named column, so later clustering/summary code can reference columns deterministically.
Specific choices used here:
- For kinematic polygons, we define named dynamic boundaries, then compute
dynamic area (median=False) over their ordered points.
- For arena distance, we define named static boundaries and loop over points.
# Speeds
for pt in ["nose", "neck", "earr", "earl", "bodycentre", "hipl", "hipr", "tailbase"]:
    fc.each.speed(pt).store()
Compute angular features.
# Angle deviations
for basepoint, pointdirection1, pointdirection2 in [
    ("tailbase", "hipr", "hipl"),
    ("bodycentre", "tailbase", "neck"),
    ("neck", "bodycentre", "headcentre"),
    ("headcentre", "earr", "earl"),
]:
    fc.each.azimuth_deviation(basepoint, pointdirection1, pointdirection2).store()
Compute inter-keypoint distances.
# Inter-keypoint distances
for p1, p2 in [
    ("nose", "headcentre"),
    ("neck", "headcentre"),
    ("neck", "bodycentre"),
    ("bcr", "bodycentre"),
    ("bcl", "bodycentre"),
    ("tailbase", "bodycentre"),
    ("tailbase", "hipr"),
    ("tailbase", "hipl"),
    ("bcr", "hipr"),
    ("bcl", "hipl"),
    ("bcl", "earl"),
    ("bcr", "earr"),
    ("nose", "earr"),
    ("nose", "earl"),
]:
    fc.each.distance_between(p1, p2).store()
Define dynamic body boundaries and store per-boundary area features.
DYNAMIC_BODY_BOUNDARIES = [
    ("mouse_rear", ["tailbase", "hipr", "hipl"]),
    ("mouse_mid", ["hipr", "hipl", "bcl", "bcr"]),
    ("mouse_front", ["bcr", "earr", "earl", "bcl"]),
    ("mouse_face", ["earr", "nose", "earl"]),
]
for boundary_name, boundary_points in DYNAMIC_BODY_BOUNDARIES:
    fc.each.define_dynamic_boundary(boundary_points, name=boundary_name)
    fc.each.area_of_boundary(boundary_name).store()
Compute distance-to-boundary features for selected points.
STATIC_DISTANCE_TO_BOUNDARY_POINTS = ["nose", "neck", "bodycentre", "tailbase"]
for pt in STATIC_DISTANCE_TO_BOUNDARY_POINTS:
    fc.each.distance_to_boundary(pt, "oft").store()
Inspect stored boundary assets on one recording.
fc[0].list_boundaries()
| name | kind | n_points | has_vertices |
|---|---|---|---|
| centre | static | 4 | True |
| not_periphery | static | 4 | True |
| oft | static | 4 | True |
| tl_corner | static | 4 | True |
| tr_corner | static | 4 | True |
| br_corner | static | 4 | True |
| bl_corner | static | 4 | True |
| mouse_rear | dynamic | 3 | False |
| mouse_mid | dynamic | 4 | False |
| mouse_front | dynamic | 4 | False |
| mouse_face | dynamic | 3 | False |
K-means clustering¶
Embed the feature time-series with temporal offsets, then cluster
the embedded space with k-means.
cluster_embedding_stream returns (cluster_labels, centroids, scaling_factors), where:
- cluster_labels is a per-handle BatchResult of label series
- centroids is a DataFrame with n_clusters rows
- scaling_factors is not needed here, so we discard it below
Option notes:
- offset controls the temporal context window.
- cluster_embedding_stream also supports weighting/normalisation knobs for advanced runs.
cluster_features = list(set(fc[0].data.columns) - set(non_bfa_feats))
offset = list(np.arange(-15, 16, 1))
embedding_dict = {f: offset for f in cluster_features}
cluster_labels, centroids, _ = fc.cluster_embedding_stream(
    embedding_dict=embedding_dict, n_clusters=N_CLUSTERS
)
cluster_labels.store("kmeans_25", overwrite=True)
'kmeans_25'
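What "embedding with temporal offsets" means can be sketched with pandas shifts on one toy feature (the column naming is hypothetical; the library builds the equivalent matrix across all selected features before running k-means):

```python
import numpy as np
import pandas as pd

speed = pd.Series(np.arange(10, dtype=float))  # one toy feature over 10 frames
offsets = [-1, 0, 1]                           # context: previous, current, next frame

# Each offset becomes one embedded column: the feature shifted in time,
# so every row is a short temporal snippet centred on that frame.
embedded = pd.DataFrame(
    {f"speed[{o:+d}]": speed.shift(-o) for o in offsets}
).dropna()
# Row for frame 1 is [0.0, 1.0, 2.0]; edge frames without full context drop out.
```

With offsets of -15..15 as in the cell above, each row carries one second of context at 30 fps, which is what lets k-means separate temporally extended behaviours rather than single-frame poses.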
# A quick overview of the stored features
fc.stored_info()
| feature | attached_to | missing_from | type |
|---|---|---|---|
| area_of_boundary_mouse_face_dynamic | 56 | 0 | float64 |
| area_of_boundary_mouse_front_dynamic | 56 | 0 | float64 |
| area_of_boundary_mouse_mid_dynamic | 56 | 0 | float64 |
| area_of_boundary_mouse_rear_dynamic | 56 | 0 | float64 |
| azimuth_deviation_bodycentre_to_tailbase_and_neck | 56 | 0 | float64 |
| azimuth_deviation_headcentre_to_earr_and_earl | 56 | 0 | float64 |
| azimuth_deviation_neck_to_bodycentre_and_headcentre | 56 | 0 | float64 |
| azimuth_deviation_tailbase_to_hipr_and_hipl | 56 | 0 | float64 |
| corner_state | 56 | 0 | object |
| dist_change_bodycentre_in_centre | 56 | 0 | Float64 |
| distance_between_bcl_and_bodycentre_in_xy | 56 | 0 | float64 |
| distance_between_bcl_and_earl_in_xy | 56 | 0 | float64 |
| distance_between_bcl_and_hipl_in_xy | 56 | 0 | float64 |
| distance_between_bcr_and_bodycentre_in_xy | 56 | 0 | float64 |
| distance_between_bcr_and_earr_in_xy | 56 | 0 | float64 |
| distance_between_bcr_and_hipr_in_xy | 56 | 0 | float64 |
| distance_between_neck_and_bodycentre_in_xy | 56 | 0 | float64 |
| distance_between_neck_and_headcentre_in_xy | 56 | 0 | float64 |
| distance_between_nose_and_earl_in_xy | 56 | 0 | float64 |
| distance_between_nose_and_earr_in_xy | 56 | 0 | float64 |
| distance_between_nose_and_headcentre_in_xy | 56 | 0 | float64 |
| distance_between_tailbase_and_bodycentre_in_xy | 56 | 0 | float64 |
| distance_between_tailbase_and_hipl_in_xy | 56 | 0 | float64 |
| distance_between_tailbase_and_hipr_in_xy | 56 | 0 | float64 |
| distance_to_boundary_static_bodycentre_in_oft | 56 | 0 | float64 |
| distance_to_boundary_static_neck_in_oft | 56 | 0 | float64 |
| distance_to_boundary_static_nose_in_oft | 56 | 0 | float64 |
| distance_to_boundary_static_tailbase_in_oft | 56 | 0 | float64 |
| in_corner | 56 | 0 | boolean |
| in_periphery | 56 | 0 | boolean |
| kmeans_25 | 56 | 0 | Int64 |
| speed_of_bodycentre_in_xy | 56 | 0 | float64 |
| speed_of_earl_in_xy | 56 | 0 | float64 |
| speed_of_earr_in_xy | 56 | 0 | float64 |
| speed_of_hipl_in_xy | 56 | 0 | float64 |
| speed_of_hipr_in_xy | 56 | 0 | float64 |
| speed_of_neck_in_xy | 56 | 0 | float64 |
| speed_of_nose_in_xy | 56 | 0 | float64 |
| speed_of_tailbase_in_xy | 56 | 0 | float64 |
| within_boundary_static_bodycentre_in_centre | 56 | 0 | boolean |
Save features to disk¶
save() writes a collection manifest plus per-handle element folders.
This makes downstream loading deterministic and auditable.
Later you can reconstruct with p3b.FeaturesCollection.load(path).
fc.save(f"{OUT_DIR}/features", data_format="csv", overwrite=True)
Summarise¶
Create SummaryCollection¶
Each Summary object holds scalar (or Series) metrics computed from
a single recording's features.
Return type here is SummaryCollection.
sc = p3b.SummaryCollection.from_features_collection(fc)
Compute summary measures¶
Call summary methods and .store() the result to persist it.
Stored summary metrics become scalar columns in each Summary.data record.
Same pattern as features:
- compute result (sc.total_distance(...), etc.)
- then .store(...) to persist by metric name.
sc.each.total_distance("bodycentre").store()
sc.each.time_true("within_boundary_static_bodycentre_in_centre").store("time_in_centre")
sc.each.sum_column("dist_change_bodycentre_in_centre").store(name="distance_moved_in_centre")
# by_state API example: average speed by composed spatial zone.
sc.each.by_state(
    "corner_state",
    all_states=ordered_oft_corners,
).mean_column("speed_of_bodycentre_in_xy").store("mean_speed_corners")
# by_state + all_states API example: force explicit cluster domain (0-9),
# including states absent in a recording.
sc.each.by_state("kmeans_25", all_states=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]).mean_column(
    "speed_of_bodycentre_in_xy"
).store("mean_speed_bodycentre_by_kmeans_25")
'mean_speed_bodycentre_by_kmeans_25'
A quick overview of the stored summaries
sc.stored_info()
| summary | attached_to | missing_from | type |
|---|---|---|---|
| distance_moved_in_centre | 56 | 0 | float64 |
| mean_speed_bodycentre_by_kmeans_25 | 56 | 0 | Series |
| mean_speed_corners | 56 | 0 | Series |
| time_in_centre | 56 | 0 | float64 |
| total_distance_bodycentre | 56 | 0 | float64 |
Export results to CSV¶
to_df(include_tags=True) flattens summary metrics + selected tag columns
into one analysis-ready table (indexed by handle).
By default, series metrics (like time_in_state) are ignored (series="ignore").
With series="separate", each series metric is returned as its own DataFrame over the collection.
summary_df, series_dfs = sc.to_df(include_tags=True, series="separate")
summary_df.to_csv(f"{OUT_DIR}/OFT_results.csv")
display(summary_df.head())
for key, val in series_dfs.items():
    print(key)
    display(val.head())
| handle | total_distance_bodycentre | time_in_centre | distance_moved_in_centre | tag_timepoint | tag_treatment | tag_sex |
|---|---|---|---|---|---|---|
| OFT2_11 | 5.670102 | 4.300000 | 0.616402 | post | control | M |
| OFT1_6 | 10.727781 | 10.733333 | 1.647000 | pre | stressor | F |
| OFT1_7 | 11.464600 | 7.033333 | 1.098946 | pre | stressor | M |
| OFT2_10 | 5.981597 | 1.800000 | 0.467466 | post | stressor | F |
| OFT2_12 | 3.890576 | 1.333333 | 0.088791 | post | stressor | F |
mean_speed_corners
| handle | tl | tr | br | bl | tag_timepoint | tag_treatment | tag_sex |
|---|---|---|---|---|---|---|---|
| OFT2_11 | 0.042689 | 0.061581 | 0.046867 | 0.015505 | post | control | M |
| OFT1_6 | 0.079932 | 0.085120 | 0.055881 | 0.062372 | pre | stressor | F |
| OFT1_7 | 0.082124 | 0.079620 | 0.072632 | 0.073711 | pre | stressor | M |
| OFT2_10 | 0.036688 | 0.030134 | 0.065599 | 0.073706 | post | stressor | F |
| OFT2_12 | 0.017380 | 0.059851 | 0.025541 | 0.044923 | post | stressor | F |
mean_speed_bodycentre_by_kmeans_25
| handle | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | tag_timepoint | tag_treatment | tag_sex |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OFT2_11 | 0.025574 | 0.101113 | 0.099135 | 0.057878 | 0.143741 | 0.069975 | 0.081711 | 0.089199 | 0.112823 | 0.136072 | post | control | M |
| OFT1_6 | 0.136441 | 0.071986 | 0.147371 | 0.109992 | 0.130053 | 0.143835 | 0.090680 | 0.126470 | 0.149374 | 0.053129 | pre | stressor | F |
| OFT1_7 | 0.131161 | 0.117809 | 0.113949 | 0.135518 | 0.169374 | 0.166234 | 0.083589 | 0.061509 | 0.133972 | 0.117163 | pre | stressor | M |
| OFT2_10 | 0.029245 | 0.107414 | 0.172519 | 0.070635 | 0.151966 | 0.082346 | 0.052644 | 0.120902 | 0.156323 | 0.209330 | post | stressor | F |
| OFT2_12 | 0.020928 | 0.050408 | 0.029152 | 0.067833 | 0.135331 | 0.032628 | 0.028133 | 0.008224 | 0.074553 | 0.041765 | post | stressor | F |
Visualise¶
The sns* methods on SummaryCollection wrap seaborn categorical plots
with sensible defaults — auto titles, y-labels, filenames, and colour
palettes.
All sns* helpers return (fig, ax, tidy_df) to support both quick plotting
and explicit downstream checks/customization.
In practice:
- pass a stored metric name ("total_distance_bodycentre") for reuse
- or pass a live SummaryResult for one-off plotting.
Plot types compared (ungrouped)¶
Three views of the same metric (time_in_cluster) to compare what each
plot type looks like.
Also available: snsbox, snsviolin, snspoint, snsswarm.
sc.each.time_in_state("kmeans_25").store("time_in_cluster")
fig, ax, df_strip = sc.snsstrip(
    "time_in_cluster",
    random_state=42,  # optional, for point jitter
    show=True,
    savedir=OUT_DIR,
)
fig, ax, df_bar = sc.snsbar(
    "time_in_cluster",
    show=True,
    savedir=OUT_DIR,
)
fig, ax, df_super = sc.snssuperplot(
    "time_in_cluster",
    random_state=42,  # optional, for point jitter
    show=True,
    savedir=OUT_DIR,
)
Single Summary delegation¶
Individual Summary objects can call the same sns* methods.
They delegate to a 1-item SummaryCollection internally.
The auto filename is prefixed with the recording handle.
single = sc[list(sc.keys())[0]]
fig, ax, df_single = single.snsbar(
    single.time_in_state("within_boundary_static_bodycentre_in_centre"),
    show=True,
    savedir=OUT_DIR,
)
Grouped plots¶
Group by experimental tags with groupby() to compare conditions directly.
Use group_order to control x-axis arrangement.
groupby(...) returns a grouped SummaryCollection with the same plotting API.
sc_grouped = sc.groupby(tags=["treatment", "timepoint"])
# Keys = tag names (must match groupby tags), values = desired display order
GROUP_ORDER = {"treatment": ["control", "stressor"], "timepoint": ["pre", "post"]}
# Scalar metric — grouped superplot
fig, ax, df_gsup = sc_grouped.snssuperplot(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    random_state=42,  # optional, for point jitter
    show=True,
    savedir=str(OUT_DIR),
)
# Multi-component metric — 25 clusters × 4 groups
fig, ax, df_gbar = sc_grouped.snsbar(
    sc_grouped.each.time_in_state("kmeans_25"),
    group_order=GROUP_ORDER,
    show=True,
    savedir=str(OUT_DIR),
)
Even though the summary metric 'time_in_cluster' was stored before grouping, grouped plots work with it as expected (the auto-generated title differs because we stored it under a manual name).
# Multi-component metric — 25 clusters × 4 groups
fig, ax, df_gbar = sc_grouped.snsbar(
    "time_in_cluster",
    group_order=GROUP_ORDER,
    show=True,
    savedir=str(OUT_DIR),
)
sort_by — independent spatial ordering¶
sort_by overrides the spatial arrangement on the x-axis without changing
colour assignment. Here groupby(tags=["treatment", "timepoint"]) means
treatment drives the base colour (control=blue, stressor=orange). Adding
sort_by="timepoint" interleaves control/stressor within each timepoint.
# Interleaved superplot — timepoint as primary spatial axis, colours by treatment
fig, ax, df_interleaved = sc_grouped.snssuperplot(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    sort_by="timepoint",
    random_state=42,  # optional, for point jitter
    show=True,
    savedir=str(OUT_DIR),
    filename="total_distance_interleaved_superplot.png",
)
# Power-user workflow with prepare_plot — full seaborn control
import seaborn as sns
spec = sc_grouped.prepare_plot(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    sort_by=["timepoint", "treatment"],
)
sns.boxplot(**spec.sns_kwargs, width=0.6)
spec.ax.set_ylabel(spec.ylabel)
spec.ax.set_title("Custom: prepare_plot + boxplot")
import matplotlib.pyplot as plt
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
Statistical annotations¶
Use annotate="help" to discover available tests, corrections, and the
group labels in your data. Then pass annotate={...} with actual pairs.
# Discover labels and options (no annotation applied, just prints a guide)
fig_ann, ax_ann, df_ann = sc_grouped.snssuperplot(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    annotate="help",
    random_state=42,  # optional, for point jitter
    show=False,
)
=== Statistical Annotation Guide ===
annotate={
    "pairs": [("groupA", "groupB"), ...], # REQUIRED
    "test": "Mann-Whitney", # see below
    "correction": None, # see below
    "text_format": "star", # "star", "simple", "full"
    "headroom": None, # float multiplier, see below
}
Available tests:
Parametric: t-test_ind, t-test_welch, t-test_paired
Non-parametric: Mann-Whitney, Wilcoxon, Kruskal, Brunner-Munzel
Other: Levene (variance equality)
Tip: Mann-Whitney is a safe default for most behavioural data.
Use paired tests (t-test_paired, Wilcoxon) for repeated measures.
Use parametric tests only if data is normally distributed.
Multiple comparisons correction (recommended for >3 pairs):
FWER (conservative): bonferroni, holm
FDR (less conservative): fdr_bh (Benjamini-Hochberg), fdr_by
Headroom:
Extra vertical space for brackets, as a fraction of the y range.
E.g. headroom=0.3 adds 30% extra room above the data.
Your labels: ['control, post', 'control, pre', 'stressor, post', 'stressor, pre']
# Apply annotations
fig_ann, ax_ann, df_ann = sc_grouped.snsbox(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    annotate={
        "pairs": [("control, pre", "stressor, pre"), ("control, post", "stressor, post")],
        "test": "Mann-Whitney",
        "correction": None,
        "text_format": "star",
        "headroom": 0.0,  # add extra space for annotations if needed
    },
    savedir=str(OUT_DIR),
    filename="total_distance_annotated_superplot.png",
    show=True,
)
p-value annotation legend:
ns: 5.00e-02 < p <= 1.00e+00
*: 1.00e-02 < p <= 5.00e-02
**: 1.00e-03 < p <= 1.00e-02
***: 1.00e-04 < p <= 1.00e-03
****: p <= 1.00e-04
control, pre vs. stressor, pre: Mann-Whitney-Wilcoxon test two-sided, P_val:5.053e-01 U_stat=1.130e+02
control, post vs. stressor, post: Mann-Whitney-Wilcoxon test two-sided, P_val:1.926e-03 U_stat=1.660e+02
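Under the hood the annotation is a standard two-sided Mann-Whitney U test; the same comparison can be reproduced directly with scipy (toy numbers here, not the notebook's data):

```python
from scipy.stats import mannwhitneyu

# Hypothetical per-animal total distances for two groups.
control = [5.7, 6.0, 3.9, 4.4, 5.1]
stressor = [10.7, 11.5, 9.8, 12.0, 8.9]

# Two-sided Mann-Whitney U test, as used by the "Mann-Whitney" annotation.
stat, p = mannwhitneyu(control, stressor, alternative="two-sided")
```

With small samples and no ties, scipy computes the exact p-value; the star labels in the plot are just thresholded p-values, per the legend printed above.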
Metric input options¶
Two ways to pass a metric to any sns* method:
- String key — a previously stored metric name
- SummaryResult object — inline computation (not stored)
Both options can represent single- or multi-component metrics.
# 1. String key
fig, ax, _ = sc.snsstrip(
    "total_distance_bodycentre",
    random_state=42,  # optional, for point jitter
    show=False,
)
# 2. SummaryResult object (inline)
fig, ax, df_mc = sc.snsbar(
    sc.each.time_in_state("within_boundary_static_bodycentre_in_centre"),
    show=False,
)
Multi-metric plotting¶
sns* methods can accept multiple metrics via list input, or alias maps via dict input.
merge_by controls how metrics are combined (default: "metric").
When plotting multiple metrics together, they must share a common y-axis label.
# Ungrouped multi-metric demo combining two by_state metrics with the same y-axis
# (mean speed of bodycentre):
# - corners
# - kmeans clusters with explicit all_states=[0..9]
fig, ax, df_multi_flat = sc.snsbar(
    {
        "corners": "mean_speed_corners",
        "kmeans_0_to_9": "mean_speed_bodycentre_by_kmeans_25",
    },
    show=True,
    savedir=OUT_DIR,
    filename="demo_multi_metric_by_state_speed_barplot.png",
)
# Grouped multi-metric demo
fig, ax, df_multi_grouped = sc_grouped.snsbar(
    ["time_in_centre", "time_in_cluster"],
    merge_by=None,
    group_order=GROUP_ORDER,
    show=True,
    savedir=OUT_DIR,
    filename="demo_multi_metric_grouped_barplot.png",
)
Behaviour Flow Analysis (BFA)¶
Compute BFA results and statistics¶
bfa() returns a nested dict of observed/shuffled transition statistics,
and bfa_stats() derives effect-size-style summaries for reporting.
all_states=np.arange(0, N_CLUSTERS) makes the state space explicit.
bfa_results = sc_grouped.bfa(
    column="kmeans_25",
    all_states=np.arange(0, N_CLUSTERS),
    random_state=42,
)
bfa_stats = p3b.SummaryCollection.bfa_stats(bfa_results)
with open(f"{OUT_DIR}/bfa_results.json", "w") as f:
    json.dump(bfa_results, f, indent=4)
with open(f"{OUT_DIR}/bfa_stats.json", "w") as f:
    json.dump(bfa_stats, f, indent=4)
BFA histograms¶
Distribution of shuffled transition values vs observed, per group comparison. Useful as a quick sanity check before interpreting chord/UMAP views.
p3b.SummaryCollection.plot_bfa_results(
    bfa_results,
    add_stats=True,
    stats=bfa_stats,
    bins=20,
    figsize=(4, 3),
    save_dir=OUT_DIR,
    show=True,
)
{"('control', 'post')_vs_('stressor', 'pre')": (<Figure size 400x300 with 1 Axes>,
<Axes: title={'center': "('control', 'post')_vs_('stressor', 'pre')"}, xlabel='distance', ylabel='count'>),
"('control', 'post')_vs_('stressor', 'post')": (<Figure size 400x300 with 1 Axes>,
<Axes: title={'center': "('control', 'post')_vs_('stressor', 'post')"}, xlabel='distance', ylabel='count'>),
"('control', 'post')_vs_('control', 'pre')": (<Figure size 400x300 with 1 Axes>,
<Axes: title={'center': "('control', 'post')_vs_('control', 'pre')"}, xlabel='distance', ylabel='count'>),
"('stressor', 'pre')_vs_('stressor', 'post')": (<Figure size 400x300 with 1 Axes>,
<Axes: title={'center': "('stressor', 'pre')_vs_('stressor', 'post')"}, xlabel='distance', ylabel='count'>),
"('stressor', 'pre')_vs_('control', 'pre')": (<Figure size 400x300 with 1 Axes>,
<Axes: title={'center': "('stressor', 'pre')_vs_('control', 'pre')"}, xlabel='distance', ylabel='count'>),
"('stressor', 'post')_vs_('control', 'pre')": (<Figure size 400x300 with 1 Axes>,
<Axes: title={'center': "('stressor', 'post')_vs_('control', 'pre')"}, xlabel='distance', ylabel='count'>)}
Chord diagrams¶
Requires pycirclize — install with pip install py3r-behaviour[viz].
if not SKIP_HEAVY_VIZ:
    sc_grouped.plot_chord(
        column="kmeans_25",
        all_states=np.arange(0, N_CLUSTERS),
        save_dir=OUT_DIR,
        show=True,
        start=-265,
        end=95,
        space=5,
        r_lim=(93, 100),
        label_kws=dict(r=94, size=12, color="white"),
        link_kws=dict(ec="black", lw=0.5),
    )
UMAP embedding of transition matrices¶
Requires umap-learn — install with pip install py3r-behaviour[viz].
if not SKIP_HEAVY_VIZ:
    fig, ax = sc_grouped.plot_transition_umap(
        column="kmeans_25",
        all_states=np.arange(0, N_CLUSTERS),
        n_neighbors=15,
        min_dist=0.1,
        random_state=42,
        figsize=(6, 5),
        show=True,
        save_dir=str(OUT_DIR),
    )