
Run this example yourself

Download the oft_pipeline example — unzip and open oft_pipeline.ipynb in Jupyter.

Open Field Test (OFT) — Full analysis pipeline example

Setup

import json
import os
from pathlib import Path

import numpy as np
import pandas as pd

import py3r.behaviour as p3b

try:
    from IPython.display import display
except ImportError:

    def display(x):
        print(x)


# Skip heavy visualisation deps (pycirclize, umap-learn) in CI
SKIP_HEAVY_VIZ = os.environ.get("CI", "").lower() in ("true", "1", "yes")

# Paths
DATA_DIR = Path("data/tracking")
TAGS_CSV = Path("data/tags.csv")

OUT_DIR = Path(os.environ.get("NB_OUT_DIR", Path.cwd() / "_artifacts"))
OUT_DIR.mkdir(parents=True, exist_ok=True)

# Constants
FPS = 30
N_CLUSTERS = 25

Load & Preprocess

Load tracking data

Load a TrackingCollection from a folder of DeepLabCut CSV files. Each CSV becomes one Tracking object keyed by its filename stem. The provided fps is written into each leaf's metadata for downstream methods. Return type here is TrackingCollection. Alternative loaders with the same pattern: from_yolo3r_folder, from_dlcma_folder.

tc = p3b.TrackingCollection.from_dlc_folder(
    folder_path=DATA_DIR,
    fps=FPS,
)
print(tc)
# Main object types in `py3r.behaviour` implement `.copy()`.
# We'll keep an untouched copy for didactic examples in this notebook.
tc_raw_for_demo = tc.copy()
<TrackingCollection with 56 Tracking objects>

All Collection objects, like TrackingCollection, implement stored_info() to give a quick overview of their accessible contents.

tc.stored_info()
            attached_to  missing_from    dims
point_name
bcl                  56             0  [x, y]
bcr                  56             0  [x, y]
bl                   56             0  [x, y]
bodycentre           56             0  [x, y]
br                   56             0  [x, y]
earl                 56             0  [x, y]
earr                 56             0  [x, y]
headcentre           56             0  [x, y]
hipl                 56             0  [x, y]
hipr                 56             0  [x, y]
neck                 56             0  [x, y]
nose                 56             0  [x, y]
tailbase             56             0  [x, y]
tailcentre           56             0  [x, y]
tailtip              56             0  [x, y]
tl                   56             0  [x, y]
tr                   56             0  [x, y]

Add experimental tags

A tags CSV maps recording handles to experimental metadata. It must contain a handle column matching filename stems; every other column becomes a tag key–value pair. tags_info() is a quick schema check: coverage and cardinality per tag. add_tags_from_csv(...) mutates each Tracking in-place and returns None.
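The expected file shape can be sketched with plain pandas (a toy example with hypothetical handles, not from this dataset): one handle column, and each remaining column becomes a tag.

```python
import io

import pandas as pd

# Toy tags CSV (hypothetical handles): a `handle` column matching
# filename stems, plus one column per tag.
toy_tags_csv = io.StringIO(
    "handle,sex,timepoint,treatment\n"
    "OFT1_1,M,pre,control\n"
    "OFT1_2,F,pre,stressor\n"
)
toy_tags = pd.read_csv(toy_tags_csv)

# Every non-handle column becomes a tag key-value pair per recording.
tags_for = {
    row["handle"]: row.drop("handle").to_dict()
    for _, row in toy_tags.iterrows()
}
print(tags_for["OFT1_2"])  # {'sex': 'F', 'timepoint': 'pre', 'treatment': 'stressor'}
```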

tc.add_tags_from_csv(csv_path=TAGS_CSV)
tc.tags_info()
added 168 tags to 56 elements in collection.
           attached_to  missing_from  unique_values
tag
sex                 56             0              2
timepoint           56             0              2
treatment           56             0              2

Didactic: batch processing

With a TrackingCollection, .each delegates calls to each Tracking. Think "batch call the same Tracking method for all recordings". This .each batch processing pattern also applies to FeaturesCollection and SummaryCollection, as we will see later.

Methods on Tracking are inplace=True by default, so .each returns a BatchResult. If inplace=False, .each returns a TrackingCollection.

Passing a BatchResult back into .each maps values by handle.

demo_inplace = tc_raw_for_demo.copy().each.filter_likelihood(threshold=0.9)
demo_new_collection = tc_raw_for_demo.copy().each.filter_likelihood(
    threshold=0.9,
    inplace=False,
)
print(type(demo_inplace).__name__)  # expected: BatchResult
print(type(demo_new_collection).__name__)  # expected: TrackingCollection
BatchResult
TrackingCollection
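The per-handle mapping can be pictured with plain dicts (a sketch of the semantics only, not the real BatchResult class; the handles are hypothetical):

```python
# Sketch of `.each` semantics (not the real BatchResult implementation):
# a BatchResult behaves like a dict keyed by recording handle, and passing
# one back into a batch call resolves each handle's own value.
def batch_call(handles, arg):
    # Handle-keyed input is resolved per handle; anything else is broadcast.
    return {h: (arg[h] if isinstance(arg, dict) else arg) for h in handles}

handles = ["OFT1_1", "OFT1_2"]               # hypothetical handles
print(batch_call(handles, 0.9))              # scalar broadcast to all handles
per_handle = {"OFT1_1": 0.9, "OFT1_2": 0.8}  # e.g. a prior BatchResult
print(batch_call(handles, per_handle))       # mapped by handle
```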

Preprocess

Standard preprocessing chain: remove low-confidence detections, interpolate short gaps, smooth trajectories, and rescale coordinates to real-world units. This order is intentional: filter -> interpolate -> smooth -> rescale.

In this main path we use in-place behaviour (typical analysis workflow). Equivalent non-in-place variants are shown above in the didactic batch section.

tc.each.filter_likelihood(threshold=0.9)
tc.each.interpolate(limit=5)
tc.each.smooth_all(window=3, method="mean")
tc.each.rescale_by_known_distance(
    point1="tl",
    point2="br",
    distance_in_metres=0.64,
)
BatchResult: 56 items processed (in-place)
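Why the order matters can be illustrated on a toy 1-D trace with plain pandas (a sketch, not the library's implementation; all numbers are invented):

```python
import pandas as pd

# Toy x-coordinate with one low-confidence, wildly wrong sample.
x = pd.Series([0.0, 1.0, 2.0, 30.0, 4.0, 5.0])
likelihood = pd.Series([0.99, 0.99, 0.99, 0.20, 0.99, 0.99])

x = x.mask(likelihood < 0.9)                                # 1. filter: drop bad frames
x = x.interpolate(limit=5)                                  # 2. fill short gaps
x = x.rolling(window=3, center=True, min_periods=1).mean()  # 3. smooth
x = x * (0.64 / 640.0)                                      # 4. rescale: px -> metres

# Filtering first keeps the 30.0 outlier out of interpolation and smoothing;
# smoothing first would have smeared it into neighbouring frames.
```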

Re-running preprocessing

Most preprocessing methods guard against re-application. For parameter tuning, prefer inplace=False and work on a copy.

try:
    tc.each.interpolate(limit=5)
except Exception as e:
    print(e)
Error in collection 'None', object 'OFT2_11', method 'interpolate': data already interpolated. re-load the raw data to interpolate again

Quality check — trajectory plots

Save trajectory plots for every recording and display one inline for QC. Pattern used here:

- batch save all (tc.each.plot(..., savedir=...))
- inspect one representative recording inline (tc[0].plot(...))

trajectories = ["bodycentre"]
static = ["tl", "tr", "bl", "br"]
lines = [("tr", "tl"), ("tl", "bl"), ("bl", "br"), ("br", "tr")]

tc.each.plot(trajectories=trajectories, static=static, lines=lines, show=False, savedir=OUT_DIR)

# Single inline plot for visual QC
tc[0].plot(trajectories=trajectories, static=static, lines=lines, show=True)

output

(<Figure size 500x500 with 1 Axes>,
 <Axes: title={'center': 'OFT2_11'}, xlabel='x', ylabel='y'>)

Compute Features

Create FeaturesCollection

A FeaturesCollection wraps every recording's tracking data with methods for computing time-series features. Most feature methods return FeaturesResult; call .store() to persist to Features.data and register metadata in Features.meta.

fc = p3b.FeaturesCollection.from_tracking_collection(tc)

Spatial features — boundaries

Define/store named boundaries on each Features leaf, then use either:

- mapped BatchResult boundary objects (smart per-handle passthrough), or
- boundary names (resolved from stored per-recording assets).

Here we use both and assert they match.

ordered_oft_corners = ["tl", "tr", "br", "bl"]

Define and store a centre boundary for each recording.

centre_boundary = fc.each.define_static_boundary(
    ordered_oft_corners,
    scale_dim1=0.5,
    scale_dim2=0.5,
    name="centre",
)

Compare boundary usage styles: pass boundary objects vs stored boundary names.

in_centre = fc.each.within_boundary(point="bodycentre", boundary=centre_boundary)
in_centre_by_name = fc.each.within_boundary(point="bodycentre", boundary="centre")
for handle in fc.keys():
    assert in_centre[handle].equals(in_centre_by_name[handle])

Store the result. Without a manual name, an automatic descriptive name is used. .store() always returns the stored name.

in_centre.store()
'within_boundary_static_bodycentre_in_centre'

BatchResult supports logical composition (for example, arena periphery).

_ = fc.each.define_static_boundary(
    ordered_oft_corners,
    scale_dim1=0.8,
    scale_dim2=0.8,
    name="not_periphery",
)
_ = fc.each.define_static_boundary(
    ordered_oft_corners,
    name="oft",
)
(
    fc.each.within_boundary("bodycentre", "oft")
    & (~fc.each.within_boundary("bodycentre", "not_periphery"))
).store("in_periphery")
'in_periphery'

Corner occupancy can be represented as a single state feature instead of many independent booleans.

in_corners = dict()
for c in ordered_oft_corners:
    _ = fc.each.define_static_boundary(
        ordered_oft_corners,
        scale_dim1=0.2,
        scale_dim2=0.2,
        name=f"{c}_corner",
        anchor=c,
    )
    in_corners[c] = fc.each.within_boundary("bodycentre", boundary=f"{c}_corner")
# Store a convenience boolean for "in any corner".
(in_corners["tl"] | in_corners["tr"] | in_corners["bl"] | in_corners["br"]).store("in_corner")
'in_corner'
# Store a categorical corner-state feature for state-based analyses.
fc.each.compose_state_from_booleans(in_corners).store("corner_state")
'corner_state'
# Keep these existing columns out of clustering feature selection.
non_bfa_feats = fc[0].data.columns

BatchResult also supports element-wise arithmetic across handles.

dist_change = fc.each.distance_change("bodycentre")
dist_change_in_centre = in_centre.astype("Int64") * dist_change
dist_change_in_centre.store(name="dist_change_bodycentre_in_centre")
'dist_change_bodycentre_in_centre'
# `BatchResult` also supports general binary operations.
fast_outside_centre = ~in_centre & ((fc.each.speed("bodycentre") * 100) > 10.0)
# This is an example only; we do not store it.

Kinematic features for BFA

Speeds, angle deviations, inter-keypoint distances, body-part areas, and distance to the arena boundary — the standard feature set for behavioural flow analysis clustering. The loop pattern below intentionally stores each feature as a named column, so later clustering/summary code can reference columns deterministically.

Specific choices used here:

- For kinematic polygons, we define named dynamic boundaries, then compute dynamic area (median=False) over their ordered points.
- For arena distance, we define named static boundaries and loop over points.

# Speeds
for pt in ["nose", "neck", "earr", "earl", "bodycentre", "hipl", "hipr", "tailbase"]:
    fc.each.speed(pt).store()

Compute angular features.

# Angle deviations
for basepoint, pointdirection1, pointdirection2 in [
    ("tailbase", "hipr", "hipl"),
    ("bodycentre", "tailbase", "neck"),
    ("neck", "bodycentre", "headcentre"),
    ("headcentre", "earr", "earl"),
]:
    fc.each.azimuth_deviation(basepoint, pointdirection1, pointdirection2).store()

Compute inter-keypoint distances.

# Inter-keypoint distances
for p1, p2 in [
    ("nose", "headcentre"),
    ("neck", "headcentre"),
    ("neck", "bodycentre"),
    ("bcr", "bodycentre"),
    ("bcl", "bodycentre"),
    ("tailbase", "bodycentre"),
    ("tailbase", "hipr"),
    ("tailbase", "hipl"),
    ("bcr", "hipr"),
    ("bcl", "hipl"),
    ("bcl", "earl"),
    ("bcr", "earr"),
    ("nose", "earr"),
    ("nose", "earl"),
]:
    fc.each.distance_between(p1, p2).store()

Define dynamic body boundaries and store per-boundary area features.

DYNAMIC_BODY_BOUNDARIES = [
    ("mouse_rear", ["tailbase", "hipr", "hipl"]),
    ("mouse_mid", ["hipr", "hipl", "bcl", "bcr"]),
    ("mouse_front", ["bcr", "earr", "earl", "bcl"]),
    ("mouse_face", ["earr", "nose", "earl"]),
]

for boundary_name, boundary_points in DYNAMIC_BODY_BOUNDARIES:
    fc.each.define_dynamic_boundary(boundary_points, name=boundary_name)
    fc.each.area_of_boundary(boundary_name).store()

Compute distance-to-boundary features for selected points.

STATIC_DISTANCE_TO_BOUNDARY_POINTS = ["nose", "neck", "bodycentre", "tailbase"]

for pt in STATIC_DISTANCE_TO_BOUNDARY_POINTS:
    fc.each.distance_to_boundary(pt, "oft").store()

Inspect stored boundary assets on one recording.

fc[0].list_boundaries()
                  kind  n_points  has_vertices
name
centre          static         4          True
not_periphery   static         4          True
oft             static         4          True
tl_corner       static         4          True
tr_corner       static         4          True
br_corner       static         4          True
bl_corner       static         4          True
mouse_rear     dynamic         3         False
mouse_mid      dynamic         4         False
mouse_front    dynamic         4         False
mouse_face     dynamic         3         False

K-means clustering

Embed the feature time-series with temporal offsets, then cluster the embedded space with k-means. Returns (cluster_labels, centroids, scaling_factors), where:

- cluster_labels is a per-handle BatchResult of label series
- centroids is a DataFrame with n_clusters rows

Option notes:

- offset controls the temporal context window.
- cluster_embedding_stream also supports weighting/normalization knobs for advanced runs.
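What the temporal-offset embedding means can be sketched with pandas: each feature contributes one time-shifted copy per offset, so every frame carries a window of context (a sketch only; cluster_embedding_stream's internals may differ).

```python
import numpy as np
import pandas as pd

def embed_with_offsets(df, embedding_dict):
    # One column per (feature, offset): the feature shifted so that row t
    # holds the feature's value at t + offset.
    cols = {
        f"{feat}_t{off:+d}": df[feat].shift(-off)
        for feat, offsets in embedding_dict.items()
        for off in offsets
    }
    return pd.DataFrame(cols)

toy = pd.DataFrame({"speed": np.arange(6, dtype=float)})
emb = embed_with_offsets(toy, {"speed": [-1, 0, 1]})
print(emb.shape)  # one row per frame, one column per (feature, offset)
# Edge rows contain NaN (no context available there) and would need to be
# dropped or imputed before k-means.
```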

cluster_features = sorted(set(fc[0].data.columns) - set(non_bfa_feats))  # sorted for deterministic order
offset = list(np.arange(-15, 16, 1))
embedding_dict = {f: offset for f in cluster_features}

cluster_labels, centroids, _ = fc.cluster_embedding_stream(
    embedding_dict=embedding_dict, n_clusters=N_CLUSTERS
)
cluster_labels.store("kmeans_25", overwrite=True)
'kmeans_25'
# A quick overview of the stored features
fc.stored_info()
                                                     attached_to  missing_from     type
feature
area_of_boundary_mouse_face_dynamic                           56             0  float64
area_of_boundary_mouse_front_dynamic                          56             0  float64
area_of_boundary_mouse_mid_dynamic                            56             0  float64
area_of_boundary_mouse_rear_dynamic                           56             0  float64
azimuth_deviation_bodycentre_to_tailbase_and_neck             56             0  float64
azimuth_deviation_headcentre_to_earr_and_earl                 56             0  float64
azimuth_deviation_neck_to_bodycentre_and_headcentre           56             0  float64
azimuth_deviation_tailbase_to_hipr_and_hipl                   56             0  float64
corner_state                                                  56             0   object
dist_change_bodycentre_in_centre                              56             0  Float64
distance_between_bcl_and_bodycentre_in_xy                     56             0  float64
distance_between_bcl_and_earl_in_xy                           56             0  float64
distance_between_bcl_and_hipl_in_xy                           56             0  float64
distance_between_bcr_and_bodycentre_in_xy                     56             0  float64
distance_between_bcr_and_earr_in_xy                           56             0  float64
distance_between_bcr_and_hipr_in_xy                           56             0  float64
distance_between_neck_and_bodycentre_in_xy                    56             0  float64
distance_between_neck_and_headcentre_in_xy                    56             0  float64
distance_between_nose_and_earl_in_xy                          56             0  float64
distance_between_nose_and_earr_in_xy                          56             0  float64
distance_between_nose_and_headcentre_in_xy                    56             0  float64
distance_between_tailbase_and_bodycentre_in_xy                56             0  float64
distance_between_tailbase_and_hipl_in_xy                      56             0  float64
distance_between_tailbase_and_hipr_in_xy                      56             0  float64
distance_to_boundary_static_bodycentre_in_oft                 56             0  float64
distance_to_boundary_static_neck_in_oft                       56             0  float64
distance_to_boundary_static_nose_in_oft                       56             0  float64
distance_to_boundary_static_tailbase_in_oft                   56             0  float64
in_corner                                                     56             0  boolean
in_periphery                                                  56             0  boolean
kmeans_25                                                     56             0    Int64
speed_of_bodycentre_in_xy                                     56             0  float64
speed_of_earl_in_xy                                           56             0  float64
speed_of_earr_in_xy                                           56             0  float64
speed_of_hipl_in_xy                                           56             0  float64
speed_of_hipr_in_xy                                           56             0  float64
speed_of_neck_in_xy                                           56             0  float64
speed_of_nose_in_xy                                           56             0  float64
speed_of_tailbase_in_xy                                       56             0  float64
within_boundary_static_bodycentre_in_centre                   56             0  boolean

Save features to disk

save() writes a collection manifest plus per-handle element folders. This makes downstream loading deterministic and auditable. Later you can reconstruct with p3b.FeaturesCollection.load(path).

fc.save(f"{OUT_DIR}/features", data_format="csv", overwrite=True)

Summarise

Create SummaryCollection

Each Summary object holds scalar (or Series) metrics computed from a single recording's features. Return type here is SummaryCollection.

sc = p3b.SummaryCollection.from_features_collection(fc)

Compute summary measures

Call summary methods and .store() the result to persist it. Stored summary metrics become scalar columns in each Summary.data record.

Same pattern as features:

- compute the result (sc.each.total_distance(...), etc.)
- then .store(...) to persist it by metric name.

sc.each.total_distance("bodycentre").store()
sc.each.time_true("within_boundary_static_bodycentre_in_centre").store("time_in_centre")
sc.each.sum_column("dist_change_bodycentre_in_centre").store(name="distance_moved_in_centre")

# by_state API example: average speed by composed spatial zone.
sc.each.by_state(
    "corner_state",
    all_states=ordered_oft_corners,
).mean_column("speed_of_bodycentre_in_xy").store("mean_speed_corners")

# by_state + all_states API example: force explicit cluster domain (0-9),
# including states absent in a recording.
sc.each.by_state("kmeans_25", all_states=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]).mean_column(
    "speed_of_bodycentre_in_xy"
).store("mean_speed_bodycentre_by_kmeans_25")
'mean_speed_bodycentre_by_kmeans_25'

A quick overview of the stored summaries

sc.stored_info()
                                    attached_to  missing_from     type
summary
distance_moved_in_centre                     56             0  float64
mean_speed_bodycentre_by_kmeans_25           56             0   Series
mean_speed_corners                           56             0   Series
time_in_centre                               56             0  float64
total_distance_bodycentre                    56             0  float64

Export results to CSV

to_df(include_tags=True) flattens summary metrics plus selected tag columns into one analysis-ready table (indexed by handle). By default, Series metrics, like time_in_state, are ignored (series="ignore"). With series="separate", each Series metric is returned as its own DataFrame over the collection.

summary_df, series_dfs = sc.to_df(include_tags=True, series="separate")
summary_df.to_csv(f"{OUT_DIR}/OFT_results.csv")

display(summary_df.head())
for key, val in series_dfs.items():
    print(key)
    display(val.head())
total_distance_bodycentre time_in_centre distance_moved_in_centre tag_timepoint tag_treatment tag_sex
handle
OFT2_11 5.670102 4.300000 0.616402 post control M
OFT1_6 10.727781 10.733333 1.647000 pre stressor F
OFT1_7 11.464600 7.033333 1.098946 pre stressor M
OFT2_10 5.981597 1.800000 0.467466 post stressor F
OFT2_12 3.890576 1.333333 0.088791 post stressor F
mean_speed_corners
tl tr br bl tag_timepoint tag_treatment tag_sex
handle
OFT2_11 0.042689 0.061581 0.046867 0.015505 post control M
OFT1_6 0.079932 0.085120 0.055881 0.062372 pre stressor F
OFT1_7 0.082124 0.079620 0.072632 0.073711 pre stressor M
OFT2_10 0.036688 0.030134 0.065599 0.073706 post stressor F
OFT2_12 0.017380 0.059851 0.025541 0.044923 post stressor F
mean_speed_bodycentre_by_kmeans_25
0 1 2 3 4 5 6 7 8 9 tag_timepoint tag_treatment tag_sex
handle
OFT2_11 0.025574 0.101113 0.099135 0.057878 0.143741 0.069975 0.081711 0.089199 0.112823 0.136072 post control M
OFT1_6 0.136441 0.071986 0.147371 0.109992 0.130053 0.143835 0.090680 0.126470 0.149374 0.053129 pre stressor F
OFT1_7 0.131161 0.117809 0.113949 0.135518 0.169374 0.166234 0.083589 0.061509 0.133972 0.117163 pre stressor M
OFT2_10 0.029245 0.107414 0.172519 0.070635 0.151966 0.082346 0.052644 0.120902 0.156323 0.209330 post stressor F
OFT2_12 0.020928 0.050408 0.029152 0.067833 0.135331 0.032628 0.028133 0.008224 0.074553 0.041765 post stressor F

Visualise

The sns* methods on SummaryCollection wrap seaborn categorical plots with sensible defaults — auto titles, y-labels, filenames, and colour palettes. All sns* helpers return (fig, ax, tidy_df) to support both quick plotting and explicit downstream checks/customization.

In practice:

- pass a stored metric name ("total_distance_bodycentre") for reuse
- or pass a live SummaryResult for one-off plotting.

Plot types compared (ungrouped)

Three views of the same metric — time_in_cluster — to compare what each plot type looks like.

Also available: snsbox, snsviolin, snspoint, snsswarm.

sc.each.time_in_state("kmeans_25").store("time_in_cluster")
fig, ax, df_strip = sc.snsstrip(
    "time_in_cluster",
    random_state=42,  # optional, for point jitter
    show=True,
    savedir=OUT_DIR,
)
fig, ax, df_bar = sc.snsbar(
    "time_in_cluster",
    show=True,
    savedir=OUT_DIR,
)
fig, ax, df_super = sc.snssuperplot(
    "time_in_cluster",
    random_state=42,  # optional, for point jitter
    show=True,
    savedir=OUT_DIR,
)

output

output

output

Single Summary delegation

Individual Summary objects can call the same sns* methods. They delegate to a 1-item SummaryCollection internally. The auto filename is prefixed with the recording handle.

single = sc[list(sc.keys())[0]]
fig, ax, df_single = single.snsbar(
    single.time_in_state("within_boundary_static_bodycentre_in_centre"),
    show=True,
    savedir=OUT_DIR,
)

output

Grouped plots

Group by experimental tags with groupby() to compare conditions directly. Use group_order to control x-axis arrangement. groupby(...) returns a grouped SummaryCollection with the same plotting API.

sc_grouped = sc.groupby(tags=["treatment", "timepoint"])

# Keys = tag names (must match groupby tags), values = desired display order
GROUP_ORDER = {"treatment": ["control", "stressor"], "timepoint": ["pre", "post"]}
# Scalar metric — grouped superplot
fig, ax, df_gsup = sc_grouped.snssuperplot(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    random_state=42,  # optional, for point jitter
    show=True,
    savedir=str(OUT_DIR),
)

output

# Multi-component metric — 25 clusters × 4 groups
fig, ax, df_gbar = sc_grouped.snsbar(
    sc_grouped.each.time_in_state("kmeans_25"),
    group_order=GROUP_ORDER,
    show=True,
    savedir=str(OUT_DIR),
)

output

The summary metric 'time_in_cluster' was stored before grouping, but the grouped plots still work with it as expected (the auto-generated title differs, though, because we stored it under a manual name).

# Multi-component metric — 25 clusters × 4 groups
fig, ax, df_gbar = sc_grouped.snsbar(
    "time_in_cluster",
    group_order=GROUP_ORDER,
    show=True,
    savedir=str(OUT_DIR),
)

output

sort_by — independent spatial ordering

sort_by overrides the spatial arrangement on the x-axis without changing colour assignment. Here groupby(tags=["treatment", "timepoint"]) means treatment drives the base colour (control=blue, stressor=orange). Adding sort_by="timepoint" interleaves control/stressor within each timepoint.

# Interleaved superplot — timepoint as primary spatial axis, colours by treatment
fig, ax, df_interleaved = sc_grouped.snssuperplot(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    sort_by="timepoint",
    random_state=42,  # optional, for point jitter
    show=True,
    savedir=str(OUT_DIR),
    filename="total_distance_interleaved_superplot.png",
)

output

# Power-user workflow with prepare_plot — full seaborn control
import matplotlib.pyplot as plt
import seaborn as sns

spec = sc_grouped.prepare_plot(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    sort_by=["timepoint", "treatment"],
)
sns.boxplot(**spec.sns_kwargs, width=0.6)
spec.ax.set_ylabel(spec.ylabel)
spec.ax.set_title("Custom: prepare_plot + boxplot")
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

output

Statistical annotations

Use annotate="help" to discover available tests, corrections, and the group labels in your data. Then pass annotate={...} with actual pairs.

# Discover labels and options (no annotation applied, just prints a guide)
fig_ann, ax_ann, df_ann = sc_grouped.snssuperplot(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    annotate="help",
    random_state=42,  # optional, for point jitter
    show=False,
)
=== Statistical Annotation Guide ===

annotate={
    "pairs": [("groupA", "groupB"), ...],  # REQUIRED
    "test": "Mann-Whitney",                # see below
    "correction": None,                    # see below
    "text_format": "star",                 # "star", "simple", "full"
    "headroom": None,                      # float multiplier, see below
}

Available tests:
  Parametric:     t-test_ind, t-test_welch, t-test_paired
  Non-parametric: Mann-Whitney, Wilcoxon, Kruskal, Brunner-Munzel
  Other:          Levene (variance equality)

  Tip: Mann-Whitney is a safe default for most behavioural data.
  Use paired tests (t-test_paired, Wilcoxon) for repeated measures.
  Use parametric tests only if data is normally distributed.

Multiple comparisons correction (recommended for >3 pairs):
  FWER (conservative): bonferroni, holm
  FDR  (less conservative): fdr_bh (Benjamini-Hochberg), fdr_by

Headroom:
  Extra vertical space for brackets, as a fraction of the y range.
  E.g. headroom=0.3 adds 30% extra room above the data.

Your labels: ['control, post', 'control, pre', 'stressor, post', 'stressor, pre']
# Apply annotations
fig_ann, ax_ann, df_ann = sc_grouped.snsbox(
    "total_distance_bodycentre",
    group_order=GROUP_ORDER,
    annotate={
        "pairs": [("control, pre", "stressor, pre"), ("control, post", "stressor, post")],
        "test": "Mann-Whitney",
        "correction": None,
        "text_format": "star",
        "headroom": 0.0,  # add extra space for annotations if needed
    },
    savedir=str(OUT_DIR),
    filename="total_distance_annotated_superplot.png",
    show=True,
)
p-value annotation legend:
      ns: 5.00e-02 < p <= 1.00e+00
       *: 1.00e-02 < p <= 5.00e-02
      **: 1.00e-03 < p <= 1.00e-02
     ***: 1.00e-04 < p <= 1.00e-03
    ****: p <= 1.00e-04

control, pre vs. stressor, pre: Mann-Whitney-Wilcoxon test two-sided, P_val:5.053e-01 U_stat=1.130e+02
control, post vs. stressor, post: Mann-Whitney-Wilcoxon test two-sided, P_val:1.926e-03 U_stat=1.660e+02

output

Metric input options

Two ways to pass a metric to any sns* method:

  1. String key — a previously stored metric name
  2. SummaryResult object — inline computation (not stored)

Both options can represent single- or multi-component metrics.
# 1. String key
fig, ax, _ = sc.snsstrip(
    "total_distance_bodycentre",
    random_state=42,  # optional, for point jitter
    show=False,
)

# 2. SummaryResult object (inline)
fig, ax, df_mc = sc.snsbar(
    sc.each.time_in_state("within_boundary_static_bodycentre_in_centre"),
    show=False,
)

Multi-metric plotting

sns* methods can accept multiple metrics via list input, or alias maps via dict input. merge_by controls how metrics are combined (default: "metric"). When plotting multiple metrics together, they must share a common y-axis label.

# Ungrouped multi-metric demo combining two by_state metrics with the same y-axis
# (mean speed of bodycentre):
# - corners
# - kmeans clusters with explicit all_states=[0..9]
fig, ax, df_multi_flat = sc.snsbar(
    {
        "corners": "mean_speed_corners",
        "kmeans_0_to_9": "mean_speed_bodycentre_by_kmeans_25",
    },
    show=True,
    savedir=OUT_DIR,
    filename="demo_multi_metric_by_state_speed_barplot.png",
)

output

# Grouped multi-metric demo
fig, ax, df_multi_grouped = sc_grouped.snsbar(
    ["time_in_centre", "time_in_cluster"],
    merge_by=None,
    group_order=GROUP_ORDER,
    show=True,
    savedir=OUT_DIR,
    filename="demo_multi_metric_grouped_barplot.png",
)

output

Behaviour Flow Analysis (BFA)

Compute BFA results and statistics

bfa() returns a nested dict of observed/shuffled transition statistics, and bfa_stats() derives effect-size-style summaries for reporting. all_states=np.arange(0, N_CLUSTERS) makes the state space explicit.
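Under the hood, BFA compares transition structure between groups. The basic object, a transition-count matrix over cluster states, can be sketched with numpy (a sketch only; bfa() additionally builds shuffled null distributions for comparison):

```python
import numpy as np

def transition_matrix(labels, n_states):
    # Count transitions labels[t] -> labels[t + 1] into an n x n matrix.
    m = np.zeros((n_states, n_states), dtype=int)
    for a, b in zip(labels[:-1], labels[1:]):
        m[a, b] += 1
    return m

seq = [0, 0, 1, 2, 1, 0]  # toy cluster-label sequence
m = transition_matrix(seq, 3)
print(m)
# [[1 1 0]
#  [1 0 1]
#  [0 1 0]]
```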

bfa_results = sc_grouped.bfa(
    column="kmeans_25",
    all_states=np.arange(0, N_CLUSTERS),
    random_state=42,
)
bfa_stats = p3b.SummaryCollection.bfa_stats(bfa_results)

with open(f"{OUT_DIR}/bfa_results.json", "w") as f:
    json.dump(bfa_results, f, indent=4)
with open(f"{OUT_DIR}/bfa_stats.json", "w") as f:
    json.dump(bfa_stats, f, indent=4)

BFA histograms

Distribution of shuffled transition values vs observed, per group comparison. Useful as a quick sanity check before interpreting chord/UMAP views.

p3b.SummaryCollection.plot_bfa_results(
    bfa_results,
    add_stats=True,
    stats=bfa_stats,
    bins=20,
    figsize=(4, 3),
    save_dir=OUT_DIR,
    show=True,
)

output

output

output

output

output

output

{"('control', 'post')_vs_('stressor', 'pre')": (<Figure size 400x300 with 1 Axes>,
  <Axes: title={'center': "('control', 'post')_vs_('stressor', 'pre')"}, xlabel='distance', ylabel='count'>),
 "('control', 'post')_vs_('stressor', 'post')": (<Figure size 400x300 with 1 Axes>,
  <Axes: title={'center': "('control', 'post')_vs_('stressor', 'post')"}, xlabel='distance', ylabel='count'>),
 "('control', 'post')_vs_('control', 'pre')": (<Figure size 400x300 with 1 Axes>,
  <Axes: title={'center': "('control', 'post')_vs_('control', 'pre')"}, xlabel='distance', ylabel='count'>),
 "('stressor', 'pre')_vs_('stressor', 'post')": (<Figure size 400x300 with 1 Axes>,
  <Axes: title={'center': "('stressor', 'pre')_vs_('stressor', 'post')"}, xlabel='distance', ylabel='count'>),
 "('stressor', 'pre')_vs_('control', 'pre')": (<Figure size 400x300 with 1 Axes>,
  <Axes: title={'center': "('stressor', 'pre')_vs_('control', 'pre')"}, xlabel='distance', ylabel='count'>),
 "('stressor', 'post')_vs_('control', 'pre')": (<Figure size 400x300 with 1 Axes>,
  <Axes: title={'center': "('stressor', 'post')_vs_('control', 'pre')"}, xlabel='distance', ylabel='count'>)}

Chord diagrams

Requires pycirclize — install with pip install py3r-behaviour[viz].

if not SKIP_HEAVY_VIZ:
    sc_grouped.plot_chord(
        column="kmeans_25",
        all_states=np.arange(0, N_CLUSTERS),
        save_dir=OUT_DIR,
        show=True,
        start=-265,
        end=95,
        space=5,
        r_lim=(93, 100),
        label_kws=dict(r=94, size=12, color="white"),
        link_kws=dict(ec="black", lw=0.5),
    )

output

output

output

output

UMAP embedding of transition matrices

Requires umap-learn — install with pip install py3r-behaviour[viz].

if not SKIP_HEAVY_VIZ:
    fig, ax = sc_grouped.plot_transition_umap(
        column="kmeans_25",
        all_states=np.arange(0, N_CLUSTERS),
        n_neighbors=15,
        min_dist=0.1,
        random_state=42,
        figsize=(6, 5),
        show=True,
        save_dir=str(OUT_DIR),
    )

output

Done