FeaturesCollection

py3r.behaviour.features.features_collection.FeaturesCollection

FeaturesCollection(features_dict: dict[str, Features])

Bases: BaseCollection, FeaturesCollectionBatchMixin

Collection of Features objects, keyed by name. Note: type hints refer to Features, but the factory methods also accept other classes; these are intended ONLY for subclasses of Features, and this is enforced.

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         _ = shutil.copy(p, d / 'A.csv'); _ = shutil.copy(p, d / 'B.csv')
...     tc = TrackingCollection.from_dlc({'A': str(d/'A.csv'), 'B': str(d/'B.csv')}, fps=30)
>>> fc = FeaturesCollection.from_tracking_collection(tc)
>>> list(sorted(fc.keys()))
['A', 'B']

features_dict property

features_dict

loc property

loc

iloc property

iloc

is_grouped property

is_grouped

True if this collection is a grouped view.

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         a = d / 'A.csv'; b = d / 'B.csv'
...         _ = shutil.copy(p, a); _ = shutil.copy(p, b)
...     coll = TrackingCollection.from_dlc({'A': str(a), 'B': str(b)}, fps=30)
>>> coll.is_grouped
False

groupby_tags property

groupby_tags

The tag names used to form this grouped view (or None if flat).
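
Example (a minimal sketch mirroring the group_keys example below; only the None/not-None distinction is asserted, since the exact container type is not documented here):

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         a = d / 'A.csv'; b = d / 'B.csv'
...         _ = shutil.copy(p, a); _ = shutil.copy(p, b)
...     coll = TrackingCollection.from_dlc({'A': str(a), 'B': str(b)}, fps=30)
...     coll['A'].add_tag('group','G1'); coll['B'].add_tag('group','G2')
>>> coll.groupby_tags is None  # flat collection
True
>>> coll.groupby('group').groupby_tags is not None  # grouped view records the tags
True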

group_keys property

group_keys

Keys for the groups in a grouped view. Empty list if not grouped.

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         a = d / 'A.csv'; b = d / 'B.csv'
...         _ = shutil.copy(p, a); _ = shutil.copy(p, b)
...     coll = TrackingCollection.from_dlc({'A': str(a), 'B': str(b)}, fps=30)
...     coll['A'].add_tag('group','G1'); coll['B'].add_tag('group','G2')
>>> g = coll.groupby('group')
>>> sorted(g.group_keys)
[('G1',), ('G2',)]

from_tracking_collection classmethod

from_tracking_collection(tracking_collection: TrackingCollection, feature_cls=Features)

Create a FeaturesCollection from a TrackingCollection.

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         _ = shutil.copy(p, d / 'A.csv'); _ = shutil.copy(p, d / 'B.csv')
...     tc = TrackingCollection.from_dlc({'A': str(d/'A.csv'), 'B': str(d/'B.csv')}, fps=30)
>>> fc = FeaturesCollection.from_tracking_collection(tc)
>>> isinstance(fc['A'], Features) and isinstance(fc['B'], Features)
True

within_boundary_static

within_boundary_static(point: str, boundary, boundary_name: str = None)

Collection-aware wrapper that supports:

  • a single static boundary (list[(x, y)]) applied to all items, or
  • a per-handle mapping of boundaries produced by batch define_boundary:
      • flat: {handle: list[(x, y)]}
      • grouped: {group_key: {handle: list[(x, y)]}}
      • a BatchResult in either of the above shapes

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> import pandas as pd
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         _ = shutil.copy(p, d / 'A.csv'); _ = shutil.copy(p, d / 'B.csv')
...     tc = TrackingCollection.from_dlc({'A': str(d/'A.csv'), 'B': str(d/'B.csv')}, fps=30)
>>> fc = FeaturesCollection.from_tracking_collection(tc)
>>> boundaries = fc.define_boundary(['p1','p2','p3'], scaling=1.0)
>>> res = fc.within_boundary_static('p1', boundaries)
>>> isinstance(res, dict)
True
>>> any(isinstance(v, pd.Series) for v in res.values())
True

>>> # Grouped case: add tags on Tracking, group, then build grouped FeaturesCollection
>>> # (boundaries BatchResult structure matches grouped layout)
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         _ = shutil.copy(p, d / 'A.csv'); _ = shutil.copy(p, d / 'B.csv')
...     tc = TrackingCollection.from_dlc({'A': str(d/'A.csv'), 'B': str(d/'B.csv')}, fps=30)
...     tc['A'].add_tag('group', 'G1'); tc['B'].add_tag('group', 'G2')
...     gtc = tc.groupby('group')
...     gfc = FeaturesCollection.from_tracking_collection(gtc)
...     g_boundaries = gfc.define_boundary(['p1','p2','p3'], scaling=1.0)
...     g_res = gfc.within_boundary_static('p1', g_boundaries)
>>> isinstance(g_res, dict)
True
>>> any(any(isinstance(s, pd.Series) for s in sub.values()) for sub in g_res.values())
True

distance_to_boundary_static

distance_to_boundary_static(point: str, boundary, boundary_name: str = None)

Collection-aware wrapper that supports:

  • a single static boundary (list[(x, y)]) applied to all items, or
  • a per-handle mapping of boundaries produced by batch define_boundary:
      • flat: {handle: list[(x, y)]}
      • grouped: {group_key: {handle: list[(x, y)]}}
      • a BatchResult in either of the above shapes

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> import pandas as pd
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         _ = shutil.copy(p, d / 'A.csv'); _ = shutil.copy(p, d / 'B.csv')
...     tc = TrackingCollection.from_dlc({'A': str(d/'A.csv'), 'B': str(d/'B.csv')}, fps=30)
>>> fc = FeaturesCollection.from_tracking_collection(tc)
>>> boundaries = fc.define_boundary(['p1','p2','p3'], scaling=1.0)
>>> res = fc.distance_to_boundary_static('p1', boundaries)
>>> isinstance(res, dict)
True
>>> any(isinstance(v, pd.Series) for v in res.values())
True

>>> # Grouped case
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         _ = shutil.copy(p, d / 'A.csv'); _ = shutil.copy(p, d / 'B.csv')
...     tc = TrackingCollection.from_dlc({'A': str(d/'A.csv'), 'B': str(d/'B.csv')}, fps=30)
...     tc['A'].add_tag('group', 'G1'); tc['B'].add_tag('group', 'G2')
...     gtc = tc.groupby('group')
...     gfc = FeaturesCollection.from_tracking_collection(gtc)
...     g_boundaries = gfc.define_boundary(['p1','p2','p3'], scaling=1.0)
...     g_res = gfc.distance_to_boundary_static('p1', g_boundaries)
>>> isinstance(g_res, dict)
True
>>> any(any(isinstance(s, pd.Series) for s in sub.values()) for sub in g_res.values())
True

from_list classmethod

from_list(features_list: list[Features])

Create a FeaturesCollection from a list of Features objects, keyed by handle.

Examples:

>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking import Tracking
>>> with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...     t1 = Tracking.from_dlc(str(p), handle='A', fps=30)
...     t2 = Tracking.from_dlc(str(p), handle='B', fps=30)
>>> f1, f2 = Features(t1), Features(t2)
>>> fc = FeaturesCollection.from_list([f1, f2])
>>> list(sorted(fc.keys()))
['A', 'B']

cluster_embedding

cluster_embedding(embedding_dict: dict[str, list[int]], n_clusters: int, random_state: int = 0, *, auto_normalize: bool = False, rescale_factors: dict | None = None, lowmem: bool = False, decimation_factor: int = 10, custom_scaling: dict[str, dict] | None = None)

Perform k-means clustering using the specified embedding.

Unified behaviour for flat and grouped collections. Returns a BatchResult mapping:

  • grouped: {group_key: {feature_handle: FeaturesResult}}
  • flat: {feature_handle: FeaturesResult}

along with (centroids, normalization_factors or None).

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> import pandas as pd
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         _ = shutil.copy(p, d / 'A.csv'); _ = shutil.copy(p, d / 'B.csv')
...     tc = TrackingCollection.from_dlc({'A': str(d/'A.csv'), 'B': str(d/'B.csv')}, fps=30)
>>> fc = FeaturesCollection.from_tracking_collection(tc)
>>> # Create a trivial feature 'counter' in each Features to embed
>>> for f in fc.values():
...     s = pd.Series(range(len(f.tracking.data)), index=f.tracking.data.index)
...     f.store(s, 'counter')
>>> batch, centroids, norm = fc.cluster_embedding({'counter':[0]}, n_clusters=2, lowmem=True)
>>> isinstance(centroids, pd.DataFrame)
True

cluster_diagnostics

cluster_diagnostics(labels_result, n_clusters: int | None = None, *, low: float = 0.05, high: float = 0.9, verbose: bool = True)

Compute diagnostic stats for cluster label assignments.

Parameters:

Name Type Description Default

labels_result

Mapping from handle (or group->handle) to FeaturesResult of integer labels (with NA). Accepts the return shape of cluster_embedding(...)[0] (BatchResult or dict).

required

n_clusters

int | None

Optional number of clusters. If None, inferred from labels (max label + 1).

None

low

float

Prevalence threshold below which a cluster's per-recording prevalence counts as 'low'.

0.05

high

float

Prevalence threshold above which a cluster's per-recording prevalence counts as 'high'.

0.9

verbose

bool

If True, print a compact summary.

True

Returns:

Type Description
dict with:
  • 'global': {'cluster_prevalence': {label: frac, ...}, 'percent_nan': frac}
  • 'per_recording': pandas.DataFrame with rows per recording and columns: ['percent_nan', 'num_missing', 'num_low', 'num_high']
  • 'summary': min/median/max for the per_recording columns
  • if grouped: 'per_group': {group_key: {'per_recording': df, 'summary': {...}}}
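
Example (a minimal sketch continuing from the cluster_embedding example above, reusing its fc and batch; the asserted keys follow the Returns description for a flat collection):

>>> diag = fc.cluster_diagnostics(batch, n_clusters=2, verbose=False)
>>> sorted(diag.keys())
['global', 'per_recording', 'summary']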

cross_predict_rms

cross_predict_rms(source_embedding: dict[str, list[int]], target_embedding: dict[str, list[int]], normalize_source: bool = False, normalize_pred: dict | str = None, set1: list | None = None, set2: list | None = None, predictor_cls=None, predictor_kwargs=None)

Dev mode only: not available in public release yet.

plot_cross_predict_vs_within staticmethod

plot_cross_predict_vs_within(results, from_group, to_group, show=True)

Dev mode only: not available in public release yet.

plot_cross_predict_results staticmethod

plot_cross_predict_results(results, within_keys=None, between_keys=None, plot_type='bar', figsize=(10, 6), show=True)

Dev mode only: not available in public release yet.

dumbbell_plot_cross_predict staticmethod

dumbbell_plot_cross_predict(results, within_key, between_key, figsize=(3, 3), show=True)

Dev mode only: not available in public release yet.

train_knn_regressor

train_knn_regressor(*, source_embedding: dict[str, list[int]], target_embedding: dict[str, list[int]], predictor_cls=None, predictor_kwargs=None, normalize_source: bool = False, **kwargs)

Dev mode only: not available in public release yet.

predict_knn

predict_knn(model, source_embedding: dict[str, list[int]], target_embedding: dict[str, list[int]], rescale_factors: dict = None) -> pd.DataFrame

Dev mode only: not available in public release yet.

plot

plot(arg=None, figsize=(8, 2), show: bool = True, title: str = None)

Plot features for all collections in the MultipleFeaturesCollection.

  • If arg is a BatchResult or dict: treat it as a batch result and plot it for each collection.
  • Otherwise: treat arg as column name(s) or None and plot for each collection.
  • If title is provided, it is used as the overall title for the figure.
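
Example (a minimal sketch; the stored feature 'd12' follows the store example below, and the return value is discarded since it is not documented here):

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         _ = shutil.copy(p, d / 'A.csv'); _ = shutil.copy(p, d / 'B.csv')
...     tc = TrackingCollection.from_dlc({'A': str(d/'A.csv'), 'B': str(d/'B.csv')}, fps=30)
>>> fc = FeaturesCollection.from_tracking_collection(tc)
>>> fc.store({h: f.distance_between('p1','p2') for h, f in fc.items()}, name='d12')
>>> _ = fc.plot('d12', show=False)  # suppress interactive display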

store

store(results_dict, name: str = None, meta: dict = None, overwrite: bool = False)

Store FeaturesResult objects returned by batch methods.

  • Flat collection: results_dict is {handle: FeaturesResult}
  • Grouped collection: results_dict is {group_key: {handle: FeaturesResult}}

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         _ = shutil.copy(p, d / 'A.csv'); _ = shutil.copy(p, d / 'B.csv')
...     tc = TrackingCollection.from_dlc({'A': str(d/'A.csv'), 'B': str(d/'B.csv')}, fps=30)
>>> fc = FeaturesCollection.from_tracking_collection(tc)
>>> # Build a simple FeaturesResult dict from distance_between
>>> rd = {h: feat.distance_between('p1','p2') for h, feat in fc.items()}
>>> fc.store(rd, name='d12')
>>> all('d12' in feat.data.columns for feat in fc.values())
True

save

save(dirpath: str, *, overwrite: bool = False, data_format: str = 'parquet') -> None

Save this collection to a directory. Preserves grouping and delegates to leaf objects' save(dirpath, data_format, overwrite=True).

Examples:

>>> import tempfile, shutil, os
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         a = d / 'A.csv'; b = d / 'B.csv'
...         _ = shutil.copy(p, a); _ = shutil.copy(p, b)
...     coll = TrackingCollection.from_dlc({'A': str(a), 'B': str(b)}, fps=30)
...     out = d / 'coll'
...     coll.save(str(out), overwrite=True, data_format='csv')
...     # collection-level manifest at top-level
...     assert os.path.exists(os.path.join(str(out), 'manifest.json'))
...     # element-level manifests under elements/<handle>/
...     assert os.path.exists(os.path.join(str(out), 'elements', 'A', 'manifest.json'))

distance_between

distance_between(point1: str, point2: str, dims=('x', 'y')) -> BatchResult

Batch-mode wrapper for Features.distance_between across the collection.

See Features.distance_between for examples.

within_distance

within_distance(point1: str, point2: str, distance: float, dims=('x', 'y')) -> BatchResult

Batch-mode wrapper for Features.within_distance across the collection.

See Features.within_distance for examples.

get_point_median

get_point_median(point: str, dims=('x', 'y')) -> BatchResult

Batch-mode wrapper for Features.get_point_median across the collection.

Return the per-dimension median coordinate for a tracked point.

See Features.get_point_median for examples.

define_boundary

define_boundary(points: list[str], scaling: float, scaling_y: float = None, centre: str | list[str] = None) -> BatchResult

Batch-mode wrapper for Features.define_boundary across the collection.

Takes a list of defined points and creates a static, rescaled list of point coordinates based on the median location of those points. 'centre' (the point about which to scale) can be a string or a list of strings; if a list, the median of those points is used as the centre. If 'centre' is None, the median of all the boundary points is used as the centre. 'scaling' is the factor by which to scale the boundary points, and 'scaling_y' is the factor for the y-axis; if 'scaling_y' is not provided, 'scaling' is applied to both axes.

See Features.define_boundary for examples.
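
A minimal sketch (points p1..p3 come from the bundled sample used elsewhere on this page; only the per-handle keys of the returned mapping are asserted):

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         _ = shutil.copy(p, d / 'A.csv'); _ = shutil.copy(p, d / 'B.csv')
...     tc = TrackingCollection.from_dlc({'A': str(d/'A.csv'), 'B': str(d/'B.csv')}, fps=30)
>>> fc = FeaturesCollection.from_tracking_collection(tc)
>>> boundaries = fc.define_boundary(['p1','p2','p3'], scaling=0.5, centre='p1')
>>> sorted(boundaries.keys())
['A', 'B']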

within_boundary_dynamic

within_boundary_dynamic(point: str, boundary: list[str], boundary_name: str = None) -> BatchResult

Batch-mode wrapper for Features.within_boundary_dynamic across the collection.

Checks whether point is inside the polygon defined by an ordered list of boundary points. Boundary points must be specified as a list of names of tracked points.

See Features.within_boundary_dynamic for examples.
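
A minimal sketch (mirrors the within_boundary_static example above, but here the boundary is given as tracked-point names):

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         _ = shutil.copy(p, d / 'A.csv'); _ = shutil.copy(p, d / 'B.csv')
...     tc = TrackingCollection.from_dlc({'A': str(d/'A.csv'), 'B': str(d/'B.csv')}, fps=30)
>>> fc = FeaturesCollection.from_tracking_collection(tc)
>>> res = fc.within_boundary_dynamic('p1', ['p1','p2','p3'])
>>> sorted(res.keys())
['A', 'B']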

within_boundary

within_boundary(point: str, boundary: list, median: bool = True, boundary_name: str = None) -> BatchResult

Batch-mode wrapper for Features.within_boundary across the collection.

Deprecated: use within_boundary_static or within_boundary_dynamic instead. Checks whether point is inside the polygon defined by an ordered list of boundary points. Boundary points may be specified either as a list of numerical tuples or as a list of names of tracked points. Optionally, pass boundary_name for a custom short name in the feature name/meta.

See Features.within_boundary for examples.

distance_to_boundary

distance_to_boundary(point: str, boundary: list[str], median: bool = True, boundary_name: str = None) -> BatchResult

Batch-mode wrapper for Features.distance_to_boundary across the collection.

Deprecated: use distance_to_boundary_static or distance_to_boundary_dynamic instead.

See Features.distance_to_boundary for examples.

distance_to_boundary_dynamic

distance_to_boundary_dynamic(point: str, boundary: list[str], boundary_name: str | None = None) -> BatchResult

Batch-mode wrapper for Features.distance_to_boundary_dynamic across the collection.

See Features.distance_to_boundary_dynamic for examples.

area_of_boundary

area_of_boundary(boundary: list[str], median: bool = True) -> BatchResult

Batch-mode wrapper for Features.area_of_boundary across the collection.

See Features.area_of_boundary for examples.

acceleration

acceleration(point: str, dims=('x', 'y')) -> BatchResult

Batch-mode wrapper for Features.acceleration across the collection.

See Features.acceleration for examples.

azimuth

azimuth(point1: str, point2: str) -> BatchResult

Batch-mode wrapper for Features.azimuth across the collection.

See Features.azimuth for examples.

azimuth_deviation

azimuth_deviation(basepoint: str, pointdirection1: str, pointdirection2: str) -> BatchResult

Batch-mode wrapper for Features.azimuth_deviation across the collection.

Compute the signed angular deviation (radians) between two directions from a common basepoint for each frame.

See Features.azimuth_deviation for examples.
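
A minimal sketch (reusing a two-item fc built exactly as in the from_tracking_collection example above; only the per-handle keys are asserted):

>>> res = fc.azimuth_deviation('p1', 'p2', 'p3')
>>> sorted(res.keys())
['A', 'B']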

within_azimuth_deviation

within_azimuth_deviation(basepoint: str, pointdirection1: str, pointdirection2: str, deviation: float) -> BatchResult

Batch-mode wrapper for Features.within_azimuth_deviation across the collection.

Return True for frames where the angular deviation between two rays from basepoint is <= deviation (radians). NA is propagated where inputs are missing (pd.NA).

See Features.within_azimuth_deviation for examples.
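
A minimal sketch (same fc as above; the 45-degree threshold is arbitrary):

>>> import math
>>> res = fc.within_azimuth_deviation('p1', 'p2', 'p3', deviation=math.pi / 4)
>>> sorted(res.keys())
['A', 'B']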

speed

speed(point: str, dims=('x', 'y')) -> BatchResult

Batch-mode wrapper for Features.speed across the collection.

See Features.speed for examples.

above_speed

above_speed(point: str, speed: float, dims=('x', 'y')) -> BatchResult

Batch-mode wrapper for Features.above_speed across the collection.

Return True for frames where the point's speed is >= threshold. NA is propagated where inputs are missing (pd.NA).

See Features.above_speed for examples.
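
A minimal sketch (same fc as above; the threshold of 10.0 is arbitrary). all_above_speed, below_speed and all_below_speed follow the same call pattern:

>>> res = fc.above_speed('p1', speed=10.0)
>>> sorted(res.keys())
['A', 'B']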

all_above_speed

all_above_speed(points: list, speed: float, dims=('x', 'y')) -> BatchResult

Batch-mode wrapper for Features.all_above_speed across the collection.

Return True for frames where all listed points are moving at least at the threshold speed. NA is propagated: if any input is NA at a frame, result is NA.

See Features.all_above_speed for examples.

below_speed

below_speed(point: str, speed: float, dims=('x', 'y')) -> BatchResult

Batch-mode wrapper for Features.below_speed across the collection.

Return True for frames where the point's speed is < threshold. NA is propagated where inputs are missing (pd.NA).

See Features.below_speed for examples.

all_below_speed

all_below_speed(points: list, speed: float, dims=('x', 'y')) -> BatchResult

Batch-mode wrapper for Features.all_below_speed across the collection.

Return True for frames where all listed points are moving slower than the threshold speed. NA is propagated: if any input is NA at a frame, result is NA.

See Features.all_below_speed for examples.

distance_change

distance_change(point: str, dims=('x', 'y')) -> BatchResult

Batch-mode wrapper for Features.distance_change across the collection.

See Features.distance_change for examples.

classify

classify(classifier: BaseClassifier, **kwargs) -> BatchResult

Batch-mode wrapper for Features.classify across the collection.

classify behaviour using a classifier with inputs from this Features object

See Features.classify for examples.

smooth

smooth(name: str, method: str, window: int, center: bool = True, inplace: bool = False) -> BatchResult

Batch-mode wrapper for Features.smooth across the collection.

Smooths the specified feature with the specified method over a rolling window. If inplace=True, the feature is edited directly and its metadata updated.

Methods:

  • 'median': median of the values in the window; requires numerical series values.
  • 'mean': mean of the values in the window; requires numerical series values.
  • 'mode': modal value in the window; works with numerical or non-numerical types.
  • 'block': removes labels that occur in blocks shorter than window and replaces them with the value from the previous block, or from the next block if there is no previous one. Note: after smoothing, all NaN values will be filled by this method (dangerous!).

See Features.smooth for examples.
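
A minimal sketch (same fc as above, with a 'd12' feature stored as in the store example below; window=5 is arbitrary):

>>> fc.store({h: f.distance_between('p1','p2') for h, f in fc.items()}, name='d12')
>>> sm = fc.smooth('d12', method='median', window=5)
>>> sorted(sm.keys())
['A', 'B']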

embedding_df

embedding_df(embedding: dict[str, list[int]]) -> BatchResult

Batch-mode wrapper for Features.embedding_df across the collection.

Generate a time-series embedding dataframe with specified time shifts for each column, where embedding is a dict mapping column names to lists of shifts. Positive shift: value from the future (t+n); negative shift: value from the past (t-n).

See Features.embedding_df for examples.
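
A minimal sketch (same fc and 'd12' feature as above; shifts of -1, 0 and 1 give the previous, current and next value of 'd12'):

>>> emb = fc.embedding_df({'d12': [-1, 0, 1]})
>>> sorted(emb.keys())
['A', 'B']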

assign_clusters_by_centroids

assign_clusters_by_centroids(embedding: dict[str, list[int]], centroids_df: DataFrame, *, rescale_factors: dict | None = None, custom_scaling: dict[str, dict] | None = None) -> BatchResult

Batch-mode wrapper for Features.assign_clusters_by_centroids across the collection.

new_embed_df: (n_samples, n_features) DataFrame of your new time-shifted embedding.
centroids_df: (n_clusters, n_features) DataFrame of cluster centers.

See Features.assign_clusters_by_centroids for examples.
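
A minimal sketch (continuing from the cluster_embedding example earlier on this page, reusing its fc, 'counter' feature and centroids):

>>> assigned = fc.assign_clusters_by_centroids({'counter': [0]}, centroids)
>>> sorted(assigned.keys())
['A', 'B']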

define_elliptical_boundary_from_params

define_elliptical_boundary_from_params(centre: str | list[str], major_axis_length: float, minor_axis_length: float, angle_in_radians: float = 0.0, n_points: int = 100) -> BatchResult

Batch-mode wrapper for Features.define_elliptical_boundary_from_params across the collection.

Generate a polygonal approximation of an ellipse, as a list of (x, y) tuples, around centre using explicit parameters. centre can be a single point name or a list of point names; if it is a list, the boundary is centred on the mean of the median coordinates of those points.

See Features.define_elliptical_boundary_from_params for examples.
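
A minimal sketch (same fc as above; the axis lengths and the default angle are arbitrary):

>>> ell = fc.define_elliptical_boundary_from_params('p1', major_axis_length=10.0, minor_axis_length=5.0)
>>> sorted(ell.keys())
['A', 'B']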

define_elliptical_boundary_from_points

define_elliptical_boundary_from_points(points: list[str], n_points: int = 100, scaling: float = 1.0, smallness_weight: float = 0.1) -> BatchResult

Batch-mode wrapper for Features.define_elliptical_boundary_from_points across the collection.

Fit an ellipse to the median coordinates of the given tracked points (at least 4) and return a polygonal approximation. After fitting, the ellipse is scaled by scaling.

See Features.define_elliptical_boundary_from_points for examples.
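
A sketch only (not runnable against the bundled sample, which only demonstrates points p1..p3 on this page; 'p4' is hypothetical, and the fit needs at least four points):

>>> ell = fc.define_elliptical_boundary_from_points(['p1', 'p2', 'p3', 'p4'], scaling=1.2)  # doctest: +SKIP
>>> sorted(ell.keys())  # doctest: +SKIP
['A', 'B']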

values

values()

Values iterator (elements or sub-collections).

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         a = d / 'A.csv'; b = d / 'B.csv'
...         _ = shutil.copy(p, a); _ = shutil.copy(p, b)
...     coll = TrackingCollection.from_dlc({'A': str(a), 'B': str(b)}, fps=30)
>>> len(list(coll.values())) == 2
True

items

items()

Items iterator (handle, element).

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         a = d / 'A.csv'; b = d / 'B.csv'
...         _ = shutil.copy(p, a); _ = shutil.copy(p, b)
...     coll = TrackingCollection.from_dlc({'A': str(a), 'B': str(b)}, fps=30)
>>> sorted([h for h, _ in coll.items()])
['A', 'B']

keys

keys()

Keys iterator (handles or group keys).

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         a = d / 'A.csv'; b = d / 'B.csv'
...         _ = shutil.copy(p, a); _ = shutil.copy(p, b)
...     coll = TrackingCollection.from_dlc({'A': str(a), 'B': str(b)}, fps=30)
>>> list(sorted(coll.keys()))
['A', 'B']

groupby

groupby(tags)

Group the collection by one or more existing tag names. Returns a grouped view (this same collection type) whose values are sub-collections keyed by a tuple of tag values in the order provided.

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         a = d / 'A.csv'; b = d / 'B.csv'
...         _ = shutil.copy(p, a); _ = shutil.copy(p, b)
...     coll = TrackingCollection.from_dlc({'A': str(a), 'B': str(b)}, fps=30)
...     coll['A'].add_tag('group','G1'); coll['B'].add_tag('group','G2')
>>> g = coll.groupby('group')
>>> g.is_grouped
True
>>> sorted(g.group_keys)
[('G1',), ('G2',)]

flatten

flatten()

Flatten a MultipleCollection to a flat Collection. If already flat, return self.

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         a = d / 'A.csv'; b = d / 'B.csv'
...         _ = shutil.copy(p, a); _ = shutil.copy(p, b)
...     coll = TrackingCollection.from_dlc({'A': str(a), 'B': str(b)}, fps=30)
...     coll['A'].add_tag('group','G1'); coll['B'].add_tag('group','G1')
...     g = coll.groupby('group')
>>> flat = g.flatten()
>>> flat.is_grouped
False
>>> sorted(flat.keys())
['A', 'B']

get_group

get_group(key)

Get a sub-collection by group key from a grouped view.

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         a = d / 'A.csv'; b = d / 'B.csv'
...         _ = shutil.copy(p, a); _ = shutil.copy(p, b)
...     coll = TrackingCollection.from_dlc({'A': str(a), 'B': str(b)}, fps=30)
...     coll['A'].add_tag('group','G1'); coll['B'].add_tag('group','G2')
>>> g = coll.groupby('group')
>>> sub = g.get_group(('G1',))
>>> list(sub.keys())
['A']

regroup

regroup()

Recompute the same grouping using the current tags and the original grouping tag order. If not grouped, returns self.

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         a = d / 'A.csv'; b = d / 'B.csv'
...         _ = shutil.copy(p, a); _ = shutil.copy(p, b)
...     coll = TrackingCollection.from_dlc({'A': str(a), 'B': str(b)}, fps=30)
...     coll['A'].add_tag('group','G1'); coll['B'].add_tag('group','G1')
...     g = coll.groupby('group')
...     coll['B'].add_tag('group','G2', overwrite=True)  # change tag
>>> g2 = g.regroup()
>>> sorted(g2.group_keys)
[('G1',), ('G2',)]

tags_info

tags_info(*, include_value_counts: bool = False) -> pd.DataFrame

Summarize tag presence across the collection's leaf objects. Works for flat and grouped collections. If include_value_counts is True, include a column 'value_counts' with a dict of value->count for each tag. Returns a pandas.DataFrame with columns: ['tag', 'attached_to', 'missing_from', 'unique_values', ('value_counts')]

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         a = d / 'A.csv'; b = d / 'B.csv'
...         _ = shutil.copy(p, a); _ = shutil.copy(p, b)
...     coll = TrackingCollection.from_dlc({'A': str(a), 'B': str(b)}, fps=30)
...     coll['A'].add_tag('genotype', 'WT')
...     coll['B'].add_tag('timepoint', 'T1')
>>> info = coll.tags_info(include_value_counts=True)
>>> int(info.loc['genotype','attached_to'])
1
>>> int(info.loc['genotype','missing_from'])
1
>>> int(info.loc['genotype','unique_values'])
1
>>> info.loc['genotype','value_counts']
{'WT': 1}
>>> int(info.loc['timepoint','attached_to'])
1

map_leaves

map_leaves(fn)

Apply a function to every leaf element and return a new collection of the same type. Preserves grouping shape and groupby metadata when grouped.

fn: callable(Element) -> ElementLike

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         a = d / 'A.csv'; b = d / 'B.csv'
...         _ = shutil.copy(p, a); _ = shutil.copy(p, b)
...     coll = TrackingCollection.from_dlc({'A': str(a), 'B': str(b)}, fps=30)
>>> sub = coll.map_leaves(lambda t: t.loc[0:1])
>>> all(len(t.data) == 2 for t in sub.values())
True

load classmethod

load(dirpath: str)

Load a collection previously saved with save(). Uses the class's _element_type.load to reconstruct leaves.

Examples:

>>> import tempfile, shutil
>>> from pathlib import Path
>>> from py3r.behaviour.util.docdata import data_path
>>> from py3r.behaviour.tracking.tracking_collection import TrackingCollection
>>> with tempfile.TemporaryDirectory() as d:
...     d = Path(d)
...     with data_path('py3r.behaviour.tracking._data', 'dlc_single.csv') as p:
...         a = d / 'A.csv'; b = d / 'B.csv'
...         _ = shutil.copy(p, a); _ = shutil.copy(p, b)
...     coll = TrackingCollection.from_dlc({'A': str(a), 'B': str(b)}, fps=30)
...     out = d / 'coll'
...     coll.save(str(out), overwrite=True, data_format='csv')
...     coll2 = TrackingCollection.load(str(out))
>>> list(sorted(coll2.keys()))
['A', 'B']