Add unified encoder PyTorch implementation #251

Open · wants to merge 128 commits into base: main

Conversation

Member

@CeliaBenquet CeliaBenquet commented May 1, 2025

This PR adds a PyTorch implementation of a unified CEBRA encoder, which is composed of:

  • A new sampling scheme that samples across all sessions so that they can be aligned on the neuron axis to train a single encoder.
  • A unified Dataset and Loader, adapted to the new sampling scheme.
  • A unified Solver that considers multiple sessions to be aligned at inference.
  • A new masked modeling training option, with different types of masking.

🚧 A preprint is pending: "Unified CEBRA Encoders for Integrating Neural Recordings via Behavioral Alignment" by Célia Benquet, Hossein Mirzaei, Steffen Schneider, Mackenzie W. Mathis.

@cla-bot cla-bot bot added the CLA signed label May 1, 2025
@CeliaBenquet CeliaBenquet requested review from stes and MMathisLab May 20, 2025 10:51
@MMathisLab MMathisLab changed the base branch from batched-inference-and-padding to main May 23, 2025 13:39
positive=self[index.positive],
negative=self[index.negative],
reference=self[index.reference],
positive=self.apply_mask(self[index.positive]),

Member

Quick sanity check: is this backwards compatible? @CeliaBenquet

Member Author

Backward compatible, yes. I added a check for the case where the function doesn't exist, for people who might want to use the adapt functionality on an older model. Good catch.

@@ -97,6 +97,8 @@ def get_datapath(path: str = None) -> str:
from cebra.datasets.hippocampus import *
from cebra.datasets.monkey_reaching import *
from cebra.datasets.synthetic_data import *
from cebra.datasets.perich import *

Member

do we need to package this, or is this downloaded from DANDI?

Member Author

That's a leftover line. I no longer add the dataset files for Perich (and NLB), because they would require changing the packages installed with CEBRA.

For NLB, it requires the data to already be downloaded, plus the nlb_tools package installed.
For Perich, it also requires the data to already be downloaded, but the issue is that it depends on POYO code that has since been removed, so we would need to go back to a previous commit, etc.

Let me know if it's necessary to have these. In that case, I suppose we would also need the S1/M1 dataset class, if you need one.

@MMathisLab MMathisLab left a comment (Member)

Thanks @CeliaBenquet! I went through and left comments for discussion.

Member

Should we put this into integrations rather than models? To me, models means encoders only. cc @stes

Member

We currently have some decoders here, although these are sklearn specific.

I think this module here is fine; at least right now I don't see a better place in the codebase to put them. An argument for leaving them here is that they are an "extension" of the encoders we train, plus they are "raw" torch objects, which we currently collect in cebra.models.

I don't have a strong opinion, I just don't see where they would fit better. In integrations, we currently have only "standalone" helper functions, which these aren't.

@CeliaBenquet Where are these decoders used around the codebase, and how are they trained?

@@ -0,0 +1,38 @@
import torch.nn as nn

Member

Why not have the decoders somewhere like integrations? To me, models means encoders only. cc @stes

@stes stes left a comment (Member)

Looks good overall; left some comments!

  • Implementation of the Mixin class for the masking: if I understood correctly, the only change is that this apply_mask function is applied after loading a batch. This seems to be a change that could be applied in a minimally invasive way not in the dataset, but in the data loader (see the sketch after this list). Is there a good case why the datasets themselves need to be modified?
  • Discussion on where to place the decoders: currently in cebra.models.decoders; are the decoders useful as "standalone" models? Where are they currently used? Based on that, we could decide whether to move them, e.g. as standalone models, to integrations.
  • See the other comments; mostly on class design, removing duplicated code, etc.
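
A rough sketch of the loader-side alternative raised in the first bullet, for illustration only: it assumes the base cebra.data.Loader yields Batch objects from __iter__ and that apply_mask would then live on the loader side; the mixin name and example composition are hypothetical.

import cebra.data


class MaskedLoaderMixin:
    """Masks each batch after it is loaded, leaving the datasets untouched."""

    def __iter__(self):
        # Wrap the plain loader iteration and mask every field of the batch.
        for batch in super().__iter__():
            yield cebra.data.Batch(
                reference=self.apply_mask(batch.reference),
                positive=self.apply_mask(batch.positive),
                negative=self.apply_mask(batch.negative),
            )


# e.g. class MaskedContinuousDataLoader(MaskedLoaderMixin,
#                                       cebra.data.ContinuousDataLoader): ...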

Member

I wouldn't add two new modules. Use cebra.data.mask or cebra.data.masking, and keep the code together, I'd say.

Comment on lines +100 to +123
        if hasattr(self, "apply_mask"):
            batch = [
                cebra_data.Batch(
                    reference=self.apply_mask(
                        session[index.reference[session_id]]),
                    positive=self.apply_mask(
                        session[index.positive[session_id]]),
                    negative=self.apply_mask(
                        session[index.negative[session_id]]),
                    index=index.index,
                    index_reversed=index.index_reversed,
                ) for session_id, session in enumerate(self.iter_sessions())
            ]
        else:
            batch = [
                cebra_data.Batch(
                    reference=session[index.reference[session_id]],
                    positive=session[index.positive[session_id]],
                    negative=session[index.negative[session_id]],
                    index=index.index,
                    index_reversed=index.index_reversed,
                ) for session_id, session in enumerate(self.iter_sessions())
            ]
        return batch

Member

Can we convert this if/else statement into a subclass?
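
For illustration, one possible shape of that subclass, assuming the existing multisession dataset keeps the plain load_batch and the masked variant wraps it (the class name is hypothetical, not the final code):

class MaskedMultiSessionDataset(MultiSessionDataset):
    """Masked variant that reuses the unmasked per-session batch loading."""

    def load_batch(self, index):
        # Fetch the batches as usual, then mask every field per session,
        # so the if/else disappears from the base implementation.
        return [
            cebra_data.Batch(
                reference=self.apply_mask(batch.reference),
                positive=self.apply_mask(batch.positive),
                negative=self.apply_mask(batch.negative),
                index=batch.index,
                index_reversed=batch.index_reversed,
            ) for batch in super().load_batch(index)
        ]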

)
session.configure_for(model[i])
else:
session.configure_for(model)

Member

I would restructure. Make one common base class, then the "old" multisession class is one subclass of it, and the unified one is another subclass. That removes all if statements and cleanly separates the logic.
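
A rough sketch of that split (all names below are hypothetical and only meant to show the structure): the shared logic lives in a common base class, and each variant only decides which model a session is configured with, so the if statements disappear.

class _MultiSessionBase:

    def _model_for_session(self, model, session_id):
        raise NotImplementedError

    def _configure_sessions(self, dataset, model):
        # Shared logic: every session is configured exactly once.
        for i, session in enumerate(dataset.iter_sessions()):
            session.configure_for(self._model_for_session(model, i))


class MultiSessionVariant(_MultiSessionBase):
    # "old" behaviour: one model per session
    def _model_for_session(self, model, session_id):
        return model[session_id]


class UnifiedVariant(_MultiSessionBase):
    # unified encoder: a single model shared across sessions
    def _model_for_session(self, model, session_id):
        return model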

Comment on lines +67 to +80
        if hasattr(self, "apply_mask"):
            # If the dataset has a mask, apply it to the data.
            batch = Batch(
                positive=self.apply_mask(self[index.positive]),
                negative=self.apply_mask(self[index.negative]),
                reference=self.apply_mask(self[index.reference]),
            )
        else:
            batch = Batch(
                positive=self[index.positive],
                negative=self[index.negative],
                reference=self[index.reference],
            )
        return batch

Member

See above; a better way to implement this is to have the masking simply override the load_batch function, rather than introducing this if/else logic.
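
A minimal sketch of that suggestion, assuming a dataset with the plain load_batch shown above and putting the masking into a mixin that overrides it (the composition below is illustrative, not the final implementation):

class MaskedMixin:
    """Masking that wraps the dataset's own load_batch via the MRO."""

    def apply_mask(self, data):
        ...  # apply the configured masking scheme to the samples

    def load_batch(self, index):
        batch = super().load_batch(index)  # the unmasked implementation
        return Batch(
            reference=self.apply_mask(batch.reference),
            positive=self.apply_mask(batch.positive),
            negative=self.apply_mask(batch.negative),
        )


# Put the mixin first so its load_batch wraps the dataset's, e.g.
# class MaskedTensorDataset(MaskedMixin, cebra.data.TensorDataset): ...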

@@ -33,6 +33,8 @@
from cebra.datasets import register

_DEFAULT_NUM_TIMEPOINTS = 1_000
NUMS_NEURAL = [3, 4, 5]

Member

Suggested change:
-NUMS_NEURAL = [3, 4, 5]
+_NUMS_NEURAL = [3, 4, 5]

Not public (and adapt below).

Comment on lines +40 to +45
Masking helpers
----------------

.. automodule:: cebra.data.masking
:members:
:show-inheritance:

Member

As written above, I would only add a single module dedicated to masking, rather than splitting this up further.

Comment on lines +24 to +25
#### Tests for Mask class ####

Member

Suggested change:
-#### Tests for Mask class ####
(the suggestion removes this comment line)
assert emb.shape == (loader.dataset.num_timepoints, 3)

emb = solver.transform(data, labels, session_id=i, batch_size=300)
assert emb.shape == (loader.dataset.num_timepoints, 3)

Member

Nit: it looks like pre-commit was not run on this PR.

The `set_masks` method should be called to set the masking types
and their corresponding probabilities.
"""
masks = [] # a list of Mask instances

Member

does this need to be public?

@@ -36,7 +37,7 @@
__all__ = ["Dataset", "Loader"]


class Dataset(abc.ABC, cebra.io.HasDevice):
class Dataset(abc.ABC, cebra.io.HasDevice, cebra_data_masking.MaskedMixin):

Member

Is this Mixin used anywhere else in the codebase?
