Fitting a GLM to discrete-time events #315

AamnaLawrence · 2025-02-24T23:00:47Z


Hello everyone! I am trying to determine whether the activity of a neuron is modulated by discrete-time events in the task (lever insertion in the box, reward delivery etc). I prepare the events variable as follows:

events = nap.TsGroup(
    {
        1: nap.Ts(LeverInsert.t),
        2: nap.Ts(Reward.t),
    },
    metadata={"event_type": ["Lever Insert", "Reward"]},
    time_support=nap.IntervalSet(0, np.max(LeverInsert.t)),
)

Here is what the Lever Insert event looks like:

Time (s)
46.153708
107.813152
197.822303
287.831483
340.031011
430.039936
465.079573
555.088619
645.097632
735.106611
...
3625.244659
3677.16383
3733.702946
3823.711426
3913.719999
4003.728668
4093.737208
4183.745749
4273.754195
4363.76283
shape: 61

When I try to compute the features for these time inputs, I get a bunch of NaNs. I have been playing around with different window sizes, but that did not help. Possibly because of the faulty feature matrix, all my GLM coefficients are 0. Here is the code I am using to compute the features:

bin_size = 0.01
binned_events = events.count(bin_size)
# pass time support to make sure they span the same range
count  = spikes_real.count(bin_size, ep=binned_events.time_support)

# define a basis over 500ms
window_size = int(0.5 / bin_size)
# add the basis using the label
add_basis = sum(*(nmo.basis.RaisedCosineLogConv(5, window_size, label=label)
      for label in events.event_type))

# pass each time series individually to the basis
X = add_basis.compute_features(binned_events[:, 0], binned_events[:, 1])

Has anyone used discrete-time events as inputs to GLM? Any insights on choosing the right basis functions on NeMoS would be very helpful.

Thank you!!

BalzaniEdoardo · 2025-02-25T17:02:44Z

Maintainer

@AamnaLawrence thanks for starting the discussion and for sharing the data!

I’ve looked into this more closely, and your approach to modeling events is sound. However, there are a few important considerations:

The total recording time is about 1 hour and 12 minutes, during which there are 24 reward events and 61 lever presses. This event imbalance means that if both events equally influence firing rate, statistical methods will be more confident in attributing an effect to lever presses, simply due to their higher frequency. If you’re still running experiments, increasing the number of both reward and lever events would improve robustness. Alternatively, balancing the events post hoc (by undersampling) is another option before performing any model comparison.
Standard 80/20 train-test splits (such as in 5-fold cross-validation) are not well-suited here, since each test set would contain only ~4.8 reward events (24 * 0.2), which is too few for reliable evaluation. Instead, we use a 50/50 split to ensure enough predictive power in the test set. If additional generalization testing is needed, time-based validation may also be worth considering.
Before modeling the entire population, it's useful to select a neuron that shows clear peri-event modulation. By plotting peri-event time histograms (PETHs) for all neurons, we can visually identify one that exhibits a strong firing rate change around events. This helps in choosing appropriate basis parameters, particularly the number of basis functions and whether a causal or acausal structure is needed. In the example, unit 482 was chosen because it showed strong modulation.
The firing rate is not stationary over time, fluctuating significantly. This is evident when smoothing the spike count (spikes_real[482].count(0.01).smooth(std=4)). A model with only an intercept and event predictors will likely struggle unless we account for non-stationarity. One effective approach is to add a self-history predictor (as we did in the model using the RaisedCosineLogConv basis function), allowing the model to capture longer timescale dependencies.
Since classic splitting does not work well, an alternative approach is to rank variable importance using a GroupLasso-regularized model. The strategy is to incrementally increase the regularization strength and observe the order in which predictors drop out. The most important variable will remain significant the longest. This method requires balancing events before applying it, or else the model will tend to overemphasize the more frequent event.
The first step is to fine-tune the basis function parameters by fitting an unregularized GLM and assessing its performance in terms of peri-event modulation (both qualitatively by comparing predicted and observed PETHs, and quantitatively using pseudo-R² scores). Once the model is satisfactory, we proceed to ranking variables by fitting a GroupLasso regularized model over a range of regularization strengths and tracking both the pseudo-R² score and coefficient norms.
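
The post hoc balancing by undersampling mentioned above could be sketched as follows. This is a minimal NumPy-only illustration, not a NeMoS or pynapple function: the helper name `balance_events` and the synthetic timestamps are assumptions made for the example. It randomly undersamples every event type down to the count of the rarest one.

```python
import numpy as np


def balance_events(event_times, seed=0):
    """Randomly undersample every event type down to the rarest one.

    `event_times` maps a label to a 1D array of timestamps (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    n_min = min(len(t) for t in event_times.values())
    balanced = {}
    for label, times in event_times.items():
        times = np.asarray(times)
        keep = rng.choice(len(times), size=n_min, replace=False)
        # sort so the kept timestamps stay in chronological order
        balanced[label] = np.sort(times[keep])
    return balanced


# synthetic stand-ins for the 61 lever and 24 reward timestamps
rng = np.random.default_rng(1)
lever_t = np.sort(rng.uniform(0, 4300, size=61))
reward_t = np.sort(rng.uniform(0, 4300, size=24))

balanced = balance_events({"Lever Insert": lever_t, "Reward": reward_t})
print({label: len(times) for label, times in balanced.items()})
```

The balanced arrays can then be wrapped in nap.Ts objects in the same way as the original timestamps. Repeating the analysis with several seeds gives a sense of how much the conclusions depend on which events were kept.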

Interpreting the results

If the test score drops significantly compared to training, the model may be overfitting.
If the test set is too small, the test score will have high variance, making direct train-test comparisons unreliable.
As regularization increases, the first predictors to drop out are typically the least relevant ones, allowing us to infer the most predictive features.

This first script can be used to tune the basis parameters until the fit is satisfactory (both in terms of the score and of the predicted peri-event modulation).

# Fit a GLM
import pynapple as nap
import numpy as np
import matplotlib.pyplot as plt
import nemos as nmo
import jax

jax.config.update("jax_enable_x64", True)

# load data and store events in a TsGroup
data = nap.load_file("/Users/ebalzani/Downloads/NWB_phy.nwb")

lever = data["LeverInsert"]
rew = data["Reward"]

events = nap.TsGroup(
    {
        1: nap.Ts(lever.t),
        2: nap.Ts(rew.t),
    },
    metadata={"event_type": ["Lever Insert", "Reward"]}, time_support=nap.IntervalSet(0, np.max(lever.t)))

# print number of events
print(f"Num reward events: {len(rew)}")
print(f"Num lever events: {len(lever)}")

# filter for active neurons (at least > 1Hz)
spikes_real = data["units"]
spikes_real = spikes_real[spikes_real.rate>1]

# select a neuron: I plotted the peri-event histograms of all units and picked one
# that looked modulated by eye, as a starting point to design a model.
# from the peri-event plot you can guess whether the rate modulation is causal
# (the rate changes only after the event) or acausal (the rate is modulated both before and after the event)
neu_id = 482
f, axs = plt.subplots(1, 2)
for cc, item in enumerate(events.items()):
    i, ev = item
    peth = nap.compute_perievent(
      data=spikes_real[neu_id],
      tref=ev,
      minmax=(-3, 3),
      time_unit="s")
    axs[cc].plot(np.mean(peth.count(0.05), 1)/0.05, linewidth=3, color="red")
    axs[cc].set_title(events.metadata["event_type"][i])
    axs[cc].axvline(0)  # vertical line at the event time (ymin/ymax are axes fractions, so the defaults span the full height)
    

# use a 50-50 split to have some predictive power in the test set
duration = lever.time_support.end[-1] - lever.time_support.start[0]
train = nap.IntervalSet(0, duration * 0.5)
test = nap.IntervalSet(train.end[-1] + 0.0001, lever.time_support.end[-1])

# define a window size 
bin_size = 0.01
window = int(2 / bin_size)

# bin events and spike times
binned_events = events.count(bin_size)
count = spikes_real.count(bin_size, ep=binned_events.time_support) # pass time support to make sure they span the same range

# define a basis:
# for the fairest comparison, use the same number of basis functions (and the same basis type, at least for the two variables you are directly comparing)
# for this neuron, the lever modulation looks causal (the rate increases after the event), while the reward modulation looks acausal
# you can model both types of predictors ("causal" is the default and most common)
add_basis = (
    nmo.basis.RaisedCosineLinearConv(11, window, width=4, label="Lever Insert") +  # default is causal
    nmo.basis.RaisedCosineLinearConv(11, window, width=4, label="Reward", conv_kwargs={"predictor_causality": "acausal"}) +
    nmo.basis.RaisedCosineLogConv(11, window, width=4, label="Spike History")
)

# define train and test design matrix (restrict first to avoid border effects)
X_train = add_basis.compute_features(
    binned_events[:, 0].restrict(train),
    binned_events[:, 1].restrict(train),
    count.loc[neu_id].restrict(train),
)
X_test = add_basis.compute_features(
    binned_events[:, 0].restrict(test),
    binned_events[:, 1].restrict(test),
    count.loc[neu_id].restrict(test),
)

# first check the performance of an unregularized model
model = nmo.glm.GLM(solver_name="LBFGS", solver_kwargs={"tol": 10 ** -12}).fit(
    X_train, count.loc[neu_id].restrict(train)
)

# print the scores: a massive drop from train to test indicates overfitting.
# if the test set is too small, the test score has too much variance, so K-fold scores may not be very informative
score_train = model.score(X_train, count.loc[neu_id].restrict(train), score_type="pseudo-r2-McFadden")
score_test = model.score(X_test, count.loc[neu_id].restrict(test), score_type="pseudo-r2-McFadden")
print(f"Train score: {score_train}\nTest score: {score_test}")

# compare the peri-event from the test set based on the predicted rate and the raw spikes 
rate_test = model.predict(X_test) / bin_size
f, axs = plt.subplots(1, 2)
for cc, item in enumerate(events.items()):
    i, ev = item
    peth = nap.compute_perievent(
      data=spikes_real[neu_id].restrict(test),
      tref=ev.restrict(test),
      minmax=(-3, 3),
      time_unit="s")
    peth_rate = nap.compute_perievent_continuous(
      data=rate_test,
      tref=ev.restrict(test),
      minmax=(-3, 3),
      time_unit="s")
    axs[cc].plot(np.mean(peth.count(0.05), 1)/0.05, linewidth=3, color="red", label="raw")
    axs[cc].plot(np.nanmean(peth_rate, 1),label="model")
    axs[cc].set_title(events.metadata["event_type"][i])
    axs[cc].axvline(0)  # vertical line at the event time (ymin/ymax are axes fractions, so the defaults span the full height)
    axs[cc].legend()
plt.show()

Once this is satisfactory, proceed with the GroupLasso ranking of the variables. (As a note, I quickly balanced the events, not in the script below because it was not systematic, and the variables seemed to contribute equally for this neuron.)

from tqdm import tqdm  # progress bar
# create some log-spaced regularization strength for group lasso
reg_str = np.geomspace(1E-5, 0.1, 20)
# one can use the whole data for this
X = add_basis.compute_features(binned_events[:,0], binned_events[:,1], count.loc[neu_id])

# define the variable grouping by constructing a mask
mask = np.zeros((len(add_basis), X.shape[1]))
cc = 0
for k, bas in enumerate(add_basis):
    mask[k, cc:cc+bas.n_basis_funcs] = 1
    cc += bas.n_basis_funcs
print(mask)

# initialize the arrays to store the norm of the coefficients for each predictor and the score
coeff_norms = np.zeros((len(reg_str), len(add_basis)))
scores = np.zeros(len(reg_str))

# loop over reg strength
for i, reg in enumerate(tqdm(reg_str)):  # wrap the array itself so tqdm knows the total
    regularizer = nmo.regularizer.GroupLasso(mask=mask)
    model = nmo.glm.GLM(
        regularizer=regularizer, solver_kwargs={"tol": 10 ** -12},
        regularizer_strength=reg
    ).fit(
        X, count.loc[neu_id]
    )
    coeff_dict = add_basis.split_by_feature(model.coef_, axis=0)
    # jax function that applies the norm to each vector in coeff_dict (equivalent to looping over the dict items)
    coeff_norms[i] = jax.tree_util.tree_leaves(jax.tree_util.tree_map(np.linalg.norm, coeff_dict))
    scores[i] = model.score(X, count.loc[neu_id], score_type="pseudo-r2-McFadden")

# plot the results (since the events are unbalanced, it may be misleading to interpret the output now, but if you balance
# the number of events, you can rank the variables from most to least significant)
# in general, as one adds more and more regularization, the score drops, and so does the norm of the coefficients
f, axs = plt.subplots(2, 1, sharex=True)
axs[0].plot(reg_str, scores,"-ok")
axs[0].set_xscale('log')
axs[1].plot(reg_str, coeff_norms)
axs[1].legend([b.label for b in add_basis])
plt.show()

If the basis configuration is appropriate for multiple neurons, you can use PopulationGLM to fit them all at the same time, which speeds up the computation.

6 replies

AamnaLawrence Feb 26, 2025
Author

Hi @BalzaniEdoardo! After running the code, I noticed that my pseudo-R² scores were pretty low (around 0.05). I wonder whether training the model on windows around the events might work better than using the entire interval from the first to the last lever insertion. Several other things happen during that time (the rat is freely moving and may groom, etc.), which modulate the firing rate and could lead to a poor fit.

BalzaniEdoardo Feb 27, 2025
Maintainer

Hello!

I see your point—I had similar concerns when I first started fitting Poisson models. Restricting the analysis to intervals around the event is a good idea. It should lead to a better score and also speed up the fit, even though it won’t change the estimated coefficients.
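
One way to restrict the fit to peri-event windows is a boolean mask over the time bins, sketched below with NumPy only. The event times, the session length and the window (1 s before to 3 s after each event) are assumptions made for the example. With pynapple, the analogous step would be to build a `nap.IntervalSet` from the window starts and ends and call `restrict` on the binned data.

```python
import numpy as np

bin_size = 0.01  # seconds, as in the scripts above
# hypothetical event times (s); replace with the real lever/reward timestamps
event_times = np.array([46.15, 107.81, 197.82])
# assumed window: 1 s before to 3 s after each event
pre, post = 1.0, 3.0

# centers of the bins spanning a (shortened, illustrative) session
t_end = 250.0
n_bins = int(round(t_end / bin_size))
bin_centers = (np.arange(n_bins) + 0.5) * bin_size

# True for every bin that falls inside at least one peri-event window
keep = np.zeros(n_bins, dtype=bool)
for t in event_times:
    keep |= (bin_centers >= t - pre) & (bin_centers < t + post)

print(f"kept {keep.sum()} of {keep.size} bins ({keep.mean():.1%})")
# the mask can then select rows of the design matrix and of the counts,
# e.g. X_window, y_window = X[keep], y[keep]
```

A possible design choice is to compute the convolved features on the full session first and apply the mask afterwards, so that the history preceding each window is still available to the convolution.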

I have a few thoughts on pseudo-R² scores and how I interpret them that might be helpful.

Understanding Pseudo-R² in Poisson Models

First, pseudo-R² cannot be interpreted in the same way as variance explained in linear regression. In Poisson models with low counts (e.g., when binning finely or when units have a low mean firing rate), pseudo-R² values tend to be much lower. From my experience, an extremely good fit might yield a score between 0.1 and 0.3.

Unfortunately, to my knowledge, there isn’t an established reference for interpreting the absolute values of these scores. My advice is to use pseudo-R² primarily for comparing models rather than for assessing absolute goodness-of-fit. For instance, in this paper by Benjamin et al., the authors compare GLMs to more advanced nonlinear encoding models, and the reported scores are comparable to what you might observe in your case.

Validating Model Scores via Simulation

To check whether my model’s scores are reasonable given the input and count statistics, I often simulate spike trains from a GLM with fixed coefficients. Typically, I take the coefficients I learned from fitting a GLM and use them to generate synthetic data. If the pseudo-R² scores for the simulated spikes fall within the same range as my actual fits, it suggests that my observed scores are near the best I can expect.

Here’s an example of how I generate simulated spikes and compute the pseudo-R² score:

# First, fit a GLM model
model = nmo.glm.GLM(solver_name="LBFGS", solver_kwargs={"tol": 1e-12}).fit(
    X_train, count.loc[neu_id].restrict(train)
)

# Extract model coefficients
coeff_lever = add_basis.split_by_feature(model.coef_, axis=0)["Lever Insert"]
intercept = model.intercept_

# Create a new model using the estimated coefficients
model_sim = nmo.glm.GLM()
model_sim.coef_ = coeff_lever
model_sim.intercept_ = intercept

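# Define the basis function for the predictor (same configuration as in add_basis;
# it is not used below, since the design matrix columns are taken from X_train directly)
lever_basis = nmo.basis.RaisedCosineLinearConv(
    11, window, width=4, label="Lever Insert", conv_kwargs={"predictor_causality": "causal"}
)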

# Extract the corresponding design matrix columns
predictor_lever = add_basis.split_by_feature(X_train, axis=1)["Lever Insert"]

# Simulate spikes (feed-forward only: the simulate method does not support recurrent simulations)
simulated_spikes, simulated_rate = model_sim.simulate(jax.random.PRNGKey(123), predictor_lever)

# Compute pseudo-R² score
model_sim.score(predictor_lever, simulated_spikes, score_type="pseudo-r2-McFadden")

Running this with a single predictor gives a pseudo-R² score of 0.00080415, which is very close to zero—even though I’m literally using the true model to generate the spikes.

The reason for this is that the pseudo-R² is measuring how much better the model’s predictions are compared to a Poisson model with a constant firing rate. Since lever events are rare, most design matrix entries are zero, so the improvement over a simple mean model is small.
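
The same effect can be reproduced without any fitted model. The sketch below is a library-independent NumPy illustration under stated assumptions: a 5 Hz baseline, one event roughly every 60 s, and each event multiplying the rate by e for 2 s (all made-up values). It uses the McFadden definition, 1 - LL(model) / LL(null), with a constant-rate null model, and scores the true rate that generated the counts.

```python
import numpy as np
from math import lgamma


def poisson_loglik(y, mu):
    """Poisson log-likelihood summed over bins, including the log(y!) term."""
    log_fact = np.array([lgamma(k + 1.0) for k in range(int(y.max()) + 1)])
    return np.sum(y * np.log(mu) - mu - log_fact[y])


def mcfadden_pseudo_r2(y, mu):
    """1 - LL(model) / LL(null), where the null model has a constant mean rate."""
    ll_model = poisson_loglik(y, mu)
    ll_null = poisson_loglik(y, np.full(y.shape, y.mean()))
    return 1.0 - ll_model / ll_null


rng = np.random.default_rng(0)
bin_size = 0.01                 # s
n_bins = 100_000                # 1000 s of simulated recording
baseline = 5.0 * bin_size       # assumed 5 Hz baseline, in spikes per bin

# sparse events: one every 60 s, each raising the log-rate by 1 for 2 s
event_bins = np.arange(3000, n_bins, 6000)
log_gain = np.zeros(n_bins)
for b in event_bins:
    log_gain[b:b + 200] = 1.0

true_rate = baseline * np.exp(log_gain)   # the "true model"
y = rng.poisson(true_rate)                # simulated spike counts

r2 = mcfadden_pseudo_r2(y, true_rate)
print("pseudo-R2 of the true model:", r2)
```

Because the rate differs from baseline in only a few percent of the bins, the log-likelihood gain over the constant-rate model is small relative to the total, and the score stays close to zero even though the model is exactly right.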

So your intuition is correct! 🚀

PS The fact that your score was 0.05 combined with this simulation result suggests that the self-connectivity is quite important for model predictions here.

BalzaniEdoardo Feb 27, 2025
Maintainer

You can also see this by plotting the simulated rate over time: it is essentially constant, with brief peaks when the lever events happen.

AamnaLawrence Mar 4, 2025
Author

Hi Edoardo!
I have been trying different predictors, for instance the cumulative number of trials (in case a neuron is tracking time in the session), in addition to the lever insertion times and spike history. Here is how I have been doing it:

bin_size = 0.01
window = int(2 / bin_size)
lever = LeverInsert.count(bin_size)  # LeverInsert holds the timestamps of the lever insertion events
trialcum = nap.Tsd(d=np.cumsum(lever), t=lever.t)  # cumulative number of trials

count = spikes_real.count(bin_size, ep=LeverInsert.time_support)
basis1 = nmo.basis.RaisedCosineLogConv(11, window, width=4,label='Spike History')
basis2 = nmo.basis.RaisedCosineLinearConv(11, window, width=4, label="Lever Insert")
basis3 = nmo.basis.RaisedCosineLinearConv(11, window, width=4, label='Cumulative trials')

basis=basis1+basis2+basis3

time, basis_kernel1 = basis1.evaluate_on_grid(window)
time, basis_kernel2 = basis2.evaluate_on_grid(window)
time, basis_kernel3 = basis3.evaluate_on_grid(window)

duration = LeverInsert.time_support.end[-1] - LeverInsert.time_support.start[0]
train = nap.IntervalSet(0, duration * 0.5)
test = nap.IntervalSet(train.end[-1] + 0.0001, lever.time_support.end[-1])

X_train = basis.compute_features(
    count.loc[neu_id].restrict(train),
    lever.restrict(train),
    trialcum.restrict(train),
)
X_test = basis.compute_features(
    count.loc[neu_id].restrict(test),
    lever.restrict(test),
    trialcum.restrict(test),
)

# first check the performance of an unregularized model
model = nmo.glm.GLM(solver_name="LBFGS", solver_kwargs={"tol": 10 ** -12}).fit(
    X_train, count.loc[neu_id].restrict(train)
)

score_train = model.score(X_train, count.loc[neu_id].restrict(train), score_type="pseudo-r2-McFadden")
score_test = model.score(X_test, count.loc[neu_id].restrict(test), score_type="pseudo-r2-McFadden")
print(f"Train score: {score_train}\nTest score: {score_test}")

coef=basis.split_by_feature(model.coef_, axis=0)
fig, ax = plt.subplots(1, 3, figsize=(12, 6))
temp_weights = np.einsum('b, t b -> t', coef['Spike History'], basis_kernel1)
ax[0].plot(time, temp_weights)
ax[0].set_title('Spike history kernel weight')
temp_weights = np.einsum('b, t b -> t', coef['Lever Insert'], basis_kernel2)
ax[1].plot(time, temp_weights)
ax[1].set_title('Lever insertion time kernel weight')
temp_weights = np.einsum('b, t b -> t', coef['Cumulative trials'], basis_kernel3)
ax[2].plot(time, temp_weights)
ax[2].set_title('Cumulative trial kernel weight')
fig.tight_layout(h_pad=1)

The test and train scores are around 0.005 and I get the following plots:

I would like a clear interpretation of the figures above, and to know whether the features make sense. I would also like to add features like cumulative rewards, but I am not sure how to construct the input feature for that.

Thank you for your help!

BalzaniEdoardo Mar 7, 2025
Maintainer

Hello Aamna, I finally had time to check out your latest script. Before I even start, do you need the prediction over time or would it be sufficient for you to predict the count relative to the start of the trial?

That would need a much simpler use of a GLM, where instead of predicting counts over time, your samples are directly the trials. This can be more robust to noise in case you don't have a lot of repetitions and it is a much simpler use case for GLMs.

To respond directly to your question about the interpretation see below:

Interpretation

The spike history and lever insertion filters are straightforward to interpret:

Spike history: the filter is always positive, showing that the probability of spiking increases after a spike, with a much higher probability of firing shortly after the spike. The fact that you don't see the refractory period is due to aliasing (we are binning at 10 ms, while the refractory period is 4 ms and is followed by a sharp increase in firing probability). You can use pynapple to plot the autocorrelogram at 1 ms resolution to see that.
Lever insertion: your plot shows the firing rate decreasing after lever insertion. This is not what I get when I fit the data, so are you using a different neuron id? Or does the plot not reflect the script (for example, was the basis convolution anti-causal or acausal instead of the configuration in your script)?
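
The aliasing argument can be illustrated with a synthetic spike train. The sketch below uses NumPy only and assumed values (a 4 ms absolute refractory period plus an exponential waiting time with a 30 ms mean); on real data, pynapple's `compute_autocorrelogram` is the natural tool. It builds an autocorrelogram at 1 ms resolution and then rebins it at 10 ms.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic spike train: 4 ms absolute refractory period plus an
# exponential waiting time (both values are assumptions for the demo)
refractory = 0.004
isi = refractory + rng.exponential(scale=0.03, size=5000)
spike_times = np.cumsum(isi)

# autocorrelogram: histogram of lags between all spike pairs within +-50 ms
max_lag = 0.05
edges = np.linspace(-max_lag, max_lag, 101)   # 1 ms bins
counts = np.zeros(len(edges) - 1, dtype=int)
for i, t in enumerate(spike_times):
    lo = np.searchsorted(spike_times, t - max_lag, side="left")
    hi = np.searchsorted(spike_times, t + max_lag, side="right")
    lags = np.delete(spike_times[lo:hi] - t, i - lo)   # drop the spike itself
    counts += np.histogram(lags, bins=edges)[0]

centers = (edges[:-1] + edges[1:]) / 2
n_short = counts[np.abs(centers) < 0.003].sum()
print("pairs with |lag| < 3 ms:", n_short)

# the same counts rebinned at 10 ms: the bins next to lag 0 are no longer empty
coarse = counts.reshape(10, 10).sum(axis=1)
print("counts in the two 10 ms bins around lag 0:", coarse[4:6])
```

At 1 ms resolution the bins closest to zero lag are empty, while the 10 ms bins around zero mix the refractory gap with the spikes that follow it, so the dip is blurred.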

The feature related to the cumulative trial count is hard to interpret. If you plot the features, you'll see that the convolution gives an increasing time series for each of them, and the slope of each line depends on the area under the corresponding filter. So it is essentially like having multiple features that all represent trial identity, which is not very useful.

If you need a mean rate per trial feature, I would use a 1-hot encoding of the trials as a predictor.
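
A one-hot trial predictor could be built along these lines. This is a NumPy-only sketch with hypothetical trial start times and session length: each bin is assigned to the most recent trial start, and each trial gets its own indicator column.

```python
import numpy as np

bin_size = 0.01
# hypothetical lever-insertion times (s) marking the start of each trial
trial_starts = np.array([46.15, 107.81, 197.82, 287.83])
t_end = 340.0

n_bins = int(round(t_end / bin_size))
bin_centers = (np.arange(n_bins) + 0.5) * bin_size

# index of the trial each bin belongs to (-1 for bins before the first trial)
trial_idx = np.searchsorted(trial_starts, bin_centers, side="right") - 1

# one column per trial: 1 while that trial is ongoing, 0 otherwise
one_hot = np.zeros((n_bins, len(trial_starts)))
in_trial = trial_idx >= 0
one_hot[np.flatnonzero(in_trial), trial_idx[in_trial]] = 1.0

print(one_hot.shape, one_hot.sum(axis=0))
```

These columns are used as they are, without passing them through a convolutional basis. If every bin belongs to some trial, the columns sum to one and are collinear with the intercept, in which case dropping one column (or regularizing) keeps the fit identifiable.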

Stability

I did a minimal check of filter stability by fitting the train and test data separately and plotting the resulting filters (similar to what we did in the head direction tutorial). The spike history filter is very consistent; the lever press filter is highly correlated across the two fits, but its amplitude drops significantly; the cumulative trial filter is basically uncorrelated.

I tried also to fit a model with much finer binning (1ms) and the refractory period starts showing.

duration = LeverInsert.time_support.end[-1] - LeverInsert.time_support.start[0]
train = nap.IntervalSet(0, duration * 0.5)
test = nap.IntervalSet(train.end[-1] + 0.0001, lever.time_support.end[-1])

X_train = basis.compute_features(
    count.loc[neu_id].restrict(train),
    lever.restrict(train),
    trialcum.restrict(train),
)
X_test = basis.compute_features(
    count.loc[neu_id].restrict(test),
    lever.restrict(test),
    trialcum.restrict(test),
)

dict_splits = basis.split_by_feature(X_train, axis=1)

plt.title("Cumulative trials features")
plt.plot(dict_splits["Cumulative trials"])
plt.show()


# first check the performance of an unregularized model
model = nmo.glm.GLM(solver_name="LBFGS", solver_kwargs={"tol": 10 ** -12}).fit(
    X_train, count.loc[neu_id].restrict(train)
)

model_test = nmo.glm.GLM(solver_name="LBFGS", solver_kwargs={"tol": 10 ** -12}).fit(
    X_test, count.loc[neu_id].restrict(test)
)


score_train = model.score(X_train, count.loc[neu_id].restrict(train), score_type="pseudo-r2-McFadden")
score_test = model.score(X_test, count.loc[neu_id].restrict(test), score_type="pseudo-r2-McFadden")
print(f"Train score: {score_train}\nTest score: {score_test}")

coef = basis.split_by_feature(model.coef_, axis=0)
coef_test = basis.split_by_feature(model_test.coef_, axis=0)
fig, ax = plt.subplots(1, 3, figsize=(12, 6))
temp_weights = np.einsum('b, t b -> t', coef['Spike History'], basis_kernel1)
temp_weights_test = np.einsum('b, t b -> t', coef_test['Spike History'], basis_kernel1)
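# NOTE: acg_window is not defined in this snippet; define it before running
# (presumably the spike-history window length used to scale the time axis)
ax[0].plot(time * acg_window, temp_weights)
ax[0].plot(time * acg_window, temp_weights_test)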
ax[0].set_title('Spike history kernel weight')
temp_weights = np.einsum('b, t b -> t', coef['Lever Insert'], basis_kernel2)
temp_weights_test = np.einsum('b, t b -> t', coef_test['Lever Insert'], basis_kernel2)
ax[1].plot(time, temp_weights)
ax[1].plot(time, temp_weights_test)
ax[1].set_title('Lever insertion time kernel weight')
temp_weights = np.einsum('b, t b -> t', coef['Cumulative trials'], basis_kernel3)
temp_weights_test = np.einsum('b, t b -> t', coef_test['Cumulative trials'], basis_kernel3)
ax[2].plot(time * window, temp_weights)
ax[2].plot(time * window, temp_weights_test)
ax[2].set_title('Cumulative trial kernel weight')
fig.tight_layout(h_pad=1)
plt.show()

Here are the plots on filter stability with 1 ms bins.

sjvenditto · 2025-03-07T21:20:02Z

Maintainer

Hi @AamnaLawrence, bouncing off of @BalzaniEdoardo's last comment: for your question, you might not need a GLM that models each time point. Instead, you could construct a GLM that only predicts the window around the lever or reward presentations, and you wouldn't need to use basis functions at all. For example,

import pynapple as nap
import nemos as nmo  # needed for nmo.glm.GLM below
import matplotlib.pyplot as plt
import numpy as np

data = nap.load_file("NWB_phy.nwb")
lever = data["LeverInsert"]
rew = data["Reward"]

# some control times when there is neither a lever insert nor a reward
control = np.sort(np.hstack((lever.t, rew.t)))
control = (control[1:] + control[:-1]) / 2

events = nap.TsGroup(
    {
        1: nap.Ts(lever.t),
        2: nap.Ts(rew.t),
        3: nap.Ts(control),
    },
    metadata={"event_type": ["Lever Insert", "Reward", "Control"]}, time_support=nap.IntervalSet(0, np.max(lever.t)))

# example unit 21 likes reward, 34 likes lever insert
spikes = data["units"][34]

# put all the times in a single array
all_times = np.hstack((lever.t, rew.t, control))
# feature for lever insert: 1 at lever insert times, 0 otherwise
is_lever = np.hstack((np.ones_like(lever.t), np.zeros_like(rew.t), np.zeros_like(control)))
# feature for reward: 1 at reward times, 0 otherwise
is_reward = np.hstack((np.zeros_like(lever.t), np.ones_like(rew.t), np.zeros_like(control)))
# sort all times and features
sort_idx = np.argsort(all_times)
all_times = all_times[sort_idx]
is_lever = is_lever[sort_idx]
is_reward = is_reward[sort_idx]
# concatenate the features
X = np.vstack((is_lever, is_reward)).T
# compute counts in 1 second window after each time (0 s before, 1 s after)
tref = nap.Ts(all_times)
y = nap.compute_perievent(data=spikes, tref=tref, minmax=(0,1)).count().d.ravel()
# there won't be any overlap because times are all >1 second apart
print("minimum distance between time points: ",np.min(np.diff(all_times)))

# fit a model
model = nmo.glm.GLM()
model.fit(X, y)
print("model coefficients: ",model.coef_)

which should return:

minimum distance between time points:  1.414945500000158
model coefficients:  [-0.2029657  1.0960554]

This suggests that this unit is positively modulated by reward, with a smaller (slightly negative) influence of lever insertion. Note that I added some control times as a "baseline" when there is neither a lever insert nor a reward. This can be done in any number of ways, but for simplicity I just took each midpoint between adjacent events.

Since you don't have many trials for cross-validation, there are other ways to determine the impact of these features, such as the partial R² or coefficient of partial determination (CPD) (see https://online.stat.psu.edu/stat462/node/138/). You can compute this using the summed deviance of the model fits:

# get deviance for full model
dev_full = np.sum(model.observation_model.deviance(y, model.predict(X)))

# fit a model with only the reward feature (i.e. drop the lever insert feature)
model_nolever = nmo.glm.GLM()
model_nolever.fit(X[:,1:], y)
dev_nolever = np.sum(model_nolever.observation_model.deviance(y, model_nolever.predict(X[:,1:])))

# fit a model with only the lever insert feature (i.e. drop the reward feature)
model_noreward = nmo.glm.GLM()
model_noreward.fit(X[:,:1], y)
dev_noreward = np.sum(model_noreward.observation_model.deviance(y, model_noreward.predict(X[:,:1])))

# cpd for lever insert
cpd_lever = (dev_nolever - dev_full) / dev_nolever
print("lever: ", cpd_lever)

# cpd for reward
cpd_reward = (dev_noreward - dev_full) / dev_noreward
print("reward: ", cpd_reward)

which should return:

lever:  0.004883038
reward:  0.14686693

This suggests that about 15% of the variation during this window is explained by reward, and only about 0.5% by lever insertion. You can see this type of GLM and analysis in this study: https://elifesciences.org/articles/64575 (see figure 4). It takes the example above a step further: instead of fitting a single trial-locked window, it fits multiple GLMs for many smaller trial-locked windows and looks at how the CPD changes over time. The significance of the CPD can be computed by comparing it to the CPDs of "null" datasets obtained by circularly permuting the stimulus identity (see that manuscript for further explanation).
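
The permutation test on the CPD could be sketched as follows. This is a self-contained NumPy illustration on synthetic data, not the procedure from the linked study: the small IRLS Poisson fitter stands in for nmo.glm.GLM, the helper names are hypothetical, and "circularly permuting the stimulus identity" is read here as circularly shifting the rows of X relative to y.

```python
import numpy as np


def fit_poisson(X, y, n_iter=25):
    """Fit a Poisson GLM (log link, with intercept) by IRLS; return predicted means."""
    Xb = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xb.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        eta = Xb @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                # working response
        beta = np.linalg.solve(Xb.T @ (mu[:, None] * Xb), Xb.T @ (mu * z))
    return np.exp(Xb @ beta)


def deviance(y, mu):
    """Summed Poisson deviance, with the convention 0 * log(0) = 0."""
    safe_y = np.where(y > 0, y, 1)
    return 2.0 * np.sum(y * np.log(safe_y / mu) - (y - mu))


def cpd(X, y, col):
    """Coefficient of partial determination for column `col` of X."""
    dev_full = deviance(y, fit_poisson(X, y))
    dev_red = deviance(y, fit_poisson(np.delete(X, col, axis=1), y))
    return (dev_red - dev_full) / dev_red


rng = np.random.default_rng(0)
n = 150
# window labels in random order: 0 = lever, 1 = reward, 2 = control
labels = rng.permutation(np.repeat([0, 1, 2], n // 3))
X = np.column_stack([labels == 0, labels == 1]).astype(float)
# assumed ground truth: reward raises the count, lever has no effect
y = rng.poisson(np.exp(np.log(3.0) + 1.0 * X[:, 1]))

observed = np.array([cpd(X, y, 0), cpd(X, y, 1)])

# null distribution: circularly shift the rows of X relative to y
null = np.array([[cpd(np.roll(X, s, axis=0), y, c) for c in (0, 1)]
                 for s in range(1, n)])
p_values = (1 + np.sum(null >= observed, axis=0)) / (1 + len(null))
print("CPD (lever, reward):", observed)
print("p-values (lever, reward):", p_values)
```

The p-value is the fraction of shifted datasets whose CPD is at least as large as the observed one, so its resolution is limited by the number of windows.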

However, since these stimuli never seem to overlap, fitting a GLM for this question might be overkill. You can reach the same conclusion by computing PETHs separately for each event and comparing them:

# PETH for lever presses
lever_pe = nap.compute_perievent(data=spikes, tref=events[1], minmax=(-5,5))
lever_peth = np.mean(lever_pe.count(0.2), axis=1) / 0.2
plt.plot(lever_peth, label="Lever Insert")

# PETH for rewards
reward_pe = nap.compute_perievent(data=spikes, tref=events[2], minmax=(-5,5))
reward_peth = np.mean(reward_pe.count(0.2), axis=1) / 0.2
plt.plot(reward_peth, label="Reward")

# PETH for control times
control_pe = nap.compute_perievent(data=spikes, tref=events[3], minmax=(-5,5))
control_peth = np.mean(control_pe.count(0.2), axis=1) / 0.2
plt.plot(control_peth, label="Control")
plt.legend()
plt.ylabel("Firing Rate (Hz)")
plt.xlabel("Time from event (s)")

As you can see for this unit, there is a clear peak at the reward time that doesn't occur during lever insert or control windows.

2 replies

BalzaniEdoardo Mar 7, 2025
Maintainer

I agree with Sarah: especially if you are comparing just two variables that are not correlated, a GLM could be overkill. The model-based approach could still be interesting for ranking variables by relevance if you have more than two predictors.
You can calculate the CPD or use the lasso-based ranking (the approach I mentioned in the first post). Note that a regular lasso would be what you need here, since the GLM approach that Sarah showed doesn't require multiple coefficients per event.

Independently of the method, if you do any sort of testing (like permutation testing, or computing multiple PETHs over randomly sampled times to get a noise level), always balance the number of events between your categories. Otherwise, the variance of whichever estimate you use (GLM coefficients or PETH) could be higher for the sparser events, biasing your conclusions.

AamnaLawrence Mar 7, 2025
Author

Hi Sarah and Edoardo!
This was incredibly helpful. Thank you so much for your time and support. I will have more variables in my model: lever press times, lick times, port entry times, time across the session, cumulative rewards (to investigate satiety effects) etc and then ranking them for each neuron so I think GLM should still be valid. Would normalizing features like cumulative rewards and time along the session need to be done? I am aiming to build a glm model like this one: https://www.science.org/doi/full/10.1126/sciadv.abc9321
Edoardo, could you elaborate a little more on balancing events and/or refer to some publications/tutorials that have done that? Also, how can I find interaction among the regressors?
Thank you once again.
