
Output files MATLAB

klaragerlei edited this page Sep 13, 2018 · 1 revision

Outputs of the post-clustering MATLAB script

These are uploaded to the lab's datastore and saved in the folder of the specific recording.

Firing times and snippets

Firings0 - This is for the dataset where all channels are whitened together and then tetrodes are sorted separately

Firings1 - This is when tetrodes are whitened separately and then sorted separately

http://mountainsort.readthedocs.io/en/latest/first_sort.html?highlight=output%20file

These are MATLAB files created by the post-processing MATLAB script.

Spreadsheet containing high level information on clusters. This is used by Makedatasheet later on.

datasave_all.mat

datasave_separate.mat


Outputs of Makedatasheet curation script

This script runs on a folder that may contain multiple recording folders, and saves its output at the top level of that folder. It uses the Firings0 and Firings1 files saved by the post-clustering script.

data_all.csv - this is based on tetrode data where all channels were whitened together

data_separate.csv - this is based on tetrode data that was whitened tetrode by tetrode

In these spreadsheets, each row represents a cluster. In the next section, I will define what the columns of this spreadsheet are.

Basic session information and IDs

animal - name of animal taken from the first part of the folder name, [M6]

day - date of the recording based on folder name, [12/03/2018]

tetrode - ID of tetrode where the cluster was recorded, [3]

cluster - ID of cluster on given tetrode and day [10]

nspikes - number of spikes that belong to the cluster from the whole session [278960]

coverage - percentage of the surface area of the open field arena that the animal covered during the exploration

The arena is binned to 2.5 cm * 2.5 cm squares, and then based on the position information, the percentage of bins visited by the animal is calculated.
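The binning step described above can be sketched as follows. This is a minimal illustration, not the code of the MATLAB script: the function name, the argument names, and the assumption that positions are given in cm with the origin at one corner of the arena are all hypothetical.

```python
import math

def coverage_percent(xs, ys, arena_width_cm, arena_height_cm, bin_cm=2.5):
    """Percentage of arena bins visited at least once (illustrative sketch)."""
    n_cols = math.ceil(arena_width_cm / bin_cm)
    n_rows = math.ceil(arena_height_cm / bin_cm)
    visited = set()
    for x, y in zip(xs, ys):
        col = int(x // bin_cm)
        row = int(y // bin_cm)
        if col < 0 or row < 0:
            continue  # assumption: samples outside the arena are ignored
        # Samples exactly on the far edge are assigned to the last bin.
        visited.add((min(row, n_rows - 1), min(col, n_cols - 1)))
    return 100.0 * len(visited) / (n_rows * n_cols)
```

For example, three position samples that fall into two of the four bins of a 5 cm x 5 cm arena give a coverage of 50%.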

avgFR - average firing rate

The total number of spikes (within the cluster) is divided by the total recording time.

Waveform data

maxamplitude - This is the highest out of the 4 average amplitudes of the tetrode

maxchannel - the channel of the tetrode (1-4) that has the highest average amplitude

spikewidth - peak-trough of mean waveform, the mean is taken from the channel with the highest amplitude spike (given in number of samples)
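One possible reading of the three waveform columns is sketched below. It assumes that "amplitude" means the peak-to-trough difference of each channel's mean waveform and that spike width is the number of samples between trough and peak; neither assumption has been checked against the MATLAB script, and the function name is hypothetical.

```python
def waveform_summary(mean_waveforms):
    """mean_waveforms: four lists, one mean waveform per tetrode channel."""
    # Assumption: amplitude = peak-to-trough difference of the mean waveform.
    amplitudes = [max(w) - min(w) for w in mean_waveforms]
    best = amplitudes.index(max(amplitudes))
    waveform = mean_waveforms[best]
    peak_idx = waveform.index(max(waveform))
    trough_idx = waveform.index(min(waveform))
    return {
        "maxamplitude": amplitudes[best],
        "maxchannel": best + 1,  # channels are numbered 1-4
        "spikewidth": abs(peak_idx - trough_idx),  # in samples
    }
```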

Head-direction data

function [frh_hd,meandir_hd,r_hd]=plothd(hd,spkhd,sampling_rate,subplots)

Parameters

hd - a vector calculated in readBonsai for each unit of time as the angle (degrees) of the line between two beads attached to the headstage. It is an output of get_position_data, which extracts it from the raw data with GetPostSyncedCorr, which in turn uses readBonsai. Bonsai records the positions of the two beads on the animal's headstage.

spkhd - a vector of the values of hd at the times when the cell fired.

sampling_rate - the sampling rate.

subplots - the coordinates on the output figure where the subplots should go.

frh_hd is calculated from frh, which is the polar histogram of the heading direction corresponding to firing of each spike normalized to the overall distribution of heading directions. It's the heading direction (degrees) at which the firing rate is maximal.

meandir_hd is the same as frh_hd.

r_hd is the radius of the preferred head direction.

These are calculated in plothd.m and polat_hist.m and definitely need double checking. These are the three values (Peak, PFD, |r|) that are written above the HD plot on the output figure.

HD_maxFR - maximum firing rate of the cell among the firing rates calculated for each angle the animal's head can face. It is calculated by dividing the polar histogram of the cluster's spikes by the polar histogram of the time the animal spent facing each direction (spike_hist/head_dir_hist). The firing rate is given in Hz (1/s). This is called frh_hd in plothd.m.
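The division of the two histograms can be illustrated with the sketch below. It is not the code of plothd.m: the 10-degree bin width, the function name, and the choice to report 0 Hz for directions the animal never faced are assumptions made for the example.

```python
def hd_rate_histogram(hd_deg, spike_hd_deg, sampling_rate_hz, bin_deg=10):
    """Firing rate (Hz) per head-direction bin: spike counts / dwell time."""
    n_bins = int(round(360 / bin_deg))

    def bin_index(angle):
        return int((angle % 360) // bin_deg) % n_bins

    dwell_samples = [0] * n_bins
    for angle in hd_deg:
        dwell_samples[bin_index(angle)] += 1
    spike_counts = [0] * n_bins
    for angle in spike_hd_deg:
        spike_counts[bin_index(angle)] += 1

    rates = []
    for spikes, samples in zip(spike_counts, dwell_samples):
        dwell_s = samples / sampling_rate_hz  # time spent facing this bin
        rates.append(spikes / dwell_s if dwell_s > 0 else 0.0)
    return rates
```

Under these assumptions, HD_maxFR corresponds to `max(rates)`.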

meanHD - the mean direction the animal was facing when the cell fired. This is the 'preferred direction'. It is given in degrees. This is called meandir_hd in plothd.m

r_HD - radius of head-direction histogram when the cell fired at preferred direction. This represents how much the mouse was facing that way while the cell fired relative to the whole recording session. x and y components of the firing vectors in the polar plot are calculated (sin and cos alpha are used in a unit circle). All x and y components of the polar plot are added together normalized to firing rate, and then 'r' is calculated by applying the Pythagorean theorem. This number is between 0 and 1, the higher the more direction specific the firing.
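The vector sum described above can be sketched as follows, assuming the input is a firing rate per head-direction bin together with the bin centres in degrees. The function name and the handling of an all-zero histogram are hypothetical; this is an illustration of the calculation, not the code of plothd.m.

```python
import math

def mean_vector(bin_centres_deg, rates):
    """Return (mean direction in degrees, vector length r between 0 and 1)."""
    total = sum(rates)
    if total == 0:
        return None, 0.0  # assumption: no firing means no preferred direction
    # x and y components on the unit circle, weighted by firing rate.
    x = sum(r * math.cos(math.radians(a)) for a, r in zip(bin_centres_deg, rates))
    y = sum(r * math.sin(math.radians(a)) for a, r in zip(bin_centres_deg, rates))
    length = math.sqrt(x ** 2 + y ** 2) / total  # Pythagorean theorem, normalised
    direction = math.degrees(math.atan2(y, x)) % 360
    return direction, length
```

A cell that fires in a single direction bin gives r close to 1, and a cell that fires equally in all directions gives r close to 0.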

These values are calculated from data when the mouse was stationary

skaggs - According to Bri's thesis, this is the 'spatial information score' published by Skaggs, McNaughton and Markus (1993). It represents the amount of information about the location of the animal that is encoded in each spike.

$$\sum_i p_i \, \frac{L_i}{L} \, \log_2\!\left(\frac{L_i}{L}\right),$$

where L_i is the average firing rate of a unit in the i-th bin, L is the overall average firing rate, and p_i is the probability of the animal being in the i-th bin (dwell time in the i-th bin / total recording time).
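As a worked illustration of the formula, the sketch below computes the score from a list of per-bin firing rates and per-bin dwell times. The function name is hypothetical, the overall rate L is taken as the occupancy-weighted mean of the bin rates, and bins with zero rate or zero occupancy are skipped (they contribute nothing in the limit); the MATLAB script may handle these cases differently.

```python
import math

def skaggs_information(rates_hz, dwell_s):
    """Spatial information in bits per spike from per-bin rates and dwell times."""
    total_time = sum(dwell_s)
    p = [t / total_time for t in dwell_s]
    mean_rate = sum(p_i * l_i for p_i, l_i in zip(p, rates_hz))
    if mean_rate == 0:
        return 0.0
    info = 0.0
    for p_i, l_i in zip(p, rates_hz):
        if p_i > 0 and l_i > 0:
            ratio = l_i / mean_rate
            info += p_i * ratio * math.log2(ratio)
    return info
```

A cell that fires at the same rate everywhere scores 0, while a cell that fires in only one of two equally visited bins scores 1 bit per spike.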

sparsity - quantifies how spatially sparse the firing of the cell is (Jung et al. 1994, J Neurosci). The script says: 'Warning this may all be incorrect'.

spatialcoherence - spatial coherence is an estimate of firing pattern quality. It is calculated as described in Kubie et al. 1990 J Neurosci. (http://www.jneurosci.org/content/jneuro/9/12/4101.full.pdf), using the Fisher r-to-Z' transform to normalise the correlation; values of +-1.96 correspond to the 95% confidence interval. The third estimate of orderliness of the spatial firing distribution is a first-order autocorrelation that will be called 'coherence'. Coherence is the z-transform of the correlation between a list of firing rates in each pixel and a corresponding list of firing rates averaged over the 8 nearest neighbors of each pixel. Coherence measures the extent to which the firing rate in a pixel is predicted by the rates of its neighbors, and therefore estimates the local orderliness of the spatial firing pattern.
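The definition above can be sketched as follows for a firing rate map stored as a list of rows. This is an illustration only: the function name is hypothetical, and pixels on the edge of the map are averaged over the neighbors that exist, which is an assumption rather than something stated by the source or the script.

```python
import math

def spatial_coherence(rate_map):
    """Fisher z-transform of the correlation between each pixel's rate
    and the mean rate of its (up to 8) nearest neighbors."""
    rows, cols = len(rate_map), len(rate_map[0])
    centre, neighbours = [], []
    for r in range(rows):
        for c in range(cols):
            vals = [rate_map[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))
                    if (rr, cc) != (r, c)]
            centre.append(rate_map[r][c])
            neighbours.append(sum(vals) / len(vals))
    n = len(centre)
    mean_c, mean_n = sum(centre) / n, sum(neighbours) / n
    cov = sum((a - mean_c) * (b - mean_n) for a, b in zip(centre, neighbours))
    var_c = sum((a - mean_c) ** 2 for a in centre)
    var_n = sum((b - mean_n) ** 2 for b in neighbours)
    if var_c == 0 or var_n == 0:
        raise ValueError("correlation is undefined for a constant map")
    r_value = cov / math.sqrt(var_c * var_n)
    if abs(r_value) >= 1.0:
        return math.copysign(math.inf, r_value)  # atanh diverges at +-1
    return math.atanh(r_value)  # Fisher r-to-z transform
```

A smooth map, where neighboring pixels have similar rates, gives a positive value; a checkerboard map gives a negative one.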

maxFRspatial - firing rate is calculated for each bin of the open field, and the highest number is taken out of these

gridscore - we could also output grid spacing, field size, grid orientation, and ellipticity, but these are not saved now

A script which analyses grid cell autocorrelograms and outputs several commonly used grid field measures.

Input:
amap = the autocorrelation matrix for processing
binsize = the bin size (cm) used to create the original firing rate map

Output:
	Grid score:
	Defined by Krupic, Bauza, Burton, Barry, O'Keefe (2015) as the difference between the minimum correlation coefficient for autocorrelogram 
	rotations of 60 and 120 degrees and the maximum correlation coefficient for autocorrelogram rotations of 30, 90 and 150 degrees.
	This score can vary between -2 and 2, although values below -1.5 or above 1.5 are generally uncommon

	Grid spacing/wavelength:
	Defined by Hafting, Fyhn, Molden, Moser, Moser (2005) as the distance from the central autocorrelogram peak to the vertices of the inner 
	hexagon in the autocorrelogram (the median of the six distances). 
	This should be in cm.

	Field Size:
	Defined by Wills, Barry, Cacucci (2012) as the square root of the area of the central peak of the autocorrelogram divided by pi.
	This should be in cm2

	Grid orientation:
	Defined by Hafting, Fyhn, Molden, Moser, Moser (2005) as the angle between a camera-defined reference line (0 degrees or x axis) 
	and a vector to the nearest vertex of the inner hexagon in the counterclockwise direction
	This is in degrees and can vary between 0 and 59 (after 59 a new field should emerge at 0 if it's a grid cell).

	Ellipticity/eccentricity:
	As measured by Krupic, Bauza, Burton, Barry, O'Keefe (2015) by fitting an ellipse to the six central peaks of the local spatial 
	autocorrelogram using a least squares method. Eccentricity e was used as a measure of ellipticity (with 0 indicating a perfect 
	circle): e = sqrt(1 - (b^2/a^2)) where a and b are the major and minor axis lengths respectively
	This varies between 0 and 1; 0 is the ellipticity of a perfect circle, 1 is the ellipticity of a parabola
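Two of the definitions above reduce to short formulas, sketched here. The sketch assumes that the correlation coefficients between the autocorrelogram and its rotated copies have already been computed (the rotation step itself is not shown), and the function and argument names are hypothetical.

```python
import math

def grid_score(corr_by_angle):
    """corr_by_angle maps a rotation angle in degrees to the correlation
    between the autocorrelogram and its rotated copy."""
    on_peaks = min(corr_by_angle[60], corr_by_angle[120])
    off_peaks = max(corr_by_angle[30], corr_by_angle[90], corr_by_angle[150])
    return on_peaks - off_peaks

def eccentricity(major_axis, minor_axis):
    """e = sqrt(1 - b^2 / a^2) for a fitted ellipse with axes a >= b."""
    return math.sqrt(1 - (minor_axis ** 2) / (major_axis ** 2))
```

For example, correlations of 0.8 and 0.7 at 60 and 120 degrees and of -0.2, -0.1 and -0.3 at 30, 90 and 150 degrees give a grid score of 0.7 - (-0.1) = 0.8.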

These values are calculated from data when the mouse was not stationary (running); the definitions are the same as above

skaggsrun

sparsrun

spatialcoherencerun

maxFRspatialrun

gridscorerun

The following values are related to light stimulation - this expects 100 pulses, each 3 ms long

This analysis is based on Kvitsiani et al (2013) and uses SALT analysis

SALT Stimulus-associated spike latency test. [P I] = SALT(SPT_BASELINE,SPT_TEST,DT,WN) calculates a modified version of Jensen-Shannon divergence (see Endres and Schindelin, 2003) for spike latency histograms.

Input arguments:

   SPT_BASELINE - Discretized spike raster for stimulus-free baseline
       period. N x M binary matrix with N rows for trials and M 
       columns for spikes. Spike times have to be converted to a
       binary matrix with a temporal resolution provided in DT. The
       baseline segment has to exceed the window size (WN) multiple
       times, as the length of the baseline segment divided by the
       window size determines the sample size of the null
       distribution (see below).
   SPT_TEST - Discretized spike raster for test period, i.e. after
       stimulus. N x M binary matrix with N rows for trials and M 
       columns for spikes. Spike times have to be converted to a
       binary matrix with a temporal resolution provided in DT. The
       test segment has to exceed the window size (WN). Spikes out of
       the window are disregarded.
   DT - Time resolution of the discretized spike rasters in seconds.
   WN - Window size for baseline and test windows in seconds
       (optional; default, 0.01 s).

Output arguments:

   P - Resulting P value for the Stimulus-Associated spike Latency
       Test.
   I - Test statistic, difference between within baseline and 
       test-to-baseline information distance values. 

Briefly, the baseline binned spike raster (SPT_BASELINE) is cut to non-overlapping epochs (window size determined by WN) and spike latency histograms for first spikes are computed within each epoch. A similar histogram is constructed for the test epoch (SPT_TEST). Pairwise information distance measures are calculated for the baseline histograms to form a null-hypothesis distribution of distances. The distances of the test histogram and all baseline histograms are calculated and the median of these values is tested against the null-hypothesis distribution, resulting in a p value (P).

Reference: Endres DM, Schindelin JE (2003) A new metric for probability distributions. IEEE Transactions on Information Theory 49:1858-1860.
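The two building blocks of the procedure, a first-spike latency histogram and an information distance between two histograms, can be sketched as below. This is not the SALT code: it uses the plain Jensen-Shannon divergence in bits rather than the modified version SALT applies, the extra "no spike" bin is an assumption, and the function names are hypothetical.

```python
import math

def first_spike_latency_hist(raster):
    """raster: list of trials, each a list of 0/1 values per time bin.
    Returns a normalised histogram of first-spike latencies; the last
    entry counts trials without any spike (an assumption of this sketch)."""
    n_bins = len(raster[0])
    counts = [0] * (n_bins + 1)
    for trial in raster:
        latency = next((i for i, v in enumerate(trial) if v), n_bins)
        counts[latency] += 1
    return [c / len(raster) for c in counts]

def js_divergence(p, q):
    """Plain Jensen-Shannon divergence (bits) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical histograms give a divergence of 0, and histograms with no overlap give 1 bit.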

lightscoreP - Resulting P value for the Stimulus-Associated spike Latency Test.

lightscoreI - Test statistic, difference between within baseline and test-to-baseline information distance values.

lightlatency - latency of responses

percentresponse - % of stimulation trials that had a response

The same is calculated for the second set of 100 pulses

lightscore_p2

lightscore_I2

lightlatency2

percentresponse2

This is for a 50Hz stimulation protocol (100ms on, 3ms pulses, 100ms off)

lightscore_p3

lightscore_I3

lightlatency3

percentresponse3

This is for a 50Hz stimulation protocol (200ms on, 3ms pulses, 200ms off)

lightscore_p4

lightscore_I4

lightlatency4

percentresponse4

These are calculated by the curation script

cluster - cluster ID is repeated for readability

goodcluster - 1 if the cluster passed curation criteria, 0 if not

firing_rate - average firing rate repeated for readability

FRpass - 1 if the firing rate is high enough for analysis, 0 if not

isolation - The isolation metric quantifies how well separated (in feature space) the cluster is from other nearby clusters. Clusters that are not well separated from others would be expected to have high false-positive and false-negative rates due to mixing with overlapping clusters. This quantity is calculated in a nonparametric way based on nearest-neighbor classification.

isolationpass - 1 if cluster passed isolation criteria, 0 otherwise

noiseoverlap - Noise overlap estimates the fraction of “noise events” in a cluster, i.e., above-threshold events not associated with true firings of this or any of the other clustered units. A large noise overlap implies a high false-positive rate. The procedure first empirically computes the expected waveform shape for noise events that have by chance crossed the detection threshold. It assesses the extent of feature space overlap between the cluster and a set of randomly selected noise clips after correcting for this expected noise waveform shape.

The noise overlap and isolation metrics vary between 0 and 1, and in a sense, represent the fraction of points that overlap either with another cluster (isolation metric) or with the noise cluster (noise overlap metric). However, they should not be interpreted as a direct estimate of the misclassification rate but should rather be considered to be predictive of this quantity. Indeed, due to the way they are computed, these values will depend on factors such as the dimensionality of the feature space and the noise properties of the underlying data. Therefore, the annotation thresholds should be chosen to suit the application. With that said, in this study we used the same sorting parameters and annotation thresholds for all analyses.

noiseoverlappass - 1 if cluster passed noise overlap criteria, 0 otherwise

peakSNR - Depending on the nature of signal contamination in the dataset, some clusters may consist primarily of high-amplitude artifactual signals such as those that arise from movement, muscle, or other non-neural sources. In this case, the variation among event voltage clips will be large compared with clusters that correspond to neural units. To automatically exclude such clusters we compute cluster SNR, defined as the peak absolute amplitude of the average waveform divided by the peak SD. The latter is defined as the SD of the aligned clips in the cluster, taken at the channel and time sample where this quantity is largest.
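The ratio defined above can be sketched for a small set of aligned clips. The data layout (`clips[k][channel][sample]`), the function name, and the use of the population standard deviation are assumptions made for the example; the sorting software may compute the SD differently.

```python
import statistics

def peak_snr(clips):
    """clips[k][c][t]: voltage of event k on channel c at time sample t."""
    n_channels = len(clips[0])
    n_samples = len(clips[0][0])
    peak_amplitude = 0.0
    peak_sd = 0.0
    for c in range(n_channels):
        for t in range(n_samples):
            values = [clip[c][t] for clip in clips]
            # Peak absolute amplitude of the average waveform.
            peak_amplitude = max(peak_amplitude, abs(statistics.mean(values)))
            # SD across clips, taken where it is largest.
            peak_sd = max(peak_sd, statistics.pstdev(values))
    if peak_sd == 0:
        return float("inf")  # assumption: identical clips have no noise
    return peak_amplitude / peak_sd
```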

peakSNRpass - 1 if cluster passed peak signal to noise criteria, 0 otherwise

burstingparent - this may be the ID of another cluster that this cluster might be part of but couldn't be sorted into because of bursting (??)
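A spreadsheet such as data_all.csv could be filtered for curated clusters along the following lines. This sketch assumes the CSV header row uses exactly the column names listed on this page, which has not been verified; the function name is hypothetical. Note that because the cluster column is repeated, `csv.DictReader` keeps only the last occurrence of that name.

```python
import csv
import io

def load_good_clusters(csv_file):
    """Return the rows (as dicts) whose goodcluster flag equals 1."""
    reader = csv.DictReader(csv_file)
    good = []
    for row in reader:
        try:
            flag = float(row.get("goodcluster", ""))
        except (TypeError, ValueError):
            continue  # assumption: rows without a valid flag are skipped
        if flag == 1:
            good.append(row)
    return good

# Usage with an in-memory example; a real file would be opened with
# open("data_all.csv", newline="") instead.
example = io.StringIO(
    "animal,day,tetrode,cluster,goodcluster\n"
    "M6,12/03/2018,3,10,1\n"
    "M6,12/03/2018,3,11,0\n"
)
good_rows = load_good_clusters(example)
```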

We should consider renaming several of these output files and moving them to a subfolder within the recording folder.
