Efficient Compression of Neural Recordings Using VQ-CAE: A Vector Quantized Convolutional Autoencoder Approach
Neuroscience and brain-computer interfaces (BCIs) rely heavily on the recording and analysis of neural activity. Capturing neural signals from the motor cortex of non-human primates during tasks such as playing video games generates a substantial volume of data. For instance, the N1 implant generates data at roughly 200 Mbps (1024 electrodes sampled at 20 kHz with 10-bit resolution) but can only transmit 1 Mbps wirelessly. This discrepancy necessitates a compression ratio exceeding 200x to enable real-time data transmission. Additionally, the compression algorithm must process data within 1 millisecond (ms) and consume less than 10 milliwatts (mW) of power, including radio transmission.
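As a sanity check, the required compression ratio follows directly from these figures; the snippet below is a back-of-the-envelope calculation in plain Python, independent of the project code:

```python
electrodes = 1024
sample_rate_hz = 20_000
bits_per_sample = 10

raw_rate_bps = electrodes * sample_rate_hz * bits_per_sample  # 204,800,000 bits/s ~ 205 Mbps
wireless_budget_bps = 1_000_000                               # 1 Mbps telemetry budget
required_ratio = raw_rate_bps / wireless_budget_bps           # ~205x minimum compression
print(f"Raw rate: {raw_rate_bps / 1e6:.1f} Mbps, required compression ratio: {required_ratio:.0f}x")
```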
Current neural data compression techniques include lossy and lossless methods. Lossless techniques preserve the exact original data but often achieve lower compression ratios. Lossy techniques, such as Discrete Cosine Transform (DCT) and wavelet-based methods, achieve higher compression ratios but at the expense of data fidelity. Recent advancements include the use of machine learning models like Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) for neural data compression. However, these methods often face challenges such as mode collapse and high computational requirements. The proposed VQ-CAE method seeks to address these issues by combining Vector Quantized Variational Autoencoders (VQ-VAEs) with Convolutional Autoencoders (CAEs) to achieve a high compression ratio while maintaining the integrity of neural data.
The primary objective of this study is to propose a novel hybrid compression technique named "VQ-CAE: Vector Quantized Convolutional Autoencoder" to compress neural recordings from the motor cortex of a non-human primate. The goal is to achieve a high compression ratio while preserving the integrity of the neural data for subsequent analysis and interpretation.
The VQ-VAE consists of an encoder, a codebook, and a decoder. The encoder maps the input data (x) to a latent space representation (z_e):
```python
import torch
import torch.nn as nn


class VQVAE(nn.Module):
    def __init__(self, input_dim, latent_dim, num_embeddings, embedding_dim, commitment_cost=0.25):
        super(VQVAE, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim)
        )
        # Discrete codebook; embedding_dim must equal latent_dim for the distance computation in forward()
        self.codebook = nn.Embedding(num_embeddings, embedding_dim)
        self.codebook.weight.data.uniform_(-1 / num_embeddings, 1 / num_embeddings)
        self.decoder = nn.Sequential(
            nn.Linear(embedding_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, input_dim),
            nn.Tanh()
        )
        self.commitment_cost = commitment_cost

    def encode(self, x):
        return self.encoder(x)

    def decode(self, z):
        return self.decoder(z)

    def quantize(self, encoding_indices):
        return self.codebook(encoding_indices)

    def forward(self, x):
        z = self.encode(x)
        # Nearest-neighbour lookup: index of the closest codebook vector for each latent
        encoding_indices = torch.argmin(torch.sum((z.unsqueeze(1) - self.codebook.weight) ** 2, dim=2), dim=1)
        z_q = self.quantize(encoding_indices)
        commitment_loss = self.commitment_cost * torch.mean((z_q.detach() - z) ** 2)
        # Straight-through estimator: pass decoder gradients directly to the encoder output
        z_q = z + (z_q - z).detach()
        x_recon = self.decode(z_q)
        return x_recon, z, encoding_indices, commitment_loss
```
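As a quick sanity check of the tensor shapes, the snippet below runs a forward pass with the hyperparameters used later in this write-up (8000-sample segments, an 8-dimensional latent, 128 codebook entries); it is illustrative only:

```python
vqvae = VQVAE(input_dim=8000, latent_dim=8, num_embeddings=128, embedding_dim=8)
x = torch.randn(4, 8000)                       # a batch of 4 dummy segments
x_recon, z, indices, commit_loss = vqvae(x)
print(x_recon.shape, z.shape, indices.shape)   # (4, 8000), (4, 8), (4,)
```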
The CAE consists of an encoder and a decoder, both implemented using convolutional layers. The CAE encoder further compresses the reconstructed data (\hat{x}) from the VQ-VAE into a lower-dimensional latent space representation (z_c):
```python
class CAE(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super(CAE, self).__init__()
        # Three stride-2 convolutions reduce the temporal length by 8x and expand to latent_dim channels
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, latent_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU()
        )
        # Mirror-image transposed convolutions restore the original temporal length
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(latent_dim, 32, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(16, 1, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Tanh()
        )

    def forward(self, x):
        z = self.encoder(x)
        x_recon = self.decoder(z)
        return x_recon
```
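For reference, the sketch below traces the shapes through the CAE with the configuration used by the hybrid model below; note that the input_dim argument is not actually consumed by the convolutional layers:

```python
cae = CAE(input_dim=8, latent_dim=16)   # mirrors CAE(latent_dim_vqvae, latent_dim_cae) in HybridModel
x = torch.randn(4, 1, 8000)             # (batch, channels, samples)
z_c = cae.encoder(x)                    # (4, 16, 1000): temporal length reduced 8x
x_hat = cae.decoder(z_c)                # (4, 1, 8000): restored to the original length
print(z_c.shape, x_hat.shape)
```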
The VQ-CAE hybrid model is trained using a custom loss function that combines the reconstruction losses from both the VQ-VAE and CAE, along with the commitment loss and a weighting factor (\alpha):
```python
class HybridModel(nn.Module):
    def __init__(self, input_dim, latent_dim_vqvae, latent_dim_cae, num_embeddings, embedding_dim, commitment_cost=0.25):
        super(HybridModel, self).__init__()
        self.vqvae = VQVAE(input_dim, latent_dim_vqvae, num_embeddings, embedding_dim, commitment_cost)
        self.cae = CAE(latent_dim_vqvae, latent_dim_cae)

    def forward(self, x):
        x_recon_vqvae, z_vqvae, encoding_indices, commitment_loss = self.vqvae(x)
        z_cae = self.cae.encoder(x_recon_vqvae.unsqueeze(1))  # add a channel dimension for Conv1d
        x_recon_cae = self.cae.decoder(z_cae)
        return x_recon_cae.squeeze(1), x_recon_vqvae, z_vqvae, encoding_indices, commitment_loss


def hybrid_loss(model, x, x_recon_cae, x_recon_vqvae, z_vqvae, encoding_indices, commitment_loss, alpha=1.0):
    recon_loss_vqvae = nn.functional.mse_loss(x_recon_vqvae, x)
    recon_loss_cae = nn.functional.mse_loss(x_recon_cae, x)
    z_q = model.vqvae.quantize(encoding_indices)
    vq_loss = torch.mean((z_vqvae.detach() - z_q.float()) ** 2)  # codebook (VQ) loss
    commitment_loss = torch.mean(commitment_loss)
    total_loss = recon_loss_cae + alpha * (recon_loss_vqvae + vq_loss + commitment_loss)
    return total_loss
```
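In equation form, the objective implemented by hybrid_loss above is

$$
\mathcal{L}_{\text{total}} = \mathrm{MSE}(x, \hat{x}_{\mathrm{CAE}}) + \alpha \left[ \mathrm{MSE}(x, \hat{x}_{\mathrm{VQVAE}}) + \mathrm{MSE}\big(\mathrm{sg}[z_e],\, z_q\big) + \beta\, \mathrm{MSE}\big(z_e,\, \mathrm{sg}[z_q]\big) \right]
$$

where (\mathrm{sg}[\cdot]) denotes the stop-gradient operator, (z_q) is the quantized latent, (\beta) is the commitment cost (0.25 by default), and (\alpha) weights the VQ-VAE terms (1.0 by default).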
```python
import torch.optim as optim


def train_hybrid_model(model, dataloader, epochs=100, lr=1e-3):
    optimizer = optim.Adam(model.parameters(), lr=lr)
    model.train()
    epoch_losses = []
    for epoch in range(epochs):
        total_loss = 0
        for data in dataloader:
            data = data.float()
            optimizer.zero_grad()
            x_recon_cae, x_recon_vqvae, z_vqvae, encoding_indices, commitment_loss = model(data)
            loss = hybrid_loss(model, data, x_recon_cae, x_recon_vqvae, z_vqvae, encoding_indices, commitment_loss)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        avg_loss = total_loss / len(dataloader)
        epoch_losses.append(avg_loss)
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {avg_loss:.4f}")
    return epoch_losses
```
The VQ-CAE model achieved an average compression ratio of 393.86, reducing the data size to approximately 503 bytes per segment.
Metrics used to assess the signal quality included Signal-to-Noise Ratio (SNR), Peak Signal-to-Noise Ratio (PSNR), Mean Squared Error (MSE), and Structural Similarity Index (SSIM).
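For reproducibility, the helper below is a minimal sketch of how these metrics can be computed per segment with NumPy and scikit-image; it assumes scikit-image's structural_similarity for 1-D SSIM and the standard dB definitions, and is not necessarily the project's exact evaluation code:

```python
import numpy as np
from skimage.metrics import structural_similarity

def segment_metrics(original, reconstructed):
    """Quality metrics for one segment; both inputs are 1-D float arrays of equal length."""
    mse = float(np.mean((original - reconstructed) ** 2))
    snr = 10 * np.log10(np.mean(original ** 2) / mse)          # signal power vs. error power, in dB
    psnr = 10 * np.log10(np.max(np.abs(original)) ** 2 / mse)  # peak power vs. error power, in dB
    ssim = structural_similarity(original, reconstructed,
                                 data_range=original.max() - original.min())
    return {"MSE": mse, "SNR": snr, "PSNR": psnr, "SSIM": ssim}
```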
The compression ratio distribution plot shows that the VQ-CAE model consistently achieves compression ratios around 393, indicating reliable performance. Minor deviations in the ratios suggest some variability in compression efficiency across segments of the recording.
The SNR distribution plot reveals that the Signal-to-Noise Ratio values achieved by the VQ-CAE model are concentrated around 1.19 dB. This consistency indicates that reconstruction quality is stable across segments, although the absolute SNR is low (see the limitations discussed below).
The PSNR distribution plot shows that the Peak Signal-to-Noise Ratio values are primarily centered around 25.20 dB, indicating that the VQ-CAE model preserves a moderate, consistent level of reconstruction quality across segments. The narrow spread in PSNR values highlights the model's consistent reconstruction performance.
The MSE distribution plot shows that the Mean Squared Error values are tightly clustered around 0.0126, indicating that the VQ-CAE model consistently achieves low reconstruction errors across segments. The small variation in MSE values underscores the model's effectiveness in preserving the original signal during compression and reconstruction.
The SSIM distribution plot indicates that the Structural Similarity Index values are mostly centered around 0.450, suggesting that the VQ-CAE model achieves only moderate structural similarity between the original and reconstructed segments. While the SSIM values show some variation, their consistent clustering around 0.450 indicates that the model preserves some, but not all, of the structural detail of the signal during compression and reconstruction.
The original vs. reconstructed waveform plot visually compares the two signals. The original waveform, shown in blue, represents the uncompressed recording, while the reconstructed waveform, shown in red, represents the signal after being compressed and decompressed by the VQ-CAE model. The close alignment of the two waveforms indicates that the model preserves the overall shape and coarse features of the original signal, although the SNR and SSIM results above show that fine detail is partially lost.
While the VQ-CAE model achieved significant data compression, the resulting SNR and SSIM values indicate potential limitations in preserving signal quality. The low SNR (1.19 dB) and SSIM (0.4501) suggest that the compressed data may not maintain the fidelity required for certain detailed neural analyses. This trade-off between compression ratio and signal quality must be considered in practical applications. For subsequent analysis and interpretation of neural data, the reconstruction error, as indicated by the PSNR and MSE, remains within an acceptable range, yet further improvements are necessary to raise the SNR and SSIM values.
- Average Encoding Time: 0.11 seconds per segment
- Average Decoding Time: 0.12 seconds per segment
At roughly 0.11 s to encode and 0.12 s to decode each segment, the current implementation does not yet meet the requirement of processing data within 1 ms; substantial latency optimization (for example, smaller models, quantization, or dedicated hardware) is still needed.
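For reference, per-segment encode/decode latency can be measured with a hypothetical wall-clock helper like the one below; it times the PyTorch forward passes only and is no substitute for profiling on the target embedded hardware:

```python
import time
import torch

def time_segment(model, segment):
    """Rough encode/decode wall-clock times (in seconds) for a single segment."""
    model.eval()
    with torch.no_grad():
        x = torch.as_tensor(segment, dtype=torch.float32).unsqueeze(0)  # (1, segment_size)
        t0 = time.perf_counter()
        x_recon_vqvae, _, _, _ = model.vqvae(x)
        z_cae = model.cae.encoder(x_recon_vqvae.unsqueeze(1))           # "encoding" path
        t1 = time.perf_counter()
        _ = model.cae.decoder(z_cae).squeeze(1)                         # "decoding" path
        t2 = time.perf_counter()
    return t1 - t0, t2 - t1
```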
The VQ-CAE hybrid compression technique effectively combines VQ-VAE and CAE to compress neural recordings from the motor cortex of a non-human primate, achieving a high compression ratio of 393.86 while maintaining reasonable reconstruction error and moderate structural similarity. However, encoding and decoding currently take on the order of 0.1 s per segment, so further optimization is required to reach the 1 ms latency target, and the stringent power budget of less than 10 mW, including radio transmission, still needs to be verified on embedded hardware. With those improvements, the VQ-CAE technique could be suitable for practical applications in BCIs and neural data transmission.
The VQ-CAE model shows lower-than-desirable SNR and SSIM values, highlighting limitations in preserving detailed signal fidelity. The trade-off between achieving a high compression ratio and maintaining high signal quality is a crucial consideration. Future work should address these limitations by optimizing the model architecture and training process.
Future research will focus on optimizing the VQ-CAE architecture and training process to further improve the compression ratio and signal quality while maintaining low power consumption. Additionally, the technique will be extended to other types of neural recordings and tested on larger datasets to validate its generalizability. Comparisons with other state-of-the-art compression techniques will be conducted to assess performance and identify areas for improvement. Collaboration with neuroscientists and experts in BCIs will provide valuable insights into the specific requirements and constraints of neural data compression in real-world scenarios.
To get started with the project, follow these steps:
```bash
git clone https://github.com/ShaliniAnandaPhD/brain-electrode-compression.git
cd brain-electrode-compression
```
Run the build.py script to set up the environment, install dependencies, and train the VQ-VAE model.

```bash
python build.py
```
Use the encode.py script to encode audio files using the trained VQ-VAE model.

```bash
python encode.py path/to/input/audio.wav path/to/output/encoded.bin
```
Use the decode.py script to decode compressed audio files using the trained VQ-VAE model.

```bash
python decode.py path/to/input/encoded.bin path/to/output/decoded.wav
```
To measure the compression ratio and evaluate reconstruction quality, use the eval.py script.

```bash
python eval.py
```
The repository contains the following scripts:

- encode.py: Script for encoding audio files using the trained VQ-VAE model.
- decode.py: Script for decoding compressed audio files using the trained VQ-VAE model.
- train.py: Script for training the VQ-VAE model on the audio dataset.
- model.py: Contains the implementation of the VQ-VAE model and related utility functions.
- build.py: Script for setting up the environment, installing dependencies, and training the VQ-VAE model.
- eval.py: Script to evaluate the compression performance.
Follow these steps to implement and evaluate the VQ-CAE hybrid model for efficient compression of neural recordings.
- Segment Audio Data: Segment the recorded WAV files into smaller chunks to facilitate efficient processing. This is done using the segment_audio function (a hypothetical sketch that builds segments_list from a folder of recordings appears after this list):

```python
import librosa
import numpy as np

def segment_audio(file_path, segment_size):
    try:
        audio_data, samplerate = librosa.load(file_path, sr=None, mono=True)
    except Exception as e:
        print(f"Error reading file {file_path}: {e}")
        return [], 0
    if audio_data.ndim != 1:
        print(f"Error: {file_path} has more than one channel.")
        return [], samplerate
    # Split into fixed-size chunks and zero-pad the final chunk if needed
    segments = [audio_data[i:i + segment_size] for i in range(0, len(audio_data), segment_size)]
    for i in range(len(segments)):
        if len(segments[i]) < segment_size:
            segments[i] = np.pad(segments[i], (0, segment_size - len(segments[i])), 'constant')
    print(f"Segmented {file_path} into {len(segments)} segments of size {segment_size}")
    return segments, samplerate
```
- Define the Dataset Class: Create a dataset class to load the audio segments for training:

```python
from torch.utils.data import Dataset

class AudioDataset(Dataset):
    def __init__(self, segments):
        self.segments = segments

    def __len__(self):
        return len(self.segments)

    def __getitem__(self, idx):
        return self.segments[idx]
```
- Define the Hybrid Model: Implement the VQ-VAE and CAE models and combine them into the HybridModel, using the class definitions given earlier in this document.
- Train the Hybrid Model: Train the HybridModel on the segmented data with the train_hybrid_model and hybrid_loss functions defined earlier.
- Evaluate the Model: Evaluate the performance of the trained HybridModel on the test data by calculating various metrics such as compression ratio, SNR, PSNR, MSE, and SSIM.
- Generate Plots: Generate plots to visualize the training loss, compression ratio distribution, SNR distribution, PSNR distribution, MSE distribution, and SSIM distribution.
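The example script below assumes a segments_list variable holding all training segments; a minimal, hypothetical way to build it with the segment_audio function from the first step (the data/ directory name is a placeholder) is:

```python
import os

data_dir = "data/"      # hypothetical folder containing the recorded .wav files
segment_size = 8000

segments_list = []
for fname in sorted(os.listdir(data_dir)):
    if fname.endswith(".wav"):
        segments, samplerate = segment_audio(os.path.join(data_dir, fname), segment_size)
        segments_list.extend(segments)
print(f"Collected {len(segments_list)} segments")
```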
```python
import os
import numpy as np
import torch
from torch.utils.data import DataLoader

# Assuming `segments_list` is a list of segmented audio data
dataset = AudioDataset(segments_list)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True, drop_last=True)

segment_size = 8000
latent_dim_vqvae = 8
latent_dim_cae = 16
num_embeddings = 128
embedding_dim = latent_dim_vqvae

hybrid_model = HybridModel(segment_size, latent_dim_vqvae, latent_dim_cae, num_embeddings, embedding_dim)
epoch_losses = train_hybrid_model(hybrid_model, dataloader, epochs=100, lr=1e-3)

# Plotting training loss over epochs
import matplotlib.pyplot as plt

plt.plot(range(1, len(epoch_losses) + 1), epoch_losses, label='Training Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Training Loss Over Epochs')
plt.legend()
plt.show()
```
This project is licensed under the MIT License.