Unable to train model on SPEECHCOMMANDS dataset #1

@jayathungek

I've downloaded the speech_commands_v0.02 tar file and extracted it into the following directory structure:

data/SCNUMBERS1024
└── SpeechCommands
    └── speech_commands_v0.02
        ├── _background_noise_
        ├── backward
        ├── bed
        ├── bird
        ├── cat
        ├── dog
        ├── down
        ├── eight
        ├── five
        ├── follow
        ├── forward
        ├── four
        ├── go
        ├── happy
        ├── house
        ├── learn
        ├── left
        ├── marvin
        ├── nine
        ├── no
        ├── off
        ├── on
        ├── one
        ├── right
        ├── seven
        ├── sheila
        ├── six
        ├── stop
        ├── three
        ├── tree
        ├── two
        ├── up
        ├── visual
        ├── wow
        ├── yes
        └── zero
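
As a sanity check on the layout above, a short script along these lines can count the .wav files in each class directory. It is only a sketch: the function name `count_wavs_per_class` is my own, and it assumes the clips sit directly inside each class folder as `.wav` files. It does not go through the repository's own data loader, so it only confirms that the files are where I think they are.

```python
from pathlib import Path


def count_wavs_per_class(root):
    """Return {class_dir_name: number_of_wav_files} for each subdirectory of root.

    Hypothetical helper, not part of the repository.
    """
    root = Path(root)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Only count files directly inside the class folder.
        counts[class_dir.name] = sum(1 for _ in class_dir.glob("*.wav"))
    return counts


if __name__ == "__main__":
    # Path taken from the directory tree above.
    root = Path("data/SCNUMBERS1024/SpeechCommands/speech_commands_v0.02")
    if root.is_dir():
        for name, n in count_wavs_per_class(root).items():
            print(f"{name}: {n}")
    else:
        print(f"{root} not found")
```

A class folder reporting zero files would point to an extraction problem rather than a problem in the training code.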

I then try to train the model on this dataset via:

$ python train.py --wandb 0 --architecture pi-gan_wide --dataset_name SPEECHCOMMANDS --dataset_size 128

but run into a NoneType error, which leads me to believe that the dataset is not being initialised properly. The full output of running the above command is below:

~/.virtualenvs/pcinr/lib/python3.8/site-packages/torchaudio/backend/utils.py:53: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
  warnings.warn(
1
{   'architecture': 'pi-gan_wide',
    'audio_length': 16000,
    'autoconfig': 0,
    'batch_size': 128,
    'cdpam': 0,
    'coord_multi': 1,
    'dataset_name': 'SPEECHCOMMANDS',
    'dataset_size': 128,
    'deriv_per_sample': 1,
    'double': 0,
    'eval_every': 5000,
    'eval_samples': 1,
    'eval_upscale_ratio': 1,
    'first_omega_0': 3000,
    'hidden_omega_0': 30,
    'input_dim': 1,
    'latent_descent_steps': 1,
    'latent_init_std': 0.001,
    'latent_lr': 0.3,
    'lr': 1e-05,
    'max_high_res_batch_size': 16,
    'meta_architecture': 'autodecoder',
    'multiscale_STFT': 0,
    'note': 'default',
    'note_general': 'default',
    'num_epochs': 10001,
    'num_groups': 0,
    'num_latent': 256,
    'output_dim': 1,
    'per_sample': 1,
    'prog_weight_decay_every': 0,
    'prog_weight_decay_factor': 0,
    'sample_even': 1,
    'samples_per_datapoint': 2000,
    'save_audio': 1,
    'save_audio_plots': 0,
    'save_latents': 1,
    'save_model': 1,
    'save_path': 'results/default/SPEECHCOMMANDS/pi-gan_wide/autodecoder',
    'use_gpu': 1,
    'use_multi_gpu': 0,
    'wandb': 0,
    'wandb_project_name': 'neurips',
    'weight_decay': 0,
    'weight_norm': 0}
activations: ['sine', 'sine', 'none']
init_methods: [{'weights': 'siren_first', 'bias': 'polar'}, {'weights': 'siren', 'bias': 'polar'}, {'weights': 'siren_omega', 'omega': 30, 'bias': 'none'}]
layer 0: Film conditioned
layer 1: Film conditioned
layer 2: Film conditioned
layer 3: Film conditioned
piGAN_custom(
  (film_mapping_net): PiGANMappingNetwork(
    (net): Sequential(
      (0): Linear(in_features=256, out_features=256, bias=True)
      (1): LeakyReLU(negative_slope=0.2, inplace=True)
      (2): Linear(in_features=256, out_features=256, bias=True)
      (3): LeakyReLU(negative_slope=0.2, inplace=True)
      (4): Linear(in_features=256, out_features=256, bias=True)
      (5): LeakyReLU(negative_slope=0.2, inplace=True)
      (6): Linear(in_features=256, out_features=730, bias=True)
    )
  )
  (net): Sequential(
    (0): ImplicitMLPLayer(
      (linear): Linear(in_features=1, out_features=365, bias=True)
    )
    (1): ImplicitMLPLayer(
      (linear): Linear(in_features=365, out_features=365, bias=True)
    )
    (2): ImplicitMLPLayer(
      (linear): Linear(in_features=365, out_features=365, bias=True)
    )
    (3): ImplicitMLPLayer(
      (linear): Linear(in_features=365, out_features=365, bias=True)
    )
    (4): ImplicitMLPLayer(
      (linear): Linear(in_features=365, out_features=1, bias=True)
    )
  )
)
Number of parameters: 786852
Random Seed:  0
~/Desktop/phd/continuous-audio-representations/objective.py:11: UserWarning: torch.range is deprecated and will be removed in a future release because its behavior is inconsistent with Python's range builtin. Instead, use torch.arange, which produces values in [start, end).
  self.finite_diff_derivative = torch.range(-1,1,2).unsqueeze(0).unsqueeze(0).to(device)
Seeing  1 GPUs
Starting run for 10001 epochs..
Traceback (most recent call last):
  File "train.py", line 358, in <module>
    train(model, optim_INR, optim_mapping, scheduler, train_loader, config)
  File "train.py", line 80, in train
    g = model(sampled_coords, z=z)
  File "/home/kavi/.virtualenvs/pcinr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/kavi/.virtualenvs/pcinr/lib/python3.8/site-packages/INR_collection/modules.py", line 487, in forward
    concat = concat.repeat(1, coordinates.shape[1], 1)
AttributeError: 'NoneType' object has no attribute 'repeat'
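
To narrow this down, a check like the one below could be placed just before the `model(sampled_coords, z=z)` call in `train.py`. This is a sketch under assumptions: `none_arguments` is a hypothetical helper of mine, and I do not know whether `z` is actually what ends up as `concat` inside `INR_collection/modules.py`. It only reports which of the values handed to the model are `None`.

```python
def none_arguments(**named_values):
    """Return the names of all keyword arguments whose value is None.

    Hypothetical debugging helper, not part of train.py or INR_collection.
    """
    return [name for name, value in named_values.items() if value is None]


if __name__ == "__main__":
    # Stand-in values for illustration; in train.py these would be the real
    # sampled_coords and z passed to the model.
    sampled_coords = [[0.0], [0.5], [1.0]]
    z = None
    missing = none_arguments(sampled_coords=sampled_coords, z=z)
    if missing:
        print("None passed to model for:", ", ".join(missing))
```

If neither input is `None`, the `None` value is presumably produced inside the model's `forward` itself, which would suggest a configuration issue rather than a dataset issue.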

Any idea why this might be? Thank you.
