keyerror when preprocess data #12

Open
liu-x-p opened this issue Aug 10, 2020 · 6 comments
Comments

@liu-x-p

liu-x-p commented Aug 10, 2020

I set the data directory to datasets/2019/english. When I run the script preprocess.py, it raises
KeyError: 'accessing unknown key in a struct: dataset.in_dir'
and I can't work out how to fix it.
Could you help me?

@bshall
Owner

bshall commented Aug 10, 2020

Hi @liu-x-p,

Sure. If you look at the usage in the readme it says:

python preprocess.py in_dir=/path/to/dataset dataset=[2019/english or 2019/surprise]

Note: in_dir must be the path to the 2019 folder...

This is the folder that contains the wav files in its subdirectories. So, for example, if I download the ZeroSpeech 2020 dataset and store it at ~/Documents/ZeroSpeech/2020, the command should be:

python preprocess.py in_dir=~/Documents/ZeroSpeech/2020/2019 dataset=2019/english
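
Before running the command, it can help to confirm that in_dir really points at the 2019 folder. The following is only a rough sketch, not part of this repo: the helper names check_in_dir and build_command are made up for illustration, and the layout it checks (a folder named 2019 with a language subdirectory such as english that has wav files somewhere beneath it) is assumed from the description above.

```python
from pathlib import Path


def check_in_dir(in_dir, dataset="2019/english"):
    """Hypothetical helper: list problems with in_dir (empty list if none found)."""
    root = Path(in_dir).expanduser().resolve()
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # Assumption: in_dir must be the folder that is literally named "2019".
    if root.name != "2019":
        problems.append(f"{root} should be the folder named '2019'")
    # Assumption: the last part of the dataset override names the language folder.
    language = dataset.split("/")[-1]
    lang_dir = root / language
    if not lang_dir.is_dir():
        problems.append(f"missing subdirectory {lang_dir}")
    elif next(lang_dir.rglob("*.wav"), None) is None:
        problems.append(f"no .wav files found under {lang_dir}")
    return problems


def build_command(in_dir, dataset="2019/english"):
    """Hypothetical helper: build the preprocess command as an argument list."""
    root = Path(in_dir).expanduser().resolve()
    return ["python", "preprocess.py", f"in_dir={root}", f"dataset={dataset}"]
```

If check_in_dir("~/Documents/ZeroSpeech/2020/2019") returns an empty list, the argument list from build_command can be passed to subprocess.run, or joined with spaces and run from the repo root.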

If you're still having trouble, please post the command you used and the path to your data directory.

Hope that helps!

@liu-x-p
Author

liu-x-p commented Aug 10, 2020

@bshall Thank you!
I followed your settings for the command
python preprocess.py in_dir=/home/omnisky/mount/holiday/ZeroSpeech-0.1/datasets/2020/2019 dataset=2019/english
The path is /home/omnisky/mount/holiday/ZeroSpeech-0.1/datasets/2020/2019, which contains 'english' and 'surprise'.

@bshall
Owner

bshall commented Aug 10, 2020

No problem @liu-x-p. If you're still having issues, I'd advise keeping the actual data in a folder separate from this repo. For example, this repo would be under holiday/ZeroSpeech and the actual wav files would be stored in holiday/RawData/2020. Then in_dir should point to .../holiday/RawData/2020/2019.

@dummy-arch

Following the exact same procedure, I am getting an error: hydra.errors.OverrideParseException: LexerNoViableAltException: Passport/VAE/ZeroSpeech/zerospeech_2020/2020/2019. Could you kindly help me out? The directory path to the wav files is Passport/VAE/ZeroSpeech/zerospeech_2020/2020/2019 and the path to the json files is Passport/VAE/ZeroSpeech/zerospeech_2020/datasets/2019/english

@ZhengRachel

@liu-x-p Hi! I am also a Chinese student trying to run this repo, and I am running into problems similar to yours... TAT I wonder whether you managed to run this repo successfully, and whether we could discuss it via e-mail. This is my email address: [email protected] Looking forward to your reply!

@liu-x-p
Author

liu-x-p commented Mar 29, 2021

@ZhengRachel
I'm not sure about this, as it has been a long time.
As you can see in my question and comment, I hit this problem when I downloaded this work as ZeroSpeech-0.1, which I think may be an early version. I then downloaded it again as ZeroSpeech-master (the master branch), and it worked.
I think the command I used was python preprocess.py in_dir=../datasets/2020/2019 dataset=2019/english
