diff --git a/README.md b/README.md
index c2ae03a..c689d30 100644
--- a/README.md
+++ b/README.md
@@ -25,7 +25,7 @@ This project has two purposes. First of all, I'd like to share some of my experi
 * Go get various English word vectors [here](https://github.com/3Top/word2vec-api) if needed.
 
 ## Work Flow
-* STEP 1. Download the [wikipedia database backup dumps](https://dumps.wikimedia.org/backup-index.html) of the language you want.
+* STEP 1. Download the [wikipedia database backup dumps](https://dumps.wikimedia.org/backup-index.html) of the language you want (for example, for the English Wikipedia, go to `https://dumps.wikimedia.org/enwiki/`, click the latest timestamp, and download the `enwiki-YYYYMMDD-pages-articles-multistream.xml.bz2` file).
 * STEP 2. Extract running texts to `data/` folder.
 * STEP 3. Run `build_corpus.py`.
 * STEP 4-1. Run `make_wordvector.sh` to get Word2Vec word vectors.
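The new STEP 1 text describes the dump URL for English only. For other languages, the URL can presumably be built by swapping the language code into the same pattern. Below is a minimal sketch, assuming every `<lang>wiki` directory follows the `enwiki` layout shown in STEP 1 and that the mirror also offers a `latest` alias directory. `dump_url` is a hypothetical helper, not part of this project, and the timestamp `20240301` is purely illustrative.

```python
# Hypothetical helper (not part of this repo): builds the URL of a Wikipedia
# "pages-articles-multistream" dump, ASSUMING every language follows the
# enwiki layout described in STEP 1.
BASE_URL = "https://dumps.wikimedia.org"


def dump_url(lang: str, timestamp: str = "latest") -> str:
    """Return the multistream articles dump URL for a language code.

    lang:      Wikipedia language code such as "en" or "ko".
    timestamp: dump date in YYYYMMDD form, or the (assumed) alias "latest".
    """
    wiki = f"{lang}wiki"  # e.g. "enwiki", "kowiki"
    filename = f"{wiki}-{timestamp}-pages-articles-multistream.xml.bz2"
    return f"{BASE_URL}/{wiki}/{timestamp}/{filename}"


if __name__ == "__main__":
    # Illustrative timestamp only; pick a real one from the enwiki index page.
    print(dump_url("en", "20240301"))
```

The resulting URL can then be fetched with any HTTP client. Check the index page first, since a timestamp directory may exist before its files are finished.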