
Concurrent Web Scraping with Selenium Grid and Docker Swarm #76

Concurrent Web Scraping with Selenium Grid and Docker Swarm
https://github.com/coding-to-music/selenium-grid-docker-swarm

Want to learn how to build this project?
Check out the blog post.
https://testdriven.io/blog/concurrent-web-scraping-with-selenium-grid-and-docker-swarm/

Want to use this project?
Fork/Clone
https://github.com/coding-to-music/selenium-grid-docker-swarm

Create and activate a virtual environment

sudo apt-get install python3-virtualenv

virtualenv -p python3 myApp

On older virtualenv releases you can optionally pass --no-site-packages to isolate the environment (recent releases do this by default and no longer accept the flag):
virtualenv --no-site-packages -p python3 myApp

source myApp/bin/activate

$ cd myApp/
$ source bin/activate
(myApp)debian@hostname:~/myApp$

Install the requirements

pip install -r requirements.txt
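
To double-check that the install worked, you can confirm that Selenium (the scraper's main dependency) is importable; a quick sanity check, assuming selenium is pinned in requirements.txt:

(env)$ python -c "import selenium; print(selenium.__version__)"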

Sign up for Digital Ocean and generate an access token

Add the token to your environment:

(env)$ export DIGITAL_OCEAN_ACCESS_TOKEN=[your_token]
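
Before provisioning anything, you can optionally verify that the token works by querying the DigitalOcean account endpoint (this curl call is just an illustration, not part of the project scripts):

(env)$ curl -s -H "Authorization: Bearer $DIGITAL_OCEAN_ACCESS_TOKEN" "https://api.digitalocean.com/v2/account"
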
Spin up four droplets and deploy Docker Swarm:

(env)$ sh project/create.sh
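
The exact provisioning logic lives in project/create.sh. Roughly, it uses docker-machine to create the droplets, initializes a Swarm manager on the first node, joins the remaining nodes as workers, and deploys the Selenium Grid stack. A minimal sketch of that flow, assuming node names like node-1 and a compose file for the Grid (check the real script for the actual names):

# sketch only; see project/create.sh for the real implementation
for i in 1 2 3 4; do
  docker-machine create \
    --driver digitalocean \
    --digitalocean-access-token $DIGITAL_OCEAN_ACCESS_TOKEN \
    node-$i;
done

# initialize the swarm on node-1, then join the other nodes as workers
docker-machine ssh node-1 "docker swarm init --advertise-addr $(docker-machine ip node-1)"
WORKER_TOKEN=$(docker-machine ssh node-1 "docker swarm join-token worker -q")
for i in 2 3 4; do
  docker-machine ssh node-$i "docker swarm join --token ${WORKER_TOKEN} $(docker-machine ip node-1):2377";
done

# deploy the Selenium Grid stack (compose file name is an assumption)
eval $(docker-machine env node-1)
docker stack deploy --compose-file=docker-compose.yml selenium
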
Run the scraper:

(env)$ docker-machine env node-1
(env)$ eval $(docker-machine env node-1)
(env)$ NODE=$(docker service ps --format "{{.Node}}" selenium_hub)
(env)$ for i in {1..8}; do {
  python project/script.py ${i} $(docker-machine ip $NODE) &
};
done
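
Each iteration starts project/script.py in the background, so eight scraping jobs run concurrently against the Grid hub on whichever node is running the selenium_hub service. While they run, you can confirm the hub is healthy and see how the services are distributed (the /wd/hub/status endpoint is standard Selenium Grid, not project-specific):

(env)$ docker service ls
(env)$ curl http://$(docker-machine ip $NODE):4444/wd/hub/status
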
Bring down the resources:

(env)$ sh project/destroy.sh
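
project/destroy.sh tears everything down. Since the Swarm and the Grid only live on the droplets, removing the droplets is enough; conceptually the script boils down to something like this (node names assumed to match the ones created above):

docker-machine rm -f node-1 node-2 node-3 node-4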
