Cloujera lets you do a fine-grained search for spoken words in Coursera's videos. It does this by performing full-text searches on the transcripts of videos on Coursera.
- Bring up Vagrant (elasticsearch + redis):
  $ vagrant up
- Compile the clojurescript (make sure you have java >1.7):
  $ lein cljsbuild once
- Start the app:
  $ lein run
- On the first run, visit http://127.0.0.1:8080/burglar/go to seed the db
  (it will error out ridiculously with an IndexMissingException from
  elasticsearch if you don't do this!)
$ vagrant ssh
$ cd /vagrant
$ ./scripts/deploy.sh
NOTE: the address to access the dockerized cloujera is
http://127.0.0.1:8081 (see Vagrantfile)
$ vagrant ssh
$ cd /vagrant
$ source ./scripts/prod-env.sh
$ lein uberjar
$ java -jar ./target/uberjar/cloujera-*-standalone.jar
NOTE: the uberjarred cloujera listens on port 8080 inside the VM; from the host,
access it at http://127.0.0.1:8082 (see Vagrantfile)
Visiting http://cloujera.whatever/burglar/go scrapes some 10 courses to get
you started.
To scrape another course, you need to:
- Visit the Coursera session API
  https://api.coursera.org/api/catalog.v1/sessions and choose a course
- Sign up for the course and agree to the honour code manually for the
  [email protected] user
- Find the video lecture URL (videoLecturesURL)
- Perform an HTTP POST to http://cloujera.whatever/burglar/raid with this
  payload (JSON):
  { "url": videoLecturesURL }
For example:
{ "url": "https://class.coursera.org/apcalcpart1-001/lecture" }
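The POST step could be scripted with curl along these lines. This is a minimal sketch, not part of cloujera itself: the helper names `raid_payload` and `raid_course` and the `CLOUJERA_URL` variable are made up for illustration, and sending a `Content-Type: application/json` header is an assumption about what the endpoint expects.

```shell
# Base URL of the running instance; cloujera.whatever is the placeholder used above.
CLOUJERA_URL="${CLOUJERA_URL:-http://cloujera.whatever}"

# Build the JSON body for /burglar/raid from a video lectures URL.
# No JSON escaping is done, so the URL must not contain quotes or backslashes.
raid_payload() {
    printf '{ "url": "%s" }' "$1"
}

# POST the payload to the raid endpoint (hypothetical helper, assumed header).
raid_course() {
    curl --silent --show-error \
         --request POST \
         --header 'Content-Type: application/json' \
         --data "$(raid_payload "$1")" \
         "$CLOUJERA_URL/burglar/raid"
}
```

After sourcing the snippet, `raid_course "https://class.coursera.org/apcalcpart1-001/lecture"` would send the example payload shown above.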
$ ssh user@cloudmachine
$ git clone https://github.com/vise890/cloujera
$ cd cloujera
$ sudo ./scripts/provision.sh
# in the cloujera directory...
$ ./scripts/deploy.sh
NOTE: deploy.sh pulls the most recent version of cloujera from the repo
Ensure that all the containers are running in the VM:
$ vagrant ssh
$ sudo docker ps -a
You should see redis, elasticsearch and cloujera running
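If you would rather not eyeball the listing, the container check can be sketched as a small filter. The names in `EXPECTED_CONTAINERS` are assumed to match the container names exactly (only `cloujera` is confirmed by the `docker logs` command in this README), and `missing_containers` is a hypothetical helper, not something shipped in `scripts/`.

```shell
# Assumed container names; adjust if yours are named differently.
EXPECTED_CONTAINERS="redis elasticsearch cloujera"

# Reads container names (one per line) on stdin and prints every
# expected name that does not appear in that list.
missing_containers() {
    running="$(cat)"
    for name in $EXPECTED_CONTAINERS; do
        # -x: match the whole line, -F: treat the name as a fixed string
        printf '%s\n' "$running" | grep -qxF "$name" || printf '%s\n' "$name"
    done
}
```

Inside the VM, `sudo docker ps --format '{{.Names}}' | missing_containers` prints nothing when all three are up. Without `-a`, `docker ps` lists only running containers, which is what this check needs.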
$ vagrant ssh
$ sudo docker logs cloujera

Visit http://localhost:9200/; you should see status: 200
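The elasticsearch check can also be done from a terminal. The sketch below assumes that an HTTP 200 response from http://localhost:9200/ is an acceptable stand-in for the status: 200 you see in the browser; `wait_for` and `es_up` are illustrative names, not part of the repo.

```shell
# Retry a command once per second until it succeeds or attempts run out.
# Usage: wait_for ATTEMPTS COMMAND [ARGS...]
wait_for() {
    attempts="$1"; shift
    while [ "$attempts" -gt 0 ]; do
        "$@" && return 0
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}

# Succeeds when elasticsearch answers with HTTP 200 on its root URL.
# curl prints 000 as the code when it cannot connect at all.
es_up() {
    code="$(curl --silent --output /dev/null --write-out '%{http_code}' http://localhost:9200/)"
    [ "$code" = "200" ]
}
```

For example, `wait_for 30 es_up && echo "elasticsearch is up"` polls for up to roughly 30 seconds, which can be handy right after `vagrant up` while the containers are still starting.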
redis-cli will drop you into a Redis shell. Some useful commands are: INFO,
MONITOR, HELP, HELP @server.
NOTE: this works from the host as well as in the Vagrant VM
$ vagrant ssh || ssh user@cloudbox
$ sudo docker exec -i -t cloujera bash

- lein run doesn't give any output initially
- lein run doesn't reload