Web harvester won't docker-compose scale #408

Open
justinlittman opened this issue Aug 17, 2016 · 2 comments
@justinlittman
Contributor

Every web harvester container must be paired with its own Heritrix container. This pairing is currently done with a simple Docker link. However, this probably won't work well with docker-compose scale, because scaling does not preserve the 1:1 pairing.

Possible approaches to fixing:

  1. Move Heritrix and the web harvester into the same container.
  2. Rely on container naming conventions. For example, web_harvest_2 would know to use heritrix_2. See http://stackoverflow.com/questions/29725955/how-do-links-and-scaling-work-together-in-docker-compose.
  3. Link via an ambassador container. See https://docs.docker.com/engine/admin/ambassador_pattern_linking/.
  4. Use some sort of service discovery.
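To make approach 2 concrete, here is a minimal sketch of how a web harvester could derive the name of its paired Heritrix container from its own container name. It assumes that docker-compose appends a numeric index to scaled containers (as in the web_harvest_2 / heritrix_2 example above) and that the harvester is told its own container name somehow, for instance through an environment variable. The function name `paired_heritrix_host` and the `heritrix_prefix` parameter are hypothetical and not part of the existing code.

```python
import re

# Matches a trailing "_<number>" such as the "_2" in "web_harvest_2".
_INDEX_SUFFIX = re.compile(r"_(\d+)$")


def paired_heritrix_host(container_name, heritrix_prefix="heritrix"):
    """Return the Heritrix container name that shares this container's index.

    Assumption: scaled containers end in "_<index>" and the Heritrix
    containers are scaled to the same count with the same indices.
    """
    match = _INDEX_SUFFIX.search(container_name)
    if match is None:
        raise ValueError(
            "No numeric index found in container name: %r" % container_name
        )
    return "{}_{}".format(heritrix_prefix, match.group(1))
```

For example, `paired_heritrix_host("web_harvest_2")` returns `"heritrix_2"`. This only holds if both services are always scaled to the same number of instances, so it is a convention to be enforced rather than a guarantee.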
@lwrubel modified the milestone: Backlog, Sep 23, 2016
@justinlittman
Contributor Author

From @dchud:

 - Big Heritrix crawls, or perhaps the lack of deduping, or both, are a bottleneck. There are surely tweets with attached media and linked content in my timeline collections that will be gone by the time Heritrix catches up. Adding a big list of accounts up front always takes a long while, so later incremental follow-ups also take a while to process. This might be a result of running user timelines every hour; I will have to play with that. But a second Heritrix process for recent/small batches might help.

@justinlittman
Contributor Author

@liuqingli also ran into problems when trying to scale the web harvesters.

2 participants