Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance suggestion for random_task view #972

Open
risicle opened this issue Feb 12, 2017 · 7 comments
Open

Performance suggestion for random_task view #972

risicle opened this issue Feb 12, 2017 · 7 comments

Comments

@risicle
Copy link

risicle commented Feb 12, 2017

Taking a browse of the code and wondering why the "pick a random task" feature can get so slow during a mapping party it occurred to me that you could save a lot of db load by denormalizing and maintaining an adjacency table for tasks rather than performing expensive geometry operations in-query. Supposing a m2m relationship called neighbours, locked_filter would become something along the lines of:

locked_filter = Task.neighbours.has(locked=True)

which is a join postgres can do incredibly fast.

If you wanted to go a step further, it would be easily possible to denormalize the "in priority area" information into a boolean field on Task and entirely remove expensive geometry operations from this view.

As for how to maintain the denormalized data, there are many ways to skin that cat.

(also I found this project's code pleasingly clear so 👍 there)

@pgiraud
Copy link
Collaborator

pgiraud commented Feb 13, 2017

Hi @risicle
Thanks a lot for the analysis and the relevant suggestions.
Can you please elaborate about the performance issues? When did you go to a mapathon for the last time? Which project were you working on?
Though I agree with your suggestions, I checked the performances using the pyramid debugging tools and I don't see much load on the database. So I wonder if this is really worth.

What I would like to know particularly is whether you encountered performance issues before or after the switch to the new server.

@risicle
Copy link
Author

risicle commented Feb 13, 2017

Yup, I of course don't have the ability to analyze a running system with a real life workload, so my analysis can really be taken with a pinch of salt. I've noticed speed to be an issue with the "random task" function quite often. Most recently on task 2525 this last tuesday evening, the 7th - a "random task" request was taking, maybe, 30 seconds? (leading many users to of course do the "is it doing anything? i'll click again..." thing which would exacerbate it.

But of course, if that is not a bottleneck in reality this would be a pointless overcomplication to add, what with all the housekeeping needed...

When was the switch to the new server?

@risicle
Copy link
Author

risicle commented Feb 13, 2017

(have you set up postgres to log slow queries?)

@ethan-nelson
Copy link
Collaborator

(I know very little about PostGIS so I don't have anything substantive to add, but I will point out project 2525 specifically is a bit on the extremely high side for number of tasks in a project.)

@risicle
Copy link
Author

risicle commented Feb 14, 2017

Indeed, however it isn't the first time I've noticed this problem.

If we were to ignore the pre-building-adjacency-lists route for a moment (due to the moderate amount of implementation pain), I'm assuming the current, build-a-big-union-and-then-check-disjoint-against-that technique is deliberately being used over a more obvious disjoint-join-condition technique for performance reasons. I would have thought the latter would be faster as it might be able to make use of a spatial index.

Using that approach would be as simple as adding a neighbours relationship to Task something like:

neighbours = relationship(
    "Task",
    primaryjoin=lambda: and_(
        foreign(project_id) == remote(project_id),
        foreign(geometry).ST_Intersects(remote(geometry)),
    ),
    viewonly=True,
)

and again in the view this would make the locked_filter definition in the view as simple as

locked_filter = Task.neighbours.has(locked=True)

(but I haven't tried this so there may be some gotcha lurking in the above!)

@pgiraud
Copy link
Collaborator

pgiraud commented Feb 17, 2017

When was the switch to the new server?

It was done on Feb 9th.
If some of the request took > 30s to get a response it's most likely a problem with the old server being overloaded by other applications.

As far as I know no one complained about performances issues after the switch.

Anyway, thanks for your code snippet.
Unfortunately, it doesn't work as is. The Task table doesn't have any locked property. Instead there's a cur_lock relationship with which we can get the last Lock record for the given task.

@risicle
Copy link
Author

risicle commented Feb 17, 2017

Oh of course, I glossed over that extra join - it would probably be something more like, keeping the current flow of the view:

locked_filter = Task.neighbours.any(Task.id.in_(DBSession.query(Task.id).join(Task.cur_lock).filter(TaskLock.locked == True).subquery()))

(might TaskLock need use of aliased? is it clear I'm from more of a manual-sql background?)

which is quite ugly so I personally would probably alter the behaviour of the view somewhat so that locked_filter and friends are lambdas that when called with to the existing query object return a filtered version of it. This would allow the whole thing to be done as a join instead of the awkward subquery above.

Anyway, if it's not currently a problem, it's not currently a problem, but I would recommend enabling slow query logging in case performance hiccup is reached in the future.

(also in my above relationship listing, I accidentally suggested joining Task.id against Task.id, so have edited it now to join on the project_id. duh.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants