Performance suggestion for `random_task` view #972

risicle · 2017-02-12T21:49:00Z

Taking a browse of the code and wondering why the "pick a random task" feature can get so slow during a mapping party it occurred to me that you could save a lot of db load by denormalizing and maintaining an adjacency table for tasks rather than performing expensive geometry operations in-query. Supposing a m2m relationship called neighbours, locked_filter would become something along the lines of:

locked_filter = Task.neighbours.has(locked=True)

which is a join postgres can do incredibly fast.

If you wanted to go a step further, it would be easily possible to denormalize the "in priority area" information into a boolean field on Task and entirely remove expensive geometry operations from this view.

As for how to maintain the denormalized data, there are many ways to skin that cat.

(also I found this project's code pleasingly clear so 👍 there)

The text was updated successfully, but these errors were encountered:

pgiraud · 2017-02-13T15:20:13Z

Hi @risicle
Thanks a lot for the analysis and the relevant suggestions.
Can you please elaborate about the performance issues? When did you go to a mapathon for the last time? Which project were you working on?
Though I agree with your suggestions, I checked the performances using the pyramid debugging tools and I don't see much load on the database. So I wonder if this is really worth.

What I would like to know particularly is whether you encountered performance issues before or after the switch to the new server.

risicle · 2017-02-13T20:23:31Z

Yup, I of course don't have the ability to analyze a running system with a real life workload, so my analysis can really be taken with a pinch of salt. I've noticed speed to be an issue with the "random task" function quite often. Most recently on task 2525 this last tuesday evening, the 7th - a "random task" request was taking, maybe, 30 seconds? (leading many users to of course do the "is it doing anything? i'll click again..." thing which would exacerbate it.

But of course, if that is not a bottleneck in reality this would be a pointless overcomplication to add, what with all the housekeeping needed...

When was the switch to the new server?

risicle · 2017-02-13T23:02:34Z

(have you set up postgres to log slow queries?)

ethan-nelson · 2017-02-14T00:59:22Z

(I know very little about PostGIS so I don't have anything substantive to add, but I will point out project 2525 specifically is a bit on the extremely high side for number of tasks in a project.)

risicle · 2017-02-14T21:01:06Z

Indeed, however it isn't the first time I've noticed this problem.

If we were to ignore the pre-building-adjacency-lists route for a moment (due to the moderate amount of implementation pain), I'm assuming the current, build-a-big-union-and-then-check-disjoint-against-that technique is deliberately being used over a more obvious disjoint-join-condition technique for performance reasons. I would have thought the latter would be faster as it might be able to make use of a spatial index.

Using that approach would be as simple as adding a neighbours relationship to Task something like:

neighbours = relationship(
    "Task",
    primaryjoin=lambda: and_(
        foreign(project_id) == remote(project_id),
        foreign(geometry).ST_Intersects(remote(geometry)),
    ),
    viewonly=True,
)

and again in the view this would make the locked_filter definition in the view as simple as

locked_filter = Task.neighbours.has(locked=True)

(but I haven't tried this so there may be some gotcha lurking in the above!)

pgiraud · 2017-02-17T08:02:35Z

When was the switch to the new server?

It was done on Feb 9th.
If some of the request took > 30s to get a response it's most likely a problem with the old server being overloaded by other applications.

As far as I know no one complained about performances issues after the switch.

Anyway, thanks for your code snippet.
Unfortunately, it doesn't work as is. The Task table doesn't have any locked property. Instead there's a cur_lock relationship with which we can get the last Lock record for the given task.

risicle · 2017-02-17T20:27:22Z

Oh of course, I glossed over that extra join - it would probably be something more like, keeping the current flow of the view:

locked_filter = Task.neighbours.any(Task.id.in_(DBSession.query(Task.id).join(Task.cur_lock).filter(TaskLock.locked == True).subquery()))

(might TaskLock need use of aliased? is it clear I'm from more of a manual-sql background?)

which is quite ugly so I personally would probably alter the behaviour of the view somewhat so that locked_filter and friends are lambdas that when called with to the existing query object return a filtered version of it. This would allow the whole thing to be done as a join instead of the awkward subquery above.

Anyway, if it's not currently a problem, it's not currently a problem, but I would recommend enabling slow query logging in case performance hiccup is reached in the future.

(also in my above relationship listing, I accidentally suggested joining Task.id against Task.id, so have edited it now to join on the project_id. duh.)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance suggestion for `random_task` view #972

Performance suggestion for `random_task` view #972

risicle commented Feb 12, 2017

pgiraud commented Feb 13, 2017

risicle commented Feb 13, 2017

risicle commented Feb 13, 2017

ethan-nelson commented Feb 14, 2017

risicle commented Feb 14, 2017 •

edited

Loading

pgiraud commented Feb 17, 2017

risicle commented Feb 17, 2017

Performance suggestion for random_task view #972

Performance suggestion for random_task view #972

Comments

risicle commented Feb 12, 2017

pgiraud commented Feb 13, 2017

risicle commented Feb 13, 2017

risicle commented Feb 13, 2017

ethan-nelson commented Feb 14, 2017

risicle commented Feb 14, 2017 • edited Loading

pgiraud commented Feb 17, 2017

risicle commented Feb 17, 2017

Performance suggestion for `random_task` view #972

Performance suggestion for `random_task` view #972

risicle commented Feb 14, 2017 •

edited

Loading