Remove panics and optimise mesh picking #16148
Conversation
It looks good in itself, but I'm wondering whether the function should return a `Result` instead. It has multiple failure modes.
On an M3 Max, I'm seeing 3% regressions to 1% improvements. Probably noise.
Yea, I wondered that too; I would lean towards this. Admittedly, malformed meshes are probably not too common. I'm kinda on the fence about whether the default public API for mesh picking should bother with them or not. I imagine it would be frustrating if some of your meshes "aren't working" when you try to click on them. Tracking it down to a malformed mesh might be a bit of a journey, especially if you have some kind of user-generated content and malformed meshes are making it into the game.
It might be better to
I'm open to the idea, but I think it's equally as likely to cause false positives.
I think it makes sense for normals - if the positions are coming back fine, but the normals are broken, you can still cast against the mesh as if it had no normals.
If it's possible to make the parameter(s) more strongly typed to avoid dealing with certain failure modes (invalid mesh/triangle), that would be nice. Otherwise, this direction is fine.
Approving for now since it is an improvement.
@aevyrie the nesting order of the `Option<Option<>>` is wrong for the
I like the idea of a pre-validated Mesh*, but that's a much bigger change :P |
That's because it is no longer failing when the normals are missing. Instead, it is continuing to raycast, ignoring the normals - I think this is better behavior considering normals are already optional. There is one mistake though: the `map()?;` should be `and_then()`. That was causing the simple raycast test to fail.
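A minimal sketch (plain std `Option`, not the actual bevy_picking code) of the nesting issue mentioned here: `map` with a closure that itself returns an `Option` produces an `Option<Option<_>>`, while `and_then` flattens it to a single layer:

```rust
// Illustrative stand-in for an optional per-vertex normal lookup:
// takes Option-wrapped data and may itself fail, returning Option.
fn vertex_normal(normals: Option<&[f32; 3]>) -> Option<[f32; 3]> {
    normals.copied()
}

fn main() {
    let normals: Option<&[f32; 3]> = None;

    // `map` nests: outer Some (we had mesh data), inner None (lookup failed).
    let nested: Option<Option<[f32; 3]>> = Some(normals).map(vertex_normal);
    assert_eq!(nested, Some(None));

    // `and_then` flattens to a single Option.
    let flat: Option<[f32; 3]> = Some(normals).and_then(vertex_normal);
    assert_eq!(flat, None);
}
```

The practical difference: `Some(None).is_some()` is `true`, so the nested form can make a missing-normals case look like a success.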
Ah, the combo of the two was confusing me. I didn't realise normals weren't used for hit detection; I care a lot less about wanting to short-circuit if they're wrong now 👍. Added a benchmark for the no-indices case since it wasn't represented. I tried a few ways of indexing and this had the best perf.
Compared to main now, this branch has between 41-47% worse performance. I'm not 100% sure why the difference is so big from my initial commit's 0-10% improvement to now. I haven't spent much time attempting to optimise anything, so there are likely some opportunities here, for example only fetching normals if there is a triangle intersection instead of before. I still think this is a preferable direction. For one, this is a new feature in bevy, so there isn't an established/expected performance baseline. Of course this code should be as fast as possible, but ideally users generate optimised colliders and cast against them in a proper physics sense.
I don't think we can make any sacrifices in perf here, it is already slower than it should be, and 40-50% is a huge hit. I can poke this a bit more and see if I can find a solution as well.
Yup! Normals are only used to compute the smooth normal used for lighting, but it doesn't change the actual hit position in space. If triangle normals are missing, it uses the face of the triangle for the normal. I'm going to try an alternate method here. I think the goal of not crashing due to bad indices is a good one, but I don't think sacrificing perf is an option here, because it is on such a hot path. |
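A rough sketch of that fallback under assumed names (this is not the bevy implementation, just the standard construction): the face normal is the normalized cross product of two triangle edges, so it can always be recovered from positions alone when vertex normals are missing.

```rust
// Cross product of two 3-vectors, written out by hand to stay dependency-free.
fn cross(a: [f32; 3], b: [f32; 3]) -> [f32; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

// Normalized face normal of triangle (a, b, c), from its two edges.
fn face_normal(a: [f32; 3], b: [f32; 3], c: [f32; 3]) -> [f32; 3] {
    let ab = [b[0] - a[0], b[1] - a[1], b[2] - a[2]];
    let ac = [c[0] - a[0], c[1] - a[1], c[2] - a[2]];
    let n = cross(ab, ac);
    let len = (n[0] * n[0] + n[1] * n[1] + n[2] * n[2]).sqrt();
    [n[0] / len, n[1] / len, n[2] / len]
}

fn main() {
    // A counter-clockwise triangle in the XY plane has a +Z face normal.
    let n = face_normal([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]);
    assert_eq!(n, [0.0, 0.0, 1.0]);
}
```

Note the winding order matters: swapping two vertices flips the normal's sign.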
Do you have other benchmarks you're going off? All I can find is a few. As far as I've seen there hasn't been any optimisation done since merging, so I'm curious about the pushback. I think there are quite a few options we have; I mentioned deferring normal resolution until a hit is found. It seems like it would be amenable to some level of parallelization too, not to mention switching to optimised colliders/LOD over full meshes, or any kind of culling. Thoughts on an "unchecked_picking" feature flag that would avoid the checked indexing for people who are less risk-averse?
I wrote the original impl in mod_raycast. This is in the hottest of hot loops. Any slowdown here is multiplied by any meshes that have their AABB hit. There have been 2ish perf improvements since this was merged. The wins are usually on the order of 5%-20% improvements, it's pretty hard to claw performance here.
This isn't acceptable, I agree, but there should be a way to avoid the crashes without halving performance.
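One shape the panic-free direction can take, sketched with illustrative names (not bevy's actual API): fetch a triangle's vertices with slice `get`, so malformed indices produce `None` instead of an out-of-bounds panic, and let the caller skip that triangle.

```rust
// Hypothetical helper: look up the three vertex positions of triangle
// `triangle` in an indexed mesh. Any out-of-range index yields None
// rather than panicking, so bad mesh data degrades to "no hit".
fn triangle_vertices(
    positions: &[[f32; 3]],
    indices: &[u32],
    triangle: usize,
) -> Option<[[f32; 3]; 3]> {
    let tri = indices.get(triangle * 3..triangle * 3 + 3)?;
    Some([
        *positions.get(tri[0] as usize)?,
        *positions.get(tri[1] as usize)?,
        *positions.get(tri[2] as usize)?,
    ])
}

fn main() {
    let positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]];
    let good = [0u32, 1, 2];
    let bad = [0u32, 1, 99]; // out-of-bounds index: malformed mesh data

    assert!(triangle_vertices(&positions, &good, 0).is_some());
    assert_eq!(triangle_vertices(&positions, &bad, 0), None);
}
```

The perf question discussed in this thread is whether the extra bounds checks in this hot loop can be made free (or close to it), e.g. by validating once up front instead of per-lookup.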
The mesh picking backend already heavily parallelizes and culls using AABBs, that's the only way this is performant at all. At the end of the day, this isn't really a full solution for the editor, it's a stopgap until we have a physics engine integrated, and we can use real acceleration structures, that or a shader picking backend.
I didn't look very hard then, whoops! Ah, ok, good context :) might be cool if we added a picking stress-test example so we have something more concrete for contributors to measure against. From my perspective all I see is microbench perf in an unreleased feature, so it's hard to evaluate perf in any real sense. Good to hear. I guess I was looking too closely at just these functions and didn't see the outer parallelization! Agree on the physics engine bit!
I might have some time to poke around over the week. If you have any suggestions I could try them out, otherwise will just bash my head against it xD
That is completely understandable. 🙂 It would be nice to add benchmarks that test the full raycast as well, but when this benchmark was made, there was no parallelization or culling, so it really was everything. For the record, I think the tests you've added here are invaluable, and I don't mean to put a damper on things. Feel free to ignore nits about code brevity; as long as performance hasn't regressed, this should be merged.
Any more thoughts @aevyrie? If you prefer I could try and source another reviewer if you don't have the bandwidth available!
Sorry, this fell off my radar.
- The benchmarks and tests are a straight upgrade, very appreciated, no notes.
- The changes to the intersection code still give me pause. The only thing I'd like to see before merging is testing this in a real scene, and the closest we have is the caldera hotel. I'd say that as long as we aren't regressing in this case, I'm happy to approve and merge.
- In all likelihood, performance will be unchanged, but I'd really like to see a representative case tested before merging. This is a hot path and small tweaks can have surprising impacts to performance, so I'm being pretty cautious.
- I'm happy to help with this, I can also try doing this myself. I think the easiest way to test this will be using the mesh picking backend, loading in the caldera hotel gltf, and running tracy, while mousing over the same spot in the scene.
Yea, we could possibly create some kind of picking stress test, but the trouble is having realistic test data, so it's cool you could test with an actual scene.
@tbillington @BD103 I'd like to merge this; ping me when merge conflicts are resolved please :) |
Not finished yet, but while converting the overlapping_planes benchmarks the test was failing. Even though I initially thought about removing it as aevyrie suggested, I wanted to make sure the test at least passed so there was less chance of something funky from my refactor. I tried for about an hour to figure out why; it seems moving the y position of the vertices of the plane would cause the ray to never intersect. I can't work out if something is being rotated somewhere or something else is going on, so I'm a little hesitant until I can figure out exactly what is wrong - whether it's just the silly mesh generation code I have or if it's actually something related to my picking refactor.
@BD103 I've covered your requests I'm pretty sure, though I stopped short of fully utilising your
Awesome! I'll do a thorough review of your changes later today, but I did notice some commented-out code. If |
Yea, it was more of a work-in-progress commit to get your eyes on it. But yes, sorting out the commented-out stuff is next.
@tbillington In order to not hold up this PR, could you split the benchmark changes into a separate pull request? That way the actual optimizations can be merged while we work on the benchmarks separately. |
@tbillington gentle ping on this. I'd like to get this into 0.16, and I think it's largely ready. |
Cutting from the milestone and putting this up for adoption for now: it's been a couple of weeks without a response but I really do think that this work is valuable. |
Closing as mostly adopted. The benchmark changes are still desired in the future! |
_Note from BD103: this PR was adopted from #16148. The majority of this PR's description is copied from the original._

# Objective

Adds tests to cover various mesh picking cases and removes sources of panics. It should prevent users being able to trigger panics in `bevy_picking` code via bad mesh data such as #15891, and is a follow-up to my comments in [#15800 (review)](#15800 (review)). This is motivated by #15979.

## Testing

Adds 8 new tests to cover `ray_mesh_intersection` code.

## Changes from original PR

I reverted the changes to the benchmarks, since that was the largest factor blocking it merging. I'll open a follow-up issue so that those benchmark changes can be implemented.

---------

Co-authored-by: Trent <[email protected]>
Objective

Adds tests to cover various mesh picking cases and removes sources of panics. It should prevent users being able to trigger panics in `bevy_picking` code via bad mesh data such as #15891, and is a follow-up to my comments in #15800 (review). This is motivated by #15979.

Testing

Adds 8 new tests to cover `ray_mesh_intersection` code.

Benchmarks

Included benchmarks done on an M1 Pro laptop. I also tested on an i5-4590 with Windows 11 and the results were in the same direction, albeit showing a bigger improvement over `main` than the Mac. `elem/s` measures triangles tested per second. For meshes with indices, it is `indices.len() / 3`; for meshes without, it is `positions.len() / 3`.
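As a sanity check on the units, here's a tiny sketch (hypothetical helper, not part of the benchmark code) of how the `elem/s` figures relate to the measured times:

```rust
// Hypothetical helper: derives a Melem/s throughput figure from a
// triangle count and an elapsed time in seconds.
fn melem_per_s(triangle_count: usize, elapsed_secs: f64) -> f64 {
    triangle_count as f64 / elapsed_secs / 1.0e6
}

fn main() {
    // A mesh with 486 indices has indices.len() / 3 = 162 triangles.
    let indices: Vec<u32> = (0..486).collect();
    let triangles = indices.len() / 3;
    assert_eq!(triangles, 162);

    // 162 triangles tested in 717.03 ns is roughly 226 Melem/s,
    // matching the first table row below.
    let rate = melem_per_s(triangles, 717.03e-9);
    assert!((rate - 225.93).abs() < 0.1);
}
```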
To compare this PR vs `main`, I first ran `cargo bench --bench ray_mesh_intersection` on this PR branch, then I copied the previous contents of `intersection.rs` into my branch and re-ran the bench cmd to get the exact same benchmark suite with the previous implementation.

Existing
Existing benchmarks are all significantly faster than `main`.

`ray_mesh_intersection`

| triangles | `main` | this PR |
| --- | --- | --- |
| 162 | 1.3025 µs, 124.37 Melem/s | 717.03 ns, 225.93 Melem/s |
| 19_602 | 112.26 µs, 174.61 Melem/s | 62.717 µs, 312.55 Melem/s |
| 1_996_002 | 11.166 ms, 178.76 Melem/s | 6.1663 ms, 323.69 Melem/s |

`ray_mesh_intersection_no_cull`

| triangles | `main` | this PR |
| --- | --- | --- |
| 162 | 1.3345 µs, 121.39 Melem/s | 716.67 ns, 226.05 Melem/s |
| 19_602 | 113.84 µs, 172.19 Melem/s | 63.624 µs, 308.09 Melem/s |
| 1_996_002 | 11.373 ms, 175.50 Melem/s | 6.3036 ms, 316.64 Melem/s |

`ray_mesh_intersection_no_intersection`

| triangles | `main` | this PR |
| --- | --- | --- |
| 162 | 1.2987 µs, 124.74 Melem/s | 707.89 ns, 228.85 Melem/s |
| 19_602 | 112.60 µs, 174.08 Melem/s | 62.372 µs, 314.27 Melem/s |
| 1_996_002 | 11.255 ms, 177.35 Melem/s | 6.1949 ms, 322.20 Melem/s |
New

`overlapping_planes_back_to_front` is on par with `main`, though faster for tiny meshes. It generates 10, 1_000, and 100_000 quads that are ordered such that each subsequent quad tested will be "the new closest" hit. It's quite a terrible scenario, but it's good to know the worst-case performance.

| triangles | `main` | this PR |
| --- | --- | --- |
| 20 | 307.68 ns, 65.002 Melem/s | 130.08 ns, 153.75 Melem/s |
| 2_000 | 27.947 µs, 71.564 Melem/s | 29.579 µs, 67.616 Melem/s |
| 200_000 | 2.8266 ms, 70.756 Melem/s | 2.8997 ms, 68.973 Melem/s |

`overlapping_planes_front_to_back` is significantly faster than `main`. It generates 10, 1_000, and 100_000 quads that are ordered such that the first quad is recognised as the "closest hit", and while each subsequent quad tested will be hit, it will fail the distance comparison to the "current closest". It's not quite as bad a scenario as the previous benchmark, but it's still pretty bad.

| triangles | `main` | this PR |
| --- | --- | --- |
| 20 | 307.79 ns, 64.979 Melem/s | 132.76 ns, 150.65 Melem/s |
| 2_000 | 27.935 µs, 71.594 Melem/s | 10.241 µs, 195.30 Melem/s |
| 200_000 | 2.8140 ms, 71.073 Melem/s | 1.0264 ms, 194.85 Melem/s |
`ray_mesh_intersection_single_plane_indices` is significantly faster than `main`.

| triangles | `main` | this PR |
| --- | --- | --- |
| 242 | 1.4731 µs, 164.28 Melem/s | 851.79 ns, 284.11 Melem/s |
| 20_402 | 114.88 µs, 177.60 Melem/s | 62.320 µs, 327.37 Melem/s |
| 2_004_002 | 11.365 ms, 176.33 Melem/s | 6.1316 ms, 326.83 Melem/s |

`ray_mesh_intersection_single_plane_no_indices` is significantly faster than `main`.

| triangles | `main` | this PR |
| --- | --- | --- |
| 242 | 1.2379 µs, 195.49 Melem/s | 800.99 ns, 302.13 Melem/s |
| 20_402 | 95.915 µs, 212.71 Melem/s | 58.634 µs, 347.96 Melem/s |
| 2_004_002 | 9.5208 ms, 210.49 Melem/s | 5.8238 ms, 344.11 Melem/s |