Optimize tensors further #88
Conversation
Well, I don't see that this helps much with the jitter, but it did produce another big drop in CPU, which will correspondingly reduce the jitter's impact. I see this running well under 10 ms now (we started the week at 20 ms!) with maximum peaks in the mid-25 ms range, which is below the 30 Hz rate; 6-11 ms is typical. I'd say it's worth the time to finish up this PR. The general performance is more stable now and the average is slightly lower.
I cleaned things up and handled the merge conflicts. The remaining work is related to visualization.
After this change, I re-benchmarked where the jitter is coming from, and PathAlign is now the biggest contributor. Trajectory generation spikes from 2-3 ms up to 5-6 ms (2-3 ms deltas are hardly worth doing much about; I doubt we even could). PathAlign can go from 4-6 ms steady state up to 11-12 ms. I think if we can improve PathAlign, that would be the point where we can/should stop hunting for optimizations and pivot to #79/#80 to complete the first iteration of this software.

By the way, I'm going to be on vacation from July 24 to Aug 17, so you won't hear from me much while I'm traveling. But this remains a priority for me! I'd like to close out the optimization ticket before I leave, but I fear I will not get the PathAlign optimizations done tomorrow 😢
I tested with TBB parallel for loops for path align; it doesn't appear to be any faster, and the jitter still has the same consistent range. I also made an attempt to use xtensor's …; besides that, we could try to reduce the … .

Perhaps if we established a distance potential field to the path, we could use that to look up the distance from the path for each trajectory point, without needing to iterate through the path for each trajectory point and compute the distance. The discrete nature of a grid potential field would reduce the quality of its exact alignment, however. We could also try to fit a spline to the path in a localized region to have an analytical equation to work with; that would buy us some simplification of the math if the spline's order was analytically differentiable to program in (or Wolfram Alpha-able). I'll have a few hours tomorrow before I leave to think about this more and see if there are solutions with what we have.
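To make the potential-field idea concrete, here is a rough, hypothetical sketch (none of these names are from mppic): build a coarse grid of distances to the path once per control cycle, then score each trajectory point with an O(1) lookup instead of a per-point scan over the whole path.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical sketch: a coarse distance field around the path.
// The brute-force fill is O(cells * path points), but it runs once
// per control cycle instead of once per trajectory point.
struct DistanceField
{
  double origin_x, origin_y, resolution;
  std::size_t width, height;
  std::vector<float> dist;  // row-major, meters

  void build(const std::vector<std::pair<double, double>> & path)
  {
    dist.assign(width * height, std::numeric_limits<float>::max());
    for (std::size_t j = 0; j < height; ++j) {
      for (std::size_t i = 0; i < width; ++i) {
        const double cx = origin_x + (i + 0.5) * resolution;
        const double cy = origin_y + (j + 0.5) * resolution;
        float & d = dist[j * width + i];
        for (const auto & [px, py] : path) {
          d = std::min(d, static_cast<float>(std::hypot(cx - px, cy - py)));
        }
      }
    }
  }

  // O(1) lookup per trajectory point, quantized to the grid resolution.
  float at(double x, double y) const
  {
    const double gi = (x - origin_x) / resolution;
    const double gj = (y - origin_y) / resolution;
    if (gi < 0.0 || gj < 0.0 || gi >= width || gj >= height) {
      return std::numeric_limits<float>::max();
    }
    return dist[static_cast<std::size_t>(gj) * width + static_cast<std::size_t>(gi)];
  }
};

The brute-force fill is only for illustration; a real implementation would use a linear-time distance transform, and the quantization to the grid resolution is exactly the alignment-quality trade-off mentioned above.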
I'm quite surprised by the degree to which increasing the point steps for both trajectories and path points seems to not really impact the performance of the critic. Making them larger than would be good for quality (5 and 10, respectively) reduces the time by only 0.5 ms, and the peaks still hit 11 ms. There's no amount of tuning those numbers higher than currently set that will meaningfully improve performance (though lower obviously costs a lot more compute). I also removed the contents of the loops, so that they just loop emptily, and I can still see spikes occurring upwards of 10 ms; the CPU time drops roughly in half to ~2 ms steady state, so half of the compute time isn't even the main loop. Given those three pieces of evidence (multithreading the loops doesn't help; reducing the resolution of the loop doesn't help; removing the loop content doesn't help), I don't think the reason this critic is expensive actually has to do with the core logic itself, which is interesting; or perhaps it's the lambdas.
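For reference, the gutted-loop experiment described above can be approximated with a harness like this hypothetical sketch (not the actual critic code); if timing spikes survive with an empty body, the cost is in iteration and dispatch overhead rather than the scoring math.

#include <chrono>
#include <cstdio>

int main()
{
  constexpr int batch = 2000, steps = 35;
  volatile int sink = 0;  // keeps the compiler from deleting the empty loops
  for (int iter = 0; iter < 100; ++iter) {
    const auto t0 = std::chrono::steady_clock::now();
    for (int b = 0; b < batch; ++b) {
      for (int t = 0; t < steps; ++t) {
        sink = sink + 1;  // stand-in for the removed critic body
      }
    }
    const auto t1 = std::chrono::steady_clock::now();
    std::printf("iter %3d: %.3f ms\n", iter,
      std::chrono::duration<double, std::milli>(t1 - t0).count());
  }
  return 0;
}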
Now it works with 35 time steps and a 2000 batch size on my 4th-gen notebook within 20 Hz.
Back from PTO. This pretty much all looks great to me, but I think I found one error and want to ask one question that might also be a problem (or at least would be good to clarify).
xt::view(yaw_offseted, xt::all(), xt::range(1, _)) =
  xt::view(yaw, xt::all(), xt::range(_, -1));
xt::view(yaw_offseted, xt::all(), 0) = initial_yaw;

traj_yaws = xt::cumsum(utils::normalize_angles(wz) * settings_.model_dt, 0) + initial_yaw;
I believe this needs to be utils::normalize_angles(wz * settings_.model_dt + initial_yaw), so the values are normalized as angles after the time and offsets are applied.
The other implementation of this function actually does this properly.
Yeah, that's a mistake.
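For clarity, a sketch of the corrected line, assuming the intent is to keep the cumulative sum and apply the angle normalization only after the integration and initial-yaw offset:

// Integrate angular velocity over time first, add the initial yaw,
// and only then wrap the resulting headings into a normalized range.
traj_yaws = utils::normalize_angles(
  xt::cumsum(wz * settings_.model_dt, 0) + initial_yaw);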
@artofnothingness I did some testing on the current state of …