Optimize tensors further #88

Merged: 15 commits merged from sep_ten_traj into develop on Jul 25, 2022
Conversation

artofnothingness (Owner)

No description provided.

SteveMacenski (Collaborator) commented Jul 21, 2022

Well, I don't see that this helps much with the jitter, but it did produce another big drop in CPU, which will correspondingly reduce the jitter's impact. I see this running well under 10 ms now (we started the week at 20 ms!), with maximum peaks around 25 ms, which is below the 30 Hz rate. 6-11 ms is typical.

I'd say it's worth the time to finish up this PR. The general performance is more stable now and the average is slightly lower.

SteveMacenski (Collaborator) commented Jul 21, 2022

I cleaned things up and handled the merge conflicts. The remaining work is the visualization functions getOptimizedTrajectory and getGeneratedTrajectories, which need updating for the new Trajectories wrapped tensors. Could you work your xtensor magic? You can merge this once that's added; I've tested everything else and am very happy with the stabilization improvement this adds.

SteveMacenski (Collaborator) commented Jul 21, 2022

After this change, I re-benchmarked where the jitter is coming from, and PathAlign is now the biggest contributor. The trajectory generation process spikes from 2-3 ms to 5-6 ms (2-3 ms deltas are hardly worth doing much about; I doubt we even could). PathAlign can go from a 4-6 ms steady state up to 11-12 ms.

I think if we can improve PathAlign, that would be the point where we can/should stop hunting for optimizations and pivot to #79/#80 to complete the first iteration of this software.

By the way, I'm going to be on vacation from July 24 - Aug 17 so you won't hear from me much over the next couple of weeks while I'm traveling. But this remains a priority for me! I'd like to close out the optimization ticket before I leave, but I fear I will not get the Path Align optimizations done tomorrow 😢

SteveMacenski (Collaborator) commented Jul 22, 2022

I tested TBB parallel for loops for path align; it doesn't appear to be any faster, and the jitter still has the same consistent range. I also attempted to use xtensor's xaxis_iterator, but after about an hour I couldn't get it to compile with the types involved. It might be worth a shot on your end, since iterators compile down to xstrides, which may be more optimized than manually looping over the data.
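
For reference, this is roughly the shape of the TBB experiment, reconstructed from memory with illustrative names and tensor shapes (scorePathAlign and the layouts here are stand-ins, not the actual critic code):

#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <xtensor/xtensor.hpp>
#include <algorithm>
#include <cmath>
#include <limits>

// Score each trajectory by its mean distance to the nearest path point,
// parallelized over the batch dimension with TBB.
void scorePathAlign(
  const xt::xtensor<double, 3> & trajectories,  // [batch, time, 3]
  const xt::xtensor<double, 2> & path,          // [points, 3]
  xt::xtensor<double, 1> & costs)               // [batch]
{
  tbb::parallel_for(
    tbb::blocked_range<size_t>(0, trajectories.shape(0)),
    [&](const tbb::blocked_range<size_t> & range) {
      for (size_t b = range.begin(); b != range.end(); ++b) {
        double sum = 0.0;
        for (size_t t = 0; t < trajectories.shape(1); ++t) {
          // The O(path points) inner search per trajectory point is the
          // cost the potential field / spline ideas below try to remove.
          double min_d2 = std::numeric_limits<double>::max();
          for (size_t p = 0; p < path.shape(0); ++p) {
            const double dx = trajectories(b, t, 0) - path(p, 0);
            const double dy = trajectories(b, t, 1) - path(p, 1);
            min_d2 = std::min(min_d2, dx * dx + dy * dy);
          }
          sum += std::sqrt(min_d2);
        }
        costs(b) = sum / static_cast<double>(trajectories.shape(1));
      }
    });
}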

Besides that, we could try to reduce the number of steps, but otherwise I think we're reaching the end of what we can do with the critic as currently implemented; it might be worth rethinking the implementation to see if there's a lighter-weight alternative.

Perhaps if we established a distance potential field around the path, we could look up the distance from the path for each trajectory point without iterating through the path points and computing distances each time. The discrete nature of a grid potential field would reduce the accuracy of the exact alignment, however. A rough sketch of the idea follows.
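
Something like this, assuming a known grid resolution and origin (all names are illustrative; the field fill is brute force for clarity, where a real version would use a wavefront/BFS pass). Each trajectory point then becomes an O(1) lookup instead of an O(path) search:

#include <xtensor/xtensor.hpp>
#include <algorithm>
#include <cmath>
#include <limits>

// Precompute the distance to the nearest path point for every grid cell.
xt::xtensor<float, 2> buildDistanceField(
  const xt::xtensor<double, 2> & path,  // [points, 2] in world coordinates
  size_t cells_x, size_t cells_y,
  double resolution, double origin_x, double origin_y)
{
  xt::xtensor<float, 2> field({cells_x, cells_y});
  for (size_t i = 0; i < cells_x; ++i) {
    for (size_t j = 0; j < cells_y; ++j) {
      const double wx = origin_x + (i + 0.5) * resolution;
      const double wy = origin_y + (j + 0.5) * resolution;
      float min_d = std::numeric_limits<float>::max();
      for (size_t p = 0; p < path.shape(0); ++p) {
        const double dx = wx - path(p, 0);
        const double dy = wy - path(p, 1);
        min_d = std::min(min_d, static_cast<float>(std::hypot(dx, dy)));
      }
      field(i, j) = min_d;
    }
  }
  return field;
}

// Lookup: quantizing to the grid is what costs exact-alignment accuracy.
inline float distanceToPath(
  const xt::xtensor<float, 2> & field,
  double x, double y, double resolution, double origin_x, double origin_y)
{
  const auto i = static_cast<size_t>((x - origin_x) / resolution);
  const auto j = static_cast<size_t>((y - origin_y) / resolution);
  return field(i, j);
}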

We could also try to fit a spline to the path in a localized region to have an analytical equation to work with. That would buy us some simplification of the math if the spline's order made the closest-point problem analytically differentiable (or at least Wolfram Alpha-able). A concrete sketch of what I mean follows.
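
Purely exploratory, under strong assumptions (nothing here exists in the codebase): fit a local quadratic Bezier through three consecutive path points, then run a few Newton steps on the derivative of the squared distance to find the closest point on the curve:

#include <algorithm>
#include <array>
#include <cmath>

struct Pt { double x, y; };

// Quadratic Bezier with control points c[0], c[1], c[2].
inline Pt bezier(const std::array<Pt, 3> & c, double t)
{
  const double u = 1.0 - t;
  return {u * u * c[0].x + 2 * u * t * c[1].x + t * t * c[2].x,
          u * u * c[0].y + 2 * u * t * c[1].y + t * t * c[2].y};
}

// First derivative B'(t) = 2(1-t)(c1-c0) + 2t(c2-c1).
inline Pt bezierDeriv(const std::array<Pt, 3> & c, double t)
{
  return {2 * ((1 - t) * (c[1].x - c[0].x) + t * (c[2].x - c[1].x)),
          2 * ((1 - t) * (c[1].y - c[0].y) + t * (c[2].y - c[1].y))};
}

// Newton iterations on f(t) = (B(t) - q) . B'(t), the half-derivative of
// the squared distance from query point q to the curve.
double distanceToLocalSpline(const std::array<Pt, 3> & c, Pt q)
{
  double t = 0.5;
  // The second derivative of a quadratic Bezier is constant.
  const Pt dd{2 * (c[2].x - 2 * c[1].x + c[0].x),
              2 * (c[2].y - 2 * c[1].y + c[0].y)};
  for (int i = 0; i < 4; ++i) {
    const Pt b = bezier(c, t);
    const Pt d = bezierDeriv(c, t);
    const double f = (b.x - q.x) * d.x + (b.y - q.y) * d.y;
    const double fp = d.x * d.x + d.y * d.y + (b.x - q.x) * dd.x + (b.y - q.y) * dd.y;
    if (std::abs(fp) < 1e-9) break;
    t = std::clamp(t - f / fp, 0.0, 1.0);
  }
  const Pt b = bezier(c, t);
  return std::hypot(b.x - q.x, b.y - q.y);
}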

But I'll have a few hours tomorrow before I leave to think about this more and see if there are solutions with what we have.

SteveMacenski (Collaborator) commented Jul 22, 2022

I'm quite surprised by the degree to which increasing the point steps for both trajectories and path points fails to impact the critic's performance. Making them larger than is reasonable (5 and 10, respectively) reduces the time by only 0.5 ms, and the peaks still hit 11 ms. No amount of tuning those numbers higher than currently set will meaningfully improve performance (though lower values obviously cost a lot more compute).

I also removed the contents of the loops so that they iterate emptily, and I can still see spikes upwards of 10 ms. The CPU time drops by about half, to ~2 ms steady state, so half of the compute time isn't even in the main loop.
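
For reference, the numbers above are simple wall-clock measurements around the critic; a minimal sketch of the harness (critic.score(data) is a stand-in, not the real API):

#include <chrono>
#include <cstdio>

// Time a callable and return the elapsed milliseconds.
template <typename Fn>
double timeMs(Fn && fn)
{
  const auto start = std::chrono::steady_clock::now();
  fn();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count();
}

// Usage (hypothetical call):
// printf("PathAlign: %.2f ms\n", timeMs([&] { critic.score(data); }));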

Given those three pieces of evidence (multithreading the loops doesn't help, reducing the resolution of the loops doesn't help, and removing the loop contents doesn't help), I don't think the expense of this critic actually comes from the core logic itself, which is interesting. Or perhaps it's the lambdas.

artofnothingness (Owner, Author)

This now works with 35 time steps and a batch size of 2000 on my 4th gen notebook, within the 20 Hz rate.

artofnothingness merged commit 4b51d5f into develop on Jul 25, 2022
artofnothingness deleted the sep_ten_traj branch on Jul 25, 2022 at 19:58
artofnothingness changed the title from "Sep ten traj" to "Optimize tensors further" on Jul 25, 2022
SteveMacenski (Collaborator) left a comment

Back from PTO. This pretty much all looks great to me, but I think I found one error and wanted to ask one question that might also be a problem (or at least is good to clarify).

xt::view(yaw_offseted, xt::all(), xt::range(1, _)) =
  xt::view(yaw, xt::all(), xt::range(_, -1));
xt::view(yaw_offseted, xt::all(), 0) = initial_yaw;
traj_yaws = xt::cumsum(utils::normalize_angles(wz) * settings_.model_dt, 0) + initial_yaw;
SteveMacenski (Collaborator):

I believe this needs to be utils::normalize_angles(wz * settings_.model_dt + initial_yaw) so it can be normalized as angles after the time & offsets are applied
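
To spell out the ordering I mean (a sketch, not a drop-in patch; where the cumsum lands depends on the rest of the function):

// Integrate the angular velocities first, then wrap the result as angles,
// rather than wrapping the raw rates before integrating.
auto integrated = xt::cumsum(wz * settings_.model_dt, 0) + initial_yaw;
traj_yaws = utils::normalize_angles(integrated);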

SteveMacenski (Collaborator):

The other implementation of this function actually does this properly.

artofnothingness (Owner, Author):

yeah that's a mistake

SteveMacenski (Collaborator) commented

@artofnothingness I did some runtime testing on the current state of develop. I never see anything > 20 ms anymore! Really great work on finishing this up 🥇 That puts us at 50 Hz with the settings I've been playing with.
