Optimize trajectories integration and generation #74
Conversation
Force-pushed a7c6281 to 76ad504
I'll profile on Monday, though breaking these into separate files seems odd to me. For the noise generator, we already had some
removed redundant classes
Force-pushed 53d733c to bc9d591
I removed "prefer forward" from the defaults. We can just clip velocity from the bottom if we need more forward / forward-only moves.
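Clipping from the bottom can be sketched like this (a hypothetical stand-in, not the PR's actual noise-generator code; the function and parameter names are illustrative):

```cpp
#include <algorithm>
#include <vector>

// Instead of a "prefer forward" penalty, forward-only motion can be enforced
// by clamping sampled linear velocities from below. min_vx = 0.0 forbids
// reverse motion entirely; a small positive value biases further forward.
void clampForwardOnly(std::vector<double> &vx_samples, double min_vx = 0.0) {
    for (double &vx : vx_samples) {
        vx = std::max(vx, min_vx);  // clip from the bottom only
    }
}
```

The upside over a soft penalty is that no sampled trajectory wastes a rollout on motion we would never execute.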
Force-pushed f203df0 to 56a9562
It looks like this reduced the integrate-state-velocities CPU by about half (29% down to 15%) but didn't make much of a change in the generate-noised-controls function (~15-16%). Of that, about 11% is spent on the random sampling process plus related container work from xtensor to align things / stride stepping. I'm not sure there is much we can do about that? Most of the time I can account for is spent in xtensor. So if there's anywhere we can minimize copies / strides, hook up the GPU, or generally optimize the use of xtensor, this would be the time to do it so we can push up our batch sizes.

We now spend about 1/3 of the time evaluating the trajectories and about 2/3 of the time generating them. Of the controller time (i.e. where 100% is made up of the components within MPPIC, not other things), 22% is on generating noised controls and another 22% is on integrating state velocities. There are a number of < 8% tasks that I don't think are the big fish to fry. The critics change on a per-run basis depending on what's going on in a particular test run, but right now none of them looks like a comparative bottleneck - so there's no real need to try to minimize the number of critics at all. That's a drop in the bucket of the compute time.

Anyway, this tells us our runtime scales with the number of points and batches, not the critics or the complexity of the critics. The number of batches, and points within those batches, are the important things for us to push up to improve performance. With a set of parameters I am playing with (it doesn't really matter what they are, just for comparison of runtime improvement):
I don't think that is the best thing to remove.
In pushing up the numbers, I can get to about ~500 samples with a 3 s forward simulation time (currently 30 timesteps @ 0.1 s increments, but I'm still tuning and playing with those numbers wildly) at 40 Hz. I could do more, but that's as much as I would feel confident in at steady state with the current performance. If we can improve things a bit, I'd love to have ~1000, which doesn't seem to need that much more optimization (10-20%).

The AutoRally folks used 2560 batches over a 2 s horizon (100 timesteps of 0.02 s). I don't think we need to go to that level, but even with my 30 timesteps at 0.1 s, using 2560 the behavior is really nice. The difference between 500, 1000, and 2560 is hard to gauge since I can't visualize the trajectories without making rviz get choppy. That's a hardware-testing need. Though given that this is random-sampling-based optimization, we need as many samples as possible so we can model our system as best we can - especially if we can't iterate multiple times per control frequency.
Assuming the issues are fixed, this is good to go for me.
I started on some tuning this afternoon but couldn't finish due to the visualization issue and how many batches / how much frequency I could push in. I can pick it up tomorrow and propose some new parameters that I think help a number of the behavior concerns.
I got down to a tuned value for the temperature. In terms of batches, a minimum of 500 is necessary for smoother behavior. With the visualization stuff fixed, I'm trying to understand how much 1000 or 2000 batches help / don't, to find the sweet spot.

One major reason I want to squeeze a bit more performance out of this, if possible, is to support a 1,000-batch system at 40 Hz on a "regular" Intel CPU (Raspberry Pis or similar will obviously need less). Anything above 50 Hz I think is just excessive, so we're closing in on that number and I couldn't be happier about it. If we can support 50 Hz @ 2000 batches with the previous settings, I can't imagine we'll ever need more. Though realistically I'd make the defaults less than that.
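For context on what the temperature parameter does, here is the standard MPPI softmax-style trajectory weighting it enters into (a generic sketch, not this controller's implementation; names are illustrative):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Convert per-trajectory costs into normalized weights. Lower temperature
// concentrates weight on the lowest-cost trajectories (greedier updates);
// higher temperature flattens toward a uniform average (smoother, but less
// decisive). Subtracting the min cost keeps exp() numerically stable.
std::vector<double> softmaxWeights(const std::vector<double> &costs,
                                   double temperature) {
    double min_cost = *std::min_element(costs.begin(), costs.end());
    std::vector<double> w(costs.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < costs.size(); ++i) {
        w[i] = std::exp(-(costs[i] - min_cost) / temperature);
        sum += w[i];
    }
    for (double &wi : w) wi /= sum;  // normalize so weights sum to 1
    return w;
}
```

This is why tuning temperature trades off smoothness against responsiveness, and why it interacts with batch count: with more samples, a lower temperature can still average over enough good trajectories.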