Achieving full speed on straight-a-ways #80
@artofnothingness any thoughts on this one? The most obvious thing to try is just penalizing the difference between the maximum speed and the speeds in the trajectory, or the speed of the current state. I believe the latter is what "Aggressive Driving with Model Predictive Path Integral Control" does in Eq. 35. I have a prototype using the state information that I started this afternoon in the ~45 minutes I had, though some TODOs remain.
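A minimal sketch of what such a critic could look like, assuming a plain per-trajectory cost function; the name `speedShortfallCost`, the `std::vector` interface, and the per-point normalization are illustrative choices, not the prototype's actual code:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical critic: penalize a trajectory in proportion to how far its
// sampled linear velocities deviate from the platform's maximum speed,
// in the spirit of Eq. 35 of "Aggressive Driving with Model Predictive
// Path Integral Control".
double speedShortfallCost(const std::vector<double>& vx,
                          double max_vel, double weight)
{
  if (vx.empty()) {
    return 0.0;
  }
  double cost = 0.0;
  for (double v : vx) {
    // Absolute difference also charges overspeed samples, if any occur.
    cost += std::abs(max_vel - v);
  }
  // Normalize by trajectory length so the weight can be tuned
  // independently of the number of timesteps.
  return weight * cost / static_cast<double>(vx.size());
}
```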
This seems to be one of the last tickets we can do before another set of hardware testing evaluations.
Fixed those issues -- now just up to the actual …. I pushed some updates using the ….
Try next: …
Averages/min/max fail. But penalizing the summed difference at each trajectory point works too well (needs normalization?), and penalizing only the first trajectory point works pretty well too, so a couple of potentially good paths forward exist here. Working on it tomorrow. The main issue now is that because we penalize the difference between the desired velocity and the actual velocity, it doesn't allow reversing (or even slowing down) and causes collisions: it's too intense a force, so the robot doesn't want to slow down. This is even with a penalty weight of only 1.0, with units of …. By the time the weight is large enough to meaningfully speed up the robot, it's also too large to allow it to back up or slow down.
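For illustration, a sketch of the first-trajectory-point variant mentioned above; the one-sided clamp (charging nothing when the sampled speed meets or exceeds the reference) is an assumption about one way to soften the slow-down problem, not something the prototype is confirmed to do:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical variant: score only the first point of each trajectory.
// The std::max clamp makes the penalty one-sided, so matching or exceeding
// max_vel costs nothing; note it still discourages commanded slow-downs,
// which is the core problem described above.
double firstPointShortfall(const std::vector<double>& vx,
                           double max_vel, double weight)
{
  if (vx.empty()) {
    return 0.0;
  }
  return weight * std::max(0.0, max_vel - vx.front());
}
```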
Mhm, this seems odd. If I increase the weights of the existing critics without specifically incentivizing higher speeds, I get higher speeds out. This seems like an odd result since I just made them all proportionally higher. I see the same result if I just increase the power of the cost from linear to quadratic. Admittedly the system becomes a bit less stable (perhaps it needs retuning, or changes in the softmax gain?). @artofnothingness do you have any thoughts on this? Perhaps the right move isn't actually to penalize slower speeds at all but to give the existing critics higher penalties. Do you know why just higher (but proportionally so) costs would select faster trajectories? That would solve a great deal of this issue if we don't need to actually create a specific new critic. I found this result by trying critics 1-by-1 and realizing that none of them is the reason it's going slower. If I only use the …
Hmm. This sounds weird. It could be that the temperature parameter makes the weighting less cost-sensitive, so we just get more random-average values rather than ones that minimize the cost function. Did you try changing the temperature parameter?
I think we need to dig into the original papers. This implementation has a lot of invented, experimental stuff.
Just tried that. Yes, if we lower the value from 0.35 -> 0.1, then we go from 75% of the max speed to 90% of the max speed. If I lower it to near-zero (but non-zero) then it's virtually butting up against that max speed value (but with quite a large variance of values, I assume due to destabilization). I see a normalization step before finding the softmax, so if things are proportionally the same, the outputs should be similar? (e.g., cost A is 1 and cost B is 5, vs. cost A is 10 and cost B is 50). The exponential function probably warps that a bit, but it seems rather silly that just higher values of costs, proportional to each other in the same way, would impact things that much. Perhaps it's worth seeing if we can retune the system using squared costs? Just squaring the current weights makes it unstable, but at least it moves more closely to max speed. I'm perfectly happy if we don't hit the full max speed at all iterations, but I'd like to at least be in the right ~5-15% ballpark. It doesn't seem like lowering temperature is wise though; that is definitely very destabilizing, and some weighted averaging is useful. But what do you think? It seems like our options are (1) higher powers, (2) higher weights, (3) lower temperatures. Perhaps additionally a "smoothing" critic to handle destabilization (maybe penalize large …).
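For reference, the standard MPPI weighting with min-subtraction (which may differ from this repo's exact normalization step) makes the scaling effect explicit: multiplying every cost by a constant c is algebraically identical to dividing the temperature by c, which would explain why proportionally higher costs behave like a lower temperature and select faster trajectories. A sketch:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard MPPI-style weighting: w_i proportional to
// exp(-(S_i - S_min) / lambda). Scaling every cost S_i by c gives
// exp(-(c*S_i - c*S_min) / lambda) = exp(-(S_i - S_min) / (lambda / c)),
// i.e. exactly the same weights as lowering the temperature by a factor c.
std::vector<double> softmaxWeights(const std::vector<double>& costs,
                                   double lambda)
{
  const double s_min = *std::min_element(costs.begin(), costs.end());
  std::vector<double> w(costs.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < costs.size(); ++i) {
    w[i] = std::exp(-(costs[i] - s_min) / lambda);
    sum += w[i];
  }
  for (double& wi : w) {
    wi /= sum;  // normalize so the weights sum to 1
  }
  return w;
}
```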
I went over the AutoRally code from the MPPI authors and agree that what they have is the same as what you've done in the …. The only thing they do after that point is apply a Savitzky–Golay filter: https://github.com/AutoRally/autorally/blob/c2692f2970da6874ad9ddfeea3908adaf05b4b09/autorally_control/include/autorally_control/path_integral/mppi_controller.cu#L416-L446. This paper characterizes it. This doesn't help with full speed, but it does help with jitteriness, so my assertion is that if we do this we can then decrease the temperature value. They use a gamma value of 0.15, compared to our current …. What do you think? It wouldn't require any retuning or messing with critics, just smoothing a local buffer of recent commands. This actually seems pretty sensible: softmax is over-smoothing us, so we reduce that to use a narrower band but then smooth the output at a second level.
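A sketch of the kind of Savitzky–Golay pass being described, using the standard quadratic-fit coefficients for a 5-sample window; the window length and the untouched boundary samples are illustrative choices here, not AutoRally's exact configuration:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Savitzky-Golay smoothing of a control sequence. These are the standard
// least-squares quadratic-fit coefficients for a 5-sample window; each
// output sample is a fixed linear combination of its neighborhood.
std::vector<double> sgolaySmooth(const std::vector<double>& u)
{
  static constexpr std::array<double, 5> k = {-3.0, 12.0, 17.0, 12.0, -3.0};
  static constexpr double norm = 35.0;
  std::vector<double> out = u;  // leave the edges unfiltered for simplicity
  for (std::size_t i = 2; i + 2 < u.size(); ++i) {
    double acc = 0.0;
    for (std::size_t j = 0; j < k.size(); ++j) {
      acc += k[j] * u[i - 2 + j];  // convolve the window around sample i
    }
    out[i] = acc / norm;
  }
  return out;
}
```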
That's a great observation. We should implement this. Definitely a high temperature is not the greatest way to get a smoother trajectory.
So the MPPI in AutoRally does the SGF filter on the output of the control, but the paper recommends doing something else (after reading it now in more detail). They claim the filter reduces convergence; their argument is relatively intuitive, but I'm also not sure it really matters for our situation. But they propose something that's more theoretically bounded in the MPPI framework, so I figure that's good to utilize. Their choice of language around "i-axis" and "t-axis" is pretty annoying though. As you're reading it, can you take a look over this paper and let me know if this aligns with your understanding as well? I think we need to: …
The method also shows …. Though, I wonder if we can't just add in a smoothing critic.
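A sketch of the paper's idea as I read the discussion above, reduced to one dimension: sample noise on the action derivative and integrate it, so the resulting action sequence is smooth by construction. All names and the scalar setup are assumptions for illustration; the real optimizer operates on batched multi-dimensional trajectories:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Inject noise in a lifted "action derivative" space and integrate, so the
// actions sent to the base change by at most dv*dt per step instead of
// jumping arbitrarily between timesteps.
std::vector<double> sampleSmoothActions(const std::vector<double>& mean_derivs,
                                        double v0, double dt, double stddev,
                                        std::mt19937& gen)
{
  std::normal_distribution<double> noise(0.0, stddev);
  std::vector<double> actions(mean_derivs.size());
  double v = v0;
  for (std::size_t t = 0; t < mean_derivs.size(); ++t) {
    const double dv = mean_derivs[t] + noise(gen);  // perturb the derivative
    v += dv * dt;                                   // integrate to an action
    actions[t] = v;
  }
  return actions;
}
```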
I pushed the change to the following branch: https://github.com/artofnothingness/mppic/tree/smoothmppi. I'm still playing around, but this seems correct to my current understanding of the paper. I know the changes to …. I think the trajectories are being generated using the control velocities and not the "action" velocities, so it's scoring trajectories that aren't directly analogous to the actual velocities being sent to the base (since those are now "action"s). I've gone through and updated the motion model to use …. Now the issue I'm running into is that the system's behavior is nutty: it's relatively smooth in motion (wobbly oscillations though), but it seems to ignore the penalty functions for actually stopping at the goal or following the path (among others; it seems like it's on drugs). The optimizer is regularly having to be reset because the trajectories aren't super responsive to obstacles - but that could have something to do with the virtually suppressed normal distribution, so the samples aren't spread far enough apart (e.g. before …). I'm not sure exactly what's going on there, but at least it's moving around, which is progress 👍. The branch is up to date with my work today; please take a look and let me know if you make any progress. I'm not sure if I'll get a chance to continue this tomorrow and I'm on vacation on Friday. I put in ~4 hours on this today to get this far!
Well, I have it kind of working, but I'm not sure why. If I don't clamp the …. But if I clamp the actions to be within valid bounds of velocities …. I could use a second pair of eyes at this point. There are …. It's pretty smooth with a temperature of ….
Still working through the issues in Action space, but see work on post-filtering with the Savitzky–Golay filter in the … branch.
@artofnothingness the #99 PR is finally ready for you to give a review on.
@artofnothingness the #99 PR is finally ready for you to give a review on... but for real this time. I'm completely done and happy with the results. Just waiting on your thoughts.
#99 has been confirmed to resolve this on hardware.
See #77 (comment)
A good place to start is a critic to reward going the full speed, probably with a low weight to just incentivize but not force it.