Watchdog/Timeout completely broken for 4.12 hardware #84
Comments
Hi Chris, could you attach your motor config? |
EDIT: Disregard, bad debugging info |
Nevermind, disregard the above comment. This happens with stock settings on a brand new flash of the firmware when configured for 4.12 |
I flashed one of my palta boards with hw_410 here and I can't reproduce this issue.
|
Yes, I tried a full erase. My hardware is both a Flipsky mini VESC and a Torque VESC from esk8.
Steps to repro:
1. git reset --hard; git pull origin master
2. Uncomment:
3. Full erase with STLink, make upload
After this the board never boots up to the point where VCP works, as it is always rebooting. |
Also, nothing is connected externally. The xtal point is interesting, but given that the board can work with USB when the timeout is disabled, the xtal must be OK (it is required for USB).
…On Tue, Apr 9, 2019, 2:54 PM Marcos Ariel Chaparro ***@***.***> wrote:
I flashed one of my palta boards with hw_410 here and I can't reproduce
this issue.
1. What do you mean by a brand new flash? Did you command a full chip
erase from an stlink to ensure old configurations are erased?
2. Are you using any app with the firmware?
3. Are you using an encoder or other cpu load?
4. Is your crystal okay? firmware now double checks the timing with an
independent watchdog clock.
|
Steps I'm doing:
Do you know if there are other users with the same issue? |
To my comment above add flashing the bootloader before step 7. |
So, did not flash the bootloader, but it shouldn't make any difference, right? Looking through the code, it doesn't touch the wdg, (other than to abuse it to reset the board, lol). Anyone with a torque or flipsky esc who can test? This is looking like a hw issue, and it must be with the xtal. |
Here's some more debug info. If I accidentally leave HW60 in as the selected config, USB works! So, what's the difference between 60 and 410 that affects this? Edit: Also, I confirmed that both boards do have an xtal loaded, but I'm guessing everything must be ok on this front, because if the clock settings or xtal were incorrect, USB wouldn't work at all. |
A significant difference is that hw6 defaults to FOC mode and hw4 defaults to bldc mode... |
Maybe adding to hw410.h this could narrow this down:
|
Yup! That fixes it. So now...
So there is an issue starting BLDC mode with the timeout |
More debugging info: This is absolutely related to the switching freq. I configured my motor, everything worked great, so I set 29.5K as my FOC switching freq (everything still worked great) and then I rebooted. After the reboot, the vesc now does the boot loop. I bet the reason it fails with bldc selected is because the switching freq is very high (35K) by default |
Yes, I think you are right. So it's not a problem with the watchdog; the watchdog led you to discover that the CPU usage hits 100% with your default configuration and the scheduler timing is failing. In my palta hardware I added this limit a while ago to prevent exactly that. IMO a line like that should be added to all hardware versions. I don't use BLDC mode, but a similar limit should be implemented for that mode. The frequency limit depends on the CPU load. It looks like BLDC mode (or something else) is getting more CPU intensive and now the CPU can't keep up. Now that we have a likely solution (or at least an explanation), I think we need @vedderb. Thanks for reporting! |
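The limit line mentioned above is not preserved in this thread. As a rough sketch of the idea only, a per-hardware cap on the switching frequency could look like the following. The macro names and the numeric limits are hypothetical and are not taken from the firmware source.

```c
#include <assert.h>

/* Hypothetical per-hardware limits in Hz. Names and values are
 * illustrative only; they do not come from the firmware. */
#define EXAMPLE_SW_FREQ_MIN_HZ 3000.0f
#define EXAMPLE_SW_FREQ_MAX_HZ 30000.0f

/* Clamp a requested switching frequency into [min_hz, max_hz]. */
static float clamp_switching_freq(float requested_hz, float min_hz, float max_hz) {
    assert(min_hz <= max_hz);
    if (requested_hz < min_hz) {
        return min_hz;
    }
    if (requested_hz > max_hz) {
        return max_hz;
    }
    return requested_hz;
}
```

With these example limits, a stored value of 35 kHz would be reduced to 30 kHz when the configuration is loaded, while 20 kHz would pass through unchanged.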
And more debugging info... This goes beyond just the switching freq. If I get a good auto detection in FOC with hall/general, and then reboot, everything is fine. If I take those settings and back them up to a file, and then reload the file the VESC will boot loop. Even if I simply backup stock settings after a fresh erase/flash and restore them, the same thing happens. |
And even more debugging info, If I do a fresh flash, load settings, not touch the motor config, but set the CAN baud to 1M and save, the vesc will bootloop on reboot |
When you are near the CPU limit any configuration change can make it better or worse. An SPI encoder will require more CPU usage, and so would a higher CAN packet decoding frequency. Max frequency should be dialed down now, and then we can see how we are going to continue. Profiling and optimizing code is an endless endeavor once you hit your resource limit; I'd rather limit the frequency than make the code less clear. |
@nitrousnrg Oops, I didn't see your previous message until now. That said, my configuration isn't really anything interesting. It's a totally stock config other than CAN being 1M, and FOC with a slightly higher switching freq in sensored mode. Seems a little unreasonable that this should be at the full limits of the hardware/RTOS? |
Memory resources are plentiful, but you can easily max out the CPU if you run the core control loop at high frequencies. That's why my first question here was whether you are running > 30 kHz. |
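To put numbers on the CPU-load argument, here is a small arithmetic sketch. It assumes a 168 MHz core clock (an assumption about the target MCU, not stated in the thread), and the per-interrupt cycle costs in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles available between two control-loop interrupts. */
static uint32_t cycles_per_period(uint32_t cpu_hz, uint32_t loop_hz) {
    assert(loop_hz > 0);
    return cpu_hz / loop_hz;
}

/* Percentage of CPU time consumed by an interrupt handler that costs
 * isr_cycles per invocation (hypothetical cost, for illustration). */
static uint32_t isr_load_percent(uint32_t cpu_hz, uint32_t loop_hz, uint32_t isr_cycles) {
    uint32_t budget = cycles_per_period(cpu_hz, loop_hz);
    assert(budget > 0);
    return (uint32_t)(((uint64_t)isr_cycles * 100u) / budget);
}
```

Under that clock assumption, a 35 kHz loop leaves 4800 cycles per period and a 20 kHz loop leaves 8400. A handler that happened to cost 4200 cycles would use 50% of the CPU at 20 kHz but 87% at 35 kHz, leaving little for the lower-priority threads.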
I just received a support ticket of a customer telling me that the latest firmware doesn't work for him in BLDC mode, so I would think this has escalated to be a critical bug that needs patching asap before more users upgrade the firmware and brick devices. |
@nitrousnrg Just a note, I encountered this running at 20 kHz (the default FOC switching freq) too. It does not appear to depend only on switching freq. I don't know the codebase well enough to speculate on what might be going on, but it seems very sensitive to any kind of configuration change. |
Meh, customer installed a wrong resistor, totally unrelated. Too bad I emailed Benjamin about this. |
I was following the conversation, but have not been home for a few days so I could not test anything myself. Emailing me is not a problem :-) When I come home I will catch up with the pull requests and issues. If a commit from back then would break things for HW4 I suspect that I would have heard a lot more by now, so I was kind of hoping that you would resolve the issue. @chris1seto is it ok to close this issue, or do you still have the problem? If you do, can you make sure that your compiler is working properly and that you did not disable optimizations? |
Hi Benjamin, That's my feeling too: you'd have heard more if this was really broken, but it seems like it really is (or at least, I'm not sure what could be wrong in my configuration). My compiler should be working correctly, I build other projects with it, and the optimization options should be set in the makefile, correct? I haven't changed the makefile or any part of the FW other than the general conf file (to target 410). I don't suppose anyone has an Esk8 Torque or Flipsky mini VESC they could test on? Do you have any potential steps to try to debug? I could send you a binary of stock FW to compare to one generated by your build system, but I suspect that if we have differing versions, the binary could change slightly. EDIT: I am using gcc-arm-none-eabi-8-2018-q4-major |
@chris1seto, did you get the chance to confirm it's not a hardware issue? Can we close this issue? |
Hi @nitrousnrg , It's definitely not a hardware issue. There's something else going on here in the bldc software, but I think Ben may need to look at it. Without disabling the watchdog, I cannot get the code to run on any of my 4.10 vescs. With the watchdog disabled the code seems to run fine, even if the scheduler is saturated. |
Could you attach your motor config xml AND app xml? If the scheduler is saturated it should not run fine; the board should reset. That's the purpose of using a WDT. With your files I can probe this deeper, thanks! |
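The watchdog behavior described above can be modeled in a few lines. This is a toy simulation under stated assumptions, not the firmware's timeout code: all names are hypothetical, time is counted in abstract ticks, and a starved scheduler is represented by the feeding thread never getting to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy watchdog: counts ticks since the last feed and reports a reset
 * once that count reaches timeout_ticks. */
typedef struct {
    uint32_t timeout_ticks;
    uint32_t ticks_since_feed;
} toy_wdt_t;

static void toy_wdt_init(toy_wdt_t *w, uint32_t timeout_ticks) {
    assert(timeout_ticks > 0);
    w->timeout_ticks = timeout_ticks;
    w->ticks_since_feed = 0;
}

static void toy_wdt_feed(toy_wdt_t *w) {
    w->ticks_since_feed = 0;
}

/* Advance one tick; returns true if the watchdog would reset the board. */
static bool toy_wdt_tick(toy_wdt_t *w) {
    w->ticks_since_feed++;
    return w->ticks_since_feed >= w->timeout_ticks;
}

/* Simulate total_ticks ticks. The feeding thread runs every
 * feed_interval ticks; 0 means it never runs (fully starved).
 * Returns true if a reset would have occurred. */
static bool toy_wdt_run(uint32_t timeout_ticks, uint32_t feed_interval, uint32_t total_ticks) {
    toy_wdt_t w;
    toy_wdt_init(&w, timeout_ticks);
    for (uint32_t t = 1; t <= total_ticks; t++) {
        if (feed_interval != 0 && (t % feed_interval) == 0) {
            toy_wdt_feed(&w);
            continue;
        }
        if (toy_wdt_tick(&w)) {
            return true;
        }
    }
    return false;
}
```

In this model, a thread that feeds every 5 ticks against a 10-tick timeout never triggers a reset, while one that is delayed to every 20 ticks, or never scheduled at all, does.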
Hi @nitrousnrg See attached!! These are for a 6" garden variety hoverboard motor. |
Thanks Chris, |
Hi @nitrousnrg , see attached.fw.zip
|
Chris, your attached binary doesn't work in a discovery board, while mainstream binaries do work. Looks like a building issue.
My compiler version doesn't mention anything about jenkins and docker stuff |
Where did you get your compiler package from? I got mine via the official tarball from here: https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm/downloads (Linux x64) Perhaps this is too much to ask, but would you mind downloading the tarball and using the prebuilt binaries within it to build the source? I agree that this certainly points to a build issue, and thus may not be a bug at this point, but I'm wondering what could be wrong here... I use this compiler for my full-time day job as an STM32/Arm Cortex M3/M4F developer, so I would think that I'd notice if there was something wrong with my other projects. I'm more curious about what's going on than anything... Thanks!! |
I followed the instructions here:
You can also check if the mainstream binary I used bricks your board. |
I'll go ahead and try this tomorrow. I guess if I can build a successful binary using those directions we can go ahead and close the bug report. I am extremely curious as to why the tarball release generates a binary that fails in this way though. Perhaps some kind of difference in optimization? |
I'm baffled as well, but at the same time, I'm not. The purpose of me pushing a motor simulator into vesc codebase is exactly this, to be able to automate tests on real hardware. If one day we bump the compiler version we could hit a problem like this and the test tools will catch the problem for us. |
I haven't had time to test this, but also I don't want to just keep this open since it's pretty clear this is some kind of bizarre build system issue. I guess we can go ahead and close it. Man, I'd really love to know where the difference is though. I'm not even sure how to debug this because I bet different versions of gcc will emit slightly different code, although I'm sure for 99.9999% of differences, it will be inconsequential. But my point is, I'm not sure how you could even diff the disassembly to pinpoint it. |
I had a look, and the GCC version you are using is 8 whereas I have been using 7. That should be no problem, but I can give it a try with the same version you are using and see if I encounter the same problem. Will report back in a few days after testing. |
Thanks Benjamin! That would be excellent! |
I happen to also have a 4.10 Flipsky around so I tested the latest firmware on it.
I have tried reducing 10ms to 1ms or 100us but still get the board reset. |
FWIW I can also reproduce this on a 4.12 VESC. I was able to bisect it to the same commit. I'm using GCC 9.2.1 from Fedora's repositories. I also tried @Guillaume227 's suggestion of always continue, however that was an incomplete fix - it gets farther, but USB never comes up. |
I just rebuilt the code with gcc-arm-none-eabi-7-2018-q2-update and now it works perfectly. So it is, in fact, the gcc version that matters. |
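Since the outcome depends on the compiler version, a build could record or check which GCC major version produced it. The sketch below uses the predefined `__GNUC__` macro, which GCC does provide; the helper names are hypothetical, and the pass/fail mapping only reflects what was reported in this thread (GCC 7 worked; GCC 8 and 9 boot-looped).

```c
#include <assert.h>
#include <stdbool.h>

/* Major version of the compiler building this file, or 0 when the
 * compiler does not define __GNUC__. Note that Clang also defines
 * __GNUC__ for compatibility. */
static int compiler_major(void) {
#ifdef __GNUC__
    return __GNUC__;
#else
    return 0;
#endif
}

/* True for GCC major versions reported in this thread as producing a
 * boot-looping binary. Other versions were not reported on, so they
 * are treated as unknown and return false. */
static bool gcc_major_reported_failing(int major) {
    assert(major >= 0);
    return major == 8 || major == 9;
}
```

A startup log line or a build-time check could call `gcc_major_reported_failing(compiler_major())` so that a mismatch is visible before the board is flashed.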
Had the same issue and can confirm, current master works when compiled with |
Starting in 17f9776, when this firmware is configured for hardware 4.12, the board simply reboots in a loop on bootup.