Watchdog/Timeout completely broken for 4.12 hardware #84

chris1seto · 2019-04-09T18:29:38Z

Starting in 17f9776, when this firmware is configured for hardware 4.12, the board simply reboots in a loop on bootup.

nitrousnrg · 2019-04-09T18:50:53Z

Hi Chris, could you attach your motor config?
In particular, I would be looking for a switching frequency set so high that it crashes the RTOS timing. It happened to me, and it's the main reason the watchdog has been reworked.
More than 30 kHz is dangerous territory.

chris1seto · 2019-04-09T19:08:11Z

EDIT: Disregard, bad debugging info

chris1seto · 2019-04-09T19:17:51Z

Never mind, disregard the above comment. This happens with stock settings on a brand-new flash of the firmware when configured for 4.12.

nitrousnrg · 2019-04-09T19:54:10Z

I flashed one of my palta boards with hw_410 here and I can't reproduce this issue.

1. What do you mean by a brand-new flash? Did you command a full chip erase from an stlink to ensure old configurations are erased?
2. Are you using any app with the firmware?
3. Are you using an encoder or other CPU load?
4. Is your crystal okay? The firmware now double-checks the timing with an independent watchdog clock.

chris1seto · 2019-04-09T20:07:31Z

Yes, I tried a full erase.

My hardware: a Flipsky Mini VESC and a Torque VESC from esk8.

Steps to repro:

git reset --hard; git pull origin master

Uncomment:
#define HW_SOURCE "hw_410.c" // Also for 4.11 and 4.12
#define HW_HEADER "hw_410.h" // Also for 4.11 and 4.12
and comment out the hw_60 lines

Full erase with STLink,

make upload

After this the board never boots up to the point where VCP works, as it is always rebooting.

chris1seto · 2019-04-09T20:16:29Z

Also, nothing is connected externally. The crystal point is interesting, but given that the board works with USB when the timeout is disabled, the crystal must be OK (it is required for USB).

nitrousnrg · 2019-04-09T20:42:12Z

Steps I'm doing:

1. qstlink2 --cli -e (full flash memory erase)
2. Fresh clone from the repo
3. make clean
4. Edit conf_general
5. make upload
6. Connects OK to VESC Tool
7. Just in case, program the latest firmware with VESC Tool, found here: https://github.com/vedderb/bldc/blob/master/build_all/410_o_411_o_412/VESC_default.bin
8. Power cycle turns out OK
9. Store default config
10. After a power cycle, connects OK to VESC Tool

Do you know if there are other users with the same issue?
Thanks

nitrousnrg · 2019-04-09T20:57:06Z

In addition to my comment above: flash the bootloader before step 7.

chris1seto · 2019-04-09T23:47:18Z

So, I did not flash the bootloader, but it shouldn't make any difference, right? Looking through the code, it doesn't touch the WDG (other than to abuse it to reset the board, lol). Is there anyone with a Torque or Flipsky ESC who can test? This is looking like a hardware issue, and it must be with the xtal.

chris1seto · 2019-04-10T02:17:46Z

Here's some more debug info. If I accidentally leave HW60 in as the selected config, USB works! So, what's the difference between 60 and 410 that affects this?

Edit: Also, I confirmed that both boards do have an xtal loaded, but I'm guessing everything must be ok on this front, because if the clock settings or xtal were incorrect, USB wouldn't work at all.

nitrousnrg · 2019-04-10T02:55:02Z

A significant difference is that hw6 defaults to FOC mode and hw4 defaults to bldc mode...

nitrousnrg · 2019-04-10T03:08:38Z

Maybe adding this to hw_410.h could narrow this down:

// Default setting overrides
#ifndef MCCONF_DEFAULT_MOTOR_TYPE
#define MCCONF_DEFAULT_MOTOR_TYPE		MOTOR_TYPE_FOC
#endif
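For readers less familiar with this idiom: the `#ifndef` guard lets the hardware header's value win only because it is seen before the generic defaults, which use the same guard. A minimal sketch, assuming the hardware header is included before the generic motor-config defaults and assuming the enum values shown (the real motor-type enum lives elsewhere in the firmware and may differ); `motor_type_t` and `default_motor_type()` are hypothetical names:

```c
/* Assumed stand-in for the firmware's motor-type enum (values are illustrative). */
typedef enum {
    MOTOR_TYPE_BLDC = 0,
    MOTOR_TYPE_DC,
    MOTOR_TYPE_FOC
} motor_type_t;

/* Seen first: the hardware header (e.g. hw_410.h) with the override from above. */
#ifndef MCCONF_DEFAULT_MOTOR_TYPE
#define MCCONF_DEFAULT_MOTOR_TYPE MOTOR_TYPE_FOC
#endif

/* Seen later: the generic defaults. The guard makes this a no-op because the
 * macro is already defined, so the hardware-specific FOC default is kept. */
#ifndef MCCONF_DEFAULT_MOTOR_TYPE
#define MCCONF_DEFAULT_MOTOR_TYPE MOTOR_TYPE_BLDC
#endif

/* Hypothetical accessor: the default the firmware would start with after a full erase. */
motor_type_t default_motor_type(void) {
    return MCCONF_DEFAULT_MOTOR_TYPE;
}
```

If the two guarded blocks were seen in the opposite order, BLDC would be the default again, so where the override lives matters.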

chris1seto · 2019-04-10T14:01:55Z

Yup! That fixes it. So now...

1. 410 is flashed with the above mod
2. Board reboots, USB VCP comes up
3. Change to BLDC
4. Reboot
5. Board boot loops

So there is an issue starting BLDC mode with the timeout.

chris1seto · 2019-04-10T15:03:29Z

More debugging info: this is absolutely related to the switching frequency. I configured my motor and everything worked great, so I set 29.5 kHz as my FOC switching frequency (everything still worked great) and then rebooted. After the reboot, the VESC now does the boot loop. I bet the reason it fails with BLDC selected is that the switching frequency is very high (35 kHz) by default.

nitrousnrg · 2019-04-10T15:33:04Z

Yes, I think you are right.

So it's not a problem with the watchdog; the watchdog led you to discover that CPU usage hits 100% with your default configuration and the scheduler timing is failing.

In my palta hardware I added this limit a while ago to prevent exactly that:
#define HW_LIM_FOC_CTRL_LOOP_FREQ 10000.0, 30000.0 //at around 38kHz the RTOS starts crashing (26us FOC ISR)
https://github.com/vedderb/bldc/blob/master/hwconf/hw_palta.h#L268
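The comment above implies a simple budget: if the FOC ISR takes roughly 26 µs and runs once per control-loop period, the fraction of CPU it consumes is ISR time × loop frequency, and whatever is left is all the RTOS threads get. A back-of-the-envelope sketch, assuming a constant 26 µs ISR (the figure from the comment; real ISR time varies with configuration) and ignoring other interrupts; the function names are hypothetical:

```c
/* Fraction of CPU time used by an ISR that runs once per control-loop period.
 * 1.0 means the ISR alone would consume every cycle, leaving nothing for threads. */
double isr_cpu_fraction(double isr_time_s, double loop_freq_hz) {
    return isr_time_s * loop_freq_hz;
}

/* Highest loop frequency that keeps the ISR within a given CPU budget (0..1]. */
double max_loop_freq_hz(double isr_time_s, double budget) {
    return budget / isr_time_s;
}
```

With 26 µs, `isr_cpu_fraction(26e-6, 30000.0)` gives 0.78 and `isr_cpu_fraction(26e-6, 38000.0)` gives about 0.99, which lines up with the RTOS starting to crash around 38 kHz; `max_loop_freq_hz(26e-6, 1.0)` is about 38.5 kHz.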

IMO a line like that should be added to all hardware versions.

I don't use BLDC mode, but a similar limit should be implemented for that mode:
#define MCCONF_M_BLDC_F_SW_MAX 35000 // Maximum switching frequency in bldc mode
It's either decrease the frequency or optimize the code to make it run faster (I'd decrease the frequency).
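One way to enforce such limits is to clamp the stored values whenever a configuration is loaded or written, so that a restored XML or an old stored config cannot push the loop past what the CPU can handle. A minimal sketch, assuming a simplified config struct; `motor_conf_t`, `conf_apply_freq_limits()` and the `*_HZ` macros are hypothetical, with the numbers taken from the two defines above:

```c
#include <stdbool.h>

/* Limits mirroring HW_LIM_FOC_CTRL_LOOP_FREQ (10000.0, 30000.0) and the
 * suggested MCCONF_M_BLDC_F_SW_MAX (35000) above. */
#define FOC_F_SW_MIN_HZ  10000.0f
#define FOC_F_SW_MAX_HZ  30000.0f
#define BLDC_F_SW_MAX_HZ 35000.0f

/* Hypothetical, heavily simplified motor configuration. */
typedef struct {
    float foc_f_sw;   /* FOC switching frequency, Hz */
    float bldc_f_sw;  /* BLDC switching frequency, Hz */
} motor_conf_t;

static float clamp_float(float value, float min, float max) {
    if (value < min) return min;
    if (value > max) return max;
    return value;
}

/* Clamps the frequencies in place; returns true if anything was changed,
 * so the caller can warn the user or re-save the corrected config. */
bool conf_apply_freq_limits(motor_conf_t *conf) {
    const motor_conf_t before = *conf;
    conf->foc_f_sw = clamp_float(conf->foc_f_sw, FOC_F_SW_MIN_HZ, FOC_F_SW_MAX_HZ);
    if (conf->bldc_f_sw > BLDC_F_SW_MAX_HZ) {
        conf->bldc_f_sw = BLDC_F_SW_MAX_HZ;  /* only an upper bound was proposed */
    }
    return conf->foc_f_sw != before.foc_f_sw || conf->bldc_f_sw != before.bldc_f_sw;
}
```

Clamping rather than rejecting keeps the board bootable; a stricter variant could refuse the config instead.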

The frequency limit depends on the CPU load. It looks like BLDC mode (or something else) has become more CPU-intensive, and now the CPU can't keep up.

Now that we have a likely solution (or at least an explanation) I think we need @vedderb

Thanks for reporting!

chris1seto · 2019-04-10T15:34:41Z

And more debugging info... This goes beyond just the switching frequency. If I get a good auto-detection in FOC with hall/general and then reboot, everything is fine. If I back those settings up to a file and then reload the file, the VESC will boot loop. Even if I simply back up the stock settings after a fresh erase/flash and restore them, the same thing happens.

chris1seto · 2019-04-10T15:52:40Z

And even more debugging info: if I do a fresh flash, load settings, don't touch the motor config, but set the CAN baud rate to 1M and save, the VESC will boot loop on reboot.

nitrousnrg · 2019-04-10T16:37:50Z

When you are near the CPU limit, any configuration change can make it better or worse. An SPI encoder will require more CPU, and so would a higher CAN packet decoding frequency.

The max frequency should be dialed down now, and then we'll see how to continue. Profiling and optimizing code is an endless endeavor once you hit your resource limits; I'd rather limit the frequency than make the code less clear.

chris1seto · 2019-04-10T17:27:30Z

@nitrousnrg Oops, I didn't see your previous message until now. That said, my configuration isn't really anything interesting. It's a totally stock config other than CAN being 1M, and FOC with a slightly higher switching frequency in sensored mode. Doesn't it seem a little unreasonable that this should be at the full limits of the hardware/RTOS?

nitrousnrg · 2019-04-10T19:05:34Z

Memory resources are plentiful, but you can easily max out the CPU if you run the core control loop at high frequencies. That's why my first question here was whether you are running > 30 kHz.

nitrousnrg · 2019-04-10T21:03:18Z

I just received a support ticket from a customer telling me that the latest firmware doesn't work for him in BLDC mode, so I think this has escalated to a critical bug that needs patching ASAP before more users upgrade the firmware and brick devices.

chris1seto · 2019-04-10T21:06:35Z

@nitrousnrg Just a note: I encountered this running at 20 kHz (the default FOC switching frequency) too. It does not appear to depend only on the switching frequency. I don't know the codebase well enough to speculate on what might be going on, but it seems very sensitive to any kind of configuration change.

nitrousnrg · 2019-04-11T13:04:30Z

Meh, customer installed a wrong resistor, totally unrelated. Too bad I emailed Benjamin about this.

vedderb · 2019-04-11T13:24:03Z

I was following the conversation, but have not been home for a few days so I could not test anything myself. Emailing me is not a problem :-) When I come home I will catch up with the pull requests and issues.

If a commit from back then had broken things for HW4, I suspect that I would have heard a lot more by now, so I was kind of hoping that you would resolve the issue.

@chris1seto is it ok to close this issue, or do you still have the problem? If you do, can you make sure that your compiler is working properly and that you did not disable optimizations?

chris1seto · 2019-04-11T17:21:12Z

Hi Benjamin,

That's my feeling too: you'd have heard more if this were really broken, but it seems like it really is (or at least, I'm not sure what could be wrong in my configuration). My compiler should be working correctly, since I build other projects with it, and the optimization options should be set in the makefile, correct? I haven't changed the makefile or any part of the FW other than the general conf file (to target 410). I don't suppose anyone has an Esk8 Torque or Flipsky Mini VESC they could test on?

Do you have any potential debugging steps to try? I could send you a binary of the stock FW to compare with one generated by your build system, but I suspect that if we have different toolchain versions, the binary could differ slightly.

EDIT: I am using gcc-arm-none-eabi-8-2018-q4-major

nitrousnrg · 2019-05-02T16:07:45Z

@chris1seto, did you get the chance to confirm it's not a hardware issue? Can we close this issue?

chris1seto · 2019-05-02T16:10:12Z

Hi @nitrousnrg ,

It's definitely not a hardware issue. There's something else going on here in the bldc software, but I think Ben may need to look at it. Without disabling the watchdog, I cannot get the code to run on any of my 4.10 vescs. With the watchdog disabled the code seems to run fine, even if the scheduler is saturated.

nitrousnrg · 2019-05-02T17:15:44Z

Could you attach your motor config xml AND app xml?
I can try your binary as well if you want.

If the scheduler is saturated it should not run fine; the board should reset. That's the purpose of using a WDT.
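The idea behind a watchdog that resets on a saturated scheduler can be sketched as follows: every monitored thread checks in, and the hardware watchdog is only reloaded once all of them have done so, so a thread that never gets scheduled (or never returns from a wait) eventually causes a reset. This is a simplified sketch of the general technique, not the firmware's actual timeout module; the thread IDs and function names are hypothetical, and on a real Cortex-M the read-modify-write on the mask would need to be made atomic (for example inside an RTOS lock):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical set of threads that must prove they are alive. */
typedef enum {
    WDT_THREAD_CONTROL = 0,
    WDT_THREAD_CAN,
    WDT_THREAD_USB,
    WDT_THREAD_COUNT
} wdt_thread_t;

#define WDT_ALL_THREADS ((1u << WDT_THREAD_COUNT) - 1u)

static volatile uint32_t wdt_checkin_mask = 0;

/* Called by each monitored thread once per loop iteration.
 * NOTE: not atomic as written; guard it with a lock or critical section on target. */
void wdt_checkin(wdt_thread_t thread) {
    wdt_checkin_mask |= (1u << thread);
}

/* Called periodically, e.g. from a timer. Returns true when the hardware
 * watchdog may be reloaded. The mask is only cleared once every thread has
 * checked in, so a late thread still gets credit until the IWDG expires. */
bool wdt_should_reload(void) {
    if ((wdt_checkin_mask & WDT_ALL_THREADS) != WDT_ALL_THREADS) {
        return false;  /* a thread is starved or stuck: let the IWDG reset the MCU */
    }
    wdt_checkin_mask = 0;
    return true;
}
```

With this scheme, the board running fine with the watchdog disabled only shows that the CPU keeps executing; a starved or stuck thread is still starved, it just no longer triggers a reset.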

With your files I can probe this deeper, thanks!

chris1seto · 2019-05-02T17:18:53Z

Hi @nitrousnrg See attached!! These are for a 6" garden variety hoverboard motor.
focworkingmini.zip

nitrousnrg · 2019-05-02T22:28:04Z

Thanks Chris,
please send me your compiled binary, because with the latest firmware taken from https://github.com/vedderb/bldc/blob/master/build_all/410_o_411_o_412/VESC_default.bin your configs don't brick a discovery board.

chris1seto · 2019-05-02T22:40:53Z

Hi @nitrousnrg, see attached: fw.zip

chris@itxdev:~/Vesc1/bldc$ arm-none-eabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/home/chris/opt/gcc-arm-none-eabi-8-2018-q4-major/bin/../lib/gcc/arm-none-eabi/8.2.1/lto-wrapper
Target: arm-none-eabi
Configured with: /tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/src/gcc/configure --target=arm-none-eabi --prefix=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/install-native --libexecdir=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/install-native/lib --infodir=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/install-native/share/doc/gcc-arm-none-eabi/info --mandir=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/install-native/share/doc/gcc-arm-none-eabi/man --htmldir=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/install-native/share/doc/gcc-arm-none-eabi/html --pdfdir=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/install-native/share/doc/gcc-arm-none-eabi/pdf --enable-languages=c,c++ --enable-plugins --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --with-gnu-as --with-gnu-ld --with-newlib --with-headers=yes --with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/install-native/arm-none-eabi --build=x86_64-linux-gnu --host=x86_64-linux-gnu --with-gmp=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/build-native/host-libs/usr --with-mpfr=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/build-native/host-libs/usr --with-mpc=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/build-native/host-libs/usr --with-isl=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/build-native/host-libs/usr --with-libelf=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/build-native/host-libs/usr --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-pkgversion='GNU Tools for Arm Embedded Processors 8-2018-q4-major' --with-multilib-list=rmprofile
Thread model: single
gcc version 8.2.1 20181213 (release) [gcc-8-branch revision 267074] (GNU Tools for Arm Embedded Processors 8-2018-q4-major)

chris@itxdev:~/Vesc1/bldc$ git show -s --format=%H
fb94428
chris@itxdev:~/Vesc1/bldc$

chris@itxdev:~/Vesc1/bldc$ git diff
diff --git a/conf_general.h b/conf_general.h
index 61eed55..9f20ec4 100644
--- a/conf_general.h
+++ b/conf_general.h
@@ -61,14 +61,14 @@
//#define HW_SOURCE "hw_49.c"
//#define HW_HEADER "hw_49.h"

-//#define HW_SOURCE "hw_410.c" // Also for 4.11 and 4.12
-//#define HW_HEADER "hw_410.h" // Also for 4.11 and 4.12
+#define HW_SOURCE "hw_410.c" // Also for 4.11 and 4.12
+#define HW_HEADER "hw_410.h" // Also for 4.11 and 4.12

// Benjamins first HW60 PCB with PB5 and PB6 swapped
//#define HW60_VEDDER_FIRST_PCB

-#define HW_SOURCE "hw_60.c"
-#define HW_HEADER "hw_60.h"
+//#define HW_SOURCE "hw_60.c"
+//#define HW_HEADER "hw_60.h"

//#define HW_SOURCE "hw_r2.c"
//#define HW_HEADER "hw_r2.h"

nitrousnrg · 2019-05-02T22:57:38Z

Chris, your attached binary doesn't work on a discovery board, while mainstream binaries do work. Looks like a build issue.

Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/usr/bin/../lib/gcc/arm-none-eabi/7.3.1/lto-wrapper
Target: arm-none-eabi
Configured with: /build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/src/gcc/configure --target=arm-none-eabi --prefix=/build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/install-native --libexecdir=/build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/install-native/lib --infodir=/build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/install-native/share/doc/gcc-arm-none-eabi/info --mandir=/build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/install-native/share/doc/gcc-arm-none-eabi/man --htmldir=/build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/install-native/share/doc/gcc-arm-none-eabi/html --pdfdir=/build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/install-native/share/doc/gcc-arm-none-eabi/pdf --enable-languages=c,c++ --enable-plugins --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --with-gnu-as --with-gnu-ld --with-newlib --with-headers=yes --with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/install-native/arm-none-eabi --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-pkgversion='GNU Tools for Arm Embedded Processors 7-2018-q3-update' --with-multilib-list=rmprofile
Thread model: single
gcc version 7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907] (GNU Tools for Arm Embedded Processors 7-2018-q3-update)

My compiler version doesn't mention anything about jenkins and docker stuff

chris1seto · 2019-05-02T23:13:21Z

Where did you get your compiler package from? I got mine via the official tarball from here: https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm/downloads (Linux x64)

Perhaps this is too much to ask, but would you mind downloading the tarball and using the prebuilt binaries within it to build the source?

I agree that this certainly points to a build issue, and thus may not be a bug at this point, but I'm wondering what could be wrong here... I use this compiler in my full-time day job as an STM32/Arm Cortex-M3/M4F developer, so I would think I'd notice if something were wrong with my other projects. I'm more concerned about what's going on than anything...

Thanks!!

nitrousnrg · 2019-05-02T23:17:54Z

I followed the instructions here:
https://vesc-project.com/node/310

sudo add-apt-repository ppa:team-gcc-arm-embedded/ppa
sudo apt update
sudo apt install gcc-arm-embedded

You can also check if the mainstream binary I used bricks your board.

chris1seto · 2019-05-03T00:00:33Z

I'll go ahead and try this tomorrow. I guess if I can build a working binary using those directions, we can go ahead and close the bug report. I am extremely curious as to why the tarball release generates a binary that fails in this way, though. Perhaps some kind of difference in optimization?

nitrousnrg · 2019-05-03T00:08:47Z

I'm baffled as well, but at the same time, I'm not. The purpose of my pushing a motor simulator into the VESC codebase is exactly this: to be able to automate tests on real hardware. If one day we bump the compiler version, we could hit a problem like this and the test tools would catch it for us.
On your PC it could be an environment variable issue, or maybe the IDE you're using. I'd try an Ubuntu virtual machine to be sure.
Keep us posted!

chris1seto · 2019-05-08T18:12:35Z

I haven't had time to test this, but I also don't want to just keep this open, since it's pretty clear this is some kind of bizarre build system issue. I guess we can go ahead and close it. Man, I'd really love to know where the difference is, though. I'm not even sure how to debug this, because I bet different versions of GCC will emit slightly different code, although for 99.9999% of the differences it will be inconsequential. My point is, I'm not sure how you could even diff the disassembly to pinpoint it.

vedderb · 2019-05-08T18:34:35Z

I had a look, and the GCC version you are using is 8 whereas I have been using 7. That should be no problem, but I can give it a try with the same version you are using and see if I encounter the same problem. Will report back in a few days after testing.

chris1seto · 2019-05-08T18:53:47Z

Thanks Benjamin! That would be excellent!

Guillaume227 · 2019-05-12T22:12:30Z

I happen to also have a 4.10 Flipsky around so I tested the latest firmware on it.

I can reproduce the issue 'out of the box' with a fresh FW upload.
My debugging shows that it's related to the CAN reader thread:
in particular, this line seems not to return within the 10 ms it's supposed to.
(chEvtWaitAnyTimeout(ALL_EVENTS, MS2ST(10)) == 0) {

I have tried reducing 10 ms to 1 ms or 100 µs but still get the board reset.
If I change it to just continue, it behaves fine.
Do you see that too?

tdaede · 2020-03-04T05:44:57Z

FWIW, I can also reproduce this on a 4.12 VESC. I was able to bisect it to the same commit. I'm using GCC 9.2.1 from Fedora's repositories. I also tried @Guillaume227's suggestion of always continuing; however, that was an incomplete fix: it gets farther, but USB never comes up.

tdaede · 2020-03-04T06:09:05Z

I just rebuilt the code with gcc-arm-none-eabi-7-2018-q2-update and now it works perfectly. So it is, in fact, the gcc version that matters.

lalten · 2021-01-17T23:28:21Z

Had the same issue and can confirm, current master works when compiled with gcc-arm-none-eabi-7-2018-q2 - but will boot loop when compiled with gcc-arm-none-eabi-9-2019-q4.
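Given the reports in this thread (GCC 7-2018-q2 and the 7.3.1 PPA build work; 8-2018-q4, Fedora's 9.2.1 and 9-2019-q4 boot loop), one stopgap would be a compile-time notice in the build. A minimal sketch, assuming it is dropped into a header such as conf_general.h; the helper name is hypothetical, and "reported OK" reflects only the results in this thread, not a root cause or a general rule about those compiler versions:

```c
#include <stdbool.h>

/* True only for GCC major versions reported in this issue to produce a
 * working image on HW 4.x (7.x); 8.x and 9.x were reported to boot loop. */
bool gcc_major_reported_ok(int gcc_major) {
    return gcc_major == 7;
}

#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ != 7)
/* GCC prints this as a note; swap it for #error to make it a hard stop. */
#pragma message("GCC major version other than 7: builds with 8.x/9.x were reported to boot loop on HW 4.x (issue #84)")
#endif
```

A notice like this only flags the risk; it does not explain why newer compilers produce a failing image.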

nitrousnrg mentioned this issue on Jun 12, 2020, in #181 (open): Make VESC Protable to "any" Chibios supported MCU
