We have not used an STLink yet, though I’m currently trying to get it up and running. I’m using VSCode, but when I set a breakpoint it reports that no source file can be found with the name of the file in which the breakpoint is set.
In VSCode, when you click on Terminal->Run Task, are the options Build and flash - ST-Link available, and have you been able to get them to complete successfully?
Your life will be a lot easier with debugging set up properly.
Yes they are able to complete successfully.
I’ve been able to run the debugger, though breakpoints added in the source files are ignored. Whenever it is paused, it shows assembly instructions rather than source code. It seems like it’s unable to link the source files. When run using make gdb, the TUI displays assembly instructions, and trying ‘layout src’ reports that there are no source files available.
It sounds like your .elf file doesn’t contain debug info. Can you check if the compile commands include the -g flag? You can put in a deliberate syntax error so that the build system shows an error and dumps the compile command.
You can also check arm-none-eabi-readelf --debug-dump=info build/ODriveFirmware.elf | wc -l. If I build with -g I get 1277898 (a huge amount of debug info) and without -g I get 1002 (almost no debug info).
Regarding the SPI transfers: currently we start SPI at the beginning of a control loop iteration, then update a few unrelated components and then by the time we get to the encoder’s update() call we expect the SPI transfers to be done. That means there’s not much time for the transfers. If this becomes a problem then maybe you can use the SPI results one control iteration later. That way the SPI transfers can span the whole 125µs iteration.
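A minimal sketch of that "use the result one iteration later" idea. PipelinedSpiEncoder, begin_iteration() and on_transfer_complete() are hypothetical names for illustration, not the actual ODrive Encoder API, and the real SPI/DMA start is only marked by a comment:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for an SPI encoder; NOT the real ODrive Encoder class.
class PipelinedSpiEncoder {
public:
    // Called at the start of a control iteration: latch whatever the
    // previous iteration's transfer produced, then start a new transfer.
    void begin_iteration() {
        latched_ = done_;          // nullopt if the last transfer never finished
        done_.reset();
        transfer_pending_ = true;  // real firmware would start the SPI/DMA transfer here
    }

    // Hypothetical SPI completion callback; it may fire at any point in the iteration.
    void on_transfer_complete(uint16_t raw) {
        assert(transfer_pending_);  // a completion without a started transfer is a bug
        done_ = raw;
        transfer_pending_ = false;
    }

    // The encoder's update(): only sees the value latched at the start of
    // this iteration, i.e. the result of the previous iteration's transfer.
    std::optional<uint16_t> update() const { return latched_; }

private:
    bool transfer_pending_ = false;
    std::optional<uint16_t> done_;     // finished during the current iteration
    std::optional<uint16_t> latched_;  // finished during the previous iteration
};
```

The cost of this arrangement is one control period of extra latency on the encoder reading. In exchange, the transfer no longer has to finish before update() is reached within the same iteration.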
The command does seem to use the -g flag:
arm-none-eabi-gcc -x assembler-with-cpp -c Board/v3/startup_stm32f405xx.s -DUSB_PROTOCOL_NATIVE -DUART_PROTOCOL_ASCII -DSTM32F405xx -DARM_MATH_CM4 -mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -DFPU_FPV4 -DHW_VERSION_MAJOR=3 -DHW_VERSION_MINOR=6 -DHW_VERSION_VOLTAGE=56 -D__weak="__attribute__((weak))" -D__packed="__attribute__((__packed__))" -DUSE_HAL_DRIVER -mthumb -mfloat-abi=hard -Wno-psabi -Wall -Wdouble-promotion -Wfloat-conversion -fdata-sections -ffunction-sections -g -gdwarf-2 --g -Og -flto -ffast-math -IBoard/v3/Middlewares/Third_Party/FreeRTOS/Source/portable/GCC/ARM_CM4F -IBoard/v3/Middlewares/Third_Party/FreeRTOS/Source/include -IBoard/v3/Middlewares/Third_Party/FreeRTOS/Source/CMSIS_RTOS -IBoard/v3/Middlewares/ST/STM32_USB_Device_Library/Core/Inc -IBoard/v3/Middlewares/ST/STM32_USB_Device_Library/Class/CDC/Inc -IBoard/v3/Drivers/STM32F4xx_HAL_Driver/Inc -IBoard/v3/Drivers/STM32F4xx_HAL_Driver/Inc/Legacy -IBoard/v3/Drivers/CMSIS/Device/ST/STM32F4xx/Include -IBoard/v3/Drivers/CMSIS/Include -IBoard/v3/Inc -I. -o build/obj/Board_v3_startup_stm32f405xx.s.o
(with the --g being the wrong addition to cause a syntax error).
The arm-none-eabi-readelf --debug-dump=info build/ODriveFirmware.elf | wc -l command does return 1006 though, which is not in line with what you indicated.
Regarding the SPI transfers: currently we start SPI at the beginning of a control loop iteration, then update a few unrelated components and then by the time we get to the encoder’s update() call we expect the SPI transfers to be done. That means there’s not much time for the transfers. If this becomes a problem then maybe you can use the SPI results one control iteration later. That way the SPI transfers can span the whole 125µs iteration.
This sounds like a very plausible root of our problem. Thanks for the added suggestion on how to fix this. We’ll try to implement it.
Thanks for the help!
The debugger now works. The crux of the problem was that LTO was still enabled in the tup.config.
So we got a basic implementation running. But when running current control with 2 motors at the same time we get a control_deadline_missed error. Looking at the ControlLoop_IRQHandler function, it seems like an exact number of clock cycles has to have elapsed for this error not to occur.
// If we did everything right, the TIM8 update handler should have been
// called exactly once between the start of this function and now.
if (timestamp_ != timestamp + TIM_1_8_PERIOD_CLOCKS * (TIM_1_8_RCR + 1)) {
    motors[0].disarm_with_error(Motor::ERROR_CONTROL_DEADLINE_MISSED);
    motors[1].disarm_with_error(Motor::ERROR_CONTROL_DEADLINE_MISSED);
}
So assuming that our changes have altered the number of clock cycles, would we need to alter this? And if so, how would we find the new value?
Do you get the error intermittently, or every cycle/always?
Because that error can also occur when the controller is unstable. Does it persist when you lower controller effort?
We get the error whenever we put the second motor in closed loop control. So no actual actuation of either of the motors has occurred yet.
It’s not an exact number, just an upper limit. How much extra processing time do you figure you added? Make sure you’re compiling with -Ofast and ideally with LTO, though LTO might be broken.
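To illustrate why this is a limit rather than an exact cycle count, here is a rough sketch of the same comparison pulled out into a predicate. The constant values are assumptions for illustration only (take the real ones from the board headers), timestamp_after() is a hypothetical helper, and the sketch assumes each TIM8 update advances the timestamp by TIM_1_8_PERIOD_CLOCKS * (TIM_1_8_RCR + 1):

```cpp
#include <cstdint>

// Assumed values for illustration only; the real ones live in the board headers.
constexpr uint32_t TIM_1_8_PERIOD_CLOCKS = 3500;
constexpr uint32_t TIM_1_8_RCR = 2;

// Amount by which one TIM8 update is assumed to advance the timestamp,
// matching the expression in the quoted check.
constexpr uint32_t kClocksPerUpdate = TIM_1_8_PERIOD_CLOCKS * (TIM_1_8_RCR + 1);

// Hypothetical helper: the timestamp after `updates` TIM8 update events.
constexpr uint32_t timestamp_after(uint32_t start, uint32_t updates) {
    return start + updates * kClocksPerUpdate;
}

// Same comparison as in the quoted ControlLoop_IRQHandler snippet.
constexpr bool control_deadline_missed(uint32_t timestamp_at_start,
                                       uint32_t timestamp_now) {
    return timestamp_now != timestamp_at_start + kClocksPerUpdate;
}
```

In this model the check never sees how many cycles the control code itself took, only how many TIM8 updates landed between the start of the handler and the check. Any run time short enough that exactly one update falls in between passes. Once the code runs long enough for a second update to occur, the timestamp has advanced twice and the comparison fails.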
Does the error persist when you turn off debugging?
Thanks for the suggestions, I’ll see if I’m able to test it today using an optimized build.
I’m not sure how much processing time we’ve added. I think the 2 additional encoders and some safety checks are most likely to have increased it.
Using -Ofast and LTO has indeed solved the issue! Thanks for the help.
I’m currently looking into proper error propagation for our new structure. I noticed that for the axis, do_checks is used to check for errors in sub-components. However, the encoders weren’t checked in the first place. Were their errors propagated to the axis differently, or is it still something under construction on the dev branch?
Encoder errors are propagated implicitly by their output being unavailable (std::nullopt). That means the controller will fail if it gets the position/velocity estimate from a failed encoder. But for instance in sensorless mode the motor does not fail if an encoder fails.
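A simplified sketch of this kind of implicit propagation. All types and functions here (EncoderSketch, position_control_step, sensorless_step, the gain of 2) are hypothetical and only illustrate the std::optional pattern; they are not the actual firmware classes:

```cpp
#include <optional>

// Hypothetical, simplified encoder; NOT the actual ODrive Encoder class.
struct EncoderSketch {
    bool has_error = false;
    float pos = 0.0f;

    // A failed encoder simply has no estimate to offer.
    std::optional<float> pos_estimate() const {
        if (has_error) return std::nullopt;
        return pos;
    }
};

enum class ControlResult { kOk, kMissingEstimate };

// A position controller needs the estimate, so an unavailable value makes
// this step fail without the axis ever inspecting the encoder's error flag.
ControlResult position_control_step(const std::optional<float>& pos_estimate,
                                    float setpoint, float* torque_out) {
    if (!pos_estimate.has_value()) return ControlResult::kMissingEstimate;
    *torque_out = 2.0f * (setpoint - *pos_estimate);  // made-up proportional gain
    return ControlResult::kOk;
}

// A sensorless-style step never reads the encoder, so an encoder failure
// has no path by which it could make this step fail.
ControlResult sensorless_step(float* torque_out) {
    *torque_out = 0.0f;  // placeholder output
    return ControlResult::kOk;
}
```

With this pattern an error only surfaces in whichever consumer actually needs the missing value, which is why no explicit encoder check is needed at the axis level.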
That also explains some problems we had with decoupling the encoders from the axis.
As we got a request from someone to possibly already use our current firmware, I’ve created a small guide in a new thread: Guide to using multiple encoders per axis (early-access)
I’ve encountered an interesting problem with our new firmware. We’ve got a USB interface for the native protocol in C++, which functioned properly with the v0.5.1 firmware. With the new firmware we’re getting errors while importing the JSON from the ODrive. I first thought that it had to do with a wrong CRC or something similar, but that doesn’t seem to be the case as the communication fails halfway through the transaction.
I ran the ODrive in debug mode using an STLink and found out that it stops transmitting packets because it gets stuck in a function. It stays in the xTaskResumeAll() function within the tasks.c file. I haven’t been able to find out what exactly causes this issue yet.
The interesting thing is that the odrivetool is still able to properly communicate with the ODrive. Soon we’re also gonna test whether a python USB interface still functions as well (which is likely as the odrivetool still functions).
If there is anything that comes to mind what could cause this, please let me know. I thought that in terms of communication, only the CRC needs to be changed when the size of the JSON changes but there might be more going on.
Hey everyone (and especially @Samuel ), I’ve sadly got some bad news.
Yesterday we discovered that after about 1 minute of actuating 2 motors simultaneously, the firmware crashes (or gets stuck in a loop). This is a rather pesky problem in terms of debugging, as we can’t use the STLink to debug in this case. As you might remember, we previously had an issue with missed control deadlines, which was resolved by not building in debug mode and by using LTO. So we can’t actuate both motors and debug at the same time :(.
As our major adjustments were only to the structure of where encoders are located and how to change between them, we’ve changed relatively little to the actuation process.
I initially thought that it might have something to do with thread safety, as we’ve gone from axes with their own private encoders to accessing an encoder through a shared resource: the encoder_manager. This shouldn’t be the case, though, because apart from when an encoder is being changed, the encoder_manager simply acts as an interface with getters and setters.
It shouldn’t be a problem of processing power either as we’d then expect similar errors to the control_deadline_missed like before. We’ve even tried it with only using 2 encoders again with the only difference then being the firmware’s structure.
I should also add that the issue does not occur when actuating a single motor. So it’s an issue specific to using 2 motors simultaneously.
Any inkling of what could be going wrong would be highly appreciated.