Using an absolute and incremental encoder per axis

Jorijs · March 3, 2021, 10:18am

We have not used an STLink yet, though I’m currently trying to get it up and running. I’m using VSCode, but when I set breakpoints, it reports that no source file can be found with the name of the file in which the breakpoint is set.

Riewert · March 3, 2021, 12:16pm

In VSCode, when you click on Terminal->Run Task, are the options Build and flash - ST-Link available, and have you been able to get them to complete successfully?

Your life will be a lot easier with debugging properly set up.

Jorijs · March 3, 2021, 1:10pm

Yes, they complete successfully.
I’ve been able to run the debugger, but breakpoints added in the source files are ignored. Whenever execution is paused, it shows assembly instructions rather than source code, so it seems unable to associate the binary with the source files. When I run make gdb, the TUI displays assembly instructions, and trying ‘layout src’ reports that no source files are available.

Riewert · March 3, 2021, 3:31pm

Maybe @Samuel can help you out.

Samuel · March 4, 2021, 9:05am

It sounds like your .elf file doesn’t contain debug info. Can you check if the compile commands include the -g flag? You can put in a deliberate syntax error so that the build system shows an error and dumps the compile command.

You can also check arm-none-eabi-readelf --debug-dump=info build/ODriveFirmware.elf | wc -l. If I build with -g I get 1277898 (a huge amount of debug info) and without -g I get 1002 (almost no debug info).

Regarding the SPI transfers: currently we start SPI at the beginning of a control loop iteration, then update a few unrelated components and then by the time we get to the encoder’s update() call we expect the SPI transfers to be done. That means there’s not much time for the transfers. If this becomes a problem then maybe you can use the SPI results one control iteration later. That way the SPI transfers can span the whole 125µs iteration.
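A minimal sketch of the "use the SPI results one control iteration later" idea, assuming one raw 16-bit encoder word per transfer and a completion callback that runs in an interrupt. DeferredSpiResult and its methods are hypothetical names, not ODrive firmware API. The reading from the transfer started in iteration N is only published at the start of iteration N+1, so the transfer may take up the whole period, and a transfer that did not finish in time shows up as std::nullopt.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Hypothetical helper (not ODrive API): publishes each SPI reading one
// control iteration after the transfer was started.
class DeferredSpiResult {
public:
    // Call at the very start of each control iteration, before starting the
    // next SPI transfer. Publishes the reading from the previous iteration's
    // transfer, or std::nullopt if that transfer did not finish in time.
    void begin_iteration() {
        if (transfer_done_.exchange(false)) {
            published_ = pending_;
        } else {
            published_.reset();
        }
    }

    // Call from the SPI/DMA completion callback with the raw encoder word.
    void on_transfer_complete(uint16_t raw) {
        pending_ = raw;              // write the data first...
        transfer_done_.store(true);  // ...then flag it as complete
    }

    // Call from the encoder's update(): returns last iteration's reading.
    std::optional<uint16_t> result() const { return published_; }

private:
    uint16_t pending_ = 0;
    std::atomic<bool> transfer_done_{false};
    std::optional<uint16_t> published_;
};
```

The trade-off is one extra control period of latency on every encoder reading, in exchange for not having to squeeze the transfer in before update() runs.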

Jorijs · March 4, 2021, 9:31am

The command does seem to use the -g flag:
arm-none-eabi-gcc -x assembler-with-cpp -c Board/v3/startup_stm32f405xx.s -DUSB_PROTOCOL_NATIVE -DUART_PROTOCOL_ASCII -DSTM32F405xx -DARM_MATH_CM4 -mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -DFPU_FPV4 -DHW_VERSION_MAJOR=3 -DHW_VERSION_MINOR=6 -DHW_VERSION_VOLTAGE=56 -D__weak="__attribute__((weak))" -D__packed="__attribute__((__packed__))" -DUSE_HAL_DRIVER -mthumb -mfloat-abi=hard -Wno-psabi -Wall -Wdouble-promotion -Wfloat-conversion -fdata-sections -ffunction-sections -g -gdwarf-2 --g -Og -flto -ffast-math -IBoard/v3/Middlewares/Third_Party/FreeRTOS/Source/portable/GCC/ARM_CM4F -IBoard/v3/Middlewares/Third_Party/FreeRTOS/Source/include -IBoard/v3/Middlewares/Third_Party/FreeRTOS/Source/CMSIS_RTOS -IBoard/v3/Middlewares/ST/STM32_USB_Device_Library/Core/Inc -IBoard/v3/Middlewares/ST/STM32_USB_Device_Library/Class/CDC/Inc -IBoard/v3/Drivers/STM32F4xx_HAL_Driver/Inc -IBoard/v3/Drivers/STM32F4xx_HAL_Driver/Inc/Legacy -IBoard/v3/Drivers/CMSIS/Device/ST/STM32F4xx/Include -IBoard/v3/Drivers/CMSIS/Include -IBoard/v3/Inc -I. -o build/obj/Board_v3_startup_stm32f405xx.s.o
(The --g is the deliberately invalid flag I added to trigger the error.)

The arm-none-eabi-readelf --debug-dump=info build/ODriveFirmware.elf | wc -l command does return 1006, though, which is not in line with what you indicated.

Regarding the SPI transfers: the observation that they have little time to finish before the encoder’s update() call sounds like a very plausible root of our problem. Thanks for the suggestion of using the SPI results one control iteration later; we’ll try to implement it.

Thanks for the help!

Jorijs · March 9, 2021, 3:41pm

The debugger now works. The crux of the problem was that LTO was still enabled in tup.config.

Jorijs · March 19, 2021, 2:34pm

So we got a basic implementation running. But when running current control with 2 motors at the same time, we get control_deadline_missed as an error. Looking at the ControlLoop_IRQHandler function, it seems like an exact number of clock cycles must have elapsed for this error not to occur:

```cpp
// If we did everything right, the TIM8 update handler should have been
// called exactly once between the start of this function and now.

if (timestamp_ != timestamp + TIM_1_8_PERIOD_CLOCKS * (TIM_1_8_RCR + 1)) {
    motors[0].disarm_with_error(Motor::ERROR_CONTROL_DEADLINE_MISSED);
    motors[1].disarm_with_error(Motor::ERROR_CONTROL_DEADLINE_MISSED);
}
```

So, assuming that our changes have altered the number of clock cycles, would we need to change this check? And if so, how would we find the new value?

Riewert · March 20, 2021, 8:23am

Do you get the error intermittently, or every cycle/always?

Because that error can also occur when the controller is unstable. Does it persist when you lower controller effort?

Jorijs · March 21, 2021, 10:20am

We get the error as soon as we put the second motor into closed-loop control, so neither motor has actually been actuated yet.

Wetmelon · March 21, 2021, 5:23pm

It’s not an exact number, just an upper limit. How much extra processing time do you figure you added? Make sure you’re compiling with -Ofast and ideally LTO, but that might be broken.

Riewert · March 21, 2021, 7:49pm

Does the error persist when you turn off debugging?

Jorijs · March 22, 2021, 8:13am

Thanks for the suggestions, I’ll see if I’m able to test it today using an optimized build.
I’m not sure how much processing time we’ve added. I think the 2 additional encoders and some safety checks are most likely to have increased it.
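One way to answer the processing-time question is to time the control loop in CPU cycles. This is a minimal sketch, assuming the raw counter values come from the Cortex-M4 cycle counter (DWT->CYCCNT in CMSIS, which has to be enabled first) and that the core runs at 168 MHz, so one 125 µs iteration is 168,000,000 × 0.000125 = 21,000 cycles. LoopTimer and kControlPeriodCycles are hypothetical names; the counter value is passed in so the logic can also be tested off-target.

```cpp
#include <cstdint>

// Hypothetical helper: tracks how many CPU cycles each control-loop iteration
// takes. On the STM32F405 you would pass DWT->CYCCNT as `counter_now`.
struct LoopTimer {
    uint32_t start = 0;
    uint32_t last_cycles = 0;
    uint32_t max_cycles = 0;

    void begin(uint32_t counter_now) { start = counter_now; }

    void end(uint32_t counter_now) {
        // Unsigned subtraction stays correct when the 32-bit counter wraps.
        last_cycles = counter_now - start;
        if (last_cycles > max_cycles) {
            max_cycles = last_cycles;
        }
    }

    // Worst-case fraction of the control period used so far (1.0 = 100 %).
    float worst_case_load(uint32_t period_cycles) const {
        return static_cast<float>(max_cycles) / static_cast<float>(period_cycles);
    }
};

// Assumed budget: 168 MHz core clock x 125 us control period.
constexpr uint32_t kControlPeriodCycles = 21000;
```

Calling begin() at the top of the control loop and end() at the bottom, then reading max_cycles with one and with two motors in closed loop, would give a rough idea of how close the firmware gets to the budget, and comparing a debug build with an -Ofast/LTO build would show how much the optimizer recovers.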

Jorijs · March 22, 2021, 1:09pm

Using -Ofast and LTO has indeed solved the issue! Thanks for the help.

Jorijs · April 8, 2021, 9:13am

I’m currently looking into proper error propagation for our new structure. I noticed that the axis uses do_checks to check for errors in its sub-components. However, the encoders weren’t checked there in the first place. Were their errors propagated to the axis differently, or is that still under construction on the dev branch?

Samuel · April 9, 2021, 9:20am

Encoder errors are propagated implicitly by their output being unavailable (std::nullopt). That means the controller will fail if it gets the position/velocity estimate from a failed encoder. But for instance in sensorless mode the motor does not fail if an encoder fails.
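The implicit propagation described above can be illustrated with a small, self-contained sketch. Every name here (Encoder, pos_estimate, ControlMode, controller_update, the gain) is hypothetical and only mimics the pattern; it is not the ODrive dev-branch code.

```cpp
#include <optional>

// Hypothetical encoder: once it has an error, its estimate becomes unavailable.
struct Encoder {
    bool has_error = false;
    float pos = 0.0f;

    std::optional<float> pos_estimate() const {
        if (has_error) {
            return std::nullopt;
        }
        return pos;
    }
};

enum class ControlMode { ClosedLoop, Sensorless };

// Hypothetical controller step. Returns false when the axis should fail
// because the estimate it needs is missing; sensorless mode never reads the
// encoder, so an encoder failure does not affect it.
bool controller_update(ControlMode mode, const Encoder& enc, float pos_setpoint,
                       float& torque_out) {
    if (mode == ControlMode::Sensorless) {
        torque_out = 0.0f;  // placeholder: the sensorless estimator is not modelled
        return true;
    }
    std::optional<float> pos = enc.pos_estimate();
    if (!pos) {
        return false;  // the encoder error surfaces here, not through do_checks
    }
    constexpr float kPosGain = 2.0f;  // arbitrary illustrative gain
    torque_out = kPosGain * (pos_setpoint - *pos);
    return true;
}
```

With this pattern the encoder does not need to know who consumes it; only the consumers that actually need its output fail when it is unavailable.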

Jorijs · April 9, 2021, 10:14am

That also explains some problems we had with decoupling the encoders from the axis.

Jorijs · April 14, 2021, 8:44am

As someone asked whether they could already use our current firmware, I’ve created a small guide in a new thread: Guide to using multiple encoders per axis (early-access)

Jorijs · May 3, 2021, 9:45am

I’ve encountered an interesting problem with our new firmware. We’ve got a USB interface for the native protocol in C++, which worked properly with the v0.5.1 firmware. With the new firmware, we’re getting errors while importing the JSON from the ODrive. I first thought that it had to do with a wrong CRC or something similar, but that doesn’t seem to be the case, as the communication fails halfway through the transaction.
I ran the ODrive in debug mode using an STLink and found out that it stops transmitting packets because it gets stuck in a function: it stays in xTaskResumeAll() in the FreeRTOS tasks.c file. I haven’t been able to find out what exactly causes this yet.
The interesting thing is that odrivetool is still able to communicate properly with the ODrive. Soon we’re also going to test whether a Python USB interface still works (which is likely, as odrivetool still works).

If anything comes to mind that could cause this, please let me know. I thought that, in terms of communication, only the CRC needs to change when the size of the JSON changes, but there might be more going on.

Jorijs · May 7, 2021, 7:53am

Hey everyone (and especially @Samuel ), I’ve sadly got some bad news.
Yesterday we discovered that after about 1 minute of actuating 2 motors simultaneously, the firmware crashes (or gets stuck in a loop). This is a rather pesky problem to debug, as we can’t use the STLink in this case. As you might remember, we previously had an issue with missed control deadlines, which was resolved by not building in debug mode and by using LTO. So we can’t actuate both motors and debug at the same time :(.
Since our major changes only concern where the encoders are located and how to switch between them, we’ve changed relatively little in the actuation process.
I initially thought that it might have something to do with thread safety, as we’ve gone from axes with their own private encoders to accessing encoders through a shared resource: the encoder_manager. However, this shouldn’t be the case, as apart from when an encoder is being changed, the encoder_manager simply acts as an interface with getters and setters.
It shouldn’t be a processing-power problem either, as we’d then expect errors similar to the earlier control_deadline_missed. We’ve even tried using only 2 encoders again, so that the only remaining difference is the firmware’s structure.
I should also add that the issue does not occur when actuating a single motor, so it’s specific to using 2 motors simultaneously.

Any inkling of what could be going wrong would be highly appreciated.
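On the thread-safety concern about the shared encoder_manager: one way to make encoder changes safe with respect to the control loop is to store each axis’s active encoder as an atomic pointer that the control loop loads once per iteration, so it sees either the old or the new encoder but never a half-finished change. This is a minimal sketch with hypothetical names and an assumed pool of four encoders and two axes; it is not the actual encoder_manager and not a diagnosis of the crash.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

struct Encoder {
    float pos = 0.0f;
};

constexpr std::size_t kNumEncoders = 4;  // assumed pool size
constexpr std::size_t kNumAxes = 2;

// Hypothetical manager: each axis holds an atomic pointer to its active encoder.
class EncoderManager {
public:
    explicit EncoderManager(std::array<Encoder, kNumEncoders>& encoders)
        : encoders_(encoders) {
        for (auto& slot : active_) {
            slot.store(nullptr);
        }
    }

    // Called from the communication thread when the user switches encoders.
    bool set_active(std::size_t axis, std::size_t encoder_idx) {
        if (axis >= kNumAxes || encoder_idx >= kNumEncoders) {
            return false;
        }
        active_[axis].store(&encoders_[encoder_idx], std::memory_order_release);
        return true;
    }

    // Called once at the start of each control iteration; the axis should keep
    // using the returned pointer for the rest of that iteration.
    Encoder* active(std::size_t axis) const {
        return active_[axis].load(std::memory_order_acquire);
    }

private:
    std::array<Encoder, kNumEncoders>& encoders_;
    std::array<std::atomic<Encoder*>, kNumAxes> active_;
};
```

A switch that also has to reinitialise or recalibrate the newly selected encoder would additionally need to ensure the control loop is not updating that encoder at the same moment, which this sketch does not cover.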