No response from CAN after variable amount of time

Has anyone seen the ODrive stop responding to CAN messages after a few minutes? If I power cycle only the ODrive, CAN works again. During an ODrive power cycle, the Teensy and Jetson keep power on their CAN transceivers and keep running as normal. After power cycling, the ODrive again runs for a few minutes before it stops responding (after around 10000-20000 heartbeat packets). Opening odrivetool during such an event shows no errors in dump_errors, and all other ODrive functions still work normally (holding motor position in this case).

The CAN baud rate is set to 250000. I am using the example node_ids and requesting the CANSimple messages with the RTR bit set. All messages are requested at 10 Hz. The CAN-H and CAN-L lines are around 30 cm long.
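For readers unfamiliar with the request scheme described above, here is a minimal sketch of how a CANSimple arbitration ID is formed. It assumes the documented 11-bit layout, with the node ID in the upper 6 bits and the command ID in the lower 5 bits. The helper names and the node ID used in the example are hypothetical, not taken from this setup.

```python
# Sketch only: helper names are hypothetical and not part of any ODrive library.
# Assumes the CANSimple layout: 11-bit ID = (node_id << 5) | cmd_id.

CMD_HEARTBEAT = 0x001              # sent periodically by each axis
CMD_GET_ENCODER_ESTIMATES = 0x009  # typically requested with the RTR bit set

REQUEST_PERIOD_S = 0.1             # 10 Hz, the request rate quoted in the post


def cansimple_id(node_id: int, cmd_id: int) -> int:
    """Build the 11-bit arbitration ID for a given axis node and command."""
    if not 0 <= node_id <= 0x3F:
        raise ValueError("node_id must fit in 6 bits")
    if not 0 <= cmd_id <= 0x1F:
        raise ValueError("cmd_id must fit in 5 bits")
    return (node_id << 5) | cmd_id


def split_cansimple_id(arb_id: int) -> tuple:
    """Recover (node_id, cmd_id) from an 11-bit arbitration ID."""
    return arb_id >> 5, arb_id & 0x1F


if __name__ == "__main__":
    # Hypothetical node_id 3: the request frame would use this ID with RTR set.
    print(hex(cansimple_id(3, CMD_GET_ENCODER_ESTIMATES)))
```

A request is then a remote frame (RTR set, no data) sent on that ID, and the ODrive is expected to answer on the same ID with the data payload.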

Currently running the latest stable firmware released in September 2020.

I have the same issue with the same symptoms. Any guidance would be greatly appreciated.

Hmm, curious. I can test it here.

Thanks @Wetmelon. We have since tried reducing the rate at which we request messages. Over 2 test runs, I saw the same failure mode after about 2 hours on one of them.

Does the heartbeat stop or just the requested messages, or both? Can you send me a CAN trace of some sort?

We are only requesting data from axis0. During normal operation we see both the requested messages and the heartbeat messages from axis0 and axis1. On failure, no messages at all are received on the CAN bus. I usually monitor with the cansniffer tool on Linux and will post a log of the last few messages. All messages look normal, with no errors reported. If I power cycle just the ODrive (the CAN transceivers are kept powered), the messages return as normal. The ODrive has a ground wire that runs along the CAN lines to keep everything (Jetson and Teensy) on a common reference. I have counted the number of heartbeat messages until failure, but the number varies, so it's not as deterministic as I had hoped.
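As a rough aid for counting heartbeats and finding the moment the bus goes quiet, the sketch below scans a log in the `candump -l` text format (`(timestamp) interface ID#DATA`). It assumes the CANSimple ID layout (node ID in the upper bits, command ID in the lower 5 bits) and heartbeat command 0x001. The function name and the 0.5 s gap threshold are hypothetical choices, not values from this thread.

```python
import re

# Matches candump log lines such as: (1600000000.123456) can0 021#0000000001000000
LINE_RE = re.compile(r"^\((\d+\.\d+)\)\s+(\S+)\s+([0-9A-Fa-f]+)#(\S*)")

CMD_HEARTBEAT = 0x001  # assumed CANSimple heartbeat command ID


def heartbeat_report(lines, max_gap_s=0.5):
    """Return (counts, last_seen, gaps) for heartbeat frames per node.

    counts:    node_id -> number of heartbeats seen
    last_seen: node_id -> timestamp of the final heartbeat in the log
    gaps:      list of (node_id, previous_ts, next_ts) exceeding max_gap_s
    """
    counts, last_seen, gaps = {}, {}, []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a frame line
        timestamp = float(match.group(1))
        arb_id = int(match.group(3), 16)
        node_id, cmd_id = arb_id >> 5, arb_id & 0x1F
        if cmd_id != CMD_HEARTBEAT:
            continue
        previous = last_seen.get(node_id)
        if previous is not None and timestamp - previous > max_gap_s:
            gaps.append((node_id, previous, timestamp))
        counts[node_id] = counts.get(node_id, 0) + 1
        last_seen[node_id] = timestamp
    return counts, last_seen, gaps
```

If the bus stops completely, no gap is recorded because nothing arrives afterwards. In that case `last_seen` gives the time each axis went silent and `counts` gives the number of heartbeats before the failure.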

I have the same issue.
When we use CAN with the ODrive, the heartbeat signal suddenly stops.

In our system, the ODrive and an STM board communicate over CAN.
A summary of the symptoms is below.

  • Suddenly, no heartbeat signal is received on the STM board.
  • The STM board shows no CAN error.
  • We connected to the ODrive with odrivetool, but it reports no errors.
    • nothing in dump_errors(odrv0), nothing in odrv0.can.error

Could anyone give some tips for debugging this issue?

Can you see any traffic on the bus at all, with an oscilloscope or similar? Also, does the odrivetool still work?

Hi, Wetmelon.

So far we haven't checked it with an oscilloscope, but if the symptom occurs again we will check and share the result.
However, how can we check the traffic on the bus?

Yes, odrivetool still works: configuration values can be changed without problems.
The instruction odrv0.axis0.controller.input_vel = 1 also drives the motor.
And there are no errors in dump_errors(odrv0).
To debug the ODrive firmware, we set up two blinking LEDs, one in analog_polling_thread and one in can_server_thread.
However, both LEDs stop blinking when the symptom occurs.

I am really concerned by this issue, as I am planning to use an ODrive over CAN for a multi-motor project. If the LEDs are not blinking anymore, there is a good chance that a firmware issue is causing that code to stop running for some reason. It might be a counter overflow. If so, you should also lose the physical CAN signal. As said before, it would be interesting to check with a scope. If the issue does not appear when CAN is not enabled, that clearly points to a firmware issue in the CAN-related code. Do you have the same issue over time when using the ODrive without enabling the CAN protocol? If not, you can try blinking an LED at various points in the CAN code to find where the problem comes from.
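To illustrate what a counter overflow can look like in general, here is a small sketch that emulates a 32-bit tick counter in Python. It is not ODrive firmware code, and nothing in this thread confirms that an overflow is the cause. The function names and tick values are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # emulate a 32-bit unsigned tick counter


def naive_elapsed(now: int, start: int) -> int:
    """Elapsed ticks without wraparound handling.

    Once the counter wraps, now < start and the result is negative, so a
    check like `elapsed >= period` stays false and the periodic task stalls.
    """
    return now - start


def wrap_safe_elapsed(now: int, start: int) -> int:
    """Elapsed ticks using modulo-2^32 subtraction, which survives one wrap."""
    return (now - start) & MASK32


if __name__ == "__main__":
    start = 0xFFFFFFF0  # task last ran just before the counter wrapped
    now = 0x00000010    # 0x20 ticks later, after the wrap
    period = 0x20
    print(naive_elapsed(now, start) >= period)      # False: task never fires
    print(wrap_safe_elapsed(now, start) >= period)  # True: task fires on time
```

A thread stalled this way would keep the rest of the system running while its own loop body never executes again, which is one pattern that LED instrumentation can help confirm or rule out.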

I suspect it’s an issue with the error handling, but nobody’s been able to pin it down.

@Hoyong_Lee, could you try lighting an LED inside the error_handler? If the error occurs, the LED should light up. That's a first step in debugging.

Okay, we will try lighting an LED inside the error_handler.
Once the symptom has occurred again, I'll report the results.
