Communication Watchdog

Paul_Belanger · January 30, 2019, 3:35pm

There are many applications where the ODrive is used where the various
communications channels (USB, UART, etc.) are used to dynamically control the
controller settings, such as setting the desired velocity of the motor based on
some sort of external control.

This presents an issue in situations where the communication channel between the
ODrive and the external controller becomes interrupted either due to a crash on
the external controller or a disconnection of the communication wiring.
Currently, in such a case, the ODrive continues on its last configured
instruction. Depending on what’s connected to it, this can cause a host of
issues such as a vehicle running at full speed into a wall or person.

One way to help with this issue is to implement some sort of communication
watchdog for the ODrive motor control classes. This watchdog system would
prevent the Controller class from driving the motor MOSFETs if a property has
not been changed for longer than a specified interval. Setting a property on the
Controller class would act as a reset to the watchdog and allow the Controller
class to again control the motors.

The way I see this working is as follows:

A new property watchdog_timeout is added to each axis. This property
specifies the watchdog timeout, in seconds, for this axis. If the watchdog
has not been fed for a time exceeding this timeout, then the motor is
disarmed and not driven. The value of this property would be saved in
non-volatile memory. Setting a value of 0 for watchdog_timeout will
disable this watchdog.
A new property watchdog_feed is added to each axis. Writing any value to
this property will reset the watchdog interval and allow the motor to be
driven.
External control loops will write the watchdog_feed property after setting
updating any setpoints or other parameters of the axis. If the external
controller crashes or otherwise ceases communication, then the watchdog_feed
property will not be written and the axis will eventually transition into
safe mode until the watchdog is fed again.

I’d appreciate any thoughts on this subject. Would there be a better way to add
fail-safe functionality to the communication channels? Is this something that could be
considered for inclusion into the default firmware for the ODrive?

Cheers,

Paul Belanger

Wetmelon · January 31, 2019, 7:58pm

I like it in general - implementaton specifics need to be worked out. I think it’s best to put a single communications watchdog somewhere for that comms thread. And then any axis -> comms thread interface would have a separate watchdog

External < === > Comms thread < === > Axis 0/1

mikh3x4 · February 4, 2019, 7:14pm

I would like to just like to add support to this idea. We are using odrives on quite a big robot and already needed to use a hard estop when the odrives lost communication.

madcowswe · February 9, 2019, 8:02am

@Paul_Belanger Your proposal as written sounds perfect. If you are up for writing this yourself, then I’d suggest to decrement the timeout variable right here, and add a variable like communication_ok.

We could add in calls to watchdog_feed() in the p, v, c, t commands on the ascii protocol while we are at it, since calling those would constitute runtime type communication.

Let me know if you are not comfortable to dive into the firmware code, in which case I can add this when I find some time.

Paul_Belanger · March 6, 2019, 7:59pm

I’ve created an initial implementation of the watchdog, and have submitted a pull request: https://github.com/madcowswe/ODrive/pull/316

I’ve added a watchdog_timeout property to the Axis configuration, as well as a watchdog_feed() method to the axis api.

For the ascii protocol, I’ve added watchdog feeding to all setpoint commands, and added an additional u command to update the watchdog of a motor axis.

I haven’t had the hardware availability to actually test this patch yet, but I should be able to get my hands on some hardware by next week or so.

marc.s.cohn · December 2, 2019, 3:44pm

Has this been implemented yet? Had my project try to run away from me yesterday due to a broken wire between arduino and odrive. Failsafe needed.

towen · December 2, 2019, 4:46pm

I have had many a project run away from me. On the time axis though, not spatial coordinates

Yes I believe this is implemented in master now. Update your firmware.
I think you can simply run odrvtool dfu to automatically update to the latest release.

You should see a config option called axisN.config.watchdog_timeout - a non-zero value means that you have to continually call axisN.watchdog_feed() faster than this value or else it will go to idle state.

marc.s.cohn · December 3, 2019, 2:59am

thank you,

I assume milliseconds for the timeout value?

Paul_Belanger · December 3, 2019, 12:02pm

From the docs:

Each axis has a configurable watchdog timer that can stop the motors if the control connection to the ODrive is interrupted.

Each axis has a configurable watchdog timeout: axis.config.watchdog_timeout , measured in seconds. A value of 0 disables the watchdog functionality. Any value > 0 will stop the motors if the watchdog has not been fed in the configured time interval.

The watchdog is fed using the axis.watchdog_feed() method of each axis.

marc.s.cohn · December 5, 2019, 12:09am

Thanks for your help. Works for me.

daisukes · March 18, 2020, 9:26pm

I was trying to enable watchdog_timer from ascii interface but it looks like the time was not triggered by a command like “w axis0.config.watchdog_timer 1.0”.

Should I use native protocol to trigger the timer?

Paul_Belanger · March 18, 2020, 9:47pm

You mean the watchdog wasn’t fed?

That’s by design. The watchdog will only be updated when you explictly call axis.watchdog_feed() or use a function that modifies the controller setpoint. For the ASCII protocol that would be the t, q, p, v, c, and u commands. Changing parameters with the w command does not update the watchdog.

I also noticed you mis-spelled the parameter name in your comment(watchdog_timer instead of watchdog_timeout).

daisukes · March 19, 2020, 12:52am

No, watchdog was fed, but writing watchdog_timeout property does not start the timer.

w axis0.config.watchdog_timeout 1.0
(as you mentioned, it was misspelled)

I have been reading the code and I found that ascii protocol does not trigger “update_watchdog_settings” which actually set the counter.

Ascii protocol calls set_string function of ProtocolProperty here, but it won’t trigger property handler which eventually calls update_watchdog_settings.

I think it may need to call written_hook_ like the handle function.

jorgemia · September 9, 2020, 2:14pm

I’m using two odrives for my robot and I’m trying to implement the communication watchdog.

The problem I’m encountering is that my main script (ros launch file) has to load to start sending the watchdog_feed command when I turn on the robot. The problem is that by the time the script has loaded the odrives have already timed out as they’ve been powered on for a couple of seconds and the odrive exits closed loop control and coasts.

Is there any way to get the odrive to start the watchdog only when a first connection has been established/after it has received its first message? Or what do you recommend I do? Should the watchdog feed be placed somewhere else?

Paul_Belanger · September 9, 2020, 6:13pm

When the watchdog fails it sets the AXIS_ERROR_WATCHDOG_TIMER_EXPIRED (0x800) bit on the axis error state (axisN.error). Once you’ve finished your initialization code and are calling watchdog_feed at a suitable rate, you can try clearing the axis error state by setting the error to 0.

jorgemia · September 10, 2020, 10:18am

Thanks that worked! I was only calling watchdog_feed after setting the errors to 0 so i believe it was timing out again. I started calling the watchdog and then set the errors to 0 and it worked.

Wetmelon · September 10, 2020, 1:47pm

Ah, I bet that the watchdog isn’t looking for a rising edge, it’s just looking to see if the timer has expired. That means it’s re-applying the errors at 8kHz So yeah, send the watchdog first

Paul_Belanger · September 11, 2020, 12:32am

Yeah, it’s set up that way to avoid the trap of someone clearing the error, but forgetting to call watchdog_feed()

furushchev · July 12, 2021, 4:11am

Hi,

I’m planning to change communication between computer and ODrive controllers from USB to CAN for isolating electrical lines.
Does the watchdog timer work on CAN communication?

Thanks!

Wetmelon · July 17, 2021, 2:28am

Yes. The CAN protocol kicks the watchdog any time a valid message arrives