"unknown command" in native UART mode

Hello,

I ran into an issue when trying to control an ODrive from an ESP32 using the native protocol over UART. I am using the official 0.5.1 firmware. About once in every 30 commands, seemingly at random, the ODrive returns the text “unknown command”, which should only appear in ASCII mode. I have tried recompiling the firmware with UART_PROTOCOL=native, but this makes no difference.

First off, I would assume the ODrive should not return text when issued native commands. Secondly, I suspect this happens because the UART data is not received properly by the ODrive, i.e. the data is not recognized as a valid native frame/packet and is then forwarded to the ASCII protocol handler. If that is the case, why is the UART communication so unreliable?

I have tried various baud rates and even different ESP32 boards. Lowering the baud rate to 9600, which according to the documentation should have a 0% timing error on the ODrive side, gives even worse results than the default 115200 baud. I have also tried resending the last command when the ODrive responds with an invalid response. This works for a bit, but after a while the ODrive keeps returning “unknown command” for one command after another, until the maximum number of retries is reached.

What could it be? Thanks in advance.

It does sound a little like a communications error, like you say…

What software are you using on the ESP32?

Can you sniff the serial line to see what data is actually being sent?

Can you provide a link?

-John

After some more investigation, the pattern doesn’t seem to be random. I have created a small program that writes a value to test_property. After this, it reads the value back every ~10ms. It increments the native protocol “sequence number” after every successful command. If a packet/frame fails with an “unknown command” response, it resends the packet/frame using the same sequence number up to 4 times before giving up. I am using the same sequence numbers that are used by odrivetool over USB, which appear to be values from 0x80 up to 0xff, for some reason starting at 0x81.
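For readers following along, the retry scheme described above can be sketched roughly as follows. This is only an illustration in Python, not the ESP-IDF code from the test program: `exchange` is a hypothetical callable that sends one frame with the given sequence number and returns the response payload, or `None` when the response is invalid. Keeping the old sequence number after giving up is an arbitrary choice made for this sketch.

```python
from typing import Callable, Optional, Tuple

SEQ_MIN, SEQ_MAX = 0x80, 0xFF  # range observed from odrivetool, per the post
MAX_RESENDS = 4                # resend the same frame up to 4 times


def next_seq(seq: int) -> int:
    """Advance the sequence number, wrapping from 0xff back to 0x80."""
    return SEQ_MIN if seq >= SEQ_MAX else seq + 1


def request_with_retry(
    exchange: Callable[[int], Optional[bytes]], seq: int
) -> Tuple[Optional[bytes], int]:
    """Try one request, resending with the same sequence number on failure.

    Returns (response, next sequence number). The sequence number only
    advances after a successful exchange.
    """
    for _ in range(1 + MAX_RESENDS):
        response = exchange(seq)
        if response is not None:
            return response, next_seq(seq)
    return None, seq  # gave up; sequence number left unchanged (assumption)


# Hypothetical stand-in for the UART link: the first attempt with
# sequence number 0x95 fails, every other attempt succeeds.
attempts = []


def flaky_exchange(seq: int) -> Optional[bytes]:
    attempts.append(seq)
    if seq == 0x95 and attempts.count(seq) == 1:
        return None
    return b"\x01\x02"


response, seq = request_with_retry(flaky_exchange, 0x95)
print(response, hex(seq), [hex(a) for a in attempts])
```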

From the results, I can see that packets with certain sequence numbers always fail, the first time they are sent. Resending the failed packet seems to work on the second try. However, after a while another sequence number starts failing and keeps failing until the program gives up. This is completely reproducible across runs.

This made me think that maybe my implementation of the CRC16 algorithm was wrong. I checked the calculated checksums against the ones generated by the website linked in the ODrive documentation (Sunshine's Homepage - Online CRC Calculator Javascript) and they match up. Also, the packets succeed on the second try, containing exactly the same bytes. Another thing I noticed is that sending a packet with an incorrect checksum does not get any response (which matches the documentation), as opposed to getting the “unknown command” response.

I have a growing feeling that this is a bug in the firmware, possibly related to the fact that the native and ASCII protocols are now always active at the same time.
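As a cross-check of the checksum step, here is a minimal Python sketch of the UART framing. The CRC parameters (CRC8 with polynomial 0x37 and initial value 0x42 over the two header bytes, CRC16 with polynomial 0x3d65 and initial value 0x1337 over the payload, sent high byte first) are assumptions taken from the ODrive protocol documentation. The function names are made up for this example. The payload is the one from the frame quoted further down in this thread (aa083d 950016820400409b 8700).

```python
SYNC_BYTE = 0xAA
CRC8_POLY, CRC8_INIT = 0x37, 0x42        # assumed from the ODrive docs
CRC16_POLY, CRC16_INIT = 0x3D65, 0x1337  # assumed from the ODrive docs


def crc(data: bytes, poly: int, init: int, width: int) -> int:
    """Bitwise MSB-first CRC, no reflection, no final XOR."""
    top_bit = 1 << (width - 1)
    mask = (1 << width) - 1
    value = init
    for byte in data:
        value ^= byte << (width - 8)
        for _ in range(8):
            if value & top_bit:
                value = (value << 1) ^ poly
            else:
                value <<= 1
            value &= mask
    return value


def build_frame(payload: bytes) -> bytes:
    """Wrap a native-protocol payload: sync, length, CRC8, payload, CRC16."""
    if len(payload) > 0xFF:
        raise ValueError("length field is a single byte")
    header = bytes([SYNC_BYTE, len(payload)])
    header += bytes([crc(header, CRC8_POLY, CRC8_INIT, 8)])
    checksum = crc(payload, CRC16_POLY, CRC16_INIT, 16)
    return header + payload + checksum.to_bytes(2, "big")


payload = bytes.fromhex("950016820400409b")
frame = build_frame(payload)
print(frame.hex())  # aa083d950016820400409b8700
```

The result reproduces the bytes in the dump, which supports the observation that the checksums themselves are not the problem.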

I am using ESP-IDF, the official development framework from Espressif.

Here is the error log. The “invalid sync byte” value 117 is actually the ASCII code for “u”, the first letter of the “unknown command” response.

W (591) odrive: invalid sync byte: 117
W (591) odrive: frame 0x0095 (149) failed
W (941) odrive: invalid sync byte: 117
W (941) odrive: frame 0x00a2 (162) failed
W (2511) odrive: invalid sync byte: 117
W (2511) odrive: frame 0x0095 (149) failed
W (2871) odrive: invalid sync byte: 117
W (2871) odrive: frame 0x00a2 (162) failed
W (4451) odrive: invalid sync byte: 117
W (4451) odrive: frame 0x0095 (149) failed
W (4811) odrive: invalid sync byte: 117
W (4811) odrive: frame 0x00a2 (162) failed
W (6391) odrive: invalid sync byte: 117
W (6391) odrive: frame 0x0095 (149) failed
W (6751) odrive: invalid sync byte: 117
W (6751) odrive: frame 0x00a2 (162) failed
W (8331) odrive: invalid sync byte: 117
W (8331) odrive: frame 0x0095 (149) failed
W (8691) odrive: invalid sync byte: 117
W (8691) odrive: frame 0x00a2 (162) failed
W (10271) odrive: invalid sync byte: 117
W (10271) odrive: frame 0x0095 (149) failed
W (10631) odrive: invalid sync byte: 117
W (10631) odrive: frame 0x00a2 (162) failed
W (12231) odrive: invalid sync byte: 117
W (12231) odrive: frame 0x0095 (149) failed
W (12591) odrive: invalid sync byte: 117
W (12591) odrive: frame 0x00a2 (162) failed
W (14191) odrive: invalid sync byte: 117
W (14191) odrive: frame 0x0095 (149) failed
W (14551) odrive: invalid sync byte: 117
W (14551) odrive: frame 0x00a2 (162) failed
W (16151) odrive: invalid sync byte: 117
W (16151) odrive: frame 0x0095 (149) failed
W (16511) odrive: invalid sync byte: 117
W (16511) odrive: frame 0x00a2 (162) failed
W (18111) odrive: invalid sync byte: 117
W (18111) odrive: frame 0x0095 (149) failed
W (18471) odrive: invalid sync byte: 117
W (18471) odrive: frame 0x00a2 (162) failed
W (20071) odrive: invalid sync byte: 117
W (20071) odrive: frame 0x0095 (149) failed
W (20431) odrive: invalid sync byte: 117
W (20431) odrive: frame 0x00a2 (162) failed
W (22031) odrive: invalid sync byte: 117
W (22031) odrive: frame 0x0095 (149) failed
W (22391) odrive: invalid sync byte: 117
W (22391) odrive: frame 0x00a2 (162) failed
W (23991) odrive: invalid sync byte: 117
W (23991) odrive: frame 0x0095 (149) failed
W (24351) odrive: invalid sync byte: 117
W (24351) odrive: frame 0x00a2 (162) failed
W (25951) odrive: invalid sync byte: 117
W (25951) odrive: frame 0x0095 (149) failed
W (26311) odrive: invalid sync byte: 117
W (26311) odrive: frame 0x00a2 (162) failed
W (27911) odrive: invalid sync byte: 117
W (27911) odrive: frame 0x0095 (149) failed
W (28271) odrive: invalid sync byte: 117
W (28271) odrive: frame 0x00a2 (162) failed
W (29871) odrive: invalid sync byte: 117
W (29871) odrive: frame 0x0095 (149) failed
W (30231) odrive: invalid sync byte: 117
W (30231) odrive: frame 0x00a2 (162) failed
W (31831) odrive: invalid sync byte: 117
W (31831) odrive: frame 0x0095 (149) failed
W (32191) odrive: invalid sync byte: 117
W (32191) odrive: frame 0x00a2 (162) failed
W (33791) odrive: invalid sync byte: 117
W (33791) odrive: frame 0x0095 (149) failed
W (34151) odrive: invalid sync byte: 117
W (34151) odrive: frame 0x00a2 (162) failed
W (35741) odrive: invalid sync byte: 117
W (35741) odrive: frame 0x0094 (148) failed
W (35951) odrive: invalid sync byte: 117
W (35951) odrive: frame 0x0095 (149) failed
W (36301) odrive: invalid sync byte: 117
W (36301) odrive: frame 0x00a2 (162) failed
W (37891) odrive: invalid sync byte: 117
W (37891) odrive: frame 0x0094 (148) failed
W (38091) odrive: invalid sync byte: 117
W (38091) odrive: frame 0x0094 (148) failed
W (38291) odrive: invalid sync byte: 117
W (38291) odrive: frame 0x0094 (148) failed
W (38491) odrive: invalid sync byte: 117
W (38491) odrive: frame 0x0094 (148) failed
W (38691) odrive: invalid sync byte: 117
W (38691) odrive: frame 0x0094 (148) failed

And here is a full log/packet dump: Full packet dump/log · GitHub

After even more investigation, the ODrive seems to send both the native protocol response and the ASCII protocol response (in the cases seen above). First it sends the ASCII protocol response, immediately followed by the correct native response. This can’t be intended behaviour, right?

OK, I see a few oddities going on in your log file…

I suspect they are problems, or undocumented features…

Starting with line 41 of your dump we see these.

aa083d 950016820400409b 8700
W (591) odrive: invalid sync byte: 117
W (591) odrive: frame 0x0095 (149) failed

I looked at a couple of these errors in your dump, and the frame number listed is the same as the first byte in your command packet.

I don’t know what kind of decoder they are using for the Native Protocol, but I suspect that it got out of sync, and tried to read the start of the data packet as the start character.
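To make the byte layout of that packet easier to see, here is a small Python sketch that splits the 8 payload bytes (the part between the 3-byte header and the 2-byte CRC16) into fields. The field layout is an assumption based on the ODrive native protocol documentation: a little-endian sequence number, an endpoint ID whose top bit requests a response, an expected response size, optional data, and a 2-byte trailer. `decode_request` is a made-up helper name.

```python
import struct


def decode_request(payload: bytes) -> dict:
    """Split a native-protocol request payload into its fields.

    Assumed layout (little-endian): seq, endpoint, response size,
    then optional data, then a 2-byte trailer.
    """
    if len(payload) < 8:
        raise ValueError("request payload needs at least 8 bytes")
    seq, endpoint, response_size = struct.unpack_from("<HHH", payload, 0)
    (trailer,) = struct.unpack_from("<H", payload, len(payload) - 2)
    return {
        "seq": seq,
        "endpoint_id": endpoint & 0x7FFF,
        "expects_response": bool(endpoint & 0x8000),
        "response_size": response_size,
        "data": payload[6:-2],
        "trailer": trailer,
    }


fields = decode_request(bytes.fromhex("950016820400409b"))
for name, value in fields.items():
    print(name, value)
```

Under these assumptions the sequence number decodes to 0x0095 (149), which is the frame number the log reports as failed.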

I don’t see anything else that looks like a problem.

How fast are you sending requests? Are you rate limiting them, or just blasting them as fast as you can?
What baud rate are you using?

-John

Do you have a log on the responses where it is sending both?
I don’t see anything like that in the logs you sent earlier…

-John

Thanks for your responses. I think I have figured it out.

Yeah sorry, the log doesn’t include the full responses. I thought I could safely discard any response that didn’t start with the sync byte 0xAA, which is why any invalid data beyond the first byte doesn’t show up in the log.

It turns out the invalid responses start with “unknown command\r\n” or “invalid command format\r\n” followed immediately by 0xAA and the actual correct native response. So the ODrive sends both the correct response and an error response from the ASCII protocol handler. For some reason the ASCII protocol handler on the ODrive thinks some native commands are actually ASCII commands and responds to those.

I now work around this by scanning the response data for an 0xAA sync byte and discarding any data before it. Everything seems to work fine now! I have also opened an issue on GitHub here: Firmware v0.5.1 sends both native reponse and ASCII protocol error in response to native command over UART · Issue #573 · odriverobotics/ODrive · GitHub. It’s still a little bit scary that my native command could be incorrectly interpreted as an ASCII command and that this garbage could then get executed on the ODrive.
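A rough Python sketch of this kind of workaround is shown below. It goes slightly further than just looking for 0xAA: it also checks the header CRC8 and the payload CRC16 before accepting a frame, and keeps scanning otherwise. The CRC parameters and the frame layout (sync, length, CRC8, payload, CRC16 high byte first) are assumptions from the ODrive protocol documentation, and `extract_frame` and `make_frame` are hypothetical helper names. The response payload in the demo is made up.

```python
SYNC = 0xAA


def crc(data: bytes, poly: int, init: int, width: int) -> int:
    """Bitwise MSB-first CRC, no reflection, no final XOR."""
    top_bit, mask, value = 1 << (width - 1), (1 << width) - 1, init
    for byte in data:
        value ^= byte << (width - 8)
        for _ in range(8):
            value = ((value << 1) ^ poly) if value & top_bit else (value << 1)
            value &= mask
    return value


def make_frame(payload: bytes) -> bytes:
    """Build a frame; only used here to produce demo input."""
    header = bytes([SYNC, len(payload)])
    header += bytes([crc(header, 0x37, 0x42, 8)])
    return header + payload + crc(payload, 0x3D65, 0x1337, 16).to_bytes(2, "big")


def extract_frame(buffer: bytes):
    """Return (payload, rest) for the first valid frame in buffer.

    Bytes before the frame (e.g. ASCII error text) are discarded. If no
    complete frame is present yet, returns (None, data worth keeping).
    """
    start = buffer.find(bytes([SYNC]))
    while start != -1:
        header = buffer[start:start + 3]
        if len(header) < 3:
            break  # header incomplete, wait for more data
        if crc(header[:2], 0x37, 0x42, 8) == header[2]:
            end = start + 3 + header[1] + 2
            if len(buffer) < end:
                break  # payload or CRC16 incomplete, wait for more data
            payload = buffer[start + 3:end - 2]
            received = int.from_bytes(buffer[end - 2:end], "big")
            if crc(payload, 0x3D65, 0x1337, 16) == received:
                return payload, buffer[end:]
        start = buffer.find(bytes([SYNC]), start + 1)
    return None, (buffer[start:] if start != -1 else b"")


good = make_frame(b"\x95\x80\x2a\x00\x00\x00")  # made-up response payload
noisy = b"unknown command\r\n" + good
payload, rest = extract_frame(noisy)
print(payload.hex(), rest)
```

Validating the checksums means a stray 0xAA inside garbage data is not mistaken for a frame start.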

Although this is probably no longer relevant, in the test case I sent commands with 10ms in between at 115200 baud.

Apparently this bug has been around for almost two years. I guess nobody uses the native protocol. The solution proposed here is better than mine: Native Mode over UART (GPIO 1 and 2).

I get the feeling the Native Protocol is only used by a few of us diehards that need a more sophisticated interface than the ASCII interface provides.

I am working on my own library to use the Native Interface in C. Once I have it starting to work, I will publish it for others to look at and test. Maybe it can become a basis for people not using Python or just playing with the Arduino with the ASCII interface.

-John

I replied to the issue on GitHub. In short: there will be a config option to select one of the two protocols.

The native protocol (aka Fibre) on UART is indeed somewhat niche and not very well tested, but it’s essentially the same as the one on USB plus a thin encapsulation (sync byte and CRC).

@jscott note that an official host-side C++ implementation with a C API exists nowadays under the name libfibre (or in its standalone repository here). The upcoming odrivetool release uses this as its backend. However, the API is still unstable (in fact it’s already slightly ahead in the standalone repo) and a bit lower level than what you might want: it doesn’t have a concept of an ODrive but rather provides a way to do remote coroutine calls on objects that may or may not be an ODrive. You could use code generation to make some safely typed ODrive proxy classes, though.

@Samuel I have looked at Fibre. I will take a pass.

The library I am writing will not be the kitchen sink of ODrive libs, but it will be fast and easy to implement, and it will need to be updated every time the API endpoints change.

The Fibre library is the kitchen sink, but it carries a large development overhead and schedule risk.

I have probably spent 20 hours trying to understand Fibre and how it is used in ODrive.

I have about 3 hours in my lib. It now works, I just have to implement the endpoint processing functions I want to use. I currently have Serial Number, VBus, and IBus working.

I will probably release an initial version in the next week or three. I see it as a substantial upgrade for people using the ASCII interface.

Thanks for your work on the ODrive project.
-John