Decreasing latency

@madcowswe thanks for the reply. I’ve also tried creating new endpoints for dual-motor control etc., so that more than one command is sent in one go. I eventually found that the most limiting factor seems to be packet size, which seems to top out at 64 bits of payload in my experiments, i.e. transferring anything more than two int32s seems to trigger two USB transactions, and this takes almost double the time.

I am not well versed enough in USB to understand all the details, but it definitely seems to me that it would make sense to increase the maximum packet size, if possible.

Looking forward to your thoughts when you have the time.

If I remember correctly the packets can be pretty big. The longest packet I recorded coming from the ODrive is when it sends configuration data:

5b7b226e616d65223a22222c226964223a302c2274797065223a226a736f = 60 bytes + 4 sequence number bytes.

Since an int32 is 4 bytes, you should be able to send up to 15 values in a packet (assuming you don’t send a CRC or the endpoint address).
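To make the arithmetic concrete, here is a small sketch in Python’s `struct` notation. The layout (a 4-byte sequence number followed by up to 60 bytes of int32 payload) is taken from the description above; the function name and error handling are illustrative, not part of the actual ODrive protocol code.

```python
import struct

# Illustrative packing: 4 sequence-number bytes + up to 60 bytes of payload,
# matching the 64-byte USB Full Speed packet size discussed above.
def pack_packet(seq: int, values: list) -> bytes:
    if len(values) > 15:
        raise ValueError("at most 15 int32 values fit in 60 bytes of payload")
    return struct.pack("<I", seq) + struct.pack("<%di" % len(values), *values)

pkt = pack_packet(1, list(range(15)))
assert len(pkt) == 64  # exactly fills one full-speed USB packet
```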

Skim through my post about working with the packets in Java - Odrive and Java

@Riewert interesting. Have you done any timing measurements sending/receiving packets?

About 200 µs - 400 µs per packet one-way, but I had quite a few incidental delays that bumped the average for a round trip up to 1.5 ms ± 0.5 ms, probably due to thread priority or related issues on my PC; the ODrive was the only connected USB device aside from my mouse.

Yes, the underlying USB Full Speed link uses 64-byte packets. The current implementation sends each transfer in its own packet; an obvious optimisation would be to bundle multiple requests/transfers into one packet.
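The bundling idea can be sketched as follows. This is a hypothetical illustration only: the frame layout (a 2-byte endpoint id plus an int32 value) is made up for the example and is not the real ODrive wire format.

```python
import struct

MAX_PACKET = 64  # USB Full Speed max packet size

# Hypothetical transfer coalescing: concatenate several small request
# frames into one 64-byte packet instead of one packet per request.
def bundle(requests):
    buf = b""
    for endpoint_id, value in requests:
        frame = struct.pack("<Hi", endpoint_id, value)  # 2 + 4 bytes
        if len(buf) + len(frame) > MAX_PACKET:
            raise ValueError("requests exceed one 64-byte packet")
        buf += frame
    return buf

pkt = bundle([(1, 100), (2, -100), (3, 0)])
```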

Hey, I’ve been working on Doggo, but unfortunately the ODrive was communicating too slowly for us as well: we were unable to close the loop and do torque control. USB was too slow for us since we would only be able to communicate at ~250 Hz. Previously, on each loop we had to send a current command for each axis and request encoder positions and velocities for each axis (and we have 4 ODrives onboard).

After digging into the firmware a little bit, we added an endpoint function that takes both current commands packaged into a struct. This function returns all four encoder positions/velocities as a struct, which allowed us to use more of the large 64-byte packet. The other thing we did is thread all the USB requests. We were unsure how much this would help, but with the threading and packaging of the commands we saw a huge speedup, with communication rates >1500 Hz. This resulted in a visible improvement in our closed-loop torque control from an external board.

If you would like to see our fork of ODrive, it is here. I still haven’t tested it with the Python communication library yet, since we have been using a C++ library to communicate with the ODrives.
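The combined endpoint described above could look something like this in Python’s `struct` notation. This is only a sketch of the idea: the field types (float32) and ordering are assumptions for illustration, not the actual wire format used by the Doggo fork.

```python
import struct

# Assumed request: both axes' current commands in one struct (2 x float32).
def pack_current_commands(iq_axis0, iq_axis1):
    return struct.pack("<ff", iq_axis0, iq_axis1)

# Assumed response: pos0, vel0, pos1, vel1 as one struct (4 x float32),
# so one request/response pair serves both axes of an ODrive.
def unpack_encoder_states(payload):
    pos0, vel0, pos1, vel1 = struct.unpack("<4f", payload)
    return [(pos0, vel0), (pos1, vel1)]
```

Packing both commands and both encoder states into single structs is what lets one 64-byte packet replace several round trips.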


That is really nice. I was looking into this exact same issue/feature! Will use your code, very useful, thanks for sharing.

If you get time, I’d love it if you could add the USB handling code in a pull request back to the main repository.

Can you explain this a little more clearly? Also, looking at your code, did you succeed in reducing the number of messages per function call from 6 total to 2? I did something similar, putting everything into a single function call, but it is still pretty slow, so I’m guessing a lot of the speedup is from your C++ library? - Odrive and Java

This looks great, and is a good inspiration for others who need to do the same.

One day we will hopefully have a transfer-coalescing feature and a subscription feature, which should achieve the same thing in a more dynamic way. Also, of course, a concurrent communication feature to allow talking to multiple ODrives in parallel to reduce overall latency.
One day we’ll get there :wink:

Yes, we needed 6 messages before, but 4 of them required responses from the ODrive, which takes a long time. With the way the function call works now, there are actually 3 required messages, but 2 of them can be fire-and-forget, which definitely helps. This means the function call only requires one response from the ODrive. You are correct, though, that the bulk of the speedup comes from the threading of the requests. I was planning on submitting a pull request and getting the new endpoints mainlined into the ODrive repo.

The C++ library was developed by another member of our team, but it’s still kinda messy and a work in progress. It uses libusb, and it works well enough for our purposes so far but can definitely be improved. All the threading was on the computer side with this C++ library, and the testing repo/C++ library can be found here. I would like to implement a thread pool so threads are not created and destroyed every loop, but that’s something I’ll do in the near future.
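The thread-pool idea can be sketched with Python’s standard library (the actual implementation described above is C++/libusb; `query_odrive` here is a hypothetical placeholder for a blocking USB round trip):

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder for a blocking request/response round trip to one ODrive.
def query_odrive(dev_index):
    return dev_index, "state-%d" % dev_index

# One worker per ODrive; the pool's threads are reused every control loop
# instead of being created and destroyed each iteration.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(query_odrive, range(4)))
```

Because the four round trips run concurrently, the loop waits roughly one round-trip time instead of four.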

I don’t mean to revive an old thread if there is a newer one out there (I couldn’t find one), but has anyone done any latency testing for the other protocols: CAN, I2C, SPI, and UART?

On a similar note, is there any documentation on max bus speeds, or should I just follow the STM32F4 reference manual?
