Decreasing latency

Thanks for the information @Samuel . After your suggestion (and having just received my ST-Link programmer :smiley: ) I reflashed my ODrive with CONFIG_USB_PROTOCOL=native-stream , but after that I am unable to access it from odrivetool:

ODrive control utility v0.4.6
Waiting for ODrive...
USB discover loop
ConfigurationValue 1
	InterfaceNumber 0,0
		EndpointAddress 130
	InterfaceNumber 1,0
		EndpointAddress 1
		EndpointAddress 129
	InterfaceNumber 2,0
		EndpointAddress 3
		EndpointAddress 131

Kernel Driver was not attached
EndpointAddress for writing 3
EndpointAddress for reading 131
Connecting to device on USB device bus 20 device 5
Please connect your ODrive.
You can also type help() or quit().

        no response - probably incompatible
        USB discover loop
        USB discover loop
        USB discover loop

Any ideas? I’m on macOS, by the way.

Ah yeah now odrivetool needs to treat the device as a serial port, otherwise it will still try to use the packet based protocol. Try to run odrivetool --path serial:[/dev/ttyWhatever], where /dev/ttyWhatever is the serial port device that corresponds to the ODrive. See odrivetool --help for more info:

                    To select a specific serial port:
                      --path serial:PATH
                    where PATH is the path of the serial port. For example "/dev/ttyUSB0".
                    You can use `ls /dev/tty*` to find the correct port.
                    
                    You can combine USB and serial specs by separating them with a comma (no space!)
                    Example:
                      --path usb,serial:/dev/ttyUSB0
                    means "discover any USB device or a serial device on /dev/ttyUSB0"

However take this with a grain of salt because I haven’t tried this in a while.

Otherwise you can also hack usbbulk_transport.py to treat USB as a stream-based transport, by adding lines analogous to here

Thanks @Samuel for the detailed reply, but unfortunately that didn’t make a difference. I still cannot connect to the ODrive using native-serial.

I’ve also tested the USB interface response time on a Raspberry Pi 3; it turns out to be around five times that of a MacBook Pro:

2018-11-28 20:40:21,208 Testing time_read_current_limit
2018-11-28 20:40:21,545 0.0016756459449999283
2018-11-28 20:40:21,545 Testing time_read_position
2018-11-28 20:40:21,881 0.0016733162449997963
2018-11-28 20:40:21,881 Testing time_read_state
2018-11-28 20:40:22,217 0.0016740571549999572
2018-11-28 20:40:22,218 Testing time_read_voltage
2018-11-28 20:40:22,554 0.0016733441499999912

This is quite disappointing: with 1.6 ms of latency per read, even a handful of reads per cycle puts me way outside my application’s limits (I’d like a ~500 Hz update rate, and I’m also reading a couple of other sensors and doing some computation).
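A quick back-of-the-envelope check (assuming the ~1.6 ms round trip measured above and, hypothetically, three reads per control cycle) shows why ~500 Hz is out of reach:

```python
# Loop-rate budget from the ~1.6 ms USB round trip measured above.
# READS_PER_LOOP is an assumed example value, not from the post.
ROUND_TRIP_S = 0.0016
READS_PER_LOOP = 3

usb_time_s = ROUND_TRIP_S * READS_PER_LOOP
max_rate_hz = 1.0 / usb_time_s
print(f"USB time per loop: {usb_time_s * 1000:.1f} ms -> max ~{max_rate_hz:.0f} Hz")
# -> "USB time per loop: 4.8 ms -> max ~208 Hz", before any other
#    sensor reads or computation, versus the ~500 Hz target.
```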

I ordered a CAN bus transceiver, so I’m going to try CAN bus next :slight_smile:

I’m also seeing high latency for read/write ops. It seems to average about 1.15 ms for a read and 1.21 ms for a write (see the Python cProfile lines below; there were 3033 reads and 1014 writes in a roughly 5-second run):

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3033    0.059    0.000    3.450    0.001 remote_object.py:71(get_value)
     1014    0.018    0.000    1.237    0.001 remote_object.py:75(set_value)

For every loop, we write 2 values (axis{0,1}.vel_setpoint) and read 6 values (axis{0,1}.{Iq_setpoint, pos_estimate, pll_vel}). That limits our maximum update rate to about 100 Hz, which makes it hard to have any kind of tight feedback loop in software.
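The arithmetic behind that ~100 Hz ceiling, using the averaged latencies from the cProfile output above:

```python
# 6 reads + 2 writes per loop at the averaged latencies quoted above.
READ_S, WRITE_S = 0.00115, 0.00121   # ~1.15 ms read, ~1.21 ms write

loop_s = 6 * READ_S + 2 * WRITE_S
print(f"{loop_s * 1000:.2f} ms per loop -> ~{1 / loop_s:.0f} Hz max")
# USB transfers alone take ~9.3 ms per loop, capping the rate near 100 Hz.
```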

A “set all” and “read all” command might be a better way to do the API.

This is on an NVIDIA TX2 board, with 6 64-bit ARM cores.


I upgraded the ODrive firmware from 0.4.2 to 0.4.6, and the read/write times got worse by about 1.5x: now 1.77 ms for a read and 1.93 ms for a write.

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1941    0.059    0.000    3.439    0.002 remote_object.py:71(get_value)
     650    0.018    0.000    1.256    0.002 remote_object.py:75(set_value)

So we recently discussed this issue on the ODrive Discord. Here is sort of a summary:

Full Speed USB has a frame period of 1ms. Interrupt transfers and Isochronous transfers are initiated once per frame, with guaranteed latency. However, we use bulk transfers, which use up the “unallocated” bandwidth when the bus is otherwise idle. So the transfers may be initiated at any time: but there are no guarantees.
We found (at least on the machines we tested) that the bulk transfer had the lowest latency, because of this “unscheduled” attribute. Specifically: “Data are transferred in the same manner as in Interrupt Transfers, but have no defined polling rate. Bulk Transfers take up all the bandwidth that is available after the other transfers have finished. If the bus is very busy, then a Bulk Transfer may be delayed.”

So if there is no other traffic on the bus, we get better than 1 ms latency; but if there is other heavy traffic on the bus, or if the host doesn’t bother being timely about the bulk transfers, then there will be more latency.
I am pretty sure that the latency on the ODrive side is extremely short, on the order of 10-100 microseconds.

On my Windows 10 machine it benchmarked a 1-out 1-response (2 packet) sustained full cycle time of 250 microseconds. So around 4000 full request-responses per second.

So if you get much worse latency than this, I would check:

  • Is there other traffic on the bus?
  • Does your platform not bother to be realtime about bulk transfers?

In the future we may upgrade to USB High-Speed (USB 2.0). This has 125 microsecond micro-frames, which lets us guarantee 125 microsecond latency Interrupt transfers.
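The improvement is just the frame-period ratio between the two bus speeds:

```python
# Full Speed frames vs High Speed micro-frames (USB 2.0 spec values).
FS_FRAME_S = 1e-3      # 1 ms frame at Full Speed
HS_UFRAME_S = 125e-6   # 125 us micro-frame at High Speed

ratio = FS_FRAME_S / HS_UFRAME_S
print(f"HS micro-frames are {ratio:.0f}x shorter, so the guaranteed "
      f"interrupt-transfer latency tightens by the same factor")
```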

I went ahead and benchmarked this using a timing script that was provided by @moorage. Results as follows:

That’s ~0.3 ms consistent performance, on a Windows 10 desktop with an i5.

@Wetmelon glad to see you got good results. We’re running Ubuntu 16.04 on an NVIDIA TX2, and the ODrive is the only thing hooked up to USB. I wonder if it’s an Ubuntu thing or an Ubuntu+arm64 thing…

Code provided here:

I would say trying to reduce the USB packet latency is a workaround at best.
The root problem here is that for a single variable write we do a complete round-trip on USB.

This can be fixed by implementing the following two things (in combination):

  • make remote_endpoint_operation asynchronous or “fire-and-forget”
  • transmit multiple commands in the same USB frame. As a temporary solution this can be achieved by connecting to the ODrive via the COM-port interface.
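A fire-and-forget write could be sketched roughly like this; note that `transport.send` and the whole `FireAndForgetWriter` interface are assumed names for illustration, not fibre’s actual API:

```python
import queue
import threading

class FireAndForgetWriter:
    """Sketch of a fire-and-forget property writer: writes are queued
    and pushed out by a worker thread, so the caller never blocks on
    the USB round trip. The transport interface is an assumption."""

    def __init__(self, transport):
        self._transport = transport
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def write(self, endpoint_id, payload: bytes):
        # Returns immediately; the USB transfer happens on the worker.
        self._queue.put((endpoint_id, payload))

    def _run(self):
        while True:
            endpoint_id, payload = self._queue.get()
            self._transport.send(endpoint_id, payload)
```

The caller’s write latency then becomes a queue push instead of a full USB round trip, which matches the roughly halved write time reported further down the thread.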

So I went ahead and implemented a simple futures mechanism using some of the classes included in fibre. There were two aims at this point:

  1. Make property writes fire-and-forget. In my case this almost halved the time spent in a property write (from 390 μs to ~180 μs), as would be expected.
  2. Make property reads optionally return a future instead of waiting for the actual value to be retrieved from the ODrive. To measure whether this had any effect on latency, I stacked a bunch of read commands, stored their returned futures in an array, and then read them one by one. The resulting latency increased slightly, from 390 μs to 420 μs (probably due to the overhead of the futures mechanism).

(partial) test code:

import sys
import inspect
from timeit import Timer

import odrive
import unittest
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')

odrv = None
REPEAT = 200
SAMPLES = 64


def time_read_current_limit(samples=SAMPLES):
    futures = []
    for _ in range(samples):
        futures.append(odrv.axis0.motor.config.current_lim_future) # just some property
    for f in futures:
        f.get_value()


class TestTiming(unittest.TestCase):

    def setUp(self):
        global odrv
        odrv = odrive.find_any()

    def runTest(self):
        fset = [f for name, f in inspect.getmembers(
            sys.modules[__name__]) if inspect.isfunction(f)]
        for f in fset:
            fn = f.__name__
            if fn.startswith("time"):
                logging.info("Testing "+fn)
                logging.info(Timer(f).timeit(number=REPEAT)
                             / float(REPEAT) / float(SAMPLES))

As seen above, the futures functionality is currently implemented as property aliases: to get a property to return a future, you append the string “_future” to its name, e.g. odrv0.vbus_voltage_future will give you a Future instance which you can query for the value using get_value(). I’m not sure if this is the ideal way to do this, but it’s what I ended up with.

If anyone wishes to take a look, the code is here: https://github.com/yconst/ODrive/tree/futures

What’s still missing is a mechanism to defer sending packets and batch them together so they can be sent in one go over USB. That should give a further measurable drop in latency.
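A minimal sketch of such a deferred-send queue (the 2-byte request format and the `usb_write` callback are assumptions for illustration; the real fibre protocol adds sequence numbers and CRCs that are omitted here):

```python
import struct

class BatchedSender:
    """Sketch: accumulate endpoint requests and flush them as one USB
    write, staying under the 64-byte Full Speed packet size. The
    request encoding and usb_write callback are assumed, not fibre's."""
    MAX_PACKET = 64

    def __init__(self, usb_write):
        self._usb_write = usb_write
        self._buf = bytearray()

    def queue_read(self, endpoint_id: int):
        # 2-byte endpoint id per request (illustrative format only).
        req = struct.pack('<H', endpoint_id)
        if len(self._buf) + len(req) > self.MAX_PACKET:
            self.flush()                      # packet full: send it
        self._buf += req

    def flush(self):
        if self._buf:
            self._usb_write(bytes(self._buf)) # one USB transfer for all
            self._buf.clear()
```

With this, N stacked reads cost one USB transaction instead of N, which is exactly where the futures mechanism above would pay off.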

The tests mentioned above were performed on a Macbook Pro with the odrive connected directly to the computer’s USB port. Tests with the Raspberry Pi pending.

I would love to hear your ideas.

I really like the concept! Now code that doesn’t need the response right away can be way more efficient. Also I see how it can be a very neat way to interface to an underlying batching mechanism that can serve to use the USB bandwidth better by using larger packets, when/if we were to add that.

I don’t think I have the bandwidth (pun intended) to have a look at what you have made at the moment, but I’ve added it to the roadmap to take a look at later.

Thanks for your work!

@madcowswe thanks for the reply. I’ve also tried creating new endpoints for dual-motor control etc., to have more than one command sent in one go. I eventually found that the most limiting factor seems to be packet size, which tops out at 64 bits of payload in my experiments, i.e. transferring anything more than two int32s seems to trigger two USB transactions, and this takes almost double the time.

I am not well versed enough in USB to understand all the details, but it definitely seems to me that it would make sense to increase the maximum packet size, if possible.

Looking forward to your thoughts when you have the time.

If I remember correctly the packets can be pretty big. The longest packet I recorded coming from the ODrive is when it sends configuration data:

5b7b226e616d65223a22222c226964223a302c2274797065223a226a736f = 60 bytes + 4 sequence number bytes.

Since an int32 is 4 bytes, you should be able to send up to 15 values in a packet (assuming you don’t send a CRC or the endpoint address).
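That capacity is easy to check with Python’s struct module, taking the 4 bytes of sequence-number overhead observed above as the only header:

```python
import struct

# Capacity of one 64-byte USB Full Speed packet, minus the 4 bytes of
# sequence-number overhead observed above (no CRC or endpoint address).
PACKET_SIZE = 64
SEQ_BYTES = 4

payload = PACKET_SIZE - SEQ_BYTES            # 60 bytes of payload
n_values = payload // struct.calcsize('<i')  # an int32 is 4 bytes
print(n_values)  # -> 15

# Packing 15 int32 setpoints into one payload:
packed = struct.pack('<15i', *range(15))
assert len(packed) == 60
```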

Skim through my post about working with the packets in Java: Odrive and Java

@Riewert interesting. Have you done any timing measurements sending/receiving packets?

About 200 μs – 400 μs per packet one-way, but I had quite a few incidental delays that bumped the average round trip up to 1.5 ms ± 0.5 ms, probably due to thread priority or related issues on my PC; the ODrive was the only connected USB device aside from my mouse.

Yes, the underlying USB Full Speed link uses 64-byte packets. The current implementation sends each transfer in its own packet; an obvious optimisation would be to bundle multiple requests/transfers.

Hey, I’ve been working on Doggo, and unfortunately the ODrive was communicating too slowly for us as well. We were unable to close the loop and do torque control: USB was too slow, since we could only communicate at ~250 Hz. Previously, on each loop we had to send a current command for each axis and request encoder positions and velocities for each axis (and we have 4 ODrives onboard).

After digging into the firmware a little, we added an endpoint function that takes both current commands packaged into a struct and returns all four encoder positions/velocities as a struct. This allowed us to use more of the large 64-byte packet. The other thing we did was thread all the USB requests. We were unsure how much this would help, but with the threading and the packaged commands we saw a huge speedup, with communication rates >1500 Hz. This resulted in a visible improvement in our closed-loop torque control from an external board.

If you would like to see our fork of ODrive, it is here. I still haven’t tested it with the Python communication library, since we have been using a C++ library to communicate with the ODrives.
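On the host side, the combined-struct exchange described above can be modeled with Python’s struct module; the field layout below is illustrative only, not Doggo’s actual wire format:

```python
import struct

# One combined exchange per ODrive: two current commands out, and
# two (position, velocity) pairs back. The layout is an assumption
# for illustration, not the actual Doggo/ODrive format.
CMD_FMT = '<2f'   # current setpoints: axis0, axis1
FB_FMT = '<4f'    # pos0, vel0, pos1, vel1

cmd = struct.pack(CMD_FMT, 1.5, -1.5)                 # 8 bytes out
reply = struct.pack(FB_FMT, 0.10, 2.0, -0.10, -2.0)   # simulated reply
pos0, vel0, pos1, vel1 = struct.unpack(FB_FMT, reply)
print(len(cmd), len(reply))  # -> 8 16
```

Both directions fit comfortably in a single 64-byte packet, so one round trip replaces the six-plus transactions needed when each value is read or written individually.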


That is really nice. I was looking into this exact same issue/feature! Will use your code, very useful, thanks for sharing.

If you get time, I’d love if you could add the USB handling code in a pull request back to the main repository.

Can you explain this a little more clearly? Also, looking at your code, did you succeed in reducing the number of messages per function call from 6 to 2? I did something similar, putting everything into a single function call, but it is still pretty slow, so I’m guessing a lot of the improvement comes from your C++ library? - Odrive and Java