Unresponsive USB after about 20 secs


#21

I was doing something else on another PC when the motor suddenly stopped spinning.

Apparently it’s a software issue on the PC sending the commands (according to the last error message). Overall it sent 520 000+ commands, 24/7, over more than 19 days. Note that the test did not close/reopen the USB communication for each command; it simply reused an already-open COM port. Also, nobody (and no video analysis) actually checked that the motor really spun for every single command.

All in all, this is stable enough for me :smiley:

there was an error in the receive_thread() function: 'ODriveBulkDevice' object has no attribute 'recieve'
(repeated 3 times in a row)
524557
there was an error in the receive_thread() function: 'ODriveBulkDevice' object has no attribute 'recieve'
(repeated 31 times in a row)
there was an error in the main loop…
there was an error in the receive_thread() function: 'ODriveBulkDevice' object has no attribute 'recieve'
root@MTBD00694:~/ODriveFirmware/Firmware#
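The repeated message is what Python prints for an `AttributeError` raised when code calls a misspelled method. The sketch below is a hypothetical stand-in, not the actual `odrive` library code: a mock `ODriveBulkDevice` whose background `receive_thread()` calls `self.recieve()`, catches the exception and logs it in the same format as the output above. Because the loop catches the error and carries on, the same line is printed on every iteration instead of the script stopping.

```python
import threading


class ODriveBulkDevice:
    """Hypothetical mock for illustration, NOT the real odrive library class."""

    def __init__(self, iterations=3):
        self.iterations = iterations
        self.log = []

    def receive(self):
        return b""

    def receive_thread(self):
        # Every pass hits the misspelled attribute, so every pass fails the same
        # way; the broad except keeps the thread alive and only logs the error.
        for _ in range(self.iterations):
            try:
                self.recieve()  # deliberate typo: the method is named receive()
            except Exception as e:
                self.log.append(
                    "there was an error in the receive_thread() function: " + str(e))


dev = ODriveBulkDevice()
worker = threading.Thread(target=dev.receive_thread)
worker.start()
worker.join()
print(dev.log[0])
```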


#22

I’m moving the continuation of @alexisdal’s usb reliability issues from Where are the trajectories? to here, to keep that thread about trajectories.


#23

sometimes the USB connection breaks (I’m using the built-in Python odrive lib) while the boards continue to operate (motors are still holding their positions)

It looks like you ran the 24/7 tests using the ASCII protocol (the one with the p 0 1000 0 0 commands), right? If nothing else changed in your test setup (same PC, same OS, same cables and hubs, same motors, same current and velocity limits, same actual max current and max velocity), this would allow us to deduce that your current USB issues are not triggered by any of the following: the STM USB code, electrical issues, or the PC-side kernel.

If any of the mentioned parameters changed, you might try to use a shorter USB cable to reduce the likelihood of electrical issues.
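For reference, the ASCII-protocol position command quoted above has, as far as I know, the form `p motor position velocity_ff current_ff` and is newline-terminated. A minimal sketch of building such a line; `format_position_command` is a hypothetical helper, and the commented-out pyserial lines assume the board shows up as a serial port whose name is only a placeholder.

```python
def format_position_command(motor, position, velocity_ff=0, current_ff=0):
    """Build one ASCII-protocol position command line (hypothetical helper)."""
    if motor not in (0, 1):
        raise ValueError("motor must be 0 or 1")
    # Fields: motor index, position setpoint (counts), velocity and current feed-forward.
    return "p {} {} {} {}\n".format(motor, position, velocity_ff, current_ff)


line = format_position_command(0, 1000)
# Sending it would look roughly like this with pyserial (port name is a placeholder):
#   import serial
#   with serial.Serial("/dev/ttyACM0", 115200, timeout=1) as port:
#       port.write(line.encode("ascii"))
```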

In any case, we might be able to rule out a bit more of the USB stack if you connect to the odrive with find_any(consider_usb=False, consider_serial=True, ...) and then configure the firmware accordingly (CONFIG_USB_PROTOCOL=native-stream).
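The host-side connection flags have to match how the firmware was built. A small sketch with hypothetical helper names that reads `CONFIG_USB_PROTOCOL` from tup.config text and returns the `consider_usb`/`consider_serial` flags implied by this thread: `native` goes with the USB bulk transport, `native-stream` with the serial transport. That pairing is an assumption drawn from the posts here, and any other protocol value is rejected rather than guessed.

```python
def read_tup_config(text):
    """Parse KEY=VALUE lines from tup.config text, skipping comments and blanks."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


def transport_flags(config):
    """Map CONFIG_USB_PROTOCOL to find_any()/find_all() flags (assumed pairing)."""
    protocol = config["CONFIG_USB_PROTOCOL"]
    if protocol == "native":
        return {"consider_usb": True, "consider_serial": False}
    if protocol == "native-stream":
        return {"consider_usb": False, "consider_serial": True}
    raise ValueError("unhandled CONFIG_USB_PROTOCOL: " + protocol)


example = """# make sure this fits your board
CONFIG_BOARD_VERSION=v3.2
CONFIG_USB_PROTOCOL=native-stream
CONFIG_UART_PROTOCOL=ascii
"""
flags = transport_flags(read_tup_config(example))
# e.g. odrive.core.find_any(**flags, ...) plus whatever else your library version needs
```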

sometimes USB continues to operate but motors no longer move

This sounds like a DRV fault. Can you check the error code of both motors? What’s your current_limit? Does it still occur if you disable M0?
Most recent tests indicate that the DRV fault triggers as a function of the M0 I_q setpoint and the M0 electrical phase.

Edit:
I finally took some time to run some USB burn-in tests and tracked the issue down to here:


Changing the priority to osPriorityAboveNormal fixes at least the issue I’m observing. Before, the mean time to failure was around 100’000 read operations; now I’m approaching 5M reads without any hiccup.
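A rough sanity check of how strong that evidence is, assuming failures happen independently at a constant rate (an assumption, not something measured here): with a mean time to failure of about 100’000 reads, the chance that the unchanged code would run about 5M reads without a single failure is exp(-50).

```python
import math

old_mttf_reads = 100_000       # approximate mean reads between failures before the change
reads_without_failure = 5_000_000

# Constant-rate (exponential) failure model: P(no failure in n reads) = exp(-n / MTTF).
p_if_unchanged = math.exp(-reads_without_failure / old_mttf_reads)
print(f"{p_if_unchanged:.2e}")
```

A probability on the order of 1e-22 means the clean run is very unlikely to be luck, though it cannot rule out rarer failure modes with a much longer MTTF.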


#24

According to my notes (Unresponsive USB after about 20 secs):

In Nov 2017 I was using

I can’t remember the max velocity and current limits; I deleted everything from that computer. Back then they were defined at compile time.

Now I use:

  • still ODrive 3.2 but with code revision 3593a6812859446ddc62431060350d2e7b60f3fc (by the way, tup.config is a real relief)
    CONFIG_BOARD_VERSION=v3.2
    CONFIG_USB_PROTOCOL=native
    CONFIG_UART_PROTOCOL=ascii
    CONFIG_STEP_DIR=n
  • all board params are set from my own script file (instead of at compile time) => I tried different values of vel_limit and current_lim and observed mixed stability
    my_odrive.config.brake_resistance = 0.47
    my_odrive.motor0.config.vel_limit = 20000.0 * 10 # 10x the default 20000
    my_odrive.motor1.config.vel_limit = 20000.0 * 10 # 10x the default 20000
    my_odrive.motor0.current_control.config.current_lim = 50.0 # defaults to 10
    my_odrive.motor1.current_control.config.current_lim = 50.0 # defaults to 10
  • I use two SK3 6374 149KV motors with CUI AMT102-V encoders
  • a significant mechanical load (yes, I’ll try detaching the belts soon)
  • same 1.5m USB cable connected to the same PC port + a 2m extension cable (yes, I’ll try without the extension cable too)
  • I used my_odrive = next(odrive.core.find_all()) to connect to my ODrive board and never touched CONFIG_USB_PROTOCOL (I just kept the default value since it was working)
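Setting parameters from a script, as in the list above, can be kept in one table of dotted paths and applied in a loop, which makes it easy to try different vel_limit/current_lim combinations. A minimal sketch; `apply_config` is a hypothetical helper, the property paths are the ones listed above (they may differ in other firmware versions), and a stand-in object replaces the real board so the sketch runs offline.

```python
from functools import reduce
from types import SimpleNamespace

CONFIG = {
    "config.brake_resistance": 0.47,
    "motor0.config.vel_limit": 20000.0 * 10,
    "motor1.config.vel_limit": 20000.0 * 10,
    "motor0.current_control.config.current_lim": 50.0,
    "motor1.current_control.config.current_lim": 50.0,
}


def apply_config(device, settings):
    """Set each dotted-path property on device via getattr/setattr (hypothetical helper)."""
    for path, value in settings.items():
        *parents, leaf = path.split(".")
        target = reduce(getattr, parents, device)  # walk down to the owning object
        setattr(target, leaf, value)


def ns(**kwargs):
    return SimpleNamespace(**kwargs)


# Offline stand-in with the same attribute layout as the real object.
fake = ns(config=ns(),
          motor0=ns(config=ns(), current_control=ns(config=ns())),
          motor1=ns(config=ns(), current_control=ns(config=ns())))
apply_config(fake, CONFIG)
```

With a real connection the call would be `apply_config(my_odrive, CONFIG)`.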

Soon I want to switch to a Raspberry Pi instead of a regular PC. But the last time I tried the Pi, I had to physically disconnect/reconnect the cable between each run of my script.
Otherwise, I could never reconnect to the ODrive.
I also noticed that demo.py made the motors run slower (I did not investigate further).

I will also check the motors’ error codes.

I think I should “only” need one Pi and one ODrive connected over USB for what I want to achieve.
But given the time I have available for this project, I might have to resort to storing the ODrive config in permanent memory once over USB, and then communicating with it only through the step/dir interface, using an Arduino-like board running Marlin or similar firmware to send step/dir commands along a calculated trajectory.
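For the “calculated trajectory” part, one common choice (not necessarily what this project will end up using) is a trapezoidal velocity profile: accelerate at a_max, cruise at v_max, decelerate at a_max, falling back to a triangular profile when the move is too short to reach v_max. A minimal sketch of the phase timing; units are whatever the caller uses consistently (for example counts, counts/s and counts/s²).

```python
import math


def trapezoid_times(distance, v_max, a_max):
    """Return (t_accel, t_cruise, t_total) for a symmetric trapezoidal move."""
    d = abs(distance)
    if d == 0:
        return 0.0, 0.0, 0.0
    t_acc = v_max / a_max
    d_acc = 0.5 * a_max * t_acc ** 2  # distance covered while accelerating to v_max
    if 2 * d_acc >= d:
        # Triangular profile: half the distance accelerating, half decelerating.
        t_acc = math.sqrt(d / a_max)
        return t_acc, 0.0, 2 * t_acc
    t_cruise = (d - 2 * d_acc) / v_max
    return t_acc, t_cruise, 2 * t_acc + t_cruise
```

A step/dir generator would then sample this profile to schedule its step pulses.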

@qjones I’m curious to read your post :slight_smile:


#25

I believe the USB reliability issues are resolved with the fix I mentioned. Check out the latest devel branch; it’s been applied there. Since applying the fix I have observed no more USB issues at runtime, only occasionally during initialization (I think specifically when powering up the board while it’s connected and the script is already waiting).

If you still have to physically unplug/replug on the Raspberry Pi with the latest devel, check out the “sam_python_fixes” branch which improves the resilience (and the interface) of the python lib. For instance on that branch the python lib resets the ODrive USB connection before talking to it. This may be necessary if the previous script instance didn’t release the device properly.
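The “reset the USB connection before talking to it” idea can be sketched with pyusb, whose `usb.core.find()` locates a device by vendor/product ID and whose `Device.reset()` performs a USB port reset. This is not the code from the “sam_python_fixes” branch; the finder is passed in as a parameter so the helper can be tested without hardware, and the vendor/product IDs in the usage comment are assumptions to verify with `lsusb`.

```python
def reset_before_connect(find, vendor_id, product_id):
    """Reset the first matching USB device; return True if one was found."""
    dev = find(idVendor=vendor_id, idProduct=product_id)
    if dev is None:
        return False
    # A port reset forces the device to re-enumerate, clearing USB-side state
    # that a previous script instance may have left behind.
    dev.reset()
    return True


# Usage with pyusb (IDs are assumptions; check them with `lsusb`):
#   import usb.core
#   reset_before_connect(usb.core.find, 0x1209, 0x0D32)
```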

If at all possible I would recommend avoiding step/dir, as it gravely limits your options and is not very robust against electrical noise. Its main purpose is interoperability with existing systems.


#26

Last Monday afternoon, I tried commit 6e60ec4631bcb72d96124c52eaa0aa9a9a53262c hoping the USB problems would be gone (I did reflash the board).

Using CONFIG_USB_PROTOCOL=native (and not native-stream! I forgot), I’m fairly confident that the USB problem is still there.

I used:
root@MTBD00694:~/ODrive# cat Firmware/tup.config
# Copy this file to tup.config and adapt it to your needs
# make sure this fits your board
CONFIG_BOARD_VERSION=v3.2
CONFIG_USB_PROTOCOL=native
CONFIG_UART_PROTOCOL=ascii
CONFIG_STEP_DIR=n
root@MTBD00694:~/ODrive#

and re-ran my previous experiment: sending back-and-forth commands to the ODrive under serious mechanical load to see what happens,

using the following ODrive configuration:

    ## configuring odrive
    self.my_odrive.config.brake_resistance = 0.47
    self.my_odrive.motor0.config.vel_limit = 20000.0 * 10 # 10x the default 20000
    self.my_odrive.motor1.config.vel_limit = 20000.0 * 10 # 10x the default 20000
    self.my_odrive.motor0.current_control.config.current_lim = 50.0  # defaults to 10 
    self.my_odrive.motor1.current_control.config.current_lim = 50.0  # defaults to 10 

(notice the foolish current_lim…)

Since I had already observed that the mechanical precision was satisfactory (neither axis drifted in a given direction over time), I used another power supply (24V/25A, set to 18V) and a perhaps foolish current_lim of 50 instead of the default 10, because I wanted serious torque while the motor was not spinning.

I started the experiment at 19:39; after about 10 minutes of smooth running, I left the lab to head home.
The mechanical load and the 2m USB extension cable were still in place (it was a pain to reduce the distance between the ODrive and the PC).

When I got back in the morning, I had forgotten that the experiment was running. I went to another location to focus on math and Python code for trajectory planning instead. When I was finally happy with the code and unit tests, I came back to the lab to try out the new trajectory planner.

First of all, the machine was no longer moving; I was disappointed, of course. The belts were strangely loose but still in position everywhere. I also noticed a strange smell, as if somebody had left a soldering iron turned on. On the PC, I saw that the script had stopped the previous evening at 20:23 with a USBError: [Errno 5] Input/Output Error (see details below).

Then I noticed that M0 was spinning freely (with some resistance) while M1 was resisting. I realised the ODrive was still powered. M1 was emitting a faint, funny vibration sound; nothing from M0. When I touched M1, I got burnt: it was super hot, you could cook an egg on it.

I immediately turned the power off.
It turns out the Delrin motor mount partly melted under the heat, and the 4 motor mounting screws sank into the Delrin, shifting the motors in the direction of the belt tension (which explains why the belts were so loose). The white Delrin was deformed and partly yellowed. I guess I’m lucky it didn’t catch fire during the night…

My explanation is that the USB exception on the PC side made the script stop. The ODrive then no longer received movement commands but kept operating, so it basically kept asking the motors to hold position, drawing current and increasing the heat… for about 24h :frowning:

Now the two motors are burnt/dead for sure. Their labels have turned yellowish, and there is a lot of internal friction/resistance, as if they were full of jam or caramel.

The funny thing is that this ODrive 3.2 still replies over USB and accepts commands.
I replaced my two dead motors and powered on the ODrive. It makes no beep, and the motor initialization sequence does not happen (nothing moves). Besides, even with no power at all, M0 physically resists a bit, while it does not if I electrically unplug it from my ODrive 3.2! (a behavior I had never observed before, which makes me think the board is likely dead)

The question is: did I just fry my second ODrive 3.2?

Traceback (most recent call last):
  File "/root/odrive_test/odrive/usbbulk_transport.py", line 77, in process_packet
    ret = self.epw.write(usbBuffer, 0)
  File "/usr/local/lib/python3.5/dist-packages/usb/core.py", line 387, in write
    return self.device.write(self, data, timeout)
  File "/usr/local/lib/python3.5/dist-packages/usb/core.py", line 948, in write
    self.__get_timeout(timeout)
  File "/usr/local/lib/python3.5/dist-packages/usb/backend/libusb1.py", line 824, in bulk_write
    timeout)
  File "/usr/local/lib/python3.5/dist-packages/usb/backend/libusb1.py", line 920, in __write
    _check(retval)
  File "/usr/local/lib/python3.5/dist-packages/usb/backend/libusb1.py", line 595, in _check
    raise USBError(_strerror(ret), ret, _libusb_errno[ret])
usb.core.USBError: [Errno 5] Input/Output Error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "corexy.py", line 444, in <module>
    c.send_cmds("G0 X0 Y-550") ; time.sleep(d) 
  File "corexy.py", line 128, in send_cmds
    res = self.__handle_cmd(cmd)
  File "corexy.py", line 157, in __handle_cmd
    return self.__parse_goto(cmd[2:])
  File "corexy.py", line 179, in __parse_goto
    return self.__goto(target_x, target_y)
  File "corexy.py", line 215, in __goto
    self.__move_motors(dA_steps, dB_steps)
  File "corexy.py", line 300, in __move_motors
    self.my_odrive.motor0.set_pos_setpoint(int(self.cur_a), 0.0, 0.0)
  File "/root/odrive_test/odrive/core.py", line 61, in call_remote_function
    arg_properties[i].fset(None, args[i])
  File "/root/odrive_test/odrive/core.py", line 51, in fset
    self._channel.remote_endpoint_operation(self._id, buffer, True, 0)
  File "/root/odrive_test/odrive/protocol.py", line 245, in remote_endpoint_operation
    self._output.process_packet(packet)
  File "/root/odrive_test/odrive/usbbulk_transport.py", line 84, in process_packet
    self.epw.clear_halt()
  File "/usr/local/lib/python3.5/dist-packages/usb/core.py", line 406, in clear_halt
    self.device.clear_halt(self.bEndpointAddress)
  File "/usr/local/lib/python3.5/dist-packages/usb/core.py", line 909, in clear_halt
    self._ctx.backend.clear_halt(self._ctx.handle, ep)
  File "/usr/local/lib/python3.5/dist-packages/usb/backend/libusb1.py", line 889, in clear_halt
    _check(self.lib.libusb_clear_halt(dev_handle.handle, ep))
  File "/usr/local/lib/python3.5/dist-packages/usb/backend/libusb1.py", line 595, in _check
    raise USBError(_strerror(ret), ret, _libusb_errno[ret])
usb.core.USBError: [Errno None] Other error

#27

Hm that’s unfortunate. Though I don’t know how the loss of communication would lead to a motor dissipating current while not generating any torque. Do you think that possibly your encoder slipped at the same time?

Unfortunately USB has seen a bit of a stability hit as we’ve been working on adding a ton of new features. Once the whole new codebase is ready (firmware v0.4.0), we will for sure do some serious endurance tests on the USB communication and make sure that it’s rock solid. Meanwhile, I’m thinking maybe it’s good to add an optional communications watchdog, where you can set a timeout and a safe action, such as disable motors.
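The proposed watchdog boils down to this: every valid command “feeds” a timer, and if no command arrives within the timeout, a safe action runs once. A minimal Python sketch of that logic, assuming a hypothetical `on_timeout` callback (for example one that disables the motors). On the real board this would have to live in the firmware, since a host-side watchdog cannot act once the USB link itself is gone.

```python
import time


class CommandWatchdog:
    """Run a safe action once if no command arrives within timeout_s (sketch)."""

    def __init__(self, timeout_s, on_timeout, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout
        self.clock = clock
        self.last_feed = clock()
        self.tripped = False

    def feed(self):
        """Call on every valid command; this also re-arms the watchdog after a trip."""
        self.last_feed = self.clock()
        self.tripped = False

    def check(self):
        """Call periodically from the control loop; returns True while tripped."""
        if not self.tripped and self.clock() - self.last_feed > self.timeout_s:
            self.tripped = True
            self.on_timeout()
        return self.tripped
```

Running the safe action only once per trip avoids hammering the motor-disable path on every loop iteration.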

You can double-check with an ohmmeter whether there is low impedance between any phase and GND or VBUS when the ODrive is turned off. If there is, then yes, it has a fried FET there. Since this seems to have been an overheating issue, you may be fine simply replacing the shorted FET(s).
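The check amounts to six readings per motor output with the board unpowered: each of the three phases against GND and against VBUS. A small sketch for recording the readings and flagging suspiciously low ones. The 1 Ω threshold is an arbitrary placeholder, not an ODrive specification, so compare against a known-good output on the same board. A low reading to GND most likely points at that phase’s low-side FET, a low reading to VBUS at its high-side FET.

```python
PHASES = ("A", "B", "C")
RAILS = ("GND", "VBUS")


def suspect_shorts(readings_ohms, threshold_ohms=1.0):
    """Return sorted (phase, rail) pairs whose resistance is below the threshold.

    readings_ohms maps (phase, rail) -> measured ohms for one motor output,
    taken with the board powered off. The default threshold is a placeholder.
    """
    missing = [(p, r) for p in PHASES for r in RAILS if (p, r) not in readings_ohms]
    if missing:
        raise ValueError("missing measurements: {}".format(missing))
    return sorted(key for key, ohms in readings_ohms.items() if ohms < threshold_ohms)
```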


#28

wow… you rock.
I managed to isolate some faulty FETs and got someone to change them using working FETs from my first dead odrive 3.2 and…

we’re back in business :smiley: she’s working again :slight_smile:

I’ll continue the tests (without leaving it running unattended for now :wink:)


#29

And I just placed an order for 3 ODrive v3.5 boards (48V version), just in case.


#30

I switched to native-stream in tup.config and used
next(odrive.core.find_all(consider_usb=False, consider_serial=True))
to open the communication stream in the Python code.

This morning, while pushing the acceleration, I managed to get the motors to refuse to spin.
I noted some values:
bus_voltage => 17.999340057373047
vel_limit => 400000.0 400000.0
current_lim => 50.0 50.0
calibration_current => 10.0 10.0
pos_gain => 20.0 20.0
vel_gain => 0.0005000000237487257 0.0005000000237487257
vel_integrator_gain => 0.0010000000474974513 0.0010000000474974513
motor_error => ERROR_DC_BUS_UNDERVOLTAGE ERROR_DC_BUS_UNDERVOLTAGE
set_point => 36208.0 -33477.0

I don’t understand why the motors report ERROR_DC_BUS_UNDERVOLTAGE while the returned voltage is the expected 18V.

But USB communication kept working. Rebooting the board brought everything back to normal.

After that, I slightly reduced the acceleration to observe stability. Then I started another endurance test at 09:45; it’s 13:00 now. The code tries to send two motor positioning commands over USB every 0.005 s. Everything’s fine so far.
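For scale, assuming the loop actually holds its 5 ms period (the code only tries to), this test issues commands far faster than the earlier 19-day run:

```python
period_s = 0.005               # one loop iteration every 5 ms
commands_per_iteration = 2     # one position setpoint per motor
rate = commands_per_iteration / period_s            # commands per second

duration_s = ((13 * 60) - (9 * 60 + 45)) * 60       # 09:45 -> 13:00, in seconds
commands_so_far = rate * duration_s

# Earlier test: about 520 000 commands over (more than) 19 days.
earlier_rate = 520_000 / (19 * 24 * 3600)
ratio = commands_so_far / 520_000
```

So about 3 hours at this rate already amounts to roughly nine times the whole earlier 19-day run.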


#31

What is the current/power rating of your power supply? You likely drew too much power right when accelerating, making the voltage sag. You can’t see this after you stop, of course, because the bus voltage recovers to nominal when you are not drawing power.
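Since the sag only exists while current is being drawn, one way to confirm it is to sample the bus voltage during the acceleration itself and keep the minimum. A minimal sketch; `read_vbus` is a hypothetical callable you would point at the board’s bus-voltage property (its exact name depends on the firmware/library version), and the sample count and interval are arbitrary.

```python
import time


def min_bus_voltage(read_vbus, samples, interval_s=0.001, sleep=time.sleep):
    """Sample read_vbus() `samples` times and return (minimum, all readings)."""
    if samples < 1:
        raise ValueError("samples must be >= 1")
    readings = []
    for _ in range(samples):
        readings.append(read_vbus())
        sleep(interval_s)
    return min(readings), readings
```

Start sampling right before commanding the move. USB round-trip latency limits how fast you can poll, so very short dips may still be missed.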