Memory gets lost

Jason_Lewis · December 16, 2020, 5:46pm

What is the status of the ODrive flash memory corruption issue? Other people must have experienced this: every 10 power cycles or so, one of my cards loses its memory. Has this problem been identified or fixed in the latest FW? I am running the latest HW, version 0.4. This is a bad problem.

Wetmelon · December 16, 2020, 7:30pm

You’re on Hardware v3.6, running firmware version 0.4.12? Could you try flashing firmware 0.5.1?

I will bring up the problem internally.

Jason_Lewis · December 16, 2020, 7:44pm

Unfortunately, 0.5.1 is not compatible with our system. Some more information for you:

We have 5 cards in our system, and a config loss happens on about 1 in 10 power cycles, so roughly 2% of cycles per card.
In this instance, our card lost its configuration as well (it was reset to defaults). However, the ODrive can no longer control the motor, so we cannot even recalibrate to bring it back up. We can talk to the device over USB just fine.

Jason_Lewis · December 16, 2020, 7:45pm

The axis that is failing is stuck in a permanent error state; the motor will not even calibrate.

Jason_Lewis · December 16, 2020, 7:46pm

Oh, and we are using the 56V card.

Wetmelon · December 16, 2020, 7:56pm

Which error specifically?

Wetmelon · December 16, 2020, 7:56pm

Btw, our best guess is that this problem is caused by current injection somewhere. Check out the ground loops section in the docs and see if any of the mitigation strategies help.

Jason_Lewis · December 16, 2020, 7:58pm

Current injection is corrupting the flash? Does the FW write anything to the flash on power up/down? The card was idle during the power up/down.

Wetmelon · December 16, 2020, 10:07pm

Hmm, not sure, actually. It reads all the configuration data from the flash on power up, but I don’t know if it writes anything. Have you seen this GitHub issue? Perhaps you can add your experiences to it: https://github.com/madcowswe/ODrive/issues/527

Jason_Lewis · December 16, 2020, 10:50pm

OK, I am trying to flash this piece with this ST programmer on a Windows 10 machine. How do I make this work? The programmer cannot see my board.

Jason_Lewis · December 17, 2020, 4:56pm

So, some update on this.

Jason_Lewis · December 17, 2020, 5:03pm

Note that the problem only occurs on axis0 on a new card. I can switch the same motor to axis1 and the problem does not happen. It also does not depend on the motor or the wiring. It turns out that copying the flash config from an older card fixes the issue. What default setting were we missing that could cause unstable behavior during calibration like this? Two engineers checked the settings, and the configs looked identical to us. What config setting can cause this behavior?

Jason_Lewis · December 17, 2020, 5:07pm

So, to recap: this problem was caused by a power cycle resetting the config flash. The default configuration it fell back to contained a setting that made calibration of an axis motor intermittent. The hardware was verified good with multiple motors and harnesses. Axis 1 did not show the problem. It seems there is some marginal setting in the config.

Wetmelon · December 17, 2020, 5:22pm

Oh. This is a different problem than what I thought you were describing. We have had issues where the STM32 has to be entirely re-flashed with an STLink just to get it to work again (firmware corruption).

Yours seems to have an issue with the configuration, which is a bit different. We have actual control over the NVM there. Is it possible you’re losing power during saving the config? We use two NVM flash pages to try to avoid losing data if that happens, but it could certainly be iffy.
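To illustrate why two pages help, here is a minimal Python simulation of a two-slot ("ping-pong") config store. It is a sketch only and not ODrive's actual NVM code: the class name, the sequence counter, the CRC check, and the `power_fails_after` parameter are all assumptions made for illustration.

```python
import json
import zlib


class TwoSlotStore:
    """Toy model of a two-page config store (hypothetical, not ODrive's code)."""

    def __init__(self):
        # Each slot holds the raw bytes of one record, or None when erased.
        self.slots = [None, None]

    @staticmethod
    def _pack(seq, config):
        payload = json.dumps({"seq": seq, "config": config}, sort_keys=True).encode()
        return zlib.crc32(payload).to_bytes(4, "big") + payload

    @staticmethod
    def _unpack(raw):
        """Return (seq, config), or None if the slot is erased or corrupt."""
        if raw is None or len(raw) < 4:
            return None
        crc, payload = raw[:4], raw[4:]
        if zlib.crc32(payload).to_bytes(4, "big") != crc:
            return None
        record = json.loads(payload)
        return record["seq"], record["config"]

    def _valid(self):
        """List (seq, slot_index, config) for every intact slot, oldest first."""
        found = []
        for index, raw in enumerate(self.slots):
            record = self._unpack(raw)
            if record is not None:
                found.append((record[0], index, record[1]))
        return sorted(found, key=lambda item: item[0])

    def load(self):
        """Return the newest intact config, or None if both slots are bad."""
        valid = self._valid()
        return valid[-1][2] if valid else None

    def save(self, config, power_fails_after=None):
        valid = self._valid()
        next_seq = valid[-1][0] + 1 if valid else 1
        # Always overwrite the slot that does NOT hold the newest intact record.
        target = 1 - valid[-1][1] if valid else 0
        raw = self._pack(next_seq, config)
        if power_fails_after is not None:
            raw = raw[:power_fails_after]  # simulate a write cut short by power loss
        self.slots[target] = raw


store = TwoSlotStore()
store.save({"pole_pairs": 7})
store.save({"pole_pairs": 8}, power_fails_after=10)  # interrupted write
print(store.load())  # the older, intact record is still returned
```

In this model an interrupted save only damages the slot being written, so the previous config survives. If neither slot validates, `load()` returns None, which on a real device would presumably correspond to falling back to defaults.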

You haven’t really described the behavior yet. Are you trying to run, or trying to calibrate? CURRENT_UNSTABLE can come from several sources, including simple electrical noise or incorrect current controller PI gains / bandwidth settings.

If you can give us more information about your build and upload a JSON file of your config, we can look it over.

Try this:

1. Back up your good config with backup-config.
2. Next time it fails to start correctly, use odrv0.save_configuration(), then back up the “bad” configuration to another file.
3. Compare the two files (you can use VSCode to format JSON automatically so it’s more readable, and Beyond Compare or VSCode for comparing them).
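If a GUI diff tool is not handy, the comparison step can also be scripted. The following is a small sketch, assuming both backups are plain JSON files; it flattens whatever nesting the files use into dotted paths and reports every setting that differs. The function names and the example file names are hypothetical.

```python
import json


def flatten(node, prefix=""):
    """Flatten nested dicts/lists into {"axis0.motor.config.pole_pairs": value}."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = ((str(i), value) for i, value in enumerate(node))
    else:
        return {prefix: node}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def diff_configs(good, bad):
    """Return {path: (good_value, bad_value)} for every setting that differs."""
    flat_good, flat_bad = flatten(good), flatten(bad)
    missing = "<missing>"
    changes = {}
    for path in sorted(set(flat_good) | set(flat_bad)):
        a = flat_good.get(path, missing)
        b = flat_bad.get(path, missing)
        if a != b:
            changes[path] = (a, b)
    return changes


def diff_files(good_path, bad_path):
    """Load two JSON backups from disk and diff them."""
    with open(good_path) as f_good, open(bad_path) as f_bad:
        return diff_configs(json.load(f_good), json.load(f_bad))
```

For example, `diff_files("good_config.json", "bad_config.json")` returns a dict keyed by setting path. Values are compared exactly, so floats that differ only in the last digits will also be reported.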

Jason_Lewis · December 17, 2020, 5:29pm

I have seen cards need a re-flash as well, and that also seems related to power cycling. That being said, this case is fresh in my mind, so I can provide the best data on it. We were not configuring the card when power went down. The most important problem here is the loss of the config in the field.

Jason_Lewis · December 17, 2020, 5:30pm

We know how to get it to calibrate now, kind of, anyway.

Jason_Lewis · December 17, 2020, 7:35pm

OK, we have learned some more about this config memory behavior. It seems that FW 0.4.12 will calibrate, but cannot reboot into a calibrated state. Putting FW 0.4.11 on resolves that issue. We are not ready to use the latest FW due to API changes. Does this make sense to you?
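Since the behavior depends on the firmware revision, a startup check that every card runs the pinned version can catch a mixed fleet early. Below is a minimal sketch. The attribute names fw_version_major, fw_version_minor and fw_version_revision are assumed to match what the device object exposes, so verify them against your firmware. The `SimpleNamespace` object is only a stand-in for a connected card.

```python
from types import SimpleNamespace

PINNED_FW = (0, 4, 11)  # the version this system has been validated against


def firmware_version(dev):
    """Read the firmware version as a tuple (attribute names are assumptions)."""
    return (dev.fw_version_major, dev.fw_version_minor, dev.fw_version_revision)


def check_firmware(dev, expected=PINNED_FW):
    """Raise if the card does not run the expected firmware version."""
    actual = firmware_version(dev)
    if actual != expected:
        raise RuntimeError(
            f"firmware {'.'.join(map(str, actual))} found, "
            f"expected {'.'.join(map(str, expected))}"
        )
    return actual


# Stand-in for a connected card; a real system would pass the device object.
fake_card = SimpleNamespace(
    fw_version_major=0, fw_version_minor=4, fw_version_revision=11
)
check_firmware(fake_card)
```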

Wetmelon · December 19, 2020, 7:28am

There are 10 months’ worth of upgrades between 0.4.11 and 0.4.12… it could be anything.

Jason_Lewis · December 19, 2020, 4:15pm

This makes me concerned that the stability of anything after 0.4.11 cannot be guaranteed. In our system, we cannot calibrate all the motors at cold start, and even though our system is almost always powered, maintenance requires shutdown.

We downgraded to 0.4.11, and all problems related to the config memory either not being saved correctly or getting lost appear to be gone. We had several cards with 0.4.12 on them, and these motors have always given us trouble at startup. We are early in the product design cycle, so we always assumed it was on our side of the fence. It was only when the power cycle erased the memory that we were forced to spend a couple of days on the issue.

We have significant experience with embedded FW, and while I cannot promise any time, if you can point us to a list of changes, we might be able to dig into them some.

Jason

Wetmelon · December 20, 2020, 9:42pm

The changelog is available here: https://github.com/madcowswe/ODrive/blob/devel/CHANGELOG.md

Actually, it looks like I misread the git graph. The only difference between 0.4.11 and 0.4.12 is a trajectory planner fix and some documentation changes.