
Shrinking AV’s 1 Billion Test Miles


There is still no answer to how many miles an autonomous vehicle needs to drive before it’s proven safe. But some AV developers and test companies are hoping to ease the burden a bit with automation that makes millions of real and simulated miles of road testing simpler to implement, supported by standards that make it easier to create and trade simulation scenarios.

The goal is to reduce the time and cost of testing as much as possible, in the absence of any standard or set of criteria defining what would make an AV safe to drive on public roads and which tests would demonstrate that. But there is as little agreement on how to do that as there is on what “safe” means for a self-driving vehicle on public roads.

The best option, from a traditional systems-verification point of view, would be to follow the example of chip and component makers, which have been successful in adapting their verification tools and methods to the auto market, as well as in updating the ISO 26262 functional safety standard and developing new standards and protocols that are better able to spot problems in higher-level systems software as well as hardware.

Getting a good functional-safety-test result on everything between an AV’s rear bumper and its headlights would only guarantee the vehicle would be safe to take outside and turn on. It would not indicate anything about whether the vehicle would drive safely, said Kurt Shuler, vice president of marketing at Arteris IP.

“Testing those functional things is all deterministic,” Shuler said. “You can put a number on your result and be confident it means something. If you’re dealing with the system itself, from a practicing AI/machine-learning standpoint, it’s non-deterministic. Everything is on a distribution curve somewhere, and there’s a probability you may get two answers or that a certain percentage of the time it’s going to fail.”
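
Shuler’s distribution-curve point is easy to see in a toy Monte Carlo sketch. All numbers below are invented for illustration: repeat the same perception query and the answers spread out, so “pass” becomes a probability rather than a fact.

```python
import random

def noisy_range_estimate(true_distance_m: float) -> float:
    """Stand-in for an ML perception output: ground truth plus model/sensor noise."""
    return random.gauss(true_distance_m, 1.5)  # sigma chosen only for illustration

TRIALS = 10_000
ALERT_THRESHOLD_M = 30.0   # policy: flag any obstacle closer than 30 m
true_distance = 29.0       # the obstacle really is inside the alert zone

# A "miss" is a run where the noisy estimate lands past the threshold.
misses = sum(noisy_range_estimate(true_distance) >= ALERT_THRESHOLD_M
             for _ in range(TRIALS))
print(f"estimated miss rate: {misses / TRIALS:.1%}")  # nonzero, on a curve
```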

The key to a good safety test would be to know how long to test and how many driving situations are needed to prove a vehicle would drive safely. Unfortunately, we don’t have enough experience with AVs to know any of those things, or even to have a clear, consistent idea of what would constitute a “safe” automated vehicle, said Mike Demler, analyst for The Linley Group.

What’s missing
The gap in the science of testing an AV’s future driving behavior is large. Those tests still have to be connected to verification of lower-level systems that provide the incoming data that an AV controller must process, and the actuators and mechanical connections it must use to get the vehicle to respond, according to David Fritz, global technology manager at Mentor, a Siemens Business.

Mentor’s PAVE360 uses a digital-twin approach to AV simulation/emulation testing. It works hardware and software subsystems into the same closed-loop simulation to test responses that begin with the functionality of sensors. It also includes the fusion of sensor data, data traffic flows and the responses of actuators controlling the vehicle.
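
In principle, a closed-loop run of that kind looks like the minimal sketch below. This is not the PAVE360 API; the sensor, controller and vehicle-dynamics models are toy stand-ins, but the loop structure is the point: a noisy sensor reading feeds a controller, the controller’s actuator command changes the simulated vehicle state, and the new state changes the next sensor reading.

```python
import random

DT = 0.05  # simulation step, seconds

def radar(gap_m: float, closing_mps: float):
    """Toy sensor model: true gap and closing rate, plus measurement noise."""
    return gap_m + random.gauss(0, 0.3), closing_mps + random.gauss(0, 0.1)

def controller(gap_m: float, closing_mps: float, target_gap_m: float = 20.0) -> float:
    """Toy PD follower: accelerate or brake to hold the target gap."""
    cmd = 0.6 * (gap_m - target_gap_m) - 1.2 * closing_mps
    return max(-5.0, min(2.0, cmd))  # actuator limits, m/s^2

gap, ego_v, lead_v = 40.0, 25.0, 20.0  # metres, m/s, m/s
for step in range(600):                                  # 30 simulated seconds
    meas_gap, meas_closing = radar(gap, ego_v - lead_v)  # sense
    accel = controller(meas_gap, meas_closing)           # decide
    ego_v = max(0.0, ego_v + accel * DT)                 # actuate
    gap += (lead_v - ego_v) * DT                         # world state closes the loop
    if gap <= 0.0:
        print(f"collision at t={step * DT:.1f} s")       # a failing scenario surfaces
        break
else:
    print(f"settled gap: {gap:.1f} m")
```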

Most of the simulation-testing systems being used on AVs come from the chip and electronics industries, which are very effective at predicting the behavior of electronics but tend to leave physical systems out of the mix. The highly advanced simulators used by automakers, on the other hand, focus almost exclusively on design and manufacturing, not troubleshooting or safety verification.

“The OEMs concluded very quickly that virtualizing a vehicle is a lot more than just having an abstract model that can’t handle things that come into play when you’re making intelligent decisions at high speed,” Fritz said. “They don’t handle the fact that the tires on the left side of the car have a different friction coefficient than the right side of the car because it’s driving through a puddle and the car is turning. Vehicle dynamics are really hard, but they’re as important as the thinking and decision-making to the final behavior of the vehicle itself.”

Other EDA and simulation-test providers also have pushed into the AV test market and are moving aggressively to add new partners and capabilities.

Synopsys, which offers functional-safety-focused IP and pre-silicon testing for ADAS and automotive sensor SoCs, also partners with Infineon for design and verification of AV chips. In addition, the company is expanding its repertoire with the just-completed acquisition of QTronic, a German automotive simulation and test-tool maker.

ANSYS, which participates in the working groups developing the AV scenario-testing standards OpenDRIVE and OpenSCENARIO at the European Association for Standardization of Automation and Measuring Systems (ASAM), uses partnerships to expand its native capabilities, as well. In September, for example, the company announced it would add the Hologram scenario-based AV stress-testing application from Edge Case Research, and would integrate visual design and compliance capabilities from Autodesk into a new version of its R3 closed-loop scenario-simulation test platform.

Adding new simulation platforms, or more capabilities to existing platforms, doesn’t solve the critical problems of scenario-based AV simulation testing, however, according to Chad Partridge, CEO of simulation-testing provider Metamoto.

Every vehicle and every OEM’s test requirements and procedures are different, and nearly all the data- and scenario-creation-language specifications are either too old, too narrowly focused or too proprietary to be of much use to testers who need an unprecedented volume and variety of scenario content, Partridge said.

“Our platform is good for perception stacks, for example, so when the LiDAR is on, the response is realistic because the material properties of the asphalt produce the reflection you expect, but you get beam spreading if there is an obstacle that’s glass,” Partridge said.

“There are some open standards, OpenDRIVE and OpenSCENARIO, but they’re not at a point where a lot of content is available,” Partridge said. “The models are pretty labor-intensive, and what content is available is in a whole series of legacy and proprietary formats that are very difficult to adapt.”

Creating a single set of test scenarios or methods that constitute a minimum level of acceptable safe function would save time and eliminate a lot of redundant effort, according to Jamie Smith, director of global automotive strategy at National Instruments, and Nicholas Keel, principal offering manager at NI’s ADAS and Autonomous Vehicle Test division.

“[But] each stakeholder has different processes for test development and each vehicle has a different suite of sensors, which means each scenario will need to be consumed differently for each vehicle,” Smith and Keel wrote. “Every stakeholder in the AV development process is trying to move as fast as they can, but not necessarily in the same direction.”

How much testing?
Limiting tests by considering only scenarios within the geographic or physical limitations of a car being tested is one approach, Fritz said. “If you test to what we call a digital plan, which is correlated to the physical vehicle, your constraints limit testing to the most realistic possibilities in that vehicle and place. That cuts down on the amount of unused testing and allows you to test the bizarre corner cases you would never be able to in a physical vehicle.”
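
That idea reduces to constrained scenario generation. A hedged sketch, with an invented parameter envelope standing in for a real vehicle-and-region profile: enumerate only the combinations the physical vehicle and its geography could actually produce, discard the impossible ones, and sweep the corner cases that remain.

```python
import itertools
import random

# Hypothetical envelope, correlated to one vehicle in one region.
ENVELOPE = {
    "speed_kph":  tuple(range(0, 131, 10)),  # regional legal/physical speed range
    "rain_mm_hr": (0, 2, 10, 25, 50),        # local climate tops out near 50
    "mu_left":    (0.9, 0.5, 0.2),           # tire/road friction, left wheels
    "mu_right":   (0.9, 0.5, 0.2),           # right wheels (split-mu puddle cases)
}

def physically_possible(s: dict) -> bool:
    """Drop combinations the real world can't produce, e.g. dry-but-slick roads."""
    return not (s["rain_mm_hr"] == 0 and (s["mu_left"] < 0.9 or s["mu_right"] < 0.9))

scenarios = [dict(zip(ENVELOPE, combo))
             for combo in itertools.product(*ENVELOPE.values())]
scenarios = [s for s in scenarios if physically_possible(s)]

print(f"{len(scenarios)} scenarios inside this vehicle's envelope")
print("sample corner case:", random.choice(scenarios))
```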

There may be no answer to the question of how much to test, but there is plenty of interest in languages and formats that make it easier to share results, scenarios and road layouts that could be used on different simulation platforms by different AV testing organizations, said Klaus Estenfeld, managing director of ASAM.

The OpenSCENARIO effort alone has seven active working groups with between 15 and 50 experts pulled from ASAM member companies in the auto and electronics businesses, Estenfeld said. “One of the hottest topics of discussion in the whole community is the possibility of creating a huge database of scenarios of different road situations that could be used by anyone for testing, because there is not enough of this available now.”

He noted there has been resistance from companies with legacy products that don’t want to risk losing their traditional markets to newcomers. But ASAM members have committed enough energy and resources to revamping the existing versions of its specifications to make him optimistic about the impact of the version 2.0 specs ASAM is currently developing. The effort took a big step forward in September, when Israeli startup Foretellix, launched by a trio of chip-industry heavyweights in 2017, committed to releasing its verification-automation language to open source through ASAM.

“The umbrella purpose of all these workgroups is to define a global architecture, based on requirements set last year, for extensions to the standards we hope to have available in December,” Estenfeld said. “Foretellix is very much pushing their scenario-description language as a big part of the standard, but there is also a lot of heavy discussion about what is needed from a descriptive language, and questions about portability that will need to be resolved.”

“The problem with the way everyone is testing is that the number of miles is not a good way to tell whether the vehicle is safe, or even what all those miles mean,” said Ziv Binyamini, Foretellix’s CEO. “If you drive 10 million miles, but drive them all up and down the same highway, you aren’t proving you are able to handle a variety of situations. The key is coverage-driven testing, meaning you define what scenarios you need to use for the test and the variations, automate the test so you can get through a lot of the scenarios, monitor which scenarios you encountered but didn’t complete, and which you never encountered. Then at least you know what scenarios you have tested and which you still have to test.”
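
That description maps naturally onto a small bookkeeping harness. The sketch below is an illustration of the idea, not Foretellix’s tooling: declare the scenario space up front, log what each simulated drive actually exercised, and report what was completed, what was encountered but not completed, and what was never seen at all.

```python
from collections import Counter

# The plan: every scenario/variant pair that must be exercised (hypothetical).
SCENARIO_SPACE = {
    "cut_in":     ("dry", "rain", "night"),
    "pedestrian": ("crosswalk", "jaywalk"),
    "merge":      ("highway", "ramp"),
}
required = {(s, v) for s, variants in SCENARIO_SPACE.items() for v in variants}

completed = set()
encountered = Counter()

def log_run(scenario: str, variant: str, finished: bool) -> None:
    """Called by the test harness after each simulated drive."""
    encountered[(scenario, variant)] += 1
    if finished:
        completed.add((scenario, variant))

# A few hypothetical runs: lots of miles, narrow coverage.
for _ in range(1000):
    log_run("merge", "highway", finished=True)   # same highway, over and over
log_run("cut_in", "rain", finished=False)        # encountered, never completed

print("covered:   ", sorted(completed))
print("incomplete:", sorted(set(encountered) - completed))
print("never seen:", sorted(required - set(encountered)))
```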

This is at least a good starting point. “What they’re doing mainly is testing the test plan,” Demler said. “‘Here is my test plan, did I run through all the things in my checklist?’ Which does have some value, but it’s the kind of thing you could do by other means as well.”

Guidelines
Defining how long the test should be and what should be included “would be easy to answer if you had a formal model for safety that protects against mistakes of the artificial intelligence functions, which make bad recommendations from time to time, just as Netflix sometimes recommends movies you have no interest in watching,” said Jack Weast, senior principal engineer at Intel. “Without that, you can do a robust safety assessment that takes into account the things you clearly should address and the things you probably should consider, and takes into account your geography and your mandatory and recommended safety requirements to create a set of expectations. That could produce a more convincing idea of what you should consider than saying, ‘I’ve driven a million miles at midnight on a one-lane road with no stop sign, streetlights or pedestrians, so I think this [vehicle] will be safe on the roads.’”

Intel published its own set of guidelines this summer in a white paper/functional-safety guideline called “Safety First for Automated Driving,” which it developed with input from 10 other companies, including BMW, Volkswagen, Audi and Continental. The guideline is based on a research paper Mobileye published in 2017 touting a common-sense, humane set of safety decision-making criteria called Responsibility-Sensitive Safety (RSS), which boil down to not expecting perfection from human or machine drivers and asking that neither drive aggressively.
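
The best-known RSS rule can be written out directly: the minimum safe longitudinal gap assumes the rear car reacts only after a response time ρ (accelerating at its worst in the meantime), then brakes at least at a minimum rate, while the car ahead may brake at its hardest. The sketch below uses the formula from the 2017 paper, but the parameter values are placeholders, not Mobileye’s calibration.

```python
def rss_min_gap(v_rear: float, v_front: float, rho: float = 0.5,
                a_max_accel: float = 2.0, a_min_brake: float = 4.0,
                a_max_brake: float = 8.0) -> float:
    """Minimum safe following distance in metres (speeds in m/s)."""
    v_after_response = v_rear + rho * a_max_accel
    gap = (v_rear * rho
           + 0.5 * a_max_accel * rho ** 2
           + v_after_response ** 2 / (2 * a_min_brake)  # rear car's stopping distance
           - v_front ** 2 / (2 * a_max_brake))          # front car's stopping distance
    return max(0.0, gap)

# Ego at 30 m/s closing on a lead car at 25 m/s: roughly a 96 m minimum gap.
print(f"{rss_min_gap(30.0, 25.0):.1f} m")
```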

Intel rival Nvidia has a similar-sounding plan called the Safety Force Field (SFF), which relies on a computational framework: a set of driving policies built on an inference model from a machine-learning application trained by Nvidia Drive servers using vehicle sensor data. The SFF policies provide a decision-making framework for the Nvidia vehicle that favors driving decisions that “won’t create, escalate or contribute to an unsafe situation, and include the measures necessary to mitigate harmful scenarios.”
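
The flavor of such a policy filter can be sketched as follows. This is an illustration of the general idea, not Nvidia’s framework, and every function and number in it is invented: enumerate candidate actions, discard any whose worst-case outcome would leave the ego vehicle unable to stop behind a hard-braking lead car, and act only from what survives.

```python
def worst_case_gap_ok(gap_m: float, v_rear: float, v_front: float,
                      rho: float = 0.5, brake_rear: float = 4.0,
                      brake_front: float = 8.0) -> bool:
    """True if the rear car can still stop even if the front car brakes hardest."""
    d_rear = v_rear * rho + v_rear ** 2 / (2 * brake_rear)  # react, then brake
    d_front = v_front ** 2 / (2 * brake_front)              # front's hardest stop
    return gap_m + d_front - d_rear > 0.0

def admissible(candidates_mps2, gap_m, v_ego, v_lead, horizon_s=1.0):
    """Filter out any action that would create or escalate an unsafe state."""
    safe = []
    for a in candidates_mps2:
        v_next = max(0.0, v_ego + a * horizon_s)         # ego after the action
        gap_next = gap_m + (v_lead - v_ego) * horizon_s  # lead holds speed
        if worst_case_gap_ok(gap_next, v_next, v_lead):
            safe.append(a)
    return safe

# Closing too fast on the lead car: only hard braking survives the filter.
options = admissible([-6.0, -3.0, 0.0, 1.5], gap_m=60.0, v_ego=30.0, v_lead=25.0)
print("admissible accelerations:", options or "none -> emergency stop")
```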

Both approaches show some promise of moving toward a point where the criteria for judging AV safety could be defined and tested in fewer than tens of millions, or billions, of simulated test miles, as does the full-system-verification approach of Mentor’s PAVE360, Demler said. But none is likely to produce a definitive test any time soon.

“That’s just another part of the reason people are making a mistake if they don’t realize a lot of this is still an R&D project,” Demler said. “Think about a shopping center parking lot right before Christmas. You can try, but it would be hard to even test that accurately.”


Fig. 1: Siemens’ mixed-fidelity, multi-ECU digital twin. Source: Siemens

Related Story

How Many Test Miles Make A Vehicle Safe?


