Air Force photo

A sensor analyst at work at the Joint Space Operations Center, Vandenberg Air Force Base, California

WASHINGTON: Artificial intelligence has an insatiable appetite for data – but if you feed it the wrong kind of data, it’s going to choke. To get clean-enough data in large enough quantities for machine-learning algorithms to actually learn something from it, officials say that the intelligence community needs to change how drones, satellites, and other sensors perform their missions every day.

The turning point will be “when we start seeing collection requirements … for the production of training-quality datasets, versus the support of a tactical operation,” said David Spirk, who became the Defense Department’s Chief Data Officer in June and is now finalizing the DoD’s new data strategy. “I don’t know that we’ve entirely made that turn yet, but I think we’re talking about it.”

For example, the US has collected vast amounts of data on the Central Command theater, said Spirk, who served in Afghanistan himself as a Marine Corps intel specialist. But, he told the AFCEA AI+ML conference yesterday, that data collection was driven by urgent tactical needs, without a systematic approach to archiving it, curating it, and making it accessible for machine learning.

Air Force photo

Predator drone over Afghanistan

That’s understandable. Artificial intelligence in the modern sense was in its infancy on Sept. 11th, 2001, and the Pentagon did not systematically embrace AI until 2014, long after the peak of fighting in Afghanistan and Iraq. But it means that the military has vast archives of “legacy data” – from drone video to maintenance records – that are poorly catalogued, inconsistently formatted, or otherwise too messy for a machine-learning algorithm to use without a massive and costly clean-up.

“Is that juice worth the squeeze?” asked Capt. Michael Kanaan, an Air Force intelligence officer who heads the USAF-MIT Artificial Intelligence Accelerator. In many cases, you could spend a lot of time and money cleaning up out-of-date, low-quality data that no longer reflects how your agency does analysis today. You get a much better return on investment, he told the conference, by “digitizing your [current] workflows” and ensuring they produce “training-quality data” going forward.

It’s crucial to set up “a strong data management culture” to govern your data collection from the beginning, said Terrence Busch, the Defense Intelligence Agency’s technical director for the Machine-assisted Analytic Rapid-repository System (MARS).

It took DIA years of effort and “a lot of back-end money” to set up the processes, training, and technology required for data management, Busch told the conference. “It’s not exciting work, [and] a lot of folks didn’t want to invest in it,” he said – but now that the system is in place, the new data that DIA collects is much more accessible for AI.

At the same time, another DIA official warned the conference, you don’t want to clean up your data too much, because you might erase a seemingly irrelevant detail that turns out to be useful later on.

“We never want to throw it away, [because] we don’t know if it’s going to have value later,” said Brian Drake, DIA’s director of artificial intelligence. “The concept we are socializing inside of our agency is something we’ve done since World War Two, which is creating a ‘gold copy’ of that data”: a copy of the data as originally collected, with all its flaws, that’s archived and kept unchanged in perpetuity for the benefit of future analysts.

“We have to have an honest conversation with our vendors on that point,” Drake told the conference, “[because] we do find some data sets that come to us that have been pre-prepped and labeled,” especially when it comes to imagery. While that cleaned-up data is often great for the immediate task at hand in the contract, he said, DIA needs the raw material as well.

Getting everyone from contracting officers to analysts thinking about AI-quality data is a long-term effort, Busch said. “Down in the workforce level, culture adaptation is slow,” he said. “We’ve spent at least 10 years getting people acculturated to big data, getting used to automation.”

That cultural revolution now needs to spread beyond the intelligence community. “Every single soldier, airman, sailor, Coast Guardsmen is really a data officer in the future,” said Greg Garcia, the Army’s Chief Data Officer. “Every single individual, no matter what their specialty is, has to think about data.”