Pentagon tested generative AI to draft supply plans in latest GIDE 9 wargame

Members of the 56th Air and Space Communications Squadron operate cyber systems using an enhanced communications flyaway kit during the Global Information Dominance Experiment 3 and Architect Demonstration Evaluation 5 at Alpena Combat Readiness Training Center, Alpena, Michigan, July 12, 2021. (U.S. Air Force photo by Tech. Sgt. Amy Picard)

WASHINGTON — Supply officers at the military’s operational combatant commands tested ChatGPT-like software to help them write logistics plans, as part of the latest Global Information Dominance Experiment, GIDE 9.

Their verdict: Generative AI showed huge potential to help them sort through masses of mind-numbing details and outline options to offer their human commander — a crucial aspect of the nascent revolution in command-and-control known as CJADC2.

“What we’ve found is that, for certain use cases, it’s proving to have a lot of value,” said Air Force Col. Matthew “Nomad” Strohmeyer, the GIDE director for the Pentagon’s central Chief Data & AI Officer (CDAO), on a CSIS webcast Tuesday. “It’s not this panacea, but it’s also not this Pandora’s Box of evil. It’s somewhere in between.”

Given generative AI’s tendency to output convincing but false answers, however, the technology needs to be tamed, harnessed, and fed a strict diet of trustworthy data, then forced to footnote its answers so users can double-check them against the original source. The Department of Defense also needs to ensure the military secrets it feeds into the AI don’t get sucked up and used as training data by a parent company outside DoD’s security perimeter, or even embedded in their next publicly available product.

In other words, the military can’t write war plans with the same publicly available chatbots that high schoolers use to cheat on their homework. You may be able to use Reddit posts and other publicly available websites to teach your Large Language Model how to speak English, but once it knows those basics, CDAO officials and other experts say, DoD generally wants to keep both the data it ingests and the answers it outputs inside a secure perimeter.

“The common-use tools, those are not the ones we are encouraging,” said Navy Capt. M. Xavier Lugo, head of CDAO’s Task Force Lima. “The real utility comes into the tools that we can actually isolate.”

‘A Human Is Always Accountable’

“I divide the world into three: the wild, the zoo, and the cages,” Lugo continued. The wild is the vast landscape of the internet, the open market, and private-sector innovation, which the Pentagon must learn to access without exposing its own sensitive information. The zoo is the military-controlled slice of cyberspace, collectively known as the Department of Defense Information Network. The cages are the various security compartments inside the DoDIN, some of them very highly classified indeed.

“Any tool that can actually be containerized and can actually be utilized within either the zoo or a cage, those are the ones that we’re contemplating,” Lugo said.

Lugo’s Task Force Lima, created last August, is still collecting potential “use cases” for generative AI — almost 230 at this point, he said, submitted from across the Defense Department — and thrashing out detailed guidelines and guardrails. But within its first 30 days, it pronounced its golden rule:

“A human is always accountable,” Lugo said. “Don’t point to Google or to OpenAI and say, ‘They wrote it.’”

That’s a principle Strohmeyer and his team are taking to heart. “We’re being very measured about it. We’re taking our cues from Task Force Lima,” he said.

“There’s a lot of promise there,” he argued, but AI output has to be treated as a rough draft for humans to check, edit, and decide on. It’s “initial content creation, not the final answer.”

So how did generative AI help in the latest quarterly GIDE, which was interwoven with the massive Army-led Project Convergence exercise? It turns out the chatbots are just the user-friendly face of a complex “data mesh” connecting commands around the world.

Guidance From GIDE

“We just finished up GIDE 9 last week,” Strohmeyer said. “During it, we did a single-blind test, with multiple combatant commands, of an AI capability [for] logistics” — that is, one team of staff officers used the new AI, another stuck to the traditional process, and the people assessing their output didn’t know which was which.

Figuring out how to redeploy supplies and support units from one location to another is a routine yet complex task, Strohmeyer explained. It’s the kind of boring but essential work that takes “hundreds and hundreds of hours” as large staffs hunt down the latest data from units around the world:

What do the frontline forces need, from fuel to food to specialized bridging equipment? Who has it now? Where is it? How can it get to them? What’s the fastest/cheapest/safest combination of planes, trains, ships, trucks, military supply units and private contractors, to make that delivery? What airports, seaports, roads, and bridges can handle the load, and are they still intact as of this morning?

Step one is making all the relevant information available, instead of locked up in incompatible computer systems that can’t share data, which forces staff officers to manually reenter long columns of data — a slow and error-prone process — or even scrawl vital information on sticky notes. Connecting hundreds of different organizations’ data “stovepipes” into a single “data mesh” is a major push for Strohmeyer’s bosses at CDAO.

“The data mesh services that we are trying to bring to bear allow us to [share] data in common between the combatant commands,” he said. “In the past that data wasn’t viewable by another combatant command. But now, because we’re trying to truly globally integrate everything we do … that piece of data is shared by all the combatant commands, and not just shared via email or something, it’s shared live.”

“GIDE and JADC2 [are] very much the reason you need the data mesh,” added Lugo.

The crucial nuance of the “mesh” concept is it doesn’t require ripping out existing databases and replacing them with one central mega-server, which would not only cost too much but also create a single point of failure. Instead, the idea is to leave the legacy databases in place while layering interconnections and a road map (called a data catalog) on top.
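For readers who think in code, the catalog-over-legacy-stores idea can be illustrated with a toy sketch. This is only an assumption-laden illustration, not how CDAO's data mesh is actually built: the `DataCatalog` class, the dataset names, and the two stand-in "legacy" stores are all hypothetical. The point it shows is that each store stays where it is and keeps its own format, while the catalog records how to read it on demand.

```python
from typing import Callable, Dict, List

# Hypothetical legacy stores. Each keeps its own format and stays in place.
fuel_db = {"depot_a": 12000, "depot_b": 4500}   # dict keyed by depot name
truck_rows = [("unit_1", 14), ("unit_2", 6)]    # list of (unit, trucks) tuples


class DataCatalog:
    """Hypothetical road map: dataset name -> function that reads it live."""

    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[], List[dict]]] = {}

    def register(self, name: str, reader: Callable[[], List[dict]]) -> None:
        # Store only a reader, never a copy of the data itself.
        self._readers[name] = reader

    def names(self) -> List[str]:
        return sorted(self._readers)

    def read(self, name: str) -> List[dict]:
        # Calls the reader at query time, so results reflect the current store.
        return self._readers[name]()


catalog = DataCatalog()
catalog.register(
    "fuel_on_hand",
    lambda: [{"site": site, "gallons": g} for site, g in fuel_db.items()],
)
catalog.register(
    "trucks_available",
    lambda: [{"unit": unit, "trucks": n} for unit, n in truck_rows],
)

# The owner of the legacy store updates it in place...
fuel_db["depot_b"] = 3000

# ...and any consumer reading through the catalog sees the change immediately.
latest_fuel = catalog.read("fuel_on_hand")
```

Because the catalog holds readers rather than copies, there is no central mega-server to replace the originals, which mirrors the design choice described above.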

In fact, for frontline combat units doing real-time targeting and defense against opposing forces, Strohmeyer said, your data mesh has to withstand jamming, hacking, and physical destruction of the network. That’s why GIDE 9 also experimented with keeping backups of every piece of vital data at every single node, so units could keep fighting even when cut off.

Once you have all that data, though, it turns out that it’s too much information for mere humans to digest in time to make decisions. Enter generative AI, whose entire purpose is to take masses of information and spit out a readable summary.

“When … you give it a very tailored specific prompt about [for example], ‘look for new changes in the road network in this area’ that may not have been identified previously, it can identify that really well,” he said. “It can also do it in a way that exposes the sources to humans so that they can check.”

Strohmeyer and his team are still digesting all the feedback from GIDE 9, which will lead to improvements — both software updates and streamlined staff procedures — to be tested in GIDE 10, now less than three months away. That tight cycle of quarterly experiments is meant to drive the kind of rapid progress that the usual years-long pipeline struggles to deliver, with many promising projects fizzling out in the “valley of death” between R&D and actual deployment. GIDE, by contrast, is part of a new “agile” approach, inspired by Silicon Valley, that puts software developers and servicemembers side-by-side.

“This wasn’t, you know, research organizations” conducting the GIDE exercises, Strohmeyer emphasized. “This was actual warfighters that were doing this and seeing what worked, what didn’t work as they went through the process.”

Months of budget gridlock did slow that innovation cycle, he acknowledged, but the spending bill passed just last week reopens the funding floodgates, at least until the federal fiscal year ends on September 30. “We did have to slow down considerably,” Strohmeyer said. “We’re now able to push these things out more to the combatant commands.”