Airmen monitor an Advanced Battle Management System (ABMS) “Onramp” demonstration in 2020. (U.S. Air Force photo by Senior Airman Daniel Hernandez)

WASHINGTON — Within “weeks,” invitations will go out to key figures in defense, industry, and academia for a first-of-its-kind Pentagon-hosted conference on “trusted AI and autonomy,” one of the lead organizers told Breaking Defense in an exclusive interview. The crucial question: Can the Defense Department rely on AI across a host of future missions?

The DoD is well aware it’s playing catch-up to the rapidly advancing private sector in many aspects of AI, acknowledged Maynard Holliday, the Pentagon’s deputy CTO for critical technologies. A big part of the conference is a push, not only to better understand what’s happening on the cutting edge, but how the military can adopt and adapt commercial tech to build AI capabilities it can trust — and control.

“We recognize we need to fast-follow, but we also need to develop military-specific applications of these commercial technologies,” Holliday said. “And as Under Secretary LaPlante has said in the past, we need to own the technical baseline of these technologies, so that we can have control over their evolution to a militarily specific solution, rather than being vendor-locked and beholden to one single vendor to evolve a capability.”

“Technical baseline” isn’t just a metaphor here: It’s a specific term of art for the foundational details that define a complex system, guiding its design and development from the initial drafting of requirements through multiple reviews to the final product — or, in the case of ever-evolving software, through continual cycles of upgrades.

But the Pentagon’s desire for control sits uneasily with AI innovators’ desire to keep trade secrets. OpenAI, for instance, has released almost no data about how it built GPT-4, the model behind the latest version of ChatGPT. What’s more, ChatGPT’s algorithms run on the company’s own servers. Users only see the queries they send and the responses they get back, with no view into the complex processes happening inside the AI. Nor is this cloud-based approach unique to OpenAI: more and more companies offer “software as a service” rather than selling software that customers then own, download, and run on their own machines.
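To make that black-box dynamic concrete, here is a minimal sketch of the “software as a service” pattern described above. The endpoint URL, payload fields, and function name are hypothetical placeholders, not any vendor’s real API:

```python
import requests

# Hypothetical vendor endpoint: everything about the model stays server-side.
API_URL = "https://api.example-llm-vendor.com/v1/generate"

def query_hosted_model(prompt: str, api_key: str) -> str:
    """Send a query to a vendor-hosted model and return its text response."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json={"prompt": prompt},  # illustrative payload shape, not a real spec
        timeout=30,
    )
    response.raise_for_status()
    # Everything between the request and this response happens on the
    # vendor's servers: the "black box" the customer cannot inspect.
    return response.json()["text"]
```

The customer never touches weights, training data, or inference code, which is exactly the property that collides with the Pentagon’s desire to own the technical baseline.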

Maynard Holliday (U.S. Army photo by William Pratt)

So how can DoD reconcile its desire to own the technical baseline with private industry’s obsession with protecting its intellectual property?

“Great question,” Holliday acknowledged. Part of the solution, he said, is that “we’re going to have to develop our own militarily specific, DoD-specific corpus of data that’s updated with our information, our jargon, so that we can interact with it seamlessly — and that we trust it.”
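Building that corpus would likely mean adapting a pretrained model to DoD-specific text. The sketch below shows one illustrative way such fine-tuning could look in PyTorch; the dataset class, model interface, and hyperparameters are assumptions for illustration, not a DoD design:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class DomainCorpus(Dataset):
    """Tokenized DoD-specific documents: doctrine, reports, service jargon."""
    def __init__(self, token_ids: list[list[int]]):
        self.examples = token_ids

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        ids = torch.tensor(self.examples[idx])
        return ids[:-1], ids[1:]  # next-token prediction (input, target) pair

def fine_tune(model: torch.nn.Module, corpus: DomainCorpus, epochs: int = 1):
    """Continue training a pretrained language model on the domain corpus."""
    loader = DataLoader(corpus, batch_size=1, shuffle=True)
    optim = torch.optim.AdamW(model.parameters(), lr=1e-5)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for inputs, targets in loader:
            logits = model(inputs)  # assumed shape: (batch, seq, vocab)
            loss = loss_fn(logits.view(-1, logits.size(-1)), targets.view(-1))
            optim.zero_grad()
            loss.backward()
            optim.step()
```

Starting from an existing pretrained checkpoint rather than training from scratch is what makes “fast-follow” plausible: the commercial model supplies general language ability, and the DoD corpus supplies the jargon and context.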

The conference, tentatively slated for June 20-22 at the MITRE facility in McLean, Va., will be hosted by Holliday’s boss, Pentagon R&D chief Heidi Shyu. The attendees will also hear from acquisition under secretary Bill LaPlante and chief data & AI officer Craig Martell. But it’ll be equally important for Pentagon leaders to listen to the innovators from outside government who are driving the revolution in AI.

“The net is going to be broad and cover industry, academia, and the defense sector,” Holliday said. His staff is still thrashing out the guest list, he said, but he expects a number of attendees in the “triple digits.”

RELATED: Pentagon should experiment with AIs like ChatGPT — but don’t trust them yet: DoD’s ex-AI chiefs

Going Beyond ChatGPT: Generating Trust

Prep work for the conference started late last year, Holliday said, with Under Secretary Shyu first mentioning it publicly during a George Mason University forum in November. It was on the last day of that month that OpenAI launched ChatGPT, a Large Language Model (LLM) whose ability to generate coherent, plausible, paragraph-long answers — sometimes riddled with uncanny errors, called “hallucinations” — has heightened both excitement and anxiety about AI.

Holliday told a Potomac Officers Club event last month that the conference would look at the perils and potential of “generative artificial intelligence” like ChatGPT. But in his interview with Breaking Defense, he made clear that the agenda was much broader.

“We’ll definitely be talking about… how going forward we would mitigate the hallucinatory tendencies of LLMs and what we could be doing with respect to making those results of queries to LLMs more trusted,” Holliday said. But other topics will include cybersecurity, command systems, and the Pentagon’s recently revised policy directive on the control, reliability, and ethics of autonomous weapons.

Output from a generative AI, Stable Diffusion, when prompted with “US Army tank.”

Indeed, even within the specific realm of generative AI, LLMs are just part of a wider variety of algorithms that Holliday said he wants to discuss. While ChatGPT and its fellow language models can digest and generate text, other algorithms can scan thousands of pictures and then generate never-before-seen images. The most famous and controversial examples are art generators like Stable Diffusion and OpenAI’s own DALL-E, but similar algorithms could potentially combine data from multiple types of intelligence sensors into a single picture intelligible to human eyes.

“Generative AI goes across modalities,” Holliday said. “We would like to combine electro-optics, infra-red, cyber [data]. That’s where it becomes really powerful.”

“Combining different modalities” this way may also be the key to reducing hallucinations, Holliday suggested, with each type of input acting as a cross-check on all the others.
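One way to picture that cross-check: each modality scores the same candidate detection independently, and the fused result is trusted only when enough of them agree. The sketch below is a hypothetical illustration; the modality names, thresholds, and voting rule are assumptions, not a fielded design:

```python
from dataclasses import dataclass

@dataclass
class ModalityReading:
    modality: str      # e.g. "electro-optical", "infrared", "cyber"
    confidence: float  # 0.0-1.0 score that the target is present

def fused_assessment(readings: list[ModalityReading],
                     min_agreeing: int = 2,
                     threshold: float = 0.7) -> bool:
    """Trust a detection only if enough independent modalities
    exceed the confidence threshold, so each cross-checks the others."""
    agreeing = [r for r in readings if r.confidence >= threshold]
    return len(agreeing) >= min_agreeing

# Example: EO and IR agree, cyber telemetry does not; detection still trusted.
readings = [
    ModalityReading("electro-optical", 0.91),
    ModalityReading("infrared", 0.84),
    ModalityReading("cyber", 0.30),
]
print(fused_assessment(readings))  # True
```

The intuition is that a hallucination in one data stream is unlikely to be echoed, independently, in a physically different sensor.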

If generative AI can be made reliable, he went on, the potential is enormous for what the Pentagon calls “decision support.” That means not only helping policymakers and commanders understand a seething mass of ever-changing data, but actually generating potential courses of action — be that restructuring a troubled procurement contract or launching a particular missile at a priority target.

The inputs to such a generative AI, Holliday explained, could include intelligence on the threats and the current status of the friendly force’s “kill web” — the networked sensors, weapons, jammers, and other systems that can have different effects, lethal or disabling, on the enemy. The output generated, he said, would be “options” for the commander. This kind of AI-assisted “battle management” is a central goal for the Pentagon’s sprawling Joint All Domain Command and Control (JADC2) effort.
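As a rough illustration of that input-to-options flow, the sketch below pairs priority threats with available kill-web effectors and emits human-readable options, leaving the final choice to the commander. Every field and rule here is invented for illustration; real JADC2 data models would be far richer:

```python
from dataclasses import dataclass

@dataclass
class Threat:
    track_id: str
    priority: int          # 1 = highest priority

@dataclass
class Effector:
    name: str
    available: bool
    effect: str            # "lethal" or "disabling"

def generate_options(threats: list[Threat],
                     kill_web: list[Effector]) -> list[str]:
    """Pair each priority threat with every available effector and
    return readable courses of action; the commander still decides."""
    options = []
    for threat in sorted(threats, key=lambda t: t.priority):
        for eff in kill_web:
            if eff.available:
                options.append(
                    f"Engage {threat.track_id} with {eff.name} ({eff.effect})"
                )
    return options
```

The essential design point matches Holliday’s framing: the AI generates and ranks “options,” and a human retains the decision.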

A Navy technician troubleshoots a server aboard USS America in 2017. (U.S. Navy photo by Mass Communication Specialist 3rd Class Vance Hand)

A Long Road To Trust

The limitations of AI have been on military minds for many years, Holliday said, but what “crystallized” the issue was the reaction to the Defense Science Board’s landmark study on autonomy, for which he served as an adviser. That study was done in 2015 and 2016, long before the current boom in generative AI.

“When we briefed that to the combatant commands way back then, they said, ‘Yeah, AI is great, autonomy is great, but you know what? We’re never going to use it unless we can trust it,’” Holliday recalled.

“It’s going to take some time — a lot of time — to get a commander, and then down to the front lines, comfortable with querying a system to get situational awareness,” he said. “They’re going to have to be able to confirm that what the AI is telling them is what in fact is going on in the battlefield.”

But there may be no alternative to AI for an increasing range of important missions, Holliday said. Missile defense and cybersecurity in particular are areas where threats can move too fast for humans to react in time.

“We recognize in fights of the future, we’re going to have to be dependent on some form [of AI] over some continuum of capability, because the weapon systems we’re going to be facing — hypersonics, directed energy, and cyber effects — are going to be moving faster than human decision-making, and so you’re going to have to be able to react at machine speed to be adequately defended,” he said.

This may not rise to the level of AI-driven management of the entire battle, both offensive measures and defense, as some envision JADC2, Holliday said. But, he emphasized, “we’re going to have to have some level of autonomy at the very least at the defensive level, so that we can react at machine speed if our adversaries are employing those capabilities.”
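A hedged sketch of what “autonomy at the defensive level” could look like in code: threats above a machine-speed cutoff trigger an automatic countermeasure with after-the-fact notification, while everything else is queued for a human. The severity scale and cutoff are illustrative assumptions, not doctrine:

```python
# Above this severity, there is no time for a human in the loop.
MACHINE_SPEED_CUTOFF = 0.9

def defensive_response(threat_severity: float) -> str:
    """Route a threat either to automatic defense or to a human operator."""
    if threat_severity >= MACHINE_SPEED_CUTOFF:
        # Hypersonic- or cyber-speed threat: act first, notify the operator.
        return "auto-engage countermeasure, notify operator"
    # Slower threat: a human can still decide in time.
    return "queue for operator decision"

print(defensive_response(0.95))  # auto-engage countermeasure, notify operator
print(defensive_response(0.40))  # queue for operator decision
```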

So how do you make AI reliable enough to trust with such life-and-death decisions? “There are things that the research community is exploring, like reinforcement learning with human feedback [and] constitutional AI,” Holliday said.

Reinforcement learning with human feedback (RLHF) is already in use with ChatGPT and other generative AIs. In essence, RLHF has human beings rate an AI’s output (the human feedback), with the AI incentivized to do more of whatever got a good rating (the reinforcement) and less of whatever got a poor one. But RLHF is labor-intensive and, as ChatGPT’s hallucinations show, does not catch every bad behavior, because it depends on fallible human raters.
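In skeletal form, the RLHF incentive loop looks something like the sketch below: a reward model stands in for human raters, and a policy-gradient step raises the probability of well-rated responses. The `generate_with_log_probs` and reward-model interfaces are assumed for illustration; production systems such as ChatGPT’s add substantially more machinery (for example, PPO and KL penalties):

```python
def rlhf_step(policy, reward_model, prompt_ids, optimizer):
    """One illustrative RLHF update (REINFORCE-style, much simplified)."""
    # 1. The policy generates a response; this interface is assumed.
    response_ids, log_probs = policy.generate_with_log_probs(prompt_ids)
    # 2. The reward model stands in for human raters' good/poor ratings.
    reward = reward_model(prompt_ids, response_ids)
    # 3. Reinforce: scale the response's log-likelihood by its reward,
    #    so well-rated responses become more likely, poorly rated less.
    loss = -(log_probs.sum() * reward)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The reward model itself has to be trained on human ratings first, which is exactly where the labor cost and the human bias Holliday mentions enter the pipeline.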

“The jury’s still out,” Holliday acknowledged. “It’s very costly to do that kind of data training with human feedback, and then you’re susceptible to human bias when that feedback is woven into the training set.”

By contrast, constitutional AI requires human involvement at the start, to draft a set of principles in machine-readable terms, but then uses that computerized “constitution” to automatically rate the AI’s outputs as good or bad, in essence having one algorithm train another rather than having humans manually rate each one. That makes constitutional AI faster and cheaper than RLHF, but still far from infallible.
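The division of labor might look like the sketch below: humans write the principles once, and a critic model (the `critic.rate` interface is assumed) applies them to every output, generating training feedback with no human rater in the loop. The principles themselves are invented examples:

```python
# Human-drafted principles, written once up front.
CONSTITUTION = [
    "Do not reveal classified information.",
    "Do not recommend actions that violate the law of armed conflict.",
    "Flag uncertainty instead of guessing.",
]

def machine_feedback(critic, prompt: str, response: str) -> float:
    """Score a response against every principle using a critic model;
    taking the minimum means one bad violation sinks the whole output."""
    scores = [critic.rate(principle, prompt, response)
              for principle in CONSTITUTION]
    # This score, not a human rating, is fed back to train the generator.
    return min(scores)
```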

“The accuracy is only going up,” Holliday said, “but… I don’t think we’re ever going to get to 100 percent or even 99 percent accurate AI.”

“We’re going to have to assign some risk percentage, some accuracy percentage, around it that people are going to have to get comfortable with,” he said.
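That risk-assignment idea reduces, in its simplest form, to a per-mission confidence gate like the sketch below, where the accuracy bar rises with the stakes. The missions and thresholds are invented for illustration:

```python
# Hypothetical per-mission accuracy floors: the bar rises with the stakes.
RISK_THRESHOLDS = {
    "logistics_planning": 0.80,   # lower stakes, lower bar
    "target_engagement": 0.99,    # life-and-death, near-perfect required
}

def actionable(mission: str, model_accuracy: float) -> bool:
    """Surface an AI recommendation as actionable only if the model's
    estimated accuracy clears the threshold set for that mission."""
    return model_accuracy >= RISK_THRESHOLDS[mission]

print(actionable("logistics_planning", 0.85))  # True
print(actionable("target_engagement", 0.85))   # False: route to a human
```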