U.S. Army Capt. Michael Holder, a military police officer assigned to V Corps, analyzes a computer during Allied Spirit at U.S. Army Garrison Bavaria Grafenwöhr on March 6, 2023. (U.S. Army photo by Pfc. Myenn LaMotta)

WASHINGTON — The generative AI explosion that began with ChatGPT has led some Army organizations to run up big and unexpected bills, the service’s chief information officer told reporters Tuesday.

Getting GenAI costs under control will be a major focus for a forthcoming rollout of best practices and new policies, expected by April, Leonel Garciga said. But in the meantime, said Garciga, maybe think a little harder before you click.

“Part of this is, let’s not have a bazillion flowers blooming in this space,” Garciga said. “That gets expensive very fast and it gets really hard to protect our data. Let’s be a little bit more deliberate in our approach.”

Garciga and his chief ally in the Army’s procurement shop, deputy assistant secretary for acquisition Jennifer Swanson, combined their previously separate generative AI pilot projects, #CalibrateAI and Athena, back in November. Now they’re analyzing those experiments and preparing comprehensive guidance for the Army: lessons learned, best practices and, in some cases, mandatory do’s and don’ts. What’s not coming, they emphasized: new money.

The guidance package will be published by the time the Athena pilot ends in April, Garciga said. “We’ve been hyperfocused on this idea of how we put out guidance across the force, in a bunch of different areas, to make sure that… we’re actually buying the right things and the right secure things,” he told reporters.

Added Swanson: “How we control costs is really important, because what we certainly don’t want to end up with is a massive cloud compute bill. … That’s a big part of what we’re doing.”

Different organizations within the service will use their own budgets and make their own choices about the best GenAI to buy — within the parameters of the guidance — rather than having a top-down mandate or Army-wide program everyone is required to use, she said.

“There isn’t enterprise funding that we’re applying to this. It doesn’t exist,” Swanson emphasized. “Everybody is gonna use their money. But we want to make sure that we’re all using our money smartly and that we all understand what we’re getting into before we go shopping.”

There is a silver lining to this fiscal storm-cloud, Garciga and Swanson said. The good news is that the Army’s GenAI experiments have shown there are lots of commercially available tools, lots of Army organizations eager to try them out, and a growing list of promising applications. For instance, large language models can help acquisition officers understand lengthy proposals from vendors, help counsel sort through voluminous regulations, or help public affairs officers keep track of social media. There’s also some promise for GenAI to help write software code, they said, although so far, said Swanson — an old-school coder who cut her teeth on FORTRAN — it’s hardly ready to replace human developers.

These near-term possibilities are overwhelmingly in “back office” business processes that have lots of overlap with how private companies operate, Garciga noted. More military-specific applications, like sharing targeting data or controlling unmanned combat vehicles, tend to require other kinds of AI.

Even where functions seem similar, he said, civilian GenAI may not translate well to military contexts. When the Army looked at aviation safety software for its helicopter fleet, he said as an example, “we just kept on getting responses from FAA stuff that had nothing to do with our rotary wings, or we started getting responses from all the work NASA’s doing.”

That experience shows that whether GenAI is viable really depends on the particular problem you’re trying to solve, Garciga said.

“It taught us the lesson that we really need to understand the data sets we’re working with,” he explained. “Sometimes people come [to the CIO] and they’re like, ‘We’re gonna throw an LLM at it!’”

The other lesson is that not all GenAI products follow the (admittedly evolving) best practices on controlling costs. While limited versions of ChatGPT, Google Gemini, xAI’s Grok, and so on are free for anyone to use, it takes expensive and power-hungry GPUs to run GenAI algorithms. Some vendors have told Breaking Defense that when a private business or government organization signs a contract to use GenAI, they offer paid plans with near-instant billing for each query that can automatically cut off a user whose activity gets too expensive. Most cloud service providers have been “really good” at putting in “hard stops” and other such “guardrails,” Garciga said.

But a lot of GenAI computing power is still paid for under more traditional cloud-computing contracts. Those arrangements may not send the customer a bill until the end of the month — long after a few overly broad queries or a few hyperactive users can burn through the budget for cloud services.

“If it’s Quarter One and you’ve spent 50 percent of your budget in cloud already, we’ve got to have a conversation,” he chuckled.

The forthcoming guidance, Garciga and Swanson said, will help Army organizations sort through which GenAIs are best for what purposes and which vendors follow best practices on cost, billing, and cybersecurity.