Team Players: Natural Language Tech Is Shaping Human-Machine Operations
By David Strachan
A rendering of a user operating Maven, a new Project Overmatch capability. A unified tactical display that provides insight into vessels around the world, Maven uses artificial intelligence to overlay real-time ship readiness and sustainment data right at the user’s fingertips. It was created by Project Overmatch in collaboration with Palantir as the first of many projects that integrate commercial technology, at commercial speeds, for government use while ensuring interoperability. By continuing to partner with industry through a pilot program known as Open DAGIR, or Open Data and Applications Government-owned Interoperable Repositories, Project Overmatch is creating an ecosystem of U.S. Navy data to field capability at unprecedented speed.
Stock photo courtesy of Palantir

Sensors alert the Ops Center to an underwater intruder. The watch officer issues a series of audible commands: “Bravo Pod: Contact XRAY. Interrogate and report.” Targeting data streams through a fiber optic cable to a seabed docking station and uploads to three dormant AUVs. They spring to life, slip from their berths, and proceed on an intercept course.
Lead vehicle Bravo-1 initiates active sonar, and as the pod closes on the contact it floods the water column with high-powered LEDs, capturing imagery of a shadow hovering in the depths. The data is quickly shared among the pod via OCOMMS. Bravo-2 is designated as relay, racing toward a seabed cable where a prepositioned optical transceiver awaits.
The imagery data transfers via blue-green laser and races back along the cable to Ops where it is processed and displayed. The watch officer recognizes the silhouette, but consults the AI per procedure. “Analyze imagery.” Seconds later a nearby display renders a schematic as an automated voice reports, “PLAN AUV HSU-1. Ninety-eight percent match.”
The watch officer nods; the match is well within ROE. “Bravo Pod: Strike XRAY. Authorization: RYAN ZERO-TWO-NINER.”
The command is transmitted to Bravo-2 via the optical link then rebroadcast acoustically through the water column to Bravo-1 and -3. Bravo-1 acknowledges, arms a small onboard warhead, accelerates, homes, and detonates on contact. Bravo-2 and -3 exchange status via ACOMMS. Silence from Bravo-1.
Bravo-2 relays a SITREP to Ops. “XRAY engaged by Bravo-1. Bravo-2 and Bravo-3 undamaged.” The watch officer replies: “Maintain surveillance and report contacts. Commence BDA.”
This vignette may sound like Jack Ryan fan fiction, but such an engagement may be coming soon to critical underwater infrastructure near you.
Last month, the U.S. Defense Innovation Unit (DIU) announced a $100 million prize challenge for an Autonomous Vehicle Orchestrator (AVO), a layer of technology designed to translate natural language—voice, text, or graphical interface—into actionable tasks across multiple autonomous platforms. “We want orchestrator technologies that allow humans to work the way they already command—through plain language that expresses desired effects, constraints, timing, and priorities—not by clicking through menus or programming behaviors,” said Lt Gen Frank Donovan, Director of the Defense Autonomous Weapons Group (DAWG).
At first glance, this may seem like a moonshot, but AVO-like robotic interfaces, as well as supporting mission-level autonomy and data-fusion frameworks, already exist across government, academia, and industry.
U.S. Army research into human–machine communication has been underway for many years. Its Joint Understanding and Dialogue Interface (JUDI) was developed to enable two-way, conversational interaction between soldiers and robotic systems, employing statistical language classifiers to infer intent from human speech. Meanwhile, within the U.S. Naval Research Laboratory, the Navy Center for Applied Research in Artificial Intelligence (NCARAI) has been exploring how humans engage with machines, including integrating natural language with gesture, touch-based interaction, and visual displays, as well as analyzing phonological differences to accommodate accent and dialect variation.
Academic research has made significant progress in this space. In the undersea domain, gesture-based human–robot interaction has received particular attention, driven by the constraints of acoustic communication and the realities of diver-led operations. Research has focused on the development of static and dynamic hand gestures that enable divers to issue task-level commands, such as repositioning, surveying, or holding station, directly to AUVs. Other efforts have explored language-based interfaces for subsea autonomy, such as Word2Wave, developed by researchers at the University of Florida, or OceanChat, developed at Georgia Tech and the University of Notre Dame, which both convert spoken or written human instructions into structured AUV mission plans using trained language models. Additional studies have extended these concepts to multi-robot coordination, exploring how a single human operator could issue intent-level guidance to a supervisory agent that generates tasking, assigns roles, and coordinates actions across a team of underwater vehicles.
Several commercial systems, while not explicitly plain-language-driven, already provide capabilities that partly align with the AVO concept. Anduril’s Lattice functions as a software-centric C2 layer that fuses sensor data to maintain situational awareness and enables operators to task autonomous systems at a high level of control. Shield AI’s Hivemind and Palantir’s Gotham could also support an AVO, which requires semantic abstraction, task decomposition, and coordinated execution across multiple systems.
One company is focusing directly on the human–machine interface. New Haven-based Primordial Labs has developed Anura, a voice- and language-driven interface that allows users to task unmanned systems using spoken language rather than traditional controllers or graphical user interfaces. With use cases spanning single-platform operations, battlespace management, and multi-domain teaming, Anura illustrates how natural-language integration at the tactical edge is already within reach.
From Tactics to Operations
Although the AVO has initially been characterized as a tactical enabler (“Shadow that vessel at twenty kilometers and avoid radar exposure,” to cite DIU’s example), its design could have broader implications. An “orchestrator” suggests a capability that operates above the level of individual platform control, and DIU’s framing—“plain language that expresses desired effects, constraints, timing, and priorities”—implies operational as well as tactical intent. At a higher level, the AVO could enable commanders to articulate objectives and constraints while delegating tactical execution to machines.
In practice, this shift would allow commanders to focus more on shaping outcomes and less on controlling individual platforms, marking an evolution in how autonomy is employed at the fleet level. This would align closely with the U.S. Navy’s vision for Project Overmatch, or the broader ambition of Joint All-Domain Command and Control (JADC2). As the number of unmanned and autonomous systems grows, the limiting factor increasingly becomes human cognitive capacity rather than data access. The AVO would function as a bridge between higher-level operational intent and data-driven tactical autonomy, shortening decision cycles and enabling effects at greater speed and scale.
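One way to picture an orchestration layer sitting above individual platforms is a supervisor that decomposes a single intent statement into per-vehicle tasks. The following sketch is hypothetical; the role names and decomposition logic are invented for illustration, not drawn from any fielded system:

```python
def orchestrate(intent: dict, vehicles: list[str]) -> dict[str, dict]:
    """Decompose one intent-level objective into per-vehicle tasks.
    Hypothetical decomposition logic: first vehicle leads, second
    maintains the communications relay, the rest escort."""
    tasks: dict[str, dict] = {}
    for i, v in enumerate(vehicles):
        if i == 0:
            tasks[v] = {"role": "lead", "task": intent["objective"],
                        "constraints": intent["constraints"]}
        elif i == 1:
            tasks[v] = {"role": "relay", "task": "maintain_comms"}
        else:
            tasks[v] = {"role": "escort", "task": "shadow_lead"}
    return tasks

# Intent expressed once at the operational level; tasking generated per platform.
assignments = orchestrate(
    {"objective": "shadow_vessel",
     "constraints": {"standoff_km": 20, "emcon": "no_radar"}},
    ["Bravo-1", "Bravo-2", "Bravo-3"],
)
```

The point of the sketch is the shape of the interface: the commander supplies one objective with constraints, and the orchestrator, not the human, produces the platform-level assignments.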
Challenges Abound
The AVO will likely resemble a large language model (LLM) such as OpenAI’s ChatGPT or Google’s Gemini, translating unstructured command inputs into structured, machine-executable tasks. But while LLMs are powerful tools, their use in coordinating autonomous systems poses significant challenges and risks. Robotic platforms require precise, unambiguous tasking. The LLM would need to translate human intent into actionable commands, but as anyone who has used an LLM knows, these models tend to fill in gaps and make assumptions. Subtle differences in phrasing could materially change meaning, and any linguistic misalignment between human and machine would risk misinterpretation. Even benign or imprecise human phrasing could trigger unintended, possibly severe, consequences.

From a cybersecurity perspective, LLM-based systems offer adversaries an attractive attack vector, one with a potentially lower barrier to entry than code-driven exploits. The AVO would be vulnerable to malicious prompt injection, poisoned training data, or exploitation of known LLM limitations such as hallucination, weak temporal reasoning, and mismatches between language input and physical reality.

Moreover, LLMs do not inherently understand international law, rules of engagement, or escalation thresholds. Any delegation of authority would need to be clearly articulated, constrained, and reversible, rather than inferred by the AVO.
Commanding at Machine Speed
The accelerating tempo of modern conflict and its accompanying flood of data will increasingly demand autonomous and AI-enabled decision-making at the tactical edge. A unified orchestration layer that translates human intent into machine action would be a decisive enabler not only of tactical autonomy but of operational command, supporting the ambitions of Project Overmatch and JADC2. The AVO could represent not only the integration of cutting-edge technology but an architectural shift, heralding a future in which commanders shape outcomes through intent while autonomous systems manage execution at speed and scale.
