2510.18395v1


# Memory-Augmented State Machine Prompting: A Novel LLM Agent Framework for Real-Time Strategy Games

**Authors**: Runnan Qi, Yanan Ni, Lumin Jiang, Zongyuan Li, Kuihua Huang, Xian Guo

National University of Defense Technology, Changsha, China. Email: {qirunnan13579, niyanan, khhuang}@nudt.edu.cn

Nankai University, Tianjin, China. Email: {2120230524, guoxian}@mail.nankai.edu.cn

## Abstract

This paper proposes Memory-Augmented State Machine Prompting (MASMP), a novel framework for LLM agents in real-time strategy games. Addressing key challenges like hallucinations and fragmented decision-making in existing approaches, MASMP integrates state machine prompting with memory mechanisms to unify structured actions with long-term tactical coherence. The framework features: (1) a natural language-driven state machine architecture that guides LLMs to emulate finite state machines and behavior trees through prompts, and (2) a lightweight memory module preserving strategic variables (e.g., tactics, priority units) across decision cycles. Experiments in StarCraft II demonstrate MASMP’s 60% win rate against the hardest built-in AI (Lv7), vastly outperforming baselines (0%). Case studies reveal the method retains LLMs’ semantic comprehension while resolving the "Knowing-Doing Gap" through strict state-action mapping, achieving both interpretability and FSM-like reliability. This work establishes a new paradigm for combining neural and symbolic AI in complex decision-making.

## 1 Introduction

Real-time strategy (RTS) games like StarCraft II represent a grand challenge for AI, testing capabilities in real-time decision-making, long-term planning, and strategic adaptation. While reinforcement learning agents like AlphaStar have achieved superhuman performance [1], they require immense computational resources and lack interpretability.
In contrast, Large Language Model (LLM)-based agents offer a promising alternative by simulating the human “perception-reasoning-action” cycle [2], demonstrating strong potential across domains from military planning (COA-GPT [3]) to complex game environments like Minecraft (GITM [4]).

However, in complex RTS environments, LLM agents face critical limitations that prevent them from competing effectively. They suffer from hallucinations (generating invalid actions), greedy decision-making (prioritizing short-term gains over long-term strategy), and fragmented execution (inconsistent actions across decision cycles due to a lack of memory). These issues result in poor performance; for instance, the LLM-PySC2 agent achieves only an 8% win rate against level-6 and 0% against level-7 built-in AI.

To overcome these challenges, we propose the Memory-Augmented State Machine Prompting (MASMP) framework. Our work is built upon LLM-PySC2, a text-based API that provides a natural language interface for StarCraft II, enabling LLMs to process game observations and output actions. MASMP integrates state-machine prompting to enforce structured, reliable decision-making and a strategic memory module to maintain long-term tactical coherence. Our agent achieves a 60% win rate against the hardest built-in AI (Lv7), significantly outperforming all previous LLM-based baselines. This work demonstrates the potential of hybrid neuro-symbolic architectures for complex decision-making tasks.

## 2 Related Works

### 2.1 Traditional RTS Game Agents

Traditional RTS games have long relied on rule-based systems, where finite state machines (FSMs) [5, 6] and hierarchical FSMs [7] stand out for their simplicity and reliability. Behavior trees offer another effective approach, handling complex concurrent tasks through modular designs [8]. These methods form the backbone of built-in AI systems in popular titles like StarCraft II and Age of Empires.
The field advanced significantly with reinforcement learning, particularly AlphaStar’s breakthrough in achieving superhuman performance in StarCraft II through deep neural networks and imitation learning [1]. Other approaches employing either RL or rule-based methods have also demonstrated strong performance against built-in AI [9]. However, both paradigms face inherent limitations: RL agents demand substantial computational resources and struggle to adapt to novel strategies, while rule-based systems require extensive manual engineering and lack true environmental comprehension.

### 2.2 Large Language Models in RTS Games

The emergence of LLM-based agents has introduced a new paradigm for RTS games. TextStarCraftII [10] pioneered LLM integration with StarCraft II, while LLM-PySC2 [11] enhanced this approach with multi-agent coordination and full action space support, establishing itself as a standard experimental platform.

This shift comes with significant challenges. LLM agents are plagued by hallucinations (generating impossible actions), local greediness (prioritizing short-term gains) [12], strategic inconsistency (incoherent planning across decision cycles), and a pronounced Knowing-Doing Gap (failing to execute well-reasoned plans) [12]. Consequently, their performance remains limited, with reported win rates as low as 8% against intermediate-level (Lv6) and 0% against expert-level (Lv7) built-in AI [11].

Current improvement strategies include:

- Prompt Engineering: Employing few-shot learning and Chain-of-Thought in both TextStarCraftII [10] and LLM-PySC2 [11]
- Hybrid Architectures: Integrating rule-based automation (e.g., Easy Build Mode) in LLM-PySC2 [11] and combining LLM with FSM in SwarmBrain [13]

These contributions highlight a crucial insight: LLMs can benefit from traditional rule-based systems as complementary modules for achieving stateful and reliable decision-making.
## 3 Memory-Augmented State Machine Prompting

### 3.1 State Machine Prompting for LLM-Based Agents

To enhance decision-making reliability in RTS games, we propose State Machine Prompting, a novel approach that guides LLMs to emulate structured decision patterns of finite state machines (FSMs) and behavior trees through natural language.

<details>
<summary>figure1.png Details</summary>

![7fe546c2](/v1/image/7fe546c27e42ca5e38bad1541f100191e3eed3dcd295956ab4980c530925e906)

### Visual Description

# Technical Document Extraction: Prompting Architectures

This document provides a detailed technical breakdown of the provided image, which illustrates two distinct logic architectures for Large Language Model (LLM) prompting in a gaming or strategic context: **State Machine Prompting** and **Behavior Tree Prompting**.

---

## 1. State Machine Prompting (Top Section)

The top half of the diagram illustrates a Finite State Machine (FSM) consisting of two primary states and the logic governing the transitions between them.

### Components and States

* **State: aggressive** (Red rounded rectangle, left side)
    * **Associated Action:** A red rectangular box below the state contains the text: `action: <All_Units_Attack()>`.
* **State: defensive** (Blue rounded rectangle, right side)
    * **Associated Action:** A blue rectangular box below the state contains the text: `action: <All_Units_Defend()>`.

### Transitions and Logic

The states are connected by two directional arrows representing transition logic:

1. **Aggressive to Defensive:** A blue-to-red gradient arrow points from "aggressive" to "defensive". Above it is a text box stating: `switch to defensive when there are insufficient remaining troops`.
2. **Defensive to Aggressive:** A red-to-blue gradient arrow points from "defensive" to "aggressive". Below it is a text box stating: `switch to aggressive when there are sufficient troops`.

---

## 2. Behavior Tree Prompting (Bottom Section)

The bottom half of the diagram illustrates a hierarchical Behavior Tree (BT) structure used for decision-making regarding unit production.

### Root Node

* **Shape:** Green oval.
* **Text:** `root: units training`.
* **Function:** The starting point of the logic tree, branching into three sequences.

### Level 1: Sequences

The root branches into three green oval nodes, processed from left to right:

1. **sequence: carrier**
2. **sequence: voidray**
3. **sequence: other units**

### Level 2: Conditions and Actions

Each sequence contains specific logic paths:

#### Sequence: Carrier (Left Branch)

* **Condition:** A green rectangle stating: `condition: carrier requirements are fulfilled`.
* **Action:** A green rectangle stating: `action: <Train_Carrier()>`.
* **Visual Asset:** An image of a "Carrier" unit from StarCraft II is placed next to the action box.

#### Sequence: Voidray (Middle Branch)

* **Condition:** A green rectangle stating: `condition: voidray requirements are fulfilled`.
* **Action:** A green rectangle stating: `action: <Train_VoidRay()>`.
* **Visual Asset:** An image of a "Void Ray" unit from StarCraft II is placed next to the action box.

#### Sequence: Other Units (Right Branch)

* **Logic:** This branch leads directly to a black square containing a white question mark (**?**), indicating an undefined or default fallback action for units not specified in the primary sequences.

---

## 3. Summary of Textual Information

| Category | Extracted Text / Labels |
| :--- | :--- |
| **Headers** | State Machine Prompting, Behavior Tree Prompting |
| **States/Roots** | aggressive, defensive, root: units training |
| **Sequences** | sequence: carrier, sequence: voidray, sequence: other units |
| **Conditions** | switch to defensive when there are insufficient remaining troops, switch to aggressive when there are sufficient troops, condition: carrier requirements are fulfilled, condition: voidray requirements are fulfilled |
| **Actions** | `action: <All_Units_Attack()>`, `action: <All_Units_Defend()>`, `action: <Train_Carrier()>`, `action: <Train_VoidRay()>` |
| **Symbols** | ? (Question mark) |

---

## 4. Visual/Spatial Analysis

* **Separation:** A dashed horizontal line separates the two prompting methodologies.
* **Color Coding:**
    * **State Machine:** Uses Red (Aggressive/Attack) and Blue (Defensive/Defend) to distinguish between opposing tactical mindsets.
    * **Behavior Tree:** Uses shades of Green for all nodes, suggesting a unified logic flow or "Go" conditions for production.
* **Flow Direction:** The State Machine is cyclical/bi-directional. The Behavior Tree is top-down/hierarchical.

</details>

Figure 1: Framework of State Machine Prompting for LLM-Based Agents.

As shown in Fig. 1, our framework comprises three key components:

- Macro-Strategic State Machine: Defines tactical states (e.g., `<aggressive>`), natural language transition conditions, and state-action mappings.
- Action Implementation Behavior Tree: Implements hierarchical decision-making through selector, sequence, condition, and action nodes.
- Supplementary Atomic Rules: Standalone natural language rules for specific scenarios.

Unlike traditional FSMs requiring exhaustive rule enumeration, our approach uses natural language conditions (e.g., "when resources exceed threshold"), leveraging LLMs’ ability to generalize from partial specifications without manual edge-case handling.
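To make the macro-strategic state machine more concrete, the following minimal Python sketch encodes the two states, actions, and natural language transition conditions from Fig. 1 and renders them as prompt text. It is an illustration under stated assumptions, not the authors' implementation: the `State` class, `render_prompt`, `follows_mapping`, and the exact prompt wording are hypothetical, and the output check is only one possible way to test the strict state-action mapping.

```python
import re
from dataclasses import dataclass, field


@dataclass
class State:
    name: str    # tactical state, e.g. "aggressive"
    action: str  # action mapped to this state
    # target state -> natural-language transition condition
    transitions: dict = field(default_factory=dict)


# States, actions, and conditions are taken from Fig. 1.
STATES = [
    State("aggressive", "<All_Units_Attack()>",
          {"defensive": "there are insufficient remaining troops"}),
    State("defensive", "<All_Units_Defend()>",
          {"aggressive": "there are sufficient troops"}),
]


def render_prompt(states):
    """Render the state machine as prompt text (hypothetical wording)."""
    lines = [
        "Follow this state machine. Report the state as [Tactic]:<state>",
        "and issue the action mapped to that state.",
    ]
    for s in states:
        lines.append(f"State <{s.name}>: action: {s.action}")
        for target, condition in s.transitions.items():
            lines.append(f"  switch to <{target}> when {condition}")
    return "\n".join(lines)


def follows_mapping(states, llm_output):
    """Return True if the reported state appears together with its mapped action.

    This check is an assumption added for illustration; the paper does not
    describe a validator of this kind.
    """
    match = re.search(r"\[Tactic\]:<(\w+)>", llm_output)
    if match is None:
        return False
    mapping = {s.name: s.action for s in states}
    expected = mapping.get(match.group(1))
    return expected is not None and expected in llm_output


if __name__ == "__main__":
    print(render_prompt(STATES))
    print(follows_mapping(STATES, "[Tactic]:<aggressive> <All_Units_Attack()>"))
```

Note that the transition conditions stay as free text: the sketch never evaluates "sufficient troops" itself, which mirrors the idea that the LLM, not hand-written rules, interprets each condition.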
### 3.2 Strategic Memory for Non-Markovian Decision Making

RTS games exhibit non-Markovian characteristics due to fog of war and strategic temporal dependencies. While prior works treated RTS as MDPs:

$$
a_{t}\sim\text{LLM\_Generate}(o_{t},\text{prompt}) \tag{1}
$$

this assumption fails in practice. Our framework introduces strategic memory $M$ storing state variables (e.g., `[Tactic]:<defensive>`), extending the formulation:

$$
(s_{t},a_{t})\sim\text{LLM\_Generate}(o_{t},M_{t-1},\text{prompt}_{sm}) \tag{2}
$$

$$
M_{t}=\text{Update}(M_{t-1},s_{t}) \tag{3}
$$

where $\text{prompt}_{sm}$ denotes our state machine prompt template, enabling persistent tactical coherence across decisions.

### 3.3 Implementation in LLM-PySC2 Environment

We implement our Memory-Augmented State Machine Prompting (MASMP) framework within LLM-PySC2, creating a closed-loop system for StarCraft II.

<details>
<summary>figure2.png Details</summary>

![cec34ba0](/v1/image/cec34ba0ec22e8466d55839d5baa2604f83d05c7b0305bbda98c2536cc3cd0d4)

### Visual Description

# Technical Document Extraction: LLM-PySC2 Framework Architecture

This document provides a comprehensive technical breakdown of the provided architectural diagram, which illustrates a system named **LLM-PySC2**. This system integrates Large Language Models (LLMs) with the StarCraft II game environment via a structured memory and processing pipeline.

## 1. System Overview

The architecture is divided into four primary functional regions:

1. **LLM-PySC2 (Environment Interface):** The bridge to the StarCraft II game engine.
2. **Memory:** A persistent storage layer for strategy and state history.
3. **LLM Input/Output Processing:** The formatting and parsing layer for model communication.
4. **LLMs:** The external reasoning engines (represented by OpenAI, Meta/Llama, and DeepSeek logos).

---

## 2. Component Analysis

### Region 1: LLM-PySC2 (Left Column)

This region handles the conversion between game-state data and textual information.

* **StarCraft II Game:** The core engine, represented by a screenshot of gameplay showing Protoss units (Stalkers and Pylons).
* **Obs-Text converter:** An oval-shaped processing node that transforms raw game observations into text.
* **Observation Extractor:** Receives data from the converter and sends it to the LLM input stage.
* **Text-Action converter:** An oval-shaped node that translates textual commands back into game-executable actions.
* **Action Extractor:** Receives commands from the LLM output stage and feeds them into the converter.

### Region 2: Memory (Center Column)

A centralized database system to maintain context over time.

* **memory.db:** A structured database containing step-by-step logs.
    * *Data Content:* `{step1:[Tactic]:...}`, `{step2:[Tactic]:...}`.
* **get_latest Strategy:** A retrieval component that pulls the most recent tactical state from the database to inform the next LLM prompt.
* **Strategy Extractor:** A component that parses the LLM's output to update the memory database with new tactical decisions.

### Region 3: LLM Input (Top Right)

This block aggregates various data sources into a single prompt for the LLM.

* **Textual Observation:** Real-time data from the Observation Extractor.
* **State Machine Prompt:** A visual/logical template (represented by a diagram with red and blue state boxes and green oval transitions).
* **Last Strategy States:** Historical context retrieved from Memory.
    * *Specific Data:* `[Tactic]:<defensive>`, `[PriorityUnit]:<Voidray>`.

### Region 4: LLM Output (Bottom Right)

This block represents the structured response from the LLM, which is then decomposed.

* **Textual Reasoning:** The model's internal "thought process."
* **New Strategy States:** Updated tactical goals.
    * *Specific Data:* `[Tactic]:<aggressive>`, `[PriorityUnit]:<Carrier>`.
* **Textual Analysis:** Further breakdown of the current situation.
* **Executable Actions:** Concrete game commands.
    * *Specific Data:* `<Train_Carrier()>`, `<All_Units_Attack()>`.
* **Other Textual Content:** Miscellaneous model output.

---

## 3. Data Flow and Logic

The system operates in a closed-loop cycle:

1. **Observation Phase:** The **StarCraft II Game** state is processed by the **Obs-Text converter** into a **Textual Observation**.
2. **Contextualization Phase:** The **Memory** system provides the **Last Strategy States** (e.g., "defensive" with "Voidray").
3. **Inference Phase:** The **LLM Input** (Observation + State Machine + Memory) is sent to the **LLMs** (OpenAI, Meta, or DeepSeek).
4. **Decision Phase:** The LLM returns an **LLM output**.
    * The **Strategy Extractor** identifies a shift in strategy (e.g., changing from "defensive" to "aggressive" and prioritizing "Carrier") and updates **memory.db**.
    * The **Action Extractor** identifies specific commands (e.g., `Train_Carrier()`) and sends them to the **Text-Action converter**.
5. **Execution Phase:** The converted actions are executed within the **StarCraft II Game**, completing the loop.

## 4. Textual Transcriptions

| Category | Transcribed Text |
| :--- | :--- |
| **Headers** | LLM-PySC2, Memory, LLM input, LLM output, LLMs |
| **Process Nodes** | Observation Extractor, Obs-Text converter, StarCraft II Game, Text-Action converter, Action Extractor, Strategy Extractor, get_latest Strategy |
| **Data Fields (Input)** | Textual Observation, State Machine Prompt, Last Strategy States: `[Tactic]:<defensive>`, `[PriorityUnit]:<Voidray>` |
| **Data Fields (Output)** | Textual Reasoning, New Strategy States: `[Tactic]:<aggressive>`, `[PriorityUnit]:<Carrier>`, Textual Analysis, Executable Actions: `<Train_Carrier()>`, `<All_Units_Attack()>`, Other Textual Content |
| **Database Content** | memory.db, `{step1:[Tactic]:...}`, `{step2:[Tactic]:...}` |

</details>

Figure 2: MASMP framework within LLM-PySC2

As shown in Fig. 2, the system integrates textual observations with our prompt template and retrieved memory to form LLM input.
The output is parsed for both action execution and strategy storage.

**Algorithm 1** Workflow of the MASMP Framework in LLM-PySC2

**Input:** Textual Game Observation $o_{t}$, MemoryDB $memory$, State Machine Prompt $prompt_{sm}$, Timestep $t$

**Output:** Action execution, Memory update

1. $last\_strategy \leftarrow memory.\text{get\_latest}()$ {Retrieve via MemoryDB method}
2. $input_{t} \leftarrow \text{CONCAT}(o_{t},prompt_{sm},last\_strategy)$
3. $output \leftarrow \text{LLM\_Generate}(input_{t})$
4. $strategies \leftarrow \text{StrategyExtractor.extract\_strategies}(output)$ {Regex pattern matching}
5. **if** $strategies$ is not empty **then**
6. $memory.\text{add\_memory}(strategies[0],t)$ {Store with timestep}
7. **end if**
8. $\text{ExecuteActions}(output)$

## 4 Experiments and Results

### 4.1 Experimental Setup

We evaluated our method in the LLM-PySC2 environment using StarCraft II’s global gameplay scenario on the map Simple64. Experiments used DeepSeek-V3 with Easy Build/Control Mode enabled, testing against the built-in AI (difficulty levels 1-7) under symmetric fair-play conditions. Win rate ($WR$) was used as the evaluation metric:

$$
WR=\frac{N_{\text{win}}}{N_{\text{total}}}\times 100\% \tag{4}
$$

where $N_{\text{win}}$ is the number of games won and $N_{\text{total}}$ is the total number of games played.

### 4.2 Experimental Results

As shown in Table 1, our Memory-Augmented State Machine Prompting (MASMP) approach matches the baseline at Lv1-Lv3, where both reach 100%, and outperforms it at every higher difficulty level. While the baseline drops to 40% at Lv4 and Lv5 and fails completely at Lv6 and Lv7 (0%), MASMP maintains perfect win rates at Lv1-Lv5 and achieves 80% and 60% win rates at Lv6 and Lv7, respectively. This demonstrates that LLM agents can now compete with professionally-engineered rule-based AI through our integrated approach.

Table 1: Win Rate Comparison between Baseline and MASMP

| Method | Lv1 | Lv2 | Lv3 | Lv4 | Lv5 | Lv6 | Lv7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 100% | 100% | 100% | 40% | 40% | 0% | 0% |
| MASMP | 100% | 100% | 100% | 100% | 100% | 80% | 60% |

### 4.3 Comparative Analysis

#### 4.3.1 Strategic Coherence & Dynamic Comprehension

Fig. 3 illustrates MASMP’s dynamic strategy adaptation.
The agent transitions from the defensive to the aggressive state upon achieving a force advantage (Step75), maintains aggression while assessing battle progress (Step76), and strategically retreats when detecting enemy reinforcements (Step77), successfully preserving its forces (Step78). This demonstrates coherent tactical tempo control and causal reasoning capabilities absent in memoryless baselines.

<details>
<summary>figure3a.png Details</summary>

![35a5e192](/v1/image/35a5e19284c7aeb4cb582d70a5db91303596ee3273486d24efdfb97b26c7ca28)

### Visual Description

# Technical Document Extraction: StarCraft II Gameplay Interface

This document provides a comprehensive extraction of the textual and data-driven information contained within the provided image, which depicts a real-time strategy game interface (StarCraft II).

## 1. Component Isolation

The image is segmented into four primary functional regions:

* **Header (Top Bar):** Resource management and player status.
* **Main Viewport (Center):** Tactical unit display and chat log.
* **Minimap & Selection (Bottom Left/Center):** Spatial orientation and unit control groups.
* **Command & Replay Interface (Bottom Right):** Playback controls and unit abilities.

---

## 2. Header Data (Resource Management)

Located at the top right of the screen, tracking two players (Blue and Red).

| Player Icon | Minerals | Vespene Gas | Supply (Current/Max) |
| :--- | :--- | :--- | :--- |
| **Blue (Protoss)** | 135 | 79 | 102/118 |
| **Red (Protoss)** | 1795 | 136 | 112/118 |

* **Top Left Dropdown:** `None (N)`

---

## 3. Main Viewport (Tactical Information)

### Textual Overlay (Chat/Log)

Located in the lower-center of the viewport:

* **Text:** `[All] MainAgentLLMPysc2: None Step75 (08:43:17) <All_Units_Attack()>`
* **Context:** This indicates an automated agent command issued at the 8-minute, 43-second mark.

### Visual Components

* **Units:** A cluster of Protoss units (Void Rays and Stalkers) are grouped in the center.
* **Structures:** A Protoss Pylon and a Nexus (partially visible) are on the left.
* **Worker Status:** `Workers: 16/16` (displayed above the minimap area).

---

## 4. Minimap and Selection (Bottom Left/Center)

### Minimap Data

* **Location:** Bottom left corner.
* **Features:** Shows a blue base (top left) and a green unit cluster (bottom left) within a white boundary box.
* **Time Display:** `8:43`

### Control Groups

* **F1:** 2 units
* **F2:** 24 units
* **W:** 2 units

### Unit Selection Grid (Center Console)

The grid shows the currently selected army.

* **Top Row (Icons 1-8):** 8 Void Rays (indicated by green wireframe icons).
* **Middle Row (Icons 9-16):** 2 Void Rays, followed by 6 Stalkers.
* **Bottom Row (Icons 17-24):** 8 Stalkers.
* **Total Selection:** 10 Void Rays, 14 Stalkers.
* **Sub-selection Tabs:**
    * Void Ray icon: `8`
    * Stalker icon: `7` (Note: This indicates the active sub-group being viewed).

---

## 5. Command & Replay Interface (Bottom Right)

### Replay Controls (Top of Section)

* **Timer:** `8:43 / 19:22`
* **Speed:** `Normal`
* **Buttons:** Play/Pause, Rewind, Slow Down (-), Speed Up (+).
* **Active Player View:** `MainAgentLLMP...` (Dropdown menu).

### Command Card (Bottom Right Corner)

* **Instructional Text:**
    * `[Right Click Icon] to Move`
    * `[A] then [Left Click Icon] ground to Auto-attack`
* **Active Ability Icon:**
    * **Label:** `E`
    * **Visual:** Prismatic Alignment (Void Ray ability).

### Menu Buttons (Far Right)

* `?` (Help)
* `[Speech Bubble]` (Chat)
* `[List]` (Log)
* `Menu` (System Menu)
* **Unspent Points/Alerts:** `0` (Green circle above Menu).

---

## 6. Trend and Fact Summary

* **Economic Trend:** The Red player has a significant mineral lead (1795 vs 135) but similar supply counts, suggesting the Blue player (the agent) has spent more of their resources on the current army.
* **Game State:** The game is at the 08:43 mark of a 19:22 duration replay.
* **Action:** The command `<All_Units_Attack()>` has just been triggered, corresponding with the army cluster moving toward the center-right of the map.

</details>

(a) Step75: `<defensive>` $\rightarrow$ `<aggressive>`

<details>
<summary>figure3b.png Details</summary>

![cc58bc94](/v1/image/cc58bc94cf270177c74d94aca02f445f299abff21b3a1a7db1cffff89549b1de)

### Visual Description

# Technical Document Extraction: StarCraft II Gameplay Interface

This document provides a comprehensive extraction of textual and data-driven information from the provided image, which depicts a real-time strategy game (StarCraft II) featuring an automated agent.

## 1. Header / Global Status Bar (Top Right)

This section tracks the resources and supply for two players (Blue and Red).

| Player Color | Minerals (Icon) | Vespene Gas (Icon) | Supply (Icon) |
| :--- | :--- | :--- | :--- |
| **Blue** | 155 | 17 | 107 / 118 |
| **Red** | 1785 | 82 | 118 / 118 |

## 2. Replay / Control Interface (Middle Right)

* **Time Elapsed:** 8:55 / 19:22
* **Game Speed:** Normal
* **Active Perspective:** MainAgentLLMP... (Dropdown menu selected)
* **Controls:** Standard replay controls (Play/Pause, Skip Back, Speed Decrease/Increase, Camera Lock, Menu).

## 3. In-Game Chat / Log (Center Overlay)

The center of the screen contains a series of logs from an automated agent named `MainAgentLLMPysc2`. The logs follow a specific syntax: `[All] [AgentName]: [ActionType] [Step] ([Timestamp]) <[Command]>`.

| Step | Timestamp | Command / Action |
| :--- | :--- | :--- |
| Step 75 | 08:43:71 | `<Train_Phoenix()>` |
| Step 75 | 08:43:88 | `<ChronoBoost_Military()>` |
| Step 76 | 08:54:28 | `<Build_Pylon()>` |
| Step 76 | 08:54:51 | `<Warp_Adept()>` |
| Step 76 | 08:54:64 | `<Warp_Adept()>` |
| Step 76 | 08:54:78 | `<Research_ProtossAirWeapons()>` |
| Step 76 | 08:54:91 | `<ChronoBoost_Military()>` |

## 4. Minimap and Control Groups (Bottom Left)

* **Minimap:** Shows a base in the top left (Blue) and a base in the bottom right (Green). The current camera view is focused on the Blue base.
* **Control Groups:**
    * **F1:** 2 (Idle Workers)
    * **F2:** 26 (Army Units)
    * **W:** 0 (Warp Gates)
* **Clock:** 8:55

## 5. Unit Selection and Command Card (Bottom Center/Right)

### Selection Tray

The tray shows 10 units currently selected (likely Phoenixes and Adepts based on the icons and chat log).

* **Row 1:** 8 units. Colors indicate health/shield status (Green = Full, Yellow/Orange = Damaged).
* **Row 2:** 2 units. Both are Green (Full health).

### Command Card (Bottom Right)

* **Instructional Text:**
    * "[Right Click Icon] to Move"
    * "[A] then [Left Click Icon] ground to Auto-attack"
* **Active Ability Icon:** "E" (Likely Graviton Beam for Phoenixes).

## 6. Visual Scene Analysis (Main View)

* **Environment:** Dark, rocky terrain (Protoss-themed map).
* **Structures:** Multiple Protoss structures are visible, including a Nexus (center-top), Pylons, and Stargates.
* **Units:** A large cluster of Protoss units is gathered in the center.
    * **Phoenixes:** Numerous aerial units emitting blue energy beams (Chrono Boost effect) toward the Stargates.
    * **Adepts:** Ground units positioned below the Phoenixes.
* **Visual Effects:** Bright blue "Chrono Boost" animations are active on the production structures, correlating with the chat logs `<ChronoBoost_Military()>`.

## 7. Technical Summary

The image captures an AI agent (`MainAgentLLMPysc2`) managing a Protoss army and economy. The agent is currently at "Step 76" of its execution, focusing on air unit production (Phoenixes), air weapon research, and warping in ground reinforcements (Adepts). The agent is utilizing "Chrono Boost" to accelerate these processes. The Red player has a significant mineral lead (1785 vs 155), while the Blue player (the agent) is actively spending resources on production.
</details>

(b) Step76: `<aggressive>`

<details>
<summary>figure3c.png Details</summary>

![4fd08cdc](/v1/image/4fd08cdc02194f7098a49518cb103c00bf76d3377d588a24f69140e8ebf73304)

### Visual Description

# Technical Document Extraction: StarCraft II Gameplay Interface

## 1. Image Overview

This image is a high-resolution screenshot of the real-time strategy game **StarCraft II**. It depicts a combat engagement between two Protoss factions (distinguished by Blue and Red team colors) on a dark, rocky terrain map. The interface includes a Head-Up Display (HUD) with resource counts, a minimap, unit selection wireframes, and a replay control console.

---

## 2. Component Isolation & Data Extraction

### A. Header / Resource Bar (Top Right)

This section tracks the economic and population status of the two players.

| Player Color | Minerals (Blue Icon) | Vespene Gas (Green Icon) | Supply (Food Icon) |
| :--- | :--- | :--- | :--- |
| **Blue** | 170 | 61 | 94/111 |
| **Red** | 2045 | 166 | 88/118 |

* **Top Left Dropdown:** Displays "None [N]" in a green-bordered box.

### B. Main Game World (Center)

* **Action:** Red Stalkers (robotic quadruped units) are attacking a Blue Protoss base. One Red Stalker is firing a blue particle beam at a Blue unit.
* **Structures:** Multiple Protoss structures are visible, including Pylons (power sources), Gateways/Warpgates (unit production), and a Nexus (base hub).
* **Chat Log (Center Bottom):**
    * `[All] MainAgentLLMPysc2: None Step77 (09:05:31)`
    * `<All_Units_Retreat()>` (Note: This appears to be a scripted command or bot output).

### C. Replay Control Console (Middle Right)

* **Time:** `9:05 / 19:22`
* **Speed:** `Normal`
* **Controls:** Standard playback buttons (Play/Pause, Rewind, Speed Decrease `-`, Speed Increase `+`).
* **Active Perspective:** `MainAgentLLMP...` (Dropdown menu).

### D. Minimap (Bottom Left)

* **Spatial Grounding:** The map shows a two-player layout.
* **Blue Presence:** Concentrated in the top-left quadrant (Main base and natural expansion).
* **Green Presence:** Concentrated in the bottom-right quadrant.
* **Current View:** A white trapezoid indicates the camera is currently focused on the Blue player's natural expansion (top-left area).
* **Timer:** `9:05` is displayed above the minimap.
* **Unit Counts (Above Minimap):**
    * `2` (F1 key icon)
    * `18` (F2 key icon - Select All Army)
    * `0` (W key icon - Warp Gates)

### E. Unit Selection Panel (Bottom Center)

This panel shows the currently selected group of units. The units are highlighted with colored wireframes indicating their health/status.

* **Total Units Selected:** 21 units are visible in the grid.
* **Composition:**
    * **Stalkers:** 11 units (Mix of Green and Yellow health status).
    * **Sentries:** 8 units (Mix of Green and Yellow health status).
    * **Zealots:** 2 units (One Green, one Red health status).
* **Control Groups:**
    * Group `1`: Contains 8 units (indicated by a small icon above the grid).
    * Group `2`: Contains 3 units.

### F. Command Card & Info (Bottom Right)

* **Portrait:** A 3D animated portrait of a Protoss Zealot.
* **Instructional Text:**
    * `[Right Click Icon] to Move`
    * `[A] then [Left Click Icon] ground to Auto-attack`
* **Active Ability:** A blue icon with the letter `G` (Guardian Shield).

---

## 3. Trend & Logic Verification

* **Economic Trend:** The Red player has a massive surplus of Minerals (2045) compared to the Blue player (170), suggesting Red is either floating resources or has a significantly stronger economy but lower spending efficiency at this timestamp.
* **Combat Status:** The selection grid shows one Zealot in "Red" health, indicating it is near death. The presence of "Yellow" wireframes across Stalkers and Sentries confirms an ongoing or very recent engagement where shields have been depleted and hull damage has been taken.
* **Game Phase:** At 09:05, the game is in the mid-game phase, with both players having established at least two bases and reached mid-tier tech (Stalkers/Sentries).

## 4. Language Declaration

* **Primary Language:** English.
* **Other Languages:** None detected. All technical terms and UI elements are in English.

</details>

(c) step77: <aggressive> $→$ <defensive>

<details>
<summary>figure3d.png Details</summary>

![b67a3146](/v1/image/b67a3146524eab2b307a36937dc92ca2a7040c1075b93e4838e67f5b3e0320e1)

### Visual Description

# Technical Document Extraction: StarCraft II Gameplay Interface

This document provides a comprehensive extraction of textual and data-driven information from the provided image, which depicts a StarCraft II replay or live session involving an automated agent.

## 1. Header Information (Top Bar)

The top bar contains resource counts and supply information for two players, identified by color-coded icons.

| Player Color | Minerals | Vespene Gas | Supply (Current/Max) |
| :--- | :--- | :--- | :--- |
| **Blue** | 255 | 93 | 95/111 |
| **Red** | 2070 | 112 | 82/118 |

* **Application Title (Top Left):** StarCraft II
* **Active View (Top Left Dropdown):** None (N)

## 2. Central Log / Event Feed

A series of text logs are overlaid in the center of the screen, detailing the actions of an agent named `MainAgentLLMPysc2`.

| Timestamp | Agent/Channel | Action/Command |
| :--- | :--- | :--- |
| (09:05:45) | [All] MainAgentLLMPysc2 | None Step77 `<All_Units_Defend()>` |
| (09:05:62) | [All] MainAgentLLMPysc2 | None Step77 `<Train_VoidRay()>` |
| (09:05:80) | [All] MainAgentLLMPysc2 | None Step77 `<Train_VoidRay()>` |
| (09:05:94) | [All] MainAgentLLMPysc2 | None Step77 `<Warp_Adept()>` |
| (09:06:03) | [All] MainAgentLLMPysc2 | None Step77 `<Build_Stargate()>` |
| (09:06:25) | [All] MainAgentLLMPysc2 | None Step77 `<Research_ProtossAirWeapons()>` |
| (09:16:56) | [All] MainAgentLLMPysc2 | None Step78 `<All_Units_Defend()>` |

## 3. Replay Control Interface (Middle Right)

A control panel for playback is visible.

* **Current Time / Total Time:** 9:16 / 19:22
* **Playback Speed:** Normal
* **Selected Perspective:** MainAgentLLMP... (Dropdown)
* **Controls:** Standard playback buttons (Play/Pause, Rewind, Fast Forward, Speed Adjustments).

## 4. Unit Selection and Command Card (Bottom Center/Right)

### Selection Tray (Bottom Center)

The tray shows a group of units currently selected or active. The units are Protoss, primarily Void Rays and Adepts.

* **Top Row (8 units):** 1 Green, 1 Red, 1 Yellow, 2 Green, 1 Yellow, 2 Green.
* **Bottom Row (5 units):** 1 Red, 4 Green.
* **Control Group Indicator:** A small "1" is visible above the first unit in the tray.

### Command Card (Bottom Right)

* **Portrait:** Protoss Adept.
* **Instructional Text:**
    * "[Right Click Icon] to Move"
    * "[A] then [Left Click Icon] ground to Auto-attack"
* **Active Ability Icon:** A blue icon with a "G" hotkey (likely Psionic Transfer).

## 5. Minimap and Navigation (Bottom Left)

* **Minimap:** Shows a two-player map.
    * **Blue base:** Located at the top left.
    * **Green base:** Located at the bottom right.
    * **Camera Viewport:** A white trapezoid indicates the current camera focus is near the center-left of the map.
* **Hotkeys (Above Minimap):**
    * F1: 2 (Idle Workers)
    * F2: 13 (Army Units)
    * W: 2 (Warp Gates)
* **Game Clock:** 9:16

## 6. Visual Scene Description (Main Viewport)

* **Environment:** A dark, rocky, lunar-style terrain with cliffs.
* **Units:** A cluster of Protoss **Void Rays** (approximately 7-8) are hovering near a cliff edge. They are emitting blue energy glows.
* **Structures:** Several Protoss **Pylons** (providing power) and **Stargates** (production structures) are visible in the immediate vicinity of the units.
* **Spatial Grounding:** The units are positioned centrally, slightly above the log text. The structures are clustered below the units.
</details>

(d) step78: <defensive>

Figure 3: Dynamic Strategy Adaptation Case

4.3.2 Solving the Greedy Trap: Long-Term Planning

<details>
<summary>x1.png Details</summary>

![fa1b1c72](/v1/image/fa1b1c7201b2af0160bb83ae6a1efb0edbeeab94e47294db3688a57020047b4a)

### Visual Description

# Technical Data Extraction: Unit Ratio Comparison

## 1. Document Overview

This image contains two side-by-side pie charts comparing unit compositions between two different scenarios: **Baseline** and **MASMP**. The data relates to real-time strategy game unit compositions (specifically Protoss units from StarCraft II).

### Header Information

- **Main Title (Center):** Total Population: Baseline=43.3, MASMP=45.6
- **Left Chart Title:** Baseline Unit Ratio
- **Right Chart Title:** MASMP Unit Ratio

---

## 2. Component Isolation & Data Extraction

### Region A: Baseline Unit Ratio (Left Chart)

**Total Population Context:** 43.3

| Unit Category | Color | Percentage | Visual Trend/Placement |
| :--- | :--- | :--- | :--- |
| **Zealot** | Red/Salmon | 57.4% | Largest slice; spans from ~6 o'clock to ~12 o'clock. |
| **Advanced** | Dark Blue/Purple | 19.4% | Second largest; located in the top right quadrant. |
| **Stalker** | Yellow/Gold | 17.9% | Third largest; located in the bottom right quadrant. |
| **Adapt** | Light Blue | 5.3% | Smallest slice; located between Zealot and Stalker. |

### Region B: MASMP Unit Ratio (Right Chart)

**Total Population Context:** 45.6

| Unit Category | Color | Percentage | Visual Trend/Placement |
| :--- | :--- | :--- | :--- |
| **Advanced** | Dark Blue/Purple | 40.2% | Largest slice; spans from ~12 o'clock to ~4 o'clock. |
| **Zealot** | Red/Salmon | 35.4% | Second largest; spans from ~8 o'clock to ~12 o'clock. |
| **Adapt** | Light Blue | 16.5% | Third largest; located in the bottom left quadrant. |
| **Stalker** | Yellow/Gold | 7.9% | Smallest slice; located in the bottom right quadrant. |

---

## 3. Comparative Analysis & Key Trends

By comparing the two charts, the following shifts in unit composition are observed:

1. **Advanced Units:** Show the most significant growth, increasing from **19.4%** in Baseline to **40.2%** in MASMP (a +20.8 percentage point increase).
2. **Zealots:** Show a significant decrease, dropping from **57.4%** in Baseline to **35.4%** in MASMP (a -22.0 percentage point decrease).
3. **Adapts:** Show a notable increase in presence, growing from **5.3%** to **16.5%** (+11.2 percentage points).
4. **Stalkers:** Show a decrease in presence, dropping from **17.9%** to **7.9%** (-10.0 percentage points).
5. **Total Population:** The MASMP scenario has a slightly higher total population (**45.6**) compared to the Baseline (**43.3**).

## 4. Text Transcription Summary

- **Titles:** "Baseline Unit Ratio", "MASMP Unit Ratio", "Total Population: Baseline=43.3, MASMP=45.6"
- **Labels:** "Zealot", "Advanced", "Stalker", "Adapt"
- **Values:** "57.4%", "19.4%", "17.9%", "5.3%", "35.4%", "40.2%", "7.9%", "16.5%"

</details>

Figure 4: Early-game Unit Production Ratio

Fig. 4 quantifies MASMP's advantage in long-term planning. At the 7-minute mark, MASMP has produced 18.32 advanced units (40.2% of its total) versus the baseline's 8.40 (19.6%), with a more diversified composition (Zealot ratio: 35.5% vs. 57.8%). Through state variables such as [PriorityUnit], our method steers resource allocation toward technological advancement, avoiding the baseline's greedy trap of spamming low-tier units.

4.3.3 Advantages Over Traditional State Machines

MASMP maintains structural constraints while preserving the advantages of LLMs:

- Interpretability: Natural language justifications for state transitions
- Generalization: Semantic adaptation to unseen scenarios
- Creativity: Autonomous employment of unspecified counters

The probabilistic formulation enables fuzzy reasoning, which removes the need of traditional FSMs for precise thresholds and manual rule programming.
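The mechanism discussed in this section (a state label and a [PriorityUnit] variable carried across decision cycles, with actions restricted by the current state) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the reply format, the `StrategicMemory` class, the `step` function, the state-to-action table, and the `All_Units_Attack` action are hypothetical. The remaining action names and the `<aggressive>` and `<defensive>` states are taken from the Figure 3 logs and captions.

```python
import re
from dataclasses import dataclass

# Hypothetical state -> allowed-action table. Action names other than
# All_Units_Attack appear in the Figure 3 logs; the grouping is an assumption.
ALLOWED_ACTIONS = {
    "aggressive": {"All_Units_Attack", "Train_VoidRay", "Warp_Adept"},
    "defensive": {
        "All_Units_Defend", "Train_VoidRay", "Warp_Adept",
        "Build_Stargate", "Research_ProtossAirWeapons",
    },
}

# Assumed reply format: "[State]: <name>", "[PriorityUnit]: Name", "<Action()>".
STATE_RE = re.compile(r"\[State\]:\s*<(\w+)>")
PRIORITY_RE = re.compile(r"\[PriorityUnit\]:\s*(\w+)")
ACTION_RE = re.compile(r"<(\w+)\(\)>")


@dataclass
class StrategicMemory:
    """Strategic variables that persist across decision cycles."""
    state: str = "aggressive"
    priority_unit: str = "VoidRay"

    def to_prompt(self) -> str:
        # Text injected into the next prompt so the LLM sees its own plan.
        return f"[State]: <{self.state}>\n[PriorityUnit]: {self.priority_unit}"


def step(memory: StrategicMemory, llm_reply: str) -> list[str]:
    """Update memory from one LLM reply; return actions valid in that state."""
    state_match = STATE_RE.search(llm_reply)
    # Unknown states are ignored, so the previous state persists.
    if state_match and state_match.group(1) in ALLOWED_ACTIONS:
        memory.state = state_match.group(1)
    priority_match = PRIORITY_RE.search(llm_reply)
    if priority_match:
        memory.priority_unit = priority_match.group(1)
    allowed = ALLOWED_ACTIONS[memory.state]
    # Strict state-action mapping: drop actions the current state forbids.
    return [name for name in ACTION_RE.findall(llm_reply) if name in allowed]
```

Calling `step` with a reply that declares `<defensive>` and lists `<All_Units_Defend()>`, `<Train_VoidRay()>`, and `<All_Units_Attack()>` would keep the first two actions and drop the third, and a later reply that omits the state line would still be filtered as defensive. In this sketch the filter is a hard rule applied after generation; the paper instead describes guiding the LLM through prompts, so the table here stands in for constraints that would be expressed in natural language.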
5 Conclusion and Future Work

We proposed Memory-Augmented State Machine Prompting (MASMP), a novel framework that bridges LLM flexibility with rule-based reliability for RTS games. MASMP integrates state machine prompting with strategic memory, achieving a 60% win rate against StarCraft II's highest-difficulty built-in AI (Lv7), whereas the baselines win 0% of their games. This demonstrates the potential of hybrid neuro-symbolic architectures for complex decision-making. Future work includes exploring multi-agent coordination, dynamic prompt optimization, and cross-domain applications.

References

- [1] Vinyals, O., Babuschkin, I., Czarnecki, W.M., et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575 (7782), 350–354 (2019)
- [2] Xu, X., Wang, Y., Xu, C., et al.: A survey on game playing agents and large models: Methods, applications, and challenges. https://arxiv.org/abs/2403.10249, last accessed 2025/6/1
- [3] Goecks, V.G., Waytowich, N.: COA-GPT: Generative pre-trained transformers for accelerated course of action development in military operations. In: 2024 International Conference on Military Communication and Information Systems (ICMCIS), pp. 1–10. IEEE (2024)
- [4] Ghost in the Minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory. https://arxiv.org/abs/2305.17144, last accessed 2025/6/1
- [5] Fauzi, R., Hariadi, M., Nugroho, S.M.S., et al.: Defense behavior of real time strategy games: Comparison between HFSM and FSM. Indonesia Journal of Electrical Engineering and Computer Science 13 (2), 634–642 (2019)
- [6] Jagdale, D.: Finite state machine in game development. International Journal of Advanced Research in Science, Communication and Technology 10 (1) (2021)
- [7] Hu, W., Zhang, Q., Mao, Y.: Component-based hierarchical state machine: A reusable and flexible game AI technology. In: 2011 6th IEEE Joint International Information Technology and Artificial Intelligence Conference, vol. 2, pp. 319–324. IEEE (2011)
- [8] Sekhavat, Y.A.: Behavior trees for computer games. International Journal on Artificial Intelligence Tools 26 (2), 1730001 (2017)
- [9] Sun, P., Sun, X., Han, L., et al.: TStarBots: Defeating the cheating level builtin AI in StarCraft II in the full game. https://arxiv.org/abs/1809.07193, last accessed 2025/6/1
- [10] Ma, W., Mi, Q., Zeng, Y., et al.: Large language models play StarCraft II: Benchmarks and a chain of summarization approach. Advances in Neural Information Processing Systems 37, 133386–133442 (2024)
- [11] Li, Z., Ni, Y., Qi, R., et al.: LLM-PySC2: StarCraft II learning environment for large language models. https://arxiv.org/abs/2411.05348, last accessed 2025/6/1
- [12] Schmied, T., Bornschein, J., Grau-Moya, J., et al.: LLMs are greedy agents: Effects of RL fine-tuning on decision-making abilities. https://arxiv.org/abs/2504.16078, last accessed 2025/6/1
- [13] Shao, X., Jiang, W., Zuo, F., et al.: SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models. https://arxiv.org/abs/2401.17749, last accessed 2025/6/1
