2510.18395v1

# Memory-Augmented State Machine Prompting: A Novel LLM Agent Framework for Real-Time Strategy Games

**Authors**: Runnan Qi, Yanan Ni, Lumin Jiang, Zongyuan Li, Kuihua Huang, Xian Guo

**Affiliations**: National University of Defense Technology, Changsha, China (email: {qirunnan13579, niyanan, khhuang}@nudt.edu.cn); Nankai University, Tianjin, China (email: {2120230524, guoxian}@mail.nankai.edu.cn)

## Abstract

This paper proposes Memory-Augmented State Machine Prompting (MASMP), a novel framework for LLM agents in real-time strategy games. Addressing key challenges such as hallucinations and fragmented decision-making in existing approaches, MASMP integrates state machine prompting with memory mechanisms to unify structured actions with long-term tactical coherence. The framework features: (1) a natural language-driven state machine architecture that guides LLMs to emulate finite state machines and behavior trees through prompts, and (2) a lightweight memory module preserving strategic variables (e.g., tactics, priority units) across decision cycles. Experiments in StarCraft II show that MASMP achieves a 60% win rate against the hardest built-in AI (Lv7), whereas the baseline wins 0% of its games. Case studies reveal that the method retains LLMs’ semantic comprehension while resolving the "Knowing-Doing Gap" through strict state-action mapping, achieving both interpretability and FSM-like reliability. This work establishes a new paradigm for combining neural and symbolic AI in complex decision-making.

## 1 Introduction

Real-time strategy (RTS) games like StarCraft II represent a grand challenge for AI, testing capabilities in real-time decision-making, long-term planning, and strategic adaptation. While reinforcement learning agents like AlphaStar have achieved superhuman performance [1], they require immense computational resources and lack interpretability.
In contrast, Large Language Model (LLM)-based agents offer a promising alternative by simulating the human “perception-reasoning-action” cycle [2], demonstrating strong potential across domains from military planning (COA-GPT [3]) to complex game environments like Minecraft (GITM [4]).

However, in complex RTS environments, LLM agents face critical limitations that prevent them from competing effectively. They suffer from hallucinations (generating invalid actions), greedy decision-making (prioritizing short-term gains over long-term strategy), and fragmented execution (inconsistent actions across decision cycles due to a lack of memory). These issues result in poor performance; for instance, the LLM-PySC2 agent achieves only an 8% win rate against level-6 and 0% against level-7 built-in AI [11].

To overcome these challenges, we propose the Memory-Augmented State Machine Prompting (MASMP) framework. Our work is built upon LLM-PySC2, a text-based API that provides a natural language interface for StarCraft II, enabling LLMs to process game observations and output actions. MASMP integrates state-machine prompting to enforce structured, reliable decision-making and a strategic memory module to maintain long-term tactical coherence. Our agent achieves a 60% win rate against the hardest built-in AI (Lv7), significantly outperforming all previous LLM-based baselines. This work demonstrates the potential of hybrid neuro-symbolic architectures for complex decision-making tasks.

## 2 Related Works

### 2.1 Traditional RTS Game Agents

Traditional RTS game agents have long relied on rule-based systems, where finite state machines (FSMs) [5, 6] and hierarchical FSMs [7] stand out for their simplicity and reliability. Behavior trees offer another effective approach, handling complex concurrent tasks through modular designs [8]. These methods form the backbone of built-in AI systems in popular titles like StarCraft II and Age of Empires.
The field advanced significantly with reinforcement learning, particularly AlphaStar’s breakthrough in achieving superhuman performance in StarCraft II through deep neural networks and imitation learning [1]. Other approaches employing either RL or rule-based methods have also demonstrated strong performance against built-in AI [9]. However, both paradigms face inherent limitations: RL agents demand substantial computational resources and struggle to adapt to novel strategies, while rule-based systems require extensive manual engineering and lack true environmental comprehension.

### 2.2 Large Language Models in RTS Games

The emergence of LLM-based agents has introduced a new paradigm for RTS games. TextStarCraftII [10] pioneered LLM integration with StarCraft II, while LLM-PySC2 [11] enhanced this approach with multi-agent coordination and full action space support, establishing itself as a standard experimental platform.

This shift comes with significant challenges. LLM agents are plagued by hallucinations (generating impossible actions), local greediness (prioritizing short-term gains) [12], strategic inconsistency (incoherent planning across decision cycles), and a pronounced Knowing-Doing Gap (failing to execute well-reasoned plans) [12]. Consequently, their performance remains limited, with reported win rates as low as 8% against intermediate-level (Lv6) and 0% against expert-level (Lv7) built-in AI [11].

Current improvement strategies include:

- Prompt Engineering: Employing few-shot learning and Chain-of-Thought in both TextStarCraftII [10] and LLM-PySC2 [11]
- Hybrid Architectures: Integrating rule-based automation (e.g., Easy Build Mode) in LLM-PySC2 [11] and combining LLM with FSM in SwarmBrain [13]

These contributions highlight a crucial insight: LLMs can benefit from traditional rule-based systems as complementary modules for achieving stateful and reliable decision-making.
## 3 Memory-Augmented State Machine Prompting

### 3.1 State Machine Prompting for LLM-Based Agents

To enhance decision-making reliability in RTS games, we propose State Machine Prompting, a novel approach that guides LLMs to emulate structured decision patterns of finite state machines (FSMs) and behavior trees through natural language.

<details>
<summary>2510.18395v1/figure1.png Details</summary>

### Visual Description

**State Machine Prompting** (header in black text on a white background)

- **Aggressive state** (red rectangle): label `aggressive`, action `<All_Units_Attack()>`.
  - `switch to defensive when there are insufficient remaining troops` (blue arrow to the defensive state)
  - `switch to aggressive when there are sufficient troops` (red arrow to itself)
- **Defensive state** (blue rectangle): label `defensive`, action `<All_Units_Defend()>`.
  - `switch to defensive when there are insufficient remaining troops` (blue arrow to itself)

**Behavior Tree Prompting** (header in black text on a white background)

- **Root node** (green oval): `root: units training`, with three child sequences:
  1. `sequence: carrier` (green oval): condition `carrier requirements are fulfilled`, action `<Train_Carrier()>`, shown next to a Carrier unit icon.
  2. `sequence: voidray` (green oval): condition `voidray requirements are fulfilled`, action `<Train_VoidRay()>`, shown next to a Voidray unit icon.
  3. `sequence: other units` (green oval): action shown as `?` (black square with a white question mark).

**Key observations**

1. State machine logic: the aggressive and defensive states toggle based on troop availability, and each state maps to one atomic action (`All_Units_Attack` or `All_Units_Defend`).
2. Behavior tree hierarchy: the root branches into three training sequences (root → sequence → condition → action). The carrier and voidray sequences have explicit requirement checks; the "other units" sequence has no defined action.
3. Color coding: red for the aggressive state, blue for the defensive state, green for behavior tree elements.

</details>

Figure 1: Framework of State Machine Prompting for LLM-Based Agents.

As shown in Fig. 1, our framework comprises three key components:

- Macro-Strategic State Machine: Defines tactical states (e.g., `<aggressive>`), natural language transition conditions, and state-action mappings.
- Action Implementation Behavior Tree: Implements hierarchical decision-making through selector, sequence, condition, and action nodes.
- Supplementary Atomic Rules: Standalone natural language rules for specific scenarios.

Unlike traditional FSMs requiring exhaustive rule enumeration, our approach uses natural language conditions (e.g., "when resources exceed threshold"), leveraging LLMs’ ability to generalize from partial specifications without manual edge-case handling.

### 3.2 Strategic Memory for Non-Markovian Decision Making

RTS games exhibit non-Markovian characteristics due to fog of war and strategic temporal dependencies. Prior works treated RTS decision-making as a Markov decision process (MDP), in which the action depends only on the current observation:

$$ a_{t}\sim\text{LLM\_Generate}(o_{t},\text{prompt}) \tag{1} $$

This assumption fails in practice. Our framework introduces a strategic memory $M$ that stores state variables (e.g., `[Tactic]:<defensive>`), extending the formulation to:

$$ (s_{t},a_{t})\sim\text{LLM\_Generate}(o_{t},M_{t-1},\text{prompt}_{sm}) \tag{2} $$

$$ M_{t}=\text{Update}(M_{t-1},s_{t}) \tag{3} $$

where $\text{prompt}_{sm}$ denotes our state machine prompt template, enabling persistent tactical coherence across decisions.

### 3.3 Implementation in LLM-PySC2 Environment

We implement our Memory-Augmented State Machine Prompting (MASMP) framework within LLM-PySC2, creating a closed-loop system for StarCraft II.

<details>
<summary>2510.18395v1/figure2.png Details</summary>

### Visual Description

The diagram depicts a system architecture that connects the StarCraft II game, LLM-PySC2, a memory module, and the LLM.

- **LLM-PySC2 (left column)**: Observation Extractor, Obs-Text Converter, Text-Action Converter, and Action Extractor, connected to the StarCraft II game (shown as a screenshot).
- **Memory (center)**: `get_latest Strategy` query and `memory.db` storage with entries `{step1: [Tactic: ...]}` and `{step2: [Tactic: ...]}`, plus a Strategy Extractor.
- **LLM input (right column)**:
  - Textual Observation (game state data)
  - State Machine Prompt (node diagram with a red "state" node and a blue "action" node)
  - Last Strategy States: `[Tactic]:<defensive>`, `[PriorityUnit]:<Voidray>`
- **LLM output (right column)**:
  - Textual Reasoning and Textual Analysis
  - New Strategy States: `[Tactic]:<aggressive>`, `[PriorityUnit]:<Carrier>`
  - Executable Actions (highlighted in green): `<Train_Carrier()>`, `<All_Units_Attack()>`
- **LLM services**: a hexagonal node icon and a whale icon.

**Data flow**

1. Observation extraction: raw game data → Observation Extractor → textual representation via the Obs-Text Converter.
2. Memory integration: the memory system supplies the Last Strategy States.
3. LLM processing: textual observation + State Machine Prompt + Last Strategy States → LLM → New Strategy States + Executable Actions.
4. Output handling: LLM output → Strategy Extractor (strategies stored in `memory.db`) and Executable Actions (passed to the game).

**Arrow colors**: green for observation conversion, blue for strategy extraction and memory processing, black for LLM input and output.

</details>

Figure 2: MASMP framework within LLM-PySC2

As shown in Fig. 2, the system integrates textual observations with our prompt template and retrieved memory to form the LLM input. The output is parsed for both action execution and strategy storage.

**Algorithm 1** Workflow of the MASMP Framework in LLM-PySC2

**Input:** Textual Game Observation $o_{t}$, MemoryDB $memory$, State Machine Prompt $prompt_{sm}$, Timestep $t$

**Output:** Action execution, Memory update

1. $last\_strategy \leftarrow memory.\text{get\_latest}()$ {Retrieve via MemoryDB method}
2. $input_{t} \leftarrow \text{CONCAT}(o_{t}, prompt_{sm}, last\_strategy)$
3. $output \leftarrow \text{LLM\_Generate}(input_{t})$
4. $strategies \leftarrow \text{StrategyExtractor.extract\_strategies}(output)$ {Regex pattern matching}
5. **if** $strategies$ is not empty **then**
6. &nbsp;&nbsp;&nbsp;&nbsp;$memory.\text{add\_memory}(strategies[0], t)$ {Store with timestep}
7. **end if**
8. $\text{ExecuteActions}(output)$

## 4 Experiments and Results

### 4.1 Experimental Setup

We evaluated our method in the LLM-PySC2 environment using StarCraft II’s global gameplay scenario on the Simple64 map. Experiments used DeepSeek-V3 with Easy Build/Control Mode enabled, testing against the built-in AI (difficulty levels 1-7) under symmetric fair-play conditions.
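To make the workflow of Algorithm 1 concrete, the following Python sketch walks through one decision cycle. It is a minimal illustration under stated assumptions, not the authors’ implementation: the `MemoryDB` class is an in-memory stand-in for `memory.db`, the regular expressions are guesses based on the tag formats visible in Fig. 1 and Fig. 2 (`[Tactic]:<defensive>`, `<Train_Carrier()>`), and `fake_llm` is a canned reply used in place of a real model call.

```python
import re
from dataclasses import dataclass, field

# Assumed tag formats, inferred from Fig. 1 and Fig. 2 (not from the authors' code).
STRATEGY_PATTERN = re.compile(r"\[(\w+)\]:<(\w+)>")   # e.g. [Tactic]:<defensive>
ACTION_PATTERN = re.compile(r"<(\w+)\(\)>")           # e.g. <Train_Carrier()>


@dataclass
class MemoryDB:
    """Hypothetical in-memory stand-in for the paper's memory.db."""
    records: list = field(default_factory=list)

    def get_latest(self) -> dict:
        # Most recently stored strategy, or an empty dict before the first update.
        return self.records[-1][1] if self.records else {}

    def add_memory(self, strategy: dict, step: int) -> None:
        self.records.append((step, strategy))


def extract_strategies(output: str) -> list:
    """Return a one-element list with all strategy variables, or [] if none found."""
    found = dict(STRATEGY_PATTERN.findall(output))
    return [found] if found else []


def masmp_step(observation, memory, prompt_sm, step, llm_generate):
    """One decision cycle; comments refer to the line numbers of Algorithm 1."""
    last_strategy = memory.get_latest()                               # line 1
    last_text = " ".join(f"[{k}]:<{v}>" for k, v in last_strategy.items())
    llm_input = "\n".join([observation, prompt_sm, last_text])        # line 2
    output = llm_generate(llm_input)                                  # line 3
    strategies = extract_strategies(output)                           # line 4
    if strategies:                                                    # lines 5-7
        memory.add_memory(strategies[0], step)
    return ACTION_PATTERN.findall(output)                             # actions for line 8


def fake_llm(llm_input: str) -> str:
    """Canned reply for demonstration only; a real run would query an LLM."""
    return ("Force advantage reached. [Tactic]:<aggressive> "
            "[PriorityUnit]:<Carrier> <Train_Carrier()> <All_Units_Attack()>")


memory = MemoryDB()
memory.add_memory({"Tactic": "defensive", "PriorityUnit": "Voidray"}, step=74)
actions = masmp_step("Our army: 12 units.", memory, "STATE MACHINE PROMPT", 75, fake_llm)
print(actions)              # action names parsed from the reply
print(memory.get_latest())  # strategy variables carried into the next cycle
```

The memory update here simply appends the newest strategy, which is one plausible reading of $\text{Update}(M_{t-1}, s_{t})$ in Eq. (3); if the reply contains no strategy tags, the previous entry stays in effect for the next cycle.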
Win rate ($WR$) was used as the evaluation metric:

$$ WR=\frac{N_{\text{win}}}{N_{\text{total}}}\times 100\% \tag{4} $$

### 4.2 Experimental Results

As shown in Table 1, our Memory-Augmented State Machine Prompting (MASMP) approach matches the baseline at Lv1-Lv3, where both win every game, and outperforms it at every level from Lv4 upward. While the baseline fails completely at the highest difficulties (0% at Lv6 and Lv7), MASMP maintains perfect win rates at Lv1-Lv5 and achieves 80% and 60% win rates at Lv6 and Lv7 respectively. This demonstrates that LLM agents can compete with professionally engineered rule-based AI through our integrated approach.

Table 1: Win Rate Comparison between Baseline and MASMP

| Method | Lv1 | Lv2 | Lv3 | Lv4 | Lv5 | Lv6 | Lv7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 100% | 100% | 100% | 40% | 40% | 0% | 0% |
| MASMP | 100% | 100% | 100% | 100% | 100% | 80% | 60% |

### 4.3 Comparative Analysis

#### 4.3.1 Strategic Coherence & Dynamic Comprehension

Fig. 3 illustrates MASMP’s dynamic strategy adaptation. The agent transitions from the defensive to the aggressive state upon achieving a force advantage (Step 75), maintains aggression while assessing battle progress (Step 76), and strategically retreats when detecting enemy reinforcements (Step 77), successfully preserving its forces (Step 78). This demonstrates coherent tactical tempo control and causal reasoning capabilities absent in memoryless baselines.
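The state sequence in this case study follows the two-state machine of Fig. 1. For comparison, the sketch below hard-codes that machine as a conventional FSM in Python and replays Steps 75 to 78. It is an illustration only: in MASMP the transition is decided by the LLM from a natural language condition, whereas here the boolean `has_sufficient_troops` and the four input values are hypothetical stand-ins chosen to reproduce the reported sequence.

```python
from enum import Enum


class Tactic(Enum):
    AGGRESSIVE = "aggressive"
    DEFENSIVE = "defensive"


# State-action mapping as drawn in Fig. 1.
STATE_ACTIONS = {
    Tactic.AGGRESSIVE: "<All_Units_Attack()>",
    Tactic.DEFENSIVE: "<All_Units_Defend()>",
}


def next_tactic(has_sufficient_troops: bool) -> Tactic:
    """Transition rule of Fig. 1, reduced to a hypothetical boolean condition."""
    return Tactic.AGGRESSIVE if has_sufficient_troops else Tactic.DEFENSIVE


def run_trace(initial: Tactic, observations):
    """Replay (step, condition) pairs; return (step, old state, new state, action)."""
    state = initial
    trace = []
    for step, sufficient in observations:
        new_state = next_tactic(sufficient)
        trace.append((step, state.value, new_state.value, STATE_ACTIONS[new_state]))
        state = new_state
    return trace


# Hypothetical inputs: force advantage at Steps 75-76, lost at Steps 77-78.
observations = [(75, True), (76, True), (77, False), (78, False)]
trace = run_trace(Tactic.DEFENSIVE, observations)
for row in trace:
    print(row)
```

A hard-coded version like this needs an explicit, precise test behind `has_sufficient_troops`; the prompting approach replaces that test with a natural language condition that the LLM evaluates against the textual observation and the stored `[Tactic]` variable.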
<details>
<summary>2510.18395v1/figure3a.png Details</summary>

### Visual Description

StarCraft II game interface screenshot.

- **Header**: title `StarCraft II`; dropdown `None (N)` with a green checkmark.
- **Resource counters (top right)**: first row `135`, `79`, `102/118`; second row `1795`, `136`, `112/118`.
- **Battlefield**: a large cluster of units with red/gold armor and blue energy effects, a smaller group on the left with a glowing blue crystal entity, and dark rocky terrain.
- **Chat message (center)**: `All MainAgentLLMPysc2: None Step75 (08:43:17) <All_Units_Attack()>`
- **Mini-map (bottom left)**: workers `16/16`; timer `8:43`.
- **Command panel (bottom right)**: `to Move`, `(A) then ground to Autoattack`, keybind `E`.
- **Replay bar**: `8:43 / 19:22`, status `Normal`, agent dropdown `MainAgentLLMPysc2`.

</details>

(a) Step 75: `<defensive>` $\rightarrow$ `<aggressive>`

<details>
<summary>2510.18395v1/figure3b.png Details</summary>

### Visual Description

StarCraft II game interface screenshot.

- **Header**: title `StarCraft II`; dropdown `None (N)`.
- **Chat log (center)**:
  - `All MainAgentLLMPysc2: None Step75 (08:43:71) <Train_Phoenix>`
  - `All MainAgentLLMPysc2: None Step75 (08:43:88)`
  - `All MainAgentLLMPysc2: None Step76 (08:54:28) <Build_Pylon>`
  - `All MainAgentLLMPysc2: None Step76 (08:54:51) <Warp_Adept>`
  - `All MainAgentLLMPysc2: None Step76 (08:54:64) <Warp_Adept>`
  - `All MainAgentLLMPysc2: None Step76 (08:54:78)`
  - `All MainAgentLLMPysc2: None Step76 (08:54:91)`
  - `All MainAgentLLMPysc2: ChronoBoost_Military`
- **Resource counters (top right)**: first row `155`, `17`, `107/116`; second row `1785`, `82`, `118/118`.
- **Mini-map (bottom left)**: labels `F1`, `F2`, `W`; timer `8:55`.
- **Command panel (bottom right)**: `to Move`, `ground to Autoattack`, keybind `E`.
- **Replay bar**: `8:55 / 19:22`, status `Normal`, agent dropdown `MainAgentLLMP...`.

</details>

(b) Step 76: `<aggressive>`

<details>
<summary>2510.18395v1/figure3c.png Details</summary>

### Visual Description

StarCraft II game interface screenshot.

- **Header**: title `StarCraft II`; `None (N)` with a green checkmark.
- **Chat message**: `[All] MainAgentLLMPv2: None Step77 (09:05:31) <All_Units_Retreat()>`
- **Resource counters (top right)**: `170`, `2945`, `61`, `166`, `94/111`, `88/118`.
- **Mini-map (bottom left)**: labels `F1`, `F2`, `W`; timer `9:05`.
- **Unit selection panel (bottom center)**: counts `8`, `3`, `1`, `2`.
- **Command panel (bottom right)**: `to Move`, `[A] then ground to Autoattack`.
- **Replay bar**: `9:05 / 19:22`, status `Normal`, agent dropdown `MainAgentLLMP...`.

</details>

(c) Step 77: `<aggressive>` $\rightarrow$ `<defensive>`

<details>
<summary>2510.18395v1/figure3d.png Details</summary>

### Visual Description

StarCraft II game interface screenshot.

- **Header**: title `StarCraft II`; `None (N)` with a green checkmark.
- **Chat log (bottom center)**:
  - `[ALL] MainAgentLLMPysc2: None Step77 (09:05:45) <All_Units_Defend()>`
  - `[ALL] MainAgentLLMPysc2: None Step77 (09:05:52) <Train_VoidRay()>`
  - `[ALL] MainAgentLLMPysc2: None Step77 (09:05:30) <Train_VoidRay()>`
  - `[ALL] MainAgentLLMPysc2: None Step77 (09:05:34) <Warp_Adept()>`
  - `[ALL] MainAgentLLMPysc2: None Step77 (09:06:03) <Build_Stargate()>`
  - `[ALL] MainAgentLLMPysc2: None Step78 (09:16:58) <All_Units_Defend()>`
- **Resource counters (top right)**: `255`, `2070`, `93` / `112`, `95/111` / `82/118`.
- **Unit selection panel (bottom center)**: counts `7`, `1`; action labels `to Move`, `(A) then ground To Autoattack`.
- **Mini-map (bottom left)**: labels `F1`, `F2`, `W`; timer `9:16`.
- **Status bar**: `Normal`, play/pause buttons, agent dropdown `MainAgentLLMPysc2...`.

</details>

(d) Step 78: `<defensive>`

Figure 3: Dynamic Strategy Adaptation Case

#### 4.3.2 Solving the Greedy Trap: Long-Term Planning

<details>
<summary>2510.18395v1/x1.png Details</summary>

### Visual Description

The image contains two pie charts comparing unit ratios for two systems: "Baseline Unit Ratio" and "MASMP Unit Ratio". Both charts show percentage distributions across four categories (Zealot, Advanced, Stalker, and Adapt), each with a color-coded legend.

| Category | Color | Baseline Unit Ratio (total population: 43.3) | MASMP Unit Ratio (total population: 45.6) |
| --- | --- | --- | --- |
| Zealot | Red | 57.4% | 35.4% |
| Advanced | Dark blue | 19.4% | 40.2% |
| Stalker | Yellow | 17.9% | 7.9% |
| Adapt | Light blue | 5.3% | 16.5% |

**Key observations**

1. Baseline: Zealot dominates with 57.4% of the population, Advanced and Stalker are intermediate (19.4% and 17.9%), and Adapt is the smallest category at 5.3%. Ranking by percentage: Zealot > Advanced > Stalker > Adapt.
2. MASMP: Advanced becomes the largest category at 40.2%, Zealot decreases to 35.4%, Adapt increases to 16.5%, and Stalker is the smallest at 7.9%. Ranking by percentage: Advanced > Zealot > Adapt > Stalker.
3. Compared with the baseline, MASMP shifts the unit distribution toward Advanced and Adapt units and away from Zealot and Stalker units.

</details>

Figure 4: Early-game Unit Production Ratio

Fig. 4 quantitatively shows MASMP’s superior long-term planning. At the 7-minute mark, MASMP produces 18.32 advanced units (40.2% of total) versus the baseline’s 8.40 (19.6%), with better diversification (Zealot ratio: 35.5% vs. 57.8%).
Through state variables like `[PriorityUnit]`, our method guides resource allocation toward technological advancement, avoiding the baseline’s greedy trap of spamming low-tier units.

#### 4.3.3 Advantages Over Traditional State Machines

MASMP maintains structural constraints while preserving LLMs’ advantages:

- Interpretability: Natural language justifications for state transitions
- Generalization: Semantic adaptation to unseen scenarios
- Creativity: Autonomous employment of unspecified counters

The probabilistic formulation enables fuzzy reasoning that eliminates traditional FSMs’ need for precise thresholds and manual rule programming.

## 5 Conclusion and Future Work

We proposed Memory-Augmented State Machine Prompting (MASMP), a novel framework that bridges LLM flexibility with rule-based reliability for RTS games. MASMP integrates state machine prompting with strategic memory, achieving a 60% win rate against StarCraft II’s highest-difficulty AI (Lv7), significantly outperforming baselines (0%). This demonstrates the potential of hybrid neuro-symbolic architectures for complex decision-making. Future work includes exploring multi-agent coordination, dynamic prompt optimization, and cross-domain applications.

## References

- [1] Vinyals, O., Babuschkin, I., Czarnecki, W.M., et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575 (7782), 350–354 (2019)
- [2] Xu, X., Wang, Y., Xu, C., et al.: A survey on game playing agents and large models: Methods, applications, and challenges. https://arxiv.org/abs/2403.10249, last accessed 2025/6/1
- [3] Goecks, V.G., Waytowich, N.: Coa-gpt: Generative pre-trained transformers for accelerated course of action development in military operations. In: 2024 International Conference on Military Communication and Information Systems (ICMCIS), pp. 1–10. IEEE (2024)
- [4] Ghost in the minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory. https://arxiv.org/abs/2305.17144, last accessed 2025/6/1
- [5] Fauzi, R., Hariadi, M., Nugroho, S.M.S., et al.: Defense behavior of real time strategy games: Comparison between HFSM and FSM. Indonesia Journal of Electrical Engineering and Computer Science 13 (2), 634–642 (2019)
- [6] Jagdale, D.: Finite state machine in game development. International Journal of Advanced Research in Science, Communication and Technology 10 (1) (2021)
- [7] Hu, W., Zhang, Q., Mao, Y.: Component-based hierarchical state machine: A reusable and flexible game AI technology. In: 2011 6th IEEE Joint International Information Technology and Artificial Intelligence Conference, vol. 2, pp. 319–324. IEEE (2011)
- [8] Sekhavat, Y.A.: Behavior trees for computer games. International Journal on Artificial Intelligence Tools 26 (2), 1730001 (2017)
- [9] Sun, P., Sun, X., Han, L., et al.: Tstarbots: Defeating the cheating level builtin ai in starcraft ii in the full game. https://arxiv.org/abs/1809.07193, last accessed 2025/6/1
- [10] Ma, W., Mi, Q., Zeng, Y., et al.: Large language models play starcraft ii: Benchmarks and a chain of summarization approach. Advances in Neural Information Processing Systems 37, 133386–133442 (2024)
- [11] Li, Z., Ni, Y., Qi, R., et al.: Llm-pysc2: Starcraft ii learning environment for large language models. https://arxiv.org/abs/2411.05348, last accessed 2025/6/1
- [12] Schmied, T., Bornschein, J., Grau-Moya, J., et al.: Llms are greedy agents: Effects of rl fine-tuning on decision-making abilities. https://arxiv.org/abs/2504.16078, last accessed 2025/6/1
- [13] Shao, X., Jiang, W., Zuo, F., et al.: SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models. https://arxiv.org/abs/2401.17749, last accessed 2025/6/1
