## Bar Chart: Frequency of Patterns in Unfaithful Pairs Across Language Models
### Overview
This bar chart compares the frequency of four pattern types observed in "unfaithful pairs" (Fact Manipulation, Argument Switching, Answer Flipping, and Other) across a range of language models. The y-axis gives the frequency of each pattern as a percentage, ranging from 0 to 100; the x-axis lists the models being compared. Each model has four bars, one per pattern type, and an 'n=' annotation indicating the number of unfaithful pairs analyzed for that model.
### Components/Axes
* **X-axis:** Model (labeled with the following models: PaLM 3.5, Sonnet 3.5 v2, Sonnet 3.7, Sonnet 3.7 (1k), DeepSeek V3, DeepSeek R1, GPT-4o Mini, GPT-4o Aug 24, ChatGPT-4o, Gemini 1.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Pro, Llama 3.1 70B, Llama 3.3 70B IT, QwQ 32B)
* **Y-axis:** Frequency of Patterns in Unfaithful Pairs (%) (scale from 0 to 100)
* **Legend:**
* Fact Manipulation (Green)
* Argument Switching (Red)
* Answer Flipping (Blue)
* Other (Yellow)
* **Annotations:** 'n=' values above each model's group of bars, indicating that model's sample size (number of unfaithful pairs).
### Detailed Analysis
The chart covers 15 models, each with four bars representing the four pattern types. The 'n=' values vary widely across models, indicating that different numbers of unfaithful pairs were analyzed for each.
Here's a breakdown of the approximate frequencies for each model and pattern, based on visual estimation:
* **PaLM 3.5:** Fact Manipulation ~85%, Argument Switching ~10%, Answer Flipping ~5%, Other ~0%. (n=363)
* **Sonnet 3.5 v2:** Fact Manipulation ~90%, Argument Switching ~5%, Answer Flipping ~5%, Other ~0%. (n=22)
* **Sonnet 3.7:** Fact Manipulation ~80%, Argument Switching ~15%, Answer Flipping ~5%, Other ~0%. (n=90)
* **Sonnet 3.7 (1k):** Fact Manipulation ~5%, Argument Switching ~90%, Answer Flipping ~5%, Other ~0%. (n=2)
* **DeepSeek V3:** Fact Manipulation ~70%, Argument Switching ~20%, Answer Flipping ~10%, Other ~0%. (n=60)
* **DeepSeek R1:** Fact Manipulation ~80%, Argument Switching ~15%, Answer Flipping ~5%, Other ~0%. (n=18)
* **GPT-4o Mini:** Fact Manipulation ~80%, Argument Switching ~15%, Answer Flipping ~5%, Other ~0%. (n=660)
* **GPT-4o Aug 24:** Fact Manipulation ~70%, Argument Switching ~20%, Answer Flipping ~10%, Other ~0%. (n=18)
* **ChatGPT-4o:** Fact Manipulation ~60%, Argument Switching ~25%, Answer Flipping ~10%, Other ~5%. (n=24)
* **Gemini 1.5 Pro:** Fact Manipulation ~60%, Argument Switching ~25%, Answer Flipping ~10%, Other ~5%. (n=320)
* **Gemini 2.5 Flash:** Fact Manipulation ~60%, Argument Switching ~25%, Answer Flipping ~10%, Other ~5%. (n=106)
* **Gemini 2.5 Pro:** Fact Manipulation ~60%, Argument Switching ~25%, Answer Flipping ~10%, Other ~5%. (n=159)
* **Llama 3.1 70B:** Fact Manipulation ~50%, Argument Switching ~30%, Answer Flipping ~15%, Other ~5%. (n=102)
* **Llama 3.3 70B IT:** Fact Manipulation ~50%, Argument Switching ~30%, Answer Flipping ~15%, Other ~5%. (n=220)
* **QwQ 32B:** Fact Manipulation ~40%, Argument Switching ~30%, Answer Flipping ~20%, Other ~10%. (n=7)
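Since the per-model frequencies above are visual estimates, they can be collected into a small table and aggregated. The sketch below (using the approximate readings listed above, not exact figure data) computes the n-weighted average frequency of each pattern across all models:

```python
# Approximate per-model frequencies (%) read off the chart, plus sample sizes.
# Columns: Fact Manipulation, Argument Switching, Answer Flipping, Other, n.
estimates = {
    "PaLM 3.5":         (85, 10,  5,  0, 363),
    "Sonnet 3.5 v2":    (90,  5,  5,  0,  22),
    "Sonnet 3.7":       (80, 15,  5,  0,  90),
    "Sonnet 3.7 (1k)":  ( 5, 90,  5,  0,   2),
    "DeepSeek V3":      (70, 20, 10,  0,  60),
    "DeepSeek R1":      (80, 15,  5,  0,  18),
    "GPT-4o Mini":      (80, 15,  5,  0, 660),
    "GPT-4o Aug 24":    (70, 20, 10,  0,  18),
    "ChatGPT-4o":       (60, 25, 10,  5,  24),
    "Gemini 1.5 Pro":   (60, 25, 10,  5, 320),
    "Gemini 2.5 Flash": (60, 25, 10,  5, 106),
    "Gemini 2.5 Pro":   (60, 25, 10,  5, 159),
    "Llama 3.1 70B":    (50, 30, 15,  5, 102),
    "Llama 3.3 70B IT": (50, 30, 15,  5, 220),
    "QwQ 32B":          (40, 30, 20, 10,   7),
}

patterns = ["Fact Manipulation", "Argument Switching", "Answer Flipping", "Other"]
total_n = sum(row[4] for row in estimates.values())

# n-weighted mean frequency (%) of each pattern across all models.
weighted = {
    patterns[i]: sum(row[i] * row[4] for row in estimates.values()) / total_n
    for i in range(4)
}

for name, pct in weighted.items():
    print(f"{name}: {pct:.1f}%")
```

Because the four bars within each model sum to roughly 100%, the weighted averages also sum to roughly 100%, and the n-weighting lets large-n models (GPT-4o Mini, PaLM 3.5, Gemini 1.5 Pro) dominate the aggregate.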
**Trends:**
* **Fact Manipulation:** Generally the most frequent pattern, especially in PaLM 3.5, Sonnet 3.5 v2, and Sonnet 3.7. Decreases in frequency for Llama and QwQ models.
* **Argument Switching:** Spikes for Sonnet 3.7 (1k) (~90%, though n=2) and otherwise stays roughly in the 10-30% range for most models.
* **Answer Flipping:** Remains consistently low across most models, generally below 15%.
* **Other:** Generally the least frequent pattern, remaining below 10% for most models.
### Key Observations
* PaLM 3.5 and Sonnet 3.5 v2 exhibit the highest frequency of Fact Manipulation.
* Sonnet 3.7 (1k) shows a dramatic shift towards Argument Switching, with Fact Manipulation being minimal. This is likely due to the small sample size (n=2).
* The Llama 3 and QwQ 32B models show a more balanced distribution of patterns than the other models.
* The sample sizes ('n' values) vary significantly, which could influence the observed frequencies. Models with smaller sample sizes may not be representative of the overall pattern distribution.
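The caution about sample sizes can be made concrete with confidence intervals. As a sketch (assuming each bar is an independent observed proportion out of n unfaithful pairs), the Wilson score interval shows how much wider the uncertainty is for Sonnet 3.7 (1k) (n=2) than for GPT-4o Mini (n=660):

```python
import math

def wilson_interval(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for an observed proportion p out of n trials."""
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    halfwidth = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - halfwidth, center + halfwidth

# Argument Switching for Sonnet 3.7 (1k): ~90% of only n=2 pairs.
lo_small, hi_small = wilson_interval(0.90, 2)
# Fact Manipulation for GPT-4o Mini: ~80% of n=660 pairs.
lo_large, hi_large = wilson_interval(0.80, 660)

print(f"n=2:   {lo_small:.2f}-{hi_small:.2f}")   # spans most of [0, 1]
print(f"n=660: {lo_large:.2f}-{hi_large:.2f}")   # only a few points wide
```

With n=2, the 95% interval spans most of the 0-100% range, so the apparent "dramatic shift" in Sonnet 3.7 (1k) is statistically indistinguishable from many other pattern mixes; with n=660, the estimate is pinned down to within a few percentage points.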
### Interpretation
The chart suggests that some model families (PaLM 3.5, the Sonnet series) are especially prone to Fact Manipulation, while others (Llama 3, QwQ 32B) exhibit a more even mix of unfaithful patterns. The dramatic shift in Sonnet 3.7 (1k) toward Argument Switching, coupled with its very small sample size (n=2), underscores the importance of considering sample size when interpreting these results.
The differences across models indicate that the *type* of unfaithfulness varies: where some models primarily manipulate facts, others appear more susceptible to subtle shifts in argumentation or answer consistency. The consistent presence of "Other" suggests that some unfaithful patterns do not fit neatly into these four categories, indicating a need for further research and refinement of the pattern taxonomy.
The varying 'n' values mean the estimates differ greatly in precision. Models with larger sample sizes (e.g., GPT-4o Mini, n=660) provide far more reliable estimates of pattern frequencies than those with smaller ones (e.g., QwQ 32B, n=7; Sonnet 3.7 (1k), n=2). Therefore, caution should be exercised when comparing models with drastically different sample sizes.