## Horizontal Bar Chart: Improvement over Base Model by Medical Category
### Overview
The chart compares the improvement over the base model achieved by two methods ("SFT Only" and "SFT+RL (Ours)") across 15 medical categories. The x-axis shows improvement in percent (0-25%), and the y-axis lists the medical categories. "SFT+RL (Ours)" outperforms "SFT Only" in every category.
### Components/Axes
- **Y-Axis Categories**: Ear, Congenital, Neoplasms, Circulatory, Pharmacology, Eye, Musculoskeletal, Blood/Immune, Infectious, Respiratory, Skin, Endocrine, Digestive, Nervous, Mental Health
- **X-Axis**: Improvement over Base Model (%) (0-25% in 5% increments)
- **Legend**:
  - Pink: SFT Only
  - Orange: SFT+RL (Ours)
- **Legend Position**: Bottom-right corner
### Detailed Analysis
| Category | SFT Only (%) | SFT+RL (Ours) (%) |
|--------------------|--------------|-------------------|
| Ear | ~17 | ~23 |
| Congenital | ~14 | ~22 |
| Neoplasms | ~13 | ~21 |
| Circulatory | ~10 | ~20 |
| Pharmacology | ~11 | ~18 |
| Eye | ~12 | ~17 |
| Musculoskeletal | ~6 | ~16 |
| Blood/Immune | ~4 | ~15 |
| Infectious | ~10 | ~15 |
| Respiratory | ~9 | ~15 |
| Skin | ~9 | ~15 |
| Endocrine | ~7 | ~15 |
| Digestive | ~6 | ~14 |
| Nervous | ~11 | ~14 |
| Mental Health | ~6 | ~13 |
### Key Observations
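1. **Consistent Outperformance**: "SFT+RL (Ours)" exceeds "SFT Only" in all 15 categories. Averaging the approximate table values gives ~17% for SFT+RL versus ~10% for SFT Only.
2. **Highest Overall Improvement (SFT+RL)**:
   - Ear (~23%)
   - Congenital (~22%)
   - Neoplasms (~21%)
3. **Lowest Overall Improvement**:
   - Blood/Immune (SFT Only: ~4%)
   - Mental Health (SFT+RL: ~13%)
4. **Gap Between Methods** (SFT+RL minus SFT Only, in percentage points):
   - Widest: Blood/Immune (+11), Circulatory (+10), Musculoskeletal (+10)
   - Narrowest: Nervous (+3), Eye (+5), Infectious (+5)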
### Interpretation
The data indicate that adding RL on top of SFT increases improvement over the base model in every medical category shown. Under SFT+RL, the largest overall improvements are in **Ear** (~23%) and **Congenital** (~22%), while the largest added benefit from RL relative to SFT Only is in **Blood/Immune** (+11 points), followed by **Circulatory** and **Musculoskeletal** (+10 points each). **Blood/Immune** also shows the smallest improvement with SFT Only (~4%), which suggests that SFT alone is less effective in this category. The consistent trend across all categories supports the effectiveness of the SFT+RL approach. **Mental Health**, **Digestive**, and **Nervous** show the smallest overall improvements under SFT+RL (~13-14%), and **Nervous** gains the least from adding RL (+3 points), potentially due to domain-specific complexities.
### Spatial Grounding & Trend Verification
- **Legend**: Confirmed alignment with bar colors (pink/orange).
- **Trend Check**: All orange bars (SFT+RL) are visually longer than pink bars (SFT Only), matching the numerical data.
- **Extremes**: Blood/Immune (SFT Only, ~4%) and Mental Health (SFT+RL, ~13%) are the lowest values for their respective methods.
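
The trend check and the per-method minima can also be verified programmatically. This is a small sketch under the assumption that the approximate table values stand in for the bar lengths; `rows`, `all_longer`, `lowest_sft`, and `lowest_rl` are hypothetical names introduced for illustration.

```python
# (category, SFT Only %, SFT+RL %) as approximate readings from the chart.
rows = [
    ("Ear", 17, 23), ("Congenital", 14, 22), ("Neoplasms", 13, 21),
    ("Circulatory", 10, 20), ("Pharmacology", 11, 18), ("Eye", 12, 17),
    ("Musculoskeletal", 6, 16), ("Blood/Immune", 4, 15), ("Infectious", 10, 15),
    ("Respiratory", 9, 15), ("Skin", 9, 15), ("Endocrine", 7, 15),
    ("Digestive", 6, 14), ("Nervous", 11, 14), ("Mental Health", 6, 13),
]

# Trend check: every SFT+RL value should exceed the SFT Only value.
all_longer = all(rl > sft for _, sft, rl in rows)

# Lowest value for each method (first match wins on ties).
lowest_sft = min(rows, key=lambda row: row[1])
lowest_rl = min(rows, key=lambda row: row[2])

print("All SFT+RL bars longer:", all_longer)
print("Lowest SFT Only:", lowest_sft[0], lowest_sft[1])
print("Lowest SFT+RL:", lowest_rl[0], lowest_rl[2])
```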