## Horizontal Grouped Bar Chart: Improvement Over Base Model by Medical Category
### Overview
This image is a horizontal grouped bar chart comparing the performance improvement of two different methods ("SFT Only" and "SFT+RL (Ours)") over a base model across 15 distinct medical categories. The improvement is measured as a percentage.
### Components/Axes
* **Chart Type:** Horizontal Grouped Bar Chart.
* **Y-Axis (Vertical):** Lists 15 medical categories. From top to bottom:
1. Ear
2. Congenital
3. Neoplasms
4. Circulatory
5. Pharmacology
6. Eye
7. Musculoskeletal
8. Blood/Immune
9. Infectious
10. Respiratory
11. Skin
12. Endocrine
13. Digestive
14. Nervous
15. Mental Health
* **X-Axis (Horizontal):** Labeled "Improvement over Base Model (%)". The scale runs from 0 to 25, with major tick marks at 0, 5, 10, 15, 20, and 25.
* **Legend:** Located in the bottom-right corner of the chart area.
* **Magenta/Pink Bar:** Labeled "SFT Only".
* **Orange Bar:** Labeled "SFT+RL (Ours)".
* **Data Series:** Each medical category has two adjacent bars grouped together: a magenta bar (SFT Only) and an orange bar (SFT+RL). Both bars extend rightward from the zero line.
### Detailed Analysis
Below is an analysis of each category. For each, the visual trend is described first (orange bar vs. magenta bar), followed by approximate percentage values estimated from the x-axis.
1. **Ear:**
* **Trend:** The orange bar (SFT+RL) is moderately longer than the magenta bar (SFT Only).
* **Values:** SFT Only ≈ 17%, SFT+RL ≈ 22.5%.
2. **Congenital:**
* **Trend:** The orange bar is longer than the magenta bar.
* **Values:** SFT Only ≈ 14%, SFT+RL ≈ 22.5%.
3. **Neoplasms:**
* **Trend:** The orange bar is longer than the magenta bar.
* **Values:** SFT Only ≈ 13%, SFT+RL ≈ 22%.
4. **Circulatory:**
* **Trend:** The orange bar is substantially longer than the magenta bar.
* **Values:** SFT Only ≈ 10%, SFT+RL ≈ 21%.
5. **Pharmacology:**
* **Trend:** The orange bar is longer than the magenta bar.
* **Values:** SFT Only ≈ 11%, SFT+RL ≈ 18.5%.
6. **Eye:**
* **Trend:** The orange bar is longer than the magenta bar.
* **Values:** SFT Only ≈ 12%, SFT+RL ≈ 18%.
7. **Musculoskeletal:**
* **Trend:** The orange bar is much longer than the magenta bar.
* **Values:** SFT Only ≈ 6%, SFT+RL ≈ 17%.
8. **Blood/Immune:**
* **Trend:** The orange bar is dramatically longer than the magenta bar.
* **Values:** SFT Only ≈ 4%, SFT+RL ≈ 17%.
9. **Infectious:**
* **Trend:** The orange bar is longer than the magenta bar.
* **Values:** SFT Only ≈ 10%, SFT+RL ≈ 16%.
10. **Respiratory:**
* **Trend:** The orange bar is longer than the magenta bar.
* **Values:** SFT Only ≈ 8.5%, SFT+RL ≈ 16%.
11. **Skin:**
* **Trend:** The orange bar is longer than the magenta bar.
* **Values:** SFT Only ≈ 9%, SFT+RL ≈ 15.5%.
12. **Endocrine:**
* **Trend:** The orange bar is longer than the magenta bar.
* **Values:** SFT Only ≈ 7%, SFT+RL ≈ 15%.
13. **Digestive:**
* **Trend:** The orange bar is longer than the magenta bar.
* **Values:** SFT Only ≈ 6.5%, SFT+RL ≈ 13.5%.
14. **Nervous:**
* **Trend:** The orange bar is only slightly longer than the magenta bar, the narrowest margin of any category.
* **Values:** SFT Only ≈ 10.5%, SFT+RL ≈ 13.5%.
15. **Mental Health:**
* **Trend:** The orange bar is longer than the magenta bar.
* **Values:** SFT Only ≈ 6%, SFT+RL ≈ 13%.
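The estimates above can be collected into a small table and redrawn to compare against the original figure. The following is a minimal sketch, assuming matplotlib is installed. The numbers are the approximate visual readings listed above, not exact source data. The names `ESTIMATES`, `bar_offsets`, and `draw_chart`, the color names, the output path, and the choice of which bar sits on top within a group are all illustrative assumptions.

```python
# Approximate readings from the chart: (category, SFT Only %, SFT+RL %).
# These are visual estimates, not exact source data.
ESTIMATES = [
    ("Ear", 17.0, 22.5),
    ("Congenital", 14.0, 22.5),
    ("Neoplasms", 13.0, 22.0),
    ("Circulatory", 10.0, 21.0),
    ("Pharmacology", 11.0, 18.5),
    ("Eye", 12.0, 18.0),
    ("Musculoskeletal", 6.0, 17.0),
    ("Blood/Immune", 4.0, 17.0),
    ("Infectious", 10.0, 16.0),
    ("Respiratory", 8.5, 16.0),
    ("Skin", 9.0, 15.5),
    ("Endocrine", 7.0, 15.0),
    ("Digestive", 6.5, 13.5),
    ("Nervous", 10.5, 13.5),
    ("Mental Health", 6.0, 13.0),
]


def bar_offsets(n_groups, bar_height=0.4):
    """Return (sft_y, rl_y): y positions of the two bars in each group.

    Group i is centred on y = i; the two bars sit half a bar height
    on either side of that centre.
    """
    sft_y = [i - bar_height / 2 for i in range(n_groups)]
    rl_y = [i + bar_height / 2 for i in range(n_groups)]
    return sft_y, rl_y


def draw_chart(path="improvement.png", bar_height=0.4):
    """Redraw the grouped horizontal bar chart and save it to `path`."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    names = [name for name, _, _ in ESTIMATES]
    sft = [s for _, s, _ in ESTIMATES]
    rl = [r for _, _, r in ESTIMATES]
    sft_y, rl_y = bar_offsets(len(ESTIMATES), bar_height)

    fig, ax = plt.subplots(figsize=(7, 8))
    ax.barh(sft_y, sft, height=bar_height, color="magenta", label="SFT Only")
    ax.barh(rl_y, rl, height=bar_height, color="orange", label="SFT+RL (Ours)")
    ax.set_yticks(range(len(names)))
    ax.set_yticklabels(names)
    ax.invert_yaxis()  # first category (Ear) at the top
    ax.set_xlim(0, 25)
    ax.set_xlabel("Improvement over Base Model (%)")
    ax.legend(loc="lower right")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Calling `draw_chart("improvement.png")` writes the redrawn figure to disk; the block itself only defines the data and helpers.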
### Key Observations
* **Consistent Superiority:** In all 15 medical categories, the "SFT+RL (Ours)" method (orange bars) shows a greater improvement over the base model than the "SFT Only" method (magenta bars).
* **Magnitude of Improvement:** The improvement for "SFT+RL" ranges from approximately 13% (Mental Health) to 22.5% (Ear, Congenital). The improvement for "SFT Only" ranges from approximately 4% (Blood/Immune) to 17% (Ear).
* **Largest Gains:** The largest improvements over the base model for the "SFT+RL" method are seen in the "Ear," "Congenital," "Neoplasms," and "Circulatory" categories, all exceeding 20%.
* **Smallest Gains:** The smallest improvements for "SFT+RL" are in "Mental Health" (~13%), "Digestive," and "Nervous" (both ~13.5%).
* **Largest Performance Gap:** The most substantial difference between the two methods appears in the "Blood/Immune" category, where "SFT+RL" shows an improvement of ~17% compared to only ~4% for "SFT Only," a gap of about 13 percentage points. "Circulatory" and "Musculoskeletal" follow with gaps of about 11 points each.
* **Smallest Performance Gap:** The smallest difference between the methods is in the "Nervous" category, where the values are closest (~10.5% vs. ~13.5%, a gap of about 3 points).
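The gaps and ranges in these observations follow from simple arithmetic on the estimated values. As a hedged sketch (the numbers are the approximate chart readings from the Detailed Analysis, and the variable names are illustrative), they can be recomputed like this:

```python
# Approximate readings from the chart: category -> (SFT Only %, SFT+RL %).
# These are visual estimates, not exact source data.
estimates = {
    "Ear": (17.0, 22.5),
    "Congenital": (14.0, 22.5),
    "Neoplasms": (13.0, 22.0),
    "Circulatory": (10.0, 21.0),
    "Pharmacology": (11.0, 18.5),
    "Eye": (12.0, 18.0),
    "Musculoskeletal": (6.0, 17.0),
    "Blood/Immune": (4.0, 17.0),
    "Infectious": (10.0, 16.0),
    "Respiratory": (8.5, 16.0),
    "Skin": (9.0, 15.5),
    "Endocrine": (7.0, 15.0),
    "Digestive": (6.5, 13.5),
    "Nervous": (10.5, 13.5),
    "Mental Health": (6.0, 13.0),
}

# Gap in percentage points between SFT+RL and SFT Only, per category.
gaps = {name: rl - sft for name, (sft, rl) in estimates.items()}

largest_gap = max(gaps, key=gaps.get)
smallest_gap = min(gaps, key=gaps.get)

# Categories where the SFT+RL improvement exceeds 20%.
above_20 = [name for name, (_, rl) in estimates.items() if rl > 20]

# True if SFT+RL beats SFT Only in every category.
rl_always_ahead = all(rl > sft for sft, rl in estimates.values())

print(largest_gap, gaps[largest_gap])    # Blood/Immune 13.0
print(smallest_gap, gaps[smallest_gap])  # Nervous 3.0
print(above_20)
print(rl_always_ahead)                   # True
```

Because the inputs are read off the chart by eye, differences of half a point between categories should not be over-interpreted.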
### Interpretation
The data indicates that the proposed method, which combines Supervised Fine-Tuning with Reinforcement Learning ("SFT+RL"), consistently outperforms Supervised Fine-Tuning alone ("SFT Only") across a wide spectrum of medical domains when measured by improvement over a base model, by margins of roughly 3 to 13 percentage points.
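The orange bar is longer in every category, which indicates that the advantage of the "SFT+RL" approach is robust across domains. The variation in the size of the total improvement (from ~13% to ~22.5%) suggests that the effectiveness of the method is domain-dependent. "Ear," "Congenital," and "Neoplasms" show the highest totals, but they also show the highest "SFT Only" values (~13-17%), so much of that gain is already achieved by supervised fine-tuning. The increment attributable to the reinforcement learning step is largest in "Blood/Immune" (~13 points), "Circulatory," and "Musculoskeletal" (~11 points each); these categories may have characteristics (e.g., data structure, task complexity) that are particularly well-suited to the reinforcement learning component. Conversely, "Mental Health" and "Digestive" show the lowest totals and "Nervous" shows the smallest increment (~3 points), so these domains might present challenges that are less mitigated by this specific RL approach, though it still provides a clear benefit.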
The large gap in categories like "Blood/Immune" highlights a potential key finding: the "SFT Only" method may struggle in certain medical areas, possibly complex or data-sparse ones, and the "SFT+RL" method appears to correct this weakness substantially. The chart itself does not show why these categories differ. Overall, it supports the efficacy of integrating reinforcement learning with supervised fine-tuning to enhance model performance in medical applications.