## Chart: Impact of Finetuning on Tool Call Ratio and Overall Accuracy
### Overview
The image presents a grouped bar chart of the "Tool Call Ratio (%)" for four categories (a base generator and three search tools) at two stages: "Step 0" (before finetuning) and "Step 32" (after finetuning). It also displays the overall accuracy at each stage, highlighting the change due to finetuning.
### Components/Axes
* **Chart Type**: Grouped Bar Chart.
* **Overall Context**: The chart shows the effect of finetuning (spelled "Finet-tuning" in the chart) on tool utilization and system accuracy.
* **Y-axis**: Labeled "Tool Call Ratio (%)". The scale ranges from 0 to 60, with major grid lines and numerical labels at 0, 10, 20, 30, 40, 50, and 60.
* **X-axis**: Divided into two main categories representing different stages: "Step 0" and "Step 32". A prominent gray arrow points from "Step 0" to "Step 32", with the text "After Finet-tuning" positioned directly above the arrow, indicating the transition or process applied.
* **Legend (Top-left)**: The legend defines the color coding for the tool/generator categories:
    * Light Red/Pink: "Base Generator"
    * Green: "Google Search"
    * Blue: "Web Search"
    * Purple: "Wikipedia Search"
* **Accuracy Indicators (Top-center)**: Two white boxes display overall accuracy figures:
    * Left box (above "Step 0"): "Acc:19.2%"
    * Right box (above "Step 32"): "Acc: 25.2% (+6.21%)"
### Detailed Analysis
**Overall Accuracy Change**:
* At "Step 0", the overall accuracy is **19.2%**.
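* At "Step 32" (after finetuning), the overall accuracy rises to **25.2%**. The chart annotates this gain as **+6.21%**, but the displayed values differ by 6.0 percentage points (25.2 - 19.2 = 6.0) and the relative change is about 31%, so the annotation matches neither.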
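The annotated gain can be checked against the two printed accuracies. The sketch below uses only the rounded values shown on the chart and computes the absolute change (in percentage points), the relative change, and the largest gap that one-decimal rounding could hide, assuming half-up rounding (an assumption; the chart does not state how values were rounded). None of these reaches 6.21, so the annotation cannot be reproduced from the printed accuracies alone.

```python
# Accuracy values as printed on the chart (percent, one decimal place).
ACC_STEP0 = 19.2
ACC_STEP32 = 25.2
LABELED_GAIN = 6.21  # the "+6.21%" annotation next to the Step 32 accuracy


def absolute_change(before: float, after: float) -> float:
    """Change in percentage points, rounded to hide float noise."""
    return round(after - before, 2)


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting accuracy, in percent."""
    return round((after - before) / before * 100, 2)


def max_gap_under_rounding(before: float, after: float, step: float = 0.1) -> float:
    """Upper bound on the true gap if both values were rounded to `step`.

    Assumes half-up rounding, so a printed x stands for [x - step/2, x + step/2).
    """
    return round((after + step / 2) - (before - step / 2), 2)


print(absolute_change(ACC_STEP0, ACC_STEP32))         # 6.0
print(relative_change(ACC_STEP0, ACC_STEP32))         # 31.25
print(max_gap_under_rounding(ACC_STEP0, ACC_STEP32))  # 6.1
```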
**Tool Call Ratio (%) - Step 0 (Before Finetuning)**:
* **Base Generator** (Light Red/Pink bar, left-most in the group): The bar indicates a "Tool Call Ratio" of **3.1%**.
* **Google Search** (Green bar): The bar indicates a "Tool Call Ratio" of **38.7%**.
* **Web Search** (Blue bar): The bar indicates a "Tool Call Ratio" of **18.4%**.
* **Wikipedia Search** (Purple bar, right-most in the group): The bar indicates a "Tool Call Ratio" of **38.5%**.
**Tool Call Ratio (%) - Step 32 (After Finetuning)**:
* **Base Generator** (Light Red/Pink bar, left-most in the group): The bar indicates a "Tool Call Ratio" of **0.9%**. Positioned above this bar is an additional label: **-2.2**.
    * *Trend*: The tool call ratio for the Base Generator decreased from 3.1% to 0.9%. The label "-2.2" accurately reflects this change (3.1 - 0.9 = 2.2).
* **Google Search** (Green bar): The bar indicates a "Tool Call Ratio" of **13.6%**. Positioned above this bar is an additional label: **-1.5**.
    * *Trend*: The tool call ratio for Google Search decreased sharply from 38.7% to 13.6%. The label "-1.5" does not correspond to this change (38.7 - 13.6 = 25.1).
* **Web Search** (Blue bar): The bar indicates a "Tool Call Ratio" of **13.6%**. Positioned above this bar is an additional label: **+5.2**.
    * *Trend*: The tool call ratio for Web Search decreased from 18.4% to 13.6%. The label "+5.2" does not correspond to this change (18.4 - 13.6 = 4.8).
* **Wikipedia Search** (Purple bar, right-most in the group): The bar indicates a "Tool Call Ratio" of **13.6%**. Positioned above this bar is an additional label: **-4.7**.
    * *Trend*: The tool call ratio for Wikipedia Search decreased sharply from 38.5% to 13.6%. The label "-4.7" does not correspond to this change (38.5 - 13.6 = 24.9).
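The label-versus-change comparison above can be reproduced mechanically. This sketch hard-codes the ratios and labels exactly as transcribed in this section (so it inherits any misreading of the bar heights) and flags which labels equal the observed change within a small tolerance; the 0.05 tolerance is an arbitrary choice. Only the Base Generator row matches.

```python
# Tool call ratios (%) as transcribed from the chart.
STEP0 = {"Base Generator": 3.1, "Google Search": 38.7, "Web Search": 18.4, "Wikipedia Search": 38.5}
STEP32 = {"Base Generator": 0.9, "Google Search": 13.6, "Web Search": 13.6, "Wikipedia Search": 13.6}
# Annotations printed above the Step 32 bars.
LABELS = {"Base Generator": -2.2, "Google Search": -1.5, "Web Search": 5.2, "Wikipedia Search": -4.7}


def compare_labels(
    before: dict[str, float],
    after: dict[str, float],
    labels: dict[str, float],
    tol: float = 0.05,
) -> dict[str, tuple[float, float, bool]]:
    """Map each category to (observed change, printed label, whether they agree)."""
    report = {}
    for name, start in before.items():
        delta = round(after[name] - start, 1)  # change in percentage points
        report[name] = (delta, labels[name], abs(delta - labels[name]) <= tol)
    return report


for name, (delta, label, ok) in compare_labels(STEP0, STEP32, LABELS).items():
    print(f"{name:<17} change={delta:+.1f} label={label:+.1f} match={ok}")
```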
### Key Observations
* Finetuning improves overall system accuracy from 19.2% to 25.2%, a gain of 6.0 percentage points (annotated on the chart as +6.21%).
* The "Base Generator" consistently exhibits a very low tool call ratio, which further decreases after finetuning.
* Before finetuning (Step 0), "Google Search" and "Wikipedia Search" have the highest tool call ratios, both around 38.5-38.7%. "Web Search" has a moderate ratio of 18.4%.
* After finetuning (Step 32), the tool call ratios for "Google Search", "Web Search", and "Wikipedia Search" all converge to an identical value of 13.6%. This represents a substantial reduction for Google and Wikipedia Search, and a smaller reduction for Web Search.
* The numerical labels positioned above the "Google Search", "Web Search", and "Wikipedia Search" bars at Step 32 (-1.5, +5.2, -4.7) do not represent the direct change in "Tool Call Ratio (%)" from Step 0 to Step 32 for their respective categories. Only the "-2.2" label for "Base Generator" accurately reflects this change.
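One way to probe the Step 32 readings is to add up each group of bars. If the four categories are mutually exclusive shares of all tool calls (an assumption; the chart does not define the denominator), each group should total roughly 100%. The sketch below shows that Step 0 does (98.7%) while the Step 32 readings total only 41.7%, so either the ratio uses a different denominator at Step 32 or some bar heights were misread. The 2-point tolerance is an arbitrary allowance for reading error.

```python
# Tool call ratios (%) in legend order: Base Generator, Google, Web, Wikipedia.
STEP0 = [3.1, 38.7, 18.4, 38.5]
STEP32 = [0.9, 13.6, 13.6, 13.6]  # as transcribed; note the three identical readings


def total_share(ratios: list[float]) -> float:
    """Sum of the per-category ratios, in percent (rounded to one decimal)."""
    return round(sum(ratios), 1)


def looks_like_partition(ratios: list[float], tol: float = 2.0) -> bool:
    """True if the ratios add up to about 100%.

    Assumption: categories are mutually exclusive shares of all tool calls.
    The tolerance only absorbs rounding and bar-reading error.
    """
    return abs(total_share(ratios) - 100.0) <= tol


print(total_share(STEP0), looks_like_partition(STEP0))    # 98.7 True
print(total_share(STEP32), looks_like_partition(STEP32))  # 41.7 False
```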
### Interpretation
The data suggest that finetuning substantially changes how the system uses external tools, and that this change coincides with a notable improvement in overall accuracy.
1. **Enhanced Accuracy**: The increase in overall accuracy from 19.2% to 25.2% (6.0 percentage points, annotated as +6.21%) is the most direct evidence that finetuning succeeded. It indicates that the finetuned model performs its tasks better, plausibly by making more appropriate decisions about tool usage.
2. **Strategic Tool Utilization**:
    * The "Base Generator" plays a minimal role in tool calling, and that role shrinks further after finetuning. This suggests that, when a tool call is made, the finetuned model relies almost exclusively on the specialized search tools rather than on its own generation capabilities.
    * The most striking pattern is the convergence of the "Google Search", "Web Search", and "Wikipedia Search" tool call ratios to 13.6% after finetuning. This may indicate that finetuning instilled a more uniform, and perhaps more efficient, strategy for invoking external search tools. The sharp drop in Google and Wikipedia Search calls (from ~38% to 13.6%) alongside higher overall accuracy is consistent with the model becoming more selective and avoiding unnecessary or redundant tool calls. The model may also extract information more effectively from fewer calls, or rely on internal knowledge more often instead of searching.
3. **Ambiguous Labels**: The discrepancy between the displayed change labels (-1.5, +5.2, -4.7) and the actual change in "Tool Call Ratio (%)" for Google, Web, and Wikipedia Search at Step 32 is a critical point. One possibility is that these labels refer to a different metric, such as the change in accuracy *attributable to each specific tool* or each tool's contribution to the overall accuracy gain. For instance, the "+5.2" for "Web Search" might mean that this tool's *contribution to accuracy* rose by 5.2 percentage points even though its *call ratio* fell. Another possibility is that the labels are correct changes in call ratio and the Step 32 bar readings are not: three different bars reading exactly 13.6% is unusual, and the only label that matches (Base Generator, -2.2) belongs to the only bar not read as 13.6%. Without further context, the labels' precise meaning remains open, which underlines the need for clear annotation in technical charts.
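The second reading of the labels can be tested directly: add each label to its Step 0 ratio and look at the Step 32 values it would imply. Under that unconfirmed assumption, the Base Generator still lands on 0.9%, the search tools land on 37.2% (Google), 23.6% (Web), and 33.8% (Wikipedia) rather than a shared 13.6%, and the four bars total 95.5%, close to the 98.7% total at Step 0. This does not settle which reading is correct; it only shows that treating the labels as call-ratio changes yields a plausible set of shares, with Web Search gaining ground as "+5.2" would suggest.

```python
# Step 0 ratios (%) and the printed Step 32 labels, as transcribed from the chart.
STEP0 = {"Base Generator": 3.1, "Google Search": 38.7, "Web Search": 18.4, "Wikipedia Search": 38.5}
LABELS = {"Base Generator": -2.2, "Google Search": -1.5, "Web Search": 5.2, "Wikipedia Search": -4.7}


def implied_step32(before: dict[str, float], deltas: dict[str, float]) -> dict[str, float]:
    """Step 32 ratios implied by ASSUMING each label is a change in percentage points."""
    return {name: round(before[name] + deltas[name], 1) for name in before}


implied = implied_step32(STEP0, LABELS)
print(implied)
print(round(sum(implied.values()), 1))  # total share across the four categories
```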