This image is a box plot showing the distribution of "N-gram diversity" across different "N-gram sizes" for four distinct methods.
**1. Chart Type:** Box Plot
**2. Axes Information:**
* **Y-axis Label:** N-gram diversity
* **Y-axis Tick Markers:** 0.2, 0.4, 0.6, 0.8, 1.0
* **X-axis Label:** N-gram size
* **X-axis Tick Markers:** 2, 3, 4
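The chart does not state how "N-gram diversity" is computed. A minimal sketch, assuming the common "distinct-n" definition (number of unique n-grams divided by the total number of n-grams in a token sequence), is shown below. The function name `ngram_diversity` and the sample sentence are hypothetical and are not taken from the chart.

```python
def ngram_diversity(tokens, n):
    """Fraction of n-grams in `tokens` that are distinct.

    Assumed definition (distinct-n); the chart itself does not specify one.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # sequence shorter than n: no n-grams to count
    return len(set(ngrams)) / len(ngrams)


# Made-up sample text with some repetition.
tokens = "the cat sat on the mat and the cat sat".split()

for n in (2, 3, 4):
    print(n, round(ngram_diversity(tokens, n), 3))
```

Under this definition the value lies between 0 and 1, which matches the Y-axis range. In the toy sentence the value rises with n, because longer n-grams repeat less often than shorter ones.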
**3. Legend Information:**
The legend sits in the lower part of the main chart area, slightly right of center. It defines the four methods represented by the box plots:
* **Grey box:** Baseline
* **Dark Blue box:** REAP
* **Light Blue box:** M-SMoE
* **Khaki/Olive Green box:** HC-SMoE
**4. Data Extraction and Trends:**
The chart presents four box plots for each N-gram size (2, 3, and 4), corresponding to the four methods. Each box plot visually represents the median, interquartile range (IQR), whiskers, and outliers of the N-gram diversity for that specific method and N-gram size.
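To make the box plot elements concrete, the sketch below computes the median, quartiles, whiskers, and outliers for a list of scores. It assumes the widely used Tukey convention (whiskers reach the most extreme points within 1.5 × IQR of the box); the chart does not say which convention it uses. The helper name `box_stats` and the sample scores are made up for illustration.

```python
import statistics


def box_stats(values, whisker=1.5):
    """Box plot summary under the assumed 1.5 * IQR (Tukey) whisker rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Made-up diversity scores with one low outlier.
scores = [0.60, 0.79, 0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.88]
summary = box_stats(scores)
print(summary)
```

With this rule, any point drawn beyond a whisker is an outlier, which is how the low-diversity points below each box in the chart would be read.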
**General Trends Across N-gram Sizes (2, 3, 4):**
* For all four methods, the N-gram diversity generally increases as the N-gram size increases from 2 to 4.
* "Baseline" and "REAP" consistently exhibit the highest N-gram diversity; their distributions are nearly identical at every N-gram size, with medians within about 0.01 of each other.
* "M-SMoE" consistently shows lower N-gram diversity compared to "Baseline" and "REAP", but higher diversity than "HC-SMoE".
* "HC-SMoE" consistently demonstrates the lowest N-gram diversity among the four methods and the widest spread of data (largest IQR, with the lower whisker and outliers reaching furthest down), particularly at smaller N-gram sizes.
**Detailed Data Points by N-gram Size:**
**A. N-gram size = 2:**
* **Baseline (Grey):**
    * **Trend:** High diversity.
    * **Median:** Approximately 0.82
    * **Interquartile Range (IQR):** From approximately 0.79 (Q1) to 0.85 (Q3).
    * **Whiskers:** Extend from approximately 0.72 (lower) to 0.88 (upper).
    * **Outliers:** Several data points are observed below the lower whisker, ranging from approximately 0.59 to 0.68.
* **REAP (Dark Blue):**
    * **Trend:** Similar high diversity to Baseline.
    * **Median:** Approximately 0.82
    * **Interquartile Range (IQR):** From approximately 0.79 (Q1) to 0.85 (Q3).
    * **Whiskers:** Extend from approximately 0.72 (lower) to 0.88 (upper).
    * **Outliers:** Several data points are observed below the lower whisker, ranging from approximately 0.60 to 0.68.
* **M-SMoE (Light Blue):**
    * **Trend:** Moderate diversity, lower than Baseline and REAP.
    * **Median:** Approximately 0.78
    * **Interquartile Range (IQR):** From approximately 0.72 (Q1) to 0.82 (Q3).
    * **Whiskers:** Extend from approximately 0.68 (lower) to 0.84 (upper).
    * **Outliers:** Several data points are observed below the lower whisker, ranging from approximately 0.40 to 0.67.
* **HC-SMoE (Khaki/Olive Green):**
    * **Trend:** Lowest diversity among the methods, with a wide spread.
    * **Median:** Approximately 0.72
    * **Interquartile Range (IQR):** From approximately 0.66 (Q1) to 0.79 (Q3).
    * **Whiskers:** Extend from approximately 0.50 (lower) to 0.80 (upper).
    * **Outliers:** Numerous data points are observed below the lower whisker, ranging from approximately 0.25 to 0.48.
**B. N-gram size = 3:**
* **Baseline (Grey):**
    * **Trend:** High diversity, increased from N-gram size 2.
    * **Median:** Approximately 0.92
    * **Interquartile Range (IQR):** From approximately 0.90 (Q1) to 0.95 (Q3).
    * **Whiskers:** Extend from approximately 0.80 (lower) to 0.98 (upper).
    * **Outliers:** Several data points are observed below the lower whisker, ranging from approximately 0.70 to just under 0.80.
* **REAP (Dark Blue):**
    * **Trend:** Similar high diversity to Baseline, increased from N-gram size 2.
    * **Median:** Approximately 0.91
    * **Interquartile Range (IQR):** From approximately 0.89 (Q1) to 0.94 (Q3).
    * **Whiskers:** Extend from approximately 0.80 (lower) to 0.97 (upper).
    * **Outliers:** Several data points are observed below the lower whisker, ranging from approximately 0.70 to just under 0.80.
* **M-SMoE (Light Blue):**
    * **Trend:** Moderate diversity, increased from N-gram size 2.
    * **Median:** Approximately 0.89
    * **Interquartile Range (IQR):** From approximately 0.86 (Q1) to 0.92 (Q3).
    * **Whiskers:** Extend from approximately 0.78 (lower) to 0.95 (upper).
    * **Outliers:** Several data points are observed below the lower whisker, ranging from approximately 0.50 to 0.77.
* **HC-SMoE (Khaki/Olive Green):**
    * **Trend:** Lowest diversity, increased from N-gram size 2, but still with a wide spread.
    * **Median:** Approximately 0.85
    * **Interquartile Range (IQR):** From approximately 0.78 (Q1) to 0.91 (Q3).
    * **Whiskers:** Extend from approximately 0.60 (lower) to 0.95 (upper).
    * **Outliers:** Several data points are observed below the lower whisker, ranging from approximately 0.45 to 0.55.
**C. N-gram size = 4:**
* **Baseline (Grey):**
    * **Trend:** Highest diversity, increased from N-gram size 3.
    * **Median:** Approximately 0.96
    * **Interquartile Range (IQR):** From approximately 0.94 (Q1) to 0.98 (Q3).
    * **Whiskers:** Extend from approximately 0.88 (lower) to 0.99 (upper).
    * **Outliers:** Several data points are observed below the lower whisker, ranging from approximately 0.78 to 0.87.
* **REAP (Dark Blue):**
    * **Trend:** Similar high diversity to Baseline, increased from N-gram size 3.
    * **Median:** Approximately 0.95
    * **Interquartile Range (IQR):** From approximately 0.93 (Q1) to 0.97 (Q3).
    * **Whiskers:** Extend from approximately 0.88 (lower) to 0.99 (upper).
    * **Outliers:** Several data points are observed below the lower whisker, ranging from approximately 0.78 to 0.87.
* **M-SMoE (Light Blue):**
    * **Trend:** Moderate diversity, increased from N-gram size 3.
    * **Median:** Approximately 0.94
    * **Interquartile Range (IQR):** From approximately 0.91 (Q1) to 0.96 (Q3).
    * **Whiskers:** Extend from approximately 0.85 (lower) to 0.98 (upper).
    * **Outliers:** Several data points are observed below the lower whisker, ranging from approximately 0.60 to 0.84.
* **HC-SMoE (Khaki/Olive Green):**
    * **Trend:** Lowest diversity, increased from N-gram size 3, but still with a wide spread.
    * **Median:** Approximately 0.90
    * **Interquartile Range (IQR):** From approximately 0.85 (Q1) to 0.95 (Q3).
    * **Whiskers:** Extend from approximately 0.70 (lower) to 0.98 (upper).
    * **Outliers:** Several data points are observed below the lower whisker, ranging from approximately 0.35 to 0.68.
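
The general trends stated earlier can be checked against the approximate values listed above. The sketch below transcribes the (Q1, median, Q3) readings for each method and N-gram size and tests the ordering claims. The values are visual estimates from the chart, not exact data, and the helper names are hypothetical.

```python
# Approximate (Q1, median, Q3) readings transcribed from this description.
chart_stats = {
    "Baseline": {2: (0.79, 0.82, 0.85), 3: (0.90, 0.92, 0.95), 4: (0.94, 0.96, 0.98)},
    "REAP":     {2: (0.79, 0.82, 0.85), 3: (0.89, 0.91, 0.94), 4: (0.93, 0.95, 0.97)},
    "M-SMoE":   {2: (0.72, 0.78, 0.82), 3: (0.86, 0.89, 0.92), 4: (0.91, 0.94, 0.96)},
    "HC-SMoE":  {2: (0.66, 0.72, 0.79), 3: (0.78, 0.85, 0.91), 4: (0.85, 0.90, 0.95)},
}
SIZES = (2, 3, 4)


def medians_rise(method):
    """True if the median strictly increases from N-gram size 2 to 4."""
    meds = [chart_stats[method][n][1] for n in SIZES]
    return all(a < b for a, b in zip(meds, meds[1:]))


def lowest_median(n):
    """Method with the lowest median at N-gram size n."""
    return min(chart_stats, key=lambda m: chart_stats[m][n][1])


def widest_iqr(n):
    """Method with the largest Q3 - Q1 at N-gram size n."""
    return max(chart_stats, key=lambda m: chart_stats[m][n][2] - chart_stats[m][n][0])


def max_median_gap(a, b):
    """Largest median difference between two methods, rounded to 2 decimals."""
    return max(round(abs(chart_stats[a][n][1] - chart_stats[b][n][1]), 2) for n in SIZES)


for method in chart_stats:
    print(method, "medians rise with N-gram size:", medians_rise(method))
for n in SIZES:
    print(n, "lowest median:", lowest_median(n), "| widest IQR:", widest_iqr(n))
print("Baseline vs REAP max median gap:", max_median_gap("Baseline", "REAP"))
```

Because every number here is read off the plot by eye, small differences (such as the 0.01 gap between Baseline and REAP) should be treated as within reading error rather than as measured effects.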