# Technical Document Extraction: Entity Frequency Analysis
## Figure Description
The image is a **line graph** comparing the **entity frequency distribution** across three datasets: **ZeroGen**, **DemoGen**, and **Ground Truth**. The graph uses a **logarithmic scale** for the y-axis to emphasize differences in frequency magnitudes.
---
### **Axis Labels**
- **X-axis**: `Entity ID's Sorted by Frequency`
  - Range: `0` to `700` (discrete entity ranks).
- **Y-axis**: `Entity Frequency`
  - Logarithmic scale: `10^-4` to `10^-1`.
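The two axes describe a rank-frequency view of entity mentions: each dataset's entities are sorted from most to least frequent, and the rank becomes the x-coordinate. The sketch below shows one way such a curve could be computed. It assumes each dataset is available as a flat list of entity mention strings and that "frequency" means an entity's share of all mentions; the source does not state the normalization. The function name `rank_frequency` is hypothetical.

```python
from collections import Counter


def rank_frequency(mentions):
    """Return relative entity frequencies sorted from most to least frequent.

    Index i of the result corresponds to x = i on the plot
    ("Entity ID's Sorted by Frequency"); the value is the y-coordinate.
    Assumption: frequency = count / total number of mentions.
    """
    counts = Counter(mentions)
    total = sum(counts.values())
    # most_common() with no argument returns every (entity, count) pair, highest count first
    return [count / total for _, count in counts.most_common()]


if __name__ == "__main__":
    # Invented toy data, not taken from the figure.
    toy = ["Paris", "Paris", "Paris", "Berlin", "Berlin", "Tokyo", "Rome", "Paris"]
    print(rank_frequency(toy))  # [0.5, 0.25, 0.125, 0.125]
```

Plotting each dataset's list against its index with a logarithmic y-axis (for example with matplotlib's `plt.semilogy`) would produce curves of the kind described here.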
---
### **Legend**
- **ZeroGen**: Blue line.
- **DemoGen**: Orange line.
- **Ground Truth**: Green line.
---
### **Key Trends**
1. **Initial High Frequency**:
   - All three lines start near a frequency of `10^-1` for the first few entity IDs (IDs 0–50).
   - **Ground Truth** has the highest frequency throughout; the relative order of **ZeroGen** and **DemoGen** changes near ID 200.
2. **ZeroGen/DemoGen Crossover Near ID 200**:
   - **ZeroGen** and **DemoGen** intersect near `x=200`; beyond this point **ZeroGen** falls below **DemoGen**.
   - **Ground Truth** remains above both generated datasets on either side of the crossover.
3. **Long-Tail Behavior**:
   - Frequencies decay steeply for all datasets (monotonically, since IDs are sorted by frequency), with **Ground Truth** retaining the highest values in the long tail (IDs > 500).
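A crossover like the one near `x=200` can be located programmatically from two rank-frequency lists. This is a minimal sketch: the helper name `first_crossover` is hypothetical, and the toy values below are invented for illustration, not read from the figure.

```python
def first_crossover(a, b):
    """Return the first rank i at which the sign of a[i] - b[i] flips.

    Compares the two curves over their common length and returns None if
    the ordering never changes. Exact ties (a[i] == b[i]) are skipped when
    tracking which curve is on top.
    """
    prev_sign = 0
    for i, (x, y) in enumerate(zip(a, b)):
        diff = x - y
        sign = (diff > 0) - (diff < 0)  # +1, 0, or -1
        if sign == 0:
            continue
        if prev_sign != 0 and sign != prev_sign:
            return i
        prev_sign = sign
    return None


if __name__ == "__main__":
    # Invented toy curves: the first starts above the second, then drops below it.
    zerogen_toy = [0.1, 0.05, 0.02, 0.005, 0.001]
    demogen_toy = [0.09, 0.04, 0.025, 0.01, 0.004]
    print(first_crossover(zerogen_toy, demogen_toy))  # 2
```

Skipping exact ties means two curves that merely touch without swapping order are not reported as a crossover.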
---
### **Critical Observations**
- **ZeroGen** assigns lower frequencies than both **DemoGen** and **Ground Truth** to entities ranked beyond ID 200, i.e., it under-represents mid- and low-frequency entities.
- **DemoGen** aligns more closely with **Ground Truth** than **ZeroGen** does, particularly in the mid-range (IDs 100–400).
- The logarithmic scale highlights the steep drop-off in frequency for lower-ranked entities.
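The claim that DemoGen tracks Ground Truth more closely in the mid-range could be checked numerically. One option, assumed here rather than taken from the source, is the mean absolute difference of `log10` frequencies over a rank window, which matches how vertical distances appear on a log-scaled axis. `log_gap` is a hypothetical helper.

```python
import math


def log_gap(curve, reference, start, stop):
    """Mean |log10(curve[i]) - log10(reference[i])| for ranks start <= i < stop.

    Smaller values mean the curve sits closer to the reference on a
    log-scaled plot. Assumes every frequency in the window is > 0.
    """
    window = range(start, min(stop, len(curve), len(reference)))
    gaps = [abs(math.log10(curve[i]) - math.log10(reference[i])) for i in window]
    if not gaps:
        raise ValueError("rank window is empty")
    return sum(gaps) / len(gaps)
```

For the mid-range comparison in the text, one would compare, for example, `log_gap(demogen, ground_truth, 100, 401)` with `log_gap(zerogen, ground_truth, 100, 401)`, where the three lists are hypothetical rank-frequency arrays; a smaller value indicates closer alignment.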
---
### **Data Extraction Notes**
- No data points are explicitly labeled; the graph implies approximately the following values at ID 500:
  - **ZeroGen**: ~`10^-3`.
  - **DemoGen**: ~`10^-3.5`.
  - **Ground Truth**: ~`10^-2.5`.
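Because these readings are exponents on a log-scaled axis, their linear values are easy to misjudge: `10^-3.5` is about `3.2 × 10^-4`, not halfway between `10^-3` and `10^-4`. A quick conversion of the approximate readings at ID 500 (the helper name `linear_value` is hypothetical):

```python
def linear_value(exponent):
    """Convert a base-10 exponent read off a log-scaled axis to a linear value."""
    return 10 ** exponent


# Approximate exponents read from the log-scaled y-axis at ID 500, as listed in the notes.
readings = {"ZeroGen": -3.0, "DemoGen": -3.5, "Ground Truth": -2.5}

for name, exponent in readings.items():
    print(f"{name}: 10^{exponent} ≈ {linear_value(exponent):.2e}")
```

Each half-step in the exponent corresponds to a factor of about 3.16 (√10) in the linear value.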
---
### **Conclusion**
The graph indicates that neither generated dataset reproduces the **Ground Truth** entity-frequency distribution: **Ground Truth** stays above both across the entity ranks. **DemoGen** shows moderate alignment with **Ground Truth**, while **ZeroGen** deviates significantly, particularly in the long tail.