## Line Chart: Verification Performance vs. Computation Time
### Overview
This image is a line chart comparing the performance of seven different computational methods or algorithms. The chart plots the percentage of properties successfully verified against the computation time required, measured in seconds on a logarithmic scale. The primary purpose is to visualize the trade-off between computational cost (time) and verification success rate for each method.
### Components/Axes
* **Chart Type:** Step line chart (lines move horizontally then vertically, indicating discrete verification events over time).
* **X-Axis (Horizontal):**
* **Label:** `Computation time (in s)`
* **Scale:** Logarithmic (base 10).
* **Major Tick Marks:** `10^-1` (0.1), `10^0` (1), `10^1` (10), `10^2` (100), `10^3` (1000).
* **Y-Axis (Vertical):**
* **Label:** `% of properties verified`
* **Scale:** Linear, from 0 to 100.
* **Major Tick Marks:** 0, 20, 40, 60, 80, 100.
* **Legend:** Located in the top-left corner of the plot area. It lists seven data series with corresponding line colors:
1. **BaBSB** (Blue line)
2. **BaB** (Orange line)
3. **reluBaB** (Green line)
4. **reluplex** (Red line)
5. **MIPplanet** (Purple line)
6. **planet** (Brown line)
7. **BlackBox** (Pink line)
* **Reference Line:** A dashed grey horizontal line at the 100% mark on the y-axis, indicating the theoretical maximum verification rate.
### Detailed Analysis
The chart shows the cumulative percentage of properties verified as computation time increases. All methods start at 0% verification at time zero. The step-like nature of the lines indicates that verifications are completed in batches or at specific time intervals.
**Trend Verification & Data Point Extraction (Approximate):**
1. **BaBSB (Blue):**
* **Trend:** Starts verifying later than most methods but shows a strong, steady increase, ultimately achieving the highest verification rate.
* **Key Points:** Begins rising around 0.5s. Reaches ~30% by 2s, ~40% by 5s, ~50% by 20s, and plateaus at approximately **55%** after 100s.
2. **BaB (Orange):**
* **Trend:** Very similar trajectory to BaBSB, closely following it but consistently slightly below. It is the second-best performer.
* **Key Points:** Begins rising around 0.5s. Reaches ~25% by 2s, ~40% by 10s, and plateaus at approximately **53%** after 100s.
3. **reluBaB (Green):**
* **Trend:** Starts verifying earlier than BaB/BaBSB but has a lower final plateau. Shows a moderate, steady increase.
* **Key Points:** Begins rising around 0.3s. Reaches ~20% by 1s, ~30% by 3s, and plateaus at approximately **42%** after 10s.
4. **reluplex (Red):**
* **Trend:** One of the earliest methods to start verifying, but its progress stalls early, resulting in a low final verification rate.
* **Key Points:** Begins rising almost immediately (before 0.1s). Reaches ~12% by 0.2s, then makes very slow progress, ending at approximately **21%** after 1000s.
5. **MIPplanet (Purple):**
* **Trend:** Starts very late and shows the slowest progress, resulting in the lowest final verification rate among the plotted methods.
* **Key Points:** Begins rising around 5s. Reaches ~10% by 20s, ~15% by 100s, and ends at approximately **18%** after 1000s.
6. **planet (Brown):**
* **Trend:** Starts early, similar to reluplex, but achieves a higher plateau. Progress is stepwise and moderate.
* **Key Points:** Begins rising around 0.1s. Reaches ~12% by 0.5s, ~20% by 2s, ~30% by 10s, and plateaus at approximately **33%** after 20s.
7. **BlackBox (Pink):**
* **Trend:** Starts moderately early and shows a steady, stepwise increase to a mid-range final value.
* **Key Points:** Begins rising around 0.8s. Reaches ~10% by 2s, ~20% by 5s, and plateaus at approximately **30%** after 20s.
### Key Observations
* **Performance Hierarchy:** There is a clear performance stratification. **BaBSB** and **BaB** are the top performers, followed by a middle group (**reluBaB, planet, BlackBox**), and then the lower-performing **reluplex** and **MIPplanet**.
* **Time vs. Success Trade-off:** Methods that start verifying very early (e.g., **reluplex, planet**) do not necessarily achieve high final verification rates. Conversely, the top methods (**BaBSB, BaB**) invest more initial time before their first verifications but then scale more effectively.
* **Plateaus:** All methods eventually plateau, indicating that given more time, they are unlikely to verify a significantly higher percentage of properties beyond a certain point. The plateau levels vary dramatically.
* **Logarithmic Time Scale:** The use of a log scale emphasizes the wide range of computation times (from tenths of a second to over 1000 seconds) and highlights the early-stage behavior of the algorithms.
### Interpretation
This chart provides a comparative efficiency analysis of formal verification or neural network analysis methods. The data suggests:
1. **Superiority of Branch-and-Bound Variants:** The top two methods, **BaBSB** and **BaB**, likely represent advanced Branch-and-Bound algorithms. Their superior performance indicates that this algorithmic approach, while potentially having higher initial overhead, is more effective at scaling to verify a larger portion of properties within a given time budget.
2. **Early Starters vs. High Finishers:** There is an inverse relationship between "time to first verification" and "final verification rate" among the top and bottom performers. This could imply that methods making quick, easy verifications exhaust the simple cases early but struggle with complex ones, while more sophisticated methods take time to set up but make more progress on harder problems.
3. **Diminishing Returns:** The universal plateauing demonstrates the law of diminishing returns in computational verification. After a certain point (which varies by method), throwing more computation time at the problem yields minimal additional success. This is critical for practical applications where time budgets are fixed.
4. **Method Selection Implication:** The choice of method depends heavily on the time constraint. For very short time budgets (<1s), **reluplex** or **planet** might be preferable. For longer budgets (>10s), **BaBSB** or **BaB** are clearly superior. **MIPplanet** appears to be the least efficient method in this comparison.
**Notable Anomaly:** The **reluplex** line shows an unusual, very early jump to ~12% before 0.2s, followed by near-stagnation. This could indicate it quickly solves a specific subclass of easy problems but lacks the mechanisms to handle the remainder.