# Latency Comparison of Language Models on Jetson Orin and Raspberry Pi 4

## Key Components
- **X-axis**: Language models with parameter sizes (e.g., Llama-2 (7B), Llama-2 (13B), LLaMA (30B))
- **Y-axis**: Tokens generated per second (Tokens/sec); higher values mean lower latency per token
- **Legend**:
  - AutoGPTQ (gray)
  - llama.cpp (dark gray)
  - exllama (black)
  - TinyChat (red)
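
Because the figure is titled as a latency comparison but plots tokens/sec, it can help to convert between the two. The snippet below is a minimal sketch, assuming latency means the average time per generated token; the helper name `ms_per_token` is hypothetical and not taken from the figure or its source.

```python
def ms_per_token(tokens_per_sec: float) -> float:
    """Convert throughput (tokens/sec) to average latency per token (ms).

    Assumes steady-state generation, so latency is the reciprocal of throughput.
    """
    if tokens_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / tokens_per_sec


# Example: 39.1 tokens/sec corresponds to roughly 25.6 ms per token.
print(f"{ms_per_token(39.1):.1f} ms/token")
```

Under this assumption, doubling tokens/sec halves the per-token latency, which is why the highest bars in the figure correspond to the lowest latency.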

---

## Jetson Orin (64G) Mobile GPU (a)
### Latency Comparison (Tokens/sec, higher is better)
| Model               | AutoGPTQ | llama.cpp | exllama | TinyChat |
|---------------------|----------|-----------|---------|----------|
| Llama-2 (7B)        | 13.4     | 15.9      | 22.5    | 39.1     |
| Llama-2 (13B)       | 8.0      | 9.1       | 13.3    | 21.2     |
| LLaMA (30B)         | 2.9      | 5.8       | 3.2     | 8.8      |
| Llama-2 (70B)       | 1.3      | 3.0       | 1.4     | 3.5      |
| StarCoder (15.5B)   | 6        | 12        | 18      | 32       |
| StableCode (30B)    | 1.3      | 3.0       | 1.4     | 3.5      |
| Mistral (7B)        | 14       | 17        | -       | -        |

---

## Raspberry Pi 4 (b)
### Latency Comparison (Tokens/sec, higher is better)
| Model               | AutoGPTQ | llama.cpp | exllama | TinyChat |
|---------------------|----------|-----------|---------|----------|
| Llama-2 (7B)        | 0.7      | 0.7       | 0.7     | 3.0      |
| OPT (6.7B)          | 0.7      | NS        | 0.7     | 0.7      |
| Falcon (7B)         | 3        | 22        | 10      | 15       |

---

## Observations
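1. **Device Performance**:
   - For Llama-2 (7B), the only model listed for both devices, Jetson Orin reaches 13.4 to 39.1 tokens/sec, compared with 0.7 to 3.0 tokens/sec on Raspberry Pi 4.
   - Larger models run slower on Jetson Orin: TinyChat drops from 39.1 tokens/sec for Llama-2 (7B) to 3.5 tokens/sec for Llama-2 (70B). The Raspberry Pi 4 panel lists only models of about 7B parameters.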

2. **Method Efficiency**:
   - **TinyChat** (red) achieves the highest tokens/sec (lowest latency) on Jetson Orin for every model with a reported TinyChat value.
   - **AutoGPTQ** (gray) is the slowest method in every Jetson Orin row. **llama.cpp** (dark gray) falls between AutoGPTQ and TinyChat and overtakes exllama on the 30B and 70B models.
   - **exllama** (black) is the fastest baseline on the 7B to 15.5B models on Jetson Orin but has no reported value for Mistral (7B).

3. **Unsupported Methods**:
   - "NS" and "Not Supported" labels both mark a method that is incompatible with a specific model or device. Cells shown as "-" have no reported value.

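4. **Raspberry Pi 4 Limitations**:
   - Throughput is low even for 7B-class models: Llama-2 (7B) and OPT (6.7B) run at 0.7 to 3.0 tokens/sec, while the Falcon (7B) row lists 3 to 22 tokens/sec.
   - Some methods are unsupported for certain models (e.g., llama.cpp is marked "NS" for OPT (6.7B) in the Raspberry Pi 4 table).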
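
The method comparison above can be made concrete by computing how much faster TinyChat is than the best other method in each Jetson Orin row. This is a minimal sketch: the numbers are copied from the Jetson Orin table in this description (with `None` standing in for "-" cells), and the names `JETSON_ORIN`, `speedup_over_best_baseline`, and `SPEEDUPS` are hypothetical, introduced only for illustration.

```python
from typing import Dict, Optional

# Tokens/sec copied from the Jetson Orin table; None marks "-" cells.
JETSON_ORIN: Dict[str, Dict[str, Optional[float]]] = {
    "Llama-2 (7B)": {"AutoGPTQ": 13.4, "llama.cpp": 15.9, "exllama": 22.5, "TinyChat": 39.1},
    "Llama-2 (13B)": {"AutoGPTQ": 8.0, "llama.cpp": 9.1, "exllama": 13.3, "TinyChat": 21.2},
    "LLaMA (30B)": {"AutoGPTQ": 2.9, "llama.cpp": 5.8, "exllama": 3.2, "TinyChat": 8.8},
    "Llama-2 (70B)": {"AutoGPTQ": 1.3, "llama.cpp": 3.0, "exllama": 1.4, "TinyChat": 3.5},
    "StarCoder (15.5B)": {"AutoGPTQ": 6, "llama.cpp": 12, "exllama": 18, "TinyChat": 32},
    "StableCode (30B)": {"AutoGPTQ": 1.3, "llama.cpp": 3.0, "exllama": 1.4, "TinyChat": 3.5},
    "Mistral (7B)": {"AutoGPTQ": 14, "llama.cpp": 17, "exllama": None, "TinyChat": None},
}


def speedup_over_best_baseline(
    row: Dict[str, Optional[float]], method: str = "TinyChat"
) -> Optional[float]:
    """Return method throughput divided by the fastest other method, or None if unavailable."""
    target = row.get(method)
    baselines = [v for name, v in row.items() if name != method and v is not None]
    if target is None or not baselines:
        return None
    return target / max(baselines)


SPEEDUPS = {model: speedup_over_best_baseline(row) for model, row in JETSON_ORIN.items()}

for model, ratio in SPEEDUPS.items():
    print(f"{model}: " + ("n/a" if ratio is None else f"{ratio:.2f}x"))
```

Rows without a TinyChat value (here Mistral (7B)) yield `None` rather than a ratio, so missing cells are not silently treated as zero.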

---

## Notes
- **Device Icons**: Jetson Orin (GPU icon), Raspberry Pi 4 (Pi 4 image).
- **Parameter Sizes**: Model sizes in parentheses (e.g., 7B = 7 billion parameters).
- **Color Consistency**: Legend colors match bar colors across all models.