# Technical Document Extraction: GTransformer Architecture Diagram
## Diagram Overview
The image depicts a **GTransformer** neural network architecture for audio classification. The system processes audio waveforms through multiple stages to classify sounds into categories like "RORO," "Passenger," and "Fishboat."
---
## Key Components and Flow
### 1. Input Processing
- **T-Transform Block**
  - Converts the raw audio waveform (blue waveform) into a **spectrogram** (blue-green heatmap).
  - Spatial grounding: Located at the bottom-left of the diagram.
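
The diagram does not say which transform the T-Transform block uses. As a minimal sketch, assuming it behaves like a short-time Fourier magnitude transform, the waveform-to-spectrogram step could look like the following. The names `hann` and `spectrogram` and the parameters `frame_len` and `hop` are hypothetical and do not come from the diagram.

```python
import cmath
import math


def hann(n):
    """Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_len // 2) per frame.

    Assumption: a plain windowed DFT stands in for the T-Transform block.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Synthetic tone with 8 cycles per 64-sample frame, used only for illustration.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = spectrogram(tone)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

The naive DFT is quadratic in the frame length and is used here only to keep the sketch dependency-free; a real pipeline would use an FFT routine.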
### 2. Feature Extraction
- **Mel Patchify Block**
  - Applies **3x3 convolutions** to the spectrogram.
  - Outputs **Patch+Position Embeddings** (grid of green squares).
  - Spatial grounding: Positioned below the T-Transform block.
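
To make the patchify step concrete, here is a rough sketch that slides a 3x3 kernel over a 2D grid and tags each output with its grid position. Several details are assumptions rather than facts from the diagram: the stride of 3 (non-overlapping patches), the single output channel, the averaging kernel, and the use of a plain `(row, col)` index in place of a learned position embedding. The function name `patchify` is hypothetical.

```python
def patchify(spec, kernel, stride=3):
    """Apply a k x k kernel at strided positions and return (position, value) tokens."""
    k = len(kernel)
    rows, cols = len(spec), len(spec[0])
    tokens = []
    for r in range(0, rows - k + 1, stride):
        for c in range(0, cols - k + 1, stride):
            # Weighted sum over the k x k window: one convolution output.
            value = sum(
                spec[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            # Pair the value with its patch-grid position.
            tokens.append(((r // stride, c // stride), value))
    return tokens


# Hypothetical 6 x 9 "spectrogram" and an averaging kernel, for illustration only.
toy_spec = [[float(r * 9 + c) for c in range(9)] for r in range(6)]
mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
tokens = patchify(toy_spec, mean_kernel)
```

A trained model would typically emit a vector per patch (many output channels) and add a learned position vector to it; the scalar value and index pair above only illustrate the data flow.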
### 3. GTransformer Blocks (Stacked Layers)
- **Structure of Each Block**:
  - **FFN (Feed-Forward Network)**: Red rectangle.
  - **GNN (Graph Neural Network)**: Pink rectangle.
  - **Transformer Encoder**: Beige rectangle.
  - **Connections**: Dashed arrows between components.
- **Flow**:
  - Input embeddings pass through sequential GTransformer blocks.
  - Outputs are aggregated via **Pooling** (yellow rectangle).
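
The description above names the three sub-layers of a block but not their order or internals. The sketch below is one possible reading, with every detail labeled as an assumption: the order (attention, then graph aggregation, then feed-forward), the absence of learned weights, residual connections, and normalization, and the chain-shaped neighbor graph. All function names are hypothetical.

```python
import math


def attention_layer(tokens):
    """Single-head self-attention with identity projections (no learned weights)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w * v[d] for w, v in zip(weights, tokens)) / total for d in range(dim)]
        )
    return out


def gnn_layer(tokens, neighbors):
    """One message-passing step: average each token with its graph neighbors."""
    dim = len(tokens[0])
    out = []
    for i, tok in enumerate(tokens):
        group = [tok] + [tokens[j] for j in neighbors[i]]
        out.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return out


def ffn_layer(tokens):
    """Stand-in for the feed-forward network: a bare element-wise ReLU."""
    return [[max(0.0, x) for x in tok] for tok in tokens]


def gtransformer_stack(tokens, neighbors, depth=2):
    """Apply `depth` blocks in sequence; the sub-layer order is an assumption."""
    for _ in range(depth):
        tokens = ffn_layer(gnn_layer(attention_layer(tokens), neighbors))
    return tokens


def mean_pool(tokens):
    """Average all token vectors into a single feature vector."""
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]


# Hypothetical 4-token input connected as a chain: 0 - 1 - 2 - 3.
chain = [[1], [0, 2], [1, 3], [2]]
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
features = gtransformer_stack(toy_tokens, chain)
pooled = mean_pool(features)
```

Each layer preserves the number of tokens and their dimension, which is what allows the blocks to be stacked before pooling collapses the sequence into one vector.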
### 4. Classification Head
- **Components**:
  - **Pooling Layer**: Aggregates features.
  - **1x1 Convolution Layers**: Two sequential layers (yellow rectangles).
- **Output**: Class probabilities for categories:
  - RORO
  - Passenger
  - Fishboat
  - ... (additional classes)
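
As an illustration of the head, note that a 1x1 convolution applied to a pooled feature vector (a single spatial position) reduces to a matrix-vector product. The sketch below chains two such layers and a softmax. The ReLU between the layers, the softmax itself, the layer sizes, and the hand-picked weights are all assumptions; only the three class names shown in the diagram are used, and the elided classes are left out.

```python
import math

# Only the classes named in the diagram; further classes are elided there as "...".
CLASSES = ["RORO", "Passenger", "Fishboat"]


def linear(x, weights, bias):
    """A 1x1 convolution at one spatial position: a matrix-vector product plus bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(pooled, w1, b1, w2, b2):
    """Two sequential 1x1-conv-style layers followed by softmax."""
    hidden = [max(0.0, h) for h in linear(pooled, w1, b1)]  # ReLU is an assumption
    return dict(zip(CLASSES, softmax(linear(hidden, w2, b2))))


# Hypothetical weights chosen by hand for illustration, not trained values.
w1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
b2 = [0.0, 0.0, 0.0]
probs = classify([1.0, 0.0], w1, b1, w2, b2)
predicted = max(probs, key=probs.get)
```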
---
## Legend and Labels
- **Legend**: Located in the **top-right corner** (yellow box).
  - Labels: `Class: RORO, Passenger, Fishboat, ...`
  - Color coding: Matches output class predictions.
---
## Spatial Grounding and Component Isolation
1. **Header**: Classification Head (top section).
2. **Main Chart**: GTransformer blocks and Mel Patchify Block (central region).
3. **Footer**: Input processing (T-Transform and Mel Patchify Block).
---
## Notes
- No numerical data or trends are present; the diagram focuses on architectural components.
- All labels and text are in **English**.
- No data tables or heatmaps with categorical axes are included.
---
## Diagram Flow Summary