## Heatmaps: MFCC Variation Across Frames
### Overview
The image presents four heatmaps, each visualizing how MFCCs (Mel-Frequency Cepstral Coefficients) vary across frames for one of four actions: "Key Collect", "Jump", "Apple Collect", and "Coin Collect". Each heatmap plots the coefficient index (0 to 12) on the y-axis against the frame number on the x-axis. Color encodes the coefficient value, with red indicating higher values and blue indicating lower values.
### Components/Axes
Each heatmap shares the following components:
* **X-axis:** "Frame" - Represents the frame index (time step) of the MFCC sequence. The scale differs for each action:
  * "Key Collect": 0 to 200
  * "Jump": 0 to 35
  * "Apple Collect": 0 to 400
  * "Coin Collect": 0 to 160
* **Y-axis:** "Coefficient" - Represents the MFCC index, ranging from 0 to 12 (13 coefficients).
* **Color Scale:** A diverging color scale where red represents high coefficient values, blue represents low coefficient values, and white/light shades represent values near zero.
* **Titles:** Each heatmap has a title indicating the action being analyzed (e.g., "MFCCs (Key Collect)").
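
For orientation, the matrix behind each heatmap (coefficient index on one axis, frame index on the other) can be sketched as follows. This is a minimal NumPy-only illustration, not the pipeline used to produce the figure: the function names and the parameters `n_fft`, `hop`, and `n_mels` are assumptions, and the source does not state which tool or settings were used.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfcc(signal, sr, n_mfcc=13, n_fft=512, hop=256, n_mels=26):
    """Return an (n_mfcc, n_frames) matrix; all parameter values are assumed."""
    # Slice the signal into overlapping, windowed frames (no padding).
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum -> mel-band energies -> log compression.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # Unnormalized DCT-II over the mel bands; keep the first n_mfcc terms.
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return dct @ log_mel.T

# Synthetic one-second tone standing in for a recorded sound.
sr = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
coeffs = mfcc(tone, sr)
print(coeffs.shape)  # (13, 30): 13 coefficients (rows) by 30 frames (columns)
```

A matrix of this shape could then be rendered as a heatmap with rows on the y-axis and columns on the x-axis, for example with an image-plotting routine and a diverging colormap.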
### Detailed Analysis
**1. MFCCs (Key Collect)**
* The heatmap spans frames 0 to 200 and coefficients 0 to 12.
* Coefficient values keep changing throughout the sequence rather than settling into a steady pattern.
* Initially (frames 0-25), coefficients 6-12 exhibit relatively high values (approximately 8-12, indicated by red color).
* From frames 25-75, there's a transition with coefficients 0-5 increasing in value while the higher coefficients decrease.
* Between frames 75-150, the pattern becomes more complex, with fluctuating values across all coefficients.
* From frames 150-200, coefficients 0-5 show a decrease in value, while coefficients 6-12 increase again.
**2. MFCCs (Jump)**
* The heatmap spans frames 0 to 35 and coefficients 0 to 12.
* The initial frames (0-10) show relatively low coefficient values (approximately 0-4, indicated by blue color).
* Values for coefficients 6-12 rise rapidly between frames 10-20, peaking at approximately 10-12 (red color).
* From frames 20-35, the coefficients gradually decrease, returning to lower values.
**3. MFCCs (Apple Collect)**
* The heatmap spans frames 0 to 400 and coefficients 0 to 12.
* The initial frames (0-50) show a relatively stable pattern with coefficients 6-12 having moderate values (approximately 4-8).
* From frames 50-200, there's a significant increase in coefficient variation, with fluctuating values across all coefficients.
* Between frames 200-300, the pattern stabilizes again, with coefficients 0-5 showing higher values (approximately 6-10).
* From frames 300-400, the coefficients gradually decrease.
**4. MFCCs (Coin Collect)**
* The heatmap spans frames 0 to 160 and coefficients 0 to 12.
* The initial frames (0-20) show low coefficient values (approximately 0-4, indicated by blue color).
* Values for coefficients 6-12 rise rapidly between frames 20-60, peaking at approximately 8-12 (red color).
* From frames 60-120, the coefficients fluctuate with moderate values.
* Between frames 120-160, the coefficients gradually decrease.
### Key Observations
* The "Jump" action exhibits the most distinct and rapid change in MFCCs, with a clear peak around frames 10-20.
* "Key Collect" and "Apple Collect" show more gradual and complex variations in coefficients over longer frame durations.
* "Coin Collect" shows a similar pattern to "Jump" but with a slightly slower and more sustained increase in coefficients.
* The higher-order coefficients (6-12) generally exhibit greater variation than the lower-order coefficients (0-5).
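
The last observation compares how much the two coefficient bands vary. One way to check such a claim numerically, given the underlying matrix, is to compare per-coefficient standard deviations across frames. The sketch below assumes a matrix of shape `(n_coeffs, n_frames)`; the helper names and the synthetic input are illustrative, not taken from the figure's data.

```python
import numpy as np

def coefficient_variability(mfcc):
    """Standard deviation of each coefficient over frames (one value per row)."""
    return mfcc.std(axis=1)

def compare_bands(mfcc, split=6):
    """Mean variability of coefficients [0, split) versus [split, n_coeffs)."""
    std = coefficient_variability(mfcc)
    return std[:split].mean(), std[split:].mean()

# Synthetic stand-in: rows 0-5 are constant, rows 6-12 oscillate over frames.
demo = np.zeros((13, 50))
demo[6:] = np.sin(np.arange(50))[None, :]
low_band, high_band = compare_bands(demo)
print(low_band < high_band)  # True for this synthetic matrix
```

Applied to the real matrices, the same comparison would confirm or refute the visual impression for each action separately.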
### Interpretation
These heatmaps demonstrate how MFCCs, which represent the spectral envelope of a sound, change over time during different actions. The variations in coefficients reflect the dynamic characteristics of the sounds associated with each action. The distinct patterns observed for each action suggest that MFCCs can be used as features for action recognition.
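
To make the feature idea concrete, here is a minimal sketch of turning variable-length MFCC matrices into fixed-length vectors and classifying them by nearest centroid. The summary statistics, the classifier, and the synthetic data are all assumptions chosen for illustration; the source does not specify a recognition method.

```python
import numpy as np

def summarize(mfcc):
    """Fixed-length feature vector: per-coefficient mean and std over frames."""
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def nearest_centroid(train, query):
    """train maps a label to a list of (n_coeffs, n_frames) matrices."""
    centroids = {
        label: np.mean([summarize(m) for m in mats], axis=0)
        for label, mats in train.items()
    }
    q = summarize(query)
    return min(centroids, key=lambda label: np.linalg.norm(centroids[label] - q))

# Synthetic stand-ins with different frame counts per class (hypothetical data).
rng = np.random.default_rng(0)
train = {
    "Jump": [rng.normal(0.0, 1.0, (13, 35)) for _ in range(3)],
    "Coin Collect": [rng.normal(5.0, 1.0, (13, 160)) for _ in range(3)],
}
query = rng.normal(5.0, 1.0, (13, 150))
print(nearest_centroid(train, query))  # Coin Collect
```

Summarizing over the frame axis discards temporal order, so a sequence model or frame-wise features may work better when the timing of changes is what distinguishes two actions.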
The rapid change in coefficients during the "Jump" action likely corresponds to the impact sound and any accompanying vocalization or movement noise. The more gradual changes in "Key Collect" and "Apple Collect" may reflect the continuous nature of these actions and the subtle variations in the sounds produced while they are carried out.
The differences in the heatmap patterns highlight the potential for using machine learning algorithms to classify actions based on their MFCC profiles. The trends observed in the data can indicate which acoustic features are most discriminative for each action. The sequences also differ in length, from about 35 frames for "Jump" to about 400 for "Apple Collect", which suggests that the actions differ in duration; duration could itself be a useful feature for classification.
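
As a rough illustration of how frame counts relate to duration, the sketch below converts a number of frames into seconds. The hop length, window size, and sample rate are assumed values (the figure does not report them), the formula assumes no padding at the signal edges, and the frame counts are approximations read from the x-axis ranges.

```python
def frames_to_seconds(n_frames, hop=512, n_fft=2048, sr=22050):
    """Signal length covered by n_frames unpadded frames (all defaults assumed)."""
    return ((n_frames - 1) * hop + n_fft) / sr

# Approximate frame counts taken from the x-axis ranges of the four heatmaps.
frame_counts = {
    "Key Collect": 200,
    "Jump": 35,
    "Apple Collect": 400,
    "Coin Collect": 160,
}
for action, n in frame_counts.items():
    print(f"{action}: {frames_to_seconds(n):.2f} s")
```

Under any fixed hop length and sample rate the ordering of durations follows the ordering of frame counts, so the frame count alone already works as a duration feature.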