## Diagram: Foundation Model Robustness to Distribution Shifts
### Overview
This image is a conceptual diagram illustrating two categories of distribution shifts encountered by foundation models (FMs). It contrasts shifts to which FMs have become more robust with shifts that remain persistently challenging. The diagram uses a combination of example images, text, and citations to categorize and explain these shifts.
### Components/Axes
The diagram is organized into two main vertical panels, each with a purple header.
**Left Panel Header:** "Shifts with improved robustness from FMs"
**Right Panel Header:** "Persistently challenging shifts"
A vertical axis on the far left defines two states:
* **ID** (In-Distribution): Represented by a light blue background.
* **OOD** (Out-of-Distribution): Represented by a light pink background.
Each panel contains two or three columns (three on the left, two on the right), each representing a specific shift type. Each column has:
1. A category title at the top.
2. An example image or text block in the ID (blue) section.
3. A downward-pointing purple arrow.
4. A corresponding example image or text block in the OOD (pink) section.
5. A citation (Author 'Year) at the very bottom in blue text.
### Detailed Analysis
#### Left Panel: Shifts with improved robustness from FMs
This panel contains three columns:
1. **Column 1: Common corruptions**
* **ID Image (Top-Left):** A clear, color photograph of a yellow bird perched on a branch with pink blossoms.
* **OOD Image (Bottom-Left):** A blurred, lower-resolution version of the same bird image.
* **Citation:** Hendrycks '19
2. **Column 2: Shifts across space**
* **ID Image (Top-Center):** An aerial or satellite photograph of a landscape with fields and roads.
* **OOD Image (Bottom-Center):** A heavily pixelated or low-resolution version of the same landscape image.
* **Citation:** Xie '21
3. **Column 3: Domain shift**
* **ID Image (Top-Right):** A photograph of a bunch of yellow bananas in a wooden bowl.
* **OOD Image (Bottom-Right):** A photograph of a single, peeled banana against a white background.
* **Citation:** Radford '21
#### Right Panel: Persistently challenging shifts
This panel contains two columns:
1. **Column 1: Extrapolation, e.g. shift across time**
* **ID Text (Top-Left):** "Pence is the Vice President of the US."
* **OOD Text (Bottom-Left):** "Harris is the Vice President of the US."
* **Citation:** Lazaridou '21
2. **Column 2: Spurious correlations**
* **ID Image (Top-Right):** A photograph of a cow standing on a grassy mountain slope.
* **OOD Image (Bottom-Right):** A photograph of a cow lying on a sandy beach near a boat.
* **Citation:** Beery '18
### Key Observations
* **Visual vs. Semantic Shifts:** The left panel primarily illustrates *visual* corruptions and domain changes (blur, pixelation, object state). The right panel illustrates *semantic* or *contextual* shifts (factual knowledge over time, object-context associations).
* **Layout Symmetry:** Both panels use an identical ID (top/blue) to OOD (bottom/pink) flow, connected by arrows, creating a clear comparative structure.
* **Citation Placement:** All academic citations are placed at the bottom of their respective columns, attributing the example or the research on that shift type.
* **Color Coding:** The light blue (ID) and light pink (OOD) backgrounds are consistently applied across both panels to denote the distribution state.
### Interpretation
This diagram serves as a taxonomy for understanding the strengths and limitations of current foundation models under distribution shift. It suggests that FMs have become notably robust to many *visual* perturbations and corruptions (left panel), likely due to training on vast, diverse datasets that implicitly cover these variations. Examples like "Common corruptions" (a sharp photo versus a blurred one) and "Domain shift" (a bunch of bananas in a bowl versus a single peeled banana) represent changes in the visual rendering or style of a concept, which models can often generalize across.
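As a rough illustration of the kind of visual corruption the left panel depicts, the sketch below applies a box blur and a block pixelation to a grayscale image array. The function names `box_blur` and `pixelate` are hypothetical, the input is a random array standing in for a photo, and this is not the corruption procedure used in the cited work; it only assumes NumPy is available.

```python
import numpy as np


def box_blur(img, k=3):
    """Average each pixel with its k x k neighbourhood (k odd, edges padded)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    # Sum the k*k shifted copies of the image, then divide to get the mean.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def pixelate(img, block=4):
    """Replace each block x block tile with its mean.

    Assumes both image dimensions are divisible by `block`.
    """
    h, w = img.shape
    tiles = img.reshape(h // block, block, w // block, block)
    means = tiles.mean(axis=(1, 3))
    # Expand each tile mean back to a block x block patch.
    return np.repeat(np.repeat(means, block, axis=0), block, axis=1)


# Hypothetical stand-in for an in-distribution image (values in [0, 1)).
clean = np.random.default_rng(0).random((16, 16))
blurred = box_blur(clean, k=3)       # "common corruption"-style OOD input
blocky = pixelate(clean, block=4)    # low-resolution-style OOD input
```

Both outputs keep the shape of the input, so they can be fed to the same model as the clean image to compare predictions before and after corruption.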
However, the right panel highlights fundamental challenges that are not merely visual. The "Extrapolation, e.g. shift across time" column involves facts that change after training (the Vice President changes from Pence to Harris), so a model needs temporal reasoning or access to current information beyond a static training set. "Spurious correlations" involve decoupling objects from their typical backgrounds (e.g., cows are not *only* found on grass), which requires models to learn causal features rather than statistical shortcuts. These shifts are "persistently challenging" because they test deeper reasoning, world knowledge, and the ability to avoid biased associations, pointing to areas where model architecture or training paradigms may need advancement. The diagram's overall argument is that robustness is not a monolithic property but depends heavily on the *nature* of the shift.
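The spurious-correlation failure mode can be reproduced in a toy setting. The sketch below is an assumption-laden illustration, not taken from the diagram or the cited work: it builds synthetic data with a noisy "core" feature (think: the animal) and a "spurious" feature (think: the background) that agrees with the label 95% of the time in training but only 5% of the time out of distribution, then fits a plain logistic regression by gradient descent. All names (`make_split`, `fit_logreg`, `accuracy`) and all numeric settings are hypothetical.

```python
import numpy as np


def make_split(n, p_agree, rng):
    """Sample n points; the spurious feature matches the label with prob p_agree."""
    y = rng.integers(0, 2, size=n)
    sign = 2 * y - 1                                   # labels as -1 / +1
    core = sign + rng.normal(0.0, 1.0, size=n)         # informative but noisy
    agree = rng.random(n) < p_agree
    spurious = np.where(agree, sign, -sign) + rng.normal(0.0, 0.1, size=n)
    return np.column_stack([core, spurious]), y


def fit_logreg(X, y, lr=0.5, steps=500):
    """Full-batch gradient descent on the logistic loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def accuracy(X, y, w, b):
    return float(np.mean(((X @ w + b) > 0) == (y == 1)))


rng = np.random.default_rng(0)
X_train, y_train = make_split(2000, p_agree=0.95, rng=rng)  # ID training data
X_id, y_id = make_split(2000, p_agree=0.95, rng=rng)        # ID test: same correlation
X_ood, y_ood = make_split(2000, p_agree=0.05, rng=rng)      # OOD test: correlation flipped

w, b = fit_logreg(X_train, y_train)
id_acc = accuracy(X_id, y_id, w, b)
ood_acc = accuracy(X_ood, y_ood, w, b)
print(f"ID accuracy:  {id_acc:.2f}")
print(f"OOD accuracy: {ood_acc:.2f}")
```

Because the spurious feature is the cleaner signal in training, the classifier leans on it, and accuracy drops sharply once that correlation flips. This is the "cow on a beach" situation in miniature: the core feature was available all along, but the statistical shortcut was easier to learn.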