# Technical Document Extraction: Multi-Modal Task Examples
This image shows four examples of computer vision and natural language processing tasks applied to mobile or web screenshots. It is organized into four columns labeled (a) through (d), each containing a screenshot, a "Text input" (the prompt), and a "Target" (the expected output).
---
## (a) Screen Annotation
**Visual Content:**
* **Top Image:** A dark background with white text and red rectangular redactions/annotations. Visible text: "love", "then the Merciful appears before".
* **Bottom Image:** A document-style layout featuring a green-tinted photo of a person writing. Below the photo is text with heavy red redactions. Visible text includes: "feels the pleasure", "love, follows truthfulness", "Merciful appears before", "February 2, 2017 || Written / No Comments", and a footer with scattered words like "one", "the pleasure", "love, follows truthfulness".
**Data Extraction:**
| Field | Value |
| :--- | :--- |
| **Text input** | Describe this screenshot. |
| **Target** | IMAGE pleasure or love follows truthfulness then the merciful appears before him 0 993 0 261 (TEXT pleasure of love, follows truthfulness, then the Merciful appears before him 3 991 0 248), IMAGE a ma... |
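
The annotation target above appears to list UI elements as an uppercase type, a text description, and four integers. A minimal sketch of pulling the complete elements out of such a string follows. The `Element` type, the `parse_elements` helper, and the reading of the four integers as a bounding box are assumptions made for illustration and are not stated in the figure. The scan is flat: it ignores the parenthesized nesting and skips the final element, which is truncated in the source.

```python
import re
from typing import List, NamedTuple, Tuple


class Element(NamedTuple):
    """One annotated UI element (hypothetical structure, for illustration)."""
    kind: str                        # e.g. "IMAGE" or "TEXT"
    text: str                        # description or transcribed text
    box: Tuple[int, int, int, int]   # four integers, assumed to be a bounding box


# Uppercase type, lazily matched text, then exactly four integers.
ELEMENT_RE = re.compile(r"\b([A-Z_]+) (.*?) (\d+) (\d+) (\d+) (\d+)\b")


def parse_elements(target: str) -> List[Element]:
    """Return every complete element found in a flat scan of the target string."""
    return [
        Element(m.group(1), m.group(2), tuple(int(g) for g in m.groups()[2:]))
        for m in ELEMENT_RE.finditer(target)
    ]


target = (
    "IMAGE pleasure or love follows truthfulness then the merciful appears "
    "before him 0 993 0 261 (TEXT pleasure of love, follows truthfulness, "
    "then the Merciful appears before him 3 991 0 248), IMAGE a ma..."
)

for element in parse_elements(target):
    print(element.kind, element.box)
```
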
---
## (b) Question-Answering
**Visual Content:**
* **Header:** "Andrew Ramroop, London" with a Facebook icon.
* **Section - PROJEKTBESCHREIBUNG:** "Image Campaign Andrew Ramroop, Tailor of Queen Elisabeth, London".
* **Section - CREDITS:** "Silhouette http://www.silhouette.com".
* **Images:** Two photographs showing a tailor (Andrew Ramroop) in a workshop setting. One is a wide shot of a shop; the other is a black-and-white close-up of him sewing.
**Data Extraction:**
| Field | Value |
| :--- | :--- |
| **Text input** | What is the name of the tailor? |
| **Target** | Andrew Ramroop |
---
## (c) Navigation
**Visual Content:**
* **Interface:** A mobile web browser showing the URL `www.nice.com.sa`. The site name is "nice | نايس".
* **Product List:**
1. **FOOD WARMERS:** Image of gold-trimmed serving dishes.
2. **SERVING TROLLEYS:** Image of a multi-tiered metal cart (Eid Sale 70% badge).
3. **VACUUM FLASKS SETS:** Image of ornate silver and gold thermoses (Eid Sale 70% badge).
* **Footer:** Navigation arrows and a share icon.
**Data Extraction:**
| Field | Value |
| :--- | :--- |
| **Text input** | Select the first item in the list. |
| **Target** | `click 15 983 199 359` (a coordinate-based click action; the four numbers locate the click target on the screen) |
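
To make the coordinate-based click target concrete, here is a minimal sketch that parses the string and converts it to a pixel position. The figure does not define the coordinate convention, so the order `x_min x_max y_min y_max` and the normalization to a 0 to 1000 grid are assumptions, as are the names `ClickAction`, `parse_click`, and `center_px`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClickAction:
    """A click target box (field order is an assumption, not given in the figure)."""
    x_min: int
    x_max: int
    y_min: int
    y_max: int

    def center_px(self, width: int, height: int, scale: int = 1000) -> Tuple[int, int]:
        """Map the box center to pixels, assuming coordinates normalized to `scale`."""
        cx = (self.x_min + self.x_max) * width / (2 * scale)
        cy = (self.y_min + self.y_max) * height / (2 * scale)
        return round(cx), round(cy)


def parse_click(target: str) -> ClickAction:
    """Parse a string of the form 'click a b c d' into a ClickAction."""
    parts = target.split()
    if len(parts) != 5 or parts[0] != "click":
        raise ValueError(f"not a click action: {target!r}")
    return ClickAction(*(int(p) for p in parts[1:]))


action = parse_click("click 15 983 199 359")
# On a hypothetical 1000 x 2000 pixel screen the box center is (499, 558).
print(action.center_px(width=1000, height=2000))
```
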
---
## (d) Summarization
**Visual Content:**
* **Header:** Website "adimeback" with a search icon and menu. Categories: "Men's Basketball", "News", "Recruiting".
* **Headline:** Hurley ‘Diggins’ Into Philly For Next Point Guard.
* **Byline:** By Peter Bard - July 26, 2020.
* **Main Image:** A basketball player in a white jersey with "WOOD 3" shooting a ball. Caption: "Rahsool Diggins is Dan Hurley's first 2021 recruit (Mark Palczewski/Media News Group)".
* **Social Icons:** Facebook, Twitter, Pinterest, WhatsApp.
* **Article Text:** "Dan Hurley landed his first recruit of the 2021 class on Sunday evening as Rahsool Diggins, a 6'1" point guard from Philadelphia, announced his commitment to UConn on Instagram. Diggins (ranked #54 in the 247 Composite) has a solid jump shot to go along with some outstanding passing ability and is exactly the..."
**Data Extraction:**
| Field | Value |
| :--- | :--- |
| **Text input** | Summarize this screenshot. |
| **Target** | The screenshot shows a news article about UConn men's basketball recruiting. The article is about Dan Hurley's first recruit of the 2021 class, Rahsool Diggins, a 6'1" point guard from Philadelphia. |