## PRISMA Flow Diagram: Study Selection for a Systematic Review
### Overview
This image is a PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram. It visually documents the process of identifying, screening, and selecting studies for inclusion in a systematic review. The diagram is divided into two primary pathways for study identification, which converge at the final inclusion stage. All text is in English.
### Components/Axes
The diagram has three stage labels stacked down the left margin and two parallel columns, one for each identification pathway.
**Vertical Stages (Left Margin):**
1. **Identification** (Top section, blue vertical bar)
2. **Screening** (Middle section, blue vertical bar)
3. **Included** (Bottom section, blue vertical bar)
**Column Headers (Pathways):**
* **Left Column (Yellow Header):** "Identification of new studies via databases and registers"
* **Right Column (Gray Header):** "Identification of new studies via other methods"
**Flowchart Boxes and Connectors:**
The process is depicted using rectangular boxes connected by directional arrows, showing the flow and attrition of records.
### Detailed Analysis
The process is broken down into two parallel identification streams.
**1. Left Column: Identification via Databases and Registers**
* **Initial Identification:**
    * Box: "Records identified from: Databases (n = 13,243)"
    * Sub-list within the box:
        * "ACM (n = 317)"
        * "ACL Anthology (n = 12,132)"
        * "IEEE Xplore (n = 311)"
        * "ScienceDirect (n = 383)"
    * *Note: The listed databases sum to 13,143, which is 100 less than the stated total of 13,243. The diagram does not say where the remaining 100 records come from; they may belong to sources not itemized in the box.*
* **Removal Before Screening:**
    * Arrow points to: "Records removed before screening: Duplicate records (n = 0)"
* **Screening Stage:**
    * Box: "Records screened (n = 13,243)"
    * Arrow points right to: "Records excluded (n = 13,010)"
* **Retrieval Stage:**
    * Box: "Reports sought for retrieval (n = 233)"
    * Arrow points right to: "Reports not retrieved (n = 9)"
* **Eligibility Assessment:**
    * Box: "Reports assessed for eligibility (n = 224)"
    * Arrow points right to: "Reports excluded:"
        * "Reason: Non-NLP (n = 38)"
        * "Reason: Non-relevant (n = 52)"
* **Final Inclusion (Left Pathway Contribution):**
    * The final arrow from this column points down to the "Included" box. The diagram gives no separate count for this pathway, but subtracting the exclusions leaves 224 - 38 - 52 = 134 reports, which form part of the final total.
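The stage-by-stage attrition in this column can be checked with a few subtractions. The sketch below is only an illustration: the counts are transcribed from the diagram, while the variable names and the assumption that each stage equals the previous stage minus the records leaving the flow are ours.

```python
# Counts transcribed from the databases-and-registers column of the diagram.
per_database = {
    "ACM": 317,
    "ACL Anthology": 12_132,
    "IEEE Xplore": 311,
    "ScienceDirect": 383,
}
identified = 13_243
duplicates_removed = 0
excluded_at_screening = 13_010
not_retrieved = 9

# Gap between the itemized databases and the stated total.
listed_total = sum(per_database.values())
unlisted = identified - listed_total

# Assumption: each stage count is the previous one minus the records that left the flow.
screened = identified - duplicates_removed
sought = screened - excluded_at_screening
assessed = sought - not_retrieved

print(f"listed databases: {listed_total}, unaccounted for: {unlisted}")
print(f"screened: {screened}, sought: {sought}, assessed: {assessed}")
```

The derived values agree with the boxes in the diagram: 13,243 screened, 233 sought for retrieval, and 224 assessed for eligibility.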
**2. Right Column: Identification via Other Methods**
* **Initial Identification:**
    * Box: "Records identified from: Citation searching (n = 52)"
* **Screening & Retrieval Stage:**
    * Box: "Reports sought for retrieval (n = 52)"
    * Arrow points right to: "Reports not retrieved (n = 0)"
* **Eligibility Assessment:**
    * Box: "Reports assessed for eligibility (n = 52)"
    * Arrow points right to: "Reports excluded:"
        * "Reason: Non-NLP (n = 6)"
        * "Reason: Non-relevant (n = 5)"
* **Final Inclusion (Right Pathway Contribution):**
    * The arrow from this column's assessment box points left and down, merging into the final "Included" box.
**3. Final Included Studies (Convergence Point)**
* **Box (Bottom Left):** "Reports of new included studies (n = 175)"
    * This box receives arrows from both the left column's eligibility assessment and the right column's eligibility assessment, indicating it is the combined total from both identification methods.
### Key Observations
1. **Massive Attrition:** Almost all initially identified records (13,010 out of 13,243, or ~98.2%) were excluded at the screening stage, which in a PRISMA flow is normally title/abstract screening.
2. **Primary Source:** The ACL Anthology database contributed the overwhelming majority of initial records (12,132 out of 13,243, or ~91.6%).
3. **Exclusion Reasons:** The diagram lists two reasons for excluding reports at eligibility assessment: they were not related to Natural Language Processing (Non-NLP), or they were deemed non-relevant to the review's specific questions.
4. **Data Consistency Check:**
    * **Left Pathway Math:** 224 reports assessed - 38 (Non-NLP) - 52 (Non-relevant) = 134 reports eligible from databases.
    * **Right Pathway Math:** 52 reports assessed - 6 (Non-NLP) - 5 (Non-relevant) = 41 reports eligible from citation searching.
    * **Combined Total:** 134 + 41 = 175. This matches the final "n = 175" in the Included box, confirming the diagram's internal numerical consistency.
5. **Comprehensive Search:** The review employed a dual-strategy search, combining a broad database search with a targeted citation search to ensure comprehensive coverage.
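The consistency check and the proportions quoted in these observations can be recomputed from the diagram's counts. This is a minimal sketch, not part of the original review: the `pathways` structure and the `percent` helper are hypothetical names introduced for illustration, and percentages are rounded to one decimal place.

```python
# Eligibility counts transcribed from the diagram: (assessed, exclusions by reason).
pathways = {
    "databases": (224, {"Non-NLP": 38, "Non-relevant": 52}),
    "citation searching": (52, {"Non-NLP": 6, "Non-relevant": 5}),
}
reported_included = 175


def percent(part: int, whole: int) -> float:
    """Share of `whole` made up by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Eligible reports per pathway = assessed minus all listed exclusions.
eligible = {
    name: assessed - sum(excluded.values())
    for name, (assessed, excluded) in pathways.items()
}
total_eligible = sum(eligible.values())

screening_exclusion_rate = percent(13_010, 13_243)
acl_share = percent(12_132, 13_243)
citation_share = percent(eligible["citation searching"], reported_included)

print(eligible, total_eligible == reported_included)
print(screening_exclusion_rate, acl_share, citation_share)
```

Keeping the exclusion reasons in a dictionary means a further reason could be added without changing the arithmetic, which is convenient if the same check is reused for another PRISMA diagram.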
### Interpretation
This PRISMA diagram documents a highly selective systematic review, likely in the field of computational linguistics or NLP, given the dominance of the ACL Anthology database.
* **Process Rigor:** The flow reports a count at every stage, which makes the study selection process transparent and auditable. The high exclusion rate at the screening stage is common in systematic reviews and is consistent with narrow, specific inclusion criteria, although the diagram itself does not state those criteria.
* **Search Strategy Effectiveness:** The use of "Citation searching" (also known as snowballing) as a secondary method yielded 41 included studies (23.4% of the final total). This highlights the importance of supplementing database searches with other methods to capture relevant studies that may be missed by keyword-based database queries.
* **Focus of the Review:** The explicit exclusion reasons ("Non-NLP", "Non-relevant") strongly suggest the review's scope is narrowly defined around NLP topics. The significant number excluded for being "Non-relevant" (57 total) implies the review addresses a specific sub-question or application area within NLP.
* **Outcome:** The final synthesis will be based on 175 reports, a substantial number that suggests the review aims to provide a comprehensive overview or meta-analysis of a well-researched topic within the NLP domain. The counts recorded in the diagram are enough to trace, for each pathway, how many records entered, how many were removed at each stage, and why.