# FairPFN: A Tabular Foundation Model for Causal Fairness
Abstract
Machine learning (ML) systems are utilized in critical sectors, such as healthcare, law enforcement, and finance. However, these systems are often trained on historical data that contains demographic biases, leading to ML decisions that perpetuate or exacerbate existing social inequalities. Causal fairness provides a transparent, human-in-the-loop framework to mitigate algorithmic discrimination, aligning closely with legal doctrines of direct and indirect discrimination. However, current causal fairness frameworks have a key limitation: they assume prior knowledge of the correct causal model, restricting their applicability in complex fairness scenarios where causal models are unknown or difficult to identify. To bridge this gap, we propose FairPFN, a tabular foundation model pre-trained on synthetic causal fairness data to identify and mitigate the causal effects of protected attributes in its predictions. FairPFN’s key contribution is that it requires no knowledge of the causal model and still demonstrates strong performance in identifying and removing protected causal effects across a diverse set of hand-crafted and real-world scenarios relative to robust baseline methods. FairPFN paves the way for promising future research, making causal fairness more accessible to a wider variety of complex fairness problems.
1 Introduction
Algorithmic discrimination is among the most pressing AI-related risks of our time, manifesting when machine learning (ML) systems produce outcomes that disproportionately disadvantage historically marginalized groups Angwin et al. (2016). Despite significant advancements by the fairness-aware ML community, critiques highlight the contextual limitations and lack of transferability of current statistical fairness measures to practical legislative frameworks Weerts et al. (2023). In response, the field of causal fairness has emerged, providing a transparent and human-in-the-loop causal framework for assessing and mitigating algorithmic bias with a strong analogy to existing anti-discrimination legal doctrines Plecko & Bareinboim (2024).
Figure 1: FairPFN Overview: FairPFN is a foundation model for causal fairness, pre-trained on synthetic datasets generated from sparse MLPs that represent SCMs with exogenous protected attributes (a). A biased dataset is created for each MLP/SCM and supplied as context to the transformer (b), with loss computed based on fair outcomes obtained by excluding the causal influence of the protected attribute (c). In practice, (d) FairPFN takes in only an observational dataset to predict fair targets by integrating over the simplest causal explanations for the biased data.
A recent review comparing outcome-based and causal fairness approaches (Castelnovo et al., 2022) argues that the non-identifiability of causal models from observational data Pearl (2009) limits the usage of current causal fairness frameworks in practical applications. In practice, users must provide full or partial information about the underlying causal model, a challenging task given the complexity of systemic inequalities. Furthermore, an incorrectly presumed causal graph, such as one falsely assuming a variable is independent of a protected attribute, can invalidate causal fairness metrics Ma et al. (2023); Binkytė-Sadauskienė et al. (2022), resulting in fairwashing and fostering a false sense of security and trust.
This paper takes a bold new perspective on achieving causal fairness. Our key contribution is FairPFN, a tabular foundation model for causal fairness, pre-trained on synthetic causal fairness data to learn to identify and remove the causal effects of protected attributes in tabular classification settings. When used on a new dataset, FairPFN does not rely on a user-specified causal model or graph, instead solely relying on the causally-generated data it has seen during pre-training. We demonstrate through extensive experiments that FairPFN effectively and consistently mitigates the causal impact of protected attributes across various hand-crafted and real-world scenarios, yielding causally fair predictions without user-specified causal information. We summarize our various contributions:
1. **PFNs for Causal Fairness:** We propose a paradigm shift for algorithmic fairness, in which a transformer is pre-trained on synthetic causal fairness data.
2. **Causal Fairness Prior:** We introduce a synthetic causal data prior which offers a comprehensive representation of fairness datasets, modeling protected attributes as binary exogenous causes.
3. **Foundation Model:** We present FairPFN, a foundation model for causal fairness which, given only observational data, identifies and removes the causal effect of binary, exogenous protected attributes in predictions, and demonstrates strong performance in terms of both causal fairness and predictive accuracy on a combination of hand-crafted and real-world causal scenarios. We provide a prediction interface to evaluate and assess our pre-trained model, as well as code to generate and visualize our pre-training data at https://github.com/jr2021/FairPFN.
2 Related Work
In recent years, causality has gained prominence in the field of algorithmic fairness, providing fairness researchers with a structural framework to reason about algorithmic discrimination. Unlike traditional fairness research Kamishima et al. (2012); Agarwal et al. (2018); Hardt et al. (2016), which focuses primarily on optimizing statistical fairness measures, causal fairness frameworks concentrate on the structure of bias. This approach involves modeling causal relationships among protected attributes, observed variables, and outcomes, assessing the causal effects of protected attributes, and mitigating biases using causal methods, such as optimal transport Plecko & Bareinboim (2024) or latent variable estimation Kusner et al. (2017); Ma et al. (2023); Bhaila et al. (2024).
Counterfactual fairness, introduced by Kusner et al. (2017), posits that predictive outcomes should remain invariant between the actual world and a counterfactual scenario in which a protected attribute assumes an alternative value. This notion has spurred interest within the fairness research community, resulting in developments like path-specific extensions Chiappa (2019) and the application of Variational Autoencoders (VAEs) to create counterfactually fair latent representations Ma et al. (2023).
The initial counterfactual fairness framework necessitates comprehensive knowledge of the causal model. In contrast, the Causal Fairness Analysis (CFA) framework Plecko & Bareinboim (2024) relaxes this requirement by organizing variables within a Standard Fairness Model (SFM) for bias assessment and mitigation. Moreover, the CFA framework presents the Fairness Cookbook, which defines causal fairness metrics—Indirect-Effect, Direct-Effect, and Spurious-Effect—that directly align with US legal doctrines of disparate impact and treatment. Furthermore, the CFA framework challenges Kusner et al. (2017)'s modeling of protected attributes as exogenous causes, permitting correlations between protected attributes and confounding variables that contribute to the legally admissible Spurious-Effect.
3 Background
This section establishes the scientific foundation of FairPFN, including terminology relevant to algorithmic fairness, causal ML, counterfactual fairness, and prior-data fitted networks (PFNs).
Algorithmic Fairness
Algorithmic discrimination occurs when historical biases against demographic groups (e.g., ethnicity, sex) are reflected in the training data of ML algorithms, leading to the perpetuation and amplification of these biases in predictions Barocas et al. (2023). Fairness research focuses on measuring algorithmic bias and developing fairness-aware ML models that produce non-discriminatory predictions. Practitioners have established over 20 fairness metrics, which generally break down into group-level and individual-level metrics Castelnovo et al. (2022). These metrics can be used to optimize predictive models, balancing the commonly observed trade-off between fairness and predictive accuracy Weerts et al. (2024).
Causal Machine Learning Causal ML is a developing field that leverages modern ML methods for causal reasoning Pearl (2009), facilitating advancements in causal discovery and causal inference Peters et al. (2014). Causal mechanisms are often represented as Structural Causal Models (SCMs), defined as $\mathcal{M}=(U,O,F)$ , where $U$ are unobservables, $O$ are observables, and $F$ is a set of structural equations. These equations are expressed as $f_{j}:X_{j}=f_{j}(PA_{j},N_{j})$ , indicating that each variable $X_{j}$ depends on its parent variables $PA_{j}$ and independent noise $N_{j}$ . Non-linearities in the set of structural equations $F$ influence data complexity and the identifiability of causal quantities from observational data Schölkopf et al. (2012). In an SCM, interventions can be made by setting $X\leftarrow x$ and propagating this value through the model $\mathcal{M}$ , posing the question "what will happen if I do something?". Counterfactuals expand upon the idea of interventions and are relevant when a value of $X$ is already observed, instead posing the question "what would have happened if something had been different?". Beyond posing a different question, counterfactuals require that exogenous noise terms be held constant, and thus classically require full knowledge of the causal model. In the context of algorithmic fairness, we operate at the level of counterfactuals, as protected attributes are typically given and already observed.
In causal reasoning frameworks, one major application of counterfactuals is the estimation of causal effects such as the individual and average treatment effects (ITE and ATE), which quantify the difference and expected difference, respectively, between outcomes under different values of $X$ :
$$
ITE:\tau=Y_{X\leftarrow x}-Y_{X\leftarrow x^{\prime}} \tag{1}
$$
$$
ATE:E[\tau]=E[Y_{X\leftarrow x}]-E[Y_{X\leftarrow x^{\prime}}]. \tag{2}
$$
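As a concrete illustration, both effects can be computed on a toy linear SCM. The structural equation, coefficients, and variable names below are invented for illustration and are not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear SCM (illustrative only): Y = 2*X + N with independent noise N.
n = 10_000
noise = rng.normal(0.0, 1.0, size=n)

def f_y(x, noise):
    """Structural equation for Y under an intervention X <- x."""
    return 2.0 * x + noise

# Counterfactual pair: identical noise terms, different interventions on X.
y_do_1 = f_y(1.0, noise)  # Y_{X <- 1}
y_do_0 = f_y(0.0, noise)  # Y_{X <- 0}

ite = y_do_1 - y_do_0     # Eq. (1): per-individual effect (here exactly 2.0)
ate = ite.mean()          # Eq. (2): expectation over the population
```

Because the SCM is linear and the noise cancels in the difference, the ITE is constant and equals the causal weight of $X$ on $Y$; non-linear structural equations would make the ITE vary per individual.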
Counterfactual Fairness
is a foundational notion of causal fairness introduced by Kusner et al. (2017), requiring that an individual’s predictive outcome should match that in a counterfactual scenario where they belong to a different demographic group. This notion is formalized in the theorem below.
**Theorem 1 (Unit-level/probabilistic)**
*Given an SCM $\mathcal{M}=(U,O,F)$ where $O=A\cup X$ , a predictor $\hat{Y}$ is counterfactually fair on the unit-level if $\forall\hat{y}\in\hat{Y},\forall x\in X,\forall a,a^{\prime}\in A$
$$
P(\hat{y}_{A\rightarrow a}(u)\mid X=x,A=a)=P(\hat{y}_{A\rightarrow a^{\prime}}(u)\mid X=x,A=a)
$$*
Kusner et al. (2017) notably choose to model protected attributes as exogenous, which means that they may not be confounded by unobserved variables with respect to outcomes. We note that the definition of counterfactual fairness in Theorem 1 is the unit-level probabilistic one as clarified by Plecko & Bareinboim (2024), because counterfactual outcomes are generated deterministically with fixed unobservables $U=u$ . Theorem 1 can be applied on the dataset level to form the population-level version also provided by Plecko & Bareinboim (2024) which measures the alignment of natural and counterfactual predictive distributions.
**Theorem 2 (Population-level)**
*Given an SCM $\mathcal{M}=(U,O,F)$ where $O=A\cup X$ , a predictor $\hat{Y}$ is counterfactually fair on the population-level if $\forall\hat{y}\in\hat{Y},\forall x\in X,\forall a,a^{\prime}\in A$
$$
P(\hat{y}_{A\rightarrow a}\mid X=x,A=a)=P(\hat{y}_{A\rightarrow a^{\prime}}\mid X=x,A=a)
$$*
Theorem 1 can also be transformed into a counterfactual fairness metric by quantifying the difference between natural and counterfactual predictive distributions. In this study, we quantify counterfactual fairness as the distribution of the counterfactual absolute error (AE) between each individual's natural and counterfactual predictions.
**Definition 1 (Absolute Error (AE))**
*Given an SCM $\mathcal{M}=(U,O,F)$ where $O=A\cup X$ , the counterfactual absolute error of a predictor $\hat{Y}$ is the distribution
$$
AE=|P(\hat{y}_{A\rightarrow a}(u)\mid X=x,A=a)-P(\hat{y}_{A\rightarrow a^{\prime}}(u)\mid X=x,A=a)|
$$*
We note that because the outcomes are conditioned on the same noise terms $u$ , our definition of AE builds on Theorem 1. Intuitively, when the AE is skewed towards zero, most individuals receive the same prediction in both the natural and counterfactual scenarios.
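A minimal sketch of how the AE distribution separates fair from unfair predictors follows; the two predictors and their noise scales are invented for illustration, standing in for natural and counterfactual predicted probabilities under fixed noise terms $u$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical predicted probabilities for n individuals in the natural world.
n = 1_000
p_natural = rng.uniform(0.0, 1.0, size=n)

# A counterfactually fair predictor: predictions barely move when A flips.
p_cntf_fair = np.clip(p_natural + rng.normal(0.0, 0.01, size=n), 0.0, 1.0)
# An unfair predictor: flipping A shifts predictions substantially.
p_cntf_unfair = np.clip(p_natural + 0.3, 0.0, 1.0)

ae_fair = np.abs(p_natural - p_cntf_fair)      # Definition 1, per individual
ae_unfair = np.abs(p_natural - p_cntf_unfair)
```

The fair predictor's AE distribution piles up near zero, while the unfair predictor's is concentrated around the size of the shift induced by flipping $A$.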
Kusner et al. (2017) present various implementations of Counterfactually Fair Prediction (CFP). The three levels of CFP can be achieved by fitting a predictive model $\hat{Y}$ to observable non-descendants if any exist (Level-One), inferred values of an exogenous unobserved variable $K$ (Level-Two), or additive noise terms (Level-Three). Kusner et al. (2017) acknowledge that in practice, Level-One rarely occurs. Level-Two requires that the causal model be invertible, which allows the unobservable $K$ to be inferred by abduction. Level-Three models the scenario as an Additive Noise Model, and thus is the strongest in terms of representational capacity, allowing more degrees of freedom than in Level-Two to represent fair terms. The three levels of CFP are depicted in Appendix Figure 22.
Causal Fairness The Causal Fairness Analysis (CFA) framework Plecko & Bareinboim (2024) introduces the Standard Fairness Model (SFM), which classifies variables as protected attributes $A$ , mediators $X_{med}$ , confounders $X_{conf}$ , and outcomes $Y$ . This framework includes a Fairness Cookbook of causal fairness metrics with a strong analogy to the legal notions of direct and indirect discrimination and business necessity as illustrated in Appendix Figure 23. Plecko & Bareinboim (2024) refute the modeling choice of Kusner et al. (2017) by their inclusion of confounders $X_{conf}$ in the SFM, arguing that these variables contribute to the legally admissible Spurious-Effect (SE).
For simplicity of our experimental results, we follow the modeling of Kusner et al. (2017), and focus on the elimination of the Total-Effect (TE) of protected attributes as defined by Plecko & Bareinboim (2024), while noting in Section 6 the importance of relaxing this assumption in future extensions.
Prior-data Fitted Networks Prior-data Fitted Networks (PFNs) Müller et al. (2022) and TabPFN Hollmann et al. (2023, 2025) represent a paradigm shift from traditional ML with a causal motivation, namely that simple causal models offer a quality explanation for real-world data. PFNs incorporate prior knowledge into transformer models by pre-training on datasets from a specific prior distribution Müller et al. (2022). TabPFN, a popular application of PFNs, applies these ideas to small tabular classification tasks by training a transformer on synthetic datasets derived from sparse Structural Causal Models (SCMs). As noted in Hollmann et al. (2023), a key advantage of TabPFN is its link to Bayesian Inference: the transformer approximates the Posterior Predictive Distribution (PPD), thus achieving state-of-the-art performance by integrating over simple causal explanations for the data.
4 Methodology
In this section, we introduce FairPFN, a foundation model for legally or ethically sensitive tabular classification problems that draws inspiration from PFNs and principles of causal fairness. We introduce our pre-training scheme, synthetic data prior, and draw connections to Bayesian Inference to explain the inner workings of FairPFN.
4.1 FairPFN Pre-Training
First, we present our pre-training scheme, where FairPFN is fit to a prior of synthetic causal fairness data to identify and remove the causal effects of protected attributes in practice from observational data alone. We provide pseudocode for our pre-training algorithm in Algorithm 1, and outline the steps below.
Input:
Number of pre-training epochs $E$ and steps $S$
Transformer $\mathcal{M}$ with weights $\theta$
Hypothesis space of SCMs $\phi∈\Phi$
begin
for $epoch=1$ to $E$ do
for $step=1$ to $S$ do
Draw a random SCM $\phi$ from $\Phi$
Sample $D_{bias}=(A,X_{bias},Y_{bias})$ from $\phi$, where $A\in\{a_{0},a_{1}\}$ is an exogenous binary protected attribute
Sample $Y_{fair}$ from $\phi$ by performing dropout on outgoing edges of $A$ if any exist
Partition $D_{bias}$ and $Y_{fair}$ into $train/val$
Pass $D_{bias}^{train}$ into $\mathcal{M}$ as context
Pass $D_{bias}^{val}$ into $\mathcal{M}$ to generate $Y_{pred}^{val}$
Calculate loss $L=BCE(Y_{pred}^{val},Y_{fair}^{val})$
Update weights $\theta$ w.r.t $∇_{\theta}L$
end for
end for
Output: Transformer $\mathcal{M}:X_{bias}→ Y_{fair}$
Algorithm 1 FairPFN Pre-training
Data Generating Mechanisms FairPFN pre-training begins by creating synthetic datasets that capture the causal mechanisms of bias in real-world data. Following the approach of Hollmann et al. (2023), we use Multi-Layer Perceptrons (MLPs) to model Structural Causal Models (SCMs) via the structural equation $f=z(P· W^{T}x+\epsilon)$ , where $W$ denotes activation weights, $\epsilon$ represents Gaussian noise, $P$ is a dropout mask sampled from a log-scale to promote sparsity, and $z$ is a non-linearity. Figure 1 illustrates the connection among sampled MLPs, their corresponding SCMs, and the resulting synthetic pre-training data. We note that independent noise terms are not visualized in Figure 1.
Biased Data Generation An MLP is randomly sampled and sparsity is induced through dropout on select edges. The protected attribute is defined as a binary exogenous variable $A∈\{a_{0},a_{1}\}$ at the input layer. We uniformly select $m$ features $X$ from the second hidden layer onwards to capture rich representations of exogenous causes. The target variable $Y$ is chosen from the output layer and discretized into a binary variable using a random threshold. A forward pass through the MLP produces a dataset $D_{bias}=(A,X_{bias},Y_{bias})$ with $n$ samples containing the causal influence of the protected attribute.
Fair Data Generation
A second forward pass generates a fair dataset $D_{fair}$ by applying dropout to the outgoing edges of the protected attribute $A$ in the MLP, as shown by the red edges in Figure 1. This dropout, similar to that in TabPFN, masks the causal weight of $A$ to zero, effectively reducing its influence to Gaussian noise $\epsilon$ . This in turn increases the relative influence of the fair exogenous causes $U_{0}$ and $U_{1}$ and of the independent noise terms throughout the MLP visualized in Figure 1. We note that $A$ is sampled with arbitrary binary values $A\in\{a_{0},a_{1}\}$ , as opposed to $A\in\{0,1\}$ , since the functions $f=0\cdot wx+\epsilon$ and $f=p\cdot 0x+\epsilon$ yield equivalent outcomes. Only after generating the pre-training dataset is $A$ converted to a binary variable for processing by the transformer.
In-Context Learning After generating $D_{bias}$ and $D_{fair}$ , we partition them into training and validation sets: $D_{bias}^{train}$ , $D_{bias}^{val}$ , $D_{fair}^{train}$ , and $D_{fair}^{val}$ . We pass $D_{bias}^{train}$ as context to the transformer to provide information about feature-target relationships. To simulate inference, we input $X_{bias}^{val}$ into the transformer $\mathcal{M}$ , yielding predictions $Y_{pred}$ . We then compute the binary cross-entropy (BCE) loss $L(Y_{pred},Y_{fair}^{val})$ against the fair outcomes $Y_{fair}^{val}$ , which do not contain effects of the protected attribute. Thus, the transformer $\mathcal{M}$ learns the mapping $\mathcal{M}:X_{bias}→ Y_{fair}$ .
Input:
- Number of exogenous causes $U$
- Number of endogenous variables $U× H$
- Number of features and samples $M× N$
begin
- Define MLP $\phi$ with depth $H$ and width $U$
- Initialize random weights $W:(U× U× H-1)$
- Sample sparsity masks $P$ with same dimensionality as weights
- Sample $H$ per-layer non-linearities $z_{i}\sim\{Identity,ReLU,Tanh\}$
- Initialize output matrix $X:(U× H)$
- Sample location $k$ of protected attribute in $X_{0}$
- Sample locations of features $X_{biased}$ in $X_{1:H-1}$ , and outcome $y_{bias}$ in $X_{H}$
- Sample protected attribute threshold $a_{t}$ and binary values $\{a_{0},a_{1}\}$
for $n=0$ to $N$ samples do
- Sample values of exogenous causes $X_{0}:(U× 1)$
- Sample values of additive noise terms $\epsilon:(U× H)$
for $i=0$ to $H-1$ layers do
- Pass intermediate representation through hidden layer $X_{i+1}=z_{i}(P_{i}· W_{i}^{T}X_{i}+\epsilon_{i})$
end for
- Select prot. attr. $A$ , features $X_{bias}$ and outcome $y_{bias}$ from $X_{0}$ , $X_{1:H-1}$ , and $X_{H}$
- Binarize $A∈\{a_{0},a_{1}\}$ over threshold $a_{t}$
- Set input weights in row $k$ of $W_{0}$ to 0
for $j=0$ to $H-1$ layers do
- Pass intermediate representation through hidden layer $X_{j+1}=z_{j}(P_{j}· W_{j}^{T}X_{j}+\epsilon_{j})$
end for
- Select the fair outcome $y_{fair}$ from $X_{H}$
end for
- Binarize $y_{fair}∈\{0,1\}$ and $y_{bias}∈\{0,1\}$ over randomly sampled output threshold $y_{t}$
Output: $D_{bias}=(A,X_{bias},y_{bias})$ and $y_{fair}$
Algorithm 2 FairPFN Synthetic Data Generation
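The core of Algorithm 2 can be sketched in a few lines of NumPy. The layer sizes, sparsity rate, single shared non-linearity, and the choice to read the outcome off the final layer are simplifications of the actual prior, not its exact configuration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative sizes (the actual prior samples these): U exogenous causes,
# H layers, n samples.
U, H, n = 4, 3, 500
W = [rng.normal(size=(U, U)) for _ in range(H)]                   # weights
P = [(rng.random((U, U)) > 0.3).astype(float) for _ in range(H)]  # sparsity masks
z = np.tanh                     # one shared non-linearity for simplicity
k = 0                           # row index of the protected attribute in X_0

def forward(x0, eps, mask_protected):
    """X_{i+1} = z(P_i * W_i^T X_i + eps_i); optionally cut A's outgoing edges."""
    x = x0
    for i in range(H):
        w = W[i].copy()
        if mask_protected and i == 0:
            w[k, :] = 0.0       # zero row k of W_0: A's influence becomes noise
        x = z((P[i] * w).T @ x + eps[i])
    return x

X0 = rng.normal(size=(U, n))                    # exogenous causes (A is row k)
eps = rng.normal(0.0, 0.1, size=(H, U, n))      # additive noise terms
A = (X0[k] > np.median(X0[k])).astype(int)      # binarize protected attribute

XH_bias = forward(X0, eps, mask_protected=False)
XH_fair = forward(X0, eps, mask_protected=True)  # same noise, edges of A cut

y_bias = (XH_bias[-1] > 0.0).astype(int)  # discretize outcomes over a threshold
y_fair = (XH_fair[-1] > 0.0).astype(int)
```

The key property is that both passes reuse the same exogenous values and noise terms, so $y_{fair}$ differs from $y_{bias}$ only through the severed edges of $A$.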
Prior-Fitting The transformer is trained for approximately three days on an RTX-2080 GPU, on roughly 1.5 million different synthetic data-generating mechanisms, in which we vary the MLP architecture, the number of features $m$ , the sample size $n$ , and the non-linearities $z$ .
Real-World Inference During real-world inference, FairPFN requires no knowledge of causal mechanisms in the data, but instead only takes as input a biased observational dataset and implicitly infers potential causal explanations for the data (Figure 1 d) based on the causally generated data it has seen during pre-training. Crucially, FairPFN is provided information regarding which variable is the protected attribute, which is represented in a protected attribute encoder step in the transformer. A key advantage of FairPFN is its alignment with Bayesian Inference, as transformers pre-trained in the PFN framework have been shown to approximate the Posterior Predictive Distribution (PPD) Müller et al. (2022).
FairPFN thus approximates a modified PPD, predicting a causally fair target $y_{f}$ given biased features $X_{b}$ and a biased dataset $D_{b}$ by integrating over hypotheses for the SCM $\phi∈\Phi$ :
$$
p(y_{f}|x_{b},D_{b})\propto\int_{\Phi}p(y_{f}|x_{b},\phi)p(D_{b}|\phi)p(\phi)d\phi \tag{3}
$$
This approach has two advantages: it reduces the necessity of precise causal model inference, thereby lowering the risk of fairwashing from incorrect models Ma et al. (2023), and carries with it regularization-related performance improvements observed in Hollmann et al. (2023). We also emphasize that FairPFN is a foundation model and thus does not need to be trained for new fairness problems in practice. Instead, FairPFN performs predictions in a single forward pass of the data through the transformer.
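Equation (3) can be illustrated with a toy Monte Carlo approximation over a discrete hypothesis space. Both candidate "SCMs" below are invented stand-ins, each summarized by the label probability it implies for the biased data and the fair predictive probability it implies:

```python
import numpy as np

rng = np.random.default_rng(3)

# Observed biased labels D_b, drawn here from a Bernoulli(0.7) for illustration.
d_b = rng.binomial(1, 0.7, size=50)

# Two invented SCM hypotheses phi, each reduced to two numbers.
hypotheses = [
    {"p_db": 0.7, "p_fair": 0.5},  # phi_1
    {"p_db": 0.3, "p_fair": 0.6},  # phi_2
]

def log_lik(p, data):
    """log p(D_b | phi) for a Bernoulli model with parameter p."""
    return np.sum(data * np.log(p) + (1 - data) * np.log(1 - p))

log_w = np.array([log_lik(h["p_db"], d_b) for h in hypotheses])
w = np.exp(log_w - log_w.max())
w /= w.sum()   # posterior weights under a uniform prior p(phi)

# PPD of Eq. (3): posterior-weighted mixture of each phi's fair predictive.
p_y_fair = float(sum(wi * h["p_fair"] for wi, h in zip(w, hypotheses)))
```

FairPFN does not evaluate this integral explicitly; the pre-trained transformer approximates the same posterior-weighted mixture in a single forward pass.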
5 Experiments
This section assesses FairPFN’s performance on synthetic and real-world benchmarks, highlighting its capability to remove the causal influence of protected attributes without user-specified knowledge of the causal model, while maintaining high predictive accuracy.
5.1 Baselines
We implement several baselines to compare FairPFN against a diverse set of traditional ML models, causal-fairness frameworks, and fairness-aware ML approaches. We summarize our baselines below, and provide a visualization of our baselines applied to the Fair Observable benchmark in Appendix Figure 25.
- Unfair: Fit to the entire training set $(X,A,Y)$.
- Unaware: Fit to the entire training set $(X,A,Y)$ . Inference returns the average of predictions on the original test set $(X,A)$ and the test set with alternative protected attribute values $(X,A→ a^{\prime})$ .
- Avg. Cntf: Fit to the entire training set $(X,A,Y)$. Inference returns the average (avg.) of predictions on the original test set $(X,A)$ and the counterfactual (cntf) test set $(X_{A→ a^{\prime}},A→ a^{\prime})$.
- Constant: Always predicts the majority class.
- Random: Randomly predicts the target.
- CFP: Combination of the three-levels of CFP as proposed in Kusner et al. (2017). Fit to non-descendant observables, unobservables, and independent noise terms $(X_{fair},U_{fair},\epsilon_{fair},Y)$ .
- EGR: Exponentiated Gradient Reduction (EGR) as proposed by Agarwal et al. (2018) is fit to non-protected attributes $(X,Y)$ with XGBoost Chen & Guestrin (2016) as a base model.
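To make the distinction between our Unaware variant and the Avg. Cntf baseline concrete, the following sketch contrasts the two averaging schemes on a toy SCM with a hypothetical fitted scorer; the data-generating equations and the scorer are invented stand-ins, not UnfairPFN:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy benchmark-style SCM: A -> X_b -> Y (coefficients invented).
n = 2_000
A = rng.binomial(1, 0.5, size=n)
eps = rng.normal(0.0, 0.5, size=n)
X_b = 2.0 * A + eps

def predict(x, a):
    """Hypothetical fitted scorer that uses both X and A (stand-in model)."""
    return 1.0 / (1.0 + np.exp(-(1.5 * x + 0.5 * a - 2.0)))

a_flip = 1 - A

# Unaware (this paper's variant): average over A values, features unchanged.
p_unaware = 0.5 * (predict(X_b, A) + predict(X_b, a_flip))

# Avg. Cntf: also counterfactually regenerate X_b with the same noise terms.
X_b_cntf = 2.0 * a_flip + eps
p_avg_cntf = 0.5 * (predict(X_b, A) + predict(X_b_cntf, a_flip))
```

Because the counterfactual pair is symmetric in $A$, the Avg. Cntf prediction depends only on the noise term, so its between-group gap shrinks relative to the biased scorer; the Unaware variant averages out only the direct use of $A$ while leaving the biased features untouched.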
Figure 2: Causal Case Studies: Visualization and data generating processes of synthetic causal case studies, a handcrafted set of benchmarks designed to evaluate FairPFN’s ability to remove various sources of bias in causally generated data. For each group, 100 independent datasets are sampled, varying the number of samples, the standard deviation of noise terms $\sigma$ and the base causal effect $w_{A}$ of the protected attribute.
In the CFP, Unfair, Unaware, and Cntf. Avg. baselines, we employ FairPFN with a random noise term passed as a "protected attribute." We opt to use this UnfairPFN instead of TabPFN so as not to introduce any TabPFN-specific behavioral characteristics or artifacts. We show in Appendix Figure 17 that this reverts FairPFN to a normal tabular classifier with performance competitive with TabPFN. We also note that our Unaware baseline is not the standard approach of dropping the protected attribute; we opt for our own implementation of Unaware as it shows improved causal effect removal over the standard approach (Appendix Figure 17).
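The UnfairPFN construction above can be sketched in a few lines. This is only an illustration: FairPFN's actual call signature is not shown in this section, so the function name and the way the noise column is consumed are assumptions.

```python
import numpy as np

def unfairpfn_inputs(X, seed=0):
    """Prepare inputs for the 'UnfairPFN' baseline: a pure-noise column is
    handed to the model in place of a protected attribute, so there is no
    systematic protected signal to remove and the model behaves like a plain
    tabular classifier.  (Sketch; FairPFN's real interface may differ.)"""
    rng = np.random.default_rng(seed)
    fake_protected = rng.standard_normal(len(X))  # noise "protected attribute"
    return np.asarray(X, dtype=float), fake_protected
```

A hypothetical call would then pass `fake_protected` wherever the protected attribute is expected, leaving the bias-removal machinery with nothing consistent to subtract.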
5.2 Causal Case Studies
We first evaluate FairPFN on synthetic causal case studies, establishing an experimental setting in which the data-generating processes and all causal quantities are known. The case studies increase in difficulty to probe FairPFN’s capacity to remove various sources of bias in causally generated data. The data-generating processes and structural equations are illustrated in Figure 2, following the notation: $A$ for protected attributes, $X_{b}$ for biased observables, $X_{f}$ for fair observables, $U$ for fair unobservables, $\epsilon_{X}$ for additive noise terms, and $Y$ for the outcome, discretized as $Y=\mathbb{1}(Y \geq \bar{Y})$. We term a variable $X$ "fair" iff $A \notin \mathrm{anc}(X)$. The structural equations in Figure 2 contain exponential non-linearities to ensure the direction of causality is identifiable Peters et al. (2014), distinguishing the Fair Unobservable and Fair Additive Noise scenarios, with the former including an unobservable yet identifiable causal effect $U$.
For a robust evaluation, we generate 100 datasets per case study, varying the causal weights $w_{A}$ of protected attributes, sample sizes $m \in (100, 10000)$ (sampled on a log-scale), and the standard deviation $\sigma \in (0, 1)$ (log-scale) of additive noise terms. We also create counterfactual versions of each dataset to assess FairPFN and its competitors across multiple causal and counterfactual fairness metrics, such as the average treatment effect (ATE) and absolute error (AE) between predictions on observational and counterfactual datasets. We highlight that because our synthetic datasets are created from scratch, the fair causes, additive noise terms, counterfactual datasets, and ATE are ground truth. As a result, our baselines that have access to causal quantities are more precise in our causal case studies than in real-world scenarios, where this causal information must be inferred.
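As a concrete illustration of this setup, the sketch below samples a simplified linear variant of the Biased case study (the paper's actual structural equations are non-linear), reuses the stored exogenous noise to build an exact counterfactual dataset, and computes a prediction-level ATE between the two worlds. All weights, names, and the linear mechanism are illustrative assumptions.

```python
import numpy as np

def sample_biased_scm(m=1000, w_a=1.0, w_x=1.0, sigma=0.5, seed=0):
    """Sample a simplified (linear) Biased case study A -> X_b -> Y.  Storing
    the exogenous noise makes the counterfactual dataset (A flipped, same
    noise) exact ground truth, as in the synthetic benchmarks."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 2, m).astype(float)
    eps_x = sigma * rng.standard_normal(m)
    eps_y = sigma * rng.standard_normal(m)

    def mechanism(a_val):
        x_b = w_a * a_val + eps_x
        return x_b, w_x * x_b + w_a * a_val + eps_y

    x_b, y_cont = mechanism(a)
    y = (y_cont >= np.median(y_cont)).astype(int)  # Y = 1(Y >= Ybar)
    x_b_cf, _ = mechanism(1.0 - a)                 # counterfactual features
    return a, x_b, y, x_b_cf

def prediction_ate(p_obs, p_cf):
    """Mean shift between predictions on observational vs counterfactual data."""
    return float(np.mean(np.asarray(p_obs) - np.asarray(p_cf)))
```

A predictor that ignores the protected pathway entirely yields identical predictions in both worlds and hence a prediction ATE of zero.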
Figure 3: Fairness Accuracy Trade-Off (Synthetic): Average Treatment Effect (ATE) of predictions, predictive error (1-AUC), and Pareto Front performance of FairPFN versus baselines in our causal case studies. Baselines which have access to causal information are indicated by a light border. FairPFN is on the Pareto Front on 40% of synthetic datasets using only observational data, demonstrating competitive performance with the CFP and Cntf. Avg. baselines that utilize causal quantities from the true data-generating process.
Fairness-Accuracy Trade-Off
Figure 3 presents the fairness-accuracy trade-off for FairPFN and its baselines, displaying the mean absolute treatment effect (ATE) and mean predictive error (1-AUC) observed across synthetic datasets, along with the Pareto Front of non-dominated solutions. FairPFN (which only uses observational data) attains Pareto Optimal performance in 40% of the 600 synthetic datasets, exhibiting a fairness-accuracy trade-off competitive with CFP and Cntf. Avg., which use causal quantities from the true data-generating process. This is even the case in the Fair Unobservable and Fair Additive Noise benchmark groups, producing causally fair predictions using only observational variables that are either a protected attribute or a causal ancestor of it. This indicates FairPFN’s capacity to infer latent unobservables, which we further investigate in Section 5.3. We also highlight how the Cntf. Avg. baseline achieves lower error than CFP. We believe that this is due to Cntf. Avg. having access to both the observational and counterfactual datasets, which implicitly contains causal weights and non-linearities, while CFP is given only fair unobservables and must infer this causal information. The fact that a PFN is used as a base model in Cntf. Avg. could further explain this performance gain, as access to more observable variables helps guide the PFN toward accurate predictions realistic for the data. We suggest that this Cntf. Avg. as an alternative should be explored in future studies.
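The Pareto Front of non-dominated solutions can be read off with a standard non-domination check. A minimal sketch, assuming each method is summarized by an (|ATE|, 1-AUC) pair with both coordinates to be minimized:

```python
import numpy as np

def pareto_front(points):
    """Return indices of non-dominated points when both coordinates (e.g.
    |ATE| and 1-AUC) should be minimized.  A point is dominated if another
    point is no worse in every coordinate and strictly better in at least one."""
    pts = np.asarray(points, dtype=float)
    front = []
    for i, p in enumerate(pts):
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            front.append(i)
    return front
```

For example, a method at (0.15, 0.25) is dominated by one at (0.10, 0.20) and would not appear on the front.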
Figure 4: Causal Fairness (Synthetic): Average Treatment Effect (ATE) of the predictions of FairPFN compared to baselines that do not have access to causal information. FairPFN consistently removes the causal effect to within a margin of (-0.2, 0.2) and achieves an average rank of 1.88 out of 4, outperformed only on the Direct-Effect benchmark, where Unaware is the optimal strategy.
Causal Effect Removal
We evaluate FairPFN’s efficacy in causal effect removal by analyzing box plots depicting the median, interquartile range (IQR), and Average Treatment Effect (ATE) of predictions, compared to baseline predictive models that likewise do not access causal information (Figure 4). We observe that FairPFN exhibits a smaller IQR than the state-of-the-art bias mitigation method EGR. In an average-rank test across 600 synthetic datasets, FairPFN achieves an average rank of 1.88 out of 4. We provide a comparison of FairPFN against all baselines in Figure 24. We note that our case studies crucially fit our prior assumptions about the causal representation of protected attributes. We show in Appendix Figure 13 that FairPFN reverts to a normal classifier when, for example, the exogeneity assumption is violated.
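The average-rank test mentioned above can be reproduced with a small helper. This is a sketch under stated assumptions: it takes a matrix of per-dataset |ATE| scores with one column per method (lower is better) and ignores ties, which the paper's test may handle differently.

```python
import numpy as np

def average_ranks(scores):
    """Average-rank summary of causal-effect removal: `scores` is an
    (n_datasets, n_methods) array of |ATE| values; rank 1 is best on each
    dataset, and column means give each method's average rank (no tie
    handling in this sketch)."""
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort(axis=1).argsort(axis=1) + 1  # per-row dense ranks
    return ranks.mean(axis=0)
```

A method that is best on every dataset would receive an average rank of 1.0, matching how FairPFN's 1.88/4 figure is reported.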
Ablation Study
We finally conduct an ablation study to evaluate FairPFN’s performance in causal effect removal across synthetic datasets of varying size, noise level, and base rate of causal effect. Results indicate that FairPFN maintains consistent performance across noise levels and base rates, and improves in causal effect removal as dataset size increases and causal effects become easier to distinguish from spurious correlations Dai et al. (1997). The variance of FairPFN, illustrated by the box-plot outliers in Figure 4 that extend to -0.2 and 0.2, arises primarily from small datasets with fewer than 250 samples (Appendix Figure 11), which limit FairPFN’s ability to identify causal mechanisms. We also show in Appendix Figure 14 that FairPFN’s fairness behavior remains consistent as graph complexity increases, though accuracy drops due to the combinatorially increasing problem complexity.
For a more in-depth analysis of these results, we refer to Appendix B.
5.3 Real-World Data
This section evaluates FairPFN’s causal effect removal, predictive error, and correlation with fair latent variables on two real-world datasets with established causal graphs (Figure 5). For a description of our real-world datasets and the methods we use to obtain causal models, see Appendix A.
Fairness-Accuracy Trade-Off
We evaluate FairPFN’s effectiveness on real-world data in reducing the causal impact of protected attributes while maintaining strong predictive accuracy. Figure 6 shows the mean prediction Average Treatment Effect (ATE) and predictive error (1-AUC) across 5 K-fold cross-validation iterations. FairPFN achieves a prediction ATE below 0.01 on both datasets and maintains accuracy comparable to Unfair. Furthermore, FairPFN exhibits lower variability in prediction ATE across folds compared to EGR, indicating stable causal effect removal. We note that we also evaluate a pre-trained version of CLAIRE Ma et al. (2023) on the Adult Census Income dataset, but observe little improvement over EGR.
Counterfactual Fairness
Next, we evaluate the counterfactual fairness of FairPFN on real-world datasets as introduced in Section 3, noting that the following analysis is conducted at the individual sample level rather than at the dataset level. Figure 7 illustrates the distribution of Absolute Error (AE) achieved by FairPFN and baselines that do not have access to causal information. FairPFN significantly reduces this error on both datasets, achieving maximum divergences of less than 0.05 on the Law School dataset and 0.2 on the Adult Census Income dataset. For a visual interpretation of the AE on our real-world datasets, we refer to Appendix Figure 16.
In contrast, EGR performs similarly to Random in terms of counterfactual divergence, confirming previous studies showing that optimizing for group fairness metrics does not optimize for individual-level criteria Robertson et al. (2024). Interestingly, in an evaluation of the group fairness metric Statistical Parity (DSP), FairPFN outperforms EGR, a baseline specifically optimized for this metric, on both our real-world data and causal case studies (Appendix Figures 20 and 21).
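Both metrics in this comparison are simple to state. A sketch of the individual-level absolute error between observational and counterfactual predictions, and of the group-level statistical parity difference (DSP); the sign convention for DSP is an assumption here.

```python
import numpy as np

def counterfactual_abs_error(p_obs, p_cf):
    """Per-sample absolute error |p_obs - p_cf| between predictions on the
    observational and counterfactual datasets (individual-level metric)."""
    return np.abs(np.asarray(p_obs, float) - np.asarray(p_cf, float))

def statistical_parity_difference(p, a):
    """Group-level DSP: difference in mean predicted positive probability
    between the two protected groups (sign convention assumed)."""
    p, a = np.asarray(p, float), np.asarray(a)
    return float(p[a == 1].mean() - p[a == 0].mean())
```

The contrast drawn in the text is exactly the gap between these two views: a method can drive the group-level DSP toward zero while leaving large per-sample counterfactual errors.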
Figure 5: Real-World Scenarios: Assumed causal graphs of real-world datasets Law School Admissions and Adult Census Income.
Figure 6: Fairness-Accuracy Trade-off (Real-World): Average Treatment Effect (ATE) of predictions, predictive error (1-AUC), and Pareto Front of the performance of FairPFN compared to our baselines on each of 5 validation folds (light) and across all five folds (solid) of our real-world datasets. Baselines which have access to causal information have a light border. FairPFN matches the performance of baselines which have access to inferred causal information with only access to observational data.
Figure 7: Counterfactual Fairness (Real-World): Distributions of Absolute Error (AE) between predictive distributions on observational and counterfactual datasets. Compared to baselines that do not have access to causal information, FairPFN achieves the lowest median and maximum AE on both datasets.
Trust & Interpretability
In order to build trust in FairPFN and explain its internal workings, we first perform a feature correlation analysis of FairPFN and baseline models using the Law School Admissions dataset. We measure the Kendall rank correlation between observable variables "LSAT" and "UGPA," and inferred noise terms $\epsilon_{LSAT}$ and $\epsilon_{UGPA}$ , with predicted admission probabilities $\hat{FYA}$ .
Figure 8 shows that despite only having access to observational data, FairPFN’s predictions correlate with fair noise terms similarly to CFP which was fit solely to these variables. This result suggests FairPFN’s ability to not only integrate over realistic causal explanations for the data, but also correctly remove the causal effect of the protected attribute such that its predictions are influenced only by fair exogenous causes. We note that while FairPFN mitigates the effect of "Race," it increases the correlation of "Sex" compared to the Unfair and CFP baselines. We discuss how future versions of FairPFN can tackle the problem of intersectionality in Section 6. We also further investigate this result in Appendix Figure 12, which confirms that FairPFN does not remove the effect of additional protected attributes other than the one specified.
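The Kendall rank correlation used in this analysis is, in its tie-free tau-a form, just a count of concordant and discordant pairs. A self-contained sketch (in practice a library implementation with tie correction would be used):

```python
import numpy as np
from itertools import combinations

def kendall_tau(x, y):
    """Kendall tau-a between a variable (e.g. LSAT or an inferred noise term)
    and predicted outcomes: (#concordant - #discordant) / #pairs, with no
    tie correction in this sketch."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
        concordant += s > 0
        discordant += s < 0
    n_pairs = len(x) * (len(x) - 1) / 2
    return float((concordant - discordant) / n_pairs)
```

A perfectly monotone relationship gives tau = 1 (or -1 if reversed), so a near-zero tau between "Race" and predictions is the signature of successful effect removal.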
We also observe in Figures 3 and 6 the strong performance of our Cntf. Avg. baseline, which predicts the average outcome probability in the observational and counterfactual worlds. We therefore carry out a similarity test against Cntf. Avg. in Appendix Tables 1 and 2, computing for each baseline the mean difference in predictions, the standard deviation of this distribution, and the percentage of outliers. We find that FairPFN’s predictions are among the closest to this target, with a mean error on synthetic datasets of 0.00±0.06 (1.87% of samples falling outside three standard deviations) and a mean error on real-world datasets of 0.02±0.04 (0.36% outliers).
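The similarity test above can be sketched as follows; the helper name and toy values are our own illustration, and we use the population standard deviation for simplicity:

```python
import statistics

def similarity_to_target(preds, target):
    """Mean prediction difference, its std., and % of samples beyond 3 std. devs."""
    diffs = [p - t for p, t in zip(preds, target)]
    mean = statistics.mean(diffs)
    std = statistics.pstdev(diffs)
    outliers = sum(abs(d - mean) > 3 * std for d in diffs)
    return mean, std, 100 * outliers / len(diffs)

# Toy example: model predictions vs. a Cntf. Avg.-style target.
m, s, pct = similarity_to_target([0.5, 0.75, 0.25, 0.5], [0.5, 0.5, 0.5, 0.5])
```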
Figure 8: Feature Correlation (Law School): Kendall tau rank correlation between feature values and the predictions of FairPFN compared to our baseline models. FairPFN produces predictions that correlate with the fair noise terms $\epsilon_{UGPA}$ and $\epsilon_{LSAT}$ to a similar extent as the CFP baseline, despite never seeing these variables in-context or at inference.
6 Future Work & Discussion
This study introduces FairPFN, a tabular foundation model pretrained to minimize the causal influence of protected attributes in binary classification tasks using solely observational data. FairPFN overcomes a key limitation in causal fairness by eliminating the need for user-supplied knowledge of the true causal graph, facilitating its use in complex, unidentifiable causal scenarios. This approach enhances the applicability of causal fairness and opens new research avenues.
Extended Problem Scope We limit our experimental scope to a simple testable setting with a single, binary protected attribute but believe that our prior and transformer architecture can be extended to handle multiple, non-binary protected attributes, addressing both their individual effects and intersectional interactions. We also suggest that FairPFN is capable of predicting not only a fair binary target but also accommodating multi-objective scenarios Lin et al. (2019), regression problems Hollmann et al. (2025), and time series Hoo et al. (2025). Additionally, FairPFN can generate causally fair versions of previously unfair observables, improving prediction explainability. This enables practitioners to use FairPFN as a fairness preprocessing technique while employing their preferred predictive models in practical applications.
PFNs for Causal ML FairPFN implicitly provides evidence for the efficacy of PFNs at causal tasks, and we believe that our methodology can be extended to more complex challenges both within and outside of algorithmic fairness. In algorithmic fairness, one promising extension could be path-specific effect removal Chiappa (2019). For example, in medical diagnosis, distinguishing social effects of sex (e.g., sampling bias, the male focus of clinical studies) from biological effects (e.g., symptom differences across sex) is essential for fair and individualized treatment and care. Beyond fairness, we believe PFNs can predict interventional and counterfactual effects, with the latter potentially facilitating FairPFN’s evaluation in real-world contexts without relying on estimated causal models. Currently, FairPFN can also mitigate the influence of binary exogenous confounders, such as smoking, on the prediction of treatment success.
Alignment to Anti-Discrimination Law Future versions of FairPFN could also relax the assumption of exogenous protected attributes, enabling differentiation between legally admissible spurious effects and direct or indirect effects. Another key concept proposed by Plecko & Bareinboim (2024) is that of "Business Necessity" (BN) variables, which allow the protected attribute to indirectly contribute to outcomes in service of a specified business objective, such as a research company hiring doctorate holders. In EU law, the analogous concept of "objective justification" necessitates a "proportionality test," asserting that justifiable indirect effects must persist only as necessary Weerts et al. (2023). We contend that proportionality bears a causal interpretation, akin to counterfactual explanations Wachter et al. (2018).
Broader Impact
This study attempts to overcome a current limitation in causal fairness, making what we believe is a useful framework for addressing algorithmic discrimination more accessible to a wider variety of complex fairness problems. While the goal of this work is to have a positive impact on a problem we think is crucial, we acknowledge that our perspective on fairness is limited in scope to align with EU/US legal doctrines of anti-discrimination. These doctrines are not representative of the world as a whole, and even within these systems, there are vastly different normative viewpoints regarding what constitutes algorithmic fairness and justice.
Acknowledgements
The authors of this work would like to thank the reviewers, editors and organizers of ICML ’25 for the opportunity to share our work and receive valuable feedback from the community. We would like to additionally thank the Zuse School ELIZA Master’s Scholarship Program for their financial and professional support of our main author. We would finally like to thank Sai Prasanna, Magnus Bühler, and Prof. Dr. Thorsten Schmidt for their insights, feedback, and discussion.
References
- Agarwal et al. (2018) Agarwal, A., Beygelzimer, A., Dudík, M., Langford, J., and Wallach, H. A reductions approach to fair classification. In Dy, J. and Krause, A. (eds.), Proceedings of the 35th International Conference on Machine Learning (ICML’18), volume 80, pp. 60–69. Proceedings of Machine Learning Research, 2018.
- Angwin et al. (2016) Angwin, J., Larson, J., Mattu, S., and Kirchner, L. Machine bias. ProPublica, May, 23(2016):139–159, 2016.
- Barocas et al. (2023) Barocas, S., Hardt, M., and Narayanan, A. Fairness and Machine Learning: Limitations and opportunities. MIT Press, 2023.
- Bhaila et al. (2024) Bhaila, K., Van, M., Edemacu, K., Zhao, C., Chen, F., and Wu, X. Fair in-context learning via latent concept variables. 2024.
- Binkytė-Sadauskienė et al. (2022) Binkytė-Sadauskienė, R., Makhlouf, K., Pinzón, C., Zhioua, S., and Palamidessi, C. Causal discovery for fairness. 2022.
- Castelnovo et al. (2022) Castelnovo, A., Crupi, R., Greco, G., Regoli, D., Penco, I. G., and Cosentini, A. C. A clarification of the nuances in the fairness metrics landscape. Scientific Reports, 12(1), 2022.
- Chen & Guestrin (2016) Chen, T. and Guestrin, C. Xgboost: A scalable tree boosting system. In Krishnapuram, B., Shah, M., Smola, A., Aggarwal, C., Shen, D., and Rastogi, R. (eds.), Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16), pp. 785–794, 2016.
- Chiappa (2019) Chiappa, S. Path-specific counterfactual fairness. In Hentenryck, P. V. and Zhou, Z.-H. (eds.), Proceedings of the Thirty-Third Conference on Artificial Intelligence (AAAI’19), volume 33, pp. 7801–7808. AAAI Press, 2019.
- Dai et al. (1997) Dai, H., Korb, K. B., Wallace, C. S., and Wu, X. A study of causal discovery with weak links and small samples. In Pollack, M. E. (ed.), Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI’97), 1997.
- Ding et al. (2021) Ding, F., Hardt, M., Miller, J., and Schmidt, L. Retiring adult: New datasets for fair machine learning. In Ranzato, M., Beygelzimer, A., Nguyen, K., Liang, P., Vaughan, J., and Dauphin, Y. (eds.), Proceedings of the 35th International Conference on Advances in Neural Information Processing Systems (NeurIPS’21), volume 34, pp. 6478–6490, 2021.
- Dua & Graff (2017) Dua, D. and Graff, C. Uci machine learning repository, 2017.
- Hardt et al. (2016) Hardt, M., Price, E., and Srebro, N. Equality of opportunity in supervised learning. In Lee, D., Sugiyama, M., von Luxburg, U., Guyon, I., and Garnett, R. (eds.), Proceedings of the 30th International Conference on Advances in Neural Information Processing Systems (NeurIPS’16), pp. 3323–3331, 2016.
- Hollmann et al. (2023) Hollmann, N., Müller, S., Eggensperger, K., and Hutter, F. Tabpfn: A transformer that solves small tabular classification problems in a second. In International Conference on Learning Representations (ICLR’23), 2023. Published online: iclr.cc.
- Hollmann et al. (2025) Hollmann, N., Müller, S., Purucker, L., Krishnakumar, A., Körfer, M., Hoo, S. B., Schirrmeister, R. T., and Hutter, F. Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319–326, 2025.
- Hoo et al. (2025) Hoo, S. B., Müller, S., Salinas, D., and Hutter, F. The tabular foundation model tabpfn outperforms specialized time series forecasting models based on simple features. 2025.
- Hoyer et al. (2008) Hoyer, P. O., Janzing, D., Mooij, J. M., Peters, J., and Schölkopf, B. Nonlinear causal discovery with additive noise models. In Platt, J. and Koller, D. (eds.), Proceedings of the 22nd International Conference on Advances in Neural Information Processing Systems (NeurIPS’08), pp. 689–696, 2008.
- Kamishima et al. (2012) Kamishima, T., Akaho, S., Asoh, H., and Sakuma, J. Fairness-aware classifier with prejudice remover regularizer. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2012, Bristol, UK, September 24-28, 2012. Proceedings, Part II 23, pp. 35–50. Springer, 2012.
- Kusner et al. (2017) Kusner, M., Loftus, J., Russell, C., and Silva, R. Counterfactual fairness. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Proceedings of the 31st International Conference on Advances in Neural Information Processing Systems (NeurIPS’17), pp. 4069–4079, 2017.
- Lin et al. (2019) Lin, X., Zhen, H.-L., Li, Z., Zhang, Q., and Kwong, S. Pareto multi-task learning. 2019.
- Ma et al. (2023) Ma, J., Guo, R., Zhang, A., and Li, J. Learning for counterfactual fairness from observational data. In Singh, A. K., Sun, Y., Akoglu, L., Gunopulos, D., Yan, X., Kumar, R., Ozcan, F., and Ye, J. (eds.), Proceedings of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’23), pp. 1620–1630, 2023.
- Müller et al. (2022) Müller, S., Hollmann, N., Arango, S., Grabocka, J., and Hutter, F. Transformers can do bayesian inference. In Proceedings of the International Conference on Learning Representations (ICLR’22), 2022. Published online: iclr.cc.
- Pearl (2009) Pearl, J. Causality: Models, Reasoning and Inference. Cambridge University Press, 2009.
- Peters et al. (2011) Peters, J., Janzing, D., and Schölkopf, B. Causal inference on discrete data using additive noise models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2436–2450, 2011.
- Peters et al. (2014) Peters, J., Mooij, J. M., Janzing, D., and Schölkopf, B. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15:2009–2053, 2014.
- Plecko & Bareinboim (2024) Plecko, D. and Bareinboim, E. Causal fairness analysis. Foundations and Trends in Machine Learning, 17:304–589, 2024.
- Robertson et al. (2024) Robertson, J., Schmidt, T., Hutter, F., and Awad, N. A human-in-the-loop fairness-aware model selection framework for complex fairness objective landscapes. In Das, S., Green, B. P., Varshney, K., Ganapini, M., and Renda, A. (eds.), Proceedings of the Seventh AAAI/ACM Conference on AI, Ethics, and Society (AIES-24) - Full Archival Papers, October 21-23, 2024, San Jose, California, USA - Volume 1, pp. 1231–1242. AAAI Press, 2024.
- Schölkopf et al. (2012) Schölkopf, B., Janzing, D., Peters, J., Sgouritsa, E., Zhang, K., and Mooij, J. On causal and anticausal learning. In Langford, J. and Pineau, J. (eds.), Proceedings of the 29th International Conference on Machine Learning (ICML’12). Omnipress, 2012.
- Sharma & Kiciman (2020) Sharma, A. and Kiciman, E. Dowhy: An end-to-end library for causal inference. arXiv:2011.04216 [stat.ME], 2020.
- Wachter et al. (2018) Wachter, S., Mittelstadt, B., and Russell, C. Counterfactual explanations without opening the black box: Automated decisions and the gdpr. Harvard Journal of Law and Technology, 15:842–887, 2018.
- Weerts et al. (2023) Weerts, H., Xenidis, R., Tarissan, F., Olsen, H. P., and Pechenizkiy, M. Algorithmic unfairness through the lens of eu non-discrimination law. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pp. 805–816, 2023.
- Weerts et al. (2024) Weerts, H., Pfisterer, F., Feurer, M., Eggensperger, K., Bergman, E., Awad, N., Vanschoren, J., Pechenizkiy, M., Bischl, B., and Hutter, F. Can fairness be automated? guidelines and opportunities for fairness-aware automl. Journal of Artificial Intelligence Research, 79:639–677, 2024.
- Wightman (1998) Wightman, L. F. Lsac national longitudinal bar passage study. lsac research report series, 1998.
Appendix A Real-World Datasets
Law School Admissions
The first dataset is the Law School Admissions dataset from the 1998 LSAC National Longitudinal Bar Passage Study Wightman (1998), which includes admissions data for approximately 30,000 US law school applicants, revealing disparities in bar passage rates and first-year averages by ethnicity. We generate counterfactual data and measure causal effects using a slightly different causal model from the one originally proposed by Kusner et al. (2017), additionally including the edges $\text{"UGPA"}→\text{"LSAT"}$ and $\text{"LSAT"}→\text{"FYA"}$. These edges have a plausible temporal explanation and create a more realistic scenario where "Race" and "Sex" have both a direct and an indirect effect on first-year averages.
Causal Modeling with DoWhy
We use the causal graph in Figure 5 (left) and observational data as inputs for the dowhy.gcm module Sharma & Kiciman (2020), employing an automated search via dowhy.gcm.auto, which selects the best predictive model from a model zoo of non-linear tree-based models to represent each edge, minimizing either the MSE or the negative F1-score depending on the distribution of the target, following Hoyer et al. (2008) and Peters et al. (2011). We apply each fitted model to generate counterfactual datasets, allowing for estimation of the Average Treatment Effect (ATE) and absolute error (AE). We also use the compute_noise function to estimate the noise terms $\epsilon_{GPA}$ and $\epsilon_{LSAT}$ for our CFP baseline.
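The counterfactual-generation recipe (abduction of noise terms, intervention on the protected attribute, prediction through the structural equations) can be sketched on a hand-specified linear SCM. The coefficients and variable names below are illustrative, not DoWhy's fitted mechanisms:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical linear SCM mirroring the Law School graph (coefficients illustrative).
race = rng.integers(0, 2, n).astype(float)      # protected attribute (exogenous)
eps_ugpa = rng.normal(0, 1, n)
eps_lsat = rng.normal(0, 1, n)
ugpa = 0.8 * race + eps_ugpa
lsat = 0.6 * race + 0.5 * ugpa + eps_lsat

# Abduction: recover noise terms from observed data (mechanisms assumed known).
eps_ugpa_hat = ugpa - 0.8 * race
eps_lsat_hat = lsat - 0.6 * race - 0.5 * ugpa

# Action + prediction: flip Race and propagate through the structural equations.
race_cf = 1.0 - race
ugpa_cf = 0.8 * race_cf + eps_ugpa_hat
lsat_cf = 0.6 * race_cf + 0.5 * ugpa_cf + eps_lsat_hat

# Effect of the flip on LSAT: 0.6 direct plus 0.5 * 0.8 indirect = 1.0 in magnitude.
ate = np.mean(np.abs(lsat_cf - lsat))
```

In DoWhy itself these three steps are handled internally by the fitted `gcm` causal model; the sketch only makes the mechanics explicit.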
Adult Census Income
The second dataset, derived from the 1994 US Census, is the Adult Census Income problem Dua & Graff (2017), containing demographic and income outcome data ( $INC≥ 50K$ ) for nearly 50,000 individuals. We note that Adult has been heavily criticized in the fairness literature Ding et al. (2021) due to evidence of sampling bias and an arbitrarily chosen income threshold, but we elect to include it due to its widely accepted causal model and its appearance as a benchmark in similar studies Ma et al. (2023). We fit a causal model to assess the Average Treatment Effect (ATE) of the protected attribute $RACE$ , generate a counterfactual dataset, and calculate the noise term values $\epsilon$ .
Appendix B Ablation Study
To evaluate FairPFN’s performance across datasets with varying characteristics, we conduct an ablation study comparing the prediction Average Treatment Effect (ATE) of FairPFN and Unfair under different noise levels, base rates of the protected attribute’s causal effect, and dataset sizes.
Base Rate Causal Effect
We analyze the distributions of prediction ATE from FairPFN and Unfair across five quintiles (Q1-Q5) of base ATE (Figure 9). FairPFN’s prediction ATE remains stable, while Unfair's prediction ATE increases linearly. In datasets within the Biased, Direct Effect, Level-Two, and Level-Three benchmark groups, where the protected attribute has a high base ATE (Q5), FairPFN exhibits a greater tendency for positive discrimination, resulting in negative prediction ATE values.
Figure 9: Effect of Base ATE (Synthetic): Distributions of prediction ATE produced by FairPFN and Unfair over quintiles (Q1-Q5) of the protected attributes’s base causal effect (base ATE). FairPFN remains consistent across quintiles, sometimes over-correcting and producing a negative prediction ATE in Q5.
Dataset Noise
Analyzing dataset noise, measured by the standard deviation $\sigma$ of the exogenous noise terms in the structural equations, Figure 10 shows that FairPFN retains consistency across varying noise levels. Conversely, Unfair exhibits lower and more peaked distributions of prediction ATE as noise increases from Q1 to Q5, suggesting that noise terms may obscure causal effects and diminish their observed impact in the data.
Figure 10: Effect of Dataset Noise (Synthetic): Distributions of prediction ATE produced by FairPFN and Unfair over quintiles (Q1-Q5) of the standard deviation (std.) of exogenous noise terms in the data. FairPFN remains consistent across quintiles, while increased noise decreases the prediction ATE of Unfair.
Dataset Size
Ablation studies on dataset size (Figure 11) show that FairPFN’s prediction ATE displays a tighter distribution with larger datasets, indicating improved performance in causal effect removal. This improvement arises from better identification of causal mechanisms as data availability increases, enabling the transformer to distinguish noise from causal effects.
Appendix C Future Extensions
In this section we expand upon our discussion of future extensions of FairPFN, in order to encourage the community to build upon our approach.
Regression Problems
FairPFN can be pre-trained as a regression model with very few architectural changes by discretizing continuous output distributions into piecewise intervals and calculating misclassification costs that reflect the natural ordering between categories. Thoroughly evaluated in Hollmann et al. (2025), such post-processing strategies have shown strong performance in tabular regression problems and enable the effective use of classification architectures for continuous targets.
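A minimal sketch of this discretization idea (our own illustration, not the output head of Hollmann et al. (2025)): bin the training targets into equal-mass intervals, treat regression as classification over bins, and decode a prediction as the probability-weighted average of bin centers:

```python
import numpy as np

# Toy continuous targets (illustrative values).
y_train = np.array([1.2, 0.4, 2.8, 1.9, 0.7, 2.2, 1.5, 3.1])
n_bins = 4

# Equal-mass bin edges from training-target quantiles.
edges = np.quantile(y_train, np.linspace(0, 1, n_bins + 1))
centers = 0.5 * (edges[:-1] + edges[1:])

# Map each target to a class label (clip so the maximum lands in the last bin).
labels = np.clip(np.searchsorted(edges, y_train, side="right") - 1, 0, n_bins - 1)

# A classifier would output per-bin probabilities; decode them back to a
# continuous prediction as the expectation over bin centers.
probs = np.array([0.1, 0.2, 0.5, 0.2])      # hypothetical softmax output
y_hat = float(probs @ centers)
```

Finer binning recovers the continuous target more faithfully at the cost of a larger classification head.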
Protected Attributes in the Wild
While we limit the scope of this study to binary classification tasks with single, binary protected attributes, we acknowledge that real-world fairness-aware ML problems are often more complex than that. More precisely, protected attributes can be not only binary but continuous or multi-category, and discrimination may occur not only with respect to individual protected attributes but also with respect to multiple attributes and the interactions between them. Our prior is currently extensible to handle multiple protected attributes by changing the number of protected attributes sampled into each synthetic dataset, removing the outgoing edges of all protected attributes to generate $y_{fair}$ , and informing the transformer about which variables are protected. Changing the distribution of protected attributes is also possible and simply requires transforming the protected attribute into the distribution(s) of choice either before or after its natural continuous value is propagated through the MLP during pre-training.
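A minimal sketch of the edge-removal step under an assumed linear SCM (coefficients and variable names are our own, illustrative): generating $y_{fair}$ amounts to re-simulating the structural equations with every outgoing edge of the protected attributes severed, reusing the same noise terms:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Toy SCM with two binary protected attributes A1, A2 (illustrative coefficients).
a1 = rng.integers(0, 2, n).astype(float)
a2 = rng.integers(0, 2, n).astype(float)
eps_x, eps_y = rng.normal(0, 1, n), rng.normal(0, 1, n)
x = 0.7 * a1 + 0.3 * a2 + eps_x          # observable influenced by both attributes
y = 0.5 * x + 0.4 * a1 + eps_y           # outcome with direct and indirect effects

# Sever all outgoing edges of A1 and A2: regenerate X and Y from the same noise.
x_fair = eps_x
y_fair = 0.5 * x_fair + eps_y

# Group gaps on A1 shrink toward zero for the fair target.
gap_unfair = y[a1 == 1].mean() - y[a1 == 0].mean()
gap_fair = y_fair[a1 == 1].mean() - y_fair[a1 == 0].mean()
```

During pre-training, the pair (observables, $y_{fair}$) produced this way would form one synthetic task for the transformer.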
Figure 11: Effect of Dataset Size (Synthetic): Distributions of prediction ATE produced by FairPFN over quintiles (Q1-Q5) of dataset sizes from 100-10,000 (log-scale). FairPFN becomes better at its task of removing the causal effect of protected attributes when more data is available.
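The prediction ATE plotted in these figures can be sketched as a simple group-mean comparison. This is a minimal sketch, assuming (as the axes suggest) that the reported ATE is the absolute difference in mean predicted probability between the two protected groups; the function name and data are illustrative, not taken from the paper's code:

```python
import numpy as np

def prediction_ate(y_prob: np.ndarray, a: np.ndarray) -> float:
    """Absolute difference in mean predicted probability between the
    protected groups a == 0 and a == 1 (a simple estimate of the effect
    of A on the model's predictions when A is exogenous)."""
    return float(abs(y_prob[a == 1].mean() - y_prob[a == 0].mean()))

# Illustrative data: a predictor that favors group a == 1.
probs = np.array([0.9, 0.8, 0.2, 0.1])
groups = np.array([1, 1, 0, 0])
ate = prediction_ate(probs, groups)  # |0.85 - 0.15|, i.e. approximately 0.7
```

A perfectly fair predictor in this sense drives the statistic to zero, which is why the violin plots above concentrate around the y = 0 line.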
<details>
<summary>x4.png Details</summary>

Left: causal diagram with two protected attributes A₀ and A₁ (blue) feeding a biased feature X_b, which, together with a feature X_f, determines the outcome Y_b; noise terms ε_Xb and ε_Yb enter X_b and Y_b. Right: scatter plot of Causal Effect (ATE, 0.00 to 1.00) against Error (1-AUC, 0.00 to 0.80) comparing the Unfair predictor (pink circles) with FairPFN (blue stars).
</details>
Figure 12: Multiple Protected Attributes (Synthetic): Distributions of prediction ATE and predictive accuracy produced by FairPFN vs the Unfair predictor when there are multiple protected attributes. This violates FairPFN’s prior assumptions and reverts it to a normal classifier.
<details>
<summary>x5.png Details</summary>

Left: causal diagram for an endogenous protected attribute, with A₁ influencing both A₀ and the outcome Y_b, A₀ also influencing Y_b, and noise terms ε_A₀ and ε_Y_b attached to A₀ and Y_b. Right: scatter plot of Causal Effect (ATE, 0.0 to 0.4) against Error (1-AUC, 0.1 to 0.7) comparing the Unfair predictor (pink circles) with FairPFN (blue stars).
</details>
Figure 13: Endogenous Protected Attributes (Synthetic): Distributions of prediction ATE and predictive accuracy produced by FairPFN vs the Unfair predictor when the protected attribute is endogenous. This violates FairPFN’s prior assumptions and reverts it to a normal classifier.
<details>
<summary>extracted/6522797/figures/complexity.png Details</summary>

Scatter plot with density contours of two metrics against SCM size (0 to 200 nodes): Statistical Parity (DSP, blue) and Accuracy (AUC, orange), both on a 0.0-1.0 metric axis. DSP values remain low (roughly 0.0-0.2) across all graph sizes, while AUC values lie mostly in the upper range (roughly 0.6-1.0); the two clusters are well separated with minimal overlap.
</details>
Figure 14: Graph Complexity (Prior): Distributions of Statistical Parity and predictive accuracy produced by FairPFN on prior samples with graph complexity between 10 and 200 nodes. As graph complexity increases, accuracy drops but fairness remains constant.
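The statistical parity metric tracked in Figure 14 can be sketched as a difference in positive-prediction rates. A minimal sketch, assuming DSP here denotes the absolute demographic (statistical) parity difference; names and data are illustrative:

```python
import numpy as np

def statistical_parity_difference(y_pred: np.ndarray, a: np.ndarray) -> float:
    """Absolute difference in positive-prediction rates between the
    protected groups a == 0 and a == 1."""
    rate_0 = float((y_pred[a == 0] == 1).mean())
    rate_1 = float((y_pred[a == 1] == 1).mean())
    return abs(rate_1 - rate_0)

# Illustrative data: group 0 always predicted positive, group 1 half the time.
y_hat = np.array([1, 1, 0, 1])
groups = np.array([0, 0, 1, 1])
dsp = statistical_parity_difference(y_hat, groups)  # 0.5
```

A value near zero, as observed across all graph sizes in the figure, means both groups receive positive predictions at nearly the same rate.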
Appendix D Supplementary Results
<details>
<summary>extracted/6522797/figures/roc_by_group_synthetic_new.png Details</summary>

Six box-plot panels (2x3 grid) of Error (1-AUC, 0 to 0.75), one per scenario: 1. Biased, 2. Direct-Effect, 3. Indirect-Effect, 4. Fair Observable, 5. Fair Unobservable, 6. Fair Additive Noise. The ordering is consistent across panels: Unfair (blue, avg. rank 2.17) has the lowest median error (~0.3) and the tightest spread, followed by Unaware (orange, 2.62, ~0.4), FairPFN (purple, 3.51, ~0.45), and Cntf. Avg. (brown, 3.62, ~0.45-0.5); the Random (red, 6.67) and Constant (green, 6.75) baselines show the highest medians (~0.5-0.6) and the widest spreads.
</details>
Figure 15: Predictive Error (Synthetic): Predictive error (1-AUC) of FairPFN compared to our baselines. FairPFN maintains a competitive level of predictive error with traditional ML algorithms, achieving an average rank of 3.51 out of 7.
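The error metric used throughout these comparisons, 1-AUC, can be computed directly from pairwise score comparisons without library support. A minimal sketch of that rank-based definition (ties between a positive and a negative count as half a win); the data is illustrative:

```python
def auc_error(y_true, y_score):
    """1 - AUC, where AUC is the probability that a randomly chosen
    positive example is scored above a randomly chosen negative one."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return 1.0 - wins / (len(pos) * len(neg))

# Illustrative scores: one positive is ranked below one negative,
# so 3 of the 4 positive-negative pairs are ordered correctly.
err = auc_error([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])  # 0.25
```

An error of 0.5 corresponds to random ranking, which is why the Random and Constant baselines in Figure 15 sit near the 0.5 level.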
<details>
<summary>extracted/6522797/figures/lawschool_dist.png Details</summary>

Six density plots in two rows. Top row: Real (solid) vs. Counterfactual (dashed) distributions of predicted first-year average (FYA, 0.0-1.0) for Unfair, Unaware, and FairPFN; the Unfair predictor's real distribution is bimodal while its counterfactual is unimodal, Unaware shows a small shift between the two, and FairPFN's real and counterfactual distributions nearly coincide. Bottom row: distributions of the per-individual absolute difference |FYA_{a→a'} - FYA_{a→a}| (0.0-0.4), which peaks near 0.25 for Unfair, near 0.15 for Unaware, and in an extremely narrow spike near 0.05 for FairPFN (note the differing density scales: roughly 0-10, 0-17.5, and 0-70).
</details>
Figure 16: Counterfactual Distributions (Law School): Predictive distributions of Unfair, Unaware, and FairPFN on observational and counterfactual versions of the Lawschool Admissions dataset. FairPFN reduces the maximum pairwise difference between these distributions to 0.05.
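The quantity summarized in this caption can be sketched as a per-individual counterfactual gap: score each instance once on the observed data and once on a counterfactual copy with the protected attribute flipped, then take the largest absolute change. A minimal sketch with illustrative arrays; in the actual pipeline the counterfactual features would come from the fitted causal model, not from hand-written numbers:

```python
import numpy as np

def max_counterfactual_gap(f_factual: np.ndarray,
                           f_counterfactual: np.ndarray) -> float:
    """Largest per-individual absolute change in the model's prediction
    when the protected attribute is counterfactually flipped."""
    return float(np.abs(f_counterfactual - f_factual).max())

# Illustrative predictions on factual vs. counterfactual inputs.
factual = np.array([0.72, 0.41, 0.90])
counterfactual = np.array([0.70, 0.44, 0.85])
gap = max_counterfactual_gap(factual, counterfactual)  # approximately 0.05
```

A gap of zero would mean the prediction for every individual is invariant to the counterfactual flip, the ideal targeted by counterfactual fairness.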
<details>
<summary>extracted/6522797/figures/trade-off_by_group_synthetic_alt.png Details</summary>

Six scatter panels (2x3 grid), one per scenario (1. Biased through 6. Fair Additive Noise), plotting Causal Effect (ATE, 0.0-0.3) against Error (1-AUC, 0.15-0.40) for four methods: TabPFN (v1) (cyan pentagon), Unfair (blue circle), Unaware (orange triangle), and Fairness Through Unawareness (gray cross), with dashed lines marking the trade-off. Unfair and TabPFN (v1) occupy nearly the same positions in each panel, while Unaware attains a better trade-off than Fairness Through Unawareness.
</details>
Figure 17: Baseline Validation (Synthetic): Fairness-accuracy trade-off achieved by our baselines Unfair and Unaware compared to alternative choices of TabPFN (v1) and "Fairness Through Unawareness." Unfair achieves competitive performance with TabPFN (v1), while Unaware outperforms the standard strategy of dropping the protected attribute from the dataset.
<details>
<summary>extracted/6522797/figures/trade-off_lawschool_alt.png Details</summary>

Scatter plot titled "Law School Admissions" of Causal Effect (ATE, 0.10-0.30) against Error (1-AUC, 0.325-0.355), comparing TabPFN (v1) (light blue pentagons), Unfair (blue circles), Unaware (orange triangles), and Fairness Through Unawareness (gray crosses). Gray crosses and orange triangles cluster at low ATE and higher error; blue circles and light blue pentagons occupy the high-ATE, low-error region, with a dashed line tracing the trade-off between the two extremes.
</details>
<details>
<summary>extracted/6522797/figures/trade-off_adult_alt.png Details</summary>

Scatter plot titled "Adult Census Income" of Causal Effect (ATE, 0.04-0.12) against Error (1-AUC, 0.15-0.20), comparing TabPFN (v1) (teal pentagons, ATE ≈ 0.10, error ≈ 0.16), Unfair (dark blue circles, ATE ≈ 0.10-0.12, error ≈ 0.18-0.19), Unaware (orange triangles, ATE ≈ 0.04-0.08, error ≈ 0.18-0.19), and Fairness Through Unawareness (gray crosses, ATE ≈ 0.08-0.10, error ≈ 0.17-0.18).
</details>
Figure 18: Baseline Validation (Real-World): Fairness-accuracy trade-off achieved by our baselines Unfair and Unaware compared to the alternative choices of TabPFN (v1) and "Fairness Through Unawareness." Our chosen baselines achieve competitive performance on the Law School Admissions problem, while the alternative baselines perform slightly better on the Adult Census Income problem.
<details>
<summary>extracted/6522797/figures/adult_dist.png Details</summary>

### Visual Description
Six density plots in a 2×3 grid for the Adult Census Income problem. The top row overlays the observational (Real) and counterfactual (Cntf.) predictive distributions of $\hat{Y}$ for Unfair (blue/gray), Unaware (orange/brown), and FairPFN (pink/purple). The bottom row shows, for each method, the density of the per-sample absolute difference $|\hat{Y}_{a\to a'} - \hat{Y}_{a\to a}|$. FairPFN exhibits the closest overlap between the Real and Cntf. distributions, and its absolute-difference density is sharply concentrated near zero.
</details>
Figure 19: Aligning Counterfactual Distributions (Adult): Alignment of observational and counterfactual predictive distributions $\hat{Y}$ and $\hat{Y}_{a→ a^{\prime}}$ on the Adult Census Income problem. FairPFN best aligns the predictive distributions (top) and achieves the lowest mean (0.01) and maximum (0.75) absolute error.
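The alignment statistic reported in this caption is the per-sample absolute gap between the observational and counterfactual predictions, summarized by its mean and maximum. A minimal sketch (the function name is illustrative, not from the paper's codebase):

```python
import numpy as np

def counterfactual_alignment(y_hat, y_hat_cf):
    """Mean and maximum absolute error between paired observational
    predictions y_hat and counterfactual predictions y_hat_cf."""
    diff = np.abs(np.asarray(y_hat, dtype=float)
                  - np.asarray(y_hat_cf, dtype=float))
    return diff.mean(), diff.max()
```

For FairPFN on Adult, this statistic corresponds to the reported mean (0.01) and maximum (0.75) absolute error.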
<details>
<summary>extracted/6522797/figures/ddsp_by_group_synthetic.png Details</summary>

### Visual Description
Six box plots of Statistical Parity (DSP, y-axis 0 to 0.75), one per synthetic scenario (Biased, Direct-Effect, Indirect-Effect, Fair Observable, Fair Unobservable, Fair Additive Noise). The legend lists each method with its average rank: Constant (1.0), CFP (2.96), FairPFN (3.97), Random (4.16), Unaware (4.52), Unfair (6.15). FairPFN's DSP closely tracks the Random baseline across scenarios, while Unfair shows the largest disparities.
</details>
Figure 20: Statistical Parity (Synthetic): Statistical Parity (DSP) of FairPFN compared to our baselines. FairPFN achieves a DSP similar to that of the Random baseline and outperforms EGR, which was optimized specifically for this fairness metric, achieving an average rank of 3.97 out of 7.
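Statistical parity as evaluated here is the standard demographic-parity difference: the absolute gap in positive-prediction rates between the two protected groups. A minimal sketch for a binary protected attribute (names are illustrative):

```python
import numpy as np

def statistical_parity(y_pred, a):
    """Demographic statistical parity (DSP): the absolute difference in
    positive-prediction rates between the two protected groups."""
    y_pred = np.asarray(y_pred, dtype=float)
    a = np.asarray(a)
    return abs(y_pred[a == 1].mean() - y_pred[a == 0].mean())
```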
<details>
<summary>extracted/6522797/figures/ddsp_lawschool.png Details</summary>

### Visual Description
Scatter plot titled "Law School Admissions": predictive error (1-AUC, y-axis 0.33 to 0.50) versus Statistical Parity (DSP, x-axis 0.00 to 0.10), with color-coded markers per method, dashed reference lines at error 0.38 and DSP 0.05, and an inset magnifying the low-DSP region around (0.00-0.02, 0.375-0.380). The methods trace a fairness-accuracy trade-off: low-DSP methods incur higher error, while the unconstrained baselines reach the lowest error at the highest DSP.
</details>
<details>
<summary>extracted/6522797/figures/ddsp_adult.png Details</summary>

### Visual Description
Scatter plot of predictive error on the Adult Census Income problem (y-axis 0.15 to 0.50) versus Statistical Parity (DSP, x-axis 0.00 to 0.08). Legend: Unfair (blue circle), Unaware (orange triangle), Constant (green triangle), Random (red diamond), EGR (purple square), CFP (brown triangle), FairPFN (pink star), CLAIRE (cyan triangle), Cntf. Avg. (yellow diamond). Dashed reference lines mark DSP = 0.02 and error = 0.45, and an inset highlights FairPFN and Cntf. Avg. in the low-DSP region.
</details>
Figure 21: Group-Fairness-Accuracy Trade-off (Real-World): Statistical Parity (DSP), predictive error (1-AUC), and the Pareto front of the performance of FairPFN compared to our baselines on each of five validation folds (light) and aggregated across all five folds (solid) of our real-world datasets. FairPFN dominates EGR, which was specifically optimized for this group-fairness metric.
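The Pareto front in such a plot contains the methods that no other method beats on DSP and error simultaneously. A minimal sketch of the non-dominated filter, assuming both coordinates are to be minimized (the function is illustrative, not from the paper):

```python
import numpy as np

def pareto_front(points):
    """Indices of the points on the Pareto front when both coordinates
    (here: DSP and predictive error) are to be minimized."""
    pts = np.asarray(points, dtype=float)
    front = []
    for i, p in enumerate(pts):
        # p is dominated if some point is <= in every coordinate
        # and strictly < in at least one.
        dominated = bool(np.any(np.all(pts <= p, axis=1)
                                & np.any(pts < p, axis=1)))
        if not dominated:
            front.append(i)
    return front
```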
<details>
<summary>x6.png Details</summary>

### Visual Description
Causal diagrams for the three levels of counterfactually fair prediction on the Law School problem. The protected attributes RACE and SEX (blue) influence the unfair observables GPA and LSAT (purple), which feed the outcome FYA (orange), alongside a fair observable X_fair (yellow). Level two adds a fair unobservable K (green) inferred from GPA and LSAT; level three instead infers the independent noise terms ε_GPA and ε_LSAT (dashed green). Solid arrows denote direct causation, dotted arrows additive noise, and dashed arrows the causal paths accessed by CFP.
</details>
Figure 22: Counterfactually Fair Prediction (CFP): Three levels of counterfactually fair prediction (CFP) Kusner et al. (2017), obtained by fitting a predictor 1) to the fair observables (if any exist; left), 2) to the inferred values of fair exogenous variables (middle), and 3) to the inferred values of the independent noise terms (right).
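For level three, the noise terms must be inferred (abducted) from the observed data before a predictor is fit to them. Under a linear additive-noise assumption this abduction reduces to taking regression residuals; the sketch below illustrates that special case only, since Kusner et al. (2017) handle general structural equations:

```python
import numpy as np

def abduct_residuals(a, x):
    """Abduction under a linear additive-noise assumption: regress each
    observable on the protected attribute and keep the residuals as
    estimates of the exogenous noise terms."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    x = np.asarray(x, dtype=float)
    design = np.hstack([np.ones_like(a), a])          # intercept + A
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ coef                          # residuals ~ noise
```

A counterfactually fair predictor at this level is then trained on these residuals (together with any fair observables), since under the assumed model they are independent of the protected attribute by construction.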
<details>
<summary>x7.png Details</summary>

### Visual Description
Two diagrams from Causal Fairness Analysis. Left, the Standard Fairness Model (SFM): protected attributes A (blue), confounders X and mediators V (purple), and outcome Y (orange), with A linked to X and V, and both X and V affecting Y. Right, the Fairness Cookbook's decomposition of the effect of A on Y into a direct effect (DE), an indirect effect (IE) through the mediators, and a spurious effect (SE) through the confounders.
</details>
Figure 23: Causal Fairness Analysis (CFA) Framework: Components of the CFA framework of Plecko & Bareinboim (2024) relevant to FairPFN’s prior and evaluation: the Standard Fairness Model (left; SFM), which provides a meta-model for causal fairness and heavily informed the design of our prior, and the Fairness Cookbook of causal fairness metrics (right).
| | 1) Biased | 2) Direct-Effect | 3) Indirect-Effect | 4) Fair Observable | 5) Fair Unobservable | 6) Fair Additive Noise | Average |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Unfair | -0.00±0.13 (3.05%) | 0.00±0.14 (0.00%) | -0.00±0.12 (1.65%) | 0.00±0.14 (0.02%) | -0.00±0.19 (0.00%) | -0.00±0.18 (0.00%) | 0.00±0.15 (0.79%) |
| Unaware | -0.01±0.09 (2.60%) | 0.00±0.00 (0.12%) | -0.01±0.08 (1.81%) | -0.00±0.05 (2.63%) | -0.00±0.09 (3.68%) | -0.00±0.10 (3.07%) | -0.00±0.07 (2.32%) |
| Constant | -0.36±0.34 (0.00%) | -0.27±0.43 (0.00%) | -0.38±0.34 (0.00%) | -0.49±0.18 (30.10%) | -0.38±0.30 (4.63%) | -0.37±0.33 (0.11%) | -0.38±0.32 (5.81%) |
| Random | 0.01±0.30 (0.01%) | 0.01±0.31 (0.01%) | 0.00±0.30 (0.00%) | 0.01±0.34 (0.00%) | 0.08±0.37 (0.00%) | 0.06±0.37 (0.00%) | 0.03±0.33 (0.00%) |
| EGR | -0.05±0.46 (0.00%) | -0.07±0.42 (0.00%) | -0.06±0.45 (0.00%) | -0.09±0.38 (0.00%) | -0.06±0.39 (0.00%) | -0.07±0.37 (0.00%) | -0.07±0.41 (0.00%) |
| CFP | -0.00±0.03 (1.31%) | -0.01±0.03 (0.56%) | -0.01±0.07 (2.29%) | -0.02±0.14 (1.72%) | 0.00±0.06 (1.02%) | -0.00±0.05 (1.00%) | -0.01±0.06 (1.32%) |
| FairPFN | 0.00±0.06 (2.03%) | -0.01±0.03 (1.29%) | -0.00±0.05 (2.22%) | -0.01±0.07 (1.01%) | 0.01±0.07 (2.20%) | 0.01±0.09 (2.47%) | 0.00±0.06 (1.87%) |
Table 1: Difference to Cntf. Avg. (Synthetic): Mean, standard deviation, and percentage of outliers of the predictions of FairPFN and our baseline models on our causal case studies, compared to the predictions of the Cntf. Avg. baseline, which achieves strong causal-effect removal and low predictive error thanks to its access to both observational and counterfactual datasets. FairPFN's predictions differ from Cntf. Avg. by 0.00±0.06 on average, with 1.87% of samples falling outside of three standard deviations.
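Each table entry is a simple summary of the per-sample difference between a method's predictions and those of the reference predictor (here Cntf. Avg.): mean ± standard deviation, plus the percentage of samples more than three standard deviations from that mean. A minimal sketch (illustrative names):

```python
import numpy as np

def diff_to_reference(y_pred, y_ref):
    """Mean, standard deviation, and percentage of three-sigma outliers
    of the per-sample difference to a reference predictor."""
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_ref, dtype=float)
    mu, sigma = d.mean(), d.std()
    pct_outliers = 100.0 * np.mean(np.abs(d - mu) > 3 * sigma)
    return mu, sigma, pct_outliers
```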
<details>
<summary>extracted/6522797/figures/tce_by_group_synthetic.png Details</summary>

### Visual Description
Six box plots of the causal effect (ATE) of the predictions (shared y-axis, -0.5 to 0.75), one per synthetic scenario (Biased, Direct-Effect, Indirect-Effect, Fair Observable, Fair Unobservable, Fair Additive Noise), comparing seven methods. Average ranks from the legend: Cntf. Avg. (2.24), Constant (2.24), CFP (2.24), Random (2.53), FairPFN (3.0), EGR (3.33), Unaware (3.57). Cntf. Avg., Constant, and CFP sit tightly around zero; FairPFN also concentrates near zero, while Unaware retains the largest residual effects.
</details>
Figure 24: Causal Fairness (Synthetic-All Baselines): Average Treatment Effect (ATE) of the predictions of FairPFN compared to all baselines. FairPFN consistently removes the causal effect to within (-0.2, 0.2) and achieves an average rank of 3.0 out of 7.
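The ATE reported in Figure 24 can be read as the gap in mean prediction between the two protected groups; a fair predictor drives this gap toward zero. A minimal sketch of that metric (the `prediction_ate` helper and the simulated data are illustrative, not the paper's implementation):

```python
import numpy as np

def prediction_ate(y_pred, a):
    """ATE of the protected attribute A on predictions:
    difference in mean prediction between the two protected groups."""
    y_pred, a = np.asarray(y_pred, dtype=float), np.asarray(a)
    return y_pred[a == 1].mean() - y_pred[a == 0].mean()

# Hypothetical example: predictions independent of A vs. predictions shifted by A.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=1000)
fair_preds = rng.uniform(0.0, 1.0, size=1000)  # independent of A -> ATE near 0
biased_preds = fair_preds + 0.3 * a            # A shifts group 1 -> ATE near 0.3
print(prediction_ate(fair_preds, a))
print(prediction_ate(biased_preds, a))
```

Under this reading, the (-0.2, 0.2) band in Figure 24 corresponds to `|prediction_ate(...)| < 0.2` for FairPFN's predictions.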
<details>
<summary>x8.png Details</summary>

### Visual Description
## Heatmap Diagram: Fairness Method Component Analysis
### Overview
The image presents a comparative analysis of eight fairness methods in machine learning through a grid of heatmaps. Each heatmap visualizes the presence/absence of five key components (A, Xb, Xf, εXb, Y) across training and inference phases, with color-coded annotations indicating component status and causal relationships.
### Components/Axes
**Main Elements:**
1. **Top Row (4 Methods):** Unfair, Unaware, Cntf. Avg., Constant
2. **Bottom Row (4 Methods):** FairPFN, CFP, EGR, Random
3. **Columns (Components):** A (protected attribute), Xb (biased features), Xf (fair features), εXb (exogenous noise on Xb), Y (outcome)
4. **Legend (Right Side):**
- Yellow: Causal effect removed
- Green: Training examples
- Red: Inference examples
- Blue: Predictions
- Dashed lines: Accesses causal model
5. **Causal Diagram (Bottom Right):** Nodes with directed edges A→Xb, εXb→Xb, Xb→Y, and Xf→Y
**Spatial Grounding:**
- Legend positioned vertically on the far right
- Causal diagram occupies bottom-right quadrant
- Heatmaps arranged in 2x4 grid (top row: 4 methods, bottom row: 4 methods)
- Component labels (A, Xb, Xf, εXb, Y) consistently positioned at top of each heatmap
### Detailed Analysis
**Unfair Method:**
- All cells colored (green/red/yellow)
- No causal effect removal (no yellow)
- No causal model access (no dashed lines)
**Unaware Method:**
- Yellow blocks in bottom-left quadrant (A→a', Y_A→a')
- Indicates causal effect removal in specific components
**Cntf. Avg. Method:**
- Yellow blocks in bottom-left quadrant (X_A→a, X_A→a')
- Dashed lines in bottom-right quadrant (Y_X→a, Y_X→a')
- Shows partial causal model access
**Constant Method:**
- Single blue cell labeled "c" in bottom-right
- All other cells uncolored
- Suggests constant prediction across components
**FairPFN Method:**
- Yellow blocks in bottom-left quadrant (A, Xb)
- Green blocks in top-right quadrant (Xf, εXb, Y)
- No causal model access
**CFP Method:**
- Yellow blocks in bottom-left quadrant (A, Xb)
- Green blocks in top-right quadrant (Xf, εXb, Y)
- Red blocks in bottom-right quadrant (Y)
- Shows inference example differentiation
**EGR Method:**
- Yellow blocks in bottom-left quadrant (A, Xb)
- Green blocks in top-right quadrant (Xf, εXb, Y)
- Red blocks in bottom-right quadrant (Y)
- Similar to CFP but with different spatial distribution
**Random Method:**
- Yellow blocks in bottom-left quadrant (A, Xb)
- Green blocks in top-right quadrant (Xf, εXb, Y)
- Blue blocks in bottom-right quadrant (Y)
- Uniform distribution across components
### Key Observations
1. **Causal Effect Removal:** All methods except Unfair show some yellow blocks (causal effect removal)
2. **Causal Model Access:** Only Cntf. Avg. and CFP/EGR methods show dashed lines
3. **Prediction Uniformity:** Constant method shows single blue cell vs. others with multiple colored cells
4. **Component Isolation:** Xf and εXb consistently appear in top-right quadrant across methods
5. **Y Component:** Appears in bottom-right quadrant in all methods except Constant
### Interpretation
The diagram demonstrates how different fairness methods manipulate causal components to achieve fairness:
- **Unfair** represents the baseline with all components intact
- **Unaware** and **FairPFN** show selective causal effect removal
- **Cntf. Avg.** combines causal effect removal with model access
- **CFP** and **EGR** demonstrate inference-phase component differentiation
- **Random** shows uniform component distribution
- **Constant** represents extreme simplification
The causal diagram reveals that bias (εXb) flows through background variables (Xb) to affect outcomes (Y), with fairness variables (Xf) providing alternative pathways. Methods accessing the causal model (dashed lines) appear to better isolate bias components, suggesting that causal awareness enables more targeted fairness interventions. The Constant method's single-cell approach implies a trade-off between fairness complexity and model simplicity.
</details>
Figure 25: Baseline Models: Visualization of FairPFN and our baseline models on our Fair Observable benchmark group, in terms of which variables each model is fit to and performs inference on.
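One distinction Figure 25 draws is between baselines that change the *input* (e.g., Unaware drops A) and methods like FairPFN that see the full input but learn to remove A's causal effect. A simplified sketch of the input columns per method (the column indices and the `baseline_inputs` helper are hypothetical, chosen only to mirror the figure's component layout):

```python
import numpy as np

def baseline_inputs(X, method, a_col=0, xb_cols=(1, 2), xf_cols=(3, 4)):
    """Columns a baseline is fit on, for a feature matrix laid out as
    [A | Xb | Xf] (a simplified reading of Figure 25)."""
    if method == "Unfair":
        cols = [a_col, *xb_cols, *xf_cols]   # full input, no mitigation
    elif method == "Unaware":
        cols = [*xb_cols, *xf_cols]          # drops A, but Xb still carries A's effect
    elif method == "FairPFN":
        cols = [a_col, *xb_cols, *xf_cols]   # full input; debiasing is learned in pre-training
    else:
        raise ValueError(f"unknown method: {method}")
    return X[:, cols]

X = np.arange(20).reshape(4, 5)  # 4 samples, columns [A, Xb1, Xb2, Xf1, Xf2]
print(baseline_inputs(X, "Unaware").shape)
```

The sketch makes the "unawareness" failure mode concrete: removing the A column leaves Xb untouched, so indirect discrimination through Xb survives, which matches Unaware's weak results in the benchmarks above.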
| | Law School Admissions | Adult Census Income | Average |
| --- | --- | --- | --- |
| Unfair | 0.09±0.10 (0.00%) | 0.05±0.06 (0.60%) | 0.07±0.08 (0.30%) |
| Unaware | 0.03±0.03 (0.00%) | 0.02±0.04 (1.49%) | 0.03±0.04 (0.75%) |
| Constant | -0.40±0.08 (97.51%) | -0.18±0.10 (15.69%) | -0.29±0.09 (56.60%) |
| Random | 0.10±0.30 (0.00%) | 0.32±0.31 (0.30%) | 0.21±0.31 (0.15%) |
| EGR | 0.06±0.45 (0.00%) | 0.01±0.35 (0.00%) | 0.03±0.40 (0.00%) |
| CFP | 0.09±0.03 (49.21%) | 0.05±0.06 (2.13%) | 0.07±0.05 (25.67%) |
| FairPFN | 0.01±0.03 (0.11%) | 0.02±0.04 (0.60%) | 0.02±0.04 (0.36%) |
Table 2: Difference to Cntf. Avg. (Real): Mean, standard deviation, and percentage of outliers of the predictions of FairPFN and our baseline models on our real-world datasets, relative to the predictions of the Cntf. Avg. baseline, which achieves strong causal effect removal and low predictive error due to its access to both observational and counterfactual data. FairPFN's predictions differ from Cntf. Avg. by 0.02±0.04 on average, with 0.36% of samples falling outside three standard deviations.
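Each cell in Table 2 summarizes per-sample prediction differences to the Cntf. Avg. reference, with a sample counted as an outlier when its difference lies more than three standard deviations from the mean difference. A minimal sketch of that computation, assuming two aligned prediction arrays (the `diff_to_reference` helper and the simulated data are illustrative):

```python
import numpy as np

def diff_to_reference(preds, ref_preds):
    """Mean, std, and outlier percentage of per-sample prediction
    differences to a reference model (here: the Cntf. Avg. baseline).
    Outlier: |diff - mean| > 3 * std, reported as a percentage."""
    diff = np.asarray(preds, dtype=float) - np.asarray(ref_preds, dtype=float)
    mu, sigma = diff.mean(), diff.std()
    outlier_pct = 100.0 * np.mean(np.abs(diff - mu) > 3 * sigma)
    return mu, sigma, outlier_pct

# Hypothetical example: a model whose predictions track the reference
# with small Gaussian noise should yield a near-zero mean difference.
rng = np.random.default_rng(1)
ref = rng.uniform(0.0, 1.0, 10_000)
noisy = ref + rng.normal(0.0, 0.05, 10_000)
mu, sigma, pct = diff_to_reference(noisy, ref)
print(f"{mu:+.3f}±{sigma:.3f} ({pct:.2f}% outliers)")
```

Read against Table 2, FairPFN's 0.02±0.04 (0.36%) row says its predictions track the reference closely with very few extreme deviations, while Constant's 97.51% outlier rate on Law School reflects a systematically shifted constant output.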