## Causal inference via algebraic geometry: feasibility tests for functional causal structures with two binary observed variables
## Ciar´ an M. Lee ∗
Department of Computer Science, University of Oxford, Oxford, UK OX1 3QD
## Abstract
We provide a scheme for inferring causal relations from uncontrolled statistical data based on tools from computational algebraic geometry, in particular, the computation of Groebner bases. We focus on causal structures containing just two observed variables, each of which is binary. We consider the consequences of imposing different restrictions on the number and cardinality of latent variables and of assuming different functional dependences of the observed variables on the latent ones (in particular, the noise need not be additive). We provide an inductive scheme for classifying functional causal structures into distinct observational equivalence classes. For each observational equivalence class, we provide a procedure for deriving constraints on the joint distribution that are necessary and sufficient conditions for it to arise from a model in that class. We also demonstrate how this sort of approach provides a means of determining which causal parameters are identifiable and how to solve for these. Prospects for expanding the scope of our scheme, in particular to the problem of quantum causal inference, are also discussed.
## 1 Introduction
Causal relationships, unlike statistical dependences, support inferences about the effects of interventions and the truths of counterfactuals. While a randomised controlled experiment can be used to determine causal relationships, these may not be available for various reasons: they could be restrictively expensive, technologically infeasible, unethical (e.g., assessing the effect of smoking on lung cancer), or indeed physically impossible (e.g., for variables describing properties of distant astronomical bodies). Therefore, inferring causal relationships from uncontrolled statistical data is an important problem, with broad applicability across scientific disciplines. Over the past-twenty
∗ Electronic address: ciaran.lee@cs.ox.ac.uk.
† Electronic address: rspekkens@perimeterinstitute.ca.
## Robert W. Spekkens †
Perimeter Institute for Theoretical Physics, Waterloo, Ontario, Canada N2L 2Y5
five years, there has been much progress in developing methods to solve this problem [1, 2, 3, 4, 5].
As has become standard practice, we formalize the notion of causal structure using directed acyclic graphs (DAGs) with random variables as nodes and arrows representing direct causal influence [1, 2]. A more refined description of causal dependences specifies not only what causes what, but also, for every variable, its functional dependence on its causal parents. We shall use the term functional causal structure to refer to the specification of the set of functions, which includes a specification of the DAG. As is standard, the variables that are not observed are termed latent , and the DAG does not include any latent variables that act as causal mediaries, so that all the latent variables are parentless. We shall use the term causal model to describe the functional causal structure together with a specification, for each latent variable, of a probability distribution over its values. Each causal model associated to a given functional causal structure defines a possible joint probability distribution over the observed variables. We are interested in the set of possible joint distributions over the observed variables for a given functional causal structure, that is, those that can arise from some set of distributions on the latent variables. We will say that two functional causal structures are observationally equivalent if they are characterized by the same set of distributions over the observed variables. 1
Many causal inference algorithms, such as those of [1] and [2], only make use of conditional independence relations among the observed variables. If two causal structures are such that the same set of conditional independence relations are faithful to them, then they are said to be Markov equivalent. Note that Markov equivalence can be decided purely on the basis of the DAG (i.e., the causal structure), while the notion of observational equivalence of interest here depends on the functional dependences (i.e., the functional causal structure ). In the case of just two observed variables, which is the one we consider here, the set of all causal structures are partitioned into just two Markov equivalence classes: those wherein the variables are causally connected, and those wherein they are not. As we show, however, the joint distribution over the observed variables supports many more inferences about the
1 This should not be confused with the notion of observational equivalence as applied to DAGs [1].
functional causal structure, thereby providing a more finegrained classification than is provided by Markov equivalence.
In recent years, several methods have been suggested that make use not only of conditional independences, but also other properties of the joint statistical distribution between the observed variables [3, 4, 5, 6] (See also the works discussed in Secs. 6.2 and 6.3). These newer methods also have limitations in the sense that they impose restrictions on the number of latent variables allowed in the underlying causal model and also on the mechanisms by which these latent variables influence the observed ones.
In the present work, we restrict attention to the causal inference problem where there are just two observed variables, each of which is binary (that is, discrete with just two possible values). We allow any functional causal structure involving latent variables that are discrete (with a finite number of values), and we impose no restriction on the number of latent variables or the mechanisms by which these influence the observed ones.
We provide an inductive scheme for characterizing the observational equivalence classes of functional causal structures. This scheme has a few steps. First we show that, in each observational class, there is a functional causal structure wherein all of the latent variables are binary. Restricting ourselves to the latter sort of functional causal structure, we show that one can inductively build up any functional causal structure from a pair of others having fewer latent variables. Thus, starting with functional causal structures with no latent variables, we can recursively build up all functional causal structures, and therefore all observational equivalence classes of these, by applying our inductive scheme.
Using this scheme, we catalogue all observational equivalence classes generated by functional causal structures with four or fewer binary latent variables. We have evidence, but no proof yet, that our catalogue is complete in the sense that a functional causal structure with any number of binary latent variables-and hence, by the connection described above, any functional causal structure with discrete latent variables-belongs to one of the classes we have identified.
We also describe a procedure for deriving, for each class, the set of necessary and sufficient conditions on the joint distribution over observed variables for it to be possible to generate it from functional causal structures in this class. We call such a set of conditions a feasibility test for the class. The procedure for deriving these is as follows. We start with a particular functional causal structure within the class, express the parameters in the joint probability distribution over the observed variables in terms of the parameters in the probability distributions over the latent variables, then eliminate the latter using techniques from algebraic geometry.
Finally, we consider applications to the problem of identifying causal parameters. For the parameters describing the probability distributions over the latent variables, we note that our technique allows one to find expressions for these
Fig. 1: (a) DAG for causal model defined by A = B ⊕ λ and B = ν (b) Joint distributions that can be generated by this causal model.
<details>
<summary>Image 1 Details</summary>

### Visual Description
## Diagram: Bidirectional Relationship and 3D Geometric Structure
### Overview
The image contains two distinct components:
1. **Part (a)**: A bidirectional relationship diagram with labeled arrows and nodes.
2. **Part (b)**: A 3D pyramid-like structure with labeled edges and color gradients.
### Components/Axes
#### Part (a): Bidirectional Relationship
- **Nodes**:
- `A` (left) and `B` (right), connected by a horizontal bidirectional arrow.
- **Arrows**:
- Vertical arrows labeled `λ` (from `A` to `B`) and `ν` (from `B` to `A`).
- **Interpretation**:
- Represents a transformation or mapping between `A` and `B`, with `λ` and `ν` as directional operators or functions.
#### Part (b): 3D Geometric Structure
- **Structure**:
- A tetrahedral pyramid with triangular faces.
- Base edges labeled `[00]`, `[10]`, `[11]`.
- Apex labeled `[01]`.
- **Color Gradient**:
- Blue (base) to red (apex), suggesting a scalar parameter (e.g., intensity, probability, or value).
- **Labels**:
- Binary-like labels (`[00]`, `[01]`, `[10]`, `[11]`) on edges, possibly representing coordinates, states, or binary vectors.
### Detailed Analysis
#### Part (a):
- No numerical values or scales provided.
- Arrows imply a reversible relationship between `A` and `B`, mediated by `λ` and `ν`.
#### Part (b):
- **Edge Labels**:
- `[00]`: Base-left edge.
- `[10]`: Base-right edge.
- `[11]`: Base-center edge.
- `[01]`: Apex.
- **Color Gradient**:
- Blue (low value) to red (high value), but no explicit legend or scale.
- **Geometry**:
- Symmetric triangular base with apex offset, forming a tetrahedron.
### Key Observations
1. **Part (a)**:
- The bidirectional arrow between `A` and `B` suggests mutual dependency or equivalence.
- `λ` and `ν` may represent distinct transformations (e.g., linear operators, logical implications).
2. **Part (b)**:
- Binary labels (`[00]`, `[01]`, etc.) resemble binary coordinates or state representations.
- Color gradient implies a scalar field, but no quantitative data is provided.
- The apex (`[01]`) is distinct from the base, possibly indicating a hierarchical or prioritized structure.
### Interpretation
- **Part (a)**:
- Likely represents a mathematical or logical system where `A` and `B` are interrelated through transformations `λ` and `ν`. This could model processes like data flow, function composition, or state transitions.
- **Part (b)**:
- The 3D structure may visualize a geometric or computational concept, such as:
- A probability simplex (if colors represent probabilities).
- A binary decision tree or state machine (with `[00]`, `[01]`, etc., as states).
- A coordinate system in a 3D space with binary or ternary logic.
- The absence of a legend for the color gradient limits quantitative interpretation.
### Notable Patterns
- **Part (a)**: Symmetry in the bidirectional relationship suggests equivalence or duality.
- **Part (b)**: The apex (`[01]`) is isolated from the base, potentially signifying a unique or dominant state.
### Conclusion
The diagrams collectively imply a system with bidirectional interactions (Part a) and a hierarchical or geometric structure (Part b). The binary labels and color gradient hint at computational or mathematical modeling, but further context is required for precise interpretation.
</details>
in terms of the observational data for each observational equivalence classes that we have considered. For the parameters describing the functional relations, we note that the limits to what one can infer about these, which may be different for different points in the space of possible joint distributions over the observed variables, can be inferred from our feasibility tests.
## 2 Setting up the problem
Consider the causal model of Fig.1(a). From the DAG, it is clear that B is a cause of A , while λ is noise local to A and ν is noise local to B . The functional dependences are given by A = B ⊕ λ and B = ν . Amodel with this sort of functional dependence is referred to as an additive noise model (ANM) in Refs. [3, 4, 5, 6]. The values of A , for different values of B and λ , are given in the table below.
| ν | λ | B | A = B ⊕ λ |
|-----|-----|-----|-------------|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 1 | 0 |
In Ref. [5], it was shown that one can distinguish between the causal model of Fig.1(a) and the causal models depicted in Fig. 2(a) and Fig. 2(c), except for special cases of the distributions over the noise variables, such as, for instance, when λ and ν are uniformly distributed. Thus if we are promised that the causal model is an ANM, then (except for the special cases) we can distinguish between B causing A , A causing B and A and B being causally disconnected. To see how this works we will need to determine the correlations generated by this model.
To describe the correlations we adopt the following notational convention.
<!-- formula-not-decoded -->
Let q 1 be the probability that ν = 0 and q 2 be the probability that λ = 0 , then the correlations for the above causal model are
<!-- formula-not-decoded -->
Fig. 2: (a) DAG for causal model defined by A = λ and B = A ⊕ ν . (b) Joint distributions that can be generated by the model of (a). Note that this is a head-on view of a fan shape of the same type as is depicted in (d). (c) DAG for causal model defined by A = λ and B = ν . (d) Joint distributions that can be generated by the model of (c).
<details>
<summary>Image 2 Details</summary>

### Visual Description
## Diagram and 3D Plot Analysis: Material Transformation and Stress Distribution
### Overview
The image contains two distinct sections:
1. **Diagrams (a) and (c)**: Arrows labeled with variables (λ, ν) and states (A, B) suggesting directional relationships or transformations.
2. **3D Plots (b) and (d)**: Tetrahedral structures with color gradients and labeled axes ([00], [10], [11], [01]), likely representing crystallographic orientations or stress/strain distributions.
---
### Components/Axes
#### Diagrams (a) and (c):
- **Labels**:
- **A** and **B**: States or phases (e.g., initial/final configurations).
- **λ** and **ν**: Parameters (e.g., wavelength, Poisson’s ratio, or transformation coefficients).
- **Flow**:
- Diagram (a): A → B with λ and ν acting as directional forces.
- Diagram (c): A ← B with λ and ν reversed, implying a feedback or iterative process.
#### 3D Plots (b) and (d):
- **Axes**:
- **X-axis**: [00] (reference plane).
- **Y-axis**: [10] (primary crystallographic direction).
- **Z-axis**: [11] and [01] (secondary crystallographic directions).
- **Color Gradient**:
- **Blue to Red**: Likely represents magnitude (e.g., stress intensity, strain energy).
- **White Regions**: Absence of data or neutral zones.
---
### Detailed Analysis
#### Diagrams (a) and (c):
- **Diagram (a)**:
- λ and ν act as **input forces** driving the transformation from A to B.
- Spatial grounding: λ is positioned above A, ν above B, suggesting sequential application.
- **Diagram (c)**:
- λ and ν now act as **output forces** from B to A, indicating a reverse process or equilibrium state.
- Spatial grounding: λ and ν are positioned below A and B, respectively, emphasizing feedback.
#### 3D Plots (b) and (d):
- **Plot (b)**:
- Color gradient transitions from **blue ([00])** to **red ([01])**, peaking at [11].
- Suggests highest intensity (e.g., stress) at [11] orientation.
- **Plot (d)**:
- Color gradient shifts to **purple ([00])** and **yellow ([10])**, with a white void at [11].
- Indicates reduced intensity or material failure at [11] under altered conditions.
---
### Key Observations
1. **Reversal of Forces**: Diagram (c) inverts the direction of λ and ν compared to (a), implying a dynamic system (e.g., cyclic loading).
2. **Stress Concentration**: 3D Plot (b) highlights [11] as a critical zone, while Plot (d) shows material degradation at this orientation.
3. **Color Correlation**: Blue (low intensity) and red (high intensity) align with typical stress visualization conventions.
---
### Interpretation
- **Material Behavior**: The diagrams likely model phase transformations (A → B) under mechanical parameters (λ, ν), while the 3D plots visualize resultant stress/strain distributions.
- **Critical Failure**: The white void in Plot (d) at [11] suggests a fracture or phase boundary under specific conditions.
- **Dynamic Equilibrium**: The bidirectional arrows in (a) and (c) may represent hysteresis or reversible transformations in the material.
---
**Note**: No explicit numerical data or legends are present in the image. Trends and interpretations are based on standard conventions for stress/strain visualization and crystallographic analysis.
</details>
This means P ( A = 0 , B = 0) = q 1 q 2 , P ( A = 0 , B = 1) = (1 -q 1 )(1 -q 2 ) and so on. From now on, we will use the shorthand q i ≡ 1 -q i to simplify expressions.
Note that if a latent variable were to take one of its values with probability 1, then it would be trivial and could be eliminated from the functional causal structure. We therefore consider only functional causal structures with nontrivial latent variables, that is, latent variables that have some statistical variation in their value, so that the probability of any value is bounded away from 0 and 1. In the present example, therefore, 0 < q 1 , q 2 < 1 .
For a general causal model we have P ( A,B ) = p 00 [00] + p 01 [01]+ p 10 [10]+ p 11 [11] , where P ( A = i, B = j ) = p ij . We note that p 00 + p 01 + p 10 + p 11 = 1 . As we only need three real parameters to specify P ( A,B ) , we can plot it in R 3 . It is easy to see that the points { P ( A = i, B = j ) = 1 : i, j ∈ Z 2 } form the vertices of a tetrahedron in R 3 and so the plot of P ( A,B ) must lie within this tetrahedron.
We can rewrite P ( A,B ) for our current example as
<!-- formula-not-decoded -->
So, if we fix the value of q 2 in the range (0 , 1) and vary q 1 over the interval (0 , 1) , the plot of P ( A,B ) consists of the line passing through a point on the edge of the tetrahedron containing the vertices { [00] , [10] } and a point on the edge containing the vertices { [11] , [01] } (but excluding these points). The full plot of P ( A,B ) , as q 1 and q 2 each range over the interval (0 , 1) , is depicted in Fig. 1(b) (where the boundary points are excluded). We refer to this shape as a fan . Fig. 2(b) and Fig. 2(d) depict the set of joint distributions for the ANM where A causes B and the causal structure where A and B are causally disconnected.
Given some joint distribution, P ( A,B ) , how do we determine if it lies on one of the fans of Fig. 1(b), Fig. 2(b) or Fig. 2(d)? Recall that, because the latent variables are un- observed, we do not have access to the q i 's directly, only the observed p ij 's. Thus, the problem can be posed as follows: what are the defining equations of the fans in terms of the observed p ij 's?
This problem was solved for the example of Fig. 1 in Ref. [5] using the following technique. First, it was noted that the DAG implies that λ is marginally independent of B , and therefore P ( λ | B = 0) = P ( λ | B = 1) . Given that λ is a binary variable, this is true if and only if P ( λ = 1 | B = 0) = P ( λ = 1 | B = 1) . We wish to eliminate λ from this condition. Recall from the definition of conditional probability that P ( λ = 1 | B = b ) = P ( λ = 1 , B = b ) / P ( B = b ) . The functional dependence A = B ⊕ λ can be used to conclude that P ( λ = 1 , B = b ) = P ( A = b ⊕ 1 , B = b ) . Note that this last step is only possible because the noise is additive, so that one can infer λ from A and B . Therefore, reverting to our notational conventions, where P ( A = 1 ⊕ b, B = b ) = p 1 ⊕ b,b and P ( B = b ) = p 0 ,b + p 1 ,b , the condition becomes
<!-- formula-not-decoded -->
which can be rewritten as:
<!-- formula-not-decoded -->
This equation, together with the open-interval constraints ,
<!-- formula-not-decoded -->
defines the fan in Fig. 1(b). Using similar techniques, one can show that Figs. 2(b) and 2(d) are defined by equation
<!-- formula-not-decoded -->
respectively
<!-- formula-not-decoded -->
together with the open-interval constraint.
The question is: how can one find feasibility tests for generic causal models? In particular, how does one treat models where the noise is not additive? Consider, for instance, the causal model that has the same DAG as in Fig. 1(a), but where the noise is multiplicative, that is, A = Bλ . In this case, the value of λ cannot be inferred from A and B (given that these could be zero), and consequently one cannot use the approach of Ref. [5]. It is also unclear how one can characterize the possibilities for the joint distribution when the causal model involves an arbitrary number of latent variables. We will show that these questions can be answered using powerful tools from algebraic geometry, which we describe in the next section.
## 3 Deriving the feasibility tests
We begin with an introduction to some of the main concepts of algebraic geometry following the presentation given in [7]. For a more detailed discussion, see appendix A.
Denote the set of all polynomials in variables x 1 , . . . , x n with coefficients in some field k by k [ x 1 , . . . , x n ] . When dealing with polynomials, we are mainly interested in the solution set of systems of polynomial equations. This leads us to the main geometrical objects studied in algebraic geometry, algebraic varieties and semi-algebraic sets.
An algebraic variety 2 V ( f 1 , . . . , f s ) ⊂ k n is the solution set of the system of polynomial equations f 1 ( x 1 , . . . , x n ) = · · · = f s ( x 1 , . . . , x n ) = 0 . A basic semi-algebraic set is defined to be the solution set of a system of polynomial equalities and inequalities, that is, { x ∈ R n : g i ( x ) 0 , ∀ i = 1 , . . . , m } , where g 1 , . . . , g n ∈ R [ x 1 , . . . , x n ] are polynomials ove the reals 3 and where corresponds to either ≥ , = , or ≤ . Note that algebraic varieties are examples of basic semi-algebraic sets. A semi-algebraic set is formed by taking finite combinations of unions, intersections, or complements of basic semi-algebraic sets. For instance, the fan in Fig.1(b) is the semi-algebraic set that results from the intersection of the algebraic variety defined by the single polynomial equation p 00 p 01 -p 11 p 10 = 0 and the set of inequalities that define the interior of the tetrahedral probability simplex (requiring each probability to be in the interval (0 , 1) ).
More generally, for any causal model, the set of possible joint distributions that can be generated by it are represented by a semi-algebraic set. It follows that two causal models are observationally equivalent if and only if they generate the same semi-algebraic set.
We now define ideals , the main algebraic object studied in algebraic geometry. A subset I ⊂ k [ x 1 , . . . , x n ] is an ideal if it satisfies: (1) 0 ∈ I , (2) If f, g ∈ I , then f + g ∈ I , and (3) If f ∈ I and h ∈ k [ x 1 , . . . , x n ] , then hf ∈ I .
A natural example of an ideal is the ideal generated by a finite number of polynomials, defined as follows. Let f 1 , . . . , f s be polynomials in k [ x 1 , . . . , x n ] , then the ideal generated by f 1 , . . . , f s is:
<!-- formula-not-decoded -->
The polynomials f 1 , . . . , f s are called the basis of the ideal.
Studying the relations between certain ideals and varieties forms one of the main areas of study in algebraic geometry. One can even define the algebraic variety V ( I ) defined by the ideal I ⊂ k [ x 1 , . . . , x n ] , where
<!-- formula-not-decoded -->
Interestingly, it can also be shown that if I = 〈 f 1 , . . . , f s 〉 , then V ( I ) = V ( f 1 , . . . , f s ) , which is to say that the variety defined by a set of polynomials is the same as the variety defined by the ideal generated by those polynomials. Hence, varieties are determined by ideals .
We can now use the language of algebraic geometry to restate the question asked at the end of the last section. Let
2 Also called an affine variety or an algebraic set .
3 Note that one can replace the real field R used in the last definition with any ordered field.
V ⊆ k n be an algebraic variety given parametrically as
<!-- formula-not-decoded -->
where the g i are polynomials in q 1 , . . . , q m . The conjunction of the above equalities with the inequalities ensuring that the variables q 1 , . . . , q m are in the interval (0 , 1) (probabilities bounded away from 0 and 1) defines a semialgebraic set on p 1 , . . . , p n , q 1 , . . . , q m . We seek to infer which values of p 1 , . . . , p n are possible for some values of the q 1 , . . . , q m in their allowed intervals. By the TarskiSeidenberg theorem [8], the solution to this problem is also a semi-algebraic set. We determine the latter as follows. First, we eliminate the variables q 1 , . . . , q m to find a system of polynomial equations in p 1 , . . . , p n ,. These define the smallest algebraic variety on p 1 , . . . , p n , q 1 , . . . , q m that contains the semi-algebraic set that we seek to characterize. This problem is known as implicitization . The second step is to determine which points in this algebraic variety can be extended to a solution of the equalities and inequalities of the original parametric characterization.
For example, consider the algebraic variety that is defined parametrically by the polynomial equations
<!-- formula-not-decoded -->
We would like to characterize the semi-algebraic set that this variety defines on the observed variables p 00 , p 01 , p 10 , p 11 alone when one eliminates the parameters q 1 and q 2 while enforcing that they are probabilities in (0 , 1) . In Sec. 2, it was shown how one can do so, and that the resulting semi-algebraic set is the one depicted in Fig.1(b). However, the technique was not generalizable to arbitrary functional causal structures. Here, we reconsider this example using techniques that are generally applicable.
The problem can be solved by employing a specific choice of basis for the ideal generated by the system of polynomial equations that define the variety (3.0.1). The basis that achieves this feat is known as the Groebner basis .
Groebner bases simplify many calculations in algebraic geometry and they have many interesting properties [7]. There are efficient algorithms for calculating Groebner bases and many software packages that one can use to implement them.
We discovered in this section that the fan of Fig. 1(b) is in fact the intersection of the algebraic variety defined by the ideal
<!-- formula-not-decoded -->
with the tetrahedron.The Groebner basis 4 of this ideal is
4 with respect to the lex order q 1 > q 2 > p 00 > p 10 > p 01 > p 11 , see appendix A
found to be
<!-- formula-not-decoded -->
Solutions to g 1 = · · · = g 4 = 0 provide solutions to
<!-- formula-not-decoded -->
which define our algebraic variety. Looking more closely at the Groebner basis we note that the variables q 1 , q 2 have been eliminated from the polynomials g 3 and g 4 . The solution of g 3 = p 00 + p 01 + p 10 + p 11 -1 = 0 is exactly the normalisation condition. The solution of g 4 = 0 gives us the following
<!-- formula-not-decoded -->
which, using the normalization condition, then gives us
<!-- formula-not-decoded -->
Ondemanding 0 < p 00 , p 01 , p 10 , p 11 < 1 and p ij ∈ R , ∀ ij (i.e. on taking the intersection of this algebraic variety with the tetrahedron), we obtain the semi-algebraic set corresponding to the fan of Fig.1(b), which we derived in section 2. This is a special case of a general result, known as the elimination theorem , which provides us with a way of using Groebner bases to systematically eliminate certain variables from a system of polynomial equations and, thus, to solve the implicitization problem.
The general procedure for finding the semi-algebraic set is as follows. First, given the system of polynomial equations defining the implicitization problem, as in Eq. (3.0.1), form the ideal generated by these polynomials and compute 5 its Groebner basis. The elements of this basis that do not contain the variables q 1 , . . . , q m constitute constraints on the variables p 1 , . . . , p n alone. These constraints consitute polynomial equalities, and therefore define an algebraic variety on the variables p 1 , . . . , p n . Second, we determine which points on this variety correspond to solutions of the original equalities and inequalities on q 1 , . . . , q n and p 1 , . . . , p m . This will result in inequality constraints. The equality constraints from the first step and the inequality constraints from the second step together characterize the semi-algebraic set on p 1 , . . . , p n that is compatible with the given functional causal structure. We note that one trivial consequence of the fact that each of the q 1 , . . . , q m is in the interval (0 , 1) is that each of the p 1 , . . . , p n is in the interval (0 , 1) . As such, the semialgebraic set we seek to characterize is always a subset of the geometric intersection of the algebraic variety we find in the first step and the probability simplex on p 1 , . . . , p n . Note, however, that it is generically a strict subset of this intersection.
These inequality constraints manifest themselves in different ways. We present an example of one such manifestation below and leave the remaining examples to appendix B.
5 with respect to the lexicographic order q 1 > q 2 > · · · > q m > p 1 > · · · > p n .
Fig. 3: (a) A = µν and B = µλ . (b) p 00 p 11 ≥ p 01 p 10 .
<details>
<summary>Image 3 Details</summary>

### Visual Description
## Diagram: Bidirectional Interaction and 3D Geometric Structure
### Overview
The image contains two distinct components:
1. **Diagram (a)**: A bidirectional relationship between points **A** and **B** with labeled arrows.
2. **Diagram (b)**: A 3D pyramid-like structure with labeled faces and a color gradient.
---
### Components/Axes
#### Diagram (a):
- **Labels**:
- Arrows from **A** to **B**: **ν** (vertical), **μ** (diagonal).
- Arrow from **B** to **A**: **λ** (vertical).
- **Flow**:
- **A** → **B** via **ν** and **μ**.
- **B** → **A** via **λ**.
#### Diagram (b):
- **Axes/Face Labels**:
- Top face: **[01]**
- Left face: **[00]**
- Right face: **[11]**
- Base face: **[10]**
- **Color Gradient**:
- Top-to-bottom gradient: **blue** (apex) → **green** → **yellow** (base).
- **No explicit legend** for the gradient.
---
### Detailed Analysis
#### Diagram (a):
- **Arrows**:
- **ν**: Direct vertical connection from **A** to **B**.
- **μ**: Diagonal connection from **A** to **B**.
- **λ**: Direct vertical connection from **B** to **A**.
- **Interpretation**:
- Represents a system with two-way interactions (**A** ↔ **B**) mediated by three distinct pathways (**ν**, **μ**, **λ**).
#### Diagram (b):
- **Structure**:
- A tetrahedral pyramid with triangular faces.
- Face labels resemble **Miller indices** (common in crystallography), suggesting a geometric or material science context.
- **Color Gradient**:
- **Blue** at the apex ([01] face) transitions to **yellow** at the base ([10] face).
- No numerical scale provided; gradient likely represents a scalar quantity (e.g., stress, energy, or concentration).
---
### Key Observations
1. **Diagram (a)**:
- Symmetry in vertical arrows (**ν** and **λ**) suggests reciprocal interactions.
- Diagonal arrow (**μ**) implies a non-linear or secondary pathway.
2. **Diagram (b)**:
- Face labels **[01]**, **[00]**, **[11]**, **[10]** align with crystallographic plane notation.
- Color gradient may indicate a property varying with depth or orientation.
---
### Interpretation
- **Diagram (a)**:
- Likely models a physical or abstract system with bidirectional interactions (e.g., chemical reactions, data flow, or mechanical forces). The labels **ν**, **μ**, **λ** could represent velocity, momentum, and wavelength, respectively, depending on context.
- **Diagram (b)**:
- The 3D structure resembles a **Wulff shape** or **Voronoi cell**, common in materials science to represent equilibrium configurations.
- The color gradient might correlate with a property like **surface energy** or **defect density**, with **blue** (low values) at the apex and **yellow** (high values) at the base.
- **Relationships**:
- Diagram (a) could represent interactions between planes or phases in the 3D structure (b), with **ν**, **μ**, **λ** governing transitions between crystallographic orientations.
---
### Notes on Missing Data
- No numerical values or explicit units are provided.
- The color gradient in (b) lacks a legend, limiting quantitative interpretation.
- Assumptions about crystallography are based on Miller index notation; alternative contexts (e.g., quantum states) are possible.
</details>
Consider the causal model of Fig. 3(a). Defining q 1 , q 2 and q 3 to be the probabilities for µ = 0 , ν = 0 and λ = 0 respectively, the joint distribution generated by this model is
<!-- formula-not-decoded -->
We begin by providing an intuitive account of the semialgebraic set describing such joint distributions. Note first that P ( A,B ) can be rewritten as
<!-- formula-not-decoded -->
implying that it is the convex combination, with weight q 1 , of the point distribution [00] , and with weight q 1 , of the distribution arising from the functional causal structure of Fig. 2(c), shown above to be characterized by the equality p 00 p 11 = p 10 p 01 . It follows that the semi-algebraic set defined by P ( A,B ) contains all interior points on any line extending from the vertex [00] to a point on the fan depicted in Fig. 2(d); this variety is depicted in Fig. 3(b).
Reading off the expressions for p 00 , p 01 , p 10 , and p 11 from Eq. (3.0.2), we obtain the set of polynomials that define the full algebraic variety. The ideal generated by these is:
<!-- formula-not-decoded -->
To implement the first step of the general procedure outlined above, we derive the Groebner basis for this ideal 6 :
<!-- formula-not-decoded -->
Now g 5 = 0 is just the normalisation condition and g 6 = 0 gives the following:
<!-- formula-not-decoded -->
6 with respect to the lex order q 1 > q 2 > q 3 > p 00 > p 01 > p 10 > p 11
which, using the normalisation condition, results in
<!-- formula-not-decoded -->
To implement the second step of our procedure, we begin by enforcing q 1 > 0 . This results in the following inequality
<!-- formula-not-decoded -->
None of the remaining constraints 0 < q i < 1 , for i ∈ { 2 , 3 } result in nontrivial relations among the p ij 's, so the latter inequality is the only nontrivial constraint. Together with the open-interval constraints 0 < p 00 , p 01 , p 10 , p 11 < 1 , it describes the necessary and sufficient conditions for the distribution on observed variables to be compatible with the functional causal structure of Fig. 3(a). These conditions define the semi-algebraic set depicted in Fig. 3(b).
## 4 Characterizing the observational equivalence classes
In this section, we will provide a scheme for inductively characterizing all observational equivalence classes. As noted in the introduction, we consider only causal models where there is a pair of binary observed variables, which we denote by A and B .
## 4.1 Sufficiency of considering purely common-cause models
A causal model having no directed causal influences between the observed variables will be termed purely common-cause .
Lemma 4.1.1. Every causal model wherein there is a directed causal influence between A and B (either A → B or B → A ) is observationally equivalent to one that is purely common-cause.
The proof is as follows. Suppose that there is a directed causal influence B → A . If the collection of all latent variables is denoted by λ , then a general causal model can be specified by the functional dependences B = f ( λ ) and A = g ( λ, B ) for some functions f and g . But this is observationally equivalent to the causal model that is purely common-cause with functional dependences B = f ( λ ) and A = g ′ ( λ ) where g ′ ( λ ) ≡ g ( λ, f ( λ )) . In characterizing the distinct observational equivalence classes, therefore, it suffices for us to consider the models that are purely common-cause, and therefore we restrict our attention to these henceforth.
An explicit example serves to illustrate this equivalence. The causal model depicted in Fig. 4(a), with functional dependences A = λ ⊕ B and B = ν , involving a directed causal influence from B to A , is observationally equivalent to the causal model depicted in Fig. 4(b), with functional dependences A = λ ⊕ µ and B = µ , which is purely common-cause. To see this, note that one can express the functional dependences of the first causal model
Fig. 4: (a) DAG for causal model defined by A = λ ⊕ B and B = ν (b) DAG for causal model defined by A = λ ⊕ µ and B = µ .
<details>
<summary>Image 4 Details</summary>

### Visual Description
## Diagrams: Directional Flow Models (a) and (b)
### Overview
The image contains two directional flow diagrams labeled (a) and (b). Both depict relationships between entities A and B using labeled arrows (λ, ν, μ). Diagram (a) shows a unidirectional flow with two independent downward arrows, while diagram (b) introduces a bidirectional flow with a feedback loop.
### Components/Axes
- **Diagram (a)**:
- **Labels**:
- Vertical arrow from A labeled **λ** (downward).
- Vertical arrow from B labeled **ν** (downward).
- Horizontal arrow from A to B (no label).
- **Flow Direction**:
- A → B (horizontal).
- A ↓ (λ) and B ↓ (ν) as independent downward flows.
- **Diagram (b)**:
- **Labels**:
- Vertical arrow from A labeled **λ** (downward).
- Diagonal arrow from A to B labeled **μ** (downward-right).
- Diagonal arrow from B to A labeled **μ** (upward-left).
- **Flow Direction**:
- A → B (μ).
- B → A (μ, feedback loop).
- A ↓ (λ) as an independent downward flow.
### Detailed Analysis
- **Diagram (a)**:
- Represents a **unidirectional process** where A influences B directly (horizontal arrow) while both A and B independently contribute to separate downstream processes (λ and ν).
- No feedback or cyclical interactions.
- **Diagram (b)**:
- Introduces **bidirectional interaction** between A and B via μ, creating a feedback loop (B → A).
- Maintains the independent downward flow λ from A but adds complexity through mutual influence.
### Key Observations
1. **Directional Symmetry**: Diagram (b) replaces the unidirectional A→B flow in (a) with a symmetric μ flow (A→B and B→A).
2. **Feedback Mechanism**: The B→A arrow in (b) suggests a **cyclical relationship**, contrasting with the linear flow in (a).
3. **Independent Downward Flow**: Both diagrams retain λ as a standalone downward process from A, implying a persistent external dependency.
### Interpretation
- **System Dynamics**:
- Diagram (a) models a **linear system** with no interdependencies between A and B beyond the initial A→B interaction.
- Diagram (b) represents a **feedback-driven system**, where A and B mutually influence each other (via μ), potentially leading to stabilization or oscillation depending on context.
- **Implications**:
- The introduction of μ in (b) could signify **emergent behavior** (e.g., mutual reinforcement or conflict) absent in (a).
- The retained λ in both diagrams suggests a **constant external factor** acting on A, independent of B.
No numerical data or quantitative trends are present; the diagrams focus on qualitative relationships and directional logic.
</details>
as A = g ( λ, ν, B ) = λ ⊕ B and B = f ( λ, ν ) = ν. Performing the substitution described in the previous paragraph yields A = g ( λ, ν, f ( λ, ν )) = g ′ ( λ, ν ) = λ ⊕ ν , which on identifying ν with µ , results in the second causal model.
## 4.2 Sufficiency of considering models with binary latents
We call a causal model where all the latent variables are binary a causal model with binary latents . If there are n binary latent variables, it is called an n -latent-bit causal model .
Theorem 4.2.1. Consider the family of causal models where the latent variables are discrete and finite, but not necessarily binary. Every such model is observationally equivalent to one with binary latents. Equivalently, there is a causal model with binary latents in each observational equivalence class.
The proof is provided in appendix C, but we now present a simple example which illustrates the main idea of the proof.
Consider the causal model of Fig. 5(a), where C, D are binary, but τ is a three-valued variable, i.e., a trit. Suppose the functional relationships are as follows: C = τ mod 2 and D = ( 2( τ ⊕ 3 1) ) mod 2 , where ⊕ k means addition modulo k . The values of C, D for different values of τ are given in the table below.
| τ | C | D |
|-----|-----|-----|
| 0 | 0 | 0 |
| 1 | 1 | 1 |
| 2 | 0 | 1 |
One can see that the distributions over C, D that can be generated by this model correspond to the face of the tetrahedron that contains the vertices { [00] , [11] , [01] } .
The trick to simulating this model using a 2 -latent-bit model is to replace the latent three-valued variable τ with a pair of binary variables γ and η and to imagine that these are causally related in the manner depicted in Fig. 5(b). That is, we imagine a latent bit ν acting locally on γ and a latent bit µ acting as a common cause of γ and η with the functional dependence γ = µν and η = µ . This causal model can generate any distribution over γ and η that has support only on the values ( γ, η ) ∈ { (0 , 1) , (0 , 0) , (1 , 1) } , as can be seen by consulting the row containing class
Fig. 5: Example of how to reduce a causal model with a latent trit to one involving only latent bits. (a) The original causal model, with functional dependences C = τ mod 2 , and D = ( 2( τ ⊕ 3 1) ) mod 2 . (b) the trit τ is replaced by two bits, γ and η , which are presumed to be determined by a causal model having the depicted causal structure with functional dependences γ = µν , and η = µ . (c) The causal model with binary latents that simulates the original model; the functional dependences are C = νµ ⊕ 2 ν , and D = ν .
<details>
<summary>Image 5 Details</summary>

### Visual Description
## Diagram: Tripartite Systemic Relationships
### Overview
The image presents three distinct diagrams (a, b, c) illustrating abstract systemic relationships between labeled components (C, D, γ, μ, ν, τ, η). Arrows indicate directional flows or dependencies, with Greek letters representing transformation or interaction mechanisms.
### Components/Axes
**Diagram (a):**
- **Labels:** C (source), D (target), τ (transformation mechanism)
- **Flow:** Bidirectional arrows between C and D, mediated by τ
- **Spatial Positioning:** τ centered above C-D axis
**Diagram (b):**
- **Labels:** γ (source), μ (intermediate), ν (feedback), D (target), η (transformation)
- **Flow:**
- γ → μ → D (primary pathway)
- ν → γ (feedback loop)
- μ → ν (cross-path interaction)
- **Spatial Positioning:** γ at top-left, D at bottom-right, η centered
**Diagram (c):**
- **Labels:** C (source), D (target), μ (pathway), ν (pathway)
- **Flow:**
- μ: C → D (direct)
- ν: C → D (parallel)
- **Spatial Positioning:** μ and ν arrows diverge from C, converge at D
### Detailed Analysis
1. **Diagram (a):** Represents a symmetric bidirectional relationship between C and D, with τ acting as a mutual transformation catalyst. The equal arrow thickness suggests balanced interaction strength.
2. **Diagram (b):** Depicts a hierarchical system with γ initiating processes through μ to reach D, while ν creates a feedback loop to γ. The η label implies a secondary transformation mechanism between μ and ν.
3. **Diagram (c):** Shows parallel pathways (μ and ν) from C to D, suggesting redundancy or alternative routes for C→D conversion.
### Key Observations
- τ in (a) vs. μ/ν in (c) both mediate C→D relationships but with different structural implications (bidirectional vs. parallel)
- Diagram (b) introduces feedback (ν→γ) absent in other diagrams
- All diagrams use Greek letters for transformation mechanisms, suggesting standardized notation
### Interpretation
These diagrams likely represent:
1. **System Dynamics:** τ in (a) could model equilibrium states, while (b) shows non-equilibrium processes with feedback
2. **Pathway Redundancy:** (c) suggests biological/technical systems with multiple conversion routes
3. **Feedback Importance:** (b) emphasizes regulatory mechanisms through its γ→μ→D→ν→γ loop
4. **Component Roles:** C consistently acts as source, D as target, while Greek letters represent process-specific mediators
The absence of quantitative data prevents statistical analysis, but the structural patterns suggest these diagrams model either:
- Biological pathways (e.g., metabolic processes)
- Technical systems (e.g., data flow architectures)
- Socio-technical interactions (e.g., organizational communication flows)
</details>
(2 , 1 , b ) Id from the 3 -page table appearing later in this paper, where A and B play the role of γ and η .
If we take γ and η to be related to τ by τ = ( γ mod 3) ⊕ 3 ( η mod 3) , so that the values (0 , 0) , (0 , 1) and (1 , 1) of ( γ, η ) map respectively to the values 0 , 1 and 2 of τ , then any distribution over τ can be emulated by some distribution over the values (0 , 0) , (0 , 1) and (1 , 1) of ( γ, η ) and hence some distribution over µ and ν . Finally, we can express C and D explicitly in terms of µ and ν by eliminating γ and η , obtaining the causal model depicted in Fig. 5(c) with dependences C = νµ ⊕ 2 ν and D = ν . By construction, we must obtain precisely the same semi-algebraic set for C and D in the model of Fig. 5(c) as one does in the model of Fig. 5(a). We have therefore defined a 2 -latent-bit model that simulates our latent trit model.
The key ingredient of the above example was that we were able find a causal model which could-by appropriately varying over the distribution of its latent variablesgenerate any distribution over a given face of the tetrahedron, and hence any distribution on a trit. In the case of an m -valued latent variable however, one would need to find a k -latent-bit model which could generate any distribution on an m -simplex. We provide an inductive procedure for constructing such a latent-bit model in appendix C.
Theorem 4.2.1 implies that for the project of determining the observational equivalence classes, it suffices to consider models with binary latents. and so we restrict our attention to these henceforth.
## 4.3 Inductive scheme
Next, we define a scheme for composing pairs of n -latentbit causal models into a single ( n + 1) -latent bit causal model, such that if we start with all possible pairs of n -latent-bit causal models, and apply the composition oper- ation, we generate all possible ( n + 1) -latent-bit causal models.
Denote the n latent binary variables by λ ≡ ( λ 1 , . . . , λ n ) . A general n -latent-bit causal model is then defined by the functional dependences
<!-- formula-not-decoded -->
where λ α is shorthand for the monomial λ α 1 1 . . . λ α n n for some set of exponents α ≡ ( α 1 , . . . α n ) , and a α , b α ∈ Z 2 are parameters that specify the nature of the functional dependences.
We assume that the first causal model is defined by parameters { a (0) α } and { b (0) α } , and the second is defined by parameters { a (1) α } and { b (1) α } . The additional binary latent variable, which supplements the n binary variables of the original two models is denoted δ . The ( n + 1) -latent-bit model which is the composition of the two models is defined by the functional dependences
<!-- formula-not-decoded -->
This construction has been chosen such that δ acts as a switch variable: if we set δ = 0 in the resulting ( n + 1) -latent-bit model, we recover the first n -latent-bit model, while if we set δ = 1 , we recover the second n -latent-bit model.
With these definitions, our composition result can be summarized as follows.
Theorem 4.3.1. Consider the map that takes a pair of n -latent-bit causal models defined by the functional dependences of Eq. (4.3.1) with parameters { a (0) α } ∪ { b (0) α } for the first model, and parameters { a (1) α }∪{ b (1) α } for the second model, and returns the ( n +1) -latent-bit causal model defined by the functional dependences of Eq. (4.3.2). Under this map, the image of the set of all pairs of n -latentbit causal models is the set of all ( n +1) -latent-bit causal models.
Proof. The functional dependences of Eq. (4.3.2) can equivalently be expressed as polynomials in λ and δ as
<!-- formula-not-decoded -->
It now suffices to note that as one varies over all possible joint values for the variables in the set { a (0) α } ∪ { a (1) α } (there are 2 2 n +1 possibilities), one necessarily varies over all possible joint values for the variables in the set { a (0) α }∪ { a (0) α ⊕ a (1) α } , which in turn implies that one is varying over all possible polynomials in λ 1 , . . . , λ n and δ in the expresson for A . By a similar argument, as one varies over all possible joint values for the variables in the set { b (0) α }∪{ b (1) α } , one varies over all possible polynomials in λ 1 , . . . , λ n and δ in the expression for B . It follows that as one varies over all possible joint values for the variables in the set { a (0) α } ∪ { a (1) α } ∪ { b (0) α } ∪ { b (1) α } , one obtains all possible manners in which A and B might be functionally dependent on the latent variables in the ( n +1) -latent-bit causal model. Thus as one varies over all possible pairs of n -latent-bit causal models in our switch-variable construction, one varies over all possible ( n +1) -latent-bit causal models.
We can therefore generate all causal models with binary latents by this inductive rule starting from the 0 -latent-bit causal models.
## 4.4 Catalogue of observational equivalence classes
Recall that two causal models are observationally equivalent if they define the same semi-algebraic set. Thus, to characterize the observational equivalence classes, we proceed as follows. For each new causal model that we generate by the inductive scheme, we determine the corresponding semi-algebraic set. Every time one obtains a variety that has not appeared previously, one adds it to the catalogue of observational equivalence classes.
Note that if a causal model has been obtained from two simpler models via our composition scheme, then the semi-algebraic set associated to it necessarily includes as subsets both of the semi-algebraic sets of the simpler models (note that this semi-algebraic set is generally not the convex hull of the semi-algebraic sets of the two simpler models). It follows that if the semi-algebraic set of a given causal model is found to be the entire tetrahedron, then composing this model with any other will also yield the tetrahedron. In this case, there are no new observational equivalence classes to be found among the descendants of this causal model in the inductive scheme.
In particular, if it were to occur that at some level of the inductive scheme, every newly generated causal model could be shown either to reduce to a previously generated causal model or to yield a semi-algebraic set that is the entire tetrahedron, then one could conclude that one's catalogue of the observational equivalence classes of causal models was complete in the sense that any n -latent bit causal model belongs to one of these classes.
We have used our inductive scheme to construct all observational equivalence classes generated by causal models with four or fewer binary latent variables. We have also considered a large number of causal models with five binary latent variables and found no new observational equivalence classes. This suggests that our catalogue may already be complete, although we do not have a proof of this. Above, we noted circumstances in which our inductive scheme would terminate, which provides one strategy for attempting to settle the question. Even in the absence of a proof of completeness, the inductive scheme presented here for classifying observational equivalence classes may be of independent interest to researchers in the field.
The observational equivalence classes of causal models that we have obtained (which cover all causal models with four or fewer binary latent variables) are presented in the table covering the next three pages. For each class, we depict the semi-algebraic set that defines the class, the feasibility test for the class, and a representative causal model from the class. Note that the open-interval constraints 0 < p 00 , p 01 , p 10 , p 11 < 1 are part of every feasibility test unless explicitly stated otherwise. The corresponding constraint on the affine varieties is that those varieties confined to the edges exclude the vertices, those confined to the faces exclude the edges, and those in the bulk exclude the faces.
The task of describing the catalogue is simplified by the fact that many of the observational equivalence classes are related to one another by simple symmetries. We therefore organize the classes into orbits, where an orbit is a set of classes whose elements are related to one another by a set of symmetry transformations. For one of the classes in the orbit (which we term the 'fiducial' class), we provide a full description, and below this description, we specify the set of symmetry transformations that must be applied to it to obtain the other elements of the orbit. Formally, this is a set of representatives of the right cosets of the subgroup of symmetries of the semi-algebraic set in the full symmetry group of the tetrahedron.
We express these representatives as compositions of the following set of symmetry transformations, which we define below: { Id , f A , f B , S, X } . For each of the five, we specify both their action on the causal model, i.e., their action on the functional dependences, from which their action on the DAG can be inferred, and on the elements of the joint distribution { p ab : a, b ∈ Z 2 } , from which their action on the feasibility test can be inferred. Each symmetry transformation also defines an action on the tetrahedron in an obvious manner. Id is the identity transformation, leaving the model and p ab invariant; f A is the bit flip on A , replacing the functional dependence A = f ( λ ) with A = f ( λ ) ⊕ 1 and mapping p ab → p a ⊕ 1 ,b ; f B is the bit flip on B , defined analogously to f A ; S is the swap transformation, replacing the functional dependences A = f ( λ ) , B = g ( λ ) with A = g ( λ ) , B = f ( λ ) , and mapping p ab → p ba ; X is the 'add B to A ' transformation, replacing the functional dependences A = f ( λ ) , B = g ( λ ) with A = f ( λ ) ⊕ g ( λ ) , B = g ( λ ) and mapping p ab → p a ⊕ b,b . We denote a composition of two symmetry transformations by a right-to-left product: for instance, a bit flip on A followed by a swap is denoted Sf A . The conjunction of a bit flip on A and a bit flip on B yields the same transformation regardless of the order in which they are implemented and is denoted f AB .
Finally, a given observational equivalence class will be distinguished by a label of the form ( n, m, x ) g . Here, n is the number of binary latent variables in the causal model, m is the number of these that act as common causes, x is an optional label that is used for distinguishing functional dependences that are consistent with a given ( n, m ) but are observationally inequivalent, and g labels the symmetry transformation that relates the class to the fiducial class
<details>
<summary>Image 6 Details</summary>

### Visual Description
## Table: Semi-Algebraic Sets, Feasibility Conditions, and Causal Models
### Overview
The image presents a structured table analyzing probabilistic and causal relationships across multiple classes labeled as (n,m)Id, where n and m are integers. Each class includes:
1. A 3D semi-algebraic set diagram (left column)
2. Feasibility test conditions (middle column)
3. Minimal causal models (right column)
4. Transformations between classes (horizontal dividers)
### Components/Axes
- **Classes**: Labeled as (n,m)Id (e.g., (0,0)Id, (1,0)Id, (2,2)Id)
- **Semi-Algebraic Sets**: 3D diagrams with nodes [00], [01], [10], [11] connected by edges
- **Feasibility Tests**: Probabilistic constraints (e.g., p₀₀=1, p₀₁=p₁₀=p₁₁=0)
- **Causal Models**: Directed graphs with variables A and B, sometimes involving Greek letters (μ, λ, ν)
- **Transformations**: Function mappings like G(n,m) = {Id, f_A, f_B, f_AB, ...}
### Detailed Analysis
#### Class (0,0)Id
- **Semi-Algebraic Set**: Tetrahedral diagram with nodes [00], [01], [10], [11]. Edges connect all nodes except [01]-[10].
- **Feasibility**: p₀₀=1; p₀₁=p₁₀=p₁₁=0
- **Causal Model**: A=0, B=0 (no causal influence)
#### Class (1,0)Id
- **Semi-Algebraic Set**: Tetrahedral diagram with highlighted edge [00]-[10]. Nodes [00] and [10] emphasized.
- **Feasibility**: p₀₀=p₀₁=0
- **Causal Model**: A=1, B=λ (λ represents a parameter)
#### Class (1,1)Id
- **Semi-Algebraic Set**: Tetrahedral diagram with highlighted edge [00]-[10]. Nodes [00] and [10] emphasized.
- **Feasibility**: p₀₀=p₀₁=0
- **Causal Model**: A=μ, B=μ (μ represents a shared parameter)
#### Class (2,0)Id
- **Semi-Algebraic Set**: Colored tetrahedral diagram with gradient shading (blue to green to purple). Nodes [00], [01], [10], [11] connected.
- **Feasibility**: p₀₀p₁₁ = p₀₁p₁₀
- **Causal Model**: A=λ, B=ν (λ and ν represent independent parameters)
#### Class (2,1)aId
- **Semi-Algebraic Set**: Tetrahedral diagram with gradient shading (blue to orange). Nodes [00], [01], [10], [11] connected.
- **Feasibility**: p₀₀p₀₁ = p₁₁p₁₀
- **Causal Model**: A=μ⊕ν, B=μ (μ and ν represent combined parameters)
#### Class (2,1)bId
- **Semi-Algebraic Set**: Tetrahedral diagram with gradient shading (red to white). Nodes [00], [01], [10], [11] connected.
- **Feasibility**: p₁₀=0
- **Causal Model**: A=νμ, B=μ (ν and μ represent dependent parameters)
#### Class (2,2)Id
- **Semi-Algebraic Set**: Tetrahedral diagram with gradient shading (blue to purple). Nodes [00], [01], [10], [11] connected.
- **Feasibility**: (p₀₁ + 2p₁₁ - 2)² ≥ 4p₀₀; p₁₀=0
- **Causal Model**: A=μ⊕ν, B=μ⊕ν⊕μν (complex parameter interactions)
### Transformations
- **G(0,0)**: {Id, f_A, f_B, f_AB}
- **G(1,1)**: {Id, f_A}
- **G(2,0)**: {Id}
- **G(2,1,a)**: {Id, S}
- **G(2,1,b)**: {Id, f_A, f_B, S}
- **G(2,2)**: {Id, S, f_A, f_B, f_AB, f_A SX, f_B SX, f_A S, f_B S, f_XS, f_AB S, f_X, f_AB XS}
### Key Observations
1. **Probabilistic Constraints**: Feasibility conditions enforce specific relationships between joint probabilities (e.g., p₀₀p₁₁ = p₀₁p₁₀ in (2,0)Id).
2. **Causal Hierarchy**: Higher-index classes (e.g., (2,2)Id) exhibit increasingly complex causal dependencies involving parameter combinations (⊕, ⊗).
3. **Transformation Complexity**: Later transformations (e.g., G(2,2)) include more functions, suggesting expanded operational capabilities.
4. **Color Coding**: Gradient shading in semi-algebraic sets correlates with parameter ranges (e.g., darker regions in (2,1)bId indicate higher p₁₁ values).
### Interpretation
This table appears to model a probabilistic system where:
- **Semi-algebraic sets** represent feasible state spaces under constraints
- **Feasibility tests** define valid probability distributions
- **Causal models** formalize dependencies between variables A and B
- **Transformations** encode operations that modify the system's state
The progression from simple (0,0)Id to complex (2,2)Id classes suggests a framework for analyzing increasingly sophisticated probabilistic-causal systems. The use of Greek letters (μ, λ, ν) implies parameterized relationships, while the transformation functions (f_A, f_AB, etc.) likely represent interventions or measurements affecting the system's dynamics.
The constraints (e.g., p₀₀p₁₁ = p₀₁p₁₀) resemble conditional independence conditions, while the causal models' structure hints at a Bayesian network interpretation. The final class (2,2)Id's complex feasibility condition and causal model suggest this framework can model systems with feedback loops or higher-order interactions.
</details>
| Class | Semi-algebraic set | Test for feasibility 0 < p 00 , p 01 , p 10 , p 11 < 1 unless stated otherwise | Minimal causal model |
|--------------------------------------------------------|--------------------------------------------------------|----------------------------------------------------------------------------------|--------------------------------------------------------|
| (0 , 0) Id | p 00 = 1 p 01 = p 10 = p 11 = 0 | | A = 0 , B = 0 |
| Transformations: G (0 , 0) = { Id , f A , f B , f AB } | Transformations: G (0 , 0) = { Id , f A , f B , f AB } | Transformations: G (0 , 0) = { Id , f A , f B , f AB } | Transformations: G (0 , 0) = { Id , f A , f B , f AB } |
| (1 , 0) Id p 00 = p 01 = | 0 | A = 1 , B = λ | G (1 , 0) = { Id , f A , S, f B S } |
| p 10 = p 01 | p 10 = p 01 | p 10 = p 01 | p 10 = p 01 |
| (1 , 1) Id = 0 | | A = µ, B = µ G (1 , 1) = { Id , f A } | (2 , 0) Id |
| 00 11 G (2 | p p = p 01 p 10 , | A = λ, B = ν = { Id } | 0) (2 , 1 ,a ) Id |
| p 00 p 01 = p | p 00 p 01 = p | p 00 p 01 = p | p 00 p 01 = p |
| 11 p 10 G (2 , 1 | = µ ⊕ ν, B = µ ,a | A ) = { Id , S } (2 , 1 , b ) Id | p 10 |
| = 0 | = 0 | = 0 | = 0 |
| A = νµ, B = µ G (2 , 1 , b ) | = (2 , 2) Id | { Id , f A , f B , S } ( p 01 +2 | p 11 - 2) 2 ≥ 4 p 00 , p 10 = 0 |
| A = µν, B = µ ⊕ ν ⊕ µν AB S , fXS , f B XS } | A = µν, B = µ ⊕ ν ⊕ µν AB S , fXS , f B XS } | A = µν, B = µ ⊕ ν ⊕ µν AB S , fXS , f B XS } | A = µν, B = µ ⊕ ν ⊕ µν AB S , fXS , f B XS } |
Class
(3 , 1 , a ) Id
(3 , 1 , b ) Id
(3 , 1 , c ) Id
(3 , 2 , a ) Id
(3 , 2 , b ) Id
(3 , 2 , c ) Id
[00]
[11]
Semi-algebraic set
[01]
<details>
<summary>Image 7 Details</summary>

### Visual Description
## 3D Pyramid Diagram: Labeled Geometric Structure
### Overview
The image depicts a 3D pyramid with a grid pattern, divided into two distinct sections: a **blue-shaded left face** and a **white-shaded right face** with a gradient transition from dark blue to light blue. The pyramid has three visible triangular faces, with labels [A], [10], and [11] positioned at specific vertices. The grid lines are black, and the diagram uses a wireframe-style representation.
### Components/Axes
- **Labels**:
- **[A]**: Positioned at the apex (top vertex) of the pyramid.
- **[10]**: Located at the bottom-right vertex of the base.
- **[11]**: Positioned at the bottom-left vertex of the base.
- **Color Coding**:
- **Blue**: Left triangular face (shaded uniformly).
- **White**: Right triangular face (with a gradient from dark blue to light blue).
- **Grid Lines**: Black lines forming a grid pattern across the pyramid’s surfaces, suggesting a coordinate or measurement system.
### Detailed Analysis
- **Label Placement**:
- **[A]** is centered at the top vertex, directly above the base.
- **[10]** and **[11]** are placed at the base vertices, with **[10]** on the right and **[11]** on the left.
- **Gradient Transition**: The right face (white) shows a smooth gradient from dark blue (near the apex) to light blue (near the base), indicating a potential variable (e.g., temperature, pressure, or material property) changing along the face.
- **Grid Pattern**: The grid lines are evenly spaced, suggesting a structured coordinate system (e.g., x, y, z axes) for spatial analysis.
### Key Observations
1. The pyramid is divided into two distinct regions (blue and white), possibly representing different materials, states, or zones.
2. The gradient on the right face implies a continuous variation in a property, though no numerical scale is provided.
3. Labels [A], [10], and [11] are positioned at critical vertices, likely denoting specific points of interest (e.g., nodes, reference points).
### Interpretation
The diagram appears to represent a **3D geometric model** with labeled vertices and a gradient-based property distribution. The blue and white sections may symbolize contrasting conditions (e.g., solid vs. fluid, high vs. low values). The grid lines suggest the pyramid is part of a larger coordinate system, potentially for engineering, architectural, or scientific analysis. The absence of numerical data or explicit units leaves the gradient’s exact meaning ambiguous, but its visual progression implies a measurable variable. The labels [A], [10], and [11] could correspond to nodes in a network, reference points in a simulation, or identifiers for further analysis.
</details>
<details>
<summary>Image 8 Details</summary>

### Visual Description
## 3D Pyramid Diagram: Geometric Structure with Color-Coded Edges
### Overview
The image depicts a 3D pyramid with four triangular faces, labeled vertices, and color-coded edges. The pyramid is rendered with a grid overlay, and a color gradient transitions from blue to red across the structure. No explicit numerical data or legends are present, but the labels and color coding suggest a focus on geometric relationships or parameterized properties.
### Components/Axes
- **Vertices**:
- **[00]**: Bottom-left corner (blue-green edge).
- **[01]**: Top vertex (blue edge).
- **[10]**: Bottom-right corner (red edge).
- **[11]**: Bottom-back corner (purple edge).
- **Edges**:
- **[00]–[01]**: Blue (left face).
- **[01]–[10]**: Green (right face).
- **[10]–[11]**: Red (back face).
- **[11]–[00]**: Purple (base face).
- **Grid**: Black grid lines subdivide the faces, suggesting a coordinate system or measurement framework.
- **Color Gradient**: Smooth transition from blue (cool) to red (warm) across the pyramid, possibly indicating a scalar field (e.g., stress, temperature).
### Detailed Analysis
- **Vertex Labels**:
- **[00]**, **[01]**, **[10]**, and **[11]** are positioned at the pyramid’s corners. The labels use binary-like notation, potentially representing coordinates or states.
- **Edge Colors**:
- Blue ([00]–[01]) and green ([01]–[10]) dominate the front faces, while red ([10]–[11]) and purple ([11]–[00]) define the back and base. No legend explains the color coding, but the gradient implies a continuous variable.
- **Grid Overlay**:
- The grid is denser near the apex ([01]) and sparser at the base, possibly reflecting perspective distortion or a focus on upper regions.
### Key Observations
1. **Symmetry**: The pyramid is geometrically regular, with equal angles at the apex.
2. **Color Gradient**: The blue-to-red transition suggests a parameterized property (e.g., stress magnitude), but no explicit scale or units are provided.
3. **Missing Legends**: No explicit explanation of edge colors or gradient meaning is present.
4. **Perspective**: The pyramid is viewed from a slightly elevated angle, emphasizing the front and right faces.
### Interpretation
This diagram likely represents a conceptual model for a geometric or physical system. The color gradient and edge labels may encode relationships between vertices (e.g., forces, connectivity, or state transitions). However, the absence of numerical data or legends limits quantitative analysis. The structure could relate to:
- **Mathematical frameworks**: Binary states ([00], [01], etc.) might represent binary variables or coordinates in a 2D/3D space.
- **Engineering applications**: The pyramid could model stress distribution, with colors indicating magnitude (blue = low, red = high).
- **Network topology**: Vertices as nodes and edges as connections, with colors denoting link types or capacities.
**Critical Uncertainties**:
- The exact meaning of the color gradient and edge labels remains ambiguous without additional context.
- No numerical values or scales are provided, making precise interpretation speculative.
**Note**: The image contains no textual data beyond vertex labels and color coding. All interpretations are based on visual structure and common conventions in technical diagrams.
</details>
<details>
<summary>Image 9 Details</summary>

### Visual Description
## 3D Pyramid Diagram: Binary State Representation
### Overview
The image depicts a 3D pyramid-like structure with three triangular faces converging at a central apex. The pyramid is divided into a grid pattern with color gradients transitioning from blue (left) to yellow (right). Labels are positioned at the vertices of the pyramid, indicating binary or ternary states.
### Components/Axes
- **Vertices**:
- Apex: `[01]` (top center)
- Base vertices: `[00]` (bottom left), `[10]` (bottom right)
- **Grid Lines**: Black grid lines divide each triangular face into smaller sections, suggesting subdivisions or transitions.
- **Color Gradient**: Blue-to-yellow gradient across the pyramid, possibly indicating a variable (e.g., intensity, probability, or state transition).
- **No legend, axis titles, or numerical scales are present.**
### Detailed Analysis
- **Labels**:
- `[01]` (apex): Likely represents a binary state where the second bit is active.
- `[00]` (bottom left): Binary state with both bits inactive.
- `[10]` (bottom right): Binary state where the first bit is active.
- **Grid Pattern**: The grid lines may represent intermediate states or transitions between the labeled vertices. However, no explicit data points or numerical values are provided.
- **Color Gradient**: The blue-to-yellow gradient could imply a scalar value (e.g., probability, energy, or confidence) increasing from left to right, but this is speculative without a legend.
### Key Observations
1. **Symmetry**: The pyramid is symmetrical, with the apex at the top and two base vertices.
2. **Binary/Ternary States**: The labels `[00]`, `[01]`, and `[10]` suggest a binary or ternary system, possibly representing states in a computational or logical model.
3. **Lack of Explicit Data**: No numerical values, legends, or axis labels are present, limiting quantitative analysis.
### Interpretation
The diagram likely represents a **state transition model** or **binary/ternary system** with three primary states (`[00]`, `[01]`, `[10]`). The apex (`[01]`) may act as a central or transitional state, while the base vertices represent distinct states. The grid lines and color gradient could symbolize intermediate transitions or probabilistic relationships between states. However, the absence of a legend or numerical data prevents definitive interpretation of the color gradient or grid subdivisions.
**Note**: The image does not contain explicit facts, numerical data, or a legend. The analysis is based on structural and symbolic elements.
</details>
[11]
<details>
<summary>Image 10 Details</summary>

### Visual Description
## 3D Pyramid Diagram: Binary Coordinate Hierarchy
### Overview
The image depicts a 3D pyramid (tetrahedron) with four vertices labeled using two-bit binary coordinates: [00], [01], [10], and [11]. The structure is divided into two color-coded regions: a red-shaded left half and a green-shaded right half, with a grid pattern overlaying the right side. The apex is labeled [01], while the base vertices are [00], [10], and [11].
### Components/Axes
- **Vertices**:
- [00] (bottom-left base)
- [01] (apex)
- [10] (bottom-right base)
- [11] (top-right base)
- **Color Coding**:
- Red: Left half of the pyramid (vertices [00], [01], [10])
- Green: Right half of the pyramid (vertices [01], [10], [11])
- **Grid Pattern**: Overlays the green-shaded right half, suggesting a coordinate or data matrix.
- **Edges**: Connect all vertices, forming a tetrahedral structure.
### Detailed Analysis
- **Binary Labels**:
- [00]: Binary "00" (0 in decimal)
- [01]: Binary "01" (1 in decimal)
- [10]: Binary "10" (2 in decimal)
- [11]: Binary "11" (3 in decimal)
- **Color Distribution**:
- Red dominates the left half, encompassing three vertices ([00], [01], [10]).
- Green dominates the right half, encompassing three vertices ([01], [10], [11]).
- **Grid Pattern**: The green-shaded region features a grid, implying a structured data layout (e.g., a 2x2 matrix or hierarchical subdivision).
### Key Observations
1. **Hierarchical Structure**: The pyramid’s apex ([01]) connects to all base vertices, suggesting a central node or root in a hierarchical system.
2. **Binary Coordinate System**: Labels indicate a 2-bit binary framework, potentially representing states, categories, or decision paths.
3. **Color Division**: The red-green split may denote opposing groups, data partitions, or operational states (e.g., active/inactive).
4. **Grid in Green Region**: The grid’s presence implies granular data organization within the right half, possibly a matrix or network.
### Interpretation
- **Technical Context**: The diagram likely represents a binary decision tree, hierarchical clustering, or a 2D coordinate system mapped to a 3D structure. The apex ([01]) could symbolize a root node or primary state, while the base vertices ([00], [10], [11]) represent subordinate or terminal states.
- **Color Significance**: The red-green division might indicate a binary classification (e.g., true/false, active/inactive) or a spatial partition in a dataset.
- **Grid Function**: The grid in the green region suggests a structured relationship between the binary labels, such as a lookup table or a network of dependencies.
- **Anomalies**: The apex ([01]) is shared between red and green regions, creating ambiguity in its classification. This could imply a transitional or neutral state.
This diagram emphasizes a binary framework with hierarchical and spatial relationships, potentially useful for modeling decision trees, data partitioning, or coordinate-based systems.
</details>
<details>
<summary>Image 11 Details</summary>

### Visual Description
## 3D Pyramid Diagram: Hierarchical Structure Representation
### Overview
The image depicts a 3D pyramid diagram with labeled vertices, grid lines, and color-coded sections. The pyramid is divided into three distinct triangular faces, each shaded in red, blue, and green. The diagram includes a legend in the top-left corner and labeled vertices at the base and apex.
### Components/Axes
- **Vertices**:
- Base vertices: `[00]` (bottom-left), `[10]` (bottom-right), `[11]` (bottom-center).
- Apex vertex: `[01]` (top).
- **Grid Lines**:
- Red grid lines: Connect `[00]` to `[01]` (left face).
- Blue grid lines: Connect `[10]` to `[01]` (right face).
- Green grid lines: Connect `[00]` to `[10]` (base face).
- **Legend**:
- Located in the **top-left corner**, associating colors with axes:
- Red: `[00]` → `[01]` (left axis).
- Blue: `[10]` → `[01]` (right axis).
- Green: `[00]` → `[10]` (base axis).
### Detailed Analysis
- **Color-Coded Sections**:
- The pyramid is divided into three triangular regions:
1. **Red section**: Left face (`[00]` → `[01]`).
2. **Blue section**: Right face (`[10]` → `[01]`).
3. **Green section**: Base face (`[00]` → `[10]`).
- Grid lines within each section form a mesh pattern, suggesting a coordinate system or hierarchical relationships.
- **Vertex Labels**:
- Labels are positioned at the corners of the pyramid, with `[00]` and `[10]` at the base and `[01]` at the apex.
- The label `[11]` is placed at the bottom-center, possibly indicating a midpoint or additional node.
### Key Observations
1. **Symmetry**: The pyramid is geometrically symmetric, with equal angles at the base vertices.
2. **Color Coding**: The red, blue, and green sections likely represent distinct dimensions or categories (e.g., X, Y, Z axes).
3. **Grid Lines**: The mesh patterns imply a structured relationship between vertices, possibly for data mapping or visualization.
4. **Legend Placement**: The legend is positioned outside the pyramid, ensuring clarity without obstructing the diagram.
### Interpretation
This diagram likely represents a **3D coordinate system** or **hierarchical model** where:
- The vertices `[00]`, `[10]`, and `[01]` define the primary axes (e.g., X, Y, Z).
- The color-coded sections differentiate the axes, aiding in spatial orientation.
- The grid lines suggest a framework for mapping data points or relationships within the pyramid.
- The label `[11]` at the base-center may indicate a secondary node or intersection point, though its role is unclear without additional context.
The absence of numerical data or explicit annotations limits quantitative analysis. However, the structure emphasizes **spatial relationships** and **categorical differentiation** through color and labeling.
</details>
>
>
Test for feasibility p 00 p 11 > p 01 p 10
## G (3 , 1 , a ) = { Id , f A }
( p 01 -p 11 )( p 00 p 11 -p 01 p 10 ) < 0 ( p 01 -p 11 )( p 10 p 11 -p 01 p 00 ) < 0
## G (3 , 1 , b ) = { Id , f AB }
p
p
p
-
>
p
p
+
p
p
p
p
-
+
p
p
-
p
-
p
p
-
p
p
p
+
p
-
## G (3 , 1 , c ) = { Id }
p 10 p 11 > p 01 p 00
## G (3 , 2 , a ) = { Id, Sf A , f A , f A XS }
( p 01 -p 10 )( p 11 p 10 -p 01 p 00 ) < 0 ( p 01 -p 10 )( p 00 p 10 -p 01 p 11 ) < 0
## G (3 , 2 , b ) = { Id, f AB XS, f B XS, f A }
<details>
<summary>Image 12 Details</summary>

### Visual Description
## 3D Pyramid Diagram: Hierarchical Categorization Model
### Overview
The image depicts a 3D pyramid structure with four triangular faces, labeled vertices, and color-coded regions. The pyramid is divided into three distinct color zones (blue, purple, yellow) with varying shading intensities, suggesting a hierarchical or categorical relationship between the regions. The base vertex is labeled [00], while the apex is [01], and the base edges are labeled [10] and [11].
### Components/Axes
- **Vertices**:
- Apex: [01] (top)
- Base vertices: [00] (bottom center), [10] (bottom left), [11] (bottom right)
- **Faces**:
- Four triangular faces connecting the vertices.
- **Color Legend** (right side):
- **Blue**: Largest region, dominant shading.
- **Purple**: Medium region, moderate shading.
- **Yellow**: Smallest region, minimal shading.
- **Grid Lines**: Black grid overlaying the pyramid for spatial reference.
### Detailed Analysis
- **Color Distribution**:
- **Blue** occupies the majority of the pyramid, concentrated near the apex [01] and extending toward the base edge [10].
- **Purple** is confined to the lower-left triangular face, adjacent to [10] and [00].
- **Yellow** is limited to the lower-right triangular face, near [11] and [00].
- **Shading Intensity**:
- Blue exhibits the darkest shading, transitioning to lighter tones toward the edges.
- Purple and yellow regions have progressively lighter shading, with yellow being the lightest.
- **Spatial Relationships**:
- The blue region dominates the upper and central portions of the pyramid.
- Purple and yellow regions are isolated to the base edges, with no overlap.
### Key Observations
1. **Hierarchical Dominance**: The blue region’s size and shading suggest it represents the primary category or highest priority.
2. **Categorical Segmentation**: Purple and yellow regions are distinct and non-overlapping, indicating mutually exclusive categories.
3. **Shading Ambiguity**: The gradient shading lacks a numerical scale, making quantitative interpretation speculative. It may represent depth, intensity, or another qualitative variable.
4. **Vertex Labeling**: The labels [00], [01], [10], [11] imply a binary or binary-derived coordinate system (e.g., binary states or binary decisions).
### Interpretation
The diagram likely represents a **hierarchical categorization model** or **decision tree**, where:
- The **blue region** signifies the dominant or most probable outcome (e.g., a primary class in classification).
- The **purple and yellow regions** represent secondary or conditional outcomes, possibly dependent on specific inputs (e.g., [10] and [11] as binary inputs).
- The **shading intensity** could metaphorically indicate confidence, priority, or resource allocation, though this is not explicitly defined.
- The **binary vertex labels** ([00], [01], etc.) suggest a connection to binary logic, computational models, or Boolean systems.
### Limitations
- No numerical data or explicit legend for shading intensity is provided, limiting quantitative analysis.
- The diagram’s purpose (e.g., decision-making, classification, resource distribution) is inferred rather than explicitly stated.
This structure emphasizes categorical relationships over quantitative metrics, prioritizing visual hierarchy and spatial segmentation.
</details>
A
Minimal causal model
V
μ
入
A
B
A = µν, B = µλ
μ
入
W
人
A
B
A = µ ⊕ ν, B = µλ
V
μ
入
W
A
B
A = µ ⊕ ν, B = µ ⊕ λ
入
μ
A
B
## A = µν ⊕ µλ ⊕ 1 , B = µν ⊕ 1
入
μ
A
B
V
A = µ ⊕ ν ⊕ µλ ⊕ 1 , B = µ ⊕ ν ⊕ 1
μ
A
=
µν,
## G (3 , 2 , c ) = { Id , S, f A , f AB SX, f B XS, XS, f A XS, f A X, f AB X, f A SXS, X }
B
⊕
ν
| 4( p 10 -p 11 )( p 00 p 10 -p 01 p 11 ) | ≤ ( p 11 (2 p 01 +2 p 10 + p 00 ) -p 10 (2 p 00 +2 p 11 + p 01 ) ) 2
,
,
.
V
B
=
µ
⊕
δ
<details>
<summary>Image 13 Details</summary>

### Visual Description
## Table: Classification of Semi-Algebraic Sets, Feasibility Tests, and Minimal Causal Models
### Overview
The image presents a structured table with multiple rows, each corresponding to a specific class (e.g., (3,2,d)Id, (3,2,e)Id, etc.). Each row contains three components:
1. A **3D semi-algebraic set** (visualized as a triangular prism or pyramid with labeled axes).
2. A **test for feasibility** (mathematical inequalities involving variables like $ p_{00}, p_{01}, p_{10}, p_{11} $).
3. A **minimal causal model** (a directed graph with nodes $ A $ and $ B $, edges labeled with Greek letters like $ \mu, \delta, \nu $).
The table is divided into sections with headers like $ G(3,2,d) = \{ \text{Id}, f_B, f_A, f_{AB}, f_{AB}SX, f_BXS, f_{AX}S, f_{AB}SX, f_{AX}SX, f_{B}SX, f_{A}SX, f_{AB}SX \} $, indicating the semi-algebraic sets for each class.
---
### Components/Axes
#### 1. **Semi-Algebraic Set (3D Plot)**
- **Axes Labels**:
- X-axis: [00], [01], [10], [11] (likely representing binary or categorical indices).
- Y-axis: [00], [01], [10], [11] (same as X-axis).
- Z-axis: Not explicitly labeled but implied by the 3D structure.
- **Visual Features**:
- Triangular prisms/pyramids with color gradients (e.g., blue, red, yellow).
- Grid lines and shading to indicate regions.
#### 2. **Test for Feasibility**
- **Equations**:
- Example for (3,2,d)Id:
$$
(p_{01} + 2p_{10} + 2p_{11} - 2)^2 \geq 4p_{00}
$$
- Example for (3,2,e)Id:
$$
4(p_{10} - p_{11})(p_{00}p_{10} - p_{01}p_{11}) \leq (p_{11}(2p_{01} + p_{10}) - p_{10}(2p_{00} + p_{11}))^2
$$
- Variables: $ p_{00}, p_{01}, p_{10}, p_{11} $ (probabilities or parameters).
#### 3. **Minimal Causal Model**
- **Graph Structure**:
- Nodes: $ A $ and $ B $.
- Edges:
- $ A \rightarrow B $: Labeled $ \mu, \delta, \nu $.
- $ B \rightarrow A $: Labeled $ \mu, \nu, \delta $.
- Additional constraints:
- $ A = \mu \oplus \nu $, $ B = \mu \nu \oplus \delta $ (for (3,2,d)Id).
---
### Detailed Analysis
#### 1. **Semi-Algebraic Sets**
- **Class (3,2,d)Id**:
- 3D plot shows a triangular prism with axes [00], [01], [10], [11].
- Color gradients suggest varying values across the set.
- **Class (3,2,e)Id**:
- Similar 3D structure but with different shading (e.g., blue, orange, purple).
- **Class (3,2,f)Id**:
- 3D plot with a more complex shape (e.g., a pyramid with a flat base).
#### 2. **Feasibility Tests**
- **Key Inequalities**:
- For (3,2,d)Id:
$$
(p_{01} + 2p_{10} + 2p_{11} - 2)^2 \geq 4p_{00}
$$
- This implies a quadratic constraint on the parameters $ p_{00}, p_{01}, p_{10}, p_{11} $.
- For (3,2,e)Id:
$$
4(p_{10} - p_{11})(p_{00}p_{10} - p_{01}p_{11}) \leq (p_{11}(2p_{01} + p_{10}) - p_{10}(2p_{00} + p_{11}))^2
$$
- A complex inequality involving products and differences of parameters.
#### 3. **Minimal Causal Models**
- **Graphs**:
- All models have two nodes ($ A $, $ B $) with directed edges.
- Edge labels (e.g., $ \mu, \delta, \nu $) likely represent causal relationships or transformations.
- Example for (3,2,d)Id:
$$
A = \mu \oplus \nu, \quad B = \mu \nu \oplus \delta
$$
- Suggests $ A $ is a combination of $ \mu $ and $ \nu $, while $ B $ depends on their product and $ \delta $.
---
### Key Observations
1. **Repetition of Elements**:
- The semi-algebraic sets and causal models repeat across rows, suggesting a systematic classification.
- Example: $ f_{AB}SX $ appears in multiple classes.
2. **Feasibility Constraints**:
- Equations involve quadratic terms and products of parameters, indicating non-linear constraints.
- For (3,2,f)Id:
$$
|4(p_{10} - p_{11})(p_{00}p_{10} - p_{01}p_{11})| \leq (p_{11}(2p_{01} + p_{10}) - p_{10}(2p_{00} + p_{11}))^2
$$
- This inequality likely ensures the semi-algebraic set is bounded or valid.
3. **Causal Model Patterns**:
- The causal models consistently use $ A $ and $ B $ with edge labels $ \mu, \delta, \nu $.
- For (3,2,g)Id:
$$
A = \mu \nu \oplus 1, \quad B = \mu \nu \delta \oplus \nu
$$
- Introduces a constant term (1) in $ A $, suggesting a baseline or fixed component.
---
### Interpretation
- **Purpose**: The table appears to formalize a mathematical framework for classifying semi-algebraic sets, verifying their feasibility via inequalities, and representing their causal dependencies via minimal models.
- **Relationships**:
- The semi-algebraic sets (3D plots) are constrained by the feasibility tests (equations).
- The causal models (graphs) abstract the dependencies between variables (e.g., $ A $ and $ B $) using symbolic operations ($ \oplus, \otimes $).
- **Notable Trends**:
- Increasing complexity in equations and diagrams as the class parameters change (e.g., (3,2,d)Id vs. (4,2,a)Id).
- The use of $ \mu, \delta, \nu $ in causal models suggests a standardized notation for causal relationships.
---
### Final Notes
- **Language**: All text is in English.
- **Missing Elements**: No explicit legend for color coding in 3D plots, but color gradients likely represent parameter magnitudes.
- **Uncertainty**: Some equations and labels are partially obscured, but the visible text is transcribed as-is.
</details>
for the particular n, m and x . The set of possibilities for g for a given n, m, and x is a subgroup of the group closure of { Id, f A , f B , S, X } , which we denote by G ( n, m, x ) . Note that n, m ∈ N , m ≤ n , and we take x ∈ { a, b, c, . . . } .
The first few steps of our iterative procedure for the construction of causal models proceed as follows.
The semi-algebraic sets associated to the four 0 -latent-bit causal models are the four vertices of the tetrahedron, labelled by the deterministic assignments to A and B , that is, as [00] , [10] , [01] and [11] . These correspond to the classes { (0 , 0) g : g ∈ { Id , f A , f B , f AB }} , depicted in the first row of the table (because there is only one observational equivalence class with n = 0 and m = 0 , the label x is not necessary in this case and so it is not excluded from the name of the class).
One finds that by composing these with one another into 1 -latent-bit causal models, one arrives at six new observational equivalence classes. Four of these correspond to models with a single latent bit that acts locally, and their semi-algebraic sets are the four edges of the tetrahedron with endpoints { [00] , [01] } , { [10] , [11] } , { [00] , [10] } , { [01] , [11] } , which we might call the AB -uncorrelated edges ; these correspond to the classes { (1 , 0) g : g ∈ { Id , f A , S, f B S }} , depicted in the second row of the table. Two of these correspond to models with a single latent bit acting as a common cause and their semi-algebraic sets are the [00] -[11] and [01] -[10] edges of the tetrahedron, which we might call the AB -correlated edges , corresponding to the classes { (1 , 1) g : g ∈ { Id , f A }} , depicted in the third row of the table.
Next, one constructs all of the 2 -latent-bit causal models and finds their semi-algebraic sets. This set includes the model of Fig. 2(c), where both latent bits act locally, and whose semi-algebraic set is the fan of Fig. 2(d), which touches each of the AB -uncorrelated edges of the tetrahedron, corresponding to the single class (2 , 0) Id , depicted in the fourth row of the table. This set also includes the models of Fig. 1(a) and Fig. 2(a) where one of the latent bits acts as a common cause and whose semi-algebraic sets are the fans of Fig. 1(b) and Fig. 2(b), which touch the AB -correlated edges of the tetrahedron. They correspond to the pair of classes { (2 , 1 , a ) g : g ∈ { Id , S }} , depicted in the fifth row of the table. There is also a second type of 2 -latent-bit causal model where one latent bit acts as a common cause which yield the semi-algebraic sets corresponding to the four faces of the tetrahedron. These are the four classes { (2 , 1 , b ) g : g ∈ { Id , f A , f B , S }} in the table. When both of the latent variables act as common causes, one obtains semi-algebraic sets that are subsets of a face of the tetrahedron and which have the appearance of the StarFleet insignia from Star Trek, of which there are twelve in total. These are the classes labelled (2 , 2) g in the table. The construction of 3 -latent-bit and 4 -latentbit causal models proceeds similarly and the new observational equivalence classes one thereby obtains are depicted in the rest of the table.
## 5 Identification of parameters in the causal model
Our results also have applications for the identification problem, that is, the problem of determining which parameters in a causal model can be identified or bounded using observational data.
Consider the problem of identifying the probability distributions over the latent variables (our q j parameters) in a causal model associated to a given functional causal structure. From the description of our algorithm, it is clear that the q j parameters are generally identifiable because the Groebner basis provides a means of computing expressions for them in terms of our p i parameters (the observational data). Indeed, we often must compute the explicit expressions for one or more of the q j s in terms of the p i s as an intermediate step on the way to deriving our feasibility tests. Eq. (3.0.3) is an example of such an identification formula.
The other sort of parameter of a causal model that one may wish to identify is the nature of the functional dependences (assuming the model is indeed functional). For the sorts of models we consider, this problem is also solved by our results.
Consider the problem where the causal structure is given, but where there is uncertainty over the nature of the functional dependences thereon. For instance, suppose that it is known that the functional causal structure is either the minimal structure associated to the class (2 , 1 , a ) Id in our table or the one associated to (2 , 1 , b ) Id . Because the semialgebraic sets defining these two classes do not intersect 7 , it is clear that one can settle this decision problem on the basis of the observational data.
As another, more nuanced example, suppose that it is known that the functional causal structure is the minimal structure associated to one of the three classes (3 , 1 , a ) Id , (3 , 1 , b ) Id , and (3 , 1 , c ) Id in our table. Here, one finds that certain points in the space of distributions over the observed variables pass the feasibility test for just one of these functional causal structures, other points pass the test for two of them, while still others pass the test for all three.
More generally, one might know only the causal structure. For instance, the set of possible functional causal structures might be the minimal ones in each of the classes in the set { (2 , 1 , a ) g : g ∈ { Id , S }} ∪ { (2 , 1 , b) g : g ∈ { Id , f A , f B , S }} . The feasibility tests we have derived provide a means of determining, for any given point in the space of distributions over the observed variables, precisely which of these functional causal structures is compatible with that observational data.
7 Recall our convention of demanding the probabilities for latent variable to be bounded away from 0 and 1, so that all of our semi-algebraic sets are confined to the interior of the tetrahedron.
## 6 Discussion
## 6.1 Future directions
The restriction to pairs of binary observed variables is a limitation of our analysis. In future work, we hope to extend our approach to cases where the observed variables have an arbitrary number of values and where the number of observed variables is also arbitrary. While the tools from algebraic geometry employed in this paper provide a procedure for deriving feasibility tests for such functional causal structures in principle, in practice it is unlikely that such procedures will be scalable. Indeed, calculating Groebner bases is an EXPSPACE -complete problem [9]. Nevertheless, it may still be possible to develop new tools for causal inference in these cases using the approach described here.
It also remains an open problem to decide, for any given functional causal structure, which observational equivalence class it belongs to. That is, even if our catalogue of classes is complete, it merely establishes that every functional causal structure falls into one of these classes, but it does not provide a means of deciding, for a given functional causal structure, having an arbitrary number of latent variables and functional dependences, which class it is a member of. Of course, if one supplements a given functional causal structure with distributions over the latent variables, then one obtains a joint distribution over the observed variables and this can be subjected to the feasibility tests for different observational equivalence classes. It is likely, however, that there are better ways of solving the classification problem, for instance, by determining how the functional dependences can be simplified. Solving this classification problem would allow one to find common features of all of the functional causal structures in a given class, for instance, features of the topology of the causal structure.
We have here made the idealization that the uncontrolled statistical data is given as a joint distribution whereas in practice it is a finite sample from this distribution. To contend with this idealization, one should in practice evaluate causal models by considering how well the finite statistical data can be fit to them.
## 6.2 Relevance to quantum foundations
One of the motivations of the current work was the prospect of new insights into the interplay between causal structure and observed correlations in quantum theory. In particular, for a pair of quantum systems-each subjected to one of a set of possible measurements-a Bell inequality [10, 11, 12] is a constraint on the joint probability distribution over the outcomes of each possible choice of the local measurements (that is, for every combination of local measurement settings ). It has recently been noted [13, 14] that one can understand the assumptions required to derive a Bell inequality as the standard assumptions for causal inference together with a particular hypothesis about the underlying causal structure, namely, that each local outcome depends causally on the corresponding local setting and on a latent common cause between the two systems. This causal structure is illustrated in Fig.6, where A and B are the measurement outcomes for each quantum system, X and Y are the local choices of measurement, and µ is the latent common cause. The complete set of Bell inequalities for this scenario, therefore, can be understood as a feasibility test for such a causal model.
The problem considered in the current work is different from that of deriving the complete set of Bell inequalities in a couple of ways: (i) The observational input to our causal inference problem is different; there are no setting variables in our problem-that is to say, any variable distinct from the observed A,B appearing in a causal structure must be latent-and therefore our input is a single joint distribution over two observed binary variables rather than a set of such distributions, one for each choice of the setting variables. (ii) The hypotheses whose feasibility we are testing are different; while the set of all Bell inequalities provides a test of the feasibility of the causal structure illustrated in Fig. 6, we here seek to assess the feasibility of a causal structure for a given choice of cardinalities for the latent variables appearing therein (e.g., whether a given latent variable consists of a single bit, two bits, etcetera) and for a given choice of the precise form of the functional dependence of the observed variables on the latent variables.
Consider the Bell scenario of Fig. 6 where A,B,X, and Y are binary variables. To define a functional causal structure, one must supplement this causal structure with a hypothesis about the cardinality of µ and a hypothesis about the function f that maps X,µ to A and the function g that maps Y, µ to B . (Given that there are only 16 possible values of A,B,X, and Y , a µ of cardinality 16 is sufficient to simulate any other case.) The conditional distributions P ( A,B | X,Y ) compatible with this functional causal structure are
<!-- formula-not-decoded -->
where q µ denotes a probability distribution over the latent variable µ . To determine the semi-algebraic set of possibilities for P ( A,B | X,Y ) that are compatible with this functional causal structure, one could use the techniques of the present article. From the polynomial equalities that hold between the P ( A,B | X,Y ) and the q µ (those given in Eq. (6.2.1)), one seeks to obtain constraints on the P ( A,B | X,Y ) alone by eliminating the q µ . Because the variables to be eliminated appear linearly, to eliminate the q µ , it suffices to use quantifier elimination techniques that are less computationally demanding than implicitization, such as Fourier-Motzkin elimination.
Note that if some observed correlations violate an inequality derived in this fashion, it only establishes the infeasibility of a given classical functional causal structure. Violation of Bell inequalities, on the other hand, rule out the feasibility of the causal structure, regardless of the cardinality of the latent variables and the nature of the functional dependences. In this sense, deriving Bell inequali-
?
<details>
<summary>Image 14 Details</summary>

### Visual Description
## Diagram: Bidirectional Relationship Between A and B with Connections to X and Y
### Overview
The diagram depicts a bidirectional relationship between two nodes labeled **A** and **B**, connected by a double-headed arrow labeled **μ**. Both **A** and **B** have unidirectional arrows pointing to two lower nodes labeled **X** and **Y**, respectively. The structure suggests a system where **A** and **B** influence **X** and **Y**, while maintaining a mutual relationship via **μ**.
### Components/Axes
- **Nodes**:
- **A** (top-left triangle)
- **B** (top-right triangle)
- **X** (bottom-left node)
- **Y** (bottom-right node)
- **Arrows**:
- **μ**: Double-headed arrow between **A** and **B** (center of diagram).
- Unidirectional arrows from **A** to **X** and **Y** (left side).
- Unidirectional arrows from **B** to **X** and **Y** (right side).
- **Labels**:
- All labels are in uppercase Latin script (English).
### Detailed Analysis
- **μ**: Positioned centrally between **A** and **B**, indicating a reciprocal or interdependent relationship.
- **A → X/Y**: Arrows from **A** to **X** and **Y** suggest **A** directly influences or contributes to both **X** and **Y**.
- **B → X/Y**: Similarly, **B** has direct influence over **X** and **Y**, mirroring **A**'s role.
- **Spatial Layout**:
- **A** and **B** are positioned at the top, forming a symmetrical pair.
- **X** and **Y** are at the bottom, acting as shared targets for **A** and **B**.
- **μ** bridges **A** and **B**, emphasizing their interconnectedness.
### Key Observations
1. **Symmetry**: The diagram is horizontally symmetrical, with **A** and **B** mirroring each other in structure.
2. **Shared Influence**: Both **A** and **B** independently target **X** and **Y**, implying redundancy or collaboration in their effects.
3. **Bidirectional μ**: The double-headed arrow **μ** suggests a feedback loop or equilibrium between **A** and **B**.
### Interpretation
This diagram likely represents a system where two entities (**A** and **B**) operate in tandem, influencing shared outcomes (**X** and **Y**). The bidirectional **μ** implies that **A** and **B** are not isolated; their relationship is dynamic, potentially involving mutual adjustment or dependency. For example:
- **A** and **B** could represent competing or complementary processes (e.g., supply chains, biological pathways) that jointly affect downstream outcomes (**X**, **Y**).
- The absence of numerical data or explicit directionality (e.g., "A → X" vs. "B → Y") leaves the nature of the influence (e.g., additive, competitive) open to interpretation.
- The symmetry suggests a designed balance, though real-world systems might introduce asymmetry (e.g., one node dominating the other).
### Notable Anomalies
- No explicit directionality for **μ** (double-headed arrow) could imply equal influence, but real-world systems often have asymmetrical power dynamics.
- **X** and **Y** lack distinguishing features, making it unclear if they represent distinct outcomes or interchangeable states.
This structure is common in models of interdependent systems, such as ecological networks, economic models, or organizational workflows, where mutual relationships and shared targets are critical to understanding system behavior.
</details>
?
\_
\_
?
?
\_
\_
Fig. 6: In the Bell scenario, one is interested in the conditional distribution P ( A,B | X,Y ) . This is equivalent to a set of distributions over A,B , { P ( X,Y ) ( A,B ) } , one for each choice of measurement setting.
ties is more challenging than deriving feasibility tests for functional causal structures. However, in another sense, deriving Bell inequalities is more straightforward because the semi-algebraic set defined by Eq. (6.2.1) is a polytope, whereas for a general funcational causal structure this is not the case. The mathematical tools that have been used to derive Bell-type inequalities-which include semi-definite [15] and linear programming [16] as well as Fourier-Motzkin elimination [17, 18]-are therefore quite different from those used here.
Bell inequalities are significant to the foundations of quantum theory because they are found to be violated in experiments on pairs of separated quantum systems, implying that the predictions of quantum theory cannot be explained by a classical causal model with the causal structure that one expects to hold for the experiment (that of Fig. 6) without fine-tuning [13].
Researchers in the field of quantum foundations have now begun to apply insights obtained from the study of Bell inequalities to the problem of deriving constraints on observed correlations in more general causal scenarios [14, 19, 20, 21, 22, 23], and the current work constitutes another contribution in this direction.
More importantly, there are now a few proposals for how to generalize the standard notion of a causal model to the quantum realm. Ref. [24], for instance, proposes a definition of a quantum causal model in terms of a noncommutative generalization of conditional probability, while Refs. [19, 25, 26] follow a more operational approach. With a notion of quantum causal model in hand, one can explore the problem of inferring facts about the quantum causal model from observed correlations. This is the problem of quantum causal inference .
In the case of Bell-type experiments, for instance, one expects a quantum causal model with the natural causal structure (that of Fig. 6) to be feasible only if the observed correlations satisfy the so-called Cirel'son bound, which is a generalization of a Bell inequality [27]. A simple case of quantum causal inference that has been investigated recently is the problem of distinguishing a cause-effect relation from a common-cause relation. Here, it has been shown that the quantum correlations can distinguish the two cases even in uncontrolled experiments, implying a quantum advantage for causal inference [28].
In quantum causal models, variables are replaced by sys- tems, each represented by a Hilbert space, and one makes a distinction between observed systems, upon which a measurement is made, and latent systems. Sets of systems are described by joint quantum states (as opposed to the joint probability distributions that describe sets of variables), and the functional dependences are specified by unitary maps. A natural analogue of the classical causal inference problem is to make inferences about the causal structure and the parameters of the causal model given a joint quantum state on the observed systems. The natural analogue of the functional causal structures considered in this article are quantum causal structures together with a specification of the dimensions of the latent systems and the unitaries that describe the functional dependences. To derive a feasibility test for a functional causal structure, one must eliminate the real-valued parameters that specify the quantum state of the latent systems. For example, if a given latent system is 2-dimensional (the quantum analogue of a binary latent variable), there are three real-valued parameters needed to specify the state completely (as opposed to the one real parameter needed to completely specify a distribution over a classical bit). The expectation values of the three Pauli operators, for instance, suffice to do so. Nonetheless, one can still take advantage of the techniques from algebraic geometry employed in this work to eliminate these parameters and determine constraints on the quantum state of the observed systems. In this way, we ought to be able to derive feasability tests for functional causal structures in the quantum sphere.
## 6.3 Related work
The extent to which the mathematical tools associated to quantifier elimination are well-suited to problems of causal inference has been previously emphasized by Geiger and Meek [29]. Many authors have noted, in particular, the applicability of quantifier elimination to the problem of deriving tests for the feasibility of a causal structure when the cardinality of the latent variables is known. Ref. [29], for instance, used Cylindrical Algebraic Decomposition to derive equality and inequality constraints for a particular causal model. However the computational complexity of such brute-force quantifier elimination (doubly exponential in the number of parameters) means that its applications are limited to very simple examples.
Many previous works have appealed to implicitization procedures using Groebner bases to obtain equality constraints for causal models. Geiger and Meek [30], Garcia, Stillman and Sturmfels [31], and Garcia [32] have used implicitization to obtain the smallest algebraic variety that contains the semi-algebraic set of joint distributions over observed variables for various causal structures with known cardinalities of latent variables. This yields polynomial equalities on the joint distribution whose satisfaction are necessary conditions for compatibility with the causal structure. Kang and Tian [33] have also applied implicitization techniques to the problem of identifying polynomial equality constraints on observational and interventional distributions (using the framework supplied by Refs. [34, 35]).
Our work goes beyond these treatments insofar as it uses implicitization as one step in an algorithm that finds the semi-algebraic set itself rather than the smallest algebraic variety containing it. The second step is to use the extension theorem (described in Appendix A) to find inequality constraints on the joint probability distribution over observed variables from knowledge of the Groebner basis. To illustrate the difference, consider the observational equivalence class labelled (2 , 2) Id in our classification. This corresponds to a semi-algebraic set for which the smallest algebraic variety containing it is the plane p 10 = 0 . The intersection of this variety with the tetrahedral probability simplex is its p 10 = 0 facet. The semi-algebraic set, however, is a strict subset of this, the one satisfying the additional inequality ( p 01 +2 p 11 -2) 2 ≥ 4 p 00 .
One novel feature of our approach which distinguishes it from previous uses of implicitization is that we focus on deriving feasibility constraints for a causal structure with specific functional dependences. In previous approaches, the set of variables that needed to be eliminated included both the parameters describing the probability distributions for the root variables and the parameters describing the conditional probability distributions for each non-root variable. In our approach, the second sort of parameter is fixed and not in need of elimination. The restriction to binary variables ensures that the number of distinct possible functional dependences is relatively modest.
Finally, the use of Groebner bases in identifying or bounding parameters in a causal model has also been highlighted in previous work such as Garcia-Puente et al. [36].
After the completion of this work, we became aware of related independent works by Chaves [37] and Rossett et al. [38] which also derive nonlinear inequalities for determining the feasibility of certain causal structures. These authors consider structures which, like Bell scenarios, have multiple pairs of observed variables that are related as cause and effect (understood as setting-outcome pairs) but which, unlike Bell scenarios, can have more than one latent common cause acting on the outcome variables. Chaves simplifies the quantifier elimination problem that must be solved using a round of Fourier-Motzkin elimination, while Rossett et al. provide an inductive approach for deriving new inequalities from given inequalities for subgraphs of the causal network under consideration. Combining our approach with these other methods constitutes an interesting direction for future work.
## Acknowledgements
This research began while CML was completing the Perimeter Scholars International Masters program at Perimeter Institute. Research at Perimeter Institute is supported by the Government of Canada through Industry Canada and by the Province of Ontario through the Ministry of Research and Innovation.
## References
- [1] Judea Pearl, Causality (2nd edition) , Cambridge University Press, 2009.
- [2] P. Sprites and C. Glymour and R. Scheines, Causation, Prediction and Search , The MIT press, 2nd edition, 2000.
- [3] Dominik Janzing and Joris Mooij and Jonas Peters and Bernhard Scholkopf, Identifying counfounders using additive noise models , Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.
- [4] Patrick Hoyer and Dominik Janzing and Joris Mooij and Jonas Peters and Bernhard Scholkopf, Nonlinear causal discovery with additive noise models Neural Information Processing Systems Foundation , 2009.
- [5] J. Peters and D. Janzing and B. Scholkopf, Causal inference on Discrete Data using Additive Noise Models , IEEE TPAMI vol. 33 no. 12 2436-2450, 2011.
- [6] J. Peters and J. M. Mooij and D. Janzing and B. Scholkopf, Causal discovery with continuous additive noise models , Journal of Machine Learning Research 15:2009-2053, 2014.
- [7] D. Cox and J. Little and D. O'Shea, Ideals, Varieties and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra , Springer Verlag, New York, 2007.
- [8] B. Mishra, Algorithmic algebra , Springer Verlag, New York, 2012.
- [9] Magali Bardet, On the complexity of a Groebner basis algorithm , Algorithms Seminar, 20022004. 2005.
- [10] John S. Bell, Speakable and Unspeakable in Quantum Mechanics , Cambridge University Press, 1964.
- [11] John F. Clauser, Michael A. Horne, Abner Shimony, and Richard A. Holt, Proposed Experiment to Test Local Hidden-Variable Theories , Phys. Rev. Lett. 23 880, 1969.
- [12] Nicolas Brunner, Daniel Cavalcanti, Stefano Pironio, Valerio Scarani, and Stephanie Wehner, Bell nonlocality , Rev. Mod. Phys. 86, 419, 2014.
- [13] Christopher J. Wood and Robert W. Spekkens, The lesson of causal discovery algorithms for quantum correlations: Causal explanations of Bell-inequality violations require fine-tuning , New J. Phys. 17, 033002, 2015.
- [14] T. Fritz, Beyond Bell's theorem: Correlation scenarios New J. Phys. 14 103001, 2012
- [15] Greg Ver Steeg, Aram Galstyan, A Sequence of Relaxations Constraining Hidden Variable Models , Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, UAI 2011.
- [16] Blai Bonet, Instrumentality Tests Revisited Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, 2001.
- [17] Costantino Budroni, and Adan Cabello, Bell inequalities from variable elimination methods , J. Phys. A: Math. Theor. 45 (2012) 385304.
- [18] Tobias Fritz, and Rafael Chaves, Entropic Inequalities and Marginal Problems , IEEE Trans. on Information Theory, vol. 59, pages 803 - 817 (2013).
- [19] T. Fritz, Beyond Bell's theorem II: Scenarios with arbitrary causal structure , Comm. Math. Phys. 341(2), 391-434, 2016.
- [20] R. Chaves and R. Kuen and J. Brask and D. Gross, A unifying framework for relaxations of the causal assumptions in Bell's theorem Phys. Rev. Lett. 114, 140403, 2015.
- [21] R. Chaves and C. Majenz and D. Gross, InformationTheoretic implications of Quantum Causal Structures , Nature Communications 6, 5766, 2015.
- [22] A. Tavakoli and P. Skrzypczyk and D. Cavalcanti and A. Acin, Non-local correlations in the star-network configuration , Phys. Rev. A 90, 062109, 2014.
- [23] C. Branciard and D. Rosset and N. Gisin and S. Pironio, Bilocal versus nonbilocal correlations in entanglement swapping , Phys. Rev. A 85, 032129, 2012.
- [24] M. S. Leifer and R. W. Spekkens, Formulating quantum theory as a causally neutral theory of bayesian inference , Phys. Rev. A 88, 052130 (2013).
- [25] J. Hensen and R. Lal and M. F. Pusey, Theoryindependent limits on correlations from generalised bayesian networks , New J. Phys. 16, 113043, 2014.
- [26] J. Pienaar and C. Brukner, A graph-separation theorem for quantum causal models New J. Phys. 17 073020, 2015.
- [27] B. S. Cirel'son, Quantum generalisations of Bell's inequality Lett. Math. Phys. Volume 4, Issue 2, 1980.
- [28] K. Ried and M. Agnew and L. Vermeyden and D. Janzing and R. W. Spekkens and K. Resch, A quantum advantage for inferring causal structure , Nature Physics 11, 414 2015.
- [29] D. Geiger and C. Meek, Quantifier elimination for statistical problems , Proceedings of the 15th Conference on Uncertainty in artificial intelligence, 1999.
- [30] D. Geiger and C. Meek, Graphical models and exponential families , Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 1998.
- [31] L. D. Garcia, M. Stillman, and B. Sturmfels, Algebraic geometry of bayesian networks. Journal of Symbolic Computation, 39(34):331355, 2005.
- [32] Luis David Garcia, Algebraic Statistics in Model Selection , Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2004.
- [33] Changsung Kang, and Jin Tian , Polynomial Constraints in Causal Bayesian Networks , Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, 2007.
- [34] Jin Tian, Judea Pearl, On the Testable Implications of Causal Models with Hidden Variables , Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, 2002.
- [35] Changsung Kang, and Jin Tian , Inequality Constraints in Causal Models with Hidden Variables , Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, 2006.
- [36] L. D. Garc´ ıa-Puente and S. Spielvogel and Seth Sullivant, Identifying causal effects with computer algebra , Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, 2010.
- [37] R. Chaves, Polynomial Bell inequalities , Phys. Rev. Lett. 116, 010402 (2016)
- [38] D. Rosset, C. Branciard, T. J. Barnea, G. Ptz, N. Brunner, and N. Gisin, Nonlinear Bell Inequalities Tailored for Quantum Networks , Phys. Rev. Lett. 116, 010403 (2016)
## Appendices
## A Ideals, varieties and Groebner bases
We now introduce useful concepts and tools from algebraic geometry that we will make use of in solving the problem mentioned in section 2 of the main text. We will follow the presentation given in [7].
We define a monomial in x 1 , . . . , x n to be a product of the form x α 1 1 x α 2 2 . . . x α n n , where the exponents are non-negative integers, α i ∈ Z ≥ 0 for i = 1 , . . . , n . We can simplify our notation slightly by letting α = ( α 1 , . . . , α n ) and setting
<!-- formula-not-decoded -->
We can now define a polynomial over a field k .
Definition A.0.1. A polynomial f in x 1 , . . . , x n with coefficients in a field k is a finite linear combination of monomials. We write f as
<!-- formula-not-decoded -->
where the sum is taken over a finite number of α 's.
The set of all polynomials in x 1 , . . . , x n with coefficients in k is denoted k [ x 1 , . . . , x n ] . When we deal with polynomials we are mainly interested in the solution set of systems of polynomial equations. This leads us to the main geometrical objects studied in algebraic geometry, algebraic varieties and semi-algebraic sets, which we now define.
Definition A.0.2. Let k be a field and let f 1 , . . . , f s be polynomials in k [ x 1 , . . . , x n ] . Then we set
<!-- formula-not-decoded -->
We call V ( f 1 , . . . , f s ) the algebraic variety (also called the affine variety ) defined by f 1 , . . . , f s .
Thus, an algebraic variety V ( f 1 , . . . , f s ) ⊂ k n is the solution set of the system of polynomial equations f 1 ( x 1 , . . . , x n ) = · · · = f s ( x 1 , . . . , x n ) = 0 . A basic semi-algebraic set is defined to be the solution set of a system of polynomial equalities and in equalities, that is:
Definition A.0.3. A basic semi-algebraic set is defined by { x ∈ R n : g i ( x ) 0 , ∀ i = 1 , . . . , m } , where g 1 , . . . , g n ∈ R [ x 1 , . . . , x n ] are polynomials over the reals 8 and where corresponds to either ≥ , = , or ≤ .
Note that algebraic varieties are examples of basic semi-algebraic sets.
Definition A.0.4. A semi-algebraic set is formed by taking finite combinations of unions, intersections, or complements of basic semi-algebraic sets.
For any causal model, the set of possible joint distributions that can be generated by it are represented by a semi-algebraic set. It follows that two causal models are observationally equivalent if and only if they generate the same semi-algebraic set.
We now introduce and define ideals , the main algebraic object studied in algebraic geometry.
Definition A.0.5. A subset I ⊂ k [ x 1 , . . . , x n ] is an ideal if it satisfies:
1. 0 ∈ I ,
2. If f, g ∈ I , then f + g ∈ I ,
3. If f ∈ I and h ∈ k [ x 1 , . . . , x n ] , then hf ∈ I .
A natural example of an ideal is the ideal generated by a finite number of polynomials.
Definition A.0.6. Let f 1 , . . . , f s be polynomials in k [ x 1 , . . . , x n ] . Then we set
<!-- formula-not-decoded -->
It is not hard to show that 〈 f 1 , . . . , f s 〉 is an ideal. We call it the ideal generated by f 1 , . . . , f s and we call f 1 , . . . , f s the basis of the ideal.
8 Note that one can replace the real field R used in the last definition with any ordered field.
Studying the relations between certain ideals and varieties forms one of the main areas of study in algebraic geometry. One can even define the algebraic variety V ( I ) defined by the ideal I ⊂ k [ x 1 , . . . , x n ] , where
<!-- formula-not-decoded -->
The proof that V ( I ) forms an algebraic variety can be found in [7]. Interestingly, it can also be shown that if I = 〈 f 1 , . . . , f s 〉 , then V ( I ) = V ( f 1 , . . . , f s ) . That is to say that varieties are determined by ideals . This will have interesting consequences for us, as we will see shortly.
To find a general solution to the implicitization problem introduced in the main text we need to introduce monomial orderings and Groebner bases.
First, note that we can reconstruct the monomial x α 1 1 . . . x α n n from the n -tuple of exponents ( α 1 , . . . , α n ) ∈ Z n ≥ 0 . This establishes a one-to-one correspondence between Z n ≥ 0 and monomials in k [ x 1 , . . . , x n ] . It follows that any ordering > on the space Z n ≥ 0 will induce an ordering on monomials: if α > β according to this ordering, then we will also say that x α > x β .
Now, we want the induced ordering to be 'compatible' with the algebraic structure of the polynomial ring that our monomials live in. This requirement leads us to the following definition.
Definition A.0.7. A monomial ordering on k [ x 1 , . . . , x n ] is any relation > on Z n ≥ 0 satisfying:
1. > is a total ordering on Z n ≥ 0 . That is to say that, for every α, β ∈ Z n ≥ 0 either α > β , β > α or α = β .
2. If α > β and γ ∈ Z n ≥ 0 , then α + γ > β + γ .
3. > is a well ordering on Z n ≥ 0 . This means that every non-empty subset of Z n ≥ 0 has a smallest element under > .
The main monomial ordering we will make use of here is the lexicographic order , which we define as follows.
Definition A.0.8 (Lexicographic order) . Let α = ( α 1 , . . . , α n ) and β = ( β 1 , . . . , β n ) ∈ Z n ≥ 0 . We say α > lex β if, in the vector difference α -β ∈ Z n , the leftmost non-zero entry is positive. We will write x α > lex x β if α > lex β .
Once we fix a monomial order, each f ∈ k [ x 1 , . . . , x n ] has a unique leading term LT ( f ) relative to this order. We denote by LT ( I ) the set of leading terms of elements of the ideal I . We can then define 〈 LT ( I ) 〉 to be the ideal generated by the elements of LT ( I ) . Consider a finitely generated ideal I = 〈 f 1 , . . . , f s 〉 , it is interesting to note that 〈 LT ( f 1 ) , . . . , LT ( f s ) 〉 and 〈 LT ( I ) 〉 may in general be different ideals. But surprisingly there always exists [7] a choice of basis g 1 , . . . , g t ∈ I such that 〈 LT ( g 1 ) , . . . , LT ( g t ) 〉 = 〈 LT ( I ) 〉 . These bases are know as Groebner bases .
Definition A.0.9. Fix a monomial ordering. A finite subset G = { g 1 , . . . , g t } of an ideal I is said to be a Groebner basis if
<!-- formula-not-decoded -->
More informally, a set G = { g 1 , . . . , g t } ⊂ I is a Groebner basis for I if and only if the leading term of any element of I is divisible by (at least) one of the LT ( g i ) . Groebner bases simplify performing many calculations in algebraic geometry and they have many interesting properties, some of which we will see shortly. There are efficient algorithms for calculating Groebner bases and many software packages that one can use to implement them. An example of a Groebner basis was given in the main text. Our use of the Groebner basis in that example was a special case of a general result, known as the elimination theorem , which provides us with a way of using Groebner bases to systematically eliminate certain variables from a system of polynomial equations and thereby solve the implicitization problem. We will state the elimination theorem shortly. First, we require the following definition.
Definition A.0.10. Given I = 〈 g 1 , . . . , g t 〉 ⊂ k [ x 1 , . . . , x n ] , the l th elimination ideal I l is the ideal of k [ x 1 , . . . , x n ] defined by
<!-- formula-not-decoded -->
Thus I l consists of all consequences of g 1 = · · · = g t = 0 which eliminate the variables x 1 , . . . , x l . Using this language, we see that eliminating x 1 , . . . , x l means finding non-zero polynomials in the l th elimination ideal of k [ x l +1 , . . . , x n ] . With the proper ordering, Groebner bases allow us to do this instantly. We can now state the elimination theorem (for a proof, see [7]).
Theorem A.0.11 (Elimination theorem) . Let I ⊂ k [ x 1 , . . . , x n ] be an ideal and let G be a Groebner basis for I with respect to the lex order where x 1 > x 2 > · · · > x n . Then, for every 0 ≤ l ≤ n , the set
<!-- formula-not-decoded -->
is a Groeber basis of the l th elimination ideal.
Fig. 7: (a) A = µν and B = µ ⊕ ν ⊕ µν .
<details>
<summary>Image 15 Details</summary>

### Visual Description
## Diagram: Bidirectional Relationship and 3D Geometric Representation
### Overview
The image contains two distinct diagrams:
- **(a)** A bidirectional diamond-shaped diagram with labeled nodes and directional arrows.
- **(b)** A 3D pyramid-like structure with labeled vertices and a shaded triangular region.
### Components/Axes
#### Diagram (a): Bidirectional Relationship
- **Nodes**:
- Labeled `A` (left) and `B` (right).
- **Arrows**:
- Two bidirectional arrows between `A` and `B`:
- Top arrow labeled `μ` (from `A` to `B`).
- Bottom arrow labeled `ν` (from `B` to `A`).
- **No legend or axis titles** present.
#### Diagram (b): 3D Pyramid
- **Vertices**:
- Labeled with binary coordinates:
- Top vertex: `[01]`
- Bottom-left vertex: `[00]`
- Bottom-right vertex: `[10]`
- Base vertex: `[11]`
- **Axes**:
- No explicit axis titles, but coordinates suggest a 3D Cartesian-like system.
- **Shaded Region**:
- A triangular area shaded in purple, spanning from `[00]` to `[01]` and extending toward the apex `[01]`.
- **Grid Pattern**:
- A grid overlay on the shaded region, suggesting a parametric or functional relationship.
### Detailed Analysis
#### Diagram (a): Bidirectional Relationship
- The diagram represents a system with two states (`A` and `B`) and two transition mechanisms (`μ` and `ν`).
- Arrows indicate reversible interactions: `A` transitions to `B` via `μ`, and `B` transitions back to `A` via `ν`.
#### Diagram (b): 3D Pyramid
- The pyramid’s vertices are labeled with binary coordinates, possibly representing binary states or binary variables.
- The shaded triangular region between `[00]` and `[01]` may indicate a subset of interest, such as a probability distribution or a constrained solution space.
- The grid pattern on the shaded area implies a functional or parametric relationship, though no explicit equation or scale is provided.
### Key Observations
1. **Bidirectional Flow**: Diagram (a) emphasizes symmetry in transitions between `A` and `B`, with equal emphasis on `μ` and `ν`.
2. **Geometric Constraints**: Diagram (b) highlights a specific region (shaded area) within the pyramid, potentially representing a boundary or critical zone.
3. **Lack of Numerical Data**: No explicit values, scales, or legends are provided, limiting quantitative interpretation.
### Interpretation
- **Diagram (a)**: Likely models a dynamic system (e.g., chemical reactions, state transitions) where `μ` and `ν` represent forward and reverse processes. The absence of rates or magnitudes suggests a qualitative rather than quantitative analysis.
- **Diagram (b)**: The 3D pyramid could represent a geometric or probabilistic framework. The shaded region might denote a critical threshold or a region of integration (e.g., in calculus or optimization). The binary coordinates hint at a connection to binary logic, quantum states, or combinatorial systems.
- **Relationship Between Diagrams**: While disconnected visually, both diagrams emphasize directional relationships (`μ`/`ν` in (a), shaded region in (b)) and may belong to a broader system involving state transitions and geometric constraints.
### Limitations
- No numerical data or units are provided, making quantitative analysis impossible.
- The purpose of the shaded region in (b) and the exact nature of `μ`/`ν` in (a) remain ambiguous without additional context.
</details>
(b)
(
p
+2
p
)
≥
p
.
So, in our example with the fan depicted in Fig.1(b)-discussed in the main textg 3 and g 4 form a Groebner basis of the 2 nd elimination ideal and this is what allowed us to eliminate the variables q 1 and q 2 .
How do we know that we can extend solutions from the l th elimination ideal to the ( l -1) th ? More concretely, in our specific example of the fan, how do we know that the equation p 00 p 01 = p 10 p 11 defines the entire algebraic variety and not just some part of it? The following result shows us the conditions under which we can extent partial solutions to full ones.
Theorem A.0.12 (Extension theorem) . Let I ⊂ C [ x 1 , . . . , x n ] and let I 1 be the first elimination ideal of I . For each 1 ≤ i ≤ s , write f i in the form
<!-- formula-not-decoded -->
where N i ≥ 0 and g i ∈ C [ x 1 , . . . , x n ] is non-zero. Suppose we had a partial solution ( a 2 , . . . , a n ) ∈ V ( I 1 ) . If ( a 2 , . . . , a n ) / ∈ V ( g 1 , . . . , g s ) , then there exists a 1 ∈ C such that ( a 1 , . . . , a n ) ∈ V ( I ) .
When we work over (0 , 1) ⊂ R we also, in conjunction with the conditions of the above theorem, need to ensure that at every extension step the new solution is real and lies in (0 , 1) .
We can apply the above theorem to our example to see that, indeed, the equation p 00 p 01 = p 10 p 11 defines the smallest algebraic variety that contains the semi-algebraic set depicted in Fig.1(b) in the main text.
## B More examples of deriving tests for feasibility
Consider the functional causal structure of Fig. 7(a). The joint distributions that can arise from it are of the form
<!-- formula-not-decoded -->
The semi-algebraic set defined by P ( A,B ) is shown in Fig. 7(b). We refer to this variety as a StarFleet insignia . The Groebner basis for the ideal
<!-- formula-not-decoded -->
with respect to the lex order q 1 > q 2 > p 00 > p 01 > p 11 , is
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
The equation g 2 = 0 defines an equality constraint that restricts the joint probability distribution to the plane p 10 = 0 and therefore to the face of the tetrahedron containing the vertices [00] , [01] and [11] . In order to extend the partial solution { p 00 , p 01 , p 11 } to a full solution { q 1 , q 2 , p 00 , p 01 , p 11 } using the extension theorem, we must ensure that all the solutions are real. Now the equation g 3 = 0 allows us to write q 2 in terms of the p ij 's as follows
<!-- formula-not-decoded -->
So in order to ensure that q 2 ∈ R , we must set ( p 01 + 2 p 11 -2) 2 + 4( p 11 + p 01 -1) ≥ 0 . Using the normalisation condition and rearranging gives us
<!-- formula-not-decoded -->
which defines the semi-algebraic set depicted in Fig. 7(b). None of the remaining constraints 0 ≤ q i ≤ 1 , for i = 1 , 2 , 3 result in non-trivial relations among the p ij 's.
Fig. 8: (a) A = µν and B = µ ⊕ ν ⊕ δ . (b) | 4( p 10 -p 11 )( p 00 p 10 -p 01 p 11 ) | ≤ ( p 11 (2 p 01 +2 p 10 + p 00 ) -p 10 (2 p 00 + 2 p 11 + p 01 ) ) 2
<details>
<summary>Image 16 Details</summary>

### Visual Description
## Diagram and 3D Surface Plots: Mathematical Transformations and Function Visualizations
### Overview
The image contains three components:
1. **(a)** A directed graph with labeled nodes and arrows.
2. **(b)** and **(c)** Two 3D surface plots with grid lines and color gradients.
### Components/Axes
#### (a) Directed Graph
- **Nodes**: Labeled `A` and `B`.
- **Arrows**:
- `μ` (mu) from `A` to `B`.
- `ν` (nu) from `B` to `A`.
- **Delta Symbol (δ)**: A downward arrow from `B` to an unlabeled node below.
#### (b) and (c) 3D Surface Plots
- **Axes**:
- **X-axis**: Labeled `[00]`, `[01]`, `[10]`, `[11]` (binary-like labels).
- **Y-axis**: Labeled `[00]`, `[01]`, `[10]`, `[11]` (same as X-axis).
- **Z-axis**: Implicitly represents the function's output (no explicit label).
- **Grid Lines**: Black grid overlaying the surfaces.
- **Color Gradients**:
- **(b)**: Purple-to-blue gradient (left plot).
- **(c)**: Rainbow gradient (purple-to-yellow) (right plot).
### Detailed Analysis
#### (a) Directed Graph
- **Flow**:
- `μ` maps `A` → `B`.
- `ν` maps `B` → `A`.
- `δ` maps `B` → (unlabeled node).
- **Interpretation**: Likely represents a transformation or mapping between sets `A` and `B`, with `δ` indicating a secondary operation or decay.
#### (b) and (c) 3D Surface Plots
- **Structure**:
- Both plots use a triangular prism-like structure with vertices at `[00]`, `[01]`, and `[11]`.
- The fourth vertex `[10]` is implied but not explicitly labeled.
- **Color Interpretation**:
- **(b)**: Purple-to-blue gradient suggests a transition from high to low values or a phase change.
- **(c)**: Rainbow gradient indicates a multi-variable function or a normalized probability distribution.
### Key Observations
1. **Symmetry in Labels**: Both 3D plots share identical axis labels (`[00]`, `[01]`, `[10]`, `[11]`), suggesting a binary or Boolean context.
2. **Color Transitions**:
- **(b)**: Sharp gradient from purple (peak) to blue (valley) implies a steep function slope.
- **(c)**: Smoother rainbow gradient suggests a more gradual variation.
3. **Missing Labels**: No explicit legend or numerical scale is provided, limiting quantitative interpretation.
### Interpretation
- **(a)** likely models a reversible transformation (`μ`/`ν`) with an irreversible component (`δ`), common in dynamical systems or state transitions.
- **(b)** and **(c)** visualize functions over a binary input space (e.g., quantum states, logic gates). The color gradients may represent:
- **(b)**: A loss function or error metric (steep decline).
- **(c)**: A probability density or activation function (smooth transition).
- **Anomalies**: The absence of a legend or numerical scale prevents precise quantification. The unlabeled node in **(a)** raises questions about the completeness of the transformation graph.
### Spatial Grounding
- **(a)**: Central placement emphasizes its role as a conceptual bridge between the two 3D plots.
- **(b)** and **(c)**: Positioned side-by-side for direct comparison of gradient behaviors.
### Content Details
- **No numerical data** or explicit units are provided.
- **Text Embedded**: Labels `A`, `B`, `μ`, `ν`, `δ`, and binary axis labels are the only textual elements.
### Final Notes
This image combines a transformation graph with function visualizations, suggesting a study of mappings and their effects on multidimensional data. Further context (e.g., equations, units) is required for rigorous analysis.
</details>
Consider the functional causal structure of Fig. 8(a). The joint distributions that can arise from it are of the form
<!-- formula-not-decoded -->
The semi-algebraic set defined by P ( A,B ) is shown from different angles in Fig. 8(b). We note that conditioning on the variable δ being equal to 0 or 1 reduces this variety to one of the star trek symbols depicted on the faces. Similarly conditioning on ν = 1 (or µ = 1 ) reduces this variety to a fan.
The Groebner basis for the ideal
<!-- formula-not-decoded -->
with respect to the usual lex order, is given by
<!-- formula-not-decoded -->
The equation g 2 = 0 is just the usual normalisation condition restricting the joint probability distribution to the tetrahedron.In order to use the extension theorem to extend a partial solution { p 00 , p 01 , p 10 , p 11 } to a full solution { q 1 , q 2 , q 3 , p 00 , p 01 , p 10 , p 11 } , we must ensure that each solution is real. The only situations in which we need to impose this is in the case of q 2 . The equation g 7 = 0 is a quadratic in q 2 and in order for its solutions to be real, we must stipulate that
<!-- formula-not-decoded -->
Using g 1 = 0 to write q 3 in terms of p 10 and p 11 and substituting this into g 5 = 0 gives us another quadratic in q 2 . For the solutions of this quadratic to be real we must enforce that
<!-- formula-not-decoded -->
Combining these two inequalities we get
<!-- formula-not-decoded -->
where | . | denotes the absolute value. None of the remaining constraints 0 ≤ q i ≤ 1 , for i = 1 , 2 , 3 result in non-trivial relations among the p ij 's.
Examining this inequality more closely, we see that setting p 10 = 0 reduces it to the inequality defining the StarFleet insignia, 4 p 01 ≤ (2 p 01 + p 00 ) 2 , on the face spanned by { [00] , [01] , [11] } , as it should (this is visible in Fig. 8(b)). Similarly, for p 11 = 0 we get the StarFleet insignia on the face { [00] , [01] , [10] } (also visible in Fig. 8(b)). The appearance of the term p 00 p 10 -p 01 p 11 is also noteworthy. Recall that the equation p 00 p 10 = p 01 p 11 defines the fan depicted in Fig.2(b) in the main text, so the above inequality quantitatively bounds the deviation from the surface of this fan by an amount proportional to the two star trek symbols discussed above. This is intuitively what we would expect from looking at the semi-algebraic set depicted in Fig. 8(b).
These examples cover all the different situations one may encounter while using algebraic geometry techniques to derive tests for feasibility of the causal models we are considering in this work. The remaining tests are derived in an analogous fashion.
## C Sufficiency of n -latent-bit models
We here present the proof of Theorem 4.2.1.
The example presented in section 4.2 suggests a general procedure for replacing an m -valued latent variable with some finite number of binary latent variables. Replace the m -valued variable with a number of substitute variables-the analogues of γ and η above, but which now can take an arbitrary number of values-such that any distribution over the m -valued variable can be simulated using a k -latent-bit causal model-the analogue of the causal model containing µ and ν above-underlying the substitute variables. By eliminating the intermediary variables, the dependence of the observed variables on the m -valued latent variable is replaced with a dependence on k binary latent variables.
We now describe a procedure for replacing an m -valued variable, for any m , by two variables γ and η in such a way that any distribution over the m -valued variable is obtained by some k -latent-bit causal model underlying γ and η .
Recall that for a 3-valued variable, we can take γ and η to be bits and use the fiducial model from class (2 , 1 , c ) Id , whose distribution is the convex combination of an edge of the tetrahedron and a vertex not contained in that edge. Similarly, for a 4-valued variable, we can take γ and η to be bits and use the fiducial model from class (3 , 2 , g ) Id , whose distribution is the convex combination of a face and vertex not contained in that face.
For a 5-valued variable, we can take γ to be a trit and η to be a bit. For any causal model underlying γ and η , the semialgebraic set generated by this model is now a subset of a simplex with six vertices, [ γη ] ∈ { [00] , [01] , [10] , [11] , [20] , [21] } We now construct a causal model underlying γ and η by combining two simpler models, using the procedure described in section 4 in the main text: the first model is one whose semi-algebraic set is the tetrahedron (considered as the subset of the six-simplex having [ γη ] ∈ { [00] , [01] , [10] , [11] } ) and the second is one whose semi-algebraic set is a vertex of the six-simplex not contained in the tetrahedron. A binary switch variable toggles between these two simpler models. Given the geometry, the semi-algebraic set defined by the model is clearly the convex combination of the tetrahedron and the vertex outside the tetrahedron. In particular, we can take the first model to be the fiducial model from class (3 , 3) Id (where γ is replaced by a trit but its dependence on its causal parents is unchanged) and the second model to be a deterministic model that sets γ = 2 and η = 0 . Denoting the switch variable by ρ , and the other latent bits by µ , ν , δ (as in the row containing class (3 , 3) Id ), we obtain the following functional dependences by the switch-variable construction: γ = ρ ( µν ⊕ 2 1) ⊕ 3 2( ρ ⊕ 2 1) and η = ρ ( µνδ ⊕ 2 ν ) . One easily verified that if ρ = 1 , one recovers the fiducial model of class (3 , 3) Id and hence the tetrahedron spanned by [ γη ] ∈ { [00] , [01] , [10] , [11] } , while if ρ = 0 , one obtains the point [ γη ] = [20] .
By increasing the number of values that γ and η can take, one can ensure that the number of vertices in the space of distributions over γ and η is at least m , such that one can simulate an m -valued latent variable by finding a causal model underlying γ and η whose semi-algebraic set is an m -simplex. To construct such a model, we apply the switch-variable construction to a pair of simpler models, one of which has an semi-algebraic set corresponding to an ( m -1) -simplex, and the other of which is a deterministic model corresponding to a vertex outside of this ( m -1) -simplex. In this way, we can recursively build up a causal model involving only binary latent variables whose semi-algebraic set is an m -simplex for any m .