# Adversarial Signed Graph Learning with Differential Privacy
**Authors**: Haobin Ke, Sen Zhang, Qingqing Ye, Xun Ran, Haibo Hu
> The Hong Kong Polytechnic University, Hung Hom, Hong Kong
> Research Centre for Privacy and Security Technologies in Future Smart Systems, PolyU, Hung Hom, Hong Kong
## Abstract
Signed graphs with positive and negative edges can model complex relationships in social networks. Leveraging balance theory, which deduces edge signs between multi-hop node pairs, signed graph learning can generate node embeddings that preserve both structural and sign information. However, training on sensitive signed graphs raises significant privacy concerns, as model parameters may leak private link information. Existing protection methods with differential privacy (DP) typically rely on edge or gradient perturbation for unsigned graph protection. Yet they are not well suited to signed graphs, mainly because edge perturbation tends to cause cascading errors in edge sign inference under balance theory, while gradient perturbation increases sensitivity due to node interdependence and the gradient polarity changes caused by sign flips, resulting in larger noise injection. In this paper, motivated by the robustness of adversarial learning to noisy interactions, we present ASGL, a privacy-preserving adversarial signed graph learning method that preserves high utility while achieving node-level DP. We first decompose signed graphs into positive and negative subgraphs based on edge signs, and then design a gradient-perturbed adversarial module to approximate the true signed connectivity distribution. In particular, the gradient perturbation helps mitigate cascading errors, while the subgraph separation facilitates sensitivity reduction. Further, we devise a constrained breadth-first search tree strategy that fuses with balance theory to identify the edge signs between generated node pairs. This strategy also enables gradient decoupling, thereby effectively lowering gradient sensitivity. Extensive experiments on real-world datasets show that ASGL achieves favorable privacy-utility trade-offs across multiple downstream tasks. Our code and data are available at https://github.com/KHBDL/ASGL-KDD26.
**Keywords**: Differential privacy, Adversarial signed graph learning, Constrained breadth-first search trees, Balance theory.

*Published in Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 (KDD '26), August 09–13, 2026, Jeju Island, Republic of Korea. doi: 10.1145/3770854.3780282; isbn: 979-8-4007-2258-5/2026/08; copyright: CC. CCS: Security and privacy → Data anonymization and sanitization.*
## 1. Introduction
A signed graph is a common and widely adopted graph structure that can represent both positive and negative relationships using signed edges (19; 20; 21). For example, in the online social network shown in Fig. 1, user interactions reflect positive relationships (e.g., like, trust, friendship), but negative relationships (e.g., dislike, distrust, complaint) also exist. Signed graphs thus provide more expressive power than unsigned graphs for capturing such complex user interactions.
Recently, some studies (22; 23; 24) have explored signed graph learning methods, aiming to obtain low-dimensional vector representations of nodes that preserve key signed graph properties: neighbor proximity and structural balance. These embeddings are subsequently applied to downstream tasks such as edge sign prediction, node clustering, and node classification. Among existing signed graph learning methods, balance theory (27) has proven effective in identifying the edge signs between a source node and its multi-hop neighbors. It is leveraged in graph neural network (GNN)-based models to guide message passing across signed edges, ensuring that information aggregation is aligned with node proximity (36; 38; 39). Moreover, to enhance the robustness and generalization capability of deep learning models, adversarial graph embedding models (03; 14) learn the underlying connectivity distribution of signed graphs by generating high-quality node embeddings that preserve signed node proximity.
Despite their ability to effectively capture signed relationships between nodes, graph learning models remain vulnerable to link stealing attacks (25; 42; 43), which aim to infer the existence of links between arbitrary node pairs in the training graph. For instance, in online social graphs, such attacks may reveal whether two users share a friendly or adversarial relationship, compromising user privacy and damaging personal or professional reputations.
<details>
<summary>x1.png Details</summary>

### Visual Description
## Network Diagram: User Interaction Graph
### Overview
The image displays a directed graph diagram illustrating social interactions among eight users. The diagram uses icons to represent users and colored, labeled lines to represent the nature of interactions between them. The overall structure is a network with nodes (users) and edges (interactions), showing a mix of positive and negative relationships.
### Components/Axes
* **Nodes (Users):** Eight user icons, each labeled with a unique identifier.
* **Top Row (Left to Right):** User 1, User 2, User 8
* **Middle Row (Left to Right):** User 3, User 4
* **Bottom Row (Left to Right):** User 5, User 6, User 7
* **Edges (Interactions):** Lines connecting users, color-coded and marked with a symbol (+ or -).
* **Legend (Located on the right side):**
* **Blue Line:** Labeled "Positive Interaction (Like, Gift)"
* **Red Line:** Labeled "Negative Interaction (Distrust, Complaint)"
### Detailed Analysis
The following is a complete list of all interactions depicted in the diagram, identified by connecting users, line color, and the associated symbol.
**Positive Interactions (Blue Lines with '+'):**
1. User 1 → User 2 (Horizontal line, top)
2. User 1 → User 6 (Diagonal line, top-left to bottom-center)
3. User 2 → User 6 (Vertical line, top-center to bottom-center)
4. User 3 → User 5 (Diagonal line, middle-left to bottom-left)
5. User 4 → User 7 (Diagonal line, middle-right to bottom-right)
**Negative Interactions (Red Lines with '-'):**
1. User 1 → User 3 (Vertical line, top-left to middle-left)
2. User 1 → User 5 (Vertical line, top-left to bottom-left)
3. User 2 → User 4 (Diagonal line, top-center to middle-right)
4. User 2 → User 8 (Horizontal line, top)
5. User 4 → User 6 (Diagonal line, middle-right to bottom-center)
6. User 4 → User 8 (Diagonal line, middle-right to top-right)
7. User 5 → User 6 (Horizontal line, bottom)
8. User 6 → User 7 (Horizontal line, bottom)
9. User 7 → User 8 (Vertical line, bottom-right to top-right)
### Key Observations
* **Central Nodes:** User 2 and User 4 appear to be central hubs. User 2 has the highest number of outgoing connections (4 total: 2 positive, 2 negative). User 4 also has a high degree of connectivity (4 total: 1 positive, 3 negative).
* **Isolated Node:** User 8 is only connected via negative interactions (with User 2, User 4, and User 7) and has no positive incoming or outgoing links shown.
* **Cluster of Positivity:** A positive interaction cluster exists between Users 1, 2, 3, 5, and 6, though it is interspersed with negative links.
* **Prevalence of Negative Links:** Negative interactions (red lines) are more numerous (9 instances) than positive ones (5 instances) in this specific network snapshot.
* **Reciprocity:** No direct reciprocal interactions (e.g., A→B and B→A) are shown. All connections are unidirectional as drawn.
### Interpretation
This diagram models a social or trust network where the quality of relationships is explicitly categorized. The data suggests a complex environment where influence (indicated by high connectivity) does not equate to popularity or trust. For instance, User 2 and User 4 are highly connected but are sources or targets of significant negative sentiment.
The network structure reveals potential fault lines or subgroups. The lack of positive connections to User 8 might indicate an outlier or a user who is distrusted by the central figures (Users 2 and 4). The positive loop involving Users 1, 2, and 6 could represent a core cooperative group, but its integrity is undermined by the negative link from User 1 to User 3 and User 5.
From a Peircean perspective, this diagram is an **icon** (it resembles a network) and an **index** (the lines point to specific relationships). The **symbolic** layer is provided by the legend, which assigns social meaning (like/gift vs. distrust/complaint) to the visual variables (color and sign). The diagram's value lies in making abstract social dynamics concrete and analyzable, highlighting central actors, potential conflicts, and isolated individuals. It serves as a diagnostic tool for understanding group cohesion and identifying users who may require mediation or support.
</details>
Figure 1. A signed social graph with blue edges for positive links and red edges for negative links.
Differential privacy (DP) (06) is a rigorous privacy framework that guarantees statistically indistinguishable outputs regardless of the presence of any individual record. This guarantee is achieved through sufficient perturbation while maintaining provable privacy bounds and computational feasibility. Existing privacy-preserving graph learning methods with DP can be categorized into two types based on the perturbation mechanism: one applies edge perturbation (53) to protect link information by modifying the graph structure, and the other adopts gradient perturbation (54; 52) to obscure the relationships between nodes during model training. However, these methods are not well suited to signed graph learning due to the following two challenges:
- Cascading error: As illustrated in Fig. 2, balance theory facilitates the inference of the edge sign between two unconnected nodes by computing the product of edge signs along a path. However, existing methods that use edge perturbation to protect link information may alter the sign of any edge along the path, thereby leading to incorrect inference of edge signs under balance theory. Such a local error can further propagate along the path, resulting in cascading errors in edge sign inference.
- High sensitivity: While gradient perturbation methods, which do not directly perturb edges, may mitigate cascading errors, they are still ill-suited for signed graph learning because node interdependence in signed graphs leads to high gradient sensitivity. The presence or absence of a node affects the gradient updates of itself and its neighbors. Furthermore, an edge change may induce sign flips that reverse gradient polarity within the loss function (see Eq. (10) for details), resulting in higher sensitivity than in unsigned graphs. This increased sensitivity requires larger noise for privacy protection, thereby reducing data utility.
To address these challenges, we turn to an adversarial learning-based approach for private signed graph learning. The core motivation is that this adversarial method generates node embeddings by approximating the true connectivity distribution, making it naturally robust to noisy interactions during optimization. Accordingly, we propose ASGL, a differentially private adversarial signed graph learning method that achieves high utility while maintaining node-level differential privacy. Within ASGL, the signed graph is first decomposed into positive and negative subgraphs based on edge signs. These subgraphs are then processed by an adversarial learning module with shared model parameters, enabling both positive and negative node pairs to be mapped into a unified embedding space while effectively preserving signed proximity. On this basis, we train the adversarial learning module with differentially private stochastic gradient descent (DPSGD), which generates private node embeddings that closely approximate the true signed connectivity distribution. In particular, the gradient perturbation helps mitigate cascading errors, while the subgraph separation avoids gradient polarity reversals induced by edge sign flips within the loss function, thereby reducing the sensitivity to changes in edge signs. Since node interdependence further increases gradient sensitivity, we design a constrained breadth-first search (BFS) tree strategy within adversarial learning. This strategy integrates balance theory to identify the edge signs between generated node pairs, while also constraining the receptive fields of nodes to enable gradient decoupling, thereby effectively lowering gradient sensitivity and reducing noise injection. Our main contributions are as follows:
- We present a privacy-preserving adversarial learning method for signed graphs, called ASGL. To the best of our knowledge, it is the first work to ensure node-level differential privacy for signed graph learning while preserving high data utility.
- To mitigate cascading errors, we develop the adversarial learning module with DPSGD, which generates private node embeddings that closely approximate the true signed connectivity distribution. This approach avoids direct perturbation of the edge structure, which helps mitigate cascading errors and prevents gradient polarity reversals in the loss function.
- To further reduce the sensitivity caused by complex node relationships, we design a constrained breadth-first search tree strategy that integrates balance theory to identify edge signs between generated node pairs. This strategy also constrains the receptive fields of nodes, enabling gradient decoupling and effectively lowering gradient sensitivity.
- Extensive experiments demonstrate that our method achieves favorable privacy-accuracy trade-offs and significantly outperforms state-of-the-art methods in edge sign prediction and node clustering tasks. Additionally, we conduct link stealing attacks, demonstrating that ASGL exhibits stronger resistance to such attacks across all datasets.
The remainder of this paper is organized as follows. Section 2 describes the preliminaries of our solution. The problem statement is introduced in Section 3. Our proposed solution and its privacy analysis are presented in Section 4. The experimental results are reported in Section 5. We discuss related work in Section 6, followed by the conclusion in Section 7.
## 2. Preliminaries
In this section, we provide an overview of signed graphs, differential privacy, and DPSGD. Additionally, the vanilla adversarial graph learning is introduced in App. A, and the frequently used notations are summarized in Table 5 (See App. B).
### 2.1. Signed Graph with Balance Theory
A signed graph is denoted as $G=(V,E^+,E^-)$ , where $V$ is the set of nodes, and $E^+$ and $E^-$ represent the positive and negative edge sets, respectively. An edge $e_{ij}=(v_i,v_j)∈ E^+$ (resp. $E^-$ ) represents a positive (resp. negative) link between the node pair $(v_i,v_j)$ , $v_i,v_j∈ V$ . Notably, $E^+∩ E^-=∅$ ensures that no node pair maintains both positive and negative relationships simultaneously. The objective of signed graph embedding is to learn a mapping function $f:V→ℝ^k$ that projects each node $v∈ V$ into a low-dimensional ( $k$ -dimensional) vector while preserving both the structural and sign properties of the original signed graph. In other words, node pairs connected by positive edges should be embedded closely, while those connected by negative edges should be placed farther apart in the embedding space.
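As a concrete illustration, the definition above can be sketched in a few lines of Python. This is a minimal, hypothetical representation (two disjoint edge sets with the $E^+∩ E^-=∅$ invariant checked explicitly), not the paper's actual data structure:

```python
def make_signed_graph(pos_edges, neg_edges):
    """Build G = (V, E+, E-) from undirected edge lists.

    Edges are normalized to sorted tuples so (i, j) and (j, i)
    denote the same undirected edge.
    """
    e_pos = {tuple(sorted(e)) for e in pos_edges}
    e_neg = {tuple(sorted(e)) for e in neg_edges}
    # Invariant from the definition: a node pair cannot hold both
    # a positive and a negative relationship simultaneously.
    assert not (e_pos & e_neg), "E+ and E- must be disjoint"
    nodes = {v for e in e_pos | e_neg for v in e}
    return nodes, e_pos, e_neg

# Example: v1 -(+)- v2, v2 -(+)- v3, and v1 -(-)- v3
V, E_pos, E_neg = make_signed_graph([(1, 2), (2, 3)], [(1, 3)])
```

Downstream, such a split into `E_pos` and `E_neg` is exactly the subgraph decomposition that ASGL performs in its first stage.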
<details>
<summary>x2.png Details</summary>

### Visual Description
## Diagram: Signed Graph Edge Relationships
### Overview
The image is a technical diagram illustrating three distinct configurations of signed relationships between nodes in a graph or network. It demonstrates how the signs of direct edges from a common intermediary node determine the nature of the indirect relationship between two terminal nodes, \( v_r \) and \( v_t \). The diagram is composed of three separate triangular structures arranged horizontally, with a legend below explaining the edge types.
### Components/Axes
* **Nodes:** Each of the three structures contains three nodes arranged in a triangle.
* **Top Node:** An unlabeled, white-filled circle at the apex of each triangle.
* **Bottom-Left Node:** A blue-filled circle labeled \( v_r \).
* **Bottom-Right Node:** A circle labeled \( v_t \). Its fill color varies: light green in the first two structures, pink in the third.
* **Edges & Signs:**
* **Direct Edges:** Solid black lines connecting the top node to each bottom node (\( v_r \) and \( v_t \)). Each is annotated with a sign: `+` (positive) or `-` (negative).
* **Indirect Edge:** A dashed line connecting \( v_r \) and \( v_t \) directly. Its color and sign are determined by the configuration of the direct edges.
* **Legend:** A dashed-border box at the bottom center defines the edge types:
* `Direct Edge`: Represented by a solid black line.
* `Positive Indirect Edge`: Represented by a dashed green line.
* `Negative Indirect Edge`: Represented by a dashed red line.
### Detailed Analysis
The diagram presents three specific cases from left to right:
**1. Left Structure (Positive Indirect Relationship):**
* **Direct Edges:** Top node to \( v_r \) is `+`. Top node to \( v_t \) is `+`.
* **Indirect Edge:** A **dashed green line** connects \( v_r \) and \( v_t \), annotated with a `+` sign.
* **Interpretation:** Two positive direct relationships result in a positive indirect relationship between the endpoints.
**2. Middle Structure (Positive Indirect Relationship):**
* **Direct Edges:** Top node to \( v_r \) is `-`. Top node to \( v_t \) is `-`.
* **Indirect Edge:** A **dashed green line** connects \( v_r \) and \( v_t \), annotated with a `+` sign.
* **Interpretation:** Two negative direct relationships also result in a positive indirect relationship between the endpoints (the "enemy of my enemy is my friend" principle).
**3. Right Structure (Negative Indirect Relationship):**
* **Direct Edges:** Top node to \( v_r \) is `+`. Top node to \( v_t \) is `-`.
* **Indirect Edge:** A **dashed red line** connects \( v_r \) and \( v_t \), annotated with a `-` sign.
* **Interpretation:** A mix of one positive and one negative direct relationship results in a negative indirect relationship between the endpoints.
### Key Observations
* The color of the \( v_t \) node changes from light green (in positive indirect outcomes) to pink (in the negative indirect outcome), providing a visual cue for the result.
* The sign of the indirect edge is the **product** of the signs of the two direct edges: `(+) * (+) = (+)`, `(-) * (-) = (+)`, `(+) * (-) = (-)`.
* The diagram is a visual representation of **balance theory** or **structural balance** in signed graphs, often used in social network analysis, psychology, and game theory.
### Interpretation
This diagram is a foundational illustration of how local relationships (direct edges) propagate to determine global structure (indirect relationships) in a signed network. It encodes a simple algebraic rule: the sign of a path of length two is the product of the signs of its constituent edges.
The key insight is that **balanced triads** (where the product of all three edge signs in the triangle is positive) are stable. All three structures shown are balanced under this standard definition: `+ * + * + = +`, `- * - * + = +`, and `+ * - * - = +` (in the third triad, the two negative edges make the product positive). The diagram effectively teaches the rule for inferring the likely relationship between two entities (\( v_r \) and \( v_t \)) based on their shared relationships with a common third party.
</details>
Figure 2. The signs of multi-hop connections based on balance theory.
Balance theory (27) is a well-established standard for describing the signed relationships of unconnected node pairs. It is commonly summarized by four intuitive rules: “A friend of my friend is my friend,” “A friend of my enemy is my enemy,” “An enemy of my friend is my enemy,” and “An enemy of my enemy is my friend.” Based on these rules, balance theory can deduce the signs of multi-hop connections. As shown in Fig. 2, given a path $P_{rt}: v_r→ v_t$ from root node $v_r$ to target node $v_t$ , the sign of the indirect relationship between $v_r$ and $v_t$ can be inferred by iteratively applying balance theory. Specifically, the sign of the multi-hop connection corresponds to the product of the signs of the edges along the path.
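The product rule above is easy to make concrete. The following is an illustrative helper (not part of ASGL) that encodes each edge sign as $\pm 1$ and multiplies them along a path:

```python
import math

def path_sign(signs):
    """Sign of a multi-hop connection under balance theory:
    the product of the edge signs (+1 or -1) along the path."""
    return math.prod(signs)

# "An enemy of my enemy is my friend": two negative hops give a positive sign.
print(path_sign([-1, -1]))      # -> 1
# "A friend of my enemy is my enemy": mixed hops give a negative sign.
print(path_sign([+1, -1]))      # -> -1
```

Iterating the pairwise rules along the path is equivalent to this single product, since sign multiplication is associative.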
### 2.2. Differential Privacy
Differential Privacy (DP) (04) provides a rigorous mathematical framework for quantifying the privacy guarantees of algorithms operating on sensitive data. Informally, it bounds how much the output distribution of a mechanism can change in response to small changes in its input. When applying DP to signed graph data, the definition of adjacent databases typically considers two signed graphs, $G$ and $G^\prime$ , which are regarded as adjacent graphs if they differ by at most one edge or one node with its associated edges.
**Definition 1 (Edge (Node)-level DP (05))**
*Given $ε>0$ and $δ>0$ , a graph analysis mechanism $M$ satisfies edge- or node-level $(ε,δ)$ -DP, if for any two adjacent graph datasets $G$ and $G^\prime$ that only differ by an edge or a node with its associated edges, and for any possible algorithm output $S⊆ Range(M)$ , it holds that
$$
\Pr[M(G)∈ S]≤ e^{ε}\Pr[M(G^\prime)∈ S]+δ. \tag{1}
$$
Here, $ε$ is the privacy budget (i.e., privacy cost), where smaller values indicate stronger privacy protection but greater utility reduction. The parameter $δ$ denotes the probability that the privacy guarantee may not hold, and is typically set to be negligible. In other words, $δ$ allows for a negligible probability of privacy leakage, while ensuring the privacy guarantee holds with high probability.*
**Remark 1**
*Note that satisfying node-level DP is much more challenging than satisfying edge-level DP, as removing a single node may, in the worst case, remove $|V|-1$ edges, where $|V|$ denotes the total number of nodes. Consequently, node-level DP requires injecting substantially more noise.*
Two fundamental properties of DP are useful for the privacy analysis of complex algorithms: (1) Post-processing property (06): if a mechanism $M(G)$ satisfies $(ε,δ)$ -DP, then for any function $f$ that accesses the private dataset $G$ only through the output of $M$ , the composition $f(M(G))$ also satisfies $(ε,δ)$ -DP; (2) Composition property (06): if $M(G)$ and $f(G)$ satisfy $(ε_1,δ_1)$ -DP and $(ε_2,δ_2)$ -DP, respectively, then the combined mechanism $F(G)=(M(G),f(G))$ , which outputs both results, satisfies $(ε_1+ε_2,δ_1+δ_2)$ -DP.
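The composition property amounts to simple budget bookkeeping, sketched below with illustrative budget values (assumptions, not taken from the paper):

```python
# Sequential composition: the (eps, delta) budgets of the composed
# mechanisms add up. Example budgets are purely illustrative.
mechanisms = [(0.5, 1e-6), (0.3, 1e-6)]  # (eps_i, delta_i) per mechanism

eps_total = sum(eps for eps, _ in mechanisms)        # 0.5 + 0.3 = 0.8
delta_total = sum(delta for _, delta in mechanisms)  # 2e-6

# The combined mechanism satisfies (eps_total, delta_total)-DP,
# i.e., (0.8, 2e-6)-DP here.
```

Tighter accountants such as the Moments Accountant (used by DPSGD in Sec. 2.3) improve on this linear bound over many iterations.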
### 2.3. DPSGD
A common approach to differentially private training combines noisy stochastic gradient descent with the Moments Accountant (MA) (02). This approach, known as DPSGD, has been widely adopted for releasing private low-dimensional representations, as MA effectively mitigates excessive privacy loss during iterative optimization. Formally, for each sample $x_i$ in a batch of size $B$ , we compute its gradient $∇L_i(θ)$ , denoted as $∇(x_i)$ for simplicity. Gradient sensitivity refers to the maximum change in the output of the gradient function resulting from a change in a single sample. To control the sensitivity of $∇(x_i)$ , the $\ell_2$ norm of each gradient is clipped by a threshold $C$ . The clipped gradients are then aggregated and perturbed with Gaussian noise $N(0,σ^2C^2I)$ to satisfy the DP guarantee. Finally, the average noisy gradient $\tilde{∇}_B$ is used to update the model parameters $θ$ . This process is given by:
$$
\tilde{∇}_B←\frac{1}{B}\Big(\sum_{i=1}^{B}\mathrm{Clip}_C(∇(x_i))+N\left(0,σ^2C^2I\right)\Big). \tag{2}
$$
Here, $\mathrm{Clip}_C(∇(x_i))=∇(x_i)/\max\big(1,\frac{\|∇(x_i)\|_2}{C}\big)$ .
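Eq. (2) can be sketched directly in NumPy. The function name `dpsgd_noisy_grad` and the toy gradients are hypothetical; this is a minimal illustration of one clip-sum-noise-average step, not the authors' implementation:

```python
import numpy as np

def dpsgd_noisy_grad(per_sample_grads, clip_C, sigma, rng):
    """One DPSGD aggregation step following Eq. (2):
    clip each per-sample gradient to l2 norm C, sum them,
    add Gaussian noise N(0, sigma^2 C^2 I), and average over the batch."""
    B = len(per_sample_grads)
    clipped = [
        g / max(1.0, np.linalg.norm(g) / clip_C)  # Clip_C(grad(x_i))
        for g in per_sample_grads
    ]
    noise = rng.normal(0.0, sigma * clip_C, size=per_sample_grads[0].shape)
    return (np.sum(clipped, axis=0) + noise) / B

# Toy batch of B=2 per-sample gradients with l2 norms 5.0 and 0.5.
rng = np.random.default_rng(0)
grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
noisy_avg = dpsgd_noisy_grad(grads, clip_C=1.0, sigma=1.0, rng=rng)
```

With `clip_C=1.0`, the first gradient is rescaled to unit norm while the second passes through unchanged, so a single outlying sample can shift the aggregate by at most $C$ — which is what bounds the sensitivity.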
## 3. Problem Definition and Existing Solutions
### 3.1. Problem Definition
Instead of publishing a sanitized version of original node embeddings, we aim to release a privacy-preserving ASGL model trained on raw signed graph data with node-level DP guarantees, enabling data analysts to generate task-specific node embeddings.
**Threat Model.** We consider a black-box attack (42), where the attacker can query the trained model and observe its outputs with no access to its internal architecture or parameters. The attacker attempts to infer the presence of specific nodes or edges in the training graph solely from model outputs. This setting reflects a more practical attack surface compared to the white-box scenario (11).
**Privacy Model.** Signed graph data encodes both positive and negative relationships between nodes, which differs from tabular or image data. Therefore, it is necessary to adapt the standard definition of node-level DP (see Definition 1) to ensure black-box adversaries cannot determine whether a specific node and its associated signed edges are present in the training data. To this end, we define the differentially private adversarial signed graph learning model as follows.
**Definition 2 (Adversarial signed graph learning model under node-level DP)**
*The vanilla process of graph adversarial learning is illustrated in App. A. Let $θ_D$ denote the discriminator parameters, whose $r$ -th row corresponds to the $k$ -dimensional vector $d_{v_r}$ of node $v_r$ , i.e., $d_{v_r}∈θ_D$ . The discriminator module $L_D$ satisfies node-level $(ε,δ)$ -DP if, for any two adjacent signed graphs $G$ and $G^\prime$ that differ in only one node with its associated signed edges, and for all possible $θ_s⊆ Range(L_D)$ , we have
$$
\Pr[L_D(G)∈θ_s]≤ e^{ε}\Pr[L_D(G^\prime)∈θ_s]+δ, \tag{3}
$$
where $θ_s$ denotes the set comprising all possible values of $θ_D$ .*
In particular, the generator $G$ is trained based on feedback from the differentially private discriminator $D$ . According to the post-processing property of DP (08; 12), the generator module $L_G$ also satisfies node-level $(ε,δ)$ -DP. By this robustness to post-processing, the privacy guarantee carries over to the generated signed node embeddings and their downstream usage.
<details>
<summary>x3.png Details</summary>

### Visual Description
## Diagram: Graph Processing Pipeline with Positive/Negative Edge Separation and Gradient-Based Learning
### Overview
This image is a technical flowchart illustrating a machine learning pipeline for graph data. The process involves decomposing an original graph into positive and negative subgraphs, generating constrained paths from rooted nodes, creating embeddings, and applying gradient-based optimization (with differential privacy considerations) for downstream tasks. The diagram is divided into three main sections labeled (i), (ii), and (iii).
### Components/Axes
**Legend (Top-Left Corner):**
- **Rooted node:** Yellow circle
- **Positive edge:** Blue line
- **Negative edge:** Red line
**Main Components & Flow:**
1. **Section (i) - Graph Decomposition:**
* **Input:** "The original graph G" (center-left). It contains nodes connected by both blue (positive) and red (negative) edges. One node is highlighted in yellow as the "Rooted node".
* **Output 1 (Top Path):** "The positive graph G⁺". Contains only nodes and the blue positive edges from G.
* **Output 2 (Bottom Path):** "The negative graph G⁻". Contains only nodes and the red negative edges from G.
2. **Section (ii) - Path Generation & Embedding:**
* **Top Path (from G⁺):**
* Process: "Constrained BFS-tree" applied to G⁺.
* Parameters: "Max path length L=2", "Max path amount N=2".
* Output: "Path from Constrained BFS-tree" showing sequences of nodes (e.g., yellow→blue→white). Labeled with "Positive relevance probability" `p_e⁺(v_j|v_i)`.
* Result: "Real positive edges" (blue lines between nodes).
* **Bottom Path (from G⁻):**
* Process: "Constrained BFS-tree" applied to G⁻.
* Parameters: "Max path length L=4", "Max path amount N=2".
* Output: "Path from Constrained BFS-tree" showing longer sequences. Labeled with "Negative relevance probability" `p_e⁻(v_j|v_i)`.
* Result: "Fake negative edges" (red lines between nodes, one node is shaded red).
* **Central Block:** "Embedding Space (Φ_e)". Arrows labeled "Update" point from the path outputs to this block. Dashed red arrows labeled "Guidance: Post-processing" connect this block to a central pink box labeled "Downstream tasks". Another dashed arrow points from the downstream tasks box back to the embedding space.
3. **Section (iii) - Gradient Processing:**
* **Top Path (Continuation):**
* Input: "Real positive edges" from section (ii).
* Process: Feeds into a block labeled "D⁺".
* Gradient Operations: Shows multiple gradient symbols (∇) with "Gradient Clipping" applied. These are summed (Σ) and averaged (1/n).
* Final Step: "Gradient Average" leading to "Noise Addition" (represented by a bell curve icon) and then "DPSGD" (Differentially Private Stochastic Gradient Descent).
* Output: A node labeled "Real".
* **Bottom Path (Continuation):**
* Input: "Fake negative edges" from section (ii).
* Process: Feeds into a block labeled "D⁻".
* Gradient Operations: Identical structure to the top path—gradient clipping, summation, averaging.
* Final Step: "Gradient Average" leading to "Noise Addition" and "DPSGD".
* Output: A node labeled "Fake".
* **Legend (Top-Right of this section):** Shows "Real" (white circle) and "Fake" (red-shaded circle).
**Textual Annotations & Mathematical Notation:**
- "Directly linked" (appears twice, connecting G to G⁺/G⁻ and the BFS-tree outputs to D⁺/D⁻).
- Probability notations: `p_e⁺(v_j|v_i)` and `p_e⁻(v_j|v_i)`.
(Figure 3 image: a dual-path pipeline. Constrained BFS-trees on the positive subgraph G⁺ and the negative subgraph G⁻ feed a shared embedding space Φ_e, which guides downstream tasks via a post-processing feedback loop. Real and fake positive/negative edges are scored by discriminators D⁺ and D⁻, whose gradients are clipped, averaged, and perturbed with Gaussian noise via DPSGD before the update step.)
</details>
Figure 3. Overview of the ASGL framework: (i) a signed graph is decomposed into positive and negative subgraphs; (ii) node pairs are mapped into a unified embedding space while preserving signed proximity; (iii) to ensure privacy, an adversarial learning module with DPSGD generates private node embeddings that approximate the true connectivity distribution without cascading errors; (iv) a constrained BFS-tree strategy manages the node receptive field, reduces gradient noise, and improves model utility.
### 3.2. Existing Solutions
To the best of our knowledge, existing differentially private graph learning methods follow two main tracks: gradient perturbation and edge perturbation. In the first category, Yang et al. (54) introduce a privacy-preserving generative model that combines generative adversarial networks (GAN) or variational autoencoders (VAE) with DPSGD to protect edge privacy, while Xiang et al. (52) design a node sampling mechanism that adds Laplace noise to per-subgraph gradients, achieving node-level DP. Among the edge perturbation-based methods, Lin et al. (53) use randomized response to perturb the adjacency matrix for edge-level privacy, and EDGERAND (42) perturbs the graph structure while preserving sparsity by clipping the adjacency matrix according to a privacy-calibrated graph density.
Limitation. The aforementioned solutions are not directly applicable to signed graphs. This is primarily because edge perturbation can lead to cascading errors when inferring edge signs under balance theory. Moreover, gradient perturbation often suffers from high sensitivity caused by complex node dependencies and gradient polarity reversal from edge sign flips, leading to excessive noise and degraded model utility.
## 4. Our Proposal: ASGL
To tackle the above limitations, we present ASGL, a DP-based adversarial signed graph learning model that integrates a constrained BFS-tree strategy to achieve favorable utility-privacy tradeoffs.
### 4.1. Overview
The ASGL framework, illustrated in Fig. 3, comprises three steps:
- Private Adversarial Signed Graph Learning. The signed graph $G$ is first split into positive and negative subgraphs, $G^+$ and $G^-$ , based on edge signs. Subsequently, two discriminators, $D^+$ and $D^-$ , sharing parameters $θ_D$ , are trained to distinguish real from fake positive and negative edges. Guided by $D^+$ and $D^-$ , two generators $G^+$ and $G^-$ with shared parameters $θ_G$ generate node embeddings that approximate the true connectivity distribution. To ensure node-level DP, we apply gradient perturbation during discriminator training instead of directly perturbing edges. This strategy mitigates cascading errors and prevents gradient polarity reversals caused by edge sign flips, thereby reducing gradient sensitivity. By the post-processing property, the generators also preserve node-level DP.
- Optimization via Constrained BFS-tree. To further reduce gradient sensitivity and the required noise scale, ASGL employs a constrained BFS-tree strategy. By empirically limiting the number and length of paths, each node’s receptive field is restricted, which reduces node dependency and enables gradient decoupling. This significantly lowers gradient sensitivity and enhances model utility under differential privacy constraints.
- Privacy Accounting and Complexity Analysis. The complete training process for ASGL is outlined in Algorithm 2 (see App. F.3). Based on this, we present a comprehensive privacy accounting and computational complexity analysis for ASGL.
### 4.2. Private Adversarial Signed Graph Learning
Motivated by (03; 14), a signed graph $G$ is first divided into a positive subgraph $G^+$ and a negative subgraph $G^-$ according to edge signs. Let $N(v_r)$ be the set of neighbor nodes directly connected to node $v_r$. We denote the true positive and negative connectivity distributions of $v_r$ over its neighborhood $N(v_r)$ as the conditional probabilities $p^+_{\mathrm{true}}(\cdot\mid v_r)$ and $p^-_{\mathrm{true}}(\cdot\mid v_r)$, which capture the preference of $v_r$ to connect with other nodes in $V$. The adversarial learning for the signed graph $G$ is conducted by two adversarial learning modules:
Generators $G^+$ and $G^-$: Through optimizing the shared parameters $θ_G$, generators $G^+$ and $G^-$ aim to approximate the underlying true connectivity distribution and generate the most likely but unconnected nodes $v_t\notin N(v_r)$ that are relevant to a given node $v_r$. To this end, we estimate the relevance probabilities of these fake node pairs. (The term "fake" indicates that although a node $v_t$ selected by the generator is relevant to $v_r$, there is no actual edge between them.) Specifically, for the implementation of $G^+$, given the fake positive node pairs $(v_r,v_t)^+$, we use the graph softmax function (03) to calculate the fake positive connectivity probability:
$$
p^+_{\mathrm{fake}}(v_t\mid v_r)=G^+\left(v_t\mid v_r;\theta_G\right)=\sigma\left(g_{v_t}^\top g_{v_r}\right)=\frac{1}{1+\exp\left(-g_{v_t}^\top g_{v_r}\right)}, \tag{4}
$$
where $g_{v_t},g_{v_r}\in\mathbb{R}^k$ are the $k$-dimensional embedding vectors of nodes $v_t$ and $v_r$, respectively, and $\theta_G$ is the union of all $g_v$'s. The output $G^+(v_t\mid v_r;\theta_G)$ increases as the distance between $v_r$ and $v_t$ decreases in the embedding space of the generator $G^+$. Similarly, for the generator $G^-$, given the fake negative node pairs $(v_r,v_t)^-$, we estimate their fake negative connectivity probability:
$$
p^-_{\mathrm{fake}}(v_t\mid v_r)=G^-\left(v_t\mid v_r;\theta_G\right)=1-\sigma\left(g_{v_t}^\top g_{v_r}\right)=\frac{\exp\left(-g_{v_t}^\top g_{v_r}\right)}{1+\exp\left(-g_{v_t}^\top g_{v_r}\right)}. \tag{5}
$$
Here, Eq. (5) ensures that node pairs with higher negative connectivity probabilities are mapped farther apart in the embedding space of $G^-$ . Since generators $G^+$ and $G^-$ share the parameters $θ_G$ , they jointly learn the proximity and separation of positive and negative node pairs in a unified embedding space, respectively.
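The two probabilities in Eqs. (4)-(5) are complementary: a minimal sketch, assuming plain numpy vectors for the generator embeddings (function names are illustrative, not from the paper's code):

```python
import numpy as np

def fake_positive_prob(g_t: np.ndarray, g_r: np.ndarray) -> float:
    """Eq. (4): p_fake^+(v_t | v_r) = sigma(g_t^T g_r); grows as the
    embeddings of v_t and v_r move closer together."""
    return 1.0 / (1.0 + np.exp(-(g_t @ g_r)))

def fake_negative_prob(g_t: np.ndarray, g_r: np.ndarray) -> float:
    """Eq. (5): p_fake^-(v_t | v_r) = 1 - sigma(g_t^T g_r); grows as the
    embeddings move farther apart."""
    return 1.0 - fake_positive_prob(g_t, g_r)
```

Because the two values sum to one over the same inner product, sharing $θ_G$ lets $G^+$ pull positive pairs together while $G^-$ pushes negative pairs apart in the same space.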
Notably, the aforementioned fake node pairs $(v_r,v_t)^+$ and $(v_r,v_t)^-$ are sampled by a breadth-first search (BFS)-tree strategy (27). Compared to depth-first search (DFS) (56), BFS ensures more uniform exploration of neighboring nodes and can be integrated with random walk techniques (29) to improve computational efficiency. Specifically, we perform BFS on the positive subgraph $G^+$ to construct a BFS-tree $T^+_{v_r}$ rooted at node $v_r$. Then, we calculate the positive relevance probability of node $v_r$ with its neighbors $v_k\in N(v_r)$:
$$
p^+_{T^+_{v_r}}(v_k\mid v_r)=\frac{\exp\left(g_{v_k}^\top g_{v_r}\right)}{\sum_{v_k\in N(v_r)}\exp\left(g_{v_k}^\top g_{v_r}\right)}, \tag{6}
$$
which is actually a softmax function over $N(v_r)$. To further sample node pairs unconnected in $T^+_{v_r}$ as fake positive edges, we perform a random walk on $T^+_{v_r}$: starting from the root node $v_r$, a path $P_{rt}: v_r\to v_t$ is built by iteratively selecting the next node based on the transition probabilities defined in Eq. (6). The resulting unconnected node pair $(v_r,v_t)^+$ is treated as a fake positive edge, and App. E provides an example of this process. Given the node pair $(v_r,v_t)^+$, the generator $G^+$ estimates $p^+_{\mathrm{fake}}(v_t\mid v_r)$ according to Eq. (4).
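The walk above can be sketched as follows, representing the BFS-tree as a dict of child lists and `emb` as a node-to-vector map (illustrative names). One plausible reading of the transition rule is a softmax over the current node's children at each step:

```python
import numpy as np

def random_walk_on_tree(children, emb, root, rng, max_steps=10):
    """Walk down a BFS-tree from `root`, choosing each next node among the
    current node's children with probability proportional to
    exp(g_child^T g_current) -- a softmax restricted to tree neighbors,
    in the spirit of Eq. (6)."""
    path, cur = [root], root
    for _ in range(max_steps):
        kids = children.get(cur, [])
        if not kids:
            break  # reached a leaf
        scores = np.array([np.exp(emb[c] @ emb[cur]) for c in kids])
        cur = kids[rng.choice(len(kids), p=scores / scores.sum())]
        path.append(cur)
    return path  # (path[0], path[-1]) is a candidate fake positive pair
```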
Similarly, we also establish a BFS-tree $T^-_{v_r}$ rooted at node $v_r$ in the negative subgraph $G^-$. To obtain the negative node pair $(v_r,v_t)^-$, we perform a random walk on $T^-_{v_r}$ according to the following transition probability (i.e., negative relevance probability):
$$
p^-_{T^-_{v_r}}(v_k\mid v_r)=\frac{1-\exp\left(g_{v_k}^\top g_{v_r}\right)}{\sum_{v_k\in N(v_r)}\left(1-\exp\left(g_{v_k}^\top g_{v_r}\right)\right)}. \tag{7}
$$
In particular, the edge sign of the negative node pair $(v_r,v_t)^-$ depends on the length of the path $P_{rt}: v_r\to v_t$. According to the balance theory introduced in Section 2.1, the edge sign of a multi-hop node pair corresponds to the product of the edge signs along the path. Accordingly, the rules for generating fake negative edges within $P_{rt}$ are defined as follows: (1) if the path length of $P_{rt}$ is odd, the node pair $(v_r,v_t)^-$ formed by the root node $v_r$ and the last node $v_t$ is selected as a fake negative pair; (2) if the path length of $P_{rt}$ is even, the node pair $(v_r,v_t)^-$ formed by the root node $v_r$ and the second-to-last node $v_t$ is selected as a fake negative pair. The resulting node pair $(v_r,v_t)^-$ is then used to compute $p^-_{\mathrm{fake}}(v_t\mid v_r)$ according to Eq. (5).
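The parity rule follows directly from balance theory: on the negative subgraph every edge is negative, so a path of length $\ell$ carries sign $(-1)^\ell$. A minimal sketch (function name is illustrative):

```python
def fake_negative_pair(path):
    """Balance-theory parity rule on the negative subgraph: a walk of
    length l (number of edges) carries sign (-1)^l. Odd l -> pair the
    root with the last node; even l -> back off to the second-to-last
    node, restoring odd parity and hence a negative implied sign.
    `path` is the node sequence of the walk."""
    root, length = path[0], len(path) - 1
    return (root, path[-1]) if length % 2 == 1 else (root, path[-2])
```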
Discriminators $D^+$ and $D^-$: This module tries to distinguish real node pairs from fake node pairs synthesized by the generators $G^+$ and $G^-$. Accordingly, the discriminators $D^+$ and $D^-$ estimate the likelihood that a positive or negative edge exists between $v_r$ and $v\in V$, respectively, denoted as:
$$
D^+(v,v_r\mid\theta_D)=\sigma\left(d_v^\top d_{v_r}\right)=\frac{1}{1+\exp\left(-d_v^\top d_{v_r}\right)}, \tag{8}
$$

$$
D^-(v,v_r\mid\theta_D)=1-\sigma\left(d_v^\top d_{v_r}\right)=\frac{\exp\left(-d_v^\top d_{v_r}\right)}{1+\exp\left(-d_v^\top d_{v_r}\right)}, \tag{9}
$$
where $d_v,d_{v_r}\in\mathbb{R}^k$ are the vectors corresponding to the $v$-th and $v_r$-th rows of the shared parameters $\theta_D$, respectively, and $\sigma(\cdot)$ denotes the sigmoid function applied to the inner product of these two vectors.
In summary, given real positive and real negative edges sampled from $p_true^+(·|v_r)$ and $p_true^-(·|v_r)$ , along with fake positive and fake negative edges generated from generators $G^+/G^-$ , the adversarial learning pairs $(D^+,G^+)$ and $(D^-,G^-)$ , operating on the positive subgraph $G^+$ and the negative subgraph $G^-$ , respectively, engage in a four-player mini-max game with the joint loss function:
$$
\begin{aligned}
\min_{\theta_G}\max_{\theta_D} L\left(G^+,G^-,D^+,D^-\right)
={}&\sum_{v_r\in V^+}\Big(\mathbb{E}_{v\sim p^+_{\mathrm{true}}(\cdot\mid v_r)}\big[\log D^+(v,v_r\mid\theta_D)\big]\\
&\qquad+\mathbb{E}_{v\sim G^+(\cdot\mid v_r;\theta_G)}\big[\log\big(1-D^+(v,v_r\mid\theta_D)\big)\big]\Big)\\
&+\sum_{v_r\in V^-}\Big(\mathbb{E}_{v\sim p^-_{\mathrm{true}}(\cdot\mid v_r)}\big[\log D^-(v,v_r\mid\theta_D)\big]\\
&\qquad+\mathbb{E}_{v\sim G^-(\cdot\mid v_r;\theta_G)}\big[\log\big(1-D^-(v,v_r\mid\theta_D)\big)\big]\Big).
\end{aligned} \tag{10}
$$
Based on Eq. (10), the parameters $θ_D$ and $θ_G$ are updated alternately by maximizing and minimizing the joint loss function. Competition between $G$ and $D$ results in mutual improvement until the fake node pairs generated by $G$ are indistinguishable from the real ones, thus approximating the true connectivity distribution. Lastly, the learned node embeddings $g_v∈θ_G$ are used in downstream tasks.
How to Achieve DP? Given real and fake positive/negative edges of the node $v_i$, the corresponding node embedding $d_{v_i}\in\theta_D$ is updated by ascending gradients of the joint loss function in Eq. (10):
$$
\frac{\partial L_D}{\partial d_{v_i}}=
\begin{cases}
\dfrac{\partial\log D^+(v_i,v_j\mid\theta_D)}{\partial d_{v_i}}=\big[1-\sigma(d_{v_j}^\top d_{v_i})\big]\,d_{v_j}, & \text{if }(v_i,v_j)\text{ is a real positive edge from }G^+;\\[2mm]
\dfrac{\partial\log\big(1-D^+(v_i,v_j\mid\theta_D)\big)}{\partial d_{v_i}}=-\sigma(d_{v_j}^\top d_{v_i})\,d_{v_j}, & \text{if }(v_i,v_j)\text{ is a fake positive edge from }G^+;\\[2mm]
\dfrac{\partial\log D^-(v_i,v_j\mid\theta_D)}{\partial d_{v_i}}=-\sigma(d_{v_j}^\top d_{v_i})\,d_{v_j}, & \text{if }(v_i,v_j)\text{ is a real negative edge from }G^-;\\[2mm]
\dfrac{\partial\log\big(1-D^-(v_i,v_j\mid\theta_D)\big)}{\partial d_{v_i}}=\big[1-\sigma(d_{v_j}^\top d_{v_i})\big]\,d_{v_j}, & \text{if }(v_i,v_j)\text{ is a fake negative edge from }G^-.
\end{cases} \tag{11}
$$
According to Definition 1, to achieve node-level differential privacy in adversarial signed graph learning, it is necessary to add Gaussian noise to the sum of clipped gradients over a batch of nodes. The resulting noisy gradient $\tilde{\nabla}L_D$ is formulated as:
$$
\tilde{\nabla}L_D=\frac{1}{B}\Big(\sum_{v_i\in V_B}\mathrm{Clip}_C\Big(\frac{\partial L_D}{\partial d_{v_i}}\Big)+\mathcal{N}\left(0,B^2C^2\sigma^2 I\right)\Big), \tag{12}
$$
where $V_B$ denotes the batch set of nodes, with batch size $B=|V_B|$ . $C$ is the clipping threshold to control gradient sensitivity. The fact that the gradient sensitivity reaches $BC$ is explained in Section 4.3.
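Eq. (12) is the standard DPSGD recipe of per-sample clipping, summation, noising, and averaging. A minimal numpy sketch (function name is illustrative):

```python
import numpy as np

def noisy_batch_gradient(per_node_grads, C, sigma, rng):
    """Eq. (12): clip each per-node gradient to l2-norm at most C, sum over
    the batch, add Gaussian noise scaled to the worst-case sensitivity B*C,
    then average by the batch size B."""
    B = len(per_node_grads)
    clipped = [g * min(1.0, C / max(np.linalg.norm(g), 1e-12))
               for g in per_node_grads]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, B * C * sigma, size=total.shape)
    return (total + noise) / B
```

With $\sigma=0$ this reduces to the ordinary mean of clipped gradients, which makes the noise term's role easy to isolate when debugging.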
**Remark 2**
*To achieve node-level DP, we perturb discriminator gradients instead of signed edges, avoiding cascading errors and gradient polarity reversals from edge sign flips (see Eq. (10)), which reduces gradient sensitivity. Furthermore, generators also preserve DP under discriminator guidance via the post-processing property of DP.*
### 4.3. Optimization via Constrained BFS-Tree
According to Eq. (11), in graph adversarial learning, the interdependence among samples implies that modifying a single node $v_i$ may affect the gradients of multiple other nodes $v_j$ within the same batch. This interdependence also exists among the fake node pairs generated along the BFS-tree paths. Consequently, in the worst case illustrated in Fig. 4 (a), all node samples within a batch may become interrelated through the BFS-tree, pushing the gradient sensitivity of the discriminators $D$ as high as $BC$. Such high sensitivity necessitates injecting substantial noise to satisfy node-level DP, hindering effective optimization and reducing model utility.
<details>
<summary>x4.png Details</summary>

(Figure 4 image: two views of the same node layout. (a) An unconstrained BFS-tree rooted at $v_r$, where every node in the batch is associated with $v_r$. (b) A constrained BFS-tree with dashed 1-hop and 2-hop arcs around $v_r$: nodes within two hops remain associated, nodes outside become unassociated, and edges crossing or lying beyond the 2-hop boundary are pruned, marked with red crosses.)
</details>
Figure 4. The receptive field of node $v_r$ within a batch is illustrated in two cases: (a) an unconstrained BFS-tree, where the receptive field size of $v_r$ is $B=|V_B|=34$; (b) a constrained BFS-tree with path length $L=2$ and path count $N=3$ per node, where the receptive field size of $v_r$ is $\sum_{l=0}^{L}N^l=13$.
To address the aforementioned challenge, we introduce the constrained BFS-tree strategy: as illustrated in Algorithm 1 (see App. F.2), when performing a random walk on the BFS-tree $T^+_{v_r}$ or $T^-_{v_r}$ rooted at $v_r\in V_{tr}$ to generate multiple unique paths, we limit both the number of sampled paths and their lengths by $N$ and $L$, respectively. Following this, the training set of subgraphs $S_{tr}$ composed of constrained paths is obtained. The rationale behind these settings is discussed below.
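The effect of the constraint can be sketched as follows. This is a simplification under stated assumptions: Algorithm 1 itself samples paths via random walks, whereas here we cap the branching factor at $N$ and the depth at $L$ directly during tree construction, which yields the same receptive-field bound $\sum_{l=0}^{L}N^l$:

```python
from collections import deque

def constrained_bfs_tree(adj, root, N, L):
    """Build a BFS-tree rooted at `root`, keeping at most N children per
    node and only nodes within L hops. With branching capped at N and
    depth at L, the tree holds at most sum_{l=0}^{L} N^l nodes."""
    depth, children = {root: 0}, {root: []}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if depth[u] == L:
            continue  # do not expand beyond L hops
        for w in adj.get(u, []):
            if w in depth or len(children[u]) >= N:
                continue
            depth[w] = depth[u] + 1
            children[u].append(w)
            children[w] = []
            queue.append(w)
    return children  # receptive field of root = len(children)
```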
**Theorem 1**
*By constraining both the number and length of paths generated via random walks on the BFS-trees to $N$ and $L$, respectively, the gradient sensitivity $\Delta_g$ of the discriminator can be reduced from $BC$ to $\frac{N^{L+1}-1}{N-1}C$. Empirical results in Section 5 demonstrate that ASGL achieves satisfactory performance even with a relatively small receptive field. Specifically, when setting $N=3$ and $L=4$, that is, $\frac{N^{L+1}-1}{N-1}=121<B=256$, ASGL still maintains good model utility. Thus, the noisy gradient $\tilde{\nabla}L_D$ of the discriminator within a mini-batch $B_t$ is denoted as:*
$$
\tilde{\nabla}L_D=\frac{1}{|B_t|}\Big(\sum_{v\in B_t}\mathrm{Clip}_C\Big(\frac{\partial L_D}{\partial d_v}\Big)+\mathcal{N}\left(0,\Delta_g^2\sigma^2 I\right)\Big), \tag{13}
$$
*where the gradient sensitivity $\Delta_g=\frac{N^{L+1}-1}{N-1}C$.*
Proof of Theorem 1. Let the sum of clipped gradients over a batch of subgraphs be $g_t(G)=\sum_{v\in B_t}\mathrm{Clip}_C\big(\frac{\partial L_D}{\partial d_v}\big)$, where $B_t$ represents any choice of batch subgraphs from $S_{tr}$. Consider a node-level adjacent graph $G'$ formed by removing a node $v^*$ together with its associated edges from $G$. We obtain the corresponding training sets of subgraphs $S_{tr}$ and $S'_{tr}$ via the SAMPLE-SUBGRAPHS method in Algorithm 1, denoted as:
$$
S_{tr}=\mathrm{SAMPLE\text{-}SUBGRAPHS}(G,V_{tr},N,L),\qquad S'_{tr}=\mathrm{SAMPLE\text{-}SUBGRAPHS}(G',V_{tr},N,L). \tag{14}
$$
The only subgraphs that differ between $S_{tr}$ and $S'_{tr}$ are those that involve the node $v^*$. Let $S(v^*)$ denote the set of such subgraphs, i.e., $S(v^*)=S_{tr}\setminus S'_{tr}$. According to Lemma 1 in App. G, the number of such subgraphs $|S(v^*)|$ is at most $R_{N,L}$. Thus, in any mini-batch training, the only gradient terms $\frac{\partial L_D}{\partial d_v}$ affected by the removal of node $v^*$ are those associated with the subgraphs in $S(v^*)\cap B_t$:
$$
g_t(G)-g_t(G')=\sum_{v\in B_t}\mathrm{Clip}_C\Big(\frac{\partial L_D}{\partial d_v}\Big)-\sum_{v'\in B'_t}\mathrm{Clip}_C\Big(\frac{\partial L_D}{\partial d_{v'}}\Big)=\sum_{v,v'\in S(v^*)\cap B_t}\Big[\mathrm{Clip}_C\Big(\frac{\partial L_D}{\partial d_v}\Big)-\mathrm{Clip}_C\Big(\frac{\partial L_D}{\partial d_{v'}}\Big)\Big], \tag{15}
$$
where $B'_t=B_t\setminus\big(S(v^*)\cap B_t\big)$. Since each gradient term is clipped to have an $\ell_2$-norm of at most $C$, it holds that:
$$
\Big\|\mathrm{Clip}_C\Big(\frac{\partial L_D}{\partial d_v}\Big)-\mathrm{Clip}_C\Big(\frac{\partial L_D}{\partial d_{v'}}\Big)\Big\|_F\le C. \tag{16}
$$
In the worst case, all subgraphs in $S(v^*)$ appear in $B_t$ , so we bound the $\ell_2$ -norm of the following quantity based on Lemma 2 in App. G:
$$
\|g_t(G)-g_t(G')\|_F\le C\cdot R_{N,L}=C\cdot\frac{N^{L+1}-1}{N-1}. \tag{17}
$$
The same reasoning applies when $G^\prime$ is obtained by adding a new node $v^*$ to $G$ . Since $G$ and $G^\prime$ are arbitrary node-level adjacent graphs, the proof is complete.
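The receptive-field factor $R_{N,L}$ from Theorem 1 can be checked numerically; the helper name below is illustrative:

```python
def receptive_field(N: int, L: int) -> int:
    """R_{N,L} = (N^(L+1) - 1) / (N - 1) = sum_{l=0}^{L} N^l for N > 1,
    i.e. the maximum number of subgraphs a single node can influence."""
    return (N ** (L + 1) - 1) // (N - 1)

# The setting used in the paper: N = 3, L = 4 gives a sensitivity factor
# of 121, well below the batch-size bound B = 256 of the unconstrained case.
r = receptive_field(3, 4)
```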
### 4.4. Privacy and Complexity Analysis
The complete training process for ASGL is outlined in Algorithm 2 (see App. F.3). In this section, we present a comprehensive privacy analysis and computational complexity analysis for ASGL.
Privacy Accounting. We adopt the functional perspective of Rényi Differential Privacy (RDP; see App. C) to analyze the privacy budget of ASGL, as summarized below:
**Theorem 2**
*Given the training set size $N_{tr}$, the number of epochs $n^{\mathrm{epoch}}$, the number of discriminator iterations $n^{\mathrm{iter}}$, the batch size $B_d$, the maximum path length $L$, and the maximum path number $N$, over $T=n^{\mathrm{epoch}}n^{\mathrm{iter}}$ iterations, Algorithm 2 satisfies node-level $(\alpha,2T\gamma)$-RDP, where $\gamma=\frac{1}{\alpha-1}\ln\Big(\sum_{i=0}^{R_{N,L}}\beta_i\exp\big(\frac{\alpha(\alpha-1)i^2}{2\sigma^2 R_{N,L}^2}\big)\Big)$, $R_{N,L}=\frac{N^{L+1}-1}{N-1}$, and $\beta_i=\binom{R_{N,L}}{i}\binom{N_{tr}-R_{N,L}}{B_d-i}\big/\binom{N_{tr}}{B_d}$. Please refer to App. I for the proof.*
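The per-iteration bound $\gamma$ of Theorem 2 is a hypergeometric mixture and can be evaluated directly; the following is a numeric sketch with an illustrative function name:

```python
from math import comb, exp, log

def rdp_gamma(alpha, sigma, N_tr, B_d, N, L):
    """Per-iteration RDP bound gamma of order alpha from Theorem 2 (sketch).
    beta_i is the hypergeometric probability that i of the R_{N,L}
    affected subgraphs land in a batch of size B_d drawn from N_tr."""
    R = (N ** (L + 1) - 1) // (N - 1)
    total = 0.0
    for i in range(min(R, B_d) + 1):  # beta_i = 0 once i exceeds B_d
        beta_i = comb(R, i) * comb(N_tr - R, B_d - i) / comb(N_tr, B_d)
        total += beta_i * exp(alpha * (alpha - 1) * i ** 2
                              / (2 * sigma ** 2 * R ** 2))
    return log(total) / (alpha - 1)
```

Over $T$ iterations the accumulated budget is $2T\gamma$, which can then be converted from RDP to the usual $(\varepsilon,\delta)$-DP guarantee.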
Complexity Analysis. To analyze the time complexity of training ASGL (App. F.3), we break down the major computations. The outer loop runs for $n^{\mathrm{epoch}}$ epochs, and in each epoch, the discriminators $D^+$ and $D^-$ are trained for $n^{\mathrm{iter}}$ iterations. Each iteration samples a batch of $B_d$ real and fake edges to update $\theta_D$, with DP-related updates incurring complexity $O(B_d k\xi)$, where $\xi$ is the sampling probability and $k$ is the embedding dimension (08; 17). Thus, each epoch of $D^+$ or $D^-$ costs $O(n^{\mathrm{iter}}B_d k(1+\xi))$. For the generators $G^+$ and $G^-$, each iteration samples $B_g$ fake edges to update $\theta_G$, resulting in per-epoch complexity $O(n^{\mathrm{iter}}B_g k)$. In total, ASGL's overall time complexity over $n^{\mathrm{epoch}}$ epochs is $O\left(2n^{\mathrm{epoch}}n^{\mathrm{iter}}(B_d+B_g)(1+\xi)k\right)$. This complexity is linear in the number of iterations and the batch size, demonstrating the scalability of ASGL to large-scale graphs.
## 5. Experiments
In this section, we design experiments to answer the following questions: (1) How do key parameters affect the performance of ASGL (see Section 5.2)? (2) How much does the privacy budget affect the performance of ASGL and other private signed graph learning models in edge sign prediction (see Section 5.3)? (3) How much does the privacy budget affect the performance of ASGL and other baselines in node clustering (see Section 5.4)? (4) How resilient is ASGL in defending against link stealing attacks (see Section 5.5)?
Table 1. Overview of the datasets

| Dataset | #Nodes | #Edges | #Positive edges (%) | #Negative edges (%) |
| --- | --- | --- | --- | --- |
| Bitcoin-Alpha | 3,783 | 14,081 | 12,769 (90.7%) | 1,312 (9.3%) |
| Bitcoin-OTC | 5,881 | 21,434 | 18,281 (85.3%) | 3,153 (14.7%) |
| WikiRfA | 11,258 | 185,627 | 144,451 (77.8%) | 41,176 (22.2%) |
| Slashdot | 13,182 | 36,338 | 30,914 (85.1%) | 5,424 (14.9%) |
| Epinions | 131,828 | 841,372 | 717,690 (85.3%) | 123,682 (14.7%) |
### 5.1. Experimental Settings
Datasets. To comprehensively evaluate our ASGL method, we conduct extensive experiments on five real-world datasets: Bitcoin-Alpha, Bitcoin-OTC, and WikiRfA (collected from https://snap.stanford.edu/data), as well as Slashdot and Epinions (collected from https://www.aminer.cn). These datasets are treated as undirected signed graphs, with their detailed statistics summarized in Table 1 and App. J.1.
Competitive Methods. To the best of our knowledge, this work is the first to address the problem of differentially private signed graph learning while aiming to preserve model utility. Due to the absence of prior studies in this area, we construct baselines by integrating four state-of-the-art signed graph learning methods—SGCN (36), SiGAT (38), LSNE (37), and SDGNN (39) —with the DPSGD mechanism. Since these models primarily leverage structural information, we further include the private graph learning method GAP (40), using Truncated SVD-generated spectral features (36) as input to ensure a fair comparison involving node features.
Evaluation Metrics. For edge sign prediction tasks, we follow the evaluation procedures in (14; 38; 39). Specifically, we first generate embedding vectors for all nodes in the training set using each comparative method. Then, we train a logistic regression classifier using the concatenated embeddings of node pairs as input features. Finally, we use the trained classifier to predict edge signs in the test set for each method. Considering the class imbalance between positive and negative edges (see Table 1), we adopt the area under curve (AUC) as the evaluation metric to ensure a fair comparison.
For node clustering, to fairly evaluate the clustering effect of node embeddings, we compute the average cosine distance for both positive and negative node pairs: $CD^+=\sum_{(v_i,v_j)\in E^+}\mathrm{Cos}(Z_i,Z_j)/|E^+|$ and $CD^-=\sum_{(v_n,v_m)\in E^-}\mathrm{Cos}(Z_n,Z_m)/|E^-|$, where $Z_i$ is the node embedding generated by each comparative method, and $\mathrm{Cos}(\cdot)$ represents the cosine distance between node embeddings. Then we propose the symmetric separation index (SSI) to measure the clustering degree between the embeddings of positive and negative node pairs in the test set, denoted as $SSI=1/(|CD^+-1|+|CD^-+1|)$. A higher SSI indicates better structural proximity, with positive node pairs more tightly clustered and negative pairs more clearly separated in the unified embedding space.
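A minimal sketch of the SSI computation, reading $\mathrm{Cos}(\cdot)$ as cosine similarity (the targets $CD^+\to 1$ and $CD^-\to -1$ in the SSI formula suggest similarity rather than distance; names are illustrative):

```python
import numpy as np

def ssi(Z, pos_edges, neg_edges):
    """Symmetric separation index. CD+ / CD- are the mean cosine values of
    positive / negative test pairs; the ideal values are +1 and -1, so
    SSI = 1 / (|CD+ - 1| + |CD- + 1|) grows as both targets are approached."""
    def cos(i, j):
        return float(Z[i] @ Z[j]
                     / (np.linalg.norm(Z[i]) * np.linalg.norm(Z[j])))
    cd_pos = np.mean([cos(i, j) for i, j in pos_edges])
    cd_neg = np.mean([cos(i, j) for i, j in neg_edges])
    return 1.0 / (abs(cd_pos - 1.0) + abs(cd_neg + 1.0))
```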
Parameter Settings. For both edge sign prediction and node clustering tasks, we set the dimensionality of all node embeddings, $d_v$ and $g_v$, to 128, following standard practice in prior work (41; 14). ASGL adopts DPSGD-based optimization, where the total number of training epochs is determined by the moments accountant (MA) (04), which offers tighter privacy tracking across multiple iterations. We set the iteration number $n^{\mathrm{iter}}$ to 10 for Bitcoin-Alpha and Bitcoin-OTC, 15 for WikiRfA and Slashdot, and 20 for Epinions. Since all comparative methods are trained using DPSGD, their number of training epochs depends on the privacy budget. As discussed in Section 5.2, the maximum path number $N$ and path length $L$ are varied to analyze their impact on ASGL's utility. For privacy parameters, we follow (02; 51; 08) by fixing $\delta=10^{-5}$ and $C=1$, and vary the privacy budget $\varepsilon\in\{1,2,\dots,6\}$ to evaluate utility under different privacy levels. To ensure fair comparison, we modify the official GitHub implementations of all baselines and adopt the best hyperparameter settings reported in their original papers. To minimize random errors, each experiment is repeated five times.
### 5.2. Impact of Key Parameters
In this section, we perform experiments on two datasets by varying the maximum number $N$ and the maximum length $L$ of paths in the BFS-trees, providing a rationale for parameter selection.
#### 5.2.1. The effect of the parameter $N$
As discussed in Section 4.3, the greater the number of neighbors a root node has, the more paths can be obtained through random walks; hence the maximum number of paths $N$ also depends on node degrees. As shown in Fig. 8 (see App. J.2), for the Bitcoin-Alpha and Slashdot datasets, most nodes in the signed graphs have degrees below 3. We investigate the impact of $N$ by varying its value within $\{2,3,4,5,6\}$. As shown by the average AUC results in Table 2, the proposed ASGL method achieves optimal edge sign prediction performance at $N=3$ for Bitcoin-Alpha and $N=4$ for Slashdot. Considering both gradient sensitivity and computational efficiency, we adopt $N=3$ for subsequent experiments.
#### 5.2.2. The effect of the parameter $L$
In this experiment, we evaluate the impact of the path length $L$ on the utility of ASGL by varying its value. As shown in Table 3, ASGL achieves the best performance on both datasets when $L=4$ . This result is closely aligned with the structural characteristics of the signed graphs: As summarized in Fig. 9 (see App. J.2), most node pairs in these datasets exhibit maximum path lengths of 3 or 4. Therefore, in subsequent experiments, we set $L=4$ , as it adequately covers the receptive field of most nodes.
Table 2. Summary of average AUC with different maximum path counts $N$ under $\varepsilon=3$ and $L=3$. (BOLD: Best)

| Dataset | $N=2$ | $N=3$ | $N=4$ | $N=5$ | $N=6$ |
| --- | --- | --- | --- | --- | --- |
| Bitcoin-Alpha | 0.8025 | **0.8562** | 0.8557 | 0.8498 | 0.8553 |
| Slashdot | 0.7723 | 0.8823 | **0.8888** | 0.8871 | 0.8881 |
Table 3. Summary of average AUC with different path lengths $L$ under $\varepsilon=3$ and $N=3$. (BOLD: Best)

| Dataset | $L=2$ | $L=3$ | $L=4$ | $L=5$ | $L=6$ |
| --- | --- | --- | --- | --- | --- |
| Bitcoin-Alpha | 0.7409 | 0.8443 | **0.8587** | 0.8545 | 0.8516 |
| Slashdot | 0.7629 | 0.8290 | **0.8833** | 0.8809 | 0.8807 |
Figure 5. AUC vs. Privacy cost ( $ε$ ) of private signed graph learning methods in edge sign prediction.
Figure 6. Symmetric separation index (SSI) vs. Privacy cost ( $ε$ ) of private signed graph learning methods in node clustering.
### 5.3. Impact of Privacy Budget on Edge Sign Prediction
To evaluate the effectiveness of different private graph learning methods on edge sign prediction, we compare their AUC scores under privacy budgets $ε$ ranging from 1 to 6, as shown in Fig. 5 and Table 6 (see App. J.3). The proposed ASGL consistently outperforms all baselines across all privacy levels and datasets, owing to its ability to generate node embeddings that preserve connectivity distributions while satisfying DP guarantees. Although SDGNN achieves sub-optimal performance, it exhibits a noticeable gap from ASGL under limited privacy budgets ( $ε<4$ ). SiGAT, SGCN, and LSNE employ the moments accountant (MA) to mitigate excessive privacy budget consumption, yet still suffer from poor convergence and degraded utility under limited privacy budgets. GAP adopts aggregation perturbation to ensure node-level DP, but its performance is limited due to noisy neighborhood information, hindering its ability to capture structural information for edge prediction tasks.
### 5.4. Impact of Privacy Budget on Node Clustering
To further examine the capability of ASGL in preserving signed node proximity, we conduct a fair comparison across multiple private graph learning methods using the SSI metric. As shown in Fig. 6 and Table 7 (see App. J.4), ASGL consistently outperforms all baselines across different datasets and privacy budgets, demonstrating that ASGL is capable of generating node embeddings that effectively preserve signed node proximity. Notably, GAP achieves the second-best clustering performance on most datasets (excluding Slashdot), benefiting from its ability to leverage node features for clustering nodes. Nevertheless, to guarantee node-level DP, GAP needs to repeatedly query sensitive graph information in every training iteration, resulting in significantly higher privacy costs.
### 5.5. Resilience Against Link Stealing Attack
To assess the effectiveness of ASGL in preserving the privacy of edge information, we perform link stealing attacks (LSA) across all datasets and compare the resilience of all methods to such attacks in edge sign prediction tasks. The LSA setup is detailed in App. J.5. Attack performance is measured by the AUC score, averaged over five independent runs. Table 4 summarizes the effectiveness of LSA on various trained target models and datasets. It can be observed that as the privacy budget $ε$ increases, the average AUC of LSA consistently improves, indicating weaker privacy protection of the target models and an increased attack success rate. Overall, the average AUC of the attack is close to 0.50 in most cases, indicating that edge inference is largely unsuccessful and that DP is robust against such attacks. When $ε=3$ , ASGL demonstrates stronger resistance to LSA across most datasets, with AUC values consistently below 0.57. This suggests that ASGL offers defense performance comparable to other differentially private graph learning methods.
Table 4. The average AUC of LSA on different compared methods and datasets. (BOLD: Best resilience against LSA)
| $ε$ | Dataset |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Bitcoin-Alpha | 0.5072 | 0.7091 | 0.5079 | 0.5145 | 0.5404 | 0.5053 |
|  | Bitcoin-OTC | 0.5081 | 0.7118 | 0.5119 | 0.5409 | 0.5660 | 0.5466 |
|  | Slashdot | 0.5538 | 0.8232 | 0.5551 | 0.5609 | 0.5460 | 0.5325 |
|  | WikiRfA | 0.5148 | 0.5424 | 0.5427 | 0.5293 | 0.5470 | 0.5302 |
|  | Epinions | 0.7877 | 0.6329 | 0.5114 | 0.5129 | 0.5188 | 0.5092 |
| 3 | Bitcoin-Alpha | 0.5547 | 0.7514 | 0.5533 | 0.5542 | 0.5598 | 0.5430 |
|  | Bitcoin-OTC | 0.5655 | 0.7273 | 0.5684 | 0.5734 | 0.5765 | 0.5612 |
|  | Slashdot | 0.5742 | 0.8394 | 0.6267 | 0.5730 | 0.6464 | 0.5634 |
|  | WikiRfA | 0.5276 | 0.5466 | 0.5542 | 0.5696 | 0.5772 | 0.5624 |
|  | Epinions | 0.7981 | 0.6456 | 0.5588 | 0.5629 | 0.5665 | 0.5542 |
## 6. Related Work
Signed graph learning. In recent years, deep learning approaches have been increasingly adopted for signed graph learning. For example, SiNE (47) extracts signed structural information based on balance theory and designs an objective function to learn signed node proximity. Furthermore, the GNN model (36) and its variants (38; 39) are used to learn signed relationships between nodes in multi-hop neighborhoods. However, these GNN-based methods depend on the message-passing mechanism, which is sensitive to noisy interactions between nodes (49). To address this issue, Lee et al. (14) extend the adversarial framework to signed graphs by generating both positive and negative node embeddings. Still, these signed graph learning models are vulnerable to user-linkage attacks.
Private graph learning. Recent works have increasingly focused on developing DP methods to address privacy leakage in GNNs. For instance, Daigavane et al. (33) propose a DP-GNN method based on gradient perturbation. However, this method fails to balance utility and privacy due to excessive noise. Furthermore, GAP (40) and DPRA (50) are proposed to ensure the privacy of sensitive node embeddings by perturbing node aggregations. Despite their success in node classification, GAP repeatedly queries the private node information during training, which consumes more privacy budget under DPSGD. DPRA is not well-suited for signed graph embedding learning, as its edge perturbation strategy introduces cascading errors under balance theory.
## 7. Conclusion
In this paper, we propose ASGL, which achieves strong model utility while providing node-level DP guarantees. To address the cascading errors and gradient polarity reversals caused by edge sign flips, ASGL separately processes positive and negative subgraphs within a shared embedding space, using a DPSGD-based adversarial mechanism to learn high-quality node embeddings. To further reduce gradient sensitivity, we introduce a constrained BFS-tree strategy that limits node receptive fields and enables gradient decoupling. This effectively reduces the required noise scale and enhances model performance. Extensive experiments demonstrate that ASGL achieves a favorable privacy-utility trade-off. In future work, we will extend the ASGL framework to account for edge directions and weights.
Acknowledgements. This work was supported by the National Natural Science Foundation of China (Grant No: 62372122 and 92270123), and the Research Grants Council (Grant No: 15208923, 25207224, and 15207725), Hong Kong SAR, China.
## Appendix A Adversarial Learning on Graph
The adversarial learning model for graph embedding (03) is illustrated as follows. Let $N(v_r)$ be the node set directly connected to $v_r$ . We denote the underlying true connectivity distribution of node $v_r$ as the conditional probability $p(v|v_r)$ , which captures the preference of $v_r$ to connect with other nodes $v∈ V$ . In other words, the neighbor set $N(v_r)$ can be interpreted as a set of observed nodes drawn from $p(v|v_r)$ . The adversarial learning for the graph $G$ is conducted by the following two modules:
Generator $G$ : Through optimizing the generator parameters $θ_G$ , this module aims to approximate the underlying true connectivity distribution and generate (or select) the most likely nodes $v∈ V$ that are relevant to $v_r$ . The term “fake” indicates that although a node $v$ selected by the generator is relevant to $v_r$ , there is no actual edge between them. Specifically, the fake (i.e., estimated) connectivity distribution of node $v_r$ is calculated as:
$$
p^\prime(v|v_r)=G\left(v|v_r;θ_G\right)=\frac{\exp\left(g_v^⊤g_{v_r}\right)}{∑_{v^\prime≠ v_r}\exp\left(g_{v^\prime}^⊤g_{v_r}\right)}, \tag{18}
$$
where $g_v,g_{v_r}∈ℝ^k$ are the $k$ -dimensional vectors of nodes $v$ and $v_r$ , respectively, and $θ_G$ is the union of all $g_v$ ’s. To update $θ_G$ in each iteration, a set of node pairs $(v,v_r)$ , not necessarily directly connected, is sampled according to $p^\prime(v|v_r)$ . The key purpose of generator $G$ is to deceive the discriminator $D$ , and thus its loss function $L_G$ is determined as follows:
$$
L_G=\min_{θ_G}∑_{r=1}^{|V|}E_{v∼ G\left(·\mid v_r;θ_G\right)}\left[\log\left(1-D\left(v_r,v\mid θ_D\right)\right)\right], \tag{19}
$$
where the discriminant function $D(·)$ estimates the probability that a given node pair $(v,v_r)$ is considered real, i.e., directly connected.
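As a concrete illustration of Eq. (18), the following sketch (with random placeholder embeddings rather than the paper's trained parameters) computes the generator's estimated connectivity distribution for one root node:

```python
import numpy as np

# Sketch of the generator's fake connectivity distribution (Eq. 18):
# a softmax over inner products of k-dimensional node embeddings.
# The embeddings here are random placeholders, not trained parameters.

def fake_connectivity(theta_G: np.ndarray, r: int) -> np.ndarray:
    """Return p'(v | v_r) over all nodes v != v_r."""
    scores = theta_G @ theta_G[r]                 # inner products g_v^T g_{v_r}
    scores[r] = -np.inf                           # exclude v = v_r from the sum
    scores -= scores[np.isfinite(scores)].max()   # numerically stable softmax
    p = np.exp(scores)
    return p / p.sum()

rng = np.random.default_rng(0)
theta_G = rng.normal(size=(5, 3))                 # |V| = 5 nodes, k = 3 dimensions
p = fake_connectivity(theta_G, r=0)
assert abs(p.sum() - 1.0) < 1e-9 and p[0] == 0.0  # valid distribution excluding v_r
```

In ASGL the same construction is applied per subgraph, with $θ_G$ shared between $G^+$ and $G^-$ (Table 5).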
Discriminator $D$ : This module tries to distinguish between real node pairs and fake node pairs synthesized by the generator $G$ . Accordingly, the discriminator estimates the probability that an edge exists between $v_r$ and $v$ , denoted as:
$$
D(v_r,v|θ_D)=σ(d_v^⊤d_{v_r})=\frac{1}{1+\exp\left(-d_v^⊤d_{v_r}\right)}, \tag{20}
$$
where $d_v,d_{v_r}∈ℝ^k$ are the $k$ -dimensional vectors corresponding to the $v$ -th and $v_r$ -th rows of the discriminator parameters $θ_D$ , respectively, and $σ(·)$ denotes the sigmoid function of the inner product of these two vectors. Given the sets of real and fake node pairs, the loss function of $D$ can be derived as:
$$
L_D=\max_{θ_D}∑_{r=1}^{|V|}\left(E_{v∼ p\left(·\mid v_r\right)}\left[\log D\left(v,v_r\mid θ_D\right)\right]+E_{v∼ G\left(·\mid v_r;θ_G\right)}\left[\log\left(1-D\left(v_r,v\mid θ_D\right)\right)\right]\right). \tag{21}
$$
In summary, the generator $G$ and discriminator $D$ operate as two adversarial components: the generator $G$ aims to fit the true connectivity distribution $p(v|v_r)$ , generating candidate nodes $v$ that resemble the real neighbors of $v_r$ to deceive the discriminator $D$ . In contrast, the discriminator $D$ seeks to distinguish whether a given node is a true neighbor of $v_r$ or one generated by $G$ . Formally, $D$ and $G$ are engaged in a two-player minimax game with the following loss function:
$$
\min_{θ_G}\max_{θ_D}L(G,D)=∑_{r=1}^{|V|}\left(E_{v∼ p\left(·\mid v_r\right)}\left[\log D\left(v,v_r\mid θ_D\right)\right]+E_{v∼ G\left(·\mid v_r;θ_G\right)}\left[\log\left(1-D\left(v_r,v\mid θ_D\right)\right)\right]\right). \tag{22}
$$
Based on Eq. (22), the parameters $θ_D$ and $θ_G$ are updated by alternately maximizing and minimizing the loss function $L(G,D)$ . Competition between $G$ and $D$ results in mutual improvement until $G$ becomes indistinguishable from the true connectivity distribution.
## Appendix B Notation Introduction
The frequently used notations are summarized in Table 5.
Table 5. Notation Summary
| Notation | Description |
| --- | --- |
| $G,G^+,G^-$ | Signed graph, positive subgraph, negative subgraph |
| $V,E^+,E^-$ | Node set, positive and negative edge sets |
| $N(v_r)$ | Neighbor node set of node $v_r$ |
| $θ_D$ | Shared parameters of discriminators $D^+$ and $D^-$ |
| $θ_G$ | Shared parameters of generators $G^+$ and $G^-$ |
| $d_{v_r}$ | Node embedding for node $v_r$ of discriminators |
| $g_{v_r}$ | Node embedding for node $v_r$ of generators |
| $N,L$ | Maximum number and length of generated paths |
| $ε,δ$ | Privacy parameters |
| $N(0,σ^2)$ | Gaussian distribution with variance $σ^2$ |
| $P_{rt}$ | A path from rooted node $v_r$ to target node $v_t$ |
| $T^+_{v_r},T^-_{v_r}$ | Positive and negative BFS-trees rooted at $v_r$ |
| $p^+_{true}(·|v_r)$ | Positive connectivity distribution of $(v_r,v)∈ E^+$ |
| $p^-_{true}(·|v_r)$ | Negative connectivity distribution of $(v_r,v)∈ E^-$ |
| $p^+_{T^+_{v_r}}(v|v_r)$ | Positive relevance probability between $v_r$ and $v$ |
| $p^-_{T^-_{v_r}}(v|v_r)$ | Negative relevance probability between $v_r$ and $v$ |
## Appendix C Rényi Differential Privacy
Since standard DP can be overly strict for deep learning, we follow prior work (30; 31) and adopt an alternative definition—Rényi Differential Privacy (RDP) (07). RDP offers tighter and more efficient composition bounds, enabling more accurate estimation of cumulative privacy cost over multiple queries on graphs.
**Definition 0 (Rényi Differential Privacy(07))**
*The Rényi divergence quantifies the similarity between output distributions of a mechanism and is defined as:
$$
D_α(P\|Q)=\frac{1}{α-1}\log\left(∑_{x}P(x)^{α}Q(x)^{1-α}\right), \tag{23}
$$
where $P(x)$ and $Q(x)$ are probability distributions over the output space, and $α>1$ denotes the order of the divergence; its choice allows for different levels of sensitivity to the output distribution. Accordingly, an algorithm $M$ satisfies $(α,ε)$ -RDP if, for any two adjacent graphs $G$ and $G^\prime$ , the condition $D_α\left(M(G)\,\middle\|\,M\left(G^\prime\right)\right)≤ε$ holds.*
Since RDP is an extension of DP, it can be converted into ( $ε$ , $δ$ )-DP based on Proposition 3 in (07), as outlined below.
**Lemma 0 (Conversion from RDP to DP(07))**
*If a mechanism $M$ satisfies $(α,ε)$ -RDP, it also satisfies $(ε+\log(1/δ)/(α-1),δ)$ -DP for any $δ∈(0,1)$ .*
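The lemma's conversion is mechanical; the sketch below applies it with illustrative numbers ($α=10$ , $ε=1$ , $δ=10^{-5}$ are not values from the paper):

```python
import math

def rdp_to_dp(alpha: float, eps_rdp: float, delta: float) -> float:
    """Convert an (alpha, eps)-RDP guarantee into the epsilon of an
    (eps', delta)-DP guarantee, per the lemma above."""
    assert alpha > 1 and 0 < delta < 1
    return eps_rdp + math.log(1 / delta) / (alpha - 1)

eps_dp = rdp_to_dp(alpha=10, eps_rdp=1.0, delta=1e-5)
assert 2.27 < eps_dp < 2.29   # 1 + ln(1e5)/9 ≈ 2.279
```

In practice the order $α$ is chosen to minimize the converted $ε'$ over a grid, which is how accountants report a final $(ε,δ)$ pair.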
## Appendix D Gaussian Mechanism
Let $f$ be a function that maps a graph $G$ to $k$ -dimensional node vectors $Z∈ℝ^{|V|× k}$ . To ensure the RDP guarantees of $f$ , it is common to inject Gaussian noise into its output (07). The noise scale depends on the sensitivity of $f$ , defined as $Δ_f=\max_{G,G^\prime}\left\|f(G)-f\left(G^\prime\right)\right\|_2$ . Specifically, the privatized mechanism is defined as $M(G)=f(G)+N(0,σ^2I)$ , where $N(0,σ^2I)$ is the Gaussian distribution with zero mean and variance $σ^2$ . This results in an $(α,ε)$ -RDP mechanism $M$ for all $α>1$ with $ε=αΔ_f^2/2σ^2$ .
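A minimal sketch of this mechanism, assuming a toy output $f(G)$ and illustrative parameter values:

```python
import numpy as np

def gaussian_mechanism(f_out: np.ndarray, sigma: float, rng) -> np.ndarray:
    """M(G) = f(G) + N(0, sigma^2 I): perturb f's output with i.i.d. noise."""
    return f_out + rng.normal(scale=sigma, size=f_out.shape)

def rdp_epsilon(alpha: float, delta_f: float, sigma: float) -> float:
    """RDP cost of the Gaussian mechanism: eps = alpha * Delta_f^2 / (2 sigma^2)."""
    return alpha * delta_f ** 2 / (2 * sigma ** 2)

rng = np.random.default_rng(0)
Z = np.zeros((3, 2))                              # stand-in for f(G): |V| = 3, k = 2
noisy = gaussian_mechanism(Z, sigma=1.0, rng=rng)
assert noisy.shape == Z.shape
assert rdp_epsilon(alpha=2.0, delta_f=1.0, sigma=1.0) == 1.0
```

Since $ε$ scales with $Δ_f^2$ , halving the sensitivity (as the subgraph separation and gradient decoupling aim to do) quarters the RDP cost at a fixed noise scale.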
Input: Graph $G=\{G^+,G^-\}$ ; the training set of nodes $V_{tr}$ ; the maximum path length $L$ ; the maximum path number $N$ .
Output: The training set of subgraphs $S_{tr}$ .
1 for $v_r∈ V_{tr}$ do
2  Construct BFS-trees $T^+_{v_r}$ (or $T^-_{v_r}$ ) rooted at the node $v_r$ on $G^+$ (or $G^-$ );
3  for $n=0;n<N$ do
4   Based on the positive and negative relevance probabilities in Eqs. (6) and (7), conduct a random walk on $T^+_{v_r}$ (or $T^-_{v_r}$ ) to form a path $P_{rt}^{(n)+}$ (or $P_{rt}^{(n)-}$ ) of length $L$ ;
5   Add all nodes $v$ (excluding those in $N(v_r)$ ) along the path $P_{rt}^{(n)+}$ (or $P_{rt}^{(n)-}$ ) as fake edges $(v_r,v)$ to the corresponding subgraph set $S_{tr}^+$ (or $S_{tr}^-$ );
6   Drop $P_{rt}^{(n)+}$ (or $P_{rt}^{(n)-}$ ) from $T^+_{v_r}$ (or $T^-_{v_r}$ ).
7  end for
8 end for
9 Return $S_{tr}=\{S_{tr}^+,S_{tr}^-\}$ ;
Algorithm 1 SAMPLE-SUBGRAPHS by Constrained BFS-trees
## Appendix E BFS-tree Strategy
Fig. 7 provides an illustrative example of the BFS-tree strategy: Let $v_{r_0}$ be the rooted node. We first compute the transition probabilities between $v_{r_0}$ and its neighbors $N(v_{r_0})$ . The next node $v_{r_1}$ is then sampled as the first step of the walk, in proportion to these transition probabilities. Similarly, the next node $v_{r_2}$ is selected based on the transition probabilities between $v_{r_1}$ and its neighbors $N(v_{r_1})$ . The random walk continues until it reaches the terminal node $v_{r_n}$ , and unconnected node pairs $(v_{r_0},v_{r_k})^+$ for $k=2,3,…,n$ are regarded as fake positive edges.
Figure 7. Random-walk-based edge generation for generator $G^+$ or $G^-$. Red digits denote the transition probabilities (Eqs. (6) and (7)), and red arrows indicate the walk directions.
## Appendix F Details of Algorithm
### F.1. The Parameter Update of Generators
Given fake positive/negative edges $(v_r,v_t)$ from $G^+/G^-$, the gradient of the joint loss function (Eq. (10)) with respect to $θ_G$ is derived via the policy gradient (03):
$$
\nabla L_G=\begin{cases}\sum_{r=1}^{|V^+|}\left[\nabla_{\theta_G}\log G^+\left(v_t\mid v_r;\theta_G\right)\log\left(1-D^+\left(v_t,v_r\right)\right)\right],&\text{if }\left(v_r,v_t\right)\text{ is a fake positive edge;}\\[6pt]\sum_{r=1}^{|V^-|}\left[\nabla_{\theta_G}\log G^-\left(v_t\mid v_r;\theta_G\right)\log\left(1-D^-\left(v_t,v_r\right)\right)\right],&\text{if }\left(v_r,v_t\right)\text{ is a fake negative edge.}\end{cases} \tag{24}
$$
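For intuition, the update in Eq. (24) can be sketched as a score-function (REINFORCE-style) estimator for a toy softmax generator. The dictionary layout `theta[r][t]` and the discriminator stub `d_prob` below are illustrative, not the paper's implementation:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def generator_gradient(theta, fake_edges, d_prob):
    """Score-function estimate mirroring Eq. (24) for a toy softmax generator.

    theta[r][t] is the logit for proposing target t from root r; d_prob(r, t)
    stubs the discriminator's probability that (r, t) is a real edge. Each
    fake edge contributes grad log G(t | r) * log(1 - D(t, r))."""
    grad = {r: [0.0] * len(row) for r, row in theta.items()}
    for r, t in fake_edges:
        probs = softmax(theta[r])
        weight = math.log(1.0 - d_prob(r, t))
        for k in range(len(probs)):
            # d/d theta_k of log softmax_t is (1[k == t] - probs[k])
            grad[r][k] += ((1.0 if k == t else 0.0) - probs[k]) * weight
    return grad
```

Because the softmax probabilities sum to one, the per-root gradient entries sum to zero, a standard property of log-softmax score functions.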
### F.2. SAMPLE-SUBGRAPHS by Constrained BFS-trees
As shown in Algorithm 1, during the random walk on the BFS-tree $T^+_{v_r}$ or $T^-_{v_r}$ rooted at $v_r\in V_{tr}$, we generate multiple unique paths while constraining their number and length by parameters $N$ and $L$, respectively. This process yields a training subgraph set $S_{tr}$ composed of constrained paths.
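A minimal Python sketch of this sampling step, assuming a plain adjacency-list graph and uniform transitions (the paper's Algorithm 1 instead walks the BFS-tree with the learned probabilities of Eqs. (6) and (7)):

```python
import random

def sample_paths(adj, root, N, L, rng=None):
    """Sketch of SAMPLE-SUBGRAPHS: draw at most N unique random-walk paths
    of length at most L from `root` over the adjacency dict `adj`.
    Uniform transitions are an assumption of this sketch."""
    rng = rng or random.Random(0)
    paths = set()
    for _ in range(4 * N):  # a few extra attempts to collect unique paths
        path, cur = [root], root
        for _ in range(L):
            nbrs = adj.get(cur, [])
            if not nbrs:
                break
            cur = rng.choice(nbrs)
            path.append(cur)
        paths.add(tuple(path))
        if len(paths) >= N:
            break
    return [list(p) for p in paths]
```

The union of the sampled paths forms one training subgraph per root, so both the path count and the path length stay bounded by $N$ and $L$.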
Input: Graph $G$; training set of nodes $V_{tr}$; maximum path length $L$; maximum path number $N$; batch sizes $B_d$ and $B_g$ of sampled edges for the discriminators and generators; number of epochs $n^{epoch}$; number of iterations per epoch for the generators and discriminators $n^{iter}$; privacy parameters $δ$, $ε$, $σ$.
Output: Privacy-preserving node embedding $g_v∈θ_G$ for downstream tasks.
1 According to edge signs, divide $G$ into $G^+$ and $G^-$;
2 Generate the training subgraph set $S_{tr}=\{S_{tr}^+,S_{tr}^-\}$ based on $SAMPLE\text{-}SUBGRAPHS(G,V_{tr},N,L)$ in Algorithm 1;
3 for $v_r\in V_{tr}$ do
4 Sample all real positive edges $(v_r,v_t)^+$ from $G^+$;
5 Sample all fake positive edges $(v_r,v_t^\prime)^+$ from $S_{tr}^+$;
6 Sample all real negative edges $(v_r,v_t)^-$ from $G^-$;
7 Sample all fake negative edges $(v_r,v_t^\prime)^-$ from $S_{tr}^-$;
8 $E^+_D.add((v_r,v_t)^+,(v_r,v_t^\prime)^+)$, $E^+_G.add((v_r,v_t^\prime)^+)$;
9 $E^-_D.add((v_r,v_t)^-,(v_r,v_t^\prime)^-)$, $E^-_G.add((v_r,v_t^\prime)^-)$;
11 end for
12 for $epoch=0;\ epoch<n^{epoch}$ do
13 Train the discriminator $D^+$:
14 for $iter=0;\ iter<n^{iter}$ do
15 Subsample $B_d$ real and fake positive edges from $E^+_D$;
16 Update $θ_D$ via Eqs. (8) and (11), and apply gradient perturbation via Eq. (13);
17 Calculate the privacy spent $\hat{δ}$ given the target $ε$;
18 Stop optimization if $\hat{δ}\geq δ$.
19 end for
20 Train the generator $G^+$:
21 for $iter=0;\ iter<n^{iter}$ do
22 Subsample $B_g$ fake positive edges from $E^+_G$;
23 Update $θ_G$ via Eqs. (4) and (24).
24 end for
25 Train the discriminator $D^-$:
26 for $iter=0;\ iter<n^{iter}$ do
27 Subsample $B_d$ real and fake negative edges from $E^-_D$;
28 Update $θ_D$ via Eqs. (9) and (11), and apply gradient perturbation via Eq. (13);
29 Calculate the privacy spent $\hat{δ}$ given the target $ε$;
30 Stop optimization if $\hat{δ}\geq δ$.
31 end for
32 Train the generator $G^-$:
33 for $iter=0;\ iter<n^{iter}$ do
34 Subsample $B_g$ fake negative edges from $E^-_G$;
35 Update $θ_G$ via Eqs. (5) and (24).
36 end for
38 end for
39 Return the privacy-preserving node embedding $g_v\in θ_G$;
Algorithm 2: The ASGL Algorithm
### F.3. The training of ASGL
The training process of ASGL is outlined in Algorithm 2 and consists of the following main steps:
(1) Signed graph decomposition and subgraph sampling: Given an input signed graph $G$, we first divide it into a positive subgraph $G^+$ and a negative subgraph $G^-$ based on edge signs. Then, for each node $v_r\in V_{tr}$, constrained BFS-trees are constructed from $G^+$ and $G^-$, respectively, to generate a set of training subgraphs $S_{tr}=\{S_{tr}^+,S_{tr}^-\}$ by limiting the maximum number of paths $N$ and the maximum path length $L$. These subgraphs are used to sample fake edges for adversarial training.
(2) Edge sampling for adversarial learning: For each node $v_r$ , we sample real edges from $G^+$ and $G^-$ , and fake edges from $S_tr^+$ and $S_tr^-$ . These edges are organized into four sets:
- $E_D^+$ : real and fake positive edges for training $D^+$ .
- $E_G^+$ : fake positive edges for training $G^+$ .
- $E_D^-$ : real and fake negative edges for training $D^-$ .
- $E_G^-$ : fake negative edges for training $G^-$ .
(3) Adversarial training with DPSGD: The training is performed over $n^{epoch}$ epochs. In each epoch:
- Discriminator training: For each discriminator $D^+$ and $D^-$, we perform $n^{iter}$ iterations. In each iteration, a batch of $B_d$ real and fake edges is sampled. The discriminator parameters $θ_D$ are updated using gradient descent with noise addition according to the DPSGD mechanism (Eq. (13)), ensuring node-level DP. The cumulative privacy spent $\hat{δ}$ is tracked, and training stops early if $\hat{δ}\geq δ$.
- Generator training: Each generator $G^+$ and $G^-$ is trained for $n^{iter}$ iterations. In each iteration, a batch of $B_g$ fake edges is sampled, and the generator parameters $θ_G$ are updated by maximizing the generator objective (Eq. (24)).
(4) Embedding output for downstream tasks: After all epochs, the generator parameters $θ_G$ encode the privacy-preserving node embeddings $g_v∈θ_G$ , which are used for downstream tasks such as edge sign prediction and node clustering.
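The perturbation inside the discriminator updates of step (3) amounts to per-node clipping followed by Gaussian noise calibrated to the node-level sensitivity $\Delta_g=R_{N,L}\cdot C$. The function names and the flat-list gradient layout below are illustrative, not the paper's code:

```python
import math
import random

def clip(vec, C):
    """Project a per-node gradient onto the L2 ball of radius C."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]

def dp_noisy_batch_gradient(per_node_grads, C, sigma, R, rng=None):
    """Sum clipped per-node gradients, then add Gaussian noise with scale
    sigma * Delta_g, where Delta_g = R * C is the node-level sensitivity
    (Theorem 1 with R = R_{N,L}). A sketch of the perturbation step."""
    rng = rng or random.Random(0)
    total = [0.0] * len(per_node_grads[0])
    for g in per_node_grads:
        for j, x in enumerate(clip(g, C)):
            total[j] += x
    delta_g = R * C
    return [t + rng.gauss(0.0, sigma * delta_g) for t in total]
```

With $\sigma=0$ the function reduces to plain clipped-gradient aggregation, which is convenient for testing.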
## Appendix G Details of Lemma
The following lemmas are used for proving Theorem 1:
**Lemma 1 (Receptive field of a node)**
*As shown in Fig. 4 (b), we define the receptive field of a node as the region (i.e., the set of nodes) over which it can exert influence. Accordingly, for a subgraph constructed from paths sampled on constrained BFS-trees, the maximum receptive field size of $v_r$ is given by $R_{N,L}=\sum_{l=0}^{L}N^l=\frac{N^{L+1}-1}{N-1}\leq B$.*
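A quick numeric check of the geometric-sum closed form for $R_{N,L}$ (integer arithmetic, assuming $N\geq 2$):

```python
def receptive_field(N, L):
    """R_{N,L} = sum_{l=0}^{L} N**l, the maximum receptive-field size."""
    return sum(N ** l for l in range(L + 1))

def receptive_field_closed(N, L):
    """Closed form (N**(L+1) - 1) // (N - 1), valid for N >= 2."""
    return (N ** (L + 1) - 1) // (N - 1)
```

The two forms agree, and the recurrence $R_{N,L+1}=N\cdot R_{N,L}+1$ used in the induction of App. H also holds.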
**Lemma 2**
*Let $S_{tr}$ denote the training set of subgraphs constructed from constrained BFS-tree paths, and let $S(v)\subset S_{tr}$ denote the subset of subgraphs that contain node $v$. Since $R_{N,L}$ upper-bounds the number of occurrences of any node in $S_{tr}$, it follows that $|S(v)|\leq R_{N,L}$. The proof of Lemma 2 is given in App. H.*
## Appendix H Proof of Lemma 2
Proof. We proceed by induction (13) on the path length $L$ of the BFS-tree.
Base case: When $L=0$, each sampled subgraph contains exactly the training node $v\in V_{tr}$ itself. Thus, every node appears in one subgraph, trivially satisfying the bound $|S(v)|=R_{N,0}=1$.
Inductive hypothesis: Assume that for some fixed $L\geq 0$, any $v\in V_{tr}$ appears in at most $R_{N,L}$ subgraphs constructed from constrained BFS-tree paths. Let $S^L(v)$ denote the set of such subgraphs with path length $L$ that contain $v$; the hypothesis is thus $|S^L(v)|\leq R_{N,L}$ for any $v$.
Inductive step: We show that the hypothesis also holds for path length $L+1$. Let $T_{u^\prime}$ denote the $(L+1)$-length BFS-tree rooted at $u^\prime$. If $T_{u^\prime}\in S^{L+1}(v)$ with $u^\prime\neq v$, there must exist a node $u$ such that $u\in T_{u^\prime}$ and $T_u\in S^{L}(v)$. According to the setting of Algorithm 1, the number of such nodes $u$ per root is at most $N$, and by the hypothesis there are at most $R_{N,L}$ trees $T_u\in S^L(v)$. Hence at most $N\cdot R_{N,L}$ roots $u^\prime\neq v$ can satisfy $T_{u^\prime}\in S^{L+1}(v)$; together with the case $u^\prime=v$, we obtain the upper bound matching the inductive claim for $L+1$:
$$
\left|S^{L+1}(v)\right|\leq N\cdot R_{N,L}+1=\frac{N^{L+2}-1}{N-1}=R_{N,L+1}. \tag{25}
$$
By induction, Lemma 2 holds for all $L\geq 0$.
## Appendix I Proof of Theorem 2
The following lemmas are used for proving Theorem 2:
**Lemma 3 (Adaptation of Lemma 5 from (34))**
*Let $N(\mu,\sigma^2)$ denote the Gaussian distribution with mean $\mu$ and variance $\sigma^2$. It holds that:*
$$
D_\alpha\left(N\left(\mu,\sigma^2\right)\,\middle\|\,N\left(0,\sigma^2\right)\right)=\frac{\alpha\mu^2}{2\sigma^2} \tag{26}
$$
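The closed form of Eq. (26) can be sanity-checked numerically through the defining integral $D_\alpha(P\|Q)=\frac{1}{\alpha-1}\ln\int p(x)^\alpha q(x)^{1-\alpha}\,dx$, evaluated with a midpoint Riemann sum in log space; the integration bounds and step count below are arbitrary choices for this check:

```python
import math

def renyi_gauss(mu, sigma, alpha, lo=-30.0, hi=30.0, steps=60000):
    """Numerically evaluate D_alpha(N(mu, sigma^2) || N(0, sigma^2)) from
    the defining integral. Working in log space avoids overflow in the
    q(x)**(1 - alpha) factor far from the mean."""
    log_norm = math.log(sigma * math.sqrt(2.0 * math.pi))

    def log_pdf(x, m):
        return -(x - m) ** 2 / (2.0 * sigma ** 2) - log_norm

    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += math.exp(alpha * log_pdf(x, mu) + (1.0 - alpha) * log_pdf(x, 0.0)) * h
    return math.log(total) / (alpha - 1.0)
```

For $\mu=1$, $\sigma=2$, $\alpha=3$ the integral matches $\alpha\mu^2/(2\sigma^2)=0.375$ to within numerical error.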
**Lemma 4 (Adaptation of Lemma 25 from (33))**
*Assume $\mu_0,\dots,\mu_n$ and $\eta_0,\dots,\eta_n$ are probability distributions over some domain $Z$ such that their Rényi divergences satisfy $D_\alpha(\mu_0\|\eta_0)\leq\varepsilon_0,\dots,D_\alpha(\mu_n\|\eta_n)\leq\varepsilon_n$ for some given $\varepsilon_0,\dots,\varepsilon_n$. Let $\rho$ be a probability distribution over $\{0,\dots,n\}$. Denote by $\mu_\rho$ (resp. $\eta_\rho$) the distribution on $Z$ obtained by first sampling $i$ from $\rho$ and then sampling from $\mu_i$ (resp. $\eta_i$). Then:*
$$
D_\alpha\left(\mu_\rho\,\middle\|\,\eta_\rho\right)\leq\frac{1}{\alpha-1}\ln\mathbb{E}_{i\sim\rho}\left[e^{\varepsilon_i(\alpha-1)}\right]=\frac{1}{\alpha-1}\ln\sum_{i=0}^{n}\rho_ie^{\varepsilon_i(\alpha-1)} \tag{27}
$$
Proof of Theorem 2. Consider any minibatch $B_t$ randomly sampled at iteration $t$ from the training subgraph set $S_{tr}$ of Algorithm 2. For the subset $S(v^*)\subset S_{tr}$ containing node $v^*$, its size is bounded by $R_{N,L}$ (Lemma 2). Define the random variable $\beta=|S(v^*)\cap B_t|$; it follows the hypergeometric distribution $\mathrm{Hypergeometric}(|S_{tr}|,R_{N,L},|B_t|)$ (32):
$$
\beta_i=\Pr[\beta=i]=\frac{\binom{R_{N,L}}{i}\binom{N_{tr}-R_{N,L}}{B_d-i}}{\binom{N_{tr}}{B_d}},\quad\text{where }|S_{tr}|=N_{tr}\text{ and }|B_t|=B_d. \tag{28}
$$
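The probabilities $\beta_i$ of Eq. (28) can be computed directly with `math.comb`; the helper below is an illustrative sketch:

```python
from math import comb

def beta_pmf(n_tr, R, B_d):
    """Hypergeometric probabilities beta_i = P[beta = i] from Eq. (28):
    B_d subgraphs drawn out of n_tr in total, R of which contain v*.
    math.comb returns 0 when B_d - i exceeds n_tr - R, matching the pmf."""
    return [comb(R, i) * comb(n_tr - R, B_d - i) / comb(n_tr, B_d)
            for i in range(min(R, B_d) + 1)]
```

The pmf sums to one and has mean $B_d\cdot R_{N,L}/N_{tr}$, the standard hypergeometric expectation.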
Next, consider the training of the discriminators (Lines 13–19 and 25–31 in Algorithm 2). Let $G$ and $G^\prime$ be two adjacent graphs differing only in the presence of node $v^*$ and its associated signed edges. Based on the gradient perturbation applied in Lines 16 and 28 of Algorithm 2, we have:
$$
\begin{aligned}
\tilde{g}_t&=g_t+N\left(0,\sigma^2\Delta_g^2I\right)=\sum_{v\in B_t}\mathrm{Clip}_C\!\left(\frac{\partial L_D}{\partial d_v}\right)+N\left(0,\sigma^2\Delta_g^2I\right),\\
\tilde{g}^\prime_t&=g^\prime_t+N\left(0,\sigma^2\Delta_g^2I\right)=\sum_{v^\prime\in B_t^\prime}\mathrm{Clip}_C\!\left(\frac{\partial L_D}{\partial d_{v^\prime}}\right)+N\left(0,\sigma^2\Delta_g^2I\right),
\end{aligned} \tag{29}
$$
where $\Delta_g=R_{N,L}C=\frac{N^{L+1}-1}{N-1}C$ (Theorem 1), and $\tilde{g}_t$ and $\tilde{g}^\prime_t$ denote the noisy gradients on $G$ and $G^\prime$, respectively. When $\beta=i$, their Rényi divergence can be upper bounded as:
$$
\begin{aligned}
D_\alpha\left(\tilde{g}_{t,i}\,\middle\|\,\tilde{g}^\prime_{t,i}\right)
&=D_\alpha\left(g_{t,i}+N\left(0,\sigma^2\Delta_g^2I\right)\,\middle\|\,g^\prime_{t,i}+N\left(0,\sigma^2\Delta_g^2I\right)\right)\\
&=D_\alpha\left(N\left(g_{t,i},\sigma^2\Delta_g^2I\right)\,\middle\|\,N\left(g^\prime_{t,i},\sigma^2\Delta_g^2I\right)\right)\\
&\stackrel{(a)}{=}D_\alpha\left(N\left(g_{t,i}-g^\prime_{t,i},\sigma^2\Delta_g^2I\right)\,\middle\|\,N\left(0,\sigma^2\Delta_g^2I\right)\right)\\
&\stackrel{(b)}{\leq}\sup_{\|\Delta_i\|_2\leq iC}D_\alpha\left(N\left(\Delta_i,\sigma^2\Delta_g^2I\right)\,\middle\|\,N\left(0,\sigma^2\Delta_g^2I\right)\right)\\
&\stackrel{(c)}{=}\sup_{\|\Delta_i\|_2\leq iC}\frac{\alpha\|\Delta_i\|_2^2}{2\Delta_g^2\sigma^2}=\frac{\alpha i^2}{2R_{N,L}^2\sigma^2},
\end{aligned} \tag{30}
$$
where $\Delta_i=g_{t,i}-g^\prime_{t,i}$. Step (a) uses the fact that Rényi divergence is invariant under invertible transformations (34), while (b) and (c) follow from Theorem 1 and Eq. (26), respectively. Applying Eq. (27), we derive:
$$
D_\alpha\left(\tilde{g}_t\,\middle\|\,\tilde{g}^\prime_t\right)\leq\frac{1}{\alpha-1}\ln\mathbb{E}_{i\sim\beta}\left[\exp\left(\frac{\alpha i^2(\alpha-1)}{2R_{N,L}^2\sigma^2}\right)\right]=\frac{1}{\alpha-1}\ln\left(\sum_{i=0}^{R_{N,L}}\beta_i\exp\left(\frac{\alpha i^2(\alpha-1)}{2R_{N,L}^2\sigma^2}\right)\right)=\gamma. \tag{31}
$$
Here, $\beta_i$ is given in Eq. (28). Based on the composition property of DP, after $T=n^{epoch}\cdot n^{iter}$ iterations, the discriminators satisfy node-level $(\alpha,2T\gamma)$-RDP. Moreover, owing to the post-processing property of DP, the generators $G^+$ and $G^-$ inherit the same privacy guarantee as the discriminators. Therefore, Algorithm 2 obeys node-level $(\alpha,2T\gamma)$-RDP, which completes the proof of Theorem 2.
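The per-iteration bound $\gamma$ of Eq. (31) can be evaluated numerically as a hypergeometric mixture of Gaussian-mechanism divergences; the parameter names below are illustrative:

```python
import math
from math import comb

def rdp_gamma(alpha, sigma, n_tr, R, B_d):
    """Per-iteration RDP bound gamma of Eq. (31): a hypergeometric mixture
    (over beta = |S(v*) intersect B_t|) of the per-case divergences
    alpha * i^2 / (2 * R^2 * sigma^2) from Eq. (30)."""
    acc = 0.0
    for i in range(min(R, B_d) + 1):
        beta_i = comb(R, i) * comb(n_tr - R, B_d - i) / comb(n_tr, B_d)
        acc += beta_i * math.exp(alpha * i * i * (alpha - 1) / (2.0 * R * R * sigma * sigma))
    return math.log(acc) / (alpha - 1)
```

As expected, $\gamma$ decreases with the noise multiplier $\sigma$ and never exceeds the unsubsampled Gaussian-mechanism bound $\alpha/(2\sigma^2)$.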
## Appendix J Additional Details of Experiments
### J.1. Dataset Introduction
Detailed descriptions of all datasets are given below.
- Bitcoin-Alpha and Bitcoin-OTC are trust networks among Bitcoin users, aimed at preventing transactions with fraudulent or high-risk users. In these networks, user relationships are represented by positive (trust) and negative (distrust) edges.
- Slashdot is a social network derived from user interactions on a technology news site, where relationships are annotated as positive (friend) or negative (enemy) edges.
- WikiRfA is a voting network for electing administrators on Wikipedia, where edges denote positive (supporting vote) or negative (opposing vote) relationships between users.
- Epinions is a product review site where users can establish both trust and distrust relationships with others.
### J.2. The Distribution of Node Degrees and Path Lengths
The distributions of node degrees and path lengths in the Bitcoin-Alpha and Slashdot datasets are shown in Figs. 8 and 9.
<details>
<summary>x8.png Details</summary>

Two bar charts with overlaid trend lines showing the node-degree distributions of (a) Bitcoin-Alpha and (b) Slashdot (x-axis: Degree, 1–7; y-axis: Proportion of Nodes (%)). Both are heavily skewed long-tail distributions: degree-1 nodes account for roughly 35% of Bitcoin-Alpha and roughly 60% of Slashdot, with the proportion dropping sharply at higher degrees.
</details>
Figure 8. Distribution of node degrees.
<details>
<summary>x9.png Details</summary>

Two histograms with fitted curves showing the path-length distributions of (a) Bitcoin-Alpha and (b) Slashdot (x-axis: Path Length, 1–7; y-axis: Proportion of Node Pairs (%)). Both peak at short path lengths: Bitcoin-Alpha spreads over 3–4 hops (roughly 40% each), while Slashdot concentrates sharply at 4 hops (roughly 64%), with almost no pairs beyond 6 hops.
</details>
Figure 9. Distribution of path lengths.
### J.3. The detailed results of Edge Sign Prediction
The average AUC results under different values of $ε$ and datasets for edge sign prediction tasks are detailed in Table 6.
Table 6. Summary of average AUC with different $ε$ (columns $ε_1<ε_2<ε_3<ε_4<ε_5$, ordered from smallest to largest) and datasets for edge sign prediction tasks. (BOLD: Best)
| Dataset | Method | $ε_1$ | $ε_2$ | $ε_3$ | $ε_4$ | $ε_5$ |
| --- | --- | --- | --- | --- | --- | --- |
| Bitcoin-OTC | SDGNN | 0.7655 | 0.7872 | 0.7913 | 0.8105 | 0.8571 |
| Bitcoin-OTC | SiGAT | 0.7011 | 0.7282 | 0.7869 | 0.8379 | 0.8706 |
| Bitcoin-OTC | SGCN | 0.5565 | 0.5740 | 0.6634 | 0.7516 | 0.7801 |
| Bitcoin-OTC | GAP | 0.5763 | 0.5782 | 0.6486 | 0.6741 | 0.7411 |
| Bitcoin-OTC | LSNE | 0.5030 | 0.5405 | 0.7041 | 0.8239 | 0.8776 |
| Bitcoin-OTC | ASGL | **0.8004** | **0.8462** | **0.8488** | **0.8505** | **0.8801** |
| Bitcoin-Alpha | SDGNN | 0.6761 | 0.6883 | 0.7098 | 0.7308 | 0.8476 |
| Bitcoin-Alpha | SiGAT | 0.7033 | 0.7215 | 0.7303 | 0.7488 | 0.8207 |
| Bitcoin-Alpha | SGCN | 0.5157 | 0.5450 | 0.6433 | 0.6930 | 0.7702 |
| Bitcoin-Alpha | GAP | 0.5664 | 0.6025 | 0.6367 | 0.7091 | 0.7320 |
| Bitcoin-Alpha | LSNE | 0.5112 | 0.5361 | 0.5959 | 0.6524 | 0.8069 |
| Bitcoin-Alpha | ASGL | **0.7505** | **0.8075** | **0.8589** | **0.8591** | **0.8592** |
| WikiRfA | SDGNN | 0.6558 | 0.7066 | 0.7142 | 0.7267 | 0.7930 |
| WikiRfA | SiGAT | 0.6313 | 0.6525 | 0.7023 | 0.7777 | 0.8099 |
| WikiRfA | SGCN | 0.5107 | 0.6456 | 0.6515 | 0.7008 | 0.7110 |
| WikiRfA | GAP | 0.5356 | 0.5506 | 0.5612 | 0.5717 | 0.5937 |
| WikiRfA | LSNE | 0.5086 | 0.5253 | 0.6119 | 0.6553 | 0.7832 |
| WikiRfA | ASGL | **0.6680** | **0.7706** | **0.7963** | **0.7986** | **0.8100** |
| Slashdot | SDGNN | 0.7547 | 0.8325 | 0.8697 | 0.8788 | 0.8862 |
| Slashdot | SiGAT | 0.7061 | 0.7886 | 0.8392 | 0.8424 | 0.8527 |
| Slashdot | SGCN | 0.5662 | 0.6151 | 0.6662 | 0.7181 | 0.8093 |
| Slashdot | GAP | 0.6121 | 0.6389 | 0.6879 | 0.7126 | 0.7471 |
| Slashdot | LSNE | 0.5717 | 0.6144 | 0.7541 | 0.7753 | 0.7816 |
| Slashdot | ASGL | **0.7861** | **0.8539** | **0.8887** | **0.8890** | **0.8910** |
| Epinions | SDGNN | 0.6788 | 0.7180 | 0.7201 | 0.7455 | 0.8428 |
| Epinions | SiGAT | 0.6772 | 0.7046 | 0.7063 | 0.7702 | 0.8253 |
| Epinions | SGCN | 0.6152 | 0.6487 | 0.6974 | 0.7502 | 0.8318 |
| Epinions | GAP | 0.5899 | 0.6034 | 0.6288 | 0.6310 | 0.6618 |
| Epinions | LSNE | 0.5033 | 0.6055 | 0.7590 | 0.8434 | 0.8585 |
| Epinions | ASGL | **0.6869** | **0.8134** | **0.8513** | **0.8658** | **0.8666** |
### J.4. The detailed results of node clustering
The average SSI results under different values of $ε$ and datasets for node clustering tasks are detailed in Table 7.
Table 7. Summary of average SSI with different $ε$ and datasets for node clustering tasks. (BOLD: Best)
| $ε$ | Dataset | | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Bitcoin-Alpha | 0.4819 | 0.4378 | 0.4877 | 0.4977 | 0.4988 | **0.5091** |
| 1 | Bitcoin-OTC | 0.4505 | 0.4677 | 0.5025 | 0.4970 | 0.5008 | **0.5160** |
| 1 | Slashdot | 0.4715 | 0.5011 | 0.5025 | 0.5052 | 0.5005 | **0.5107** |
| 1 | WikiRfA | 0.4788 | 0.4988 | 0.4968 | 0.4890 | 0.5003 | **0.5126** |
| 1 | Epinions | 0.5001 | 0.4965 | 0.5022 | 0.5013 | 0.6095 | **0.6106** |
| 2 | Bitcoin-Alpha | 0.4910 | 0.4733 | 0.4969 | 0.4985 | 0.5032 | **0.5402** |
| 2 | Bitcoin-OTC | 0.4733 | 0.4968 | 0.5075 | 0.4986 | 0.5729 | **0.6810** |
| 2 | Slashdot | 0.4888 | 0.4864 | 0.4871 | 0.5134 | 0.5132 | **0.5494** |
| 2 | WikiRfA | 0.4934 | 0.5054 | 0.5117 | 0.4996 | 0.5032 | **0.5577** |
| 2 | Epinions | 0.5068 | 0.5116 | 0.5086 | 0.5463 | 0.6263 | **0.6732** |
| 4 | Bitcoin-Alpha | 0.5019 | 0.4948 | 0.5112 | 0.5049 | 0.6204 | **0.6707** |
| 4 | Bitcoin-OTC | 0.5005 | 0.5325 | 0.5612 | 0.5465 | 0.6953 | **0.7713** |
| 4 | Slashdot | 0.5003 | 0.5685 | 0.5545 | 0.5671 | 0.5444 | **0.5994** |
| 4 | WikiRfA | 0.5005 | 0.5142 | 0.5538 | 0.5476 | 0.5644 | **0.5977** |
| 4 | Epinions | 0.5148 | 0.5389 | 0.5386 | 0.6255 | 0.6747 | **0.6787** |
<details>
<summary>x10.png Details</summary>

Grouped bar chart (hatched bars) comparing the AUC of ASGL-, ASGL+, and ASGL on Bitcoin-Alpha, Bitcoin-OTC, Slashdot, and WikiRfA (y-axis: AUC, 0.65–0.90). The full ASGL achieves the highest AUC on all four datasets; the ablated variants trail by dataset-dependent margins, with ASGL+ dropping most noticeably on WikiRfA.
</details>
Figure 10. Comparison between ASGL, $ASGL^+$ , and $ASGL^-$ .
### J.5. The Setup of Link Stealing Attack
Motivated by (42), we assume that the adversary has black-box access to the node embeddings produced by the target signed graph learning model, but not to its internal parameters or gradients. The adversary also possesses an auxiliary graph dataset comprising node pairs whose distribution partially overlaps with that of the target graph. Some of these node pairs belong to the training graph (members), while others come from the test graph (non-members). For each node pair, a feature vector is constructed by concatenating the two nodes' embeddings. These feature vectors, together with their member or non-member labels, are used to train a logistic regression classifier that infers whether an edge exists between any two nodes of the target graph. To simulate this link stealing attack, each dataset is partitioned into target training, auxiliary training, target test, and auxiliary test sets with a 5:2:2:1 ratio.
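The attack pipeline above can be sketched with a hand-rolled logistic regression in place of a library classifier; all function names and hyperparameters below are illustrative, not the actual attack configuration:

```python
import math

def train_attack(features, labels, dim, epochs=200, lr=0.5):
    """Toy logistic-regression link-stealing attacker (sketch of Sec. J.5).
    Each feature vector concatenates the two node embeddings; labels are 1
    for member (training-graph) pairs and 0 for non-members."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            g = 1.0 / (1.0 + math.exp(-z)) - y  # gradient of the log loss
            for j in range(dim):
                w[j] -= lr * g * x[j]
            b -= lr * g
    return w, b

def attack_predict(w, b, x):
    """Predicted probability that the pair is a member (training-graph edge)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

In the actual setup, the features come from the target model's embeddings and the classifier is trained on the auxiliary split only.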
### J.6. Effectiveness of Adversarial Learning with Edge Signs.
To verify the effectiveness of adversarial learning with signed edges, we also compare our ASGL with its variants, denoted as $ASGL^+$ and $ASGL^-$ . Specifically, $ASGL^+$ and $ASGL^-$ only operate on the positive graph $G^+$ and the negative graph $G^-$ , respectively. Fig. 10 presents the average AUC scores of ASGL, $ASGL^+$ , and $ASGL^-$ across all datasets. It can be observed that ASGL significantly outperforms both $ASGL^+$ and $ASGL^-$ in all cases. These results demonstrate that our privacy-preserving adversarial learning framework with edge signs is more effective in representing signed graphs compared to its variants that neglect edge sign information.