2406.11888


# Neural logic programs and neural nets

**Authors**: Christian Antić

> Vienna University of Technology
> Vienna, Austria

## Abstract

Neural-symbolic integration aims to combine the connectionist subsymbolic with the logical symbolic approach to artificial intelligence. In this paper, we first define the answer set semantics of (boolean) neural nets and then introduce from first principles a class of neural logic programs and show that nets and programs are equivalent.

## 1. Introduction

Artificial neural nets are inspired by the human brain (?, ?) with numerous applications in artificial intelligence research such as pattern recognition (cf. ?), deep inductive learning through backpropagation (cf. ?), and game playing with AlphaGo beating the human champion in the game of Go (?), just to name a few. Neural nets are at the core of what is known as the connectionist (https://plato.stanford.edu/entries/connectionism/) or subsymbolic approach to AI. The mathematical subjects behind neural nets are analysis, probability theory, and statistics.

Logic programming, on the other hand, represents the symbolic and logical approach to (Good Old-Fashioned) AI (?, ?) (https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence). It has its roots in mathematical logic and automated reasoning with discrete mathematics, particularly logic, algebra, and combinatorics as its main mathematical tools.

Both worlds, the subsymbolic and the symbolic, have their strengths and weaknesses. Logical formalisms can be interpreted by humans and have a clear formal semantics which is missing for neural nets. Connectionist systems, on the other hand, have a remarkable noise-tolerance and learning capability which is missing for logical formalisms (a notable exception is inductive logic programming (?)). Neural-symbolic integration therefore aims at integrating both the symbolic and subsymbolic worlds (?) (cf. ?, ?, ?, ?).
Compared to the field’s short existence, its successes are remarkable and can be found in various fields such as bioinformatics, control engineering, software verification and adaptation, visual intelligence, ontology learning, and computer games (?, ?, ?). Moreover, it is related to statistical relational learning and probabilistic logic learning (?, ?, ?, ?).

The main contributions of this paper can be summarized as follows:

1. First, in § 2 we define an answer set semantics (?) (cf. ?, ?, ?, ?) for (boolean) neural nets by defining an immediate consequence and a Fitting operator (?, ?) of a net and by applying Approximation Fixed Point Theory (AFT) (?, ?, ?), which is a non-monotonic generalization of the well-known Knaster-Tarski Theory (?) of monotonic lattice operators with numerous applications in answer set programming (e.g. ?, ?, ?). To the best of our knowledge, this paper is the first to recognize that such a semantics can be given to (boolean) neural nets (under the assumption that we can remember a neuron’s active state).
2. Second, in § 3 we introduce from first principles neural logic programs as programs assembled from neurons. We define the least model semantics of positive programs, and the answer set semantics of arbitrary programs again by applying AFT; moreover, we define the FLP-answer set semantics in the ordinary sense in terms of the Faber-Leone-Pfeifer reduct (?, ?).
3. Finally, in § 4 we prove that neural nets and neural logic programs are equivalent.

In a sense, our approach is dual to the well-known core method (?, ?) where one starts with a propositional logic program and then constructs a neural net that simulates that program. The core method is the basis for the simulation-based work on neural-symbolic integration as presented in ? (?).

In a broader sense, this paper is a further step towards neural-symbolic integration.

## 2. Neural nets

In this section, we first recall (boolean) neural nets as presented for example in ?
(?, §3) (with the important difference that we assume here that neurons remain activated) where we simplify the functionality of a neuron to a boolean output computed from the weighted sum of its inputs and a given threshold. We then define the least model semantics of positive nets (§ 2.2) and the answer set semantics of arbitrary nets (§ 2.3), which appears to be original.

In what follows, we denote the reals by $\mathbb{R}$, the positive reals by $\mathbb{R}^{+}$, $\mathbb{R}^{-\infty}:=\mathbb{R}\cup\{-\infty\}$, and the booleans by $\mathbb{B}=\{0,1\}$. We denote the cardinality of a set $X$ by $|X|$. We define the empty sum $\sum_{i\in\emptyset}a_{i}:=-\infty$. This convention will be used in (2). A prefixed point of a function $f:L\to L$ on a lattice $L=(L,\leq)$ is any element $a\in L$ satisfying $f(a)\leq a$, and we call $a$ a fixed point of $f$ iff $f(a)=a$.

### 2.1. Neurons and nets

Let $A$ be a set of neurons where each neuron $a\in A$ is determined by its threshold $\theta(a)\in\mathbb{R}^{-\infty}$. A (neural) net over $A$ is a finite weighted directed graph $N$ with neurons from $A$ as vertices and edges $a\xrightarrow{w_{ab}}b$, with $w_{ab}\in\mathbb{R}$. In case $w_{ab}\neq 0$ we say that $a$ and $b$ are connected and in case $w_{ab}=0$ we say that $a$ and $b$ are disconnected. A net is positive iff it contains only positive weights. We write $a\in N$ in case the neuron $a\in A$ appears in $N$. We define the body of a neuron $a\in N$ by

$$
b_{N}(a):=\{b\in N\mid w_{ba}\neq 0\}.
$$

A fact is a neuron with empty body and no weights, and given a fact $a$ we always assume $\theta(a)=-\infty$, and we assume $\theta(a)\neq-\infty$ in case $a$ is not a fact. This will allow us to initiate the computation process of a net (see (2)). We denote the facts in $N$ by $facts(N)$. The facts will represent the input signals.
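As a minimal sketch of the definitions above, a net can be stored as a threshold map together with a weight map. The class name `Net`, the field names `theta` and `w`, and the two example neurons are illustrative assumptions and not part of the paper; an edge from $b$ to $a$ with weight $w_{ba}$ is keyed as `(b, a)`.

```python
from dataclasses import dataclass
from math import inf


@dataclass
class Net:
    """Hypothetical container for a (boolean) neural net."""
    theta: dict  # neuron -> threshold theta(a), with -inf for facts
    w: dict      # (b, a) -> weight w_ba of the edge b -> a

    def body(self, a):
        """b_N(a): neurons connected to a by a non-zero weight."""
        return {b for (b, c), weight in self.w.items() if c == a and weight != 0}

    def facts(self):
        """facts(N): neurons with empty body."""
        return {a for a in self.theta if not self.body(a)}

    def is_positive(self):
        """True iff no weight is negative (w_ba >= 0 for all edges)."""
        return all(weight >= 0 for weight in self.w.values())

    def respects_fact_convention(self):
        """True iff theta(a) = -inf holds exactly for the facts."""
        facts = self.facts()
        return all((self.theta[a] == -inf) == (a in facts) for a in self.theta)


# Example: the fact a feeds b, which fires once its weighted input reaches 1.
net = Net(theta={"a": -inf, "b": 1.0}, w={("a", "b"): 1.0})
print(net.body("b"), net.facts(), net.is_positive())
```

Keying weights by `(source, target)` keeps the body lookup close to the formula for $b_{N}(a)$; other representations (for instance an adjacency matrix) would serve equally well.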
An ordinary net is a net $N$ so that $\theta(a)=|b_{N}(a)|$, for all $a\in N$, and $w_{ba}=1$ for all $b\in b_{N}(a)$. Intuitively, in an ordinary net $N$ a neuron $a$ “fires” iff each of its body neurons in $b_{N}(a)$ “fires.”

### 2.2. Least model semantics of positive nets

An interpretation is any subset of $A$ and we denote the set of all interpretations over $A$ by $\mathbb{I}_{A}$. We can interpret each interpretation $I$ as a function $I:A\to\mathbb{B}$ so that $I(a)=1$ iff $a\in I$.

In the literature (e.g. ?, §3), the functionality of a neural net is given with respect to a time point $t$ and the activation of a neuron $a$ at $t$ usually means that $a$ is inactive at $t+1$ unless there is a recurrent connection from $a$ to itself. In this paper, we take a different approach as we assume that once a neuron $a$ is activated it remains active or, in another interpretation, it is remembered that $a$ was active. This allows the net to reach stable configurations which we identify with stable (or answer set) models. This will allow us in § 4 to show that nets and programs are equivalent.

**Definition 1** *Define the (immediate consequence) operator of $N$, for every interpretation $I$, by*

$$
T_{N}(I):=\left\{a\in N\;\middle|\;\sum_{b\in b_{N}(a)}w_{ba}I(b)\geq\theta(a)\right\}. \tag{1}
$$

*The operator $T_{N}$ of a positive net $N$ ($w_{ba}\geq 0$) is monotone in the sense that*

$$
I\subseteq J\quad\text{implies}\quad T_{N}(I)\subseteq T_{N}(J).
$$

*We call an interpretation $I$ a model of $N$ iff $I$ is a prefixed point of $T_{N}$, and we call $I$ a supported model of $N$ iff $I$ is a fixed point of $T_{N}$.*

Since the set of all interpretations over $A$ is a complete lattice with respect to union and intersection, it follows by the well-known Knaster-Tarski Theory (?)
that for a positive net $N$, the operator $T_{N}$ has a least fixed point which can be obtained via a bottom-up iteration of the form

$$
\begin{aligned}
T_{N}^{0}&:=\emptyset,\\
T_{N}^{n+1}&:=T_{N}(T_{N}^{n}),\\
T_{N}^{\infty}&:=\bigcup_{n\geq 0}T_{N}^{n}.
\end{aligned}
$$

We call $T_{N}^{\infty}$ the least model of $N$. Notice that this bottom-up computation can only be initiated since we have assumed $\theta(a)=-\infty$ iff $a$ is a fact in $N$, which implies

$$
T_{N}(\emptyset)=\left\{a\in N\;\middle|\;\sum_{b\in\emptyset}b=-\infty\geq\theta(a)\right\}=facts(N). \tag{2}
$$

This means that a positive net with no facts (i.e. no input signals) always has the empty least model.

The definition of an immediate consequence operator of a neural net and the associated least model of a positive net appears to be new.

### 2.3. Answer set semantics of arbitrary nets

The immediate consequence operator of an arbitrary neural net possibly containing negative weights may be non-monotonic, which means that its least fixed point may not exist. Approximation Fixed Point Theory (AFT) (?, ?, ?, ?) has been designed exactly for dealing with non-monotonic operators and it can be seen as a generalization of the Knaster-Tarski Theory from monotonic to non-monotonic lattice operators.

In this section, we use AFT to define the answer set semantics of neural nets, which appears to be original, by following the standard procedure for defining answer sets in terms of the 3-valued Fitting operator (Definition 5).

**Definition 2** *A pair of interpretations $(I,J)$ is a 3-interpretation iff $I\subseteq J$.
This can be interpreted as follows:

- $a\in I$ means that $a$ is true,
- $a\in J-I$ means that $a$ is undefined,
- $a\not\in J$ means that $a$ is false.*

**Definition 3** *Define the precision ordering between 3-interpretations by*

$$
(I,J)\subseteq_{p}(I^{\prime},J^{\prime})\quad\text{iff}\quad I\subseteq I^{\prime}\subseteq J^{\prime}\subseteq J.
$$

**Definition 4** *Define the Fitting operator (cf. ?) of a net $N$ by*

$$
\Phi_{N}(I,J):=\left\{a\in N\;\middle|\;\sum_{b\in b_{N}(a)}w_{ba}K(b)\geq\theta(a),\text{ for all $I\subseteq K\subseteq J$}\right\}.
$$

Notice that we have

$$
\Phi_{N}(I,I)=T_{N}(I).
$$

The Fitting operator is monotone with respect to the precision ordering, that is,

$$
(I,J)\subseteq_{p}(I^{\prime},J^{\prime})\quad\text{implies}\quad\Phi_{N}(I,J)\subseteq\Phi_{N}(I^{\prime},J^{\prime}).
$$

This implies that the operator $\Phi_{N}(\bullet,I)$ is monotone on the complete lattice of interpretations $\emptyset\subseteq J\subseteq I$ and thus has a least fixed point denoted by $\mathrm{lfp}(\Phi_{N}(\bullet,I))$. We therefore can define the operator

$$
\Phi_{N}^{\dagger}(I):=\mathrm{lfp}(\Phi_{N}(\bullet,I)).
$$

**Definition 5** *We call $I$ an answer set of $N$ iff $I=\Phi_{N}^{\dagger}(I)$.*

The definition of the Fitting operator of a neural net and the associated answer set semantics appears to be new.

### 2.4. Acyclic nets

A net is acyclic iff it contains no cycle of non-zero weighted edges. Notice that the neurons in an acyclic net $N$ can be partitioned into layers where each neuron only has incoming edges from neurons of lower level.
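Returning briefly to the operators of § 2.2 and § 2.3, the following brute-force sketch shows one way Definitions 1, 4 and 5 could be evaluated on small nets. All function names, the dictionary encoding (`theta[a]` for thresholds, `w[(b, a)]` for the weight of the edge from $b$ to $a$) and the two example nets are assumptions made for illustration; the enumeration is exponential and is not proposed as a practical algorithm.

```python
from itertools import combinations
from math import inf


def weighted_sum(w, a, I):
    """Sum of w_ba * I(b) over the body of a; the empty sum is -inf."""
    body = [b for (b, c), weight in w.items() if c == a and weight != 0]
    if not body:
        return -inf
    return sum(w[(b, a)] for b in body if b in I)


def T(theta, w, I):
    """Immediate consequence operator T_N(I) of equation (1)."""
    return frozenset(a for a in theta if weighted_sum(w, a, I) >= theta[a])


def least_model(theta, w):
    """Bottom-up iteration from the empty set; intended for positive nets only."""
    I = frozenset()
    while T(theta, w, I) != I:
        I = T(theta, w, I)
    return I


def subsets(atoms):
    """All subsets of a collection of neurons, smallest first."""
    atoms = sorted(atoms)
    return [frozenset(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)]


def fitting(theta, w, I, J):
    """Phi_N(I, J): neurons that fire under every K with I <= K <= J."""
    between = [I | extra for extra in subsets(J - I)]
    return frozenset(a for a in theta
                     if all(weighted_sum(w, a, K) >= theta[a] for K in between))


def is_answer_set(theta, w, I):
    """Check I = lfp(Phi_N(., I)) by iterating Phi_N(., I) from the empty set."""
    if T(theta, w, I) != I:  # Phi_N(I, I) = T_N(I), so answer sets are fixed points
        return False
    X = frozenset()
    while fitting(theta, w, X, I) != X:
        X = fitting(theta, w, X, I)
    return X == I


def answer_sets(theta, w):
    return [I for I in subsets(theta) if is_answer_set(theta, w, I)]


# Positive net: fact a, and b fires iff a fires.
pos_theta, pos_w = {"a": -inf, "b": 1}, {("a", "b"): 1}
# Net with negative weights: p fires iff q does not, and vice versa.
neg_theta, neg_w = {"p": 0, "q": 0}, {("q", "p"): -1, ("p", "q"): -1}

print(least_model(pos_theta, pos_w))
print(answer_sets(neg_theta, neg_w))
```

In the second example the mutually inhibiting neurons behave like the rules "p if not q" and "q if not p" of ordinary answer set programming, and the sketch reports the two interpretations $\{p\}$ and $\{q\}$.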
An $n$-layer feed-forward net (or $n$-net) is an acyclic net $N=N_{1}\cup\ldots\cup N_{n}$ (disjoint union) such that $N_{i}$ contains the neurons of level $i$, $1\leq i\leq n$; we call $N_{1}$ the input layer and $N_{n}$ the output layer. Recall that we assume $\theta(a)=-\infty$ for all input neurons $a\in N_{1}$, since $b_{N}(a)$ being empty means that each $a\in N_{1}$ is a fact (see § 2.1). Every $n$-net $N=N_{1}\cup\ldots\cup N_{n}$ computes a (boolean) function $f_{N}:\mathbb{I}_{N_{1}}\to\mathbb{I}_{N_{n}}$ (notice that $I$ may contain only neurons from the input layer $N_{1}$ and $f_{N}(I)$ contains only neurons from the output layer $N_{n}$) by

$$
f_{N}=T_{N_{n}}\circ\ldots\circ T_{N_{1}}
$$

so that for each interpretation $I\in\mathbb{I}_{N_{1}}$,

$$
f_{N}(I)=T_{N_{n}}(\ldots T_{N_{2}}(T_{N_{1}}(I))\ldots).
$$

## 3. Neural logic programs

In this section, we introduce neural logic programs as programs assembled from neurons.

### 3.1. Syntax

Let $A$ be a finite set of neurons. A (neural logic) program over $A$ is a finite set of (neural) rules of the form

$$
a_{0}\xleftarrow{\mathbf{w}}a_{1},\ldots,a_{k},\quad k\geq 0, \tag{3}
$$

where $a_{0},\ldots,a_{k}\in A$ are neurons and $\mathbf{w}=(w_{a_{1}a_{0}},\ldots,w_{a_{k}a_{0}})\in\mathbb{R}^{k}$, so that $w_{a_{i}a_{0}}\neq 0$ for all $1\leq i\leq k$, are weights. A rule of the form (3) is positive iff all weights $w_{a_{1}a_{0}},\ldots,w_{a_{k}a_{0}}$ are positive, and a program is positive iff it consists only of positive rules. It will be convenient to define, for a rule $r$ of the form (3), the head of $r$ by $h(r):=a_{0}$ and the body of $r$ by $b(r):=\{a_{1},\ldots,a_{k}\}$. A program is minimalist iff it contains at most one rule for each rule head.
We define the dependency graph of a program $P$ by $dep(P):=(A_{P},E_{P})$, where $A_{P}$ are the neurons occurring in $P$, and there is an edge $b\xrightarrow{w_{ba}}a$ in $E_{P}$ iff there is a rule

$$
a\xleftarrow{(\ldots,w_{ba},\ldots)}b_{1},\ldots,b_{i-1},b,b_{i+1},\ldots,b_{k}\in P,\quad k\geq 1.
$$

Notice that the dependency graph of a program is a net! A program is acyclic iff its dependency graph is acyclic. Similar to acyclic nets, the neurons $A_{P}$ occurring in an acyclic program $P$ can be partitioned into layers $A_{P}=A^{1}_{P}\cup\ldots\cup A^{n}_{P}$ (disjoint union) such that for each rule $r\in P$, if $h(r)\in A^{i}_{P}$ then $b\in A^{i-k}_{P}$, $1\leq k\leq i-1$, for every $b\in b(r)$. An $n$-program is an acyclic program which has a partitioning into $n$ layers.

An ordinary rule is a rule of the form (3) with $\mathbf{w}=(1,\ldots,1)\in\mathbb{R}^{k}$ and $\theta(a_{0})=k$, written simply as

$$
a_{0}\leftarrow a_{1},\ldots,a_{k}. \tag{4}
$$

An ordinary program consists only of ordinary rules.

### 3.2. Answer set semantics

We now define the semantics of a neural logic program. As for neural nets, an interpretation of a program is any subset of $A$.

**Definition 6** *The semantics of ordinary programs is defined as for ordinary propositional Horn logic programs inductively as follows:*

- *for a neuron $a\in A$, $I\models a$ iff $a\in I$;*
- *for a set of neurons $B\subseteq A$, $I\models B$ iff $B\subseteq I$;*
- *for an ordinary rule $r$ of the form (4), $I\models r$ iff $I\models b(r)$ implies $I\models h(r)$;*
- *for an ordinary program $P$, $I\models P$ iff $I\models r$ for each $r\in P$.*

**Definition 7** *We define the semantics of neural logic programs inductively as follows:

- For a neuron $a\in A$, $I\models a$ iff $a\in I$.
- For a rule $r$ of the form (3), we define

$$
I\models r\quad\text{iff}\quad\sum_{b\in b(r)}w_{ba}I(b)\geq\theta(h(r))\text{ implies }I\models h(r).
$$

- For a neural logic program $P$, $I\models P$ iff $I\models r$ for every $r\in P$, in which case we call $I$ a model of $P$.*

**Definition 8** *Define the (immediate consequence) operator (?) of $P$, for every interpretation $I$, by*

$$
T_{P}(I):=\left\{h(r)\;\middle|\;r\in P,\sum_{b\in b(r)}w_{ba}I(b)\geq\theta(h(r))\right\}.
$$

Notice the similarity to the immediate consequence operator of a net in (1), which will be essential in § 4. If $P$ is ordinary, then we get the ordinary immediate consequence operator of ? (?).

**Example 9** *Consider the program*

$$
P:=\left\{\begin{array}{l}a\\ b\xleftarrow{w}a\end{array}\right\}.
$$

*We have*

$$
b\in T_{P}(\{a\})\quad\text{iff}\quad w\{a\}(a)\geq\theta(b)\quad\text{iff}\quad w\geq\theta(b).
$$

*So for example if $w=0$ and $\theta(b)>0$, the least model of $P$ is $\{a\}$, which differs from the least model $\{a,b\}$ of the ordinary program which we obtain from $P$ by putting $w=1$ and $\theta(b)=1$.*

**Fact 10** *An interpretation $I$ is a model of $P$ iff $I$ is a prefixed point of $T_{P}$.*

An interpretation $I$ is a supported model of $P$ iff $I$ is a fixed point of $T_{P}$. A program is supported model equivalent to a net iff they have the same supported models.

As for nets, the operator of a positive program $P$ is monotone and thus has a least fixed point, which by Fact 10 is a model of $P$, called the least model of $P$. A program $P$ is subsumption equivalent (?) to a net $N$ iff $T_{P}=T_{N}$. Moreover, a positive program is least model equivalent to a net iff their least models coincide.
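The operator $T_{P}$, Fact 10 and the least model of a positive program can be sketched in a few lines. The encoding below (a rule as a pair of a head and a dictionary from body neurons to weights, thresholds in a separate dictionary `theta`) and all function names are assumptions made for this illustration. Because the syntax of § 3.1 requires non-zero weights, the sketch uses $w=0.5$ with $\theta(b)=1$ in place of the value $w=0$ mentioned in Example 9; the effect on the least model is the same.

```python
from math import inf


def rule_fires(rule, theta, I):
    """True iff the weighted body sum of the rule reaches theta(h(r))."""
    head, body = rule  # body: dict mapping each body neuron b to its weight
    total = sum(weight for b, weight in body.items() if b in I) if body else -inf
    return total >= theta[head]


def T_P(program, theta, I):
    """Immediate consequence operator of Definition 8."""
    return frozenset(rule[0] for rule in program if rule_fires(rule, theta, I))


def is_model(program, theta, I):
    """Fact 10: I is a model of P iff I is a prefixed point of T_P."""
    return T_P(program, theta, I) <= I


def least_model(program, theta):
    """Bottom-up iteration from the empty set; intended for positive programs only."""
    I = frozenset()
    while T_P(program, theta, I) != I:
        I = T_P(program, theta, I)
    return I


theta = {"a": -inf, "b": 1}                  # a is a fact, b has threshold 1
ordinary = [("a", {}), ("b", {"a": 1})]      # Example 9 with w = 1
weak = [("a", {}), ("b", {"a": 0.5})]        # Example 9 with w = 0.5 < theta(b)

print(least_model(ordinary, theta))
print(least_model(weak, theta))
```

With $w=1$ the iteration reaches $\{a,b\}$, while with $w=0.5$ the neuron $b$ never fires and the least model is $\{a\}$, in line with the distinction drawn in Example 9.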
As for nets, the operator of a non-positive program may be non-monotonic, which means that its least fixed point may not exist. We can define an answer set semantics of arbitrary programs in the same way as for arbitrary nets by defining the operators $\Phi_{P}$ and $\Phi_{P}^{\dagger}$ and by saying that $I$ is an answer set of $P$ iff $I=\Phi_{P}^{\dagger}(I)$. Notice that this construction is the standard procedure for defining an answer set semantics of a program in terms of the Fitting operator (cf. ?, ?). A program is equivalent to a net iff their answer sets coincide.

In contrast to nets, we can define the answer set semantics of a neural logic program in a direct way using the (Faber-Leone-Pfeifer) reduct (?, ?) defined, for every program $P$ and interpretation $I$, by

$$
P^{I}:=\{r\in P\mid I\models b(r)\}.
$$

We now can say that $I$ is an FLP-answer set of $P$ iff $I$ is a $\subseteq$-minimal model of $P^{I}$. Notice that we cannot define the reduct of a neural net in a reasonable way as we have no notion of “rule” for nets, which means that there is no notion of FLP-answer set for nets.

## 4. Equivalences

We are now ready to prove the main results of the paper.

**Theorem 11** *Every neural net is subsumption equivalent to a minimalist neural program.*

*Proof.* Given a net $N$, define the minimalist program

$$
P_{N}:=\left\{a\xleftarrow{\mathbf{w}}b_{N}(a)\;\middle|\;a\in N,\mathbf{w}=(w_{ba}\mid b\in b_{N}(a))\right\}.
$$

Notice that

$$
N=dep(P_{N}).
$$

For any interpretation $I$, we have

$$
\begin{aligned}
T_{N}(I)&=\left\{a\in N\;\middle|\;\sum_{b\in b_{N}(a)}w_{ba}I(b)\geq\theta(a)\right\}\\
&=\left\{h(r)\;\middle|\;r\in P_{N},\sum_{b\in b(r)}w_{ba}I(b)\geq\theta(h(r))\right\}\\
&=T_{P_{N}}(I).
\end{aligned}
$$

∎

**Remark 12** *Notice that the neural program $P_{N}$ constructed in the proof of Theorem 11 is minimalist in the sense that it contains at most one rule for each head.
Moreover, if $N$ is acyclic then $P_{N}$ is acyclic as well.*

**Corollary 13** *Every neural net is supported model equivalent to a minimalist neural program.*

**Corollary 14** *Every positive neural net is least model equivalent to a positive minimalist neural program.*

**Theorem 15** *Every ordinary neural net is subsumption equivalent to an ordinary minimalist neural program.*

*Proof.* The minimalist program $P_{N}$ as defined in the proof of Theorem 11 is equivalent to the ordinary program

$$
\widehat{P}_{N}:=\{a\leftarrow b_{N}(a)\mid a\in N\}
$$

since

$$
\begin{aligned}
T_{P_{N}}(I)&=\left\{h(r)\;\middle|\;r\in P_{N},\sum_{b\in b_{N}(h(r))}I(b)=|b_{N}(h(r))|\right\}\\
&=\left\{h(r)\;\middle|\;r\in\widehat{P}_{N},b(r)\subseteq I\right\}\\
&=T_{\widehat{P}_{N}}(I),
\end{aligned}
$$

where the first identity follows from the assumption that $N$ is ordinary. ∎

**Theorem 16** *Every neural net is equivalent to a minimalist neural program.*

*Proof.* Given a net $N$, for the minimalist program $P_{N}$ as constructed in the proof of Theorem 11, we have

$$
\begin{aligned}
\Phi_{N}(I,J)&=\left\{a\in N\;\middle|\;\sum_{b\in b_{N}(a)}w_{ba}K(b)\geq\theta(a),\text{ for all $I\subseteq K\subseteq J$}\right\}\\
&=\left\{h(r)\;\middle|\;r\in P_{N},\sum_{b\in b(r)}w_{ba}K(b)\geq\theta(h(r)),\text{ for all $I\subseteq K\subseteq J$}\right\}\\
&=\Phi_{P_{N}}(I,J).
\end{aligned}
$$

∎

**Corollary 17** *Every $n$-net is equivalent to an $n$-program.*

*Proof.* Given an $n$-net $N$, the program $P_{N}$ in the proof of Theorem 11 is an $n$-program equivalent to $N$. ∎

## 5. Future work

In this paper, neurons are boolean in the sense that their outputs are boolean values from $\{0,1\}$, which is essential for the translation into neural logic programs. If we allow neurons to have values in the reals (or any other semiring), then we need to translate them into weighted neural logic programs where the semantics is given within the reals (or the semiring) as well (cf. ?, ?, ?). This is important since learning strategies such as backpropagation work only in the non-boolean setting.
This brings us to the next line of research, which is to interpret well-known learning strategies such as backpropagation in the setting of neural logic programs and to analyze the role of the immediate consequence operator in learning.

HEX programs (?) incorporating external atoms into answer set programs can be used to implement neural logic programs where each neuron is implemented by an external atom, given that HEX programs are generalized to allow external atoms to occur in rule heads (which still appears to be missing).

Arguably the most fascinating line of future research is to lift the concepts and results of this paper from propositional to first-order neural logic programs. This requires resolving the fundamental question of what a “first-order neuron” is, which is related to the problem of variable binding (cf. ?, ?, ?, ?) at the core of neural-symbolic integration (cf. ?). This line of work will be related to the recently introduced neural stochastic logic programs (?, ?) which extend probabilistic logic programs with neural predicates.

Finally, it appears fundamental to introduce and study the sequential composition (?, ?) of neural logic programs for program composition and decomposition, since it provides an algebra of neural logic programs and thus an algebra of neural nets which can be used to define neural logic program proportions (?) of the form “$P$ is to $Q$ what $R$ is to $S$” and thus analogical proportions between neural nets. Even more interesting are “mixed” proportions of the form $P:R::M:N$ where $P,R$ are ordinary programs and $M,N$ are neural programs representing neural nets, which ultimately gives us an algebraic connection between pure logic programs and neural nets in the form of proportional functors $F$ satisfying $P:R::F(P):F(R)$ (?).

## 6. Conclusion

In this paper, we defined the least model semantics of positive and the answer set semantics of arbitrary (boolean) neural nets.
After that we defined from first principles the class of neural logic programs and showed that neural nets and neural programs are equivalent. In a broader sense, this paper is a further step towards neural-symbolic integration.

## References

- Antić, C. (2020). Fixed point semantics for stream reasoning. Artificial Intelligence, 288, 103370. https://doi.org/10.1016/j.artint.2020.103370.
- Antić, C. (2023a). Logic program proportions. Annals of Mathematics and Artificial Intelligence. https://doi.org/10.1007/s10472-023-09904-8.
- Antić, C. (2023b). Proportoids. https://arxiv.org/pdf/2210.01751.pdf.
- Antić, C. (2023c). Sequential composition of answer set programs. https://arxiv.org/pdf/2104.12156.pdf.
- Antić, C. (2024). Sequential composition of propositional logic programs. Annals of Mathematics and Artificial Intelligence, 92 (2), 505–533. https://doi.org/10.1007/s10472-024-09925-x.
- Antić, C., Eiter, T., and Fink, M. (2013). HEX semantics via approximation fixpoint theory. In Cabalar, P., and Son, T. C. (Eds.), LPNMR 2013, pp. 102–115. https://doi.org/10.1007/978-3-642-40564-8_11.
- Apt, K. R. (1990). Logic programming. In van Leeuwen, J. (Ed.), Handbook of Theoretical Computer Science, Vol. B, pp. 493–574. Elsevier, Amsterdam.
- Bader, S., Hitzler, P., and Hölldobler, S. (2008). Connectionist model generation: A first-order approach. Neurocomputing, 71, 2420–2432.
- Baral, C. (2003). Knowledge Representation, Reasoning and Declarative Problem Solving. Cambridge University Press, Cambridge.
- Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.
- Borges, R. V., d’Avila Garcez, A., and Lamb, L. C. (2011). Learning and representing temporal knowledge in recurrent networks. IEEE Transactions on Neural Networks, 22 (12), 2409–2421.
- Brewka, G., Eiter, T., and Fink, M. (2011). Nonmonotonic multi-context systems: a flexible approach for integrating heterogenous knowledge sources. In Balduccini, M., and Son, T. C. (Eds.), Logic Programming, Knowledge Representation, and Nonmonotonic Reasoning, pp. 233–258. Springer-Verlag, Berlin/Heidelberg.
- Browne, A., and Sun, R. (1999). Connectionist variable binding. Expert Systems, 16 (3), 189–207.
- Cohen, S. B., Simmons, R. J., and Smith, N. A. (2011). Products of weighted logic programs. Theory and Practice of Logic Programming, 11 (2-3), 263–296.
- d’Avila Garcez, A., Besold, T. R., de Raedt, L., Földiak, P., Hitzler, P., Icard, T., Kühnberger, K.-U., Lamb, L. C., Miikkulainen, R., and Silver, D. L. (2015). Neural-symbolic learning and reasoning: contributions and challenges. In AAAI Spring Symposium - Knowledge Representation and Reasoning: Integrating Symbolic and Neural Approaches.
- d’Avila Garcez, A. S., Broda, K. B., and Gabbay, D. M. (2002). Neural-Symbolic Learning Systems. Foundations and Applications. Springer-Verlag, Berlin/Heidelberg.
- d’Avila Garcez, A. S., Lamb, L. C., and Gabbay, D. M. (2009). Neural-Symbolic Cognitive Reasoning. Springer-Verlag, Berlin/Heidelberg.
- de Penning, L., d’Avila Garcez, A., Lamb, L. C., and Meyer, J. J. (2011). A neural-symbolic cognitive agent for online learning and reasoning. In IJCAI 2011.
- Denecker, M., Bruynooghe, M., and Vennekens, J. (2012). Approximation fixpoint theory and the semantics of logic and answer set programs. In Erdem, E., Lee, J., Lierler, Y., and Pearce, D. (Eds.), Correct Reasoning, Vol. 7265 of LNCS, pp. 178–194, Heidelberg. Springer-Verlag.
- Denecker, M., Marek, V., and Truszczyński, M. (2004). Ultimate approximation and its application in nonmonotonic knowledge representation systems. Information and Computation, 192 (1), 84–121.
- Denecker, M., Marek, V., and Truszczyński, M. (2000). Approximations, stable operators, well-founded fixpoints and applications in nonmonotonic reasoning. In Minker, J. (Ed.), Logic-Based Artificial Intelligence, Vol. 597 of The Springer International Series in Engineering and Computer Science, pp. 127–144, Norwell, Massachusetts. Kluwer Academic Publishers.
- Droste, M., and Gastin, P. (2007). Weighted automata and weighted logics. Theoretical Computer Science, 380 (1-2), 69–86.
- Eiter, T., Ianni, G., and Krennwallner, T. (2009). Answer set programming: a primer. In Reasoning Web. Semantic Technologies for Information Systems, volume 5689 of Lecture Notes in Computer Science, pp. 40–110. Springer, Heidelberg.
- Eiter, T., Ianni, G., Schindlauer, R., and Tompits, H. (2005). A uniform integration of higher-order reasoning and external evaluations in answer-set programming. In Kaelbling, L. P., and Saffiotti, A. (Eds.), IJCAI 2005, pp. 90–96.
- Faber, W., Leone, N., and Pfeifer, G. (2004). Recursive aggregates in disjunctive logic programs: semantics and complexity. In Alferes, J., and Leite, J. (Eds.), JELIA 2004, LNCS 3229, pp. 200–212. Springer, Berlin.
- Faber, W., Pfeifer, G., and Leone, N. (2011). Semantics and complexity of recursive aggregates in answer set programming. Artificial Intelligence, 175 (1), 278–298.
- Feldmann, J. A. (2013). The neural binding problem(s). Cognitive Neurodynamics, 7 (1), 1–11.
- Fitting, M. (2002). Fixpoint semantics for logic programming — a survey. Theoretical Computer Science, 278 (1-2), 25–51.
- Gelfond, M., and Lifschitz, V. (1991). Classical negation in logic programs and disjunctive databases. New Generation Computing, 9 (3-4), 365–385.
- Getoor, L., and Taskar, B. (Eds.). (2007). Introduction to Statistical Relational Learning. MIT Press.
- Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning: Adaptive Computation and Machine Learning. MIT Press, Cambridge USA.
- Hitzler, P., Bader, S., and d’Avila Garcez, A. (2005). Ontology learning as a use case for neural-symbolic integration. In NeSy 2005, IJCAI 2005.
- Hölldobler, S., and Kalinke, Y. (1994). Towards a new massively parallel computational model for logic programming. In ECAI 1994, pp. 68–77.
- Hölldobler, S., Kalinke, Y., and Störr, H.-P. (1999). Approximating the semantics of logic programs by recurrent neural networks. Applied Intelligence, 11, 45–58.
- Lifschitz, V. (2019). Answer Set Programming. Springer Nature Switzerland AG, Cham, Switzerland.
- Lloyd, J. W. (1987). Foundations of Logic Programming (2nd edition). Springer-Verlag, Berlin, Heidelberg.
- Maher, M. J. (1988). Equivalences of logic programs. In Minker, J. (Ed.), Foundations of Deductive Databases and Logic Programming, chap. 16, pp. 627–658. Morgan Kaufmann Publishers.
- Manhaeve, R., Dumančić, S., Kimmig, A., Demeester, T., and Raedt, L. D. (2021). Neural probabilistic logic programming in DeepProbLog. Artificial Intelligence, 298.
- McCulloch, W. S., and Pitts, W. (1943). A logical calculus of the ideas immanent in the nervous activity. Bulletin of Mathematical Biophysics, 5 (4), 115–133.
- Muggleton, S. (1991). Inductive logic programming. New Generation Computing, 8 (4), 295–318.
- Pelov, N. (2004). Semantics of Logic Programs with Aggregates.
Ph.D. thesis, Katholieke Universiteit Leuven, Leuven.
- Raedt, L. D. (2008). Logical and Relational Learning. Cognitive Technologies. Springer-Verlag, Berlin/Heidelberg.
- Raedt, L. D., Dumančić, S., Manhaeve, R., and Marra, G. (2020). From statistical relational to neural-symbolic artificial intelligence. In IJCAI 2020, pp. 4943–4950.
- Raedt, L. D., Frasconi, P., Kersting, K., and Muggleton, S. (Eds.). (2008). Probabilistic Inductive Logic Programming: Theory and Applications. Springer-Verlag.
- Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., and Hassabis, D. (2017). Mastering the game of Go without human knowledge. Nature, 550, 354–359.
- Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46 (1-2), 159–216.
- Stüber, T., and Vogler, H. (2008). Weighted monadic datalog. Theoretical Computer Science, 403 (2-3), 221–238.
- Sun, R. (1994). Integrating Rules and Connectionism for Robust Commonsense Reasoning. John Wiley & Sons.
- Tarski, A. (1955). A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics, 5 (2), 285–309.
- Turing, A. (1948). Intelligent machinery. Tech. rep.
- Valiant, L. G. (2008). Knowledge infusion: a pursuit of robustness in artificial intelligence. In FSTTCS, pp. 415–422.
- van Emden, M. H., and Kowalski, R. (1976). The semantics of predicate logic as a programming language. Journal of the ACM, 23 (4), 733–742.
- Winters, T., Marra, G., Manhaeve, R., and Raedt, L. D. (2022). DeepStochLog: Neural stochastic logic programming. In AAAI 2022, pp. 10090–10100.
