# Mathematical Runtime Analysis for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II)

> Extended version of a paper that appeared at AAAI 2022 [ZLD22]; the work was conducted when the first author was with Southern University of Science and Technology. This version contains all proofs, many of them revised and improved. In particular, the runtime result for the NSGA-II with tournament selection on LeadingOnesTrailingZeroes now holds for $N\geq 4(n+1)$ instead of $N\geq 5(n+1)$. In addition, in all upper bounds we now also regard binary tournament selection as in Deb's implementation of the NSGA-II (building the $N$ tournaments from two permutations of the population). We added tail bounds for the runtime guarantees. The experimental section was extended as well.
**Authors**:
- Weijie Zheng (International Research Institute for Artificial Intelligence)
- Shenzhen, China
- Benjamin Doerr
- Laboratoire d’Informatique (LIX)
- École Polytechnique, CNRS (Institut Polytechnique de Paris)
- Palaiseau, France
> Corresponding author.
Abstract
The non-dominated sorting genetic algorithm II (NSGA-II) is the most intensively used multi-objective evolutionary algorithm (MOEA) in real-world applications. However, in contrast to several simple MOEAs analyzed also via mathematical means, no such study exists for the NSGA-II so far. In this work, we show that mathematical runtime analyses are feasible also for the NSGA-II. As particular results, we prove that with a population size four times larger than the size of the Pareto front, the NSGA-II with two classic mutation operators and four different ways to select the parents satisfies the same asymptotic runtime guarantees as the SEMO and GSEMO algorithms on the basic OneMinMax and LeadingOnesTrailingZeroes benchmarks. However, if the population size is only equal to the size of the Pareto front, then the NSGA-II cannot efficiently compute the full Pareto front: for an exponential number of iterations, the population will always miss a constant fraction of the Pareto front. Our experiments confirm the above findings.
1 Introduction
Many real-world problems need to optimize multiple conflicting objectives simultaneously, see [ZQL ${}^{+}$ 11] for a discussion of the different areas in which such problems arise. Instead of computing a single good solution, a common approach to such multi-objective optimization problems is to compute a set of interesting solutions so that a decision maker can select the most desirable one from these. Multi-objective evolutionary algorithms (MOEAs) are a natural choice for such problems due to their population-based nature and have been successfully used in many real-world applications [ZQL ${}^{+}$ 11].
Unfortunately, the theoretical understanding of MOEAs falls far behind their success in practice, and this discrepancy is even larger than in single-objective evolutionary computation, where the last twenty years have seen some noteworthy progress on the theory side [NW10, AD11, Jan13, DN20]. After some early theoretical works on convergence properties, e.g., [Rud98], the first mathematical runtime analysis of an MOEA was conducted by Laumanns et al. [LTZ ${}^{+}$ 02, LTZ04]. They analyzed the runtime of the simple evolutionary multi-objective optimizer (SEMO), a multi-objective counterpart of the randomized local search heuristic, on the CountingOnesCountingZeroes and LeadingOnesTrailingZeroes benchmarks, which are bi-objective analogues of the classic (single-objective) OneMax and LeadingOnes benchmarks. Around the same time, Giel [Gie03] analyzed the global SEMO (GSEMO), the multi-objective counterpart of the $(1+1)$ EA, on the LeadingOnesTrailingZeroes function.
Subsequent theoretical works mainly focused on variants of these algorithms and analyzed their runtime on the CountingOnesCountingZeroes and LeadingOnesTrailingZeroes benchmarks, on variants of them, on new benchmarks, and on combinatorial optimization problems [QYZ13, BQT18, RNNF19, QBF20, BFQY20, DZ21]. We note that the (G)SEMO algorithm keeps all non-dominated solutions in the population and discards all others, which can lead to impractically large population sizes. There are three theory works [BFN08, NSN15, DGN16] on the runtime of a simple hypervolume-based MOEA called the $(\mu+1)$ simple indicator-based evolutionary algorithm ($(\mu+1)$ SIBEA), regarding both classic benchmarks and problems designed to highlight particular strengths and weaknesses of this algorithm. Like the SEMO and GSEMO, the $(\mu+1)$ SIBEA creates a single offspring per generation; different from these, however, it works with a fixed population size $\mu$ .
Recently, also decomposition-based multi-objective evolutionary algorithms were analyzed [LZZZ16, HZCH19, HZ20]. These algorithms decompose the multi-objective problem into several related single-objective problems and then solve the single-objective problems in a co-evolutionary manner. This direction is fundamentally different from the above works and our research. We also do not further discuss the successful line of works that solve constrained single-objective problems by turning the constraint violation into a second objective, since it is not primarily concerned with multi-objective optimization; see, e.g., [NW06, FHH ${}^{+}$ 10, NRS11, FN15, QYZ15, QSYT17, QYT ${}^{+}$ 19, Cra19, DDN ${}^{+}$ 20, Cra21].
Unfortunately, most of the algorithms discussed in these theoretical works are far from the MOEAs used in practice. As pointed out in the survey [ZQL ${}^{+}$ 11], the majority of the MOEAs used in research and applications builds on the framework of the non-dominated sorting genetic algorithm II (NSGA-II) [DPAM02]. This algorithm works with a population of fixed size $N$ . It uses a complete order defined by the non-dominated sorting and the crowding distance to compare individuals. In each generation, $N$ offspring are generated from the parent population and the $N$ best individuals (according to the complete order) are selected as the new parent population. This approach is thus substantially different from the (G)SEMO algorithm and hypervolume-based approaches (and naturally completely different from decomposition-based methods), see the features of these algorithms described in the above two paragraphs.
Both the predominance in practice and the fundamentally different working principles ask for a rigorous understanding of the NSGA-II. However, to the best of our knowledge so far no mathematical runtime analysis for the NSGA-II has appeared. By mathematical runtime analysis, we mean the question of how many function evaluations a black-box algorithm takes to achieve a certain goal. The computational complexity of the operators used by the NSGA-II, in particular, how to most efficiently implement the non-dominated sorting routine, is a different question (and one that is well-understood [DPAM02]). We note that the runtime analysis in [COGNS20] considers a (G)SEMO algorithm that uses the crowding distance as one of several diversity measures used in the selection of the single parent creating an offspring, but due to the differences of the basic algorithms, none of the arguments used there appear helpful in the analysis of the NSGA-II.
Our Contributions. This paper conducts the first mathematical runtime analysis of the NSGA-II. We regard the NSGA-II with four different parent selection strategies (choosing each individual as a parent once, choosing parents independently and uniformly at random, and two ways of choosing the parents via binary tournaments) and with two classic mutation operators (one-bit mutation and standard bit-wise mutation), but in this first work without crossover (we remark that crossover is very little understood from the runtime perspective in MOEAs, the only works prior to ours we are aware of are [NT10, QYZ13, HZCH19]). As previous theoretical works, we analyze how long the NSGA-II takes to cover the full Pareto front, that is, we estimate the number of iterations until the parent population contains an individual for each objective value of the Pareto front.
When trying to determine the runtime of the NSGA-II, we first note that the selection mechanism of the NSGA-II may remove all individuals with some fixed objective value on the front. In other words, the fact that a certain objective value on the Pareto front was found in some iteration does not mean that this is not lost in some later iteration. This is one of the substantial differences to the (G)SEMO algorithm. We prove that if the population size $N$ is at least four times larger than the size of the Pareto front, then both for the OneMinMax and the LeadingOnesTrailingZeroes benchmarks, such a loss of Pareto front points cannot occur. With this insight, we then show that each of these eight variants of the NSGA-II computes the full Pareto front of the OneMinMax benchmark in an expected number of $O(n\log n)$ iterations (Theorems 2 and 6) and the front of the LeadingOnesTrailingZeroes benchmark in $O(n^{2})$ iterations (Theorems 8 and 9). When $N=\Theta(n)$ , the corresponding runtime guarantees in terms of fitness evaluations, $O(Nn\log n)=O(n^{2}\log n)$ and $O(Nn^{2})=O(n^{3})$ , have the same asymptotic order as those proven previously for the SEMO, GSEMO, and $(\mu+1)$ SIBEA (when $\mu≥ n+1$ and when $\mu=\Theta(n)$ for the $(\mu+1)$ SIBEA). We note that the benchmarks OneMinMax and LeadingOnesTrailingZeroes are the two most intensively studied benchmarks in the runtime analysis of MOEAs. In this first runtime analysis work on the NSGA-II, we therefore concentrated on these two benchmarks to allow a good comparison with the known performance of other MOEAs.
Using a population size larger than the size of the Pareto front is necessary. We prove that if the population size is equal to the size of the Pareto front, then the NSGA-II (applying one-bit mutation once to each parent) regularly loses points on the Pareto front of OneMinMax. This effect is strong enough so that with high probability for an exponential time each generation of the NSGA-II does not cover a constant fraction of the Pareto front of OneMinMax.
Our short experimental analysis confirms these findings and gives some quantitative estimates for which our mathematical analyses are not precise enough. For example, we observe that also with population sizes smaller than what our theoretical analysis requires (four times the size of the Pareto front), the NSGA-II efficiently covered the Pareto front of the OneMinMax and LeadingOnesTrailingZeroes benchmarks. With suitable population sizes, the NSGA-II beats the GSEMO algorithm on these benchmarks. Complementing our negative result, we observe that the fraction of the Pareto front not covered when using a population size equal to the front size is around 20% for OneMinMax and 40% for LeadingOnesTrailingZeroes. Also without covering the full Pareto front, MOEAs can serve their purpose of proposing to a decision maker a set of interesting solutions. With this perspective, we also regard experimentally the sets of solutions evolved by the NSGA-II when the population size is only equal to the size of the Pareto front. For both benchmarks, we observe that after a moderate runtime, the population contains the two extremal solutions and covers the rest of the Pareto front very evenly.
Overall, this work shows that the NSGA-II despite its higher complexity (parallel generation of offspring, selection based on non-dominated sorting and crowding distance) admits mathematical runtime analyses in a similar fashion as done before for simpler MOEAs, which hopefully will lead to a deeper understanding of the working principles of this important algorithm.
Subsequent works. We note that the conference version [ZLD22] of this work has already inspired a substantial amount of subsequent research. We briefly describe these results now. In [ZD22a], the performance of the NSGA-II with small population size was analyzed. The main result is that the problem that Pareto front points can be lost can be significantly reduced with a small modification of the selection procedure that was previously analyzed experimentally [KD06], namely to remove individuals in the selection of the next population sequentially, recomputing the crowding distance after each removal. For this setting, an $O(n/N)$ approximation guarantee was proven. In [BQ22], the first runtime analysis of the NSGA-II with crossover was conducted; however, no speed-ups from crossover could be shown. Also, significant speed-ups were shown when using larger tournaments than binary tournaments. In [DQ23a], the performance of the NSGA-II on the multimodal OneJumpZeroJump benchmark [DZ21] was analyzed. This work shows that also on this multimodal benchmark the NSGA-II has a performance asymptotically at least as good as the GSEMO algorithm (when the population size is at least four times the size of the Pareto front). A matching lower bound for this and our result on OneMinMax was proven in [DQ23b]. This work in particular shows that the NSGA-II in these settings does not profit from population sizes larger than the minimum required population size. Two recent works showed significant performance gains from crossover, one on the OneJumpZeroJump benchmark [DQ23c] and one on an artificial problem [DOSS23b]. The first runtime analysis of the NSGA-II on a combinatorial problem, namely the bi-objective minimum spanning tree problem previously regarded in [Neu07, NW22], was conducted in [CDH ${}^{+}$ 23]. The first runtime analysis of the NSGA-II for noisy optimization appeared in [DOSS23a].
The first runtime analysis of the SMS-EMOA [BNE07] (a variant of the NSGA-II building on the hyper-volume) was conducted in [BZLQ23]. All these works regard bi-objective problems. For the OneMinMax problem in three or more objectives, it was shown in [ZD22b] that the NSGA-II cannot find the full Pareto front in polynomial time and even has difficulties in approximating it. It was shown in [WD23] that the NSGA-III does not experience these problems, at least in three objectives. With this recent development, we are confident to claim that our first mathematical runtime analysis for the NSGA-II has started a fruitful direction of research.
The remainder of the paper is organized as follows. The NSGA-II framework is briefly introduced in Section 2. Sections 3 and 4 separately show our runtime results for the NSGA-II with large enough population size on the OneMinMax and LeadingOnesTrailingZeroes functions. Section 5 proves the exponential runtime of the NSGA-II with population size equal to the size of the Pareto front. Our experimental results are discussed in Section 6. Section 7 concludes this work.
2 Preliminaries
In this section, we give a brief introduction to multi-objective optimization and to the NSGA-II framework. For the simplicity of presentation, we shall concentrate on two objectives, both of which have to be maximized. A bi-objective function on some search space $\Omega$ is a pair $f=(f_{1},f_{2})$ where $f_{i}:\Omega→\mathbb{R}$ . We write $f(x)=(f_{1}(x),f_{2}(x))$ for all $x∈\Omega$ . We shall always assume that we have a bit-string representation, that is, that $\Omega=\{0,1\}^{n}$ for some $n∈\mathbb{N}$ . The challenge in multi-objective optimization is that usually there is no solution that maximizes both $f_{1}$ and $f_{2}$ and thus is at least as good as all other solutions.
More precisely, in bi-objective maximization, we say $x$ weakly dominates $y$ , denoted by $x\succeq y$ , if and only if $f_{1}(x)≥ f_{1}(y)$ and $f_{2}(x)≥ f_{2}(y)$ . We say $x$ strictly dominates $y$ , denoted by $x\succ y$ , if and only if $f_{1}(x)≥ f_{1}(y)$ and $f_{2}(x)≥ f_{2}(y)$ and at least one of the inequalities is strict. We say that a solution is Pareto-optimal if it is not strictly dominated by any other solution. The set of objective values of all Pareto optima is called the Pareto front of $f$ . With this language, the aim in multi-objective optimization is to compute a small set $P$ of Pareto optima such that $f(P)=\{f(x)\mid x∈ P\}$ is the Pareto front or is at least a diverse subset of it. Consider an algorithm $A$ optimizing a multi-objective problem $f$ with Pareto front $M$ . Let $P_{t}$ be the population at iteration $t$ and $G_{t}$ be the number of function evaluations till iteration $t$ . Then the time complexity or running time in this paper is the random variable $T_{A}(f)=\inf\{G_{t}\mid f(P_{t})\supseteq M\}$ . Usually, we discuss the expected runtime or the runtime with some probability.
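The two dominance relations can be stated as simple predicates on objective vectors. The following Python sketch is our illustration (function names and the tuple representation are our choices, not part of the paper):

```python
def weakly_dominates(fx, fy):
    # x weakly dominates y (x ⪰ y): at least as good in both objectives
    return fx[0] >= fy[0] and fx[1] >= fy[1]

def strictly_dominates(fx, fy):
    # x strictly dominates y (x ≻ y): weak dominance with at least one
    # strict inequality, i.e., the objective vectors are not equal
    return weakly_dominates(fx, fy) and fx != fy

# (3, 5) strictly dominates (3, 4); (3, 5) and (4, 2) are incomparable,
# that is, neither weakly dominates the other.
```

With these predicates, a solution is Pareto-optimal exactly if no other solution's objective vector strictly dominates its own.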
The NSGA-II
When working with a fixed population size, an MOEA must select the next parent population from the combined parent and offspring population by discarding some of these individuals. For this, a complete order on the combined parent and offspring population could be used so that the next parent population is taken in a greedy manner according to this order. Since dominance is only a partial order, the NSGA-II [DPAM02] extends the dominance relation to the following complete order.
In a given population $P⊂eq\{0,1\}^{n}$ , each individual $x$ has both a rank and a crowding distance. The ranks are defined recursively based on the dominance relation. All individuals that are not strictly dominated by another one have rank one. Given that the ranks $1,...,k$ are already defined, the individuals of rank $k+1$ are those among the remaining individuals that are not strictly dominated except by individuals of rank $k$ or smaller. This defines a partition of $P$ into sets $F_{1},F_{2},...$ such that $F_{i}$ contains all individuals with rank $i$ . As shown in [DPAM02], this partition can be computed more efficiently than what the above recursive description suggests, namely in quadratic time, see Algorithm 1 for details. It is clear that individuals with lower rank are more interesting, so when comparing two individuals of different ranks, the one with lower rank is preferred.
To compare individuals in the same rank class $F_{i}$ , the crowding distance of these individuals (in $F_{i}$ ) is computed, and the individual with larger distance is preferred. Ties are broken randomly.
1: Input: $S=\{S_{1},...,S_{|S|}\}$ : the set of individuals
2: Output: $F_{1},F_{2},...$
3: for $i=1,...,|S|$ do
4: $\operatorname{ND}(S_{i})=0$ % number of individuals strictly dominating $S_{i}$
5: $\operatorname{SD}(S_{i})=\emptyset$ % set of individuals strictly dominated by $S_{i}$
6: end for
7: for $i=1,...,|S|$ do % compute $\operatorname{ND}$ and $\operatorname{SD}$
8: for $j=1,...,|S|$ do
9: if $S_{i}\prec S_{j}$ then
10: $\operatorname{ND}(S_{i})=\operatorname{ND}(S_{i})+1$
11: $\operatorname{SD}(S_{j})=\operatorname{SD}(S_{j})\cup\{S_{i}\}$
12: end if
13: end for
14: end for
15: $F_{1}=\{S_{i}\mid\operatorname{ND}(S_{i})=0,i=1,2,...,|S|\}$
16: $k=1$
17: while $F_{k}≠\emptyset$ do
18: $F_{k+1}=\emptyset$
19: for any $s∈ F_{k}$ do % discount $F_{k}$ from $\operatorname{ND}$ and $\operatorname{SD}$
20: for any $s^{\prime}∈\operatorname{SD}(s)$ do
21: $\operatorname{ND}(s^{\prime})=\operatorname{ND}(s^{\prime})-1$
22: if $\operatorname{ND}(s^{\prime})=0$ then
23: $F_{k+1}=F_{k+1}\cup\{s^{\prime}\}$
24: end if
25: end for
26: end for
27: $k=k+1$
28: end while
Algorithm 1 fast-non-dominated-sort(S)
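For concreteness, Algorithm 1 can be transcribed as follows. This Python sketch is our illustration (not the authors' code); it represents each individual by its bi-objective vector and returns the fronts $F_1, F_2, \dots$ as lists of indices:

```python
def strictly_dominates(fx, fy):
    # fx strictly dominates fy: at least as good in both objectives, better in one
    return fx[0] >= fy[0] and fx[1] >= fy[1] and fx != fy

def fast_non_dominated_sort(objs):
    """Partition the indices of objs into fronts F_1, F_2, ... as in Algorithm 1."""
    n = len(objs)
    nd = [0] * n                     # ND(S_i): number of individuals strictly dominating S_i
    sd = [[] for _ in range(n)]      # SD(S_i): individuals strictly dominated by S_i
    for i in range(n):
        for j in range(n):
            if strictly_dominates(objs[j], objs[i]):   # the case S_i ≺ S_j
                nd[i] += 1
                sd[j].append(i)
    fronts = [[i for i in range(n) if nd[i] == 0]]     # F_1: nobody dominates them
    while fronts[-1]:
        nxt = []
        for s in fronts[-1]:         # discount the current front from ND
            for t in sd[s]:
                nd[t] -= 1
                if nd[t] == 0:       # only dominated by earlier fronts
                    nxt.append(t)
        fronts.append(nxt)
    return fronts[:-1]               # drop the trailing empty front
```

As in the pseudocode, the double loop costs quadratic time, and each later front is obtained by removing the influence of the previous one rather than by recomputing dominations.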
Algorithm 2 shows how the crowding distance in a given set $S$ is computed. The crowding distance of some $x∈ S$ is the sum of the crowding distances $x$ has with respect to each objective function $f_{i}$ . For a given $f_{i}$ , the individuals in $S$ are sorted in order of ascending $f_{i}$ value (for equal values, a tie-breaking mechanism is needed, but we shall not make any assumption on this, that is, our mathematical results are valid regardless of how these ties are broken). The first individual and the last individual in the sorted list have an infinite crowding distance. For other individuals in the sorted list, their crowding distance with respect to $f_{i}$ is the difference of the objective values of its left and right neighbor in the sorted list, normalized by the difference between the first and the last.
Input: $S=\{S_{1},...,S_{|S|}\}$ : the set of individuals
Output: $\operatorname{cDis}(S)=(\operatorname{cDis}(S_{1}),...,\operatorname{cDis}(S_{|S|}))$ , the vector of crowding distances of the individuals in $S$
1: $\operatorname{cDis}(S)=(0,...,0)$
2: for each objective function $f_{i}$ do
3: Sort $S$ in order of ascending $f_{i}$ value: $S_{i.1},...,S_{i.{|S|}}$
4: $\operatorname{cDis}(S_{i.1})=+∞,\operatorname{cDis}(S_{i.{|S|}})=+∞$
5: for $j=2,...,|S|-1$ do
6: $\operatorname{cDis}(S_{i.j})=\operatorname{cDis}(S_{i.j})+\frac{f_{i}(S_{i.{j+1}})-f_{i}(S_{i.{j-1}})}{f_{i}(S_{i.{|S|}})-f_{i}(S_{i.1})}$
7: end for
8: end for
Algorithm 2 crowding-distance( $S$ )
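Algorithm 2 can be sketched in Python as below. This is our illustration under the assumption that the smallest and largest $f_i$ values in $S$ differ, so the normalizing denominator is nonzero; Python's stable sort serves as one possible tie-breaking rule for equal values:

```python
import math

def crowding_distance(objs):
    """Crowding distances as in Algorithm 2; objs[i] is the objective
    vector (f1, f2) of the i-th individual."""
    m = len(objs)
    cdis = [0.0] * m
    for obj in range(2):             # one pass per objective f_i
        order = sorted(range(m), key=lambda i: objs[i][obj])  # ascending f_i
        cdis[order[0]] = math.inf    # first and last of the sorted list
        cdis[order[-1]] = math.inf
        span = objs[order[-1]][obj] - objs[order[0]][obj]     # normalization
        for pos in range(1, m - 1):
            # difference of the left and right neighbor, normalized
            cdis[order[pos]] += (objs[order[pos + 1]][obj]
                                 - objs[order[pos - 1]][obj]) / span
    return cdis
```

For example, for the OneMinMax-style values `[(0, 4), (1, 3), (2, 2), (4, 0)]`, the two extremal points get infinite distance, and the point `(2, 2)` gets a larger distance than `(1, 3)` because its neighborhood is sparser.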
The whole NSGA-II framework is shown in Algorithm 3. After the random initialization of the population of size $N$ , the main loop starts with the generation of $N$ offspring (the precise way how this is done is not part of the NSGA-II framework and is mostly left as a design choice to the algorithm user in [DPAM02], although it is suggested to select parents via binary tournaments based on the total order described above). Then the total order based on rank and crowding distance is used to remove the worst $N$ individuals in the union of the parent and offspring population. The remaining individuals form the parent population of the next iteration.
1: Uniformly at random generate the initial population $P_{0}=\{x_{1},x_{2},...,x_{N}\}$ with $x_{i}∈\{0,1\}^{n},i=1,2,...,N.$
2: for $t=0,1,2,...$ do
3: Generate the offspring population $Q_{t}$ with size $N$
4: Use Algorithm 1 to divide $R_{t}=P_{t}\cup Q_{t}$ into $F_{1},F_{2},...$
5: Find $i^{*}≥ 1$ such that $\sum_{i=1}^{i^{*}-1}|F_{i}|<N$ and $\sum_{i=1}^{i^{*}}|F_{i}|≥ N$
6: Use Algorithm 2 to separately calculate the crowding distance of each individual in $F_{1},...,F_{i^{*}}$
7: Let $\tilde{F}_{i^{*}}$ be the $N-\sum_{i=1}^{i^{*}-1}|F_{i}|$ individuals in $F_{i^{*}}$ with largest crowding distance, chosen at random in case of a tie
8: $P_{t+1}=\big(\bigcup_{i=1}^{i^{*}-1}F_{i}\big)\cup\tilde{F}_{i^{*}}$
9: end for
Algorithm 3 NSGA-II
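Putting the pieces together, the survival selection in Steps 4 to 8 of Algorithm 3 can be sketched as follows. This is an illustrative Python transcription with our own helper names; it assumes that the critical front does not have all-equal values in an objective, so the crowding-distance normalization is well defined:

```python
import math
import random

def strictly_dominates(fx, fy):
    # at least as good in both objectives, strictly better in one
    return fx[0] >= fy[0] and fx[1] >= fy[1] and fx != fy

def fronts_of(objs):
    # naive non-dominated sorting (quadratic, as in Algorithm 1)
    rest = set(range(len(objs)))
    fronts = []
    while rest:
        front = [i for i in sorted(rest)
                 if not any(strictly_dominates(objs[j], objs[i]) for j in rest)]
        fronts.append(front)
        rest -= set(front)
    return fronts

def crowding(objs, front):
    # crowding distances within one front, as in Algorithm 2
    cdis = {i: 0.0 for i in front}
    for obj in range(2):
        order = sorted(front, key=lambda i: objs[i][obj])
        cdis[order[0]] = cdis[order[-1]] = math.inf
        span = objs[order[-1]][obj] - objs[order[0]][obj]
        for p in range(1, len(order) - 1):
            cdis[order[p]] += (objs[order[p + 1]][obj]
                               - objs[order[p - 1]][obj]) / span
    return cdis

def survival_selection(R, f, N):
    """Steps 4-8 of Algorithm 3: keep the N best of R = P_t ∪ Q_t."""
    objs = [f(x) for x in R]
    survivors = []
    for front in fronts_of(objs):
        if len(survivors) + len(front) <= N:
            survivors += front        # whole front fits into P_{t+1}
        else:                         # critical front F_{i*}
            cdis = crowding(objs, front)
            random.shuffle(front)     # random tie-breaking among equal cDis
            front.sort(key=lambda i: cdis[i], reverse=True)  # stable sort
            survivors += front[:N - len(survivors)]
            break
    return [R[i] for i in survivors]
```

On OneMinMax all individuals have rank one, so this selection reduces to keeping the $N$ individuals of largest crowding distance, which is the situation studied in Lemma 1 below.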
3 Runtime of the NSGA-II on OneMinMax
In this section, we analyze the runtime of the NSGA-II on the OneMinMax benchmark proposed first by Giel and Lehre [GL10] as a bi-objective analogue of the classic OneMax benchmark. It is the function $f:\{0,1\}^{n}→\mathbb{N}×\mathbb{N}$ defined by
$$
f(x)=\big(f_{1}(x),f_{2}(x)\big)=\Big(n-\sum_{i=1}^{n}x_{i},\;\sum_{i=1}^{n}x_{i}\Big)
$$
for all $x=(x_{1},...,x_{n})∈\{0,1\}^{n}$ . The aim is to maximize both objectives in $f$ . We immediately note that for this benchmark problem, any solution lies on the Pareto front. It is hence a good example to study how an MOEA explores the Pareto front when already some Pareto optima were found.
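In code, the benchmark is a one-liner; the sketch below (the name `one_min_max` is ours) also illustrates why every search point is Pareto-optimal:

```python
def one_min_max(x):
    """OneMinMax: simultaneously maximize the number of zeros (f1)
    and the number of ones (f2) of the bit string x."""
    ones = sum(x)
    return (len(x) - ones, ones)

# Since f1(x) + f2(x) = n for every x, improving one objective necessarily
# worsens the other: no search point strictly dominates another one.
```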
Giel and Lehre [GL10] showed that the simple SEMO algorithm finds the full Pareto front of OneMinMax in $O(n^{2}\log n)$ iterations and fitness evaluations. Their proof can easily be extended to the GSEMO algorithm. For the SEMO, a (matching) lower bound of $\Omega(n^{2}\log n)$ was shown in [COGNS20]. An upper bound of $O(\mu n\log n)$ was shown for the hypervolume-based $(\mu+1)$ SIBEA with $\mu≥ n+1$ [NSN15]. When the SEMO or GSEMO is enriched with a diversity mechanism (strong enough so that solutions that can create a new point on the Pareto front are chosen with constant probability), then the runtime of these algorithms reduces to $O(n\log n)$ [COGNS20].
In contrast to the SEMO and GSEMO as well as the $(\mu+1)$ SIBEA with population size $\mu≥ n+1$ , the NSGA-II can lose all solutions covering a point of the Pareto front. In the following lemma, central to our runtime analyses on OneMinMax, we show that this cannot happen when the population size is large enough, namely at least four times the size of the Pareto front. Throughout this paper, for $i≤ j$ we use $[i..j]$ to denote the set $\{i,i+1,...,j\}$ .
**Lemma 1**
*Consider one iteration of the NSGA-II with population size $N≥ 4(n+1)$ optimizing the OneMinMax function. Assume that in some iteration $t$ the combined parent and offspring population $R_{t}=P_{t}\cup Q_{t}$ contains a solution $x$ with objective value $(k,n-k)$ for some $k∈[0..n]$ . Then also the next parent population $P_{t+1}$ contains an individual $y$ with $f(y)=(k,n-k)$ .*
* Proof*
It is not difficult to see that for any $x,y∈\{0,1\}^{n}$ , we have $x\nprec y$ and $y\nprec x$ . Hence, all individuals in $R_{t}$ have rank one in the non-dominated sorting of $R_{t}$ , that is, after Step 4 in Algorithm 3. Thus, in the notation of the algorithm, $F_{1}=R_{t}$ and $i^{*}=1$ . We calculate the crowding distance of each individual of $R_{t}$ . Let $k∈[0..n]$ and assume that there is at least one individual $x∈ R_{t}$ such that $f(x)=(k,n-k)$ . We recall from Algorithm 2 that $S_{1.1},...,S_{1.{2N}}$ and $S_{2.1},...,S_{2.{2N}}$ are the populations sorted with respect to $f_{1}$ and $f_{2}$ , respectively. Since individuals with the same objective value appear consecutively in the list sorted with respect to this objective, there exist $a≤ b$ and $a^{\prime}≤ b^{\prime}$ such that $[a..b]=\{i\mid f_{1}(S_{1.i})=k\}$ and $[a^{\prime}..b^{\prime}]=\{i\mid f_{2}(S_{2.i})=n-k\}$ . From the crowding distance calculation in Algorithm 2, we obtain $\operatorname{cDis}(S_{1.a})≥\frac{f_{1}(S_{1.{a+1}})-f_{1}(S_{1.{a-1}})}{f_{1}(S_{1.{2N}})-f_{1}(S_{1.1})}≥\frac{f_{1}(S_{1.{a}})-f_{1}(S_{1.{a-1}})}{f_{1}(S_{1.{2N}})-f_{1}(S_{1.1})}>0$ since $f_{1}(S_{1.{a}})-f_{1}(S_{1.{a-1}})>0$ by the definition of $a$ (if $a=1$ , then $\operatorname{cDis}(S_{1.a})=+∞$ anyway). Similarly, we have $\operatorname{cDis}(S_{1.b})>0$ , $\operatorname{cDis}(S_{2.a^{\prime}})>0$ , and $\operatorname{cDis}(S_{2.b^{\prime}})>0$ . Now consider any $j∈[a+1..b-1]$ with $S_{1.j}∉\{S_{2.a^{\prime}},S_{2.b^{\prime}}\}$ , and let $j^{\prime}∈[a^{\prime}+1..b^{\prime}-1]$ be such that $S_{2.j^{\prime}}=S_{1.j}$ . From the definitions of $a,b,a^{\prime}$ , and $b^{\prime}$ , we know that $f_{1}(S_{1.{j-1}})=f_{1}(S_{1.{j+1}})=k$ and $f_{2}(S_{2.{j^{\prime}-1}})=f_{2}(S_{2.{j^{\prime}+1}})=n-k$ . Hence, we have $\operatorname{cDis}(S_{1.j})=\frac{f_{1}(S_{1.{j+1}})-f_{1}(S_{1.{j-1}})}{f_{1}(S_{1.{2N}})-f_{1}(S_{1.1})}+\frac{f_{2}(S_{2.{j^{\prime}+1}})-f_{2}(S_{2.{j^{\prime}-1}})}{f_{2}(S_{2.{2N}})-f_{2}(S_{2.1})}=0$ . This shows that the individuals with objective value $(k,n-k)$ and positive crowding distance are among $S_{1.a}$ , $S_{1.b}$ , $S_{2.a^{\prime}}$ , and $S_{2.b^{\prime}}$ . Hence, for each $(k,n-k)$ , there are at most four solutions $x$ with $f(x)=(k,n-k)$ and $\operatorname{cDis}(x)>0$ . Noting that the Pareto front size for OneMinMax is $n+1$ , the number of individuals with positive crowding distance is at most $4(n+1)≤ N$ . Since Step 7 in Algorithm 3 keeps the $N$ individuals with largest crowding distance, all individuals with positive crowding distance will be kept. Thus, $y=S_{1.a}∈ P_{t+1}$ , proving our claim. ∎
Since Lemma 1 ensures that objective values on the Pareto front will not be lost in the future, we can estimate the runtime of the NSGA-II via the sum of the waiting times for finding a new Pareto solution. Apart from the fact that the NSGA-II generates $N$ solutions per iteration (which requires some non-trivial arguments in the case of binary tournament selection), this analysis resembles the known analysis of the simpler SEMO algorithm [GL10]. For $N=O(n)$ , we also obtain the same runtime estimate (in terms of fitness evaluations).
We start with the easier case that parents are chosen uniformly at random or that each parent creates one offspring.
**Theorem 2**
*Consider optimizing the OneMinMax function via the NSGA-II with one of the following four ways to generate the offspring population in Step 3 in Algorithm 3, namely applying one-bit mutation or standard bit-wise mutation once to each parent or $N$ times choosing a parent uniformly at random and applying one-bit mutation or standard bit-wise mutation to it. If the population size $N$ is at least $4(n+1)$ , then the expected runtime is at most $\frac{2e^{2}}{e-1}n(\ln n+1)$ iterations and at most $\frac{2e^{2}}{e-1}Nn(\ln n+1)$ fitness evaluations. Besides, let $T$ be the number of iterations to reach the full Pareto front, then $\Pr[T≥\tfrac{2e^{2}(1+\delta)}{e-1}n\ln n]≤ 2n^{-\delta}$ holds for any $\delta≥ 0$ .*
* Proof*
Let $x∈ P_{t}$ with $f(x)=(k,n-k)$ for some $k∈[0..n]$ . Let $p$ denote the probability that $x$ is chosen as parent to be mutated. Conditional on that, let $p_{k}^{+}$ denote the probability of generating from $x$ an offspring $y_{+}$ with $f(y_{+})=(k+1,n-k-1)$ (when $k<n$ ) and $p_{k}^{-}$ denote the probability of generating from $x$ an offspring $y_{-}$ with $f(y_{-})=(k-1,n-k+1)$ (when $k>0$ ). Consequently, in each iteration the probability that $R_{t}$ contains an individual $y_{+}$ with objective value $(k+1,n-k-1)$ is at least $pp_{k}^{+}$ , and the probability that $R_{t}$ contains an individual $y_{-}$ with objective value $(k-1,n-k+1)$ is at least $pp_{k}^{-}$ . Since Lemma 1 implies that any OneMinMax objective value present in the population will be kept in all later iterations, the expected number of iterations to obtain $y_{+}$ (resp. $y_{-}$ ) once $x$ is in the population is at most $\frac{1}{pp_{k}^{+}}$ (resp. $\frac{1}{pp_{k}^{-}}$ ). Assume that the initial population of Algorithm 3 contains an $x$ with $f(x)=(k_{0},n-k_{0})$ . Then the expected number of iterations to obtain individuals covering the objective values $(k_{0},n-k_{0}),(k_{0}+1,n-k_{0}-1),...,(n,0)$ is at most $\sum_{i=k_{0}}^{n-1}\frac{1}{pp_{i}^{+}}$ . Similarly, the expected number of iterations to obtain individuals covering the objective values $(k_{0}-1,n-k_{0}+1),(k_{0}-2,n-k_{0}+2),...,(0,n)$ is at most $\sum_{i=1}^{k_{0}}\frac{1}{pp_{i}^{-}}$ . Consequently, the expected number of iterations to cover the whole Pareto front is at most $\sum_{i=k_{0}}^{n-1}\frac{1}{pp_{i}^{+}}+\sum_{i=1}^{k_{0}}\frac{1}{pp_{i}^{-}}$ . Now we calculate $p$ for the different ways of selecting parents and $p_{k}^{+}$ and $p_{k}^{-}$ for the different mutation operators. If we apply mutation once to each parent in $P_{t}$ , we clearly have $p=1$ . If we choose the parents independently at random from $P_{t}$ , then $p=1-(1-\frac{1}{N})^{N}≥ 1-\frac{1}{e}$ .
For one-bit mutation, we have $p_{k}^{+}=\frac{n-k}{n}$ and $p_{k}^{-}=\frac{k}{n}$ . For standard bit-wise mutation, we have $p_{k}^{+}≥\frac{n-k}{n}(1-\frac{1}{n})^{n-1}≥\frac{n-k}{en}$ and $p_{k}^{-}≥\frac{k}{n}(1-\frac{1}{n})^{n-1}≥\frac{k}{en}$ . From these estimates and the fact that the Harmonic number $H_{n}=\sum_{i=1}^{n}\frac{1}{i}$ satisfies $H_{n}<\ln n+1$ , it is not difficult to see that all cases lead to an expected runtime of at most
$$
\sum_{i=0}^{n-1}\frac{1}{pp_{i}^{+}}+\sum_{i=1}^{n}\frac{1}{pp_{i}^{-}}\leq\sum_{i=0}^{n-1}\frac{1}{(1-\frac{1}{e})\frac{n-i}{en}}+\sum_{i=1}^{n}\frac{1}{(1-\frac{1}{e})\frac{i}{en}}=\frac{2e^{2}}{e-1}\,n\sum_{i=1}^{n}\frac{1}{i}\leq\frac{2e^{2}}{e-1}\,n(\ln n+1)
$$
iterations, hence at most $\frac{2e^{2}}{e-1}Nn(\ln n+1)$ fitness evaluations. Now we will prove the concentration result. Let $X^{+}_{k}$ and $X^{-}_{k}$ be independent geometric random variables with success probabilities of $(1-\frac{1}{e})\frac{n-k}{en}$ and $(1-\frac{1}{e})\frac{k}{en}$ , respectively. Let $T$ be the number of iterations to cover the full Pareto front, and let $Z^{+}=\sum_{k=0}^{n-1}X^{+}_{k}$ and $Z^{-}=\sum_{k=1}^{n}X^{-}_{k}$ . Then from the above discussion, we know that $Z:=Z^{+}+Z^{-}$ stochastically dominates $T$ (see [Doe19] for a detailed discussion of how to use stochastic domination arguments in the analysis of evolutionary algorithms). Let the success probabilities of $X^{+}_{n-1},X^{+}_{n-2},...,X^{+}_{0}$ be $q_{1}^{+},...,q_{n}^{+}$ , and let $q_{1}^{-},...,q_{n}^{-}$ denote the success probabilities of $X^{-}_{1},X^{-}_{2},...,X^{-}_{n}$ . Then we have $q_{i}^{+}≥(1-\frac{1}{e})\frac{1}{e}\frac{i}{n}$ and $q_{i}^{-}≥(1-\frac{1}{e})\frac{1}{e}\frac{i}{n}$ for all $i∈[1..n]$ . From a Chernoff bound for a sum of such geometric random variables ([DD18, Lemma 4], also found in [Doe20, Theorem 1.10.35]), we have that for any $\delta≥ 0$ ,
$$
\Pr\left[Z^{+}\geq(1+\delta)\frac{e^{2}}{e-1}n\ln n\right]\leq n^{-\delta}
$$
and
$$
\Pr\left[Z^{-}\geq(1+\delta)\frac{e^{2}}{e-1}n\ln n\right]\leq n^{-\delta}.
$$
Hence, we have
$$
\Pr\left[Z\geq(1+\delta)\frac{2e^{2}}{e-1}n\ln n\right]\leq 2n^{-\delta}.
$$
Since $Z$ stochastically dominates $T$ , we obtain $\Pr[T≥\tfrac{2e^{2}(1+\delta)}{e-1}n\ln n]≤ 2n^{-\delta}.$ ∎
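As a quick numerical illustration (not part of the proof), the domination argument can be checked by simulation: the following Python sketch samples the variable $Z^{+}$ from the proof and compares its empirical mean against the bound $\frac{e^{2}}{e-1}n(\ln n+1)$ on its expectation. The sampling routine and all parameter choices here are ours.

```python
import math
import random

def sample_z_plus(n, rng):
    """Sample Z^+ = sum of independent geometric waiting times, where the
    k-th summand has success probability (1 - 1/e) * (n - k) / (e * n)."""
    total = 0
    for k in range(n):
        p = (1 - 1 / math.e) * (n - k) / (math.e * n)
        # inverse-CDF sampling of a geometric variable on {1, 2, ...}
        total += 1 + int(math.log(1 - rng.random()) / math.log(1 - p))
    return total

rng = random.Random(2022)
n = 100
samples = [sample_z_plus(n, rng) for _ in range(500)]
mean = sum(samples) / len(samples)
# E[Z^+] <= (e^2/(e-1)) * n * H_n <= (e^2/(e-1)) * n * (ln n + 1)
bound = math.e**2 / (math.e - 1) * n * (math.log(n) + 1)
print(mean <= bound)
```

The empirical mean typically lands well below the bound, since $H_{n}<\ln n+1$ already gives away a little slack.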
We now analyze the performance of the NSGA-II on OneMinMax when selecting the parents via binary tournaments, which is the selection method suggested in the original NSGA-II paper [DPAM02]. We regard two variants of this selection method. The most natural one, discussed for example in [GD90], is to conduct $N$ independent tournaments. Here the offspring population $Q_{t}$ is generated by $N$ times independently performing the following sequence of actions: (i) Select two different individuals $x^{\prime},x^{\prime\prime}$ uniformly at random from $P_{t}$ . (ii) Select $x$ as the better of these two, that is, the one with smaller rank in $P_{t}$ or, in case of equality, the one with larger crowding distance in $P_{t}$ (breaking ties randomly). (iii) Generate an offspring by mutating $x$ . We note that in some definitions of tournament selection the better individual in a tournament is chosen as winner only with some probability $p>0.5$ , but we do not regard this case any further. We note though that all our mathematical results would remain true in this setting. We also note that sometimes the participants of a tournament are selected “with replacement”. Again, this would not change our results, but we do not discuss this case any further.
A closer look at Deb’s implementation of the NSGA-II (see Revision 1.1.6 available at [Deb]) shows that a different way of selecting the parents is used there; we are thankful to Maxim Buzdalov (Aberystwyth University) for pointing this out to us. In this two-permutation tournament selection scheme, two random permutations $\pi_{1}$ and $\pi_{2}$ of $P_{t}$ are generated and then a binary tournament is conducted between $\pi_{j}(2i-1)$ and $\pi_{j}(2i)$ for all $i∈[1..\frac{N}{2}]$ and $j∈\{1,2\}$ (we assume here that $N$ is even). Of course, this is nothing else than saying that twice a random matching on $P_{t}$ is generated and the end vertices of each matching edge conduct a tournament. Different from independent tournaments, this selection operator cannot be implemented in parallel. On the positive side, it ensures that each individual takes part in exactly two tournaments, so it treats the individuals in a fairer manner. Also, if there is a unique best individual, then this will surely be selected. As above, in our setting where we do not use crossover, each tournament winner is mutated and these $N$ individuals form the offspring population $Q_{t}$ .
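To make the difference between the two schemes concrete, here is a minimal Python sketch of both parent-selection variants. The comparator `better`, which should implement the (rank, crowding distance) comparison of the NSGA-II, and all names are our illustrative assumptions.

```python
import random

def independent_tournaments(pop, better, rng):
    # N independent binary tournaments: each draws two different
    # individuals uniformly at random and keeps the better one.
    parents = []
    for _ in range(len(pop)):
        i, j = rng.sample(range(len(pop)), 2)
        parents.append(pop[i] if better(pop[i], pop[j]) else pop[j])
    return parents

def two_permutation_tournaments(pop, better, rng):
    # Deb-style scheme: two random permutations of the population; in
    # each, consecutive pairs hold a tournament (N assumed even), so
    # every individual takes part in exactly two tournaments.
    parents = []
    for _ in range(2):
        perm = rng.sample(pop, len(pop))
        for i in range(0, len(perm) - 1, 2):
            a, b = perm[i], perm[i + 1]
            parents.append(a if better(a, b) else b)
    return parents
```

With a total order as `better`, the unique best individual wins both of its tournaments in the two-permutation scheme, matching the observation above, while the independent scheme may miss it entirely.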
In the case of binary tournament selection, the analysis is slightly more involved since we need to argue that a desired parent is chosen for mutation with constant probability in one iteration. This is easy to see for a parent at the boundary of the front as its crowding distance is infinite, but less obvious for parents not at the boundary. We note that we need to be able to select such parents since we cannot ensure that the population intersects the Pareto front in a contiguous interval (as can be seen, e.g., from the random initial population). We solve this difficulty in the following three lemmas.
We use the following notation. Consider some iteration $t$ . For $i=1,2$ , let

$$
v_{i}^{\min}=\min\{f_{i}(x)\mid x∈ R_{t}\}\quad\text{and}\quad v_{i}^{\max}=\max\{f_{i}(x)\mid x∈ R_{t}\}
$$

denote the extremal objective values in the combined parent and offspring population. Let $V=f(R_{t})=\{(f_{1}(x),f_{2}(x))\mid x∈ R_{t}\}$ denote the set of objective values of the solutions in the combined parent and offspring population $R_{t}$ . We define the sets of values such that also the right (left) neighbor on the Pareto front is covered by

$$
V_{\operatorname{in}}^{+}=\{(v_{1},v_{2})∈ V\mid∃ y∈ R_{t}:(f_{1}(y),f_{2}(y))=(v_{1}+1,v_{2}-1)\}
$$

and

$$
V_{\operatorname{in}}^{-}=\{(v_{1},v_{2})∈ V\mid∃ y∈ R_{t}:(f_{1}(y),f_{2}(y))=(v_{1}-1,v_{2}+1)\}.
$$
**Lemma 3**
*For any $(v_{1},v_{2})∈ V\setminus(V_{\operatorname{in}}^{+}\cap V_{\operatorname{in}}^{-})$ , there is at least one individual $x∈ R_{t}$ with $f(x)=(v_{1},v_{2})$ and $\operatorname{cDis}(x)≥\frac{2}{v_{1}^{\max}-v_{1}^{\min}}$ .*
* Proof*
Let $(v_{1},v_{2})∈ V\setminus(V_{\operatorname{in}}^{+}\cap V_{\operatorname{in}}^{-})$ , let $S_{1.1},...,S_{1.2N}$ be the sorting of $R_{t}$ according to $f_{1}$ , and let $[a..b]=\{i∈[1..2N]\mid f_{1}(S_{1.i})=v_{1}\}$ . If $v_{1}∈\{v_{1}^{\max},v_{1}^{\min}\}$ , then by definition of the crowding distance, one individual in $f^{-1}((v_{1},v_{2}))$ has an infinite crowding distance. Otherwise, if $(v_{1},v_{2})∈ V\setminus V_{\operatorname{in}}^{-}$ , then we have $f_{1}(S_{1.a+1})-f_{1}(S_{1.a-1})≥ f_{1}(S_{1.a})-f_{1}(S_{1.a-1})≥ 2$ and thus $\operatorname{cDis}(S_{1.a})≥\frac{f_{1}(S_{1.a+1})-f_{1}(S_{1.a-1})}{v_{1}^{\max}-v_{1}^{\min}}≥\frac{2}{v_{1}^{\max}-v_{1}^{\min}}$ . Similarly, if $(v_{1},v_{2})∈ V\setminus V_{\operatorname{in}}^{+}$ , then $f_{1}(S_{1.b+1})-f_{1}(S_{1.b-1})≥ f_{1}(S_{1.b+1})-f_{1}(S_{1.b})≥ 2$ and $\operatorname{cDis}(S_{1.b})≥\frac{f_{1}(S_{1.b+1})-f_{1}(S_{1.b-1})}{v_{1}^{\max}-v_{1}^{\min}}≥\frac{2}{v_{1}^{\max}-v_{1}^{\min}}$ . ∎
**Lemma 4**
*For any $(v_{1},v_{2})∈ V_{\operatorname{in}}^{+}\cap V_{\operatorname{in}}^{-}$ , there are at most two individuals in $R_{t}$ with objective value $(v_{1},v_{2})$ and crowding distance at least $\frac{2}{v_{1}^{\max}-v_{1}^{\min}}$ .*
* Proof*
Let $(v_{1},v_{2})∈ V_{\operatorname{in}}^{+}\cap V_{\operatorname{in}}^{-}$ , $[a..b]=\{i∈[1..2N]\mid f_{1}(S_{1.i})=v_{1}\}$ , and $[a^{\prime}..b^{\prime}]=\{j∈[1..2N]\mid f_{2}(S_{2.j})=v_{2}\}$ . Let $C=\{S_{1.a},S_{1.b}\}\cup\{S_{2.a^{\prime}},S_{2.b^{\prime}}\}$ . If $(R_{t}\cap f^{-1}((v_{1},v_{2})))\setminus C$ is not empty, then for any $x∈(R_{t}\cap f^{-1}((v_{1},v_{2})))\setminus C$ , there exist $i∈[a+1..b-1]$ and $j∈[a^{\prime}+1..b^{\prime}-1]$ such that $x=S_{1.i}=S_{2.j}$ . Hence $\operatorname{cDis}(x)=\frac{f_{1}(S_{1.i+1})-f_{1}(S_{1.i-1})}{v_{1}^{\max}-v_{1}^{\min}}+\frac{f_{2}(S_{2.j+1})-f_{2}(S_{2.j-1})}{v_{2}^{\max}-v_{2}^{\min}}=0$ . We thus know that any individual with crowding distance at least $\frac{2}{v_{1}^{\max}-v_{1}^{\min}}$ lies in $C$ . For any $x∈ C\setminus(\{S_{1.a},S_{1.b}\}\cap\{S_{2.a^{\prime}},S_{2.b^{\prime}}\})$ , we have $\operatorname{cDis}(x)=\frac{1}{v_{1}^{\max}-v_{1}^{\min}}$ or $\operatorname{cDis}(x)=\frac{1}{v_{2}^{\max}-v_{2}^{\min}}$ . We note that $v_{1}^{\max}-v_{1}^{\min}=v_{2}^{\max}-v_{2}^{\min}$ since $v_{1}^{\max}=n-v_{2}^{\min}$ and $v_{1}^{\min}=n-v_{2}^{\max}$ . Hence $\operatorname{cDis}(x)<\frac{2}{v_{1}^{\max}-v_{1}^{\min}}$ . Let now $x∈\{S_{1.a},S_{1.b}\}\cap\{S_{2.a^{\prime}},S_{2.b^{\prime}}\}$ . If $|C|=1$ , then $\operatorname{cDis}(x)=\frac{2}{v_{1}^{\max}-v_{1}^{\min}}+\frac{2}{v_{2}^{\max}-v_{2}^{\min}}=\frac{4}{v_{1}^{\max}-v_{1}^{\min}}$ . Otherwise, $\operatorname{cDis}(x)=\frac{1}{v_{1}^{\max}-v_{1}^{\min}}+\frac{1}{v_{2}^{\max}-v_{2}^{\min}}=\frac{2}{v_{1}^{\max}-v_{1}^{\min}}$ . Therefore, the number of individuals in $R_{t}\cap f^{-1}((v_{1},v_{2}))$ with crowding distance at least $\frac{2}{v_{1}^{\max}-v_{1}^{\min}}$ is $|\{S_{1.a},S_{1.b}\}\cap\{S_{2.a^{\prime}},S_{2.b^{\prime}}\}|$ , which is at most $2$ . ∎
**Lemma 5**
*Let $N≥ 4(n+1)$ . Let $P$ be a parent population in a run of the NSGA-II using independent or two-permutation binary tournament selection optimizing OneMinMax. Let $v=(v_{1},n-v_{1})∉ f(P)$ be a point on the Pareto front that is not covered by $P$ , but a neighbor of $v$ on the front is covered by $P$ , that is, there is a $y∈ P$ such that $\|f(y)-v\|_{∞}=1$ . In the case of independent tournaments, each of the $N$ tournaments with probability at least $\frac{1}{N}(\frac{1}{6}-\frac{2.5}{N-1})$ selects an individual $x$ with $f(x)=f(y)$ . In the case of two-permutation selection, there are two stochastically independent tournaments each of which with probability at least $\frac{1}{6}-\frac{2.5}{N-1}$ selects an individual $x$ with $f(x)=f(y)$ .*
* Proof*
By Lemma 3, there is an individual $x^{\prime}∈ P$ with $f(x^{\prime})=f(y)$ and crowding distance at least $\frac{2}{v_{1}^{\max}-v_{1}^{\min}}$ . We estimate the probability that $x^{\prime}$ is the winner of a tournament. We start with the case of independent tournaments and we regard a particular one of these. With probability $\frac{1}{N}$ , the individual $x^{\prime}$ is chosen as the first participant of the tournament. We condition on this and regard the second individual $x^{\prime\prime}$ of the tournament, which is chosen uniformly at random from the remaining $N-1$ individuals. We shall argue that with good probability, it has a smaller crowding distance, and thus loses the tournament. To this aim, we estimate the number of elements $z∈ P$ that have a crowding distance of $\frac{2}{v_{1}^{\max}-v_{1}^{\min}}$ or more (“large crowding distance”). We treat the individuals differently according to their objective value $w=f(z)$ . If $w∈ V_{\operatorname{in}}^{+}\cap V_{\operatorname{in}}^{-}$ , then by Lemma 4 at most two individuals with this objective value have a large crowding distance. For other objective values $w$ , we use the general estimate from the proof of Lemma 1 that at most four individuals have this objective value and positive crowding distance. This gives an upper bound of $2|V_{\operatorname{in}}^{+}\cap V_{\operatorname{in}}^{-}|+4(|f(P)|-|V_{\operatorname{in}}^{+}\cap V_{\operatorname{in}}^{-}|)=2|f(P)|+2|f(P)\setminus(V_{\operatorname{in}}^{+}\cap V_{\operatorname{in}}^{-})|$ individuals with large crowding distance. We note that out of each three consecutive elements $(w_{1},n-w_{1}),(w_{1}+1,n-w_{1}-1),(w_{1}+2,n-w_{1}-2)$ of the Pareto front, at most two can be in $f(P)\setminus(V_{\operatorname{in}}^{+}\cap V_{\operatorname{in}}^{-})$ – if all three were in $f(P)$ , then the middle one would necessarily be in $V_{\operatorname{in}}^{+}\cap V_{\operatorname{in}}^{-}$ . Consequently, $|f(P)\setminus(V_{\operatorname{in}}^{+}\cap V_{\operatorname{in}}^{-})|≤ 2\lceil\frac{n+1}{3}\rceil$ . With this estimate, our upper bound on the number of individuals with large crowding distance becomes at most $2(n+1)+4\lceil\frac{n+1}{3}\rceil≤\frac{10}{3}(n+1)+\frac{8}{3}$ . Excluding the first-chosen individual $x^{\prime}$ , the probability that $x^{\prime\prime}$ has a large crowding distance is thus at most $\frac{1}{N-1}(\frac{10}{3}(n+1)+\frac{8}{3}-1)=\frac{1}{N-1}(\frac{10}{3}(n+\frac{3}{4})+\frac{10}{3}\frac{1}{4}+\frac{5}{3})≤\frac{5}{6}+\frac{1}{N-1}(\frac{10}{3}\frac{1}{4}+\frac{5}{3})=\frac{5}{6}+\frac{2.5}{N-1}$ . Consequently, the probability that $x^{\prime}$ is selected as first participant of the tournament and it wins the tournament is at least
$$
\frac{1}{N}\left(\frac{1}{6}-\frac{2.5}{N-1}\right).
$$
For the case of two-permutation tournament selection, we note that there are two independent tournaments (stemming from different permutations) in which $x^{\prime}$ participates. In both, the partner $x^{\prime\prime}$ of $x^{\prime}$ is distributed uniformly in $P_{t}\setminus\{x^{\prime}\}$ . Hence the above arguments can be applied and we see that with probability at least $\frac{1}{6}-\frac{2.5}{N-1}$ , the second participant loses against $x^{\prime}$ . ∎
With Lemma 5, we can now easily argue that in a given iteration $t$ , we have a constant probability of choosing at least once a parent that is a neighbor of an empty spot on the Pareto front. This allows us to reuse the main arguments of the simpler analyses for the cases that the parents were chosen randomly or that each parent creates one offspring. We note that in the following result, as in any other result in this work, we did not try to optimize the leading constant.
**Theorem 6**
*Let $n≥ 4$ . Consider optimizing the OneMinMax function via the NSGA-II which creates the offspring population by selecting parents via independent binary tournaments or via the two-permutation approach and applying one-bit or standard bit-wise mutation to these. If the population size $N$ is at least $4(n+1)$ , then the expected runtime is at most $\tfrac{200e}{3}n(\ln n+1)$ iterations and at most $\tfrac{200e}{3}Nn(\ln n+1)$ fitness evaluations. Besides, let $T$ be the number of iterations to reach the full Pareto front. Then we further have that $\Pr[T≥\tfrac{200e}{3}(1+\delta)n\ln n]≤ 2n^{-\delta}$ holds for any $\delta≥ 0$ .*
* Proof*
Thanks to Lemma 5, we can essentially follow the arguments of the proof of Theorem 2. Let $y∈ P_{t}$ be such that $f(y)$ is a neighbor of a point on the Pareto front that is not in $f(P_{t})$ . For independent tournaments, by Lemma 5 a single tournament will select a parent $x$ with $f(x)=f(y)$ , that is, also a neighbor of this uncovered point, with probability at least $\frac{1}{N}(\frac{1}{6}-\frac{2.5}{N-1})$ . Hence the probability that at least one such parent is selected in this iteration is
$$
p=1-\left(1-\tfrac{1}{N}\left(\tfrac{1}{6}-\tfrac{2.5}{N-1}\right)\right)^{N}\geq 1-\exp\left(-\tfrac{1}{6}+\tfrac{2.5}{N-1}\right)>0.03,
$$
where the last inequality uses $N≥ 20$ from $n≥ 4$ and $N≥ 4(n+1)$ . For two-permutation tournament selection, again by Lemma 5, with probability at least $p=1-(1-(\frac{1}{6}-\frac{2.5}{N-1}))^{2}>0.03$ (since $N≥ 20$ ) a parent $x$ with $f(x)=f(y)$ is selected. With these values of $p$ , the proof of Theorem 2 extends to the two cases of tournament selection, and we know that the expected number of iterations to cover the full Pareto front is at most
$$
\sum_{i=0}^{n-1}\frac{1}{pp_{i}^{+}}+\sum_{i=1}^{n}\frac{1}{pp_{i}^{-}}\leq\sum_{i=0}^{n-1}\frac{1}{0.03\frac{n-i}{en}}+\sum_{i=1}^{n}\frac{1}{0.03\frac{i}{en}}=\frac{200e}{3}nH_{n}\leq\frac{200e}{3}n(\ln n+1)
$$

iterations, hence at most $\frac{200e}{3}Nn(\ln n+1)$ fitness evaluations.
We now discuss the concentration result. With the same arguments as in the proof of Theorem 2, but using the success probabilities $0.03\frac{n-k}{en}$ and $0.03\frac{k}{en}$ for $X^{+}_{k}$ and $X^{-}_{k}$ respectively and estimating $q_{i}≥\frac{0.03}{e}\frac{i}{n}$ , we obtain that for any $\delta≥ 0$ , we have $\Pr[T≥\tfrac{200e}{3}(1+\delta)n\ln n]≤ 2n^{-\delta}.$ ∎
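The constant $0.03$ from this proof is easy to verify numerically. The following snippet (our own sanity check, not part of the proof) evaluates both selection probabilities at the smallest admissible population size $N=20$; since both expressions are increasing in $N$, this is the worst case.

```python
import math

N = 20  # smallest value allowed by n >= 4 and N >= 4(n + 1)
# independent tournaments: p = 1 - (1 - (1/N)(1/6 - 2.5/(N-1)))^N
p_ind = 1 - (1 - (1 / N) * (1 / 6 - 2.5 / (N - 1))) ** N
# two-permutation tournaments: p = 1 - (1 - (1/6 - 2.5/(N-1)))^2
p_two = 1 - (1 - (1 / 6 - 2.5 / (N - 1))) ** 2
print(p_ind > 0.03, p_two > 0.03)
```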
## 4 Runtime of the NSGA-II on LeadingOnesTrailingZeroes
We proceed with analyzing the runtime of the NSGA-II on the benchmark LeadingOnesTrailingZeroes proposed by Laumanns, Thiele, and Zitzler [LTZ04]. This is the function $f:\{0,1\}^{n}→\mathbb{N}×\mathbb{N}$ defined by
$$
f(x)=\big(f_{1}(x),f_{2}(x)\big)=\Big(\sum_{i=1}^{n}\prod_{j=1}^{i}x_{j},\sum_{i=1}^{n}\prod_{j=i}^{n}(1-x_{j})\Big)
$$
for all $x∈\{0,1\}^{n}$ . Here the first objective is the so-called LeadingOnes function, counting the number of (contiguous) leading ones of the bit string, and the second objective counts in an analogous fashion the number of trailing zeros. Again, the aim is to maximize both objectives. Different from OneMinMax, here many solutions exist that are not Pareto optimal. The known runtimes for this benchmark are $\Theta(n^{3})$ for the SEMO [LTZ04], $O(n^{3})$ for the GSEMO [Gie03], and $O(\mu n^{2})$ for the $(\mu+1)$ SIBEA with population size $\mu≥ n+1$ [BFN08].
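For concreteness, here is a direct Python transcription of this benchmark (the function name is ours):

```python
def lotz(x):
    # LeadingOnesTrailingZeroes: f1 counts the contiguous leading ones,
    # f2 counts the contiguous trailing zeros; both are to be maximized.
    n = len(x)
    f1 = 0
    while f1 < n and x[f1] == 1:
        f1 += 1
    f2 = 0
    while f2 < n and x[n - 1 - f2] == 0:
        f2 += 1
    return (f1, f2)
```

The Pareto-optimal solutions are exactly the strings $1^{k}0^{n-k}$, for which $f_{1}+f_{2}=n$.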
Similar to OneMinMax, we can show that when the population size is large enough, an objective value on the Pareto front stays in the population from the point on when it is discovered.
**Lemma 7**
*Consider one iteration of the NSGA-II with population size $N≥ 4(n+1)$ optimizing the LeadingOnesTrailingZeroes function. Assume that in some iteration $t$ the combined parent and offspring population $R_{t}=P_{t}\cup Q_{t}$ contains a solution $x$ with rank one. Then also the next parent population $P_{t+1}$ contains an individual $y$ with $f(y)=f(x)$ . In particular, once the parent population contains an individual with objective value $(k,n-k)$ , it will do so for all future generations.*
* Proof*
Let $F_{1}$ be the set of solutions of rank one, that is, the set of solutions in $R_{t}$ that are not dominated by any other individual in $R_{t}$ . By definition of dominance, for each $v_{1}∈\{f_{1}(x)\mid x∈ F_{1}\}$ , there exists a unique $v_{2}$ such that $(v_{1},v_{2})∈\{f(x)\mid x∈ F_{1}\}$ . Therefore, $|f(F_{1})|$ is at most $n+1$ . We now reuse the argument from the proof of Lemma 1 for OneMinMax that for each objective value, there are at most $4$ individuals with this objective value and positive crowding distance. Thus the number of individuals in $F_{1}$ with positive crowding distance is at most $4(n+1)≤ N$ . Since the NSGA-II keeps $N$ individuals with smallest rank and largest crowding distance in case of a tie, we know that the individuals with rank one and positive crowding distance will all be kept. This shows the first claim. For the second claim, let $x∈ P_{t}$ with $f(x)=(k,n-k)$ for some $k$ . Since $x$ lies on the Pareto front of LeadingOnesTrailingZeroes, the rank of $x$ in $R_{t}$ is necessarily one. Hence by the first claim, a $y$ with $f(y)=f(x)$ will be contained in $P_{t+1}$ . A simple induction extends this finding to all future generations. ∎
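A minimal Python sketch of the rank-one (first front) computation used in the lemma; this is the straightforward quadratic pairwise check, our illustrative simplification of the fast non-dominated sorting of the NSGA-II.

```python
def first_front(points):
    # rank-one individuals: objective vectors (both objectives maximized)
    # that are not strictly dominated by any other vector in the population
    def dominates(u, v):
        return u[0] >= v[0] and u[1] >= v[1] and (u[0] > v[0] or u[1] > v[1])
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

On LeadingOnesTrailingZeroes, every Pareto-optimal value $(k,n-k)$ has rank one, which is the fact the second claim of Lemma 7 builds on.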
Since not all individuals are on the Pareto front, the runtime analysis for the LeadingOnesTrailingZeroes function is slightly more complex than for OneMinMax. We analyze the process in two stages: the first stage lasts until we have found both extremal solutions of the Pareto front. In this phase, we argue that the maximum first (resp. second) objective value in the population increases by at least one in an expected $O(n)$ iterations. Consequently, after an expected number of $O(n^{2})$ iterations, we have an individual $x$ in the population with $f_{1}(x)=n$ (resp. $f_{2}(x)=n$ ), which are the desired extremal individuals. The second stage, where we complete the Pareto front from existing Pareto solutions, can be analyzed in a similar manner as for OneMinMax in Theorem 2, noting of course the different probabilities to generate a new solution on the Pareto front. We start with the two easier parent selections and discuss tournament selection separately in Theorem 9.
**Theorem 8**
*Consider optimizing the LeadingOnesTrailingZeroes function via the NSGA-II with one of the following four ways to generate the offspring population in Step 3 in Algorithm 3, namely applying one-bit mutation or standard bit-wise mutation once to each parent or $N$ times choosing a parent uniformly at random and applying one-bit mutation or standard bit-wise mutation to it. If the population size $N$ is at least $4(n+1)$ , then the expected runtime is at most $\frac{2e^{2}}{e-1}n^{2}$ iterations and at most $\frac{2e^{2}}{e-1}Nn^{2}$ fitness evaluations. Besides, let $T$ be the number of iterations to reach the full Pareto front. Then
$$
\Pr\left[T\geq\frac{2e^{2}(1+\delta)}{e-1}n^{2}\right]\leq\exp\left(-\frac{\delta^{2}}{2(1+\delta)}(2n-1)\right)
$$
holds for any $\delta≥ 0$ .*
* Proof*
Consider one iteration $t$ of the first stage, that is, we have $P_{t}^{p}=\{x∈ P_{t}\mid f_{1}(x)+f_{2}(x)=n\}=\emptyset$ . Let $v_{t}=\max\{f_{1}(x)\mid x∈ P_{t}\}$ and let $P_{t}^{*}=\{x∈ P_{t}\mid f_{1}(x)=v_{t}\}$ . Note that by Lemma 7, $v_{t}$ is non-decreasing over time. Let $x∈ P_{t}^{*}$ . Let $p$ denote the probability that $x$ is chosen as a parent to be mutated (note that this probability is independent of $x$ for the two selection schemes regarded here). Conditional on that, let $p^{*}$ be a lower bound (independent of $x$ ) on the probability that $x$ generates a solution with a larger $f_{1}$ -value. Then the expected number of iterations to obtain a solution with a better $f_{1}$ -value is at most $\frac{1}{pp^{*}}$ . Consequently, the expected number of iterations to obtain an $f_{1}$ -value of $n$ , thus a solution on the Pareto front, is at most $(n-k_{0})\frac{1}{pp^{*}}≤ n\frac{1}{pp^{*}}$ , where $k_{0}$ is the maximum LeadingOnes value in the initial population. For the second stage, let $x∈ P_{t}^{p}$ be such that a neighbor of $f(x)$ on the front is not yet covered by $P_{t}$ . Let $p^{\prime}$ denote the probability that $x$ is chosen as a parent to be mutated. Conditional on that, let $p^{**}$ denote a lower bound (independent of $x$ ) for the probability to generate a particular neighbor of $x$ on the front. Consequently, the probability that $R_{t}$ covers an extra element of the Pareto front is at least $p^{\prime}p^{**}$ . Since Lemma 7 implies that any existing LeadingOnesTrailingZeroes value on the front will be kept in the following iterations, we know that the expected number of iterations for this progress is at most $\frac{1}{p^{\prime}p^{**}}$ . Since $n$ such progresses are sufficient to cover the full Pareto front, the expected number of iterations to cover the whole Pareto front is at most $n\frac{1}{p^{\prime}p^{**}}$ . 
Therefore, the expected total runtime is at most $n\frac{1}{pp^{*}}+n\frac{1}{p^{\prime}p^{**}}$ iterations. We recall from Theorem 2 that we have $p=p^{\prime}=1$ when selecting each parent once and we have $p=p^{\prime}=1-(1-\frac{1}{N})^{N}≥ 1-\frac{1}{e}$ when choosing parents randomly. To estimate $p^{*}$ and $p^{**}$ , we note that the desired progress can always be obtained by flipping one particular bit. Hence for one-bit mutation, we have $p^{*}=p^{**}=\frac{1}{n}$ . For standard bit-wise mutation, $\frac{1}{n}(1-\frac{1}{n})^{n-1}≥\frac{1}{en}$ is a valid lower bound for $p^{*}$ and $p^{**}$ . With these estimates, we obtain in all cases an expected runtime of at most
$$
n\frac{1}{pp^{*}}+n\frac{1}{p^{\prime}p^{**}}\leq n\frac{1}{(1-\frac{1}{e})\frac{1}{en}}+n\frac{1}{(1-\frac{1}{e})\frac{1}{en}}=\frac{2e^{2}n^{2}}{e-1}
$$
iterations, hence $\frac{2e^{2}}{e-1}Nn^{2}$ fitness evaluations. Now we will prove the concentration result. The time to cover the full Pareto front is divided into two stages as discussed before. It is not difficult to see that the first stage is to reach a Pareto optimum for the first time, and the corresponding runtime is dominated by the sum of $n$ independent geometric random variables with success probabilities of $(1-\frac{1}{e})\frac{1}{en}$ . The second stage is to cover the full Pareto front, and the corresponding runtime is dominated by the sum of another $n$ such independent geometric random variables. Formally, let $X_{1},...,X_{2n}$ be independent geometric random variables with success probabilities of $(1-\frac{1}{e})\frac{1}{en}$ , and let $T$ be the number of iterations to cover the full Pareto front. Then $Z:=\sum_{i=1}^{2n}X_{i}$ stochastically dominates $T$ , and $E[Z]=\frac{2e^{2}n^{2}}{e-1}$ . From a Chernoff bound for sums of independent identically distributed geometric random variables [Doe20, (1.10.46) in Theorem 1.10.32], we have that for any $\delta≥ 0$ ,
$$
\Pr\left[Z\geq(1+\delta)\frac{2e^{2}n^{2}}{e-1}\right]\leq\exp\left(-\frac{\delta^{2}}{2}\frac{2n-1}{1+\delta}\right).
$$
Since $Z$ dominates $T$ , we have proven this theorem. ∎
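The two mutation operators used throughout can be sketched as follows (Python, our own transcription). The fact the proofs rely on is that a particular bit flips alone with probability $\frac{1}{n}$ under one-bit mutation and at least $\frac{1}{n}(1-\frac{1}{n})^{n-1}\geq\frac{1}{en}$ under standard bit-wise mutation.

```python
import random

def one_bit_mutation(x, rng):
    # flip exactly one position, chosen uniformly at random
    y = list(x)
    i = rng.randrange(len(y))
    y[i] = 1 - y[i]
    return y

def standard_bit_mutation(x, rng):
    # flip each bit independently with probability 1/n; a fixed bit flips
    # alone with probability (1/n)(1 - 1/n)^(n-1) >= 1/(e n)
    n = len(x)
    return [1 - b if rng.random() < 1 / n else b for b in x]
```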
We now study the runtime of the NSGA-II using binary tournament selection. Compared to OneMinMax, we face the additional difficulty that now rank one solutions can exist which are not on the Pareto front. Due to their low rank, they could perform well in the selection, but being possibly far from the front, they are not interesting as parents. We need a slightly different general proof outline to nevertheless argue that sufficiently often a parent on the Pareto front generates a new neighbor on the front. Also, since not all individuals are on the Pareto front, we no longer have the property that the difference between the maximum and minimum value is the same for both objectives. We overcome this by first showing that the NSGA-II finds the two extremal points of the Pareto front in reasonable time (then the maximum values are both $n$ and the minimum values are both $0$ ).
**Theorem 9**
*Consider optimizing the LeadingOnesTrailingZeroes function via the NSGA-II. Assume that the parents for variation are chosen either via $N$ independent random tournaments between different individuals or via the two-permutation implementation of binary tournaments. Assume that these parents are mutated via one-bit or standard bit-wise mutation. If the population size $N$ is at least $4(n+1)$ , then the expected runtime is at most $15en^{2}$ iterations and at most $15eNn^{2}$ fitness evaluations. Besides, let $T$ be the number of iterations to reach the full Pareto front, then
$$
\Pr\left[T\geq\frac{(1+\delta)100e}{3}n^{2}\right]\leq\exp\left(-\frac{\delta^{2}}{2(1+\delta)}(3n-1)\right)
$$
holds for any $\delta≥ 0$ .*
* Proof*
We first argue that, regardless of the initial state of the NSGA-II, it takes $O(n^{2})$ iterations until the extremal point $(1,...,1)$ , which is the unique maximum of $f_{1}$ , is in $P_{t}$ . To this aim, let $X_{t}:=\max\{f_{1}(x)\mid x∈ P_{t}\}$ denote the maximum $f_{1}$ value in the parent population. We note that any $x∈ P_{t}$ with $f_{1}(x)=X_{t}$ lies on the first front $F_{1}$ and that there is a $y∈ P_{t}$ with infinite crowding distance and $f(y)=f(x)$ , in particular, $f_{1}(y)=X_{t}$ . If parents are chosen via independent tournaments, such a $y$ has a $\frac{2}{N}$ chance of being one of the two individuals of a fixed tournament. It then wins the tournament with at least 50% chance (where the 50% show up only in the rare case that the other individual also lies on the first front and has an infinite crowding distance). Hence the probability that this $y$ is chosen at least once as a parent to be mutated is at least $p=1-(1-\frac{1}{2}\frac{2}{N})^{N}≥ 1-\frac{1}{e}$ . When the two-permutation implementation of tournament selection is used, then $y$ appears in both permutations and has a random partner in both. Again, this partner with probability at most $\frac{1}{2}$ wins the tournament. Hence the probability that $y$ is selected as a parent at least once is at least $p=1-(\frac{1}{2})^{2}=\frac{3}{4}$ . Conditional on $y$ being chosen at least once, let us regard a fixed mutation step in which $y$ was selected as a parent. To mutate $y$ into an individual with higher $f_{1}$ value, it suffices to flip a particular single bit (namely the first zero after the initial contiguous segment of ones). The probability for this is $p^{*}=\frac{1}{n}$ for one-bit mutation and $p^{*}=\frac{1}{n}(1-\frac{1}{n})^{n-1}≥\frac{1}{en}$ for standard bit-wise mutation. 
Denoting by $Y_{t}:=\max\{f_{1}(x)\mid x∈ R_{t}\}$ the maximum $f_{1}$ value in the combined parent and offspring population, we have just shown that $\Pr[Y_{t}≥ X_{t}+1]≥ pp^{*}=\Omega(1/n)$ whenever $X_{t}<n$ . We note that any $x∈ R_{t}$ with $f_{1}(x)=Y_{t}$ lies on the first front $F_{1}$ of $R_{t}$ and that there is a $y∈ R_{t}$ with infinite crowding distance and $f(y)=f(x)$ , in particular, $f_{1}(y)=Y_{t}$ . Consequently, such a $y$ will be kept in the next parent population $P_{t+1}$ (note that there are at most $4$ individuals in $F_{1}$ with infinite crowding distance – since $N≥ 4$ , they will all be included in $P_{t+1}$ ). This shows that we also have $\Pr[X_{t+1}≥ X_{t}+1]≥ pp^{*}$ whenever $X_{t}<n$ . By adding the expected waiting times for an increase of the $X_{t}$ value, we see that the expected time to have $X_{t}=n$ , that is, to have $(1,...,1)∈ P_{t}$ , is at most
$$
\frac{n}{pp^{*}}\leq\frac{n}{(1-\frac{1}{e})\frac{1}{en}}=\frac{e^{2}n^{2}}{e-1}
$$
iterations. By a symmetric argument, we see that after another at most $\frac{e^{2}}{e-1}n^{2}$ iterations, also the other extremal point $(0,...,0)$ is in the population (and remains there forever by Lemma 7). With now both extremal points of the Pareto front covered, we analyze the remaining time until the Pareto front is fully covered. We note that by Lemma 7, the number of Pareto front points covered cannot decrease. Hence it suffices to prove a lower bound for the probability that the coverage increases in one iteration. This is what we do now. Assume that the Pareto front is not yet fully covered. Since we have some Pareto optimal individuals, there also is a Pareto optimal individual $x∈ P_{t}$ such that $f(x)$ is a neighbor of a point $v$ on the Pareto front that is not covered. Since both extremal points of the Pareto front are in the population, the differences between the maximum and minimum value are the same for both objectives (namely $n$ ). Consequently, in the same way as in the proof of Lemma 3, we know that there is also an individual $y$ with $f(y)=f(x)$ and with crowding distance at least $\frac{2}{n}$ . We estimate the number of individuals in $P_{t}\setminus\{y\}$ which could win a tournament against this $y$ . Clearly, these can only be individuals in the first front $F_{1}$ of the non-dominated sorting. Assume first that $|f(F_{1})|≤ 0.8(n+1)$ . We note that, just by the definition of crowding distance and in a similar fashion as in the proof of Lemma 1, for each $w∈ f(F_{1})$ there are at most four individuals with $f$ value equal to $w$ and positive crowding distance. All other individuals in $F_{1}$ have a crowding distance of zero (and thus lose the tournament against $y$ ), as do all individuals not in $F_{1}$ . Consequently, there are at least $N_{0}=N-4· 0.8(n+1)$ individuals other than $y$ that would lose a tournament against $y$ . Assume now that $m:=|f(F_{1})|>0.8(n+1)$ . 
Since $F_{1}$ consists of pair-wise incomparable solutions or solutions with identical objective value (which we may ignore for the following argument), we have $|f_{1}(F_{1})|=|f_{2}(F_{1})|=m$ . For any $v=(v_{1},v_{2})$ , let $v^{+}:=(v_{1}+1,v_{2}-1)$ and $v^{-}:=(v_{1}-1,v_{2}+1)$ . Then we divide $f(F_{1})$ into two disjoint sets $U_{1}=\{v∈ f(F_{1})\cap[1..n-1]^{2}\mid v^{+}∉ f(F_{1})\text{ or }v^{-}∉ f(F_{1})\}$ and $U_{2}=f(F_{1})\setminus U_{1}$ . Since both $f_{1}(F_{1})$ and $f_{2}(F_{1})$ are subsets of $[0..n]$ , which has $n+1$ elements, we see that less than $0.2(n+1)$ of the values in $[0..n]$ are missing in $f_{1}(F_{1})$ , and analogously in $f_{2}(F_{1})$ . Since each value missing in $f_{1}(F_{1})$ or $f_{2}(F_{1})$ leads to at most two values in $U_{1}$ , we have $|U_{1}|<4· 0.2(n+1)=0.8(n+1)$ . For the values in $U_{1}$ , we use the blunt estimate from above that at most $4$ individuals with this objective value and positive crowding distance exist. For the values $v∈ U_{2}$ , we are in the same situation as in Lemma 4, and thus there are at most two individuals $x∈ F_{1}$ with $f(x)=v$ and crowding distance at least $\frac{2}{n}$ (this was not formally proven in Lemma 4 for the case that $v∈\{(0,n),(n,0)\}$ and the unique neighbor of $v$ is in $f(F_{1})$ , but it is easy to see that in this case only the at most two $x$ with $f(x)=v$ and infinite crowding distance can have a crowding distance of at least $\frac{2}{n}$ ). Consequently, there are more than
$$
N-4|U_{1}|-2|U_{2}|=N-4|U_{1}|-2(m-|U_{1}|)=N-2|U_{1}|-2m\geq N-1.6(n+1)-2(n+1)=N-3.6(n+1)
$$
individuals in $P_{t}\setminus\{y\}$ that would lose against $y$ . Note that this bound is weaker than the one from the first case, so it is valid in both cases. From this, we now estimate the probability that $y$ is selected as a parent at least once. We first regard the case of independent tournaments. The probability that $y$ is the winner of a fixed tournament is at least the probability that it is chosen as the first contestant times the probability that one of the at least $N-3.6(n+1)$ sure losers is chosen as the second contestant. This probability is at least $\frac{1}{N}·\frac{N-3.6(n+1)}{N-1}≥\frac{1}{N}·\frac{0.4(n+1)}{4n+3}≥ 0.1·\frac{1}{N}$ . Hence the probability $p$ that $y$ is chosen at least once as a parent for mutation is at least $p≥ 1-(1-0.1\frac{1}{N})^{N}≥ 1-\exp(-0.1)≥ 0.09$ . For the two-permutation implementation of tournament selection, $y$ appears in both permutations and has a random partner in each of them. Hence the probability that $y$ wins at least one of these two tournaments is at least $p≥ 1-(1-\frac{N-3.6(n+1)}{N-1})^{2}≥ 1-(1-0.1)^{2}=0.19$ . Conditional on $y$ being selected at least once, we regard a mutation step in which $y$ is selected. The probability $p^{*}$ that the Pareto optimal $y$ is mutated into the unique Pareto optimal bit string $z$ with $f(z)=v$ is $p^{*}=\frac{1}{n}$ for one-bit mutation and $p^{*}=\frac{1}{n}(1-\frac{1}{n})^{n-1}≥\frac{1}{en}$ for standard bit-wise mutation. Consequently, the probability that one iteration generates the missing Pareto front value $v$ is at least $pp^{*}$ , the expected waiting time for this is at most $\frac{1}{pp^{*}}$ iterations, and the expected time to create all missing Pareto front values is at most
$$
\frac{n}{pp^{*}}\leq\frac{n}{0.09\cdot\frac{1}{en}}=\frac{100en^{2}}{9}
$$
iterations. Hence, the runtime for the full coverage of the Pareto front starting from the initial population is at most
$$
\frac{e^{2}}{e-1}n^{2}+\frac{e^{2}}{e-1}n^{2}+\frac{100e}{9}n^{2}<15en^{2}
$$
iterations, which is at most $15eNn^{2}$ fitness evaluations. Now we prove the concentration result. Note that in this proof, we considered three phases: the first phase to reach the extremal point $(1,...,1)$ , the second phase to reach $(0,...,0)$ , and the third phase to cover the full Pareto front. The runtime of each phase is stochastically dominated by a sum of $n$ independent geometric random variables with success probability $\frac{0.09}{en}$ . Formally, let $X_{1},...,X_{3n}$ be independent geometric random variables with success probability $\frac{0.09}{en}$ , and let $T$ be the number of iterations to cover the full Pareto front. Then $Z:=\sum_{i=1}^{3n}X_{i}$ stochastically dominates $T$ , and $E[Z]=\frac{3en^{2}}{0.09}=\frac{100e}{3}n^{2}$ . From the Chernoff bound [Doe20, (1.10.46) in Theorem 1.10.32], we have that for any $\delta≥ 0$ ,
$$
\Pr\left[Z\geq(1+\delta)\tfrac{100e}{3}n^{2}\right]\leq\exp\left(-\frac{\delta^{2}}{2}\cdot\frac{3n-1}{1+\delta}\right).
$$
Since $Z$ dominates $T$ , this shows the theorem. ∎
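The two tournament schemes analyzed above differ only in how contestants are paired. The two-permutation variant (as in Deb's implementation) can be sketched as follows; this is our own minimal Python illustration, not code from the paper's experiments, and `key` is a hypothetical helper returning a comparable quality tuple, e.g. (negative rank, crowding distance).

```python
import random

def two_permutation_tournaments(population, key):
    """Select len(population) parents via binary tournaments built from
    two random permutations of the population (assumed of even size).
    Each individual thus appears in exactly two tournaments."""
    n = len(population)
    parents = []
    for _ in range(2):  # two permutations, n/2 tournaments each
        perm = list(range(n))
        random.shuffle(perm)
        for i in range(0, n - 1, 2):  # pair up consecutive entries
            a, b = population[perm[i]], population[perm[i + 1]]
            if key(a) > key(b):
                parents.append(a)
            elif key(b) > key(a):
                parents.append(b)
            else:  # tie: winner chosen uniformly at random
                parents.append(random.choice((a, b)))
    return parents
```

With independent tournaments instead, one would simply draw both contestants uniformly at random $N$ times; the proof above covers both variants.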
5 An Exponential Lower Bound for Small Population Size
In this section, we prove a lower bound for a small population size. Since lower bound proofs can be quite complicated – recall for example that there are matching upper and lower bounds for the runtime of the SEMO (using one-bit mutation) on OneMinMax and LeadingOnesTrailingZeroes, but not for the GSEMO (using bit-wise mutation) – we restrict ourselves to the simplest variant, using each parent once to generate one offspring via one-bit mutation. From the proofs, though, we are optimistic that our results, with different implicit constants, can also be shown for all other variants of the NSGA-II regarded in this work. Our experiments support this belief, see Figure 3 in Section 6.
Our main result is that this NSGA-II takes an exponential time to find the whole Pareto front (of size $n+1$ ) of OneMinMax when the population size is $n+1$ . This is different from the SEMO and GSEMO algorithms (which have no fixed population size, but which will never store a population larger than $n+1$ when optimizing OneMinMax) and the $(\mu+1)$ SIBEA with population size $\mu=n+1$ . Even stronger, we show that there is a constant $\varepsilon>0$ such that when the current population $P_{t}$ covers at least $|f(P_{t})|≥(1-\varepsilon)(n+1)$ points on the Pareto front of OneMinMax, then with probability $1-\exp(-\Theta(n))$ , the next population $P_{t+1}$ will cover at most $|f(P_{t+1})|≤(1-\varepsilon)(n+1)$ points on the front. Hence when a population covers a large fraction of the Pareto front, then with very high probability the next population will cover fewer points on the front. When the coverage is smaller, that is, $|f(P_{t})|≤(1-\varepsilon)(n+1)$ , then with probability $1-\exp(-\Theta(n))$ the combined parent and offspring population $R_{t}$ will miss a constant fraction of the Pareto front. From these two statements, it is easy to see that there is a constant $\delta$ such that with probability $1-\exp(-\Omega(n))$ , in none of the first $\exp(\Omega(n))$ iterations the combined parent and offspring population covers more than $(1-\delta)(n+1)$ points of the Pareto front.
Since it is the technically easier one, we start with proving the latter statement that a constant fraction of the front not covered by $P_{t}$ implies also a constant fraction not covered by $R_{t}$ . Before stating the formal result and proof, let us explain the reason behind this result. With a constant fraction of the front not covered by $P_{t}$ , also a constant fraction that is $\Omega(n)$ away from the boundary points $(0,n)$ and $(n,0)$ is not covered. These values have the property that from an individual corresponding to either of their neighboring positions, an individual with this objective value can only be generated with constant probability via one-bit mutation. Again a constant fraction of these values have only a constant number of individuals on neighboring positions. These values thus have a (small) constant probability of not being generated in this iteration. This shows that in expectation, we are still missing a constant fraction of the Pareto front in $R_{t}$ . Via the method of bounded differences (exploiting that each mutation operation can change the number of missing elements by at most one), we turn this expectation into a bound that holds with probability $1-\exp(-\Omega(n))$ .
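To make this neighborhood argument concrete: under one-bit mutation, the objective value of an offspring is always a neighbor of the parent's value, so an uncovered front point can only be created from an occupied neighboring point. A minimal sketch (our own illustrative code, not the Matlab implementation used in the experiments):

```python
import random

def one_min_max(x):
    """OneMinMax value of a bit string x: (number of zeros, number of ones).
    Every bit string is Pareto optimal; the front has n+1 points."""
    ones = sum(x)
    return (len(x) - ones, ones)

def one_bit_mutation(x):
    """Flip exactly one bit chosen uniformly at random."""
    y = list(x)
    i = random.randrange(len(y))
    y[i] = 1 - y[i]
    return y
```

Since a single flip changes the number of ones by exactly one, the offspring's objective value is always a neighbor of the parent's value on the front.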
**Lemma 10**
*Let $\varepsilon∈(0,1)$ be a sufficiently small constant. Consider optimizing the OneMinMax benchmark via the NSGA-II applying one-bit mutation once to each parent individual. Let the population size be $N=n+1$ . Assume that $|f(P_{t})|≤(1-\varepsilon)(n+1)$ . Then with probability at least $1-\exp(-\Omega(n))$ , we have $|f(R_{t})|≤(1-\frac{1}{10}\varepsilon(\tfrac{1}{5}\varepsilon-\tfrac{2}{n})^{5/\varepsilon})(n+1)$ .*
*Proof*
Let $F=\{(v,n-v)\mid v∈[0..n]\}$ be the Pareto front of OneMinMax. For a value $(v,n-v)∈ F$ , we say that $(v-1,n-v+1)$ and $(v+1,n-v-1)$ are neighbors of $(v,n-v)$ provided that they are in $[0..n]^{2}$ . We write $(a,b)\sim(u,v)$ to denote that $(a,b)$ and $(u,v)$ are neighbors. Let $\Delta=\lceil\frac{5}{\varepsilon}\rceil-1$ and let $F^{\prime}$ be the set of values in $F$ such that more than $\Delta$ individuals in $P_{t}$ have a function value that is a neighbor of this value, that is,
$$
F^{\prime}=\left\{(v,n-v)\in F\,\middle|\,|\{x\in P_{t}\mid f(x)\sim(v,n-v)\}|\geq\Delta+1\right\}.
$$
Then $|F^{\prime}|≤\frac{2}{\Delta+1}(n+1)≤\frac{2}{5}\varepsilon(n+1)$ as otherwise the number of individuals in our population could be bounded from below by
$$
|F^{\prime}|\cdot\tfrac{1}{2}(\Delta+1)>\tfrac{2}{\Delta+1}(n+1)\cdot\tfrac{1}{2}(\Delta+1)=n+1,
$$
which contradicts our assumption $N=n+1$ (note that the factor of $\tfrac{1}{2}$ accounts for the fact that we may count each individual twice). Let $M=F\setminus f(P_{t})$ be the set of Pareto front values not covered by the current population. By assumption, $|M|≥\varepsilon(n+1)$ . Let
$$
M_{1}=\left\{(v,n-v)\in M\,\middle|\,v\in\left[\lfloor\tfrac{1}{5}\varepsilon(n+1)\rfloor..n-\lfloor\tfrac{1}{5}\varepsilon(n+1)\rfloor\right]\right\}\setminus F^{\prime}.
$$
Then $|M_{1}|≥|M|-2\lfloor\tfrac{1}{5}\varepsilon(n+1)\rfloor-|F^{\prime}|≥\tfrac{1}{5}\varepsilon(n+1)$ . We now argue that a constant fraction of the values in $M_{1}$ is not generated in the current generation. We note that via one-bit mutation, a given $(v,n-v)∈ F$ can only be generated from an individual $x$ with $f(x)\sim(v,n-v)$ . Let $(v,n-v)∈ M_{1}$ . Since $v∈[\lfloor\tfrac{1}{5}\varepsilon(n+1)\rfloor..n-\lfloor\tfrac{1}{5}\varepsilon(n+1)\rfloor]$ , the probability that a given parent $x$ is mutated to some individual $y$ with $f(y)=(v,n-v)$ is at most
$$
\frac{n-\lfloor\tfrac{1}{5}\varepsilon(n+1)\rfloor+1}{n}\leq 1-\frac{1}{5}\varepsilon+\frac{2}{n}
$$
since there are at most $n-\lfloor\tfrac{1}{5}\varepsilon(n+1)\rfloor+1$ bit positions such that flipping them creates the desired value. Since $(v,n-v)∉ F^{\prime}$ , at most $\Delta$ parents can generate this value at all, so the probability that $Q_{t}$ (and thus $R_{t}$ ) contains no individual $y$ with $f(y)=(v,n-v)$ is at least
$$
\left(1-\left(1-\tfrac{1}{5}\varepsilon+\tfrac{2}{n}\right)\right)^{\Delta}\geq\left(\tfrac{1}{5}\varepsilon-\tfrac{2}{n}\right)^{5/\varepsilon}=:p.
$$
Let $X=|F\setminus f(R_{t})|$ denote the number of Pareto front values not covered by $R_{t}$ . We have $E[X]≥|M_{1}|p≥\frac{1}{5}\varepsilon p(n+1)$ . The random variable $X$ is functionally dependent on the $N=n+1$ random decisions of the $N$ mutation operations, which are stochastically independent. Changing the outcome of a single mutation operation changes $X$ by at most $1$ . Consequently, $X$ satisfies the assumptions of the method of bounded differences [McD89] (also to be found in [Doe20, Theorem 1.10.27]). Hence the classic additive Chernoff bound applies to $X$ as if it was a sum of $N$ independent random variables taking values in an interval of length $1$ . In particular, the probability that $X≤\frac{1}{10}\varepsilon p(n+1)≤\frac{1}{2}E[X]$ is at most $\exp(-\Omega(n))$ . ∎
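For convenience, the concentration tool invoked here (and again in the proof of Lemma 11) can be stated as follows: if $X=g(Z_{1},...,Z_{N})$ for independent random variables $Z_{1},...,Z_{N}$ and changing any single $Z_{i}$ changes $g$ by at most $c_{i}$, then for all $\lambda>0$,

```latex
\Pr[X \le E[X] - \lambda] \le \exp\left( -\frac{2\lambda^{2}}{\sum_{i=1}^{N} c_{i}^{2}} \right).
```

In the proof above, $c_{i}=1$ for all $i$, $N=n+1$, and $\lambda=\frac{1}{2}E[X]=\Omega(n)$, which gives the stated $\exp(-\Omega(n))$ failure probability.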
We now turn to the other main argument, which is that when the current population covers the Pareto front to a large extent, then the selection procedure of the NSGA-II will remove individuals in such a way from $R_{t}$ that at least some constant fraction of the Pareto front is not covered by $P_{t+1}$ . The key arguments to show this claim are the following. When a large part of the front is covered by $P_{t}$ , then many points are only covered by a single individual (since the population size equals the size of the front). With some careful counting, we derive from this that close to two thirds of the positions on the front are covered exactly twice in the combined parent and offspring population $R_{t}$ and that the corresponding individuals have the same crowding distance. Since these are roughly $\frac{4}{3}(n+1)$ individuals appearing equally preferable in the selection, a random set of at least roughly $\frac{1}{3}(n+1)$ of them will be removed in the selection step. In expectation, this will remove both individuals from a constant fraction of the points on the Pareto front. Again, the method of bounded differences turns this expectation into a statement with probability $1-\exp(-\Omega(n))$ .
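For reference, the crowding distance used throughout these arguments follows the standard NSGA-II definition; the following bi-objective sketch is our own code (function name ours), not the implementation used in the experiments.

```python
def crowding_distances(front):
    """Crowding distances for a list `front` of bi-objective values
    (f1, f2). Per objective, boundary points get infinite distance and
    an interior point receives the gap between its two sorted neighbors,
    normalized by the objective's range."""
    n = len(front)
    dist = [0.0] * n
    for obj in range(2):
        order = sorted(range(n), key=lambda i: front[i][obj])
        lo, hi = front[order[0]][obj], front[order[-1]][obj]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # objective constant on this front
        for pos in range(1, n - 1):
            dist[order[pos]] += (front[order[pos + 1]][obj]
                                 - front[order[pos - 1]][obj]) / (hi - lo)
    return dist
```

On the full OneMinMax front for $n=4$, the two extremal points get infinite distance and every interior point gets distance $\frac{2}{n}+\frac{2}{n}=1$, matching the per-objective gaps of $\frac{2}{n}$ used in the proofs.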
**Lemma 11**
*Let $\varepsilon>0$ be a sufficiently small constant. Consider optimizing the OneMinMax benchmark via the NSGA-II applying one-bit mutation once to each individual. Let the population size be $N=n+1$ . Assume that the current population $P_{t}$ covers $|f(P_{t})|≥(1-\varepsilon)(n+1)$ elements of the Pareto front. Then with probability at least $1-\exp(-\Omega(n))$ , the next population $P_{t+1}$ covers less than $(1-0.01)(n+1)$ elements of the Pareto front.*
*Proof*
Let $U$ be the set of Pareto front values that have exactly one corresponding individual in $P_{t}$ , that is, for any $(v,n-v)∈ U$ , there exists only one $x∈ P_{t}$ with $f(x)=(v,n-v)$ . We first note that $|U|≥(1-2\varepsilon)(n+1)$ as otherwise there would be at least
$$
2\left(|f(P_{t})|-|U|\right)+|U|=2|f(P_{t})|-|U|\geq 2(1-\varepsilon)(n+1)-|U|>n+1
$$
individuals in $P_{t}$ , which contradicts our assumption $N=n+1$ . Let $U^{\prime}$ denote the set of values in $U$ which have all their neighbors also in $U$ . Since each value not in $U$ can prevent at most two values in $U$ from being in $U^{\prime}$ , we have
$$
\begin{split}|U^{\prime}|&\geq|U|-2(n+1-|U|)=3|U|-2(n+1)\\
&\geq 3(1-2\varepsilon)(n+1)-2(n+1)=\left(1-6\varepsilon\right)(n+1).\end{split}
$$
We say that $(v,n-v)$ is double-covered by $R_{t}$ if there are exactly two individuals in $R_{t}$ with function value $(v,n-v)$ . Noting that via one-bit mutation a certain function value can only be generated from the individuals corresponding to the neighbors of this function value, we see that a given $(v,n-v)∈ U^{\prime}$ with $v∈[1..n-1]$ is double-covered by $R_{t}$ with probability exactly
$$
p_{v}=\frac{n-(v-1)}{n}+\frac{v+1}{n}-2\,\frac{n-(v-1)}{n}\cdot\frac{v+1}{n}=1-\frac{2v}{n}+2\,\frac{v^{2}-1}{n^{2}}.
$$
Thus the expected number of double-coverages in $U^{\prime}$ is at least
$$
\begin{split}\sum_{\substack{v\in[1..n-1]:\\(v,n-v)\in U^{\prime}}}p_{v}&=\Bigg(\sum_{v=1}^{n-1}p_{v}\Bigg)-\Bigg(\sum_{\substack{v\in[1..n-1]:\\(v,n-v)\notin U^{\prime}}}p_{v}\Bigg)\\
&\geq\Bigg(\sum_{v=1}^{n-1}\Big(1-\frac{2v}{n}+2\,\frac{v^{2}-1}{n^{2}}\Big)\Bigg)-\Bigg(\sum_{\substack{v\in[1..n-1]:\\(v,n-v)\notin U^{\prime}}}1\Bigg)\\
&\geq(n-1)-\frac{2}{n}\cdot\frac{(n-1)n}{2}+\frac{2}{n^{2}}\Bigg(\frac{(n-1)n(2(n-1)+1)}{6}-(n-1)\Bigg)-6\varepsilon(n+1)\\
&=\frac{n-1}{n^{2}}\cdot\frac{2n^{2}-n-6}{3}-6\varepsilon(n+1)=(\tfrac{2}{3}-6\varepsilon)(n+1)-O(1).\end{split}\tag{1}
$$

Denote by $U^{\prime\prime}$ the set of values in $U^{\prime}$ that are double-covered by $R_{t}$ and note that we have just shown $E[|U^{\prime\prime}|]≥(\frac{2}{3}-6\varepsilon)(n+1)-O(1)$ . The number $m:=|U^{\prime\prime}|$ of double-covered elements is functionally dependent on the random decisions taken (independently) in the $N$ mutation operations. Each mutation operation determines one offspring and thus can change the number of double-covered values by at most $2$ . Consequently, we can use the method of bounded differences [McD89] and obtain that $m$ is at least $\tilde{m}:=(\tfrac{2}{3}-8\varepsilon)(n+1)$ with probability at least $1-\exp(-\Omega(n))$ . We condition on this in the remainder.

Our next argument is that these double-coverages correspond to approximately $\frac{4}{3}(n+1)$ individuals in $R_{t}$ that have the same crowding distance. Consequently, the selection procedure has to discard at least roughly $\frac{1}{3}(n+1)$ of them, randomly chosen, and this will lead to a decent number of values in $U^{\prime\prime}$ that are not covered anymore by $P_{t+1}$ . To make this precise, let $R^{\prime\prime}$ denote the set of individuals $x$ in $R_{t}$ such that $f(x)∈ U^{\prime\prime}$ . By construction, there are exactly two such individuals for each value in $U^{\prime\prime}$ , hence $|R^{\prime\prime}|=2m$ . Further, both neighboring values of each value in $U^{\prime\prime}$ are also present in $f(R_{t})$ . Consequently, each $x∈ R^{\prime\prime}$ has crowding distance (in $R_{t}$ ) exactly $d=\frac{1}{v_{1}^{\max}-v_{1}^{\min}}+\frac{1}{v_{2}^{\max}-v_{2}^{\min}}$ . We recall that the selection procedure (since all ranks are equal to one) first discards all individuals with crowding distance less than $d$ , since these are at most $|R_{t}|-|R^{\prime\prime}|≤ 2(n+1-\tilde{m})=(2-\frac{4}{3}+16\varepsilon)(n+1)+O(1)$ many, which is less than $N$ for $n$ large and $\varepsilon$ small enough. Then, randomly, the selection procedure discards a further number of individuals from all individuals with crowding distance exactly $d$ so that exactly $N$ individuals remain. For $N$ individuals to remain, at least $k:=|R^{\prime\prime}|-N$ individuals from $R^{\prime\prime}$ must be discarded.

To ease the calculation, we first reduce the problem to the case that $|R^{\prime\prime}|=2\tilde{m}$ . Indeed, let $U^{\prime\prime\prime}$ be any subset of $U^{\prime\prime}$ having cardinality exactly $\tilde{m}$ and let $R^{\prime\prime\prime}$ be the set of individuals $x∈ R_{t}$ with $f(x)∈ U^{\prime\prime\prime}$ . Then $R^{\prime\prime\prime}\subseteq R^{\prime\prime}$ and $|R^{\prime\prime\prime}|=2\tilde{m}$ . With the same argument as in the previous paragraph, we see that the selection procedure has to remove at least $\tilde{k}:=2\tilde{m}-N$ elements from $R^{\prime\prime\prime}$ . We thus analyze the number of elements of $U^{\prime\prime\prime}$ that become uncovered when we remove a random set of $\tilde{k}$ individuals from $R^{\prime\prime\prime}$ , knowing that this is a lower bound for the number of elements uncovered in $U^{\prime\prime}$ , both because the number of individuals removed from $R^{\prime\prime\prime}$ can be higher than $\tilde{k}$ and because the removal of elements in $R^{\prime\prime}\setminus R^{\prime\prime\prime}$ can also lead to uncovered elements in $U^{\prime\prime}$ . We take a final pessimistic simplification: we select $\tilde{k}$ elements from $R^{\prime\prime\prime}$ with replacement and remove these individuals from $R^{\prime\prime\prime}$ . Clearly, this can only lower the number of removed elements, hence our estimate for the number of uncovered elements is also valid for the random experiment without replacement (where we choose exactly $\tilde{k}$ distinct elements to be removed). 
For this random experiment, the probability of uncovering a given position in $U^{\prime\prime\prime}$ is at least
$$
1-2\left(1-\frac{1}{2\tilde{m}}\right)^{\tilde{k}}+\left(1-\frac{1}{2\tilde{m}}\right)^{2\tilde{k}},
$$
where we use the estimate $\frac{\tilde{k}}{2\tilde{m}}=1-\frac{n+1}{2\tilde{m}}=1-\frac{3}{4}\cdot\frac{1}{1-12\varepsilon}$ and the fact that $\tilde{m}=\Theta(n)$ ; let $p$ denote this lower bound. Let $Y$ denote the number of elements of $U^{\prime\prime\prime}$ uncovered in our random experiment. We note that $1-2\exp(-1/4)+\exp(-1/2)≥ 0.04892$ . Hence when $n$ is large enough and $\varepsilon$ was chosen as a sufficiently small constant, then
$$
E[Y]\geq p\tilde{m}\geq 0.02(n+1).
$$
The random variable $Y$ is functionally dependent on the $\tilde{k}$ selected individuals, which are stochastically independent. Changing the outcome of a single selected individual changes $Y$ by at most $1$ . Consequently, $Y$ satisfies the assumptions of the method of bounded differences [McD89]. The classic additive Chernoff bound thus applies to $Y$ as if it were a sum of $\tilde{k}=\Omega(n)$ independent random variables taking values in an interval of length $1$ . In particular, the probability that $Y≤ 0.01(n+1)≤\frac{1}{2}E[Y]$ is at most $\exp(-\Omega(n))$ . ∎
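The with-replacement experiment at the end of this proof can be checked numerically. The following simulation is our own illustrative code; the parameter values only mimic the regime $\tilde{k}/(2\tilde{m})\approx\frac{1}{4}$ obtained for small $\varepsilon$, where the per-pair uncovering probability is about $1-2e^{-1/4}+e^{-1/2}\approx 0.049$.

```python
import random

def uncovered_fraction(m_pairs, k_draws, trials=200, seed=1):
    """Draw k_draws individuals with replacement from 2*m_pairs
    individuals grouped into pairs and remove the drawn ones; return
    the average fraction of pairs that lose both members."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        removed = {rng.randrange(2 * m_pairs) for _ in range(k_draws)}
        # pair j consists of individuals 2j and 2j+1
        total += sum(1 for j in range(m_pairs)
                     if 2 * j in removed and 2 * j + 1 in removed)
    return total / (trials * m_pairs)

frac = uncovered_fraction(m_pairs=1000, k_draws=500)  # k/(2m) = 1/4
```

The measured fraction concentrates around the analytic value, in line with the lemma's bound $E[Y]\geq 0.02(n+1)$.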
Combining Lemmas 10 and 11, we have the following exponential runtime result.
**Theorem 12**
*Consider optimizing OneMinMax via the NSGA-II applying one-bit mutation once to each individual. Let the population size be $N=n+1$ . There are a positive constant $\gamma$ and a time $T=\exp(\Omega(n))$ such that with probability $1-\exp(-\Omega(n))$ , in each of the first $T$ iterations at most a fraction of $1-\gamma$ of the Pareto front is covered by $P_{t}$ .*
*Proof*
Let $\varepsilon$ be a small constant rendering the claims of Lemmas 10 and 11 valid. Assume that $n$ is sufficiently large. Let $\tilde{\varepsilon}=(\frac{1}{10}\varepsilon)^{5/\varepsilon+1}$ . By a simple Chernoff bound, a random initial individual $x$ satisfies $\frac{1}{4}n≤ f_{1}(x)≤\frac{3}{4}n$ with probability $1-\exp(-\Omega(n))$ . Taking a union bound over the $n+1$ initial individuals, we see that the initial population $P_{0}$ with probability $1-\exp(-\Omega(n))$ covers at most half of the Pareto front. Let $t$ be some iteration. If $|f(P_{t})|≥(1-\varepsilon)(n+1)$ , then by Lemma 11 with probability $1-\exp(-\Omega(n))$ the next population $P_{t+1}$ covers less than $(1-0.01)(n+1)$ values of the Pareto front. If $|f(P_{t})|≤(1-\varepsilon)(n+1)$ , then by Lemma 10 with probability $1-\exp(-\Omega(n))$ we have $n+1-|f(P_{t+1})|≥\frac{1}{10}\varepsilon(\frac{1}{5}\varepsilon-\tfrac{2}{n})^{5/\varepsilon}(n+1)≥\tilde{\varepsilon}(n+1)$ , where the last estimate holds when $n$ is sufficiently large. Consequently, for each generation $t$ , the probability that $P_{t+1}$ covers more than $(1-\min\{\tilde{\varepsilon},0.01\})(n+1)$ values of the Pareto front is only $\exp(-\Omega(n))$ . In particular, a union bound shows that for $T=\exp(\Theta(n))$ suitably chosen, with probability $1-\exp(-\Omega(n))$ in all of the first $T$ iterations, the population covers at most $(1-\min\{\tilde{\varepsilon},0.01\})(n+1)$ values of the Pareto front. ∎
6 Experiments
To complement our asymptotic results with runtime data for concrete problem sizes, we conducted the following experiments.
6.1 Settings
We use, in principle, the version of the NSGA-II given by Deb (Revision 1.1.6), available at [Deb], except that, as in our theoretical analysis, we do not use crossover. We re-implemented the algorithm in Matlab (R2016b). When a sorting procedure is used, we use the one provided by Matlab (and not randomized Quicksort as in Deb’s implementation). The code is available at [Zhe].
Our theoretical analysis above covers four parent selection strategies and two mutation operators. In the interest of brevity, with the exception of the data presented in Figure 3 we concentrate in our experiments on one variant of the algorithm, namely we use two-permutation binary tournament selection (as proposed in [DPAM02]) and standard bit-wise mutation with mutation rate $\frac{1}{n}$ (which is the most common mutation operator in evolutionary computation). We use the following experimental settings.
- Problem size $n$ : $100,200,300,$ and $400$ for OneMinMax, and $30,60,90,$ and $120$ for LeadingOnesTrailingZeroes.
- Population size $N$ : Our theoretical analyses (Theorems 6 and 9) showed that the NSGA-II finds the optima of OneMinMax and LeadingOnesTrailingZeroes efficiently for population sizes of at least $N^{*}=4(n+1)$ . We use this value also in the experiments. We also use the value $N=2N^{*}$ , for which our theory results apply, but our runtime guarantees are twice as large as for $N^{*}$ (when making the implicit constants in the results visible). We also use the smaller population sizes $2(n+1)$ and $1.5(n+1)$ for OneMinMax and $2(n+1)$ for LeadingOnesTrailingZeroes. For these values, we have no proven result, but it is not uncommon that mathematical runtime analyses cannot cover all efficient parameter settings, and in fact, we shall observe a good performance in these experiments as well (the reason why we do not display results for $N=1.5(n+1)$ for LeadingOnesTrailingZeroes is that here the algorithm was indeed no longer effective). Finally, we conduct experiments with the population size $N=n+1$ , which is large enough to represent the full Pareto front, but for which we have proven the NSGA-II to be ineffective (on OneMinMax and when letting each parent create an offspring via one-bit mutation).
- Number of independent runs: $50$ for the efficient population sizes in Section 6.2 and $20$ for the more time-consuming experiments with inefficient population sizes in Sections 6.3 to 6.4. These numbers of runs already yielded well-concentrated results.
6.2 Efficient Population Sizes
Figure 1 displays the runtime (that is, the number of fitness evaluations until the full Pareto front is covered) of the NSGA-II with population sizes large enough to allow an efficient optimization, together with the runtime of the (parameter-less) GSEMO.
[Figure 1a: line chart of the number of fitness evaluations (log scale, roughly $10^{5}$ to $10^{7}$ ) versus problem size $n∈\{100,200,300,400\}$ on OneMinMax; curves with error bars for the GSEMO and the NSGA-II with $N=1.5(n+1)$ , $2(n+1)$ , $4(n+1)$ , and $8(n+1)$ .]
(a)
[Figure 1b: line chart of the number of fitness evaluations (log scale, roughly $10^{5}$ to $10^{6}$ ) versus problem size $n∈\{30,60,90,120\}$ on LeadingOnesTrailingZeroes; curves with error bars for the GSEMO and the NSGA-II with $N=2(n+1)$ , $4(n+1)$ , and $8(n+1)$ .]
(b)
Figure 1: The number of function evaluations for the NSGA-II (binary tournament selection, standard bit-wise mutation) with different population sizes and for the GSEMO optimizing OneMinMax (1a) and LeadingOnesTrailingZeroes (1b). Displayed are the median (with $1$ st and $3$ rd quartiles) in 50 independent runs.
This data confirms that the NSGA-II can efficiently cover the Pareto front of OneMinMax and LeadingOnesTrailingZeroes when using a population size of at least $N^{*}$ . The runtimes for $N=2N^{*}$ are clearly larger than for $N^{*}$ , but by a factor slightly less than $2$ for both problems. The data for the population sizes smaller than $N^{*}$ indicates that also for these parameter settings the NSGA-II performs very well.
Comparing the NSGA-II to the GSEMO, we observe that the NSGA-II with a proper choice of the population size shows a better performance. This is interesting and somewhat unexpected, in particular for simple problems like OneMinMax and LeadingOnesTrailingZeroes. It is clear that the NSGA-II using tournament selection chooses extremal parents at a higher rate. More precisely, each individual appears in two tournaments. For an extremal value on the Pareto front, at least one individual has an infinite crowding distance, making it the tournament winner almost surely (except in the rare case that the tournament partner has infinite crowding distance as well). Consequently, for each extremal objective value, the NSGA-II mutates at least $2-o(1)$ individuals per iteration in expectation. This is twice the average rate. In contrast, the GSEMO treats all individuals equally. This advantage of the NSGA-II comes at the price of a larger population, hence a larger cost per iteration. We note that the NSGA-II throughout the run works with a population of size $N$ , whereas the GSEMO only keeps non-dominated individuals in its population. Consequently, in particular in the early stages of the optimization process, each iteration of the GSEMO takes significantly fewer fitness evaluations.
6.3 Inefficient Population Sizes
When the population size is small, the guarantee that points on the front cannot be lost (Lemmas 1 and 7) no longer applies, and the proof of Theorem 12 shows that we can indeed easily lose points on the front, leading to a runtime at least exponential in $n$ when $N=n+1$. In this subsection, we analyze this phenomenon experimentally. As discussed earlier, we first concentrate on the NSGA-II with two-permutation tournament selection and standard bit-wise mutation.
Since it is hard to demonstrate an exponential runtime experimentally, we do not run the algorithm until it has found the full Pareto front (this would be possible only for very small problem sizes). Instead, we conduct a slightly different experiment for reasonable problem sizes, which also strongly indicates that the NSGA-II has enormous difficulties in finding the full front. We ran the NSGA-II for $3000$ generations on OneMinMax and $5000$ generations on LeadingOnesTrailingZeroes and measured for each generation the fraction of the Pareto front that is covered. This data is displayed in Figure 2. We clearly see that the coverage of the Pareto front steeply increases at first, but then stagnates, in a very concentrated manner, at a constant fraction clearly below one (around $80\%$ for OneMinMax and between $50\%$ and $60\%$ for LeadingOnesTrailingZeroes). From this data, there is no indication that the Pareto front will be covered anytime soon.
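The coverage statistic plotted in Figure 2 can be computed per generation as sketched below (our own minimal illustration): since for OneMinMax the Pareto front is $\{(i,n-i)\mid i\in[0..n]\}$ and every search point lies on it, the coverage equals the number of distinct one-bit counts in the population divided by $n+1$.

```python
def coverage_ratio(population, n):
    """Fraction of the OneMinMax Pareto front {(i, n-i) : 0 <= i <= n}
    covered by a population of length-n bit strings: the point
    (i, n-i) is covered iff some individual has exactly i one-bits."""
    covered = {sum(x) for x in population}  # distinct f_1 values
    return len(covered) / (n + 1)
```

For LeadingOnesTrailingZeroes, one would instead collect the distinct objective pairs of those individuals that lie on the Pareto front.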
(a)
(b)
Figure 2: Ratio of the coverage of the Pareto front by the current population of the NSGA-II (binary tournament selection, standard bit-wise mutation) with population size $N=n+1$ for solving OneMinMax (2a) and LeadingOnesTrailingZeroes (2b). Displayed are the medians (with the 1st and 3rd quartiles) of $20$ independent runs.
We said in Section 5 that we were optimistic that our negative result for small population sizes would also hold for all other variants of the NSGA-II. To experimentally support this claim, we now run all variants of the NSGA-II discussed in this work on OneMinMax with problem size $n=200$, $20$ times for $3000$ iterations each. In Figure 3, we see the ratios of the coverage of the Pareto front by the populations in the $20$ runs and in iterations $[2001..3000]$ (that is, we regard together $20\cdot 1000$ populations). We see that all variants fail to cover a constant fraction of the Pareto front. The precise constant differs between the variants. Most notably, we observe that the variants using standard bit-wise mutation cover the Pareto front to a lesser extent than those building on one-bit mutation. We do not have a clear explanation for this phenomenon, but we speculate that standard bit-wise mutation is harmed by the constant fraction of its mutations that merely create a copy of the parent. We would, however, not interpret the results in this figure as a suggestion to prefer one-bit mutation. As shown in [DQ23a], with high probability the NSGA-II using one-bit mutation fails to find the Pareto front of the OneJumpZeroJump benchmark, regardless of the runtime allowed.
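The speculation about copies can be made concrete: standard bit-wise mutation flips each of the $n$ bits independently with probability $1/n$, so it produces an exact copy of the parent with probability $(1-1/n)^n$, which converges to $1/e\approx 0.368$ from below, whereas one-bit mutation always changes exactly one bit. A quick numeric check (our own sketch):

```python
import math

def copy_probability(n):
    """Probability that standard bit-wise mutation with rate 1/n
    leaves all n bits of the parent unchanged."""
    return (1.0 - 1.0 / n) ** n

# For the problem size n = 200 used in Figure 3, more than a third
# of all standard bit-wise mutations are exact copies of the parent.
assert abs(copy_probability(200) - math.exp(-1)) < 0.002
```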
Figure 3: Ratios of the coverage of the Pareto front by the population of the different NSGA-II variants (using $A$ (selecting each individual as a parent once), $B$ ($N$ times choosing a parent uniformly at random), $C$ (independent binary tournaments), or $D$ (two-permutation binary tournaments) as the mating selection strategy, and using $a$ (one-bit mutation) or $b$ (standard bit-wise mutation) as the mutation strategy) with population size $N=n+1$ on OneMinMax with problem size $n=200$. Displayed are the medians (with the 1st and 3rd quartiles) over $20$ independent runs and generations $[2001..3000]$.
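For reference, the four mating selection strategies $A$ to $D$ can be sketched as follows (our own simplified illustration; the crowded comparison is abstracted into a `winner` function, and the population size is assumed even for the two-permutation scheme):

```python
import random

def select_parents(population, strategy, winner):
    """Return N parents from a population of size N using one of the
    four mating selection strategies compared in Figure 3;
    winner(a, b) decides a binary tournament."""
    N = len(population)
    if strategy == "A":  # each individual is a parent exactly once
        return list(population)
    if strategy == "B":  # N independent uniform choices
        return [random.choice(population) for _ in range(N)]
    if strategy == "C":  # N independent binary tournaments
        return [winner(random.choice(population), random.choice(population))
                for _ in range(N)]
    if strategy == "D":  # two-permutation binary tournaments (N even):
        parents = []     # each individual enters exactly two tournaments
        for _ in range(2):
            perm = random.sample(population, N)
            parents += [winner(perm[i], perm[i + 1])
                        for i in range(0, N, 2)]
        return parents
    raise ValueError(strategy)
```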
6.4 Optimization With Small Population Sizes
In the previous subsection, we showed that the NSGA-II with population size equal to the size of the Pareto front cannot cover the full Pareto front in a reasonable time. On the positive side, however, still a large fraction of the Pareto front was covered, e.g., around 80% for the OneMinMax problem. This could indicate that the NSGA-II also with smaller population sizes is an interesting algorithm. This is what we briefly discuss now. We shall not explore this question in full detail, but only to the extent that we observe a good indication that the NSGA-II performs well also with small population sizes. We note that the subsequent work [ZD22a] took up this research question and discussed it in detail.
To understand how well the NSGA-II performs with the small population size $n+1$, we first regard how fast its population spreads out on the Pareto front. From the data in Figure 4, we see that also with this small population size, the NSGA-II quickly finds the two extremal points $(0,n)$ and $(n,0)$ of the Pareto front. This fits our understanding of the algorithm. Since the two outermost individuals in the population have infinite crowding distance and since there are at most four individuals with infinite crowding distance, these individuals will never be lost, even if the population size is relatively small.
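This argument rests on the standard crowding distance computation, sketched here for a single front of bi-objective values (our own minimal rendering of the usual definition): the sorted boundary points in each objective receive infinite distance, so for two objectives at most four individuals per front have infinite crowding distance.

```python
def crowding_distances(front):
    """Crowding distance of each point in a list of (f1, f2) values.
    In each objective, the two boundary points get infinity; interior
    points accumulate the normalized gap between their neighbors."""
    k = len(front)
    dist = [0.0] * k
    for m in range(2):  # each of the two objectives
        order = sorted(range(k), key=lambda i: front[i][m])
        fmin, fmax = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if fmax == fmin:
            continue
        for j in range(1, k - 1):
            dist[order[j]] += (front[order[j + 1]][m]
                               - front[order[j - 1]][m]) / (fmax - fmin)
    return dist
```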
More interesting is the question of how evenly the population is distributed on the Pareto front once the two extremal points are found. To this end, we display in Figure 5 the function values of the populations after a moderate runtime in a run of the NSGA-II. In all eight datasets, the complete Pareto front was not found (as expected). However, the plots also show that in all cases, the front is well approximated by the population. Also, we note that the population contains only individuals on the Pareto front (which is trivially satisfied for OneMinMax, but not for LeadingOnesTrailingZeroes). We note that the data from the two individual runs displayed in the figure is representative: in all runs, we never encountered an interval of uncovered points of length longer than $6$ for OneMinMax and $4$ for LeadingOnesTrailingZeroes, respectively.
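The evenness statistic just reported (longest run of uncovered Pareto front points) can be computed as sketched below (our own illustration), using the fact that the front is $\{(i,n-i)\mid i\in[0..n]\}$ and hence identified by the $f_1$-values $i$:

```python
def longest_uncovered_gap(covered_f1_values, n):
    """Length of the longest interval of consecutive Pareto front
    points (i, n-i) whose f1-value i is missing from the population.
    Sentinels -1 and n+1 make the boundary gaps count correctly."""
    covered = sorted(set(covered_f1_values) | {-1, n + 1})
    return max(b - a - 1 for a, b in zip(covered, covered[1:]))
```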
(a)
(b)
Figure 4: First generation when both extreme function values $(0,n)$ and $(n,0)$ were contained in the population of the NSGA-II (binary tournament selection, standard bit-wise mutation, population size $N=n+1$ ) for OneMinMax (4a) and LeadingOnesTrailingZeroes (4b).
(a)
(b)
Figure 5: The function values of the population $P_{t}$ for $t=3000$ when optimizing OneMinMax (5a) and for $t=5000$ when optimizing LeadingOnesTrailingZeroes (5b) via the NSGA-II (binary tournament selection, standard bit-wise mutation, population size $N=n+1$) in one typical run. Both plots show that this population size does not suffice to completely cover the Pareto front, but it suffices to approximate the front very well. Different colors are for different problem sizes $n$, with $n\in\{100,200,300,400\}$ for OneMinMax and $n\in\{30,60,90,120\}$ for LeadingOnesTrailingZeroes. Also note that the Pareto front is $\{(i,n-i)\mid i\in[0..n]\}$.
7 Conclusion
In this work, we conducted the first mathematical runtime analysis of the NSGA-II, the predominant algorithm framework in real-world multi-objective optimization. We proved that with a suitable population size, all variants of the NSGA-II regarded in this work satisfy the same asymptotic runtime guarantees as the much simpler SEMO, GSEMO, and $(\mu+1)$-SIBEA algorithms analyzed previously, when optimizing the two benchmarks OneMinMax and LeadingOnesTrailingZeroes. The choice of the population size is important: we proved a runtime at least exponential in $n$ when the population size equals the size of the Pareto front.
On the technical side, this paper shows that mathematical runtime analyses are feasible also for the NSGA-II. We provided a number of arguments to cope with the challenges imposed by this algorithm, in particular, the fact that points in the Pareto front can be lost and the parent selection via binary tournaments based on the rank and crowding distance. We are optimistic that these tools will aid future analyses of the NSGA-II (and in fact, they have already been used several times in subsequent work, see the discussion in the introduction).
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 62306086), the Science, Technology and Innovation Commission of Shenzhen Municipality (Grant No. GXWD20220818191018001), and the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2019A1515110177).
This work was also supported by a public grant as part of the Investissement d’avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH.
References
- [AD11] Anne Auger and Benjamin Doerr, editors. Theory of Randomized Search Heuristics. World Scientific Publishing, 2011.
- [BFN08] Dimo Brockhoff, Tobias Friedrich, and Frank Neumann. Analyzing hypervolume indicator based algorithms. In Parallel Problem Solving from Nature, PPSN 2008, pages 651–660. Springer, 2008.
- [BFQY20] Chao Bian, Chao Feng, Chao Qian, and Yang Yu. An efficient evolutionary algorithm for subset selection with general cost constraints. In Conference on Artificial Intelligence, AAAI 2020, pages 3267–3274. AAAI Press, 2020.
- [BNE07] Nicola Beume, Boris Naujoks, and Michael Emmerich. SMS-EMOA: Multiobjective selection based on dominated hypervolume. European Journal of Operational Research, 181:1653–1669, 2007.
- [BQ22] Chao Bian and Chao Qian. Better running time of the non-dominated sorting genetic algorithm II (NSGA-II) by using stochastic tournament selection. In Parallel Problem Solving From Nature, PPSN 2022, pages 428–441. Springer, 2022.
- [BQT18] Chao Bian, Chao Qian, and Ke Tang. A general approach to running time analysis of multi-objective evolutionary algorithms. In International Joint Conference on Artificial Intelligence, IJCAI 2018, pages 1405–1411. IJCAI, 2018.
- [BZLQ23] Chao Bian, Yawen Zhou, Miqing Li, and Chao Qian. Stochastic population update can provably be helpful in multi-objective evolutionary algorithms. In International Joint Conference on Artificial Intelligence, IJCAI 2023, pages 5513–5521. ijcai.org, 2023.
- [CDH ${}^{+}$ 23] Sacha Cerf, Benjamin Doerr, Benjamin Hebras, Jakob Kahane, and Simon Wietheger. The first proven performance guarantees for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) on a combinatorial optimization problem. In International Joint Conference on Artificial Intelligence, IJCAI 2023, pages 5522–5530. ijcai.org, 2023.
- [COGNS20] Edgar Covantes Osuna, Wanru Gao, Frank Neumann, and Dirk Sudholt. Design and analysis of diversity-based parent selection schemes for speeding up evolutionary multi-objective optimisation. Theoretical Computer Science, 832:123–142, 2020.
- [Cra19] Victoria G. Crawford. An efficient evolutionary algorithm for minimum cost submodular cover. In International Joint Conference on Artificial Intelligence, IJCAI 2019, pages 1227–1233. ijcai.org, 2019.
- [Cra21] Victoria G. Crawford. Faster guarantees of evolutionary algorithms for maximization of monotone submodular functions. In International Joint Conference on Artificial Intelligence, IJCAI 2021, pages 1661–1667. ijcai.org, 2021.
- [DD18] Benjamin Doerr and Carola Doerr. Optimal static and self-adjusting parameter choices for the ${(1+(\lambda,\lambda))}$ genetic algorithm. Algorithmica, 80:1658–1709, 2018.
- [DDN ${}^{+}$ 20] Benjamin Doerr, Carola Doerr, Aneta Neumann, Frank Neumann, and Andrew M. Sutton. Optimization of chance-constrained submodular functions. In Conference on Artificial Intelligence, AAAI 2020, pages 1460–1467. AAAI Press, 2020.
- [Deb] Kalyanmoy Deb’s implementation of the NSGA-II. https://www.egr.msu.edu/~kdeb/codes.shtml.
- [DGN16] Benjamin Doerr, Wanru Gao, and Frank Neumann. Runtime analysis of evolutionary diversity maximization for OneMinMax. In Genetic and Evolutionary Computation Conference, GECCO 2016, pages 557–564. ACM, 2016.
- [DN20] Benjamin Doerr and Frank Neumann, editors. Theory of Evolutionary Computation—Recent Developments in Discrete Optimization. Springer, 2020. Also available at http://www.lix.polytechnique.fr/Labo/Benjamin.Doerr/doerr_neumann_book.html.
- [Doe19] Benjamin Doerr. Analyzing randomized search heuristics via stochastic domination. Theoretical Computer Science, 773:115–137, 2019.
- [Doe20] Benjamin Doerr. Probabilistic tools for the analysis of randomized optimization heuristics. In Benjamin Doerr and Frank Neumann, editors, Theory of Evolutionary Computation: Recent Developments in Discrete Optimization, pages 1–87. Springer, 2020. Also available at https://arxiv.org/abs/1801.06733.
- [DOSS23a] Duc-Cuong Dang, Andre Opris, Bahare Salehi, and Dirk Sudholt. Analysing the robustness of NSGA-II under noise. In Genetic and Evolutionary Computation Conference, GECCO 2023, pages 642–651. ACM, 2023.
- [DOSS23b] Duc-Cuong Dang, Andre Opris, Bahare Salehi, and Dirk Sudholt. A proof that using crossover can guarantee exponential speed-ups in evolutionary multi-objective optimisation. In Conference on Artificial Intelligence, AAAI 2023, pages 12390–12398. AAAI Press, 2023.
- [DPAM02] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6:182–197, 2002.
- [DQ23a] Benjamin Doerr and Zhongdi Qu. A first runtime analysis of the NSGA-II on a multimodal problem. Transactions on Evolutionary Computation, 2023. https://doi.org/10.1109/TEVC.2023.3250552.
- [DQ23b] Benjamin Doerr and Zhongdi Qu. From understanding the population dynamics of the NSGA-II to the first proven lower bounds. In Conference on Artificial Intelligence, AAAI 2023, pages 12408–12416. AAAI Press, 2023.
- [DQ23c] Benjamin Doerr and Zhongdi Qu. Runtime analysis for the NSGA-II: provable speed-ups from crossover. In Conference on Artificial Intelligence, AAAI 2023, pages 12399–12407. AAAI Press, 2023.
- [DZ21] Benjamin Doerr and Weijie Zheng. Theoretical analyses of multi-objective evolutionary algorithms on multi-modal objectives. In Conference on Artificial Intelligence, AAAI 2021, pages 12293–12301. AAAI Press, 2021.
- [FHH+10] Tobias Friedrich, Jun He, Nils Hebbinghaus, Frank Neumann, and Carsten Witt. Approximating covering problems by randomized search heuristics using multi-objective models. Evolutionary Computation, 18:617–633, 2010.
- [FN15] Tobias Friedrich and Frank Neumann. Maximizing submodular functions under matroid constraints by evolutionary algorithms. Evolutionary Computation, 23:543–558, 2015.
- [GD90] David E. Goldberg and Kalyanmoy Deb. A comparative analysis of selection schemes used in genetic algorithms. In Foundations of Genetic Algorithms, FOGA 1990, pages 69–93. Morgan Kaufmann, 1990.
- [Gie03] Oliver Giel. Expected runtimes of a simple multi-objective evolutionary algorithm. In Congress on Evolutionary Computation, CEC 2003, pages 1918–1925. IEEE, 2003.
- [GL10] Oliver Giel and Per Kristian Lehre. On the effect of populations in evolutionary multi-objective optimisation. Evolutionary Computation, 18:335–356, 2010.
- [HZ20] Zhengxin Huang and Yuren Zhou. Runtime analysis of somatic contiguous hypermutation operators in MOEA/D framework. In Conference on Artificial Intelligence, AAAI 2020, pages 2359–2366. AAAI Press, 2020.
- [HZCH19] Zhengxin Huang, Yuren Zhou, Zefeng Chen, and Xiaoyu He. Running time analysis of MOEA/D with crossover on discrete optimization problem. In Conference on Artificial Intelligence, AAAI 2019, pages 2296–2303. AAAI Press, 2019.
- [Jan13] Thomas Jansen. Analyzing Evolutionary Algorithms – The Computer Science Perspective. Springer, 2013.
- [KD06] Saku Kukkonen and Kalyanmoy Deb. Improved pruning of non-dominated solutions based on crowding distance for bi-objective optimization problems. In Congress on Evolutionary Computation, CEC 2006, pages 1179–1186. IEEE, 2006.
- [LTZ+02] Marco Laumanns, Lothar Thiele, Eckart Zitzler, Emo Welzl, and Kalyanmoy Deb. Running time analysis of multi-objective evolutionary algorithms on a simple discrete optimization problem. In Parallel Problem Solving from Nature, PPSN 2002, pages 44–53. Springer, 2002.
- [LTZ04] Marco Laumanns, Lothar Thiele, and Eckart Zitzler. Running time analysis of multiobjective evolutionary algorithms on pseudo-Boolean functions. IEEE Transactions on Evolutionary Computation, 8:170–182, 2004.
- [LZZZ16] Yuan-Long Li, Yu-Ren Zhou, Zhi-Hui Zhan, and Jun Zhang. A primary theoretical study on decomposition-based multiobjective evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 20:563–576, 2016.
- [McD89] Colin McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, pages 48–118. Cambridge Univ. Press, 1989.
- [Neu07] Frank Neumann. Expected runtimes of a simple evolutionary algorithm for the multi-objective minimum spanning tree problem. European Journal of Operational Research, 181:1620–1629, 2007.
- [NRS11] Frank Neumann, Joachim Reichel, and Martin Skutella. Computing minimum cuts by randomized search heuristics. Algorithmica, 59:323–342, 2011.
- [NSN15] Anh Quang Nguyen, Andrew M. Sutton, and Frank Neumann. Population size matters: rigorous runtime results for maximizing the hypervolume indicator. Theoretical Computer Science, 561:24–36, 2015.
- [NT10] Frank Neumann and Madeleine Theile. How crossover speeds up evolutionary algorithms for the multi-criteria all-pairs-shortest-path problem. In Parallel Problem Solving from Nature, PPSN 2010, Part I, pages 667–676. Springer, 2010.
- [NW06] Frank Neumann and Carsten Witt. Runtime analysis of a simple ant colony optimization algorithm. In Algorithms and Computation, ISAAC 2006, pages 618–627. Springer, 2006.
- [NW10] Frank Neumann and Carsten Witt. Bioinspired Computation in Combinatorial Optimization – Algorithms and Their Computational Complexity. Springer, 2010.
- [NW22] Frank Neumann and Carsten Witt. Runtime analysis of single- and multi-objective evolutionary algorithms for chance constrained optimization problems with normally distributed random variables. In International Joint Conference on Artificial Intelligence, IJCAI 2022, pages 4800–4806. ijcai.org, 2022.
- [QBF20] Chao Qian, Chao Bian, and Chao Feng. Subset selection by Pareto optimization with recombination. In Conference on Artificial Intelligence, AAAI 2020, pages 2408–2415. AAAI Press, 2020.
- [QSYT17] Chao Qian, Jing-Cheng Shi, Yang Yu, and Ke Tang. On subset selection with general cost constraints. In International Joint Conference on Artificial Intelligence, IJCAI 2017, pages 2613–2619. ijcai.org, 2017.
- [QYT+19] Chao Qian, Yang Yu, Ke Tang, Xin Yao, and Zhi-Hua Zhou. Maximizing submodular or monotone approximately submodular functions by multi-objective evolutionary algorithms. Artificial Intelligence, 275:279–294, 2019.
- [QYZ13] Chao Qian, Yang Yu, and Zhi-Hua Zhou. An analysis on recombination in multi-objective evolutionary optimization. Artificial Intelligence, 204:99–119, 2013.
- [QYZ15] Chao Qian, Yang Yu, and Zhi-Hua Zhou. On constrained Boolean Pareto optimization. In International Joint Conference on Artificial Intelligence, IJCAI 2015, pages 389–395. AAAI Press, 2015.
- [RNNF19] Vahid Roostapour, Aneta Neumann, Frank Neumann, and Tobias Friedrich. Pareto optimization for subset selection with dynamic cost constraints. In Conference on Artificial Intelligence, AAAI 2019, pages 2354–2361. AAAI Press, 2019.
- [Rud98] Günter Rudolph. Evolutionary search for minimal elements in partially ordered finite sets. In Evolutionary Programming, EP 1998, pages 345–353. Springer, 1998.
- [WD23] Simon Wietheger and Benjamin Doerr. A mathematical runtime analysis of the Non-dominated Sorting Genetic Algorithm III (NSGA-III). In International Joint Conference on Artificial Intelligence, IJCAI 2023, pages 5657–5665. ijcai.org, 2023.
- [ZD22a] Weijie Zheng and Benjamin Doerr. Better approximation guarantees for the NSGA-II by using the current crowding distance. In Genetic and Evolutionary Computation Conference, GECCO 2022, pages 611–619. ACM, 2022.
- [ZD22b] Weijie Zheng and Benjamin Doerr. Runtime analysis for the NSGA-II: proving, quantifying, and explaining the inefficiency for three or more objectives. CoRR, abs/2211.13084, 2022.
- [Zhe] Weijie Zheng. Implementation of the NSGA-II in this paper. https://github.com/zhengwj13/NSGA_II_Clean.
- [ZLD22] Weijie Zheng, Yufei Liu, and Benjamin Doerr. A first mathematical runtime analysis of the Non-Dominated Sorting Genetic Algorithm II (NSGA-II). In Conference on Artificial Intelligence, AAAI 2022, pages 10408–10416. AAAI Press, 2022.
- [ZQL+11] Aimin Zhou, Bo-Yang Qu, Hui Li, Shi-Zheng Zhao, Ponnuthurai Nagaratnam Suganthan, and Qingfu Zhang. Multiobjective evolutionary algorithms: A survey of the state of the art. Swarm and Evolutionary Computation, 1:32–49, 2011.