2312.05607
# Closed-Loop Finite-Time Analysis of Suboptimal Online Control
**Authors**: Aren Karapetyan, Efe C. Balta, Andrea Iannelli, and John Lygeros
> This work has been supported by the Swiss National Science Foundation under NCCR Automation (grant agreement 51NF40_225155) and by the German Research Foundation (DFG) under Germany's Excellence Strategy - EXC 2075 – 390740016. A. Karapetyan and J. Lygeros are with the Automatic Control Laboratory, Swiss Federal Institute of Technology (ETH Zürich), 8092 Zürich, Switzerland (E-mails: {akarapetyan, lygeros}@control.ee.ethz.ch). E. C. Balta is with Inspire AG, 8005 Zürich, Switzerland (E-mail: efe.balta@inspire.ch). A. Iannelli is with the Institute for Systems Theory and Automatic Control, University of Stuttgart, Stuttgart 70569, Germany (E-mail: andrea.iannelli@ist.uni-stuttgart.de).
Abstract
Suboptimal methods in optimal control arise due to a limited computational budget, unknown system dynamics, or a short prediction window among other reasons. In this work, we study the transient closed-loop performance of such methods by providing finite-time suboptimality gap guarantees. We consider the control of discrete-time, nonlinear time-varying dynamical systems and establish sufficient conditions for such guarantees. These allow the control design to distribute a limited computational budget over a time horizon and estimate the on-the-go loss in performance due to suboptimality. We study exponential incremental input-to-state stabilizing policies and show that for nonlinear systems, under some mild conditions, this property is directly implied by exponential stability without further assumptions on global smoothness. The analysis is showcased on a suboptimal model predictive control use case.
©Published in IEEE Transactions on Automatic Control DOI: 10.1109/TAC.2025.3539988
©2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Index Terms: Nonlinear Systems, Optimization Algorithms, Predictive Control
1 Introduction
Optimal control aims to compute an input signal to drive a dynamical system to a given target state, while optimizing a performance cost subject to constraints. In the absence of uncertainty, the problem has been studied using calculus of variations [1] and dynamic programming [2]. However, in many practical applications with limited computational power, the problem becomes difficult or infeasible to solve due to the curse of dimensionality [2]. This is further exacerbated if there are unknown system and/or cost parameters. As a result, control designers rely on approximate or suboptimal methods [3] to solve the problem. If there are adequate computational resources and an accurate simulator of the true system, the problem can be solved up to an arbitrary accuracy using approximate dynamic programming [4] or reinforcement learning [5] techniques. When this is not the case, e.g. the system has unpredictable dynamics or the cost to be optimized for is changing adversarially, offline methods alone are not sufficient. In such cases, the policy is updated online or adaptively as more data becomes available. We consider a controller to be online, rather than offline or pre-fixed, if it is capable of adapting to changes in the control problem during its single trajectory execution. For example, a model predictive controller (MPC) can utilize a finite window of predictions for possibly time-varying dynamics, disturbances, references, or costs to generate an input at each step. While such a definition partially overlaps with that of adaptive control [6], it defines a wider class of controllers that also includes online optimization-based methods, like [7, 8].
Figure 1: Two separate closed-loop trajectories, generated by applying a suboptimal input signal $\boldsymbol{u}^{\mu}$ , and a benchmark input signal $\boldsymbol{u^{\star}}$ .
The focus of this work is the closed-loop suboptimality analysis of online controllers in the finite-time or transient domain. Given their real-time implementation, online methods need to stay computationally efficient while stabilizing the system. Additionally, their performance is measured in terms of the accumulated cost that needs to be kept to a minimum. To quantify this, we fix a benchmark policy that we may deem to be close to the desired optimal one, visualized in Figure 1, and study the suboptimality gap of the given online algorithm in terms of the additional incurred cost due to its suboptimality with respect to the benchmark. Such an analysis provides a relative measure on the performance of the given algorithm, since, in general, the benchmark policy attains a non-zero cost. In this context, we pose the following questions.
1. How does the transient cost performance of an online algorithm scale with a measure of its suboptimality?
1. How should the benchmark policy be chosen to achieve meaningful finite-time bounds?
We consider nonlinear time-varying systems and derive conditions on the suboptimal and benchmark policies such that the suboptimality gap, defined as the difference of their respective closed-loop costs, can be quantified. We show that it scales with the pathlength of the suboptimal trajectory and the rate of convergence of the suboptimal policy to the benchmark. The former can be computed online providing an on-the-go estimate of the suboptimality, and the latter usually depends on suboptimal algorithm parameters, allowing the control designer to tune these accordingly. We study the suboptimal, projected gradient method (PGM)-based linear-quadratic MPC [9] as an example of online control satisfying the proposed assumptions, and bound its suboptimality gap with respect to optimal MPC. Our contributions are summarized below:
1. We show that if the system dynamics in closed-loop with the benchmark policy are exponentially incrementally input-to-state stable (E- $\delta$ -ISS), and the suboptimal policy is linearly converging, then the suboptimality gap scales with the pathlength of the closed-loop suboptimal trajectory, and the convergence rate.
1. We derive sufficient conditions under which exponential stability (ES) of a non-smooth nonlinear time-varying system implies E- $\delta$ -ISS, making the condition on the benchmark policy easier to verify.
1. We study the suboptimal, PGM-based linear-quadratic MPC problem as an example satisfying these assumptions, and bound its suboptimality gap.
The E- $\delta$ -ISS assumption on the benchmark policy is crucial for the validity of the results. Incremental input-to-state stability is studied in [10] in the continuous-time setting and in [11] in the discrete-time setting, providing a condition on the deviation of two separate trajectories of the same system. If the dynamics are smooth, a sufficient condition for E- $\delta$ -ISS to hold is that of contraction [12, 13, 14, 15]. However, this is often not the case when one considers closed-loop dynamics under optimal controllers, e.g. under a constrained MPC [16] policy. Since we are often interested in such benchmarks, we devote Section 4 to the derivation of sufficient conditions for E- $\delta$ -ISS to hold when the dynamics are non-smooth in general.
Such an incremental stability analysis allows for the derivation of asymptotically tight bounds in the sense that they scale with the level of suboptimality, or the rate of convergence of the suboptimal policy to the optimal, converging to zero when the algorithm matches the benchmark. The bounds also scale with the pathlength of the suboptimal trajectory, allowing an on-the-go calculation of the suboptimality gap that is independent of the benchmark states. Hence, for the suboptimal MPC example we achieve asymptotically tighter bounds under the same assumptions compared to [17]. Moreover, our result is independent of the asymptotic properties of the suboptimal algorithm, providing finite-time performance bounds even when the closed-loop is not exponentially stable.
Several notable examples of settings where such finite-time suboptimality analysis can be of use include adaptive control [18, 19, 20], with suboptimality due to unknown system parameters, online feedback optimization [21, 22] and online control [7, 8], with suboptimality due to unknown future costs, or real-time optimization-based control, such as real-time MPC [23, 24, 25], suboptimal due to finite computational resources.
While our analysis and results hold for any algorithm satisfying the outlined assumptions, optimization-based methods, including MPC, and their suboptimal variants are of particular interest given their wide applicability in practice. Suboptimal MPC has been studied extensively in the literature [26, 27, 28, 23, 25, 24, 29, 30]. Real-time MPC with iterative optimization methods has been studied in [26, 27, 28, 23, 25, 24]. Efficient warm-start methods are explored in [28, 27] with [27] also showing a lower bound on the number of gradient descent steps needed to achieve an open-loop suboptimality level, but no closed-loop analysis is performed. The closed-loop stability is analyzed in [26] using a robust MPC formulation, and also in later works in [25, 24] that extend the real-time MPC work of [23] and show asymptotic stability using Lyapunov methods and the small gain theorem of interconnected systems. Other sources of suboptimality for MPC have been studied in [29], analyzing the effect of the prediction horizon on the suboptimality of the MPC closed-loop cost, or in [30] in the context of distributed control, to name a few. However, in this line of work the suboptimality gap as we define here is not considered. This is a common performance metric in online learning, where it is referred to as regret. While there are several works [7, 31, 32] studying the regret of various model predictive controllers, their settings are different from ours. For instance, [31] considers Bayesian MPC, [7] studies the effect of predictions for linear time-invariant systems, and [32] studies optimal controllers with inexact predictions using perturbation analysis, thus leading to settings that are not considered under the assumptions in this paper. The earlier work [17] also analyzes the suboptimality of PGM-based MPC but does not utilize nonlinear incremental stability analysis achieving a weaker bound that is not asymptotically tight.
The article is structured as follows. In Section 2 we provide the preliminaries and the problem setup. In Section 3, we conduct the suboptimality gap analysis. Sufficient conditions for E- $\delta$ -ISS are derived in Section 4, and in Section 5, the suboptimal PGM-based linear MPC use case is studied with a numerical example.
Notation: The sets of positive real numbers, positive integers, and non-negative integers are denoted by $\mathbb{R}_{+}$ , $\mathbb{N}_{+}$ and $\mathbb{N}$ , respectively. For some $k_{0}∈\mathbb{N}$ , the set of integers greater than or equal to $k_{0}$ is denoted by $\mathbb{N}_{≥ k_{0}}$ . For a given vector $x$ , its Euclidean norm is denoted by $\|x\|$ , and the two-norm weighted by a $Q\succ 0$ by $\|x\|_{Q}=\sqrt{x^{\top}Qx}$ . For a square matrix $W$ , the spectral radius and the spectral norm are denoted by $\rho(W)$ and $\|W\|$ , respectively. Given a symmetric, positive definite matrix $M\succ 0$ , we define its unique square root $M^{\frac{1}{2}}$ , such that $M=M^{\frac{1}{2}}M^{\frac{1}{2}}$ and $M^{\frac{1}{2}}\succ 0$ . For a matrix $W$ and $M\succ 0$ , $\lambda_{M}^{-}(W)$ and $\lambda_{M}^{+}(W)$ denote the minimum and maximum eigenvalues of ${M}^{-\frac{1}{2}}W{M}^{-\frac{1}{2}}$ ; for any vector $x$ , they satisfy $\lambda_{M}^{-}(W)\|x\|_{M}^{2}≤\|x\|_{W}^{2}≤\lambda_{M}^{+}(W)\|x\|_{M}^{2}$ . The Euclidean point-to-set distance of a vector $x$ from a nonempty, closed, convex set $\mathcal{A}$ is denoted by $|x|_{\mathcal{A}}:=\min_{y∈\mathcal{A}}\|x-y\|$ , and the projection onto it by $\Pi_{\mathcal{A}}[x]=\operatorname*{arg\,min}_{y∈\mathcal{A}}\|x-y\|$ . The $n$ -dimensional closed ball of radius $r>0$ , centered at the origin, is denoted by $\mathcal{B}(r):=\{x∈\mathbb{R}^{n}\ |\ \|x\|≤ r\}$ .
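As a quick numerical sanity check of the weighted-norm inequality above, the following sketch (with arbitrary, hypothetical matrices $M\succ 0$ and $W\succeq 0$ ) verifies $\lambda_{M}^{-}(W)\|x\|_{M}^{2}≤\|x\|_{W}^{2}≤\lambda_{M}^{+}(W)\|x\|_{M}^{2}$ :

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Hypothetical symmetric positive definite M and positive semidefinite W.
A = rng.standard_normal((n, n))
M = A @ A.T + n * np.eye(n)
B = rng.standard_normal((n, n))
W = B @ B.T

# Unique SPD square root M^{1/2} via eigendecomposition, then M^{-1/2}.
evals, evecs = np.linalg.eigh(M)
M_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T

# lambda_M^-(W), lambda_M^+(W): extreme eigenvalues of M^{-1/2} W M^{-1/2}.
lam = np.linalg.eigvalsh(M_inv_sqrt @ W @ M_inv_sqrt)
lam_minus, lam_plus = lam.min(), lam.max()

x = rng.standard_normal(n)
norm_M_sq = x @ M @ x  # ||x||_M^2
norm_W_sq = x @ W @ x  # ||x||_W^2

assert lam_minus * norm_M_sq <= norm_W_sq + 1e-9
assert norm_W_sq <= lam_plus * norm_M_sq + 1e-9
```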
2 Preliminaries and Problem Setup
We consider the optimal control problem for discrete-time, nonlinear time-varying systems of the form
$$
x_{k+1}=f_{0}(k,x_{k})+g(k,x_{k},u_{k}),\quad k\in\mathbb{N}_{\geq k_{0}}, \tag{1}
$$
where $x_{k}∈\mathbb{R}^{n}$ and $u_{k}∈\mathbb{R}^{m}$ denote the state and control input at time $k$ , respectively, $f_{0}:\mathbb{N}_{≥ k_{0}}×\mathbb{R}^{n}→\mathbb{R}^{n}$ denotes the unforced nominal dynamics and $g:\mathbb{N}_{≥ k_{0}}×\mathbb{R}^{n}×\mathbb{R}^{m}→\mathbb{R}^{n}$ the controlled dynamics. The system evolution starts from some initial state $x_{k_{0}}∈\mathbb{R}^{n}$ at some initial time $k_{0}∈\mathbb{N}$ . The optimal control objective is to find the sequence of control inputs $\boldsymbol{u}=[u_{k_{0}}^{\top}\ \ldots\ u_{T-1}^{\top}]^{\top}$ that minimizes the finite-time cost
$$
J_{T}(x_{k_{0}},\boldsymbol{u})=F\left(T,x_{T}\right)+\sum_{k=k_{0}}^{T-1}c(k,x_{k},u_{k}), \tag{2}
$$
where $c:\mathbb{N}_{≥ k_{0}}×\mathbb{R}^{n}×\mathbb{R}^{m}→\mathbb{R}$ is the stage cost at time $k$ , $F:\mathbb{N}_{≥ k_{0}}×\mathbb{R}^{n}→\mathbb{R}$ is the terminal cost, and $T∈\mathbb{N}_{≥{k_{0}+1}}$ is the control horizon. In addition, the control input has to satisfy $u_{k}∈\mathcal{U}$ for all $k$ , for some bounded $\mathcal{U}⊂\mathbb{R}^{m}$ .
An admissible policy $\pi(k,x):\mathbb{N}_{≥ k_{0}}×\mathbb{R}^{n}→\mathcal{U}$ maps the state at time $k$ to a control input, generating the control signal $\boldsymbol{u}^{\pi}=[u_{k_{0}}^{\pi\top}\ \ldots\ u_{T-1}^{\pi\top}]^{\top}$ and the associated trajectory $\boldsymbol{x}^{\pi}=[x_{k_{0}}^{\pi\top}\ \ldots\ x_{T}^{\pi\top}]^{\top}$ . With a slight abuse of notation, its associated cost is denoted by $J_{T}(x_{k_{0}},\pi)$ . We consider time-varying systems for generality, and note that the analysis holds directly for time-invariant systems and policies as a special case. We showcase this in Section 5.
We are interested in the relation of a policy $\mu$ , corresponding to a given suboptimal algorithm, with respect to another benchmark policy $\mu^{\star}$ that is equipped with desirable characteristics, e.g. optimality. The two policies are defined as follows.
Benchmark dynamics: Consider a benchmark policy $\mu^{\star}:\mathbb{N}_{≥ k_{0}}×\mathbb{R}^{n}→\mathcal{U}$ . For readability, we place the time $k$ in the subscripts of $\mu$ , $c$ , and $F$ . Given an initial state $x_{k_{0}}^{\star}∈\mathbb{R}^{n}$ , the benchmark dynamics are given by
$$
x^{\star}_{k+1}=f_{0}(k,x_{k}^{\star})+g(k,x_{k}^{\star},\mu_{k}^{\star}(x_{k}^{\star})):=f(k,x^{\star}_{k}), \tag{3}
$$
for all $k∈\mathbb{N}_{≥ k_{0}}$ . We assume that there exists a set $\mathcal{D}^{\star}\subseteq\mathbb{R}^{n}$ that is forward invariant under the closed-loop dynamics (3), and restrict attention to $x_{k_{0}}^{\star}∈\mathcal{D}^{\star}$ . Hence, $x_{k}^{\star}∈\mathcal{D}^{\star}$ for all $k∈\mathbb{N}_{≥ k_{0}}$ .
Suboptimal dynamics: The suboptimal state evolution for a given policy $\mu:\mathbb{N}_{≥ k_{0}}×\mathbb{R}^{n}→\mathcal{U}$ can be represented in the following form, for any $x_{k_{0}}∈\mathbb{R}^{n}$ (we drop the explicit reference to $\mu$ from the superscript of $x$ for readability),
$$
\begin{split}x_{k+1}=f(k,x_{k})+\underbrace{g(k,x_{k},\mu_{k}(x_{k}))-g(k,x_{k},\mu_{k}^{\star}(x_{k}))}_{:=w_{k}(x_{k})},\end{split} \tag{4}
$$
for all $k∈\mathbb{N}_{≥ k_{0}}$ . The mapping $w:\mathbb{N}_{≥ k_{0}}×\mathbb{R}^{n}→\mathbb{R}^{n}$ can be thought of as a state-dependent disturbance acting on the benchmark state dynamics (3), introduced due to suboptimality. It is assumed to be such that the closed-loop suboptimal dynamics (4) evolve within a set $\mathcal{D}^{\mu}\subseteq\mathcal{D}^{\star}$ that is forward invariant under (4). Restricting attention to initial states $x_{k_{0}}∈\mathcal{D}^{\mu}$ , it then holds that $x_{k}∈\mathcal{D}^{\mu}$ for all $k∈\mathbb{N}_{≥ k_{0}}$ .
Figure 1 shows the pictorial evolution of the two considered trajectories starting from the same initial state $x_{0}$ at time $k_{0}=0$ . For each $x^{\star}_{k}$ , $u^{\star}_{k}$ denotes the control input generated by $\mu_{k}^{\star}(x_{k}^{\star})$ and for each $x_{k}$ , $u_{k}^{\mu}:=\mu_{k}(x_{k})$ the input generated by the suboptimal policy.
To quantify the relation between $\mu$ and $\mu^{\star}$ , we define the suboptimality gap of the policy $\mu$ as the additional incurred cost compared to the benchmark
$$
\mathcal{R}_{T}^{\mu}(x_{k_{0}}):=J_{T}(x_{k_{0}},{\mu})-J_{T}(x_{k_{0}},\mu^{\star}), \tag{5}
$$
given some $x_{k_{0}}∈\mathcal{D}^{\mu}$ . An informative finite-time bound on it, which depends on fundamental quantities of the online control problem, can provide a quantifiable tradeoff between the effort needed to compute the suboptimal policy $\mu$ and the additional cost incurred by using it instead of $\mu^{\star}$ .
We assume the benchmark policy $\mu^{\star}$ to have good performance, since otherwise, $\mathcal{R}_{T}^{\mu}$ can be uninformative. We characterize this performance in terms of E- $\delta$ -ISS.
**Definition 1**
*A nonlinear dynamical system $x_{k+1}=f(k,x_{k})$ is E- $\delta$ -ISS in some forward invariant $\mathcal{D}\subseteq\mathbb{R}^{n}$ , if there exist constants $c_{0},c_{w},r_{w}∈\mathbb{R}_{+}$ and $\rho∈(0,1)$ , such that for any $(x_{k_{0}},y_{k_{0}})∈\mathcal{D}×\mathcal{D}$ , and $w_{k}∈\mathcal{B}(r_{w}),k∈\mathbb{N}_{≥ k_{0}}$ , the perturbed dynamics $y_{k+1}=f(k,y_{k})+w_{k}$ satisfy
$$
\|x_{k}-y_{k}\|\leq c_{0}\rho^{{k-k_{0}}}\|x_{k_{0}}-y_{k_{0}}\|+c_{w}\sum_{i=k_{0}}^{k-1}\rho^{{k-i-1}}\|w_{i}\|,
$$
for all $k∈\mathbb{N}_{≥ k_{0}}$ , where the disturbances $w_{k}$ are such that $y_{k}∈\mathcal{D}$ for all $k∈\mathbb{N}_{≥ k_{0}}$ . If $\mathcal{D}=\mathbb{R}^{n}$ the system is called globally E- $\delta$ -ISS.*
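For intuition, a linear map $x_{k+1}=Ax_{k}$ with $\|A\|=\rho<1$ is globally E- $\delta$ -ISS with $c_{0}=c_{w}=1$ , since $x_{k}-y_{k}=A^{k-k_{0}}(x_{k_{0}}-y_{k_{0}})-\sum_{i=k_{0}}^{k-1}A^{k-i-1}w_{i}$ . A minimal numerical check of the bound in Definition 1, with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Contractive linear map: rho * (rotation) has spectral norm rho < 1,
# so E-delta-ISS holds with c0 = cw = 1 (hypothetical example system).
rho = 0.8
A = rho * np.array([[0.0, 1.0], [-1.0, 0.0]])

x = rng.standard_normal(2)
y = x + rng.standard_normal(2)           # perturbed copy, different initial state
d0 = np.linalg.norm(x - y)
ws = 0.1 * rng.standard_normal((30, 2))  # bounded disturbances w_k

for k in range(1, 31):
    x = A @ x
    y = A @ y + ws[k - 1]
    bound = rho ** k * d0 + sum(
        rho ** (k - i - 1) * np.linalg.norm(ws[i]) for i in range(k)
    )
    assert np.linalg.norm(x - y) <= bound + 1e-9
```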
E- $\delta$ -ISS for continuous-time systems has been introduced in [10]. For an in-depth discussion and analysis of incremental stability in discrete-time, and its relation to contraction [15] and convergent dynamics [33], we refer the interested reader to [11], and the references therein. We assume the following for the benchmark policy.
**Assumption 1**
*(Benchmark Policy). Given the closed-loop system (3), the benchmark policy $\mu^{\star}$ is such that
1. it is uniformly $L-$ Lipschitz continuous in $\mathcal{D}^{\star}$ , i.e. there exists a constant $L∈\mathbb{R}_{+}$ , such that for all $(x,y)∈\mathcal{D}^{\star}×\mathcal{D}^{\star}$ , and $k∈\mathbb{N}_{≥ k_{0}}$
$$
\|\mu^{\star}_{k}(x)-\mu^{\star}_{k}(y)\|\leq L\|x-y\|,
$$
1. there exist $a_{k}≥ 0,k∈\mathbb{N}_{≥ k_{0}}$ , such that for all $x∈\mathbb{\mathcal{D}^{\star}}$ and $k∈\mathbb{N}_{≥ k_{0}}$
$$
\|\mu_{k+1}^{\star}(x)-\mu_{k}^{\star}(x)\|\leq a_{k},
$$
1. the closed-loop dynamics (3) are E- $\delta$ -ISS in $\mathcal{D}^{\star}$ with a rate $\rho∈(0,1)$ .*
Although the Lipschitz continuity condition excludes benchmark policies with abrupt changes, such as discontinuities or jumps in the policy, it still holds for a relevant class of policies. In particular, when $\mu^{\star}$ is defined as a solution to an optimization problem, as is, for example, the MPC policy, then Lipschitz continuity holds [34] if the optimal solution mapping is strongly regular [35]. This can be verified for a range of problems by checking for strong second-order sufficient conditions (SOSC), which are well studied in the literature, e.g. [36, 25]. Additionally, when state constraints are present, these can be relaxed by introducing soft constraints [37, 38], making the SOSC conditions easier to verify. Furthermore, we stress that the benchmark policy is not necessarily the optimal policy but any comparator policy as long as the conditions in Assumption 1 are satisfied. The second condition limits how fast the benchmark policy changes given the same state between two timesteps. Since $\mathcal{U}$ is bounded, such an $a_{k}$ always exists, for all $k$ , and can be set equal to the diameter of $\mathcal{U}$ . However, it can also encode additional information, such as stationarity of the benchmark policy, in which case $a_{k}=0$ for all $k$ . The E- $\delta$ -ISS condition, on the other hand, is often harder to verify, unless one assumes the existence of an incremental Lyapunov equation [10, 11], or uses contraction to infer incremental stability [13], which in turn assumes a smooth policy. Since a large class of benchmark policies, including MPC, are non-smooth globally, we show in Section 4 that, under further mild conditions on $f$ , ES is enough to guarantee E- $\delta$ -ISS.
Next, we impose a linear convergence condition on the suboptimal policy $\mu$ .
**Assumption 2**
*(Suboptimal Policy). Let $u_{k}^{\mu}=\mu_{k}(x_{k})$ denote the suboptimal input evaluated on the suboptimal trajectory (4). There exist $\eta_{k}∈[0,1),k∈\mathbb{N}_{≥ k_{0}}$ and some $u^{\mu}_{k_{0}-1}=\nu∈\mathcal{U}$ , such that for all $x_{k_{0}}∈\mathcal{D}^{\mu}$ , $k∈\mathbb{N}_{≥ k_{0}}$
$$
\|u^{\mu}_{k}-\mu^{\star}_{k}(x_{k})\|\leq\eta_{k}\|u^{\mu}_{k-1}-\mu^{\star}_%
{k}(x_{k})\|. \tag{6}
$$*
The assumption imposes at least a linear rate of convergence on the suboptimal input with respect to the benchmark given the suboptimal state. In some cases, the rate $\eta_{k}$ can be thought of as a design parameter that can be tuned to control the desired level of suboptimality depending on the available computational budget. Notably, such rates appear in iterative optimization algorithms, such as PGM [39] or alternating direction method of multipliers (ADMM) [40], where the benchmark input is optimal with respect to some objective function, and the suboptimal input is obtained by the corresponding optimization method. When such methods are applied to optimization-based control, warm-started with the previous input, the exact linear convergence form of (6) appears in [24, 25, 9] for PGM and in [41] for ADMM. In Section 5, we consider the PGM-based MPC setting of [9] and show how both assumptions are verified.
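To see how a linear rate as in (6) arises for PGM, consider a projected gradient step on a strongly convex quadratic over a box: since the projection is nonexpansive and the minimizer is a fixed point of the projected step, $\|u^{+}-u^{\star}\|≤\|(I-\alpha H)(u-u^{\star})\|≤\eta\|u-u^{\star}\|$ with $\eta=\max_{i}|1-\alpha\lambda_{i}(H)|$ . A sketch with hypothetical problem data (not the MPC problem of Section 5):

```python
import numpy as np

# Strongly convex quadratic 0.5 u'Hu + q'u over a box U (hypothetical data);
# u_star plays the role of the benchmark input, one PGM step the suboptimal one.
H = np.array([[2.0, 0.5], [0.5, 1.0]])
q = np.array([1.0, -2.0])
proj = lambda u: np.clip(u, -0.5, 0.5)  # projection onto the box U

lam = np.linalg.eigvalsh(H)
alpha = 2.0 / (lam[0] + lam[-1])        # classical PGM stepsize choice
eta = max(abs(1 - alpha * lam[0]), abs(1 - alpha * lam[-1]))

# The constrained minimizer is the fixed point of the projected gradient step.
u_star = np.zeros(2)
for _ in range(10_000):
    u_star = proj(u_star - alpha * (H @ u_star + q))

# One warm-started PGM step contracts toward u_star at rate at most eta.
u = np.array([0.4, 0.4])
u_plus = proj(u - alpha * (H @ u + q))
assert np.linalg.norm(u_plus - u_star) <= eta * np.linalg.norm(u - u_star) + 1e-9
```

The rate $\eta$ here is tuned by the stepsize (and by the number of inner iterations, if more than one step is taken), matching the role of $\eta_{k}$ as a design parameter.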
We restrict our attention to systems where the controlled dynamics $g$ are Lipschitz continuous with respect to $u$ , uniformly in $x$ and $k$ .
**Assumption 3**
*There exists a constant $L_{u}∈\mathbb{R}_{+}$ , such that for any $(u,v)∈\mathbb{R}^{m}×\mathbb{R}^{m}$ , for all $x∈\mathcal{D}^{\mu}$ and $k∈\mathbb{N}_{≥ k_{0}}$
$$
\|g(k,x,u)-g(k,x,v)\|\leq L_{u}\|u-v\|.
$$*
This is satisfied, for instance, in linear time-invariant systems or control-affine nonlinear systems of the form $x_{k+1}=f(x_{k})+g(x_{k})u_{k}$ , often studied in the context of feedback linearization (see [42, 43] for details). Finally, we restrict our analysis to locally Lipschitz continuous stage costs.
**Assumption 4**
*There exist constants $M_{x},M_{u}∈\mathbb{R}_{+}$ , such that for all $(x,y)∈\mathcal{D}^{\mu}×\mathcal{D}^{\mu}$ , $(u,z)∈\mathcal{U}×\mathcal{U}$ and $k∈\mathbb{N}_{≥ k_{0}}$
$$
\|c_{k}(x,u)-c_{k}(y,z)\|\leq M_{x}\|x-y\|+M_{u}\|u-z\|.
$$*
3 Suboptimality Gap Analysis
In this section, we analyze the suboptimality gap for a given policy and show that $\mathcal{R}_{T}$ scales with the product of the pathlength of the suboptimal dynamics and a vector dependent on the convergence rates. In the analysis we take $k_{0}=0$ without loss of generality. We define the backward difference path vector, ${\Delta}∈\mathbb{R}^{T-1}$ , to be
$$
{\Delta}:=\begin{bmatrix}\|\delta x_{1}\|&\|\delta x_{2}\|&\ldots&\|\delta x_{T-1}\|\end{bmatrix}^{\top},
$$
where $\delta x_{k}=x_{k}-x_{k-1},\;k∈\mathbb{N}_{+}$ , and $x_{k}$ is the state at time $k$ for the suboptimal dynamics (4). The pathlength of the suboptimal trajectory is then defined as $\mathcal{S}_{T}:=\|{\Delta}\|_{1}$ and the Euclidean pathlength as $\mathcal{S}_{T,2}:=\|{\Delta}\|$ .
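Both pathlengths are directly computable from a recorded closed-loop trajectory; a small sketch on a hypothetical trajectory with $T=3$ :

```python
import numpy as np

# Hypothetical suboptimal trajectory x_0, ..., x_T with T = 3.
xs = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])

# Delta collects ||delta x_k|| for k = 1, ..., T-1 (the last step is excluded).
delta = np.linalg.norm(np.diff(xs, axis=0), axis=1)[:-1]

S_T = delta.sum()             # pathlength      S_T     = ||Delta||_1
S_T2 = np.linalg.norm(delta)  # Euclidean pathlength S_{T,2} = ||Delta||

assert np.isclose(S_T, 2.0)
assert np.isclose(S_T2, np.sqrt(2.0))
```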
The policy convergence rate vector, ${\tilde{\eta}}∈\mathbb{R}^{T-1}$ is defined as
$$
{\tilde{\eta}}:=\begin{bmatrix}\tilde{\eta}_{1}&\tilde{\eta}_{2}&\ldots&\tilde{\eta}_{T-1}\end{bmatrix}^{\top},
$$
where
$$
\tilde{\eta}_{k}:=\sum_{i=k}^{T-1}\prod_{j=k}^{i}\eta_{j},\qquad\forall k\in[0,T-1].
$$
Note that $\tilde{\eta}_{k}=\mathcal{O}(\eta_{k})$ and provides a weighting on the influence of $\delta x_{k}$ on $\mathcal{R}_{T}$ . This is analyzed further in Section 3.2. We denote the Euclidean norm of the rate vector by $\bar{\eta}:=\|{\tilde{\eta}}\|$ . The rate of change of the benchmark input $\mu^{\star}(x^{\star})$ is captured by the vector $a∈\mathbb{R}_{+}^{T-1}$ , defined as
$$
a:=\begin{bmatrix}a_{1}&a_{2}&\ldots&a_{T-1}\end{bmatrix}^{\top}.
$$
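For a constant rate $\eta_{k}\equiv\eta$ , the definition of $\tilde{\eta}_{k}$ reduces to the closed form $\tilde{\eta}_{k}=\eta(1-\eta^{T-k})/(1-\eta)$ , which makes the scaling $\tilde{\eta}_{k}=\mathcal{O}(\eta_{k})$ explicit. A direct numerical check of this (hypothetical constant-rate) case:

```python
import math

def eta_tilde(etas, k, T):
    """tilde{eta}_k = sum_{i=k}^{T-1} prod_{j=k}^{i} eta_j."""
    total = 0.0
    for i in range(k, T):
        prod = 1.0
        for j in range(k, i + 1):
            prod *= etas[j]
        total += prod
    return total

# Constant rate eta: the definition matches eta (1 - eta^(T-k)) / (1 - eta).
T, eta = 10, 0.5
etas = [eta] * T
for k in range(T):
    closed_form = eta * (1 - eta ** (T - k)) / (1 - eta)
    assert math.isclose(eta_tilde(etas, k, T), closed_form)
```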
3.1 Upper Bound
The bound in the following theorem captures the tradeoff between suboptimality and the additional cost in closed-loop.
**Theorem 1**
*Let Assumptions 1, 3 and 4 hold, then the suboptimality gap of any policy, $\mu$ , fulfilling Assumption 2 satisfies
$$
\mathcal{R}_{T}^{\mu}(x_{0})=\mathcal{O}\left(\left(a+{\Delta}\right)^{\top}\tilde{\eta}\right),
$$
for all $x_{0}∈\mathcal{D}^{\mu}$ . Specifically, it is bounded by
$$
\mathcal{R}_{T}^{\mu}(x_{0})\leq\bar{M}\left(\tilde{\eta}_{0}\|\delta u_{0}\|+\left(a+L{\Delta}\right)^{\top}{\tilde{\eta}}\right),
$$
for all $x_{0}∈\mathcal{D}^{\mu}$ , where $\delta u_{0}:=\nu-\mu_{0}^{\star}(x_{0})$ and $\bar{M}:=\left(M_{u}+\frac{c_{w}L_{u}\left(M_{u}L+M_{x}\right)}{1-\rho}\right)$ .*
The asymptotic variable in the $\mathcal{O}(·)$ notation is the time horizon length $T$ , with all terms in the product $\left(a+{\Delta}\right)^{\top}\tilde{\eta}$ scaling with $T$ , as $T→∞$ . The bound tends to zero as $\tilde{\eta}$ decreases. This is intuitive, as smaller $\tilde{\eta}$ suggests that the benchmark and suboptimal trajectories are closer to each other. Additionally, the suboptimality gap is a relative measure, but the bound is fully decoupled from the performance of $\mu^{\star}$ and only depends on the performance of the suboptimal state evolution if $a=\boldsymbol{0}$ . In this case, the bound can also be given as $\mathcal{R}_{T}(x_{0})=\mathcal{O}(\bar{\eta}\mathcal{S}_{T,2})$ , where $\mathcal{S}_{T,2}$ captures the transient behavior of the suboptimal system and is well-defined in the limit as $T→∞$ when, for example, (4) is exponentially stable.
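The qualitative behavior of the bound can be reproduced on a toy scalar LQ example (hypothetical data $a,b,q,r,\nu$ , not from the paper): with the terminal cost set to the Riccati solution $P$ , the stationary LQR gain $K$ is the exact finite-horizon benchmark optimum, and a suboptimal input satisfying (6) with equality yields a nonnegative gap that vanishes as $\eta→0$ :

```python
# Scalar LQ sketch: x+ = a x + b u, stage cost q x^2 + r u^2, terminal cost P x^2.
a, b, q, r = 0.9, 1.0, 1.0, 0.01
P = q
for _ in range(10_000):  # fixed-point iteration on the scalar Riccati equation
    P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
K = a * b * P / (r + b * b * P)  # benchmark policy mu*(x) = -K x

def gap(eta, T=30, x0=1.0, nu=1.0):
    """Suboptimality gap R_T for inputs satisfying (6) with equality."""
    def rollout(policy):
        x, J, u_prev = x0, 0.0, nu
        for _ in range(T):
            u = policy(x, u_prev)
            J += q * x * x + r * u * u
            x, u_prev = a * x + b * u, u
        return J + P * x * x  # Riccati terminal cost => LQR is exactly optimal
    J_star = rollout(lambda x, _: -K * x)
    J_sub = rollout(lambda x, u_prev: -K * x + eta * (u_prev + K * x))
    return J_sub - J_star

gaps = [gap(eta) for eta in (0.0, 0.1, 0.4, 0.8)]
assert abs(gaps[0]) < 1e-12          # eta = 0 recovers the benchmark exactly
assert all(g >= -1e-9 for g in gaps) # benchmark is finite-horizon optimal
assert gaps[1] < gaps[3]             # the gap shrinks with the rate eta
```

Here the suboptimal input is constructed so that (6) holds with equality, which makes the dependence of $\mathcal{R}_{T}^{\mu}$ on the tunable rate $\eta$ directly visible.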
Before we prove Theorem 1, we introduce several supporting lemmas, and the Cauchy product inequality, defined for two finite series $\{a_{i}\}_{i=0}^{T}$ and $\{b_{i}\}_{i=0}^{T}$ as follows
$$
\textstyle{\sum_{i=0}^{T}\left|\sum_{j=0}^{i}a_{j}b_{i-j}\right|\leq\left(\sum_{i=0}^{T}|a_{i}|\right)\left(\sum_{j=0}^{T}|b_{j}|\right)}. \tag{7}
$$
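A direct numerical check of (7) on arbitrary (hypothetical) finite sequences:

```python
# Finite Cauchy product inequality (7):
# sum_{i=0}^{T} |sum_{j=0}^{i} a_j b_{i-j}| <= (sum_i |a_i|)(sum_j |b_j|).
a = [0.5, -1.0, 2.0, 0.25]
b = [1.0, 0.5, -0.5, 3.0]
T = len(a) - 1

lhs = sum(abs(sum(a[j] * b[i - j] for j in range(i + 1))) for i in range(T + 1))
rhs = sum(abs(x) for x in a) * sum(abs(y) for y in b)
assert lhs <= rhs + 1e-12
```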
We let $u_{k}^{\mu}=\mu_{k}(x_{k})$ denote the suboptimal input as in Assumption 2.
**Lemma 1**
*Let Assumption 1 hold, then for any policy, $\mu$ , fulfilling Assumption 2, and for all $x_{0}∈\mathcal{D}^{\mu}$
$$
\begin{split}\sum_{k=0}^{T-1}\|d_{k}\|\leq\tilde{\eta}_{0}\|\delta u_{0}\|+\left(a+L{\Delta}\right)^{\top}{\tilde{\eta}},\end{split} \tag{8}
$$
where we define $d_{k}:=u_{k}^{\mu}-\mu_{k}^{\star}(x_{k}),$ and $\delta u_{0}:=\nu-\mu^{\star}_{0}(x_{0})$ .*
* Proof*
Consider the bound on the per-step error in the input
| | $\displaystyle\|d_{k}\|$ | $\displaystyle\stackrel{{\scriptstyle{(a)}}}{{≤}}\eta_{k}\|u_{k-1}^{\mu}-\mu^{\star}_{k}(x_{k})\|$ | |
| | | $\displaystyle\stackrel{{\scriptstyle{(b)}}}{{≤}}\eta_{k}\left(\|u_{k-1}^{\mu}-\mu^{\star}_{k-1}(x_{k-1})\|+\|\mu^{\star}_{k-1}(x_{k-1})-\mu^{\star}_{k}(x_{k})\|\right)$ | |
| | | $\displaystyle\stackrel{{\scriptstyle{(c)}}}{{≤}}\eta_{k}\left(\|d_{k-1}\|+\|\mu^{\star}_{k-1}(x_{k-1})-\mu^{\star}_{k}(x_{k-1})\|+\|\mu^{\star}_{k}(x_{k-1})-\mu^{\star}_{k}(x_{k})\|\right)$ | |
| | | $\displaystyle\stackrel{{\scriptstyle{(d)}}}{{≤}}\eta_{k}\left(\|d_{k-1}\|+a_{k-1}+L\|\delta x_{k}\|\right),$ | |
| --- | --- | --- | --- |
the inequality $(a)$ follows directly from Assumption 2, $(b)$ and $(c)$ follow from the triangle inequality for vector norms, and $(d)$ from the uniform Lipschitz condition in Assumption 1.i and from Assumption 1.ii. Applying the above inequality recursively leads to
| | $\displaystyle\|d_{k}\|$ | $\displaystyle≤\|d_{0}\|\prod_{i=1}^{k}\eta_{i}+\sum_{j=1}^{k}\left(a_{j}+L\|\delta x_{j}\|\right)\prod_{i=j}^{k}\eta_{i},$ | |
| --- | --- | --- | --- |
for all $k∈\mathbb{N}_{+}$ . Summing up for $k=0,\ldots,T-1$
$$
\begin{split}\sum_{k=0}^{T-1}\|d_{k}\|&\leq\|\delta u_{0}\|\sum_{k=0}^{T-1}\prod_{i=0}^{k}\eta_{i}+\sum_{k=1}^{T-1}\sum_{j=1}^{k}\left(a_{j}+L\|\delta x_{j}\|\right)\prod_{i=j}^{k}\eta_{i}\\
&=\tilde{\eta}_{0}\|\delta u_{0}\|+\left(a+L{\Delta}\right)^{\top}{\tilde{\eta}},\end{split}
$$
where the inequality follows from Assumption 2 and the equality from the definitions of $\tilde{\eta}$ , $a$ and $\Delta$ . ∎
The following lemma provides an upper bound on the norm of the finite-time trajectory mismatch.
**Lemma 2**
*Let Assumptions 1 and 3 hold, and consider any policy $\mu$ fulfilling Assumption 2. Then, there exist constants $c_{0},c_{W}∈\mathbb{R}_{+}$ , such that for all $(x_{0},x_{0}^{\star})∈(\mathcal{D}^{\mu}×\mathcal{D}^{\mu})$
$$
\sum_{k=0}^{T}\|x_{k}-x^{\star}_{k}\|\leq\|x_{0}-x_{0}^{\star}\|\left(\frac{c_{0}\left(1-\rho^{T+1}\right)}{1-\rho}\right)+c_{W}\left(\tilde{\eta}_{0}\|\delta u_{0}\|+\left(a+L\Delta\right)^{\top}\tilde{\eta}\right),
$$
where $x_{k}$ and $x_{k}^{\star}$ denote the states at time $k$ under, respectively, the suboptimal and benchmark policies.*
* Proof*
Given the boundedness of $\mathcal{U}$ , the uniform Lipschitz continuity of $g$ in $u$ , and recalling the definition of $w_{k}$ from (4), it follows that there exists a $r_{w}∈\mathbb{R}_{+}$ , such that $w_{k}(x_{k})∈\mathcal{B}(r_{w}),k∈\mathbb{N}$ . Consider then the dynamics (4) as a perturbed version of the benchmark dynamics (3). Assumption 1 ensures that there exist $c_{0},c_{w}∈\mathbb{R}_{+}$ and $\rho∈(0,1)$ , such that for all $k∈\mathbb{N}$ and $(x_{0},x_{0}^{\star})∈\mathcal{D}^{\mu}×\mathcal{D}^{\mu}$
$$
\begin{split}\|x_{k}-x_{k}^{\star}\|&\leq c_{0}\rho^{k}\|x_{0}-x_{0}^{\star}\|+c_{w}\sum_{i=0}^{k-1}\rho^{k-i-1}\|w_{i}(x_{i})\|\\
&\leq c_{0}\rho^{k}\|x_{0}-x_{0}^{\star}\|+c_{w}L_{g}\sum_{i=0}^{k-1}\rho^{k-i-1}\|d_{i}\|,\end{split}
$$
where the second inequality follows from Assumption 3 and the uniform Lipschitz continuity of $g$ in $u$ , with constant $L_{g}∈\mathbb{R}_{+}$ , and where $d_{k}:=u_{k}^{\mu}-\mu_{k}^{\star}(x_{k})$ . Summing up over the whole trajectory and noting the resultant finite geometric series
$$
\begin{split}\sum_{k=0}^{T}\|x_{k}-x^{\star}_{k}\|&\leq\|x_{0}-x_{0}^{\star}\|\sum_{k=0}^{T}c_{0}\rho^{k}+c_{w}L_{g}\sum_{k=1}^{T}\sum_{i=0}^{k-1}\rho^{k-i-1}\|d_{i}\|\\
&\leq\|x_{0}-x_{0}^{\star}\|\frac{c_{0}\left(1-\rho^{T+1}\right)}{1-\rho}+\frac{c_{w}L_{g}}{1-\rho}\sum_{k=0}^{T-1}\|d_{k}\|,\end{split}
$$
where the last inequality follows from (7), with $L_{g}$ the uniform Lipschitz constant of $g$ in $u$ . Finally, the result follows by using the bound (8) in Lemma 1 and setting $c_{W}:=c_{w}L_{g}/(1-\rho)$ . ∎
Similarly, the norm of the finite-time input signal mismatch due to the difference in applied inputs can be bounded in the following Lemma.
**Lemma 3**
*Let Assumptions 1 and 3 hold, and consider any policy $\mu$ fulfilling Assumption 2. Then, there exist constants $c_{0},c_{W}∈\mathbb{R}_{+}$ , such that for all $(x_{0},x_{0}^{\star})∈(\mathcal{D}^{\mu}×\mathcal{D}^{\mu})$
$$
\sum_{k=0}^{T-1}\|u_{k}^{\mu}-\mu_{k}^{\star}(x_{k}^{\star})\|\leq\|x_{0}-x_{0}^{\star}\|\frac{Lc_{0}\left(1-\rho^{T}\right)}{1-\rho}+\left(1+Lc_{W}\right)\left(\tilde{\eta}_{0}\|\delta u_{0}\|+\left(a+L\Delta\right)^{\top}\tilde{\eta}\right),
$$
where $x_{k}$ and $x_{k}^{\star}$ are the states at time $k$ under, respectively, the suboptimal and benchmark policies, $\mu$ and $\mu^{\star}$ .*
* Proof*
Using the triangle inequality for vector norms and defining $d_{k}:=u_{k}^{\mu}-\mu_{k}^{\star}(x_{k})$ ,
$$
\|u_{k}^{\mu}-\mu_{k}^{\star}(x_{k}^{\star})\|\leq\|d_{k}\|+\|\mu_{k}^{\star}(x_{k})-\mu_{k}^{\star}(x_{k}^{\star})\|\leq\|d_{k}\|+L\|x_{k}-x_{k}^{\star}\|,
$$
where the last inequality follows from the Lipschitz continuity of $\mu^{\star}$ in Assumption 1.i. Summing up over the trajectory horizon, using the bound (8) from Lemma 1 and the bound in Lemma 2 completes the proof. ∎
* Proof of Theorem1*
By Assumption 4, for all $(x_{0},x_{0}^{\star})∈\mathcal{D}^{\mu}×\mathcal{D}^{\mu}$
$$
\mathcal{R}_{T}(x_{0},x_{0}^{\star}):=J_{T}(x_{0},{\mu})-J_{T}(x_{0}^{\star},\mu^{\star})\leq M_{x}\sum_{k=0}^{T}\|x_{k}-x^{\star}_{k}\|+M_{u}\sum_{k=0}^{T-1}\|u_{k}^{\mu}-\mu_{k}^{\star}(x_{k}^{\star})\|.
$$
Then, using Lemmas 2 and 3 for the two respective sums,
$$
\mathcal{R}_{T}(x_{0},x_{0}^{\star})<\|x_{0}-x_{0}^{\star}\|\frac{c_{0}\left(M_{u}L+M_{x}\right)}{1-\rho}+\left(M_{u}+\left(M_{u}L+M_{x}\right)c_{W}\right)\left(\tilde{\eta}_{0}\|\delta u_{0}\|+\left(a+L\Delta\right)^{\top}\tilde{\eta}\right).
$$
The result follows by taking $x_{0}=x_{0}^{\star}$ . ∎
For the special case when the convergence rate of the suboptimal policy is constant, the following Corollary follows.
**Corollary 1**
*Let Assumptions 1, 3 and 4 hold, and consider a suboptimal policy, $\mu$ , fulfilling Assumption 2 with $\eta_{k}=\eta,\;k∈\mathbb{N}$ . Then, its suboptimality gap satisfies
$$
\mathcal{R}_{T}^{\mu}(x_{0})=\mathcal{O}\left(\frac{\eta}{1-\eta}\left(\mathcal{S}_{T}+\|a\|_{1}\right)\right),
$$
for all $x_{0}∈\mathcal{D}^{\mu}$ .*
* Proof*
If $\eta_{k}=\eta,\;k∈\mathbb{N}$ , then $\tilde{\eta}_{k}=\eta\left(1-\eta^{T-k}\right)/\left(1-\eta\right)$ , which is bounded by
$$
\tilde{\eta}_{k}\leq\frac{\eta}{1-\eta},\quad k\in[0,T].
$$
The complexity term then satisfies
$$
\left(a+{\Delta}\right)^{\top}{\tilde{\eta}}\leq\frac{\eta\left(\mathcal{S}_{T}+\|a\|_{1}\right)}{1-\eta}.
$$
The rest of the proof follows directly from Theorem 1 by replacing the complexity term with the new bound. ∎
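The closed-form expression for $\tilde{\eta}_{k}$ used in this proof, and the uniform bound $\eta/(1-\eta)$ , are easy to check numerically (a minimal sketch with an arbitrary choice of $\eta$ and $T$ ):

```python
def tilde_eta(eta, k, T):
    # sum_{j=k}^{T-1} prod_{i=k}^{j} eta, i.e. eta^1 + ... + eta^{T-k}
    return sum(eta ** (j - k + 1) for j in range(k, T))

eta, T = 0.6, 12
for k in range(T):
    closed_form = eta * (1 - eta ** (T - k)) / (1 - eta)
    assert abs(tilde_eta(eta, k, T) - closed_form) < 1e-12
    assert tilde_eta(eta, k, T) <= eta / (1 - eta) + 1e-12
```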
For bounded $\mathcal{S}_{T}$ , the corollary shows the effect of $\eta$ on the suboptimality gap. Crucially, the gap is $0$ in the absence of suboptimality. On the other hand, given a fixed $\eta$ , one can calculate an upper bound on the suboptimality gap of the policy $\mu$ by only considering its pathlength $\mathcal{S}_{T}$ . The independence from the pathlength of the benchmark policy is particularly useful for online suboptimality estimation, since the latter is generally not accessible to the control designer.
3.2 Interpretation of the Upper Bound
The term $\tilde{\eta}_{0}\|\delta u_{0}\|$ in the bound of Theorem 1 captures the error due to the initial mismatch in the control input, $\delta u_{0}$ . This term in general cannot be avoided, unless the initial “guess” of the input $\nu$ is correct, or $\eta_{0}=0$ , so that the suboptimal and benchmark policies match at the initial timestep.
The second term, $a^{\top}\tilde{\eta}$ , scales with the magnitude of the rate of change of the time-varying benchmark policy $\mu^{\star}$ , as defined in Assumption 1.ii. It vanishes either when $\mu^{\star}$ is stationary, or when the benchmark and suboptimal policies coincide.
(a) Exponentially stable case. Suboptimality is captured by (9).
(b) Case of limit cycles. Suboptimality is captured by (10).
Figure 2: The pictorial evolution of suboptimal and benchmark trajectories evolving in $\mathcal{D}^{\mu}\subseteq\mathcal{D}^{\star}$ .
The main complexity term of interest, $\Delta^{\top}\tilde{\eta}$ , captures the suboptimality of the policy through the inner product of the path vector ${\Delta}$ and the suboptimality vector ${\tilde{\eta}}$ . To study the interplay of these two quantities in more detail, let us consider the case when the benchmark dynamics (3) have an equilibrium at some $\bar{x}∈\mathbb{R}^{n}$ , and $k_{0}=0$ without loss of generality. If the suboptimal closed-loop system (4) is also exponentially stable with $\eta_{k}≠ 0,\;∀ k∈\mathbb{N}$ , then $\|\delta x_{j}\|≈ 0,\;∀ j≥\bar{j}$ , for some $\bar{j}∈\mathbb{N}$ , as visualised in Figure 2(a). Note that, in order to provide this interpretation, it is assumed that the systems (3) and (4) share the same single equilibrium point $\bar{x}$ . In such a setting, the complexity term captures the fact that the suboptimality gap is finite as follows
$$
{\Delta}^{\top}{\tilde{\eta}}=\Big[\;\|\delta x_{1}\|\;\ldots\;\underbrace{\|\delta x_{\bar{j}}\|\;\ldots\;\|\delta x_{T-1}\|}_{\approx\,\boldsymbol{0}}\;\Big]\Big[\;\tilde{\eta}_{1}\;\ldots\;\underbrace{\tilde{\eta}_{\bar{j}}\;\ldots\;\tilde{\eta}_{T-1}}_{\neq\,0}\;\Big]^{\top}. \tag{9}
$$
This example coincides with the suboptimal MPC use case discussed in detail in Section 5. When there are multiple equilibria, a similar interpretation of the bound holds. Among other possibilities, one can also consider the case when the benchmark dynamics (3) converge to a limit cycle. Since (3) is E- $\delta$ -ISS it follows from (4) that if at a given point in time $\bar{j}∈\mathbb{N}$ , the suboptimal policy coincides with the considered benchmark, i.e. $\eta_{j}=0,\;∀ j≥\bar{j}$ , then the trajectories will necessarily coincide, as visualized in Figure 2(b). This is captured by the complexity term as
$$
{\Delta}^{\top}{\tilde{\eta}}=\Big[\;\|\delta x_{1}\|\;\ldots\;\underbrace{\|\delta x_{\bar{j}}\|\;\ldots\;\|\delta x_{T-1}\|}_{\neq\,\boldsymbol{0}}\;\Big]\Big[\;\tilde{\eta}_{1}\;\ldots\;\underbrace{\tilde{\eta}_{\bar{j}}\;\ldots\;\tilde{\eta}_{T-1}}_{=\,\boldsymbol{0}}\;\Big]^{\top}. \tag{10}
$$
Even though the pathlength keeps increasing, the norm of the suboptimality vector is finite, resulting in a finite suboptimality gap, containing only the additional cost due to suboptimality at the first $\bar{j}$ timesteps.
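The two cases can also be mimicked numerically. In the sketch below (all sequences are hypothetical), the complexity term stays bounded as the horizon grows, either because the path vector decays (as in (9)) or because the suboptimality vector vanishes after $\bar{j}$ (as in (10)).

```python
def complexity_term(delta, eta):
    # inner product Delta^T tilde_eta over the entries j = 1, ..., T-1
    return sum(d * e for d, e in zip(delta, eta))

def case_9(T):
    # exponentially decaying ||delta x_j||, nonzero suboptimality vector
    return complexity_term([0.3 ** j for j in range(1, T)], [0.8] * (T - 1))

def case_10(T, j_bar=4):
    # constant ||delta x_j||, suboptimality vector zero after j_bar
    eta = [0.8 if j <= j_bar else 0.0 for j in range(1, T)]
    return complexity_term([0.5] * (T - 1), eta)

assert abs(case_9(100) - case_9(50)) < 1e-12   # bounded as T grows
assert case_10(100) == case_10(50)             # unaffected by the horizon tail
```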
4 Exponentially Stable Policies and E- $\delta$ -ISS
The analysis in the previous section is agnostic to the steady-state properties of (3) and, therefore, has general validity. In this section, we analyze a common case when the dynamics (3) have a unique equilibrium point at the origin. We derive sufficient conditions under which the exponential stability of globally non-smooth nonlinear dynamics implies E- $\delta$ -ISS, thus verifying the otherwise non-trivial Assumption 1.iii.
We treat (3) as a general nonlinear time-varying system of the form
$$
x_{k+1}=f(k,x_{k}),\quad k\in\mathbb{N}_{\geq k_{0}}, \tag{11}
$$
where $f:\mathbb{N}_{≥ k_{0}}×\mathcal{D}→\mathcal{D}$ is continuous with respect to both arguments, $x_{k_{0}}=\xi∈\mathbb{R}^{n}$ for some $k_{0}∈\mathbb{N}$ , and $\mathcal{D}⊂\mathbb{R}^{n}$ . Moreover, the set $\mathcal{D}$ is assumed to be compact and to contain the origin. The solution of the system (11) at time $k∈\mathbb{N}_{≥ k_{0}}$ is characterized by the function $\phi:\mathbb{N}_{≥ k_{0}}×\mathbb{N}_{≥ k_{0}}×\mathcal{D}→\mathcal{D}$ mapping the current time, initial time and initial state to the current state, i.e. $\phi(k+1,k_{0},\xi)=f(k,\phi(k,k_{0},\xi))$ for all $k∈\mathbb{N}_{≥ k_{0}}$ . We consider the origin to be an equilibrium point for (11), i.e. $f(k,\boldsymbol{0})=\boldsymbol{0}$ for all $k∈\mathbb{N}_{≥ k_{0}}$ . Although this restricts attention to regulation problems, one can always recast a tracking problem as a regulation one by considering the nonlinear evolution of the error between the state and the reference. We impose the following assumption.
**Assumption 5**
*(Local Behavior). The dynamics $f$ in (11) are
1. $L_{f}$ -Lipschitz continuous in $\mathcal{D}⊂\mathbb{R}^{n}$ ,
2. continuously differentiable with respect to $x$ , and the Jacobian matrix $[∂ f(k,x)/∂ x]$ is bounded and Lipschitz continuous uniformly in $k$ within some region $\mathcal{D}_{0}\subseteq\mathcal{D}$ containing the origin.*
For completeness, we present a series of auxiliary definitions and theorems for (incremental) exponential stability before presenting the main results of this section.
4.1 Preliminaries on Exponential Stability
We define uniform exponential stability for discrete-time, nonlinear time-varying systems [43].
**Definition 2**
*Given the system (11), the equilibrium point $x=\boldsymbol{0}$ is uniformly exponentially stable in $\mathcal{D}$ with a rate $\lambda$ , if there exist constants $d∈\mathbb{R}_{+}$ and $\lambda∈(0,1)$ , such that for all $\xi∈\mathcal{D}$ , and $k∈\mathbb{N}_{≥ k_{0}}$
$$
\|\phi(k,k_{0},\xi)\|\leq d\|\xi\|\lambda^{k-k_{0}}. \tag{12}
$$
If $\mathcal{D}=\mathbb{R}^{n}$ , then the equilibrium is uniformly globally exponentially stable.*
If the origin is exponentially stable, we also refer to the system (11) as such. Lyapunov theory provides necessary and sufficient conditions for the exponential stability of nonlinear systems. Below are the discrete-time Lyapunov theorems for exponential stability.
**Theorem 2**
*[43, Thm. 13.11] If there exists a continuous mapping $V:\mathbb{N}_{≥ k_{0}}×\mathcal{D}→\mathbb{R}_{+}$ , and some constants $c_{1},c_{2}∈\mathbb{R}_{+}$ , $\beta∈(0,1)$ and $p≥ 1$ , such that for all $\xi∈\mathcal{D}$ , and $k∈\mathbb{N}_{≥ k_{0}}$
$$
c_{1}\|\xi\|^{p}\leq V(k,\xi)\leq c_{2}\|\xi\|^{p},\qquad V\left(k+1,f(k,\xi)\right)\leq\beta^{p}V(k,\xi),
$$
then the nonlinear system (11) is uniformly exponentially stable in $\mathcal{D}$ , with a rate $\beta$ .*
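As an illustration of Theorem 2, the conditions can be checked on a simple scalar example; the map below is hypothetical and not taken from the text. Since $|\sin(x)|\leq|x|$ , the map $f(x)=0.5x+0.1\sin(x)$ satisfies $|f(x)|\leq 0.6|x|$ , so $V(x)=x^{2}$ certifies exponential stability with $\beta=0.6$ and $p=2$ .

```python
import math

def f(x):
    # hypothetical scalar nonlinear map with |f(x)| <= 0.6 |x|
    return 0.5 * x + 0.1 * math.sin(x)

def V(x):
    # candidate Lyapunov function, p = 2
    return x * x

beta, c1, c2 = 0.6, 0.5, 2.0

# check the sandwich and decrease conditions on a grid over D = [-2, 2]
for i in range(-200, 201):
    x = i / 100
    assert c1 * abs(x) ** 2 <= V(x) <= c2 * abs(x) ** 2
    assert V(f(x)) <= beta ** 2 * V(x) + 1e-12
```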
The converse Lyapunov theorem for the discrete-time case shows the implication in the opposite direction.
**Theorem 3**
*If the nonlinear system (11) is uniformly exponentially stable in $\mathcal{D}$ , then there exists a continuous function $V:\mathbb{N}_{≥ k_{0}}×\mathcal{D}→\mathbb{R}$ and constants $c_{1},c_{2}∈\mathbb{R}_{+}$ and $\beta∈(0,1)$ , such that for all $\xi∈\mathcal{D}$ , and $k∈\mathbb{N}_{≥ k_{0}}$
$$
c_{1}\|\xi\|^{2}\leq V(k,\xi)\leq c_{2}\|\xi\|^{2}, \tag{13}
$$
$$
V\left(k+1,f(k,\xi)\right)\leq\beta^{2}V(k,\xi). \tag{14}
$$*
The proof for Theorem 3 follows directly from [44, Thm 2], by adapting the global result to the case with forward invariant subspace $\mathcal{D}$ . The above theorems generalize to uniform global exponential stability if $\mathcal{D}=\mathbb{R}^{n}$ [42, 44]. In continuous-time, the rate of change of the Lyapunov function with respect to the state is bounded by the norm of the state [42, Thm 4.14]. The following Lemma is the discrete-time equivalent of this.
**Lemma 4**
*Let Assumption 5. i. hold, then if the system (11) is uniformly exponentially stable in $\mathcal{D}$ , there exists a constant $c_{3}∈\mathbb{R}_{+}$ and a continuous Lyapunov function $V:\mathbb{N}_{≥ k_{0}}×\mathcal{D}→\mathbb{R}$ , that, in addition to (13) and (14), for all $(\xi,\zeta)∈\mathcal{D}×\mathcal{D}$ , and $k∈\mathbb{N}_{≥ k_{0}}$ also satisfies
$$
|V(k,\xi)-V(k,\zeta)|\leq c_{3}\|\xi-\zeta\|\left(\|\xi\|+\|\zeta\|\right).
$$*
The proof of Lemma 4 is provided in the appendix.
4.2 Preliminaries on Exponential Incremental Stability
Exponential incremental stability captures the exponential convergence of two trajectories generated by the same system toward each other [11, 13].
**Definition 3**
*The system (11) is uniformly exponentially incrementally stable in $\mathcal{D}$ if there exist constants $d∈\mathbb{R}_{+}$ and $\lambda∈(0,1)$ , such that for all $(\xi,\zeta)∈\mathcal{D}×\mathcal{D}$ and $k∈\mathbb{N}_{≥ k_{0}}$
$$
\|\phi(k,k_{0},\xi)-\phi(k,k_{0},\zeta)\|\leq d\|\xi-\zeta\|\lambda^{k-k_{0}}.
$$
If this holds only for a subset of initial conditions in $\mathcal{D}×\mathcal{D}$ the system is uniformly locally exponentially incrementally stable in this subset. If $\mathcal{D}=\mathbb{R}^{n}$ the definition is global.*
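For a globally contracting map, incremental stability in the sense of Definition 3 holds with $d=1$ and $\lambda$ equal to the Lipschitz constant. A minimal numeric sketch with a hypothetical scalar map whose derivative lies in $[0.4,0.6]$ :

```python
import math

def f(x):
    # hypothetical map with f'(x) = 0.5 + 0.1 cos(x) in [0.4, 0.6]
    return 0.5 * x + 0.1 * math.sin(x)

def phi(k, xi):
    # solution map phi(k, 0, xi) of x_{k+1} = f(x_k)
    for _ in range(k):
        xi = f(xi)
    return xi

xi, zeta, lam = 1.7, -0.9, 0.6          # d = 1, lambda = 0.6
for k in range(20):
    assert abs(phi(k, xi) - phi(k, zeta)) <= lam ** k * abs(xi - zeta) + 1e-12
```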
The theory of exponential convergence of two trajectories of the same system has first been studied as uniform convergence by Demidovich [33, 45], and later extended through contraction theory [12, 13]. Contraction is a sufficient condition for exponential incremental stability, that when $f$ is smooth, can be checked by the following condition.
**Theorem 4**
*Let Assumption 5.ii. hold, and suppose that there exists a sequence of positive-definite, bounded matrices $P(k)∈\mathbb{R}^{n× n}$ , $k∈\mathbb{N}_{≥ k_{0}}$ , and some $\rho∈(0,1)$ such that
$$
D(k,x):=\frac{\partial f}{\partial x}(k,x)^{\top}P(k+1)\frac{\partial f}{\partial x}(k,x)-\rho^{2}P(k)\prec 0 \tag{15}
$$
for all $x∈\mathcal{\tilde{D}}_{0}\subseteq\mathcal{D}_{0}$ . Then, there exists a constant $r_{\mathcal{\tilde{D}}_{0}}∈\mathbb{R}_{+}$ , such that (11) is uniformly locally exponentially incrementally stable in $\mathcal{B}(r_{\mathcal{\tilde{D}}_{0}})\subseteq\mathcal{\tilde{D}}_{0}$ with a rate $\rho$ .*
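In the scalar case with $P(k)\equiv 1$ , condition (15) reduces to checking that the squared Jacobian stays below $\rho^{2}$ on a neighborhood of the origin. A hedged sketch for a hypothetical time-varying map $f(k,x)=(0.4+0.1\sin(k))x+0.05x^{2}$ :

```python
import math

def jacobian(k, x):
    # derivative of f(k, x) = (0.4 + 0.1 sin(k)) x + 0.05 x^2 with respect to x
    return 0.4 + 0.1 * math.sin(k) + 0.1 * x

rho = 0.7
# D(k, x) = J(k, x)^2 - rho^2 must be negative near the origin (|x| <= 0.5)
for k in range(50):
    for i in range(-50, 51):
        x = i / 100
        assert jacobian(k, x) ** 2 - rho ** 2 < 0
```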
We also state the following converse Lyapunov theorem.
**Theorem 5**
*If the system (11) is uniformly exponentially incrementally stable in $\mathcal{D}$ , then there exists a function $V:\mathbb{N}_{≥ k_{0}}×\mathcal{D}×\mathcal{D}→\mathbb{R}_{+}$ and constants $c_{1},c_{2}∈\mathbb{R}_{+}$ and $\beta∈(0,1)$ , such that for all $(\xi,\zeta)∈\mathcal{D}×\mathcal{D}$ and $k∈\mathbb{N}_{≥ k_{0}}$
$$
c_{1}\|\xi-\zeta\|^{2}\leq V(k,\xi,\zeta)\leq c_{2}\|\xi-\zeta\|^{2}, \tag{16}
$$
$$
V\left(k+1,f(k,\xi),f(k,\zeta)\right)\leq\beta^{2}V(k,\xi,\zeta). \tag{17}
$$
Moreover, under Assumption 5.i., there exists a constant $c_{3}∈\mathbb{R}_{+}$ , such that for all $(\xi,\zeta,\tilde{\xi},\tilde{\zeta})∈\mathcal{D}×\mathcal{D}×\mathcal{D}×\mathcal{D}$ and $k∈\mathbb{N}_{≥ k_{0}}$
$$
|V(k,\xi,\zeta)-V(k,\tilde{\xi},\tilde{\zeta})|\leq c_{3}\left(\|\xi-\tilde{\xi}\|+\|\zeta-\tilde{\zeta}\|\right)\left(\|\xi-\zeta\|+\|\tilde{\xi}-\tilde{\zeta}\|\right). \tag{18}
$$*
The proofs of Theorem 4, adapted from [13], and Theorem 5, extended from [10] and [11], are provided in the appendix.
4.3 Main Results
In this subsection, we show that if the nonlinear dynamics (11) are uniformly exponentially stable in $\mathcal{D}$ and satisfy Assumption 5, then they are also E- $\delta$ -ISS in the same region. First, we show that under the local Lipschitz continuity assumption, exponential incremental stability implies E- $\delta$ -ISS.
**Theorem 6**
*Let Assumption 5 hold, then if the nonlinear system (11) is uniformly exponentially incrementally stable in $\mathcal{D}$ , it is E- $\delta$ -ISS in the same region.*
For completeness, we provide the proof in the appendix, and note that the implication is similar to existing results in the literature, e.g. [46], that studies the asymptotic stability of the equilibrium, or [47, Lemma 2] that considers a non-autonomous setting.
Next, we show that if in addition to local Lipschitz continuity, the nonlinear dynamics are also locally differentiable in some arbitrarily small region $\mathcal{D}_{0}$ around the equilibrium, then uniform exponential stability implies uniform exponential incremental stability and E- $\delta$ -ISS. The results are formalized in Theorem 7 and Corollary 2.
**Theorem 7**
*Let Assumption 5 hold, then if the nonlinear system (11) is uniformly exponentially stable in $\mathcal{D}$ , it is also uniformly exponentially incrementally stable in the same region.*
* Proof*
We start by showing that the exponential stability of $f$ implies that the linearized dynamics around the origin are also stable by following similar arguments to [42]. Let
$$
A(k):=\frac{\partial f}{\partial x}(k,\boldsymbol{0}),
$$
which is well-defined given Assumption 5. Moreover, there exists an $\bar{A}∈\mathbb{R}_{+}$ , such that $\|{A}(k)\|≤\bar{A}$ for all $k∈\mathbb{N}_{≥ k_{0}}$ . It follows from Theorem 3 that there exists a continuous mapping $V:\mathbb{N}_{≥ k_{0}}×\mathcal{D}→\mathbb{R}$ satisfying (13)-(14). Let us consider $V$ as a candidate Lyapunov function for $A(k)$ . Then for all $x∈\mathcal{D}$ , $k∈\mathbb{N}_{≥ k_{0}}$ there exist constants $\beta∈(0,1)$ , and $c_{4},d∈\mathbb{R}_{+}$ such that
$$
\begin{split}V(k+1,A(k)x)&=V(k+1,f(k,x))+V(k+1,A(k)x)-V(k+1,f(k,x))\\
&\leq\beta^{2}V(k,x)+\left|V(k+1,A(k)x)-V(k+1,f(k,x))\right|\\
&\leq\beta^{2}V(k,x)+c_{4}\|A(k)x-f(k,x)\|\left(\|A(k)x\|+\|f(k,x)\|\right)\\
&\leq\beta^{2}V(k,x)+c_{4}\left(d\beta+\bar{A}\right)\|x\|\,\|f(k,x)-A(k)x\|,\end{split}
$$
where the first inequality follows from Theorem 3, the second from Lemma 4 and the last from Definition 2 and properties of induced norms. Denoting $h(k,x):=f(k,x)-A(k)x$ , it follows from the Lipschitz continuity of the Jacobian of $f$ [42, Chpt. 4.6] that there exists an $L_{h}∈\mathbb{R}_{+}$ , such that $\|h(k,x)\|≤ L_{h}\|x\|^{2}$ for all $k∈\mathbb{N}_{≥ k_{0}}$ and all $x∈\mathcal{D}_{0}$ . Using this,
$$
V(k+1,A(k)x)\leq\beta^{2}V(k,x)+c_{4}L_{h}\left(d\beta+\bar{A}\right)\|x\|^{3}\leq\underbrace{\left(\beta^{2}+\frac{c_{4}L_{h}\left(d\beta+\bar{A}\right)}{c_{1}}\|x\|\right)}_{=:\gamma}V(k,x),
$$
where the second inequality follows from the converse Lyapunov Theorem 3, which gives $V(k,x)\geq c_{1}\|x\|^{2}$ for some $c_{1}∈\mathbb{R}_{+}$ . Defining $r_{1}:=\min\left\{\frac{c_{1}\left(1-\beta^{2}\right)}{c_{4}L_{h}\left(d\beta+\bar{A}\right)},\max\{r>0\ |\ \|x\|<r,\ x∈\mathcal{D}_{0}\}\right\}$ , for all $\|x\|<r_{1}$ it holds that $\gamma<1$ . Hence, using Theorem 2, the linearized dynamics $x_{k+1}=A(k)x_{k}$ are uniformly exponentially stable. Then, from [48, Thm 23.3] there exists a sequence of uniformly positive definite $P(k)$ that solves the difference Lyapunov equation $A^{\top}(k)P(k+1)A(k)-P(k)\preceq-cI$ for some $c∈\mathbb{R}_{+}$ . Considering now equation (15) for some $k∈\mathbb{N}_{≥ k_{0}}$ , $\bar{x}∈\mathcal{D}_{0}$ and denoting $d_{k}(\bar{x}):=\frac{∂ f}{∂ x}(k,\bar{x})-A(k)$ , it holds that
$$
\begin{split}&\frac{\partial f}{\partial x}(k,\bar{x})^{\top}P(k+1)\frac{\partial f}{\partial x}(k,\bar{x})-P(k)\\
&\quad=\left(A(k)+d_{k}(\bar{x})\right)^{\top}P(k+1)\left(A(k)+d_{k}(\bar{x})\right)-P(k)\\
&\quad\preceq-cI+A(k)^{\top}P(k+1)d_{k}(\bar{x})+d_{k}(\bar{x})^{\top}P(k+1)A(k)+d_{k}(\bar{x})^{\top}P(k+1)d_{k}(\bar{x}),\end{split}
$$
where the first equality follows from the definition of $d_{k}(\bar{x})$ and the last inequality from [48, Thm 23.3]. Following the same arguments as in [42, Chpt. 4.6], there exists a constant $L_{2}∈\mathbb{R}_{+}$ , such that for all $k∈\mathbb{N}_{≥ k_{0}}$ and $\bar{x}∈\mathcal{D}_{0}$ , $\|d_{k}(\bar{x})\|≤ L_{2}\|\bar{x}\|$ . Pre- and post-multiplying the above with $x^{\top}$ and $x$ , respectively, then yields
$$
\begin{split}&x^{\top}\left[\frac{\partial f}{\partial x}(k,\bar{x})^{\top}P(k+1)\frac{\partial f}{\partial x}(k,\bar{x})-P(k)\right]x\\
&\quad\leq-c\|x\|^{2}+\left(2\bar{A}\bar{P}L_{2}\|\bar{x}\|+\bar{P}L_{2}^{2}\|\bar{x}\|^{2}\right)\|x\|^{2}\leq\left(-c+\left(2\bar{A}\bar{P}L_{2}+\bar{P}r_{d}L_{2}^{2}\right)\|\bar{x}\|\right)\|x\|^{2},\end{split}
$$
where $r_{d}=\max_{x∈\mathcal{D}}\|x\|$ , and $\|P(k)\|≤\bar{P}$ for all $k∈\mathbb{N}_{≥ k_{0}}$ . Note that the rate of exponential stability of the linear system $A(k)$ is $\sqrt{1-\frac{c}{\bar{P}}}∈(0,1)$ . Then, choosing $\rho^{2}>1-\frac{c}{\bar{P}}$ , adding $x^{\top}\left(1-\rho^{2}\right)P(k)x$ to both sides of the above inequality, and defining $r_{2}:=\frac{c-\left(1-\rho^{2}\right)\bar{P}}{2\bar{A}\bar{P}L_{2}+\bar{P}r_{d}L_{2}^{2}}$ ensures that
$$
x^{\top}\left[\frac{\partial f}{\partial x}(k,x)^{\top}P(k+1)\frac{\partial f}{\partial x}(k,x)-\rho^{2}P(k)\right]x<0,
$$
uniformly in $k$ and $x$ , for all $\|x\|<r:=\min\left(r_{1},r_{2}\right)$ . This implies by Theorem 4 that there exists a positive constant $r_{\mathcal{\tilde{D}}_{0}}≤ r$ , such that for all $x∈\mathcal{B}(r_{\mathcal{\tilde{D}}_{0}})$ the system (11) is uniformly exponentially incrementally stable with rate of $\rho$ . To show that the system is also uniformly exponentially incrementally stable in $\mathcal{D}$ , consider any $\xi_{1},\xi_{2}∈\mathcal{D}$ , then for all $k∈\mathbb{N}_{≥ k_{0}}$ , and some $d∈\mathbb{R}_{+}$ , the following two inequalities hold
$$
\|\phi(k,k_{0},\xi_{1})-\phi(k,k_{0},\xi_{2})\|\leq\|\phi(k,k_{0},\xi_{1})\|+\|\phi(k,k_{0},\xi_{2})\|\leq 2r_{d}d\beta^{k-k_{0}}, \tag{19}
$$
$$
\|\phi(k,k_{0},\xi_{1})-\phi(k,k_{0},\xi_{2})\|\leq L_{f}^{k-k_{0}}\underbrace{\|\xi_{1}-\xi_{2}\|}_{=:\|\Delta\xi\|}. \tag{20}
$$
The bound in (19) follows from the exponential stability of $f$ , and the one in (20) from its Lipschitz continuity. Combining the two
$$
\|\phi(k,k_{0},\xi_{1})-\phi(k,k_{0},\xi_{2})\|\leq\min\{2r_{d}d\cdot\beta^{k-k_{0}},\,L_{f}^{k-k_{0}}\|\Delta\xi\|\}.
$$
Define $k^{\prime}∈\mathbb{N}_{≥ k_{0}}$ such that both $\|\phi(k^{\prime},k_{0},\xi_{1})\|<r_{\mathcal{\tilde{D}}_{0}}$ and $\|\phi(k^{\prime},k_{0},\xi_{2})\|<r_{\mathcal{\tilde{D}}_{0}}$ . Then, from the above analysis, there exists a $d^{\prime}∈\mathbb{R}_{+}$ , such that for all $k∈\mathbb{N}_{≥ k^{\prime}}$
$$
\|\phi(k,k_{0},\xi_{1})-\phi(k,k_{0},\xi_{2})\|\leq d^{\prime}\rho^{k-k^{\prime}}\|\phi(k^{\prime},k_{0},\xi_{1})-\phi(k^{\prime},k_{0},\xi_{2})\|.
$$
It then follows from (20) that, for all $k∈\mathbb{N}_{≥ k^{\prime}}$ ,
$$
\|\phi(k,k_{0},\xi_{1})-\phi(k,k_{0},\xi_{2})\|\leq d^{\prime}\rho^{k-k^{\prime}}L_{f}^{k^{\prime}-k_{0}}\|\Delta\xi\|.
$$
Note that for all $k∈\{k_{0},\ldots,k^{\prime}\}$
$$
\|\phi(k,k_{0},\xi_{1})-\phi(k,k_{0},\xi_{2})\|\leq c_{5}\|\Delta\xi\|\rho^{k-k_{0}},
$$
where $c_{5}:=\max\{1,\frac{L_{f}^{k^{\prime}}}{\rho^{k^{\prime}}}\}$ is a constant independent of $\|\Delta\xi\|$ and, since $k^{\prime}$ is finite, also independent of time. Combining the bounds, the following holds for all $k∈\mathbb{N}_{≥ k_{0}}$
$$
\|\phi(k,k_{0},\xi_{1})-\phi(k,k_{0},\xi_{2})\|\leq c_{5}d^{\prime}\rho^{k-k_{0}}\|\Delta\xi\|,
$$
which is the definition of uniform exponential incremental stability. ∎
Combining Theorems 6 and 7 the following corollary follows directly.
**Corollary 2**
*Let Assumption 5 hold, then if the nonlinear system (11) is uniformly exponentially stable in $\mathcal{D}$ , it is also E- $\delta$ -ISS in the same region.*
**Remark 1**
*The conditions on the Jacobian matrix of $f$ in Assumption 5 are required only in the time-varying case, as these are satisfied trivially in the time-invariant setting; for further details, we refer the reader to [42, Theorems 4.7, 4.12].*
**Remark 2**
*It is worth highlighting that the implication in Corollary 2 is also claimed in [11, Thm. 12] without assuming local differentiability. While the proof in [11] holds for asymptotic stability, the same arguments break down in the exponential case since one cannot find constants $d∈\mathbb{R}_{+}$ and $\lambda∈(0,1)$ as in Definition 3 that are independent of the initial deviation $\|\xi-\zeta\|$ . Hence, the results in this section serve to fill this gap by deriving sufficient conditions in the form of Assumption 5 under which the implication holds.*
In the sequel, we use these insights to address the closed-loop dynamics under suboptimal and optimal linear MPC policies as a notable use case.
5 Model Predictive Control - A Use Case
In Section 3, we showed that under certain assumptions on the suboptimal policy $\mu$ (Assumption 2) and the benchmark policy $\mu^{\star}$ (Assumption 1), the suboptimality gap of $\mu$ can be bounded for a certain family of costs. We now exploit the results of Section IV to show that these assumptions are satisfied in the case of suboptimal, PGM-based linear-quadratic MPC, and derive an asymptotically tighter suboptimality gap bound compared to [17] under the same assumptions.
Consider the control of linear time-invariant dynamical systems, modeled by
$$
x_{k+1}=Ax_{k}+Bu_{k},\quad k\in\mathbb{N},
$$
where $A∈\mathbb{R}^{n× n}$ , and $B∈\mathbb{R}^{n× m}$ are known system matrices, and the state and input vectors are defined as before. We consider the finite horizon linear-quadratic regulator (LQR) problem with the objective of minimizing the finite-time cost
$$
J_{T}(x_{0},\boldsymbol{u})=\|x_{T}\|^{2}_{P}+\sum_{k=0}^{T-1}\|x_{k}\|^{2}_{Q}+\|u_{k}\|^{2}_{R}, \tag{21}
$$
where $Q∈\mathbb{R}^{n× n}$ and $R∈\mathbb{R}^{m× m}$ are design matrices and $P$ is taken to be the solution of the discrete-time Algebraic Riccati Equation, $P=Q+K^{\top}RK+(A-BK)^{\top}P(A-BK)$ , with $K=(R+B^{\top}PB)^{-1}(B^{\top}PA)$ . The control inputs must satisfy $u_{k}∈\mathcal{U}$ for all $k∈\mathbb{N}$ , where $\mathcal{U}\subseteq\mathbb{R}^{m}$ is a constraint set. The following standard assumptions ensure that a unique minimizer of (21) always exists [49, 16].
**Assumption 6**
*(Well-posed problem)
1. The pair $(A,B)$ is stabilizable, $Q\succ 0$ , $R\succ 0$ .
2. The input constraint set $\mathcal{U}$ is closed, convex and contains the origin in its interior.*
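As a numerical sanity check on these definitions, the terminal weight $P$ and the gain $K$ can be obtained by iterating the Riccati recursion to its fixed point. The sketch below uses hypothetical double-integrator matrices, not a system from this paper:

```python
import numpy as np

def dare_fixed_point(A, B, Q, R, tol=1e-12, max_iter=100_000):
    """Iterate P <- Q + K^T R K + (A - BK)^T P (A - BK) to a fixed point."""
    P = Q.copy()
    for _ in range(max_iter):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        A_cl = A - B @ K
        P_next = Q + K.T @ R @ K + A_cl.T @ P @ A_cl
        if np.max(np.abs(P_next - P)) < tol:
            return P_next, K
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")

# Hypothetical double-integrator data (illustrative, not from this paper)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
P, K = dare_fixed_point(A, B, np.eye(2), np.eye(1))
```

The returned pair satisfies the fixed-point equation above, and $A-BK$ is Schur stable, which is what the terminal ingredients of (21) rely on.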
The model predictive controller solves this problem in a receding horizon fashion, solving the following parametric optimal control problem (POCP) at each timestep $k$ , having measured a state $x∈\mathbb{R}^{n}$
$$
\begin{split}\mu^{\star}(x):=&\operatorname*{arg\,min}_{\nu}\;J_{N}(\xi_{0},\nu)\\
\text{s.t.}\;&\xi_{i+1}=A\xi_{i}+B\nu_{i},\;i\!=\!0,\dots,N-1,\\
&\xi_{0}=x,\;\nu_{i}\in\mathcal{U},\;i\!=\!0,\dots,N-1.\end{split} \tag{22}
$$
Here $N$ is the prediction horizon length, and $\nu=[{\nu}_{0}^{\top}...{\nu}_{N-1}^{\top}]^{\top}$ denotes the predicted input vector. We refer to the minimizer of (22) for a given initial state (parameter) $x∈\mathbb{R}^{n}$ , $\mu^{\star}(x):\mathbb{R}^{n}→\mathbb{R}^{Nm}$ , as the optimal mapping. For this example, we take as the benchmark policy $\mu^{\star}$ the map solving the POCP (22). The optimal cost attained by this mapping is denoted by $J_{N}^{\star}(x):=J_{N}(x,\mu^{\star}(x))$ , which serves as an approximate value function for the problem. For each $k$ , the model predictive controller applies the first element of $\mu^{\star}(x_{k})$ to the system, and the process is repeated in a receding horizon fashion. The optimal state evolution under this optimal MPC policy is then given by
$$
x^{\star}_{k+1}=Ax_{k}^{\star}+\overline{B}\mu^{\star}(x_{k}^{\star}):=f(x^{\star}_{k}),\quad k\in\mathbb{N}, \tag{23}
$$
where $x_{0}^{\star}:=x_{0}$ , $\overline{B}:=BS$ , and $S:=\left[I_{m× m}~{}\boldsymbol{0}~{}...~{}\boldsymbol{0}\right]∈\mathbb{R}^{m× Nm}$ is a matrix selecting the first control input. Note that the optimal MPC policy, $\mu^{\star}$ , is time-invariant due to the structure of the problem.
Problem (22) is a parametric quadratic program and for a given parameter $x∈\mathbb{R}^{n}$ can be represented in the equivalent condensed form $J_{N}^{\star}(x)=\min_{\nu∈\mathcal{N}}\|(x,\nu)\|_{M}^{2}$ , where $\mathcal{N}=\mathcal{U}^{N}\subseteq\mathbb{R}^{Nm}$ ,
$$
M=\begin{bmatrix}W&G^{\top}\\
G&H\end{bmatrix}, \tag{24}
$$
and the definitions of $H∈\mathbb{R}^{Nm× Nm}$ , $W∈\mathbb{R}^{n× n}$ , and $G∈\mathbb{R}^{Nm× n}$ can be found in [9].
As computing the optimal $\mu^{\star}(x)$ exactly is often infeasible, suboptimal schemes are frequently considered. In our setting, a suboptimal policy is computed by performing only a finite number of optimization steps for (22). In particular, given $x∈\mathbb{R}^{n}$ and an input vector ${\nu}∈\mathbb{R}^{Nm}$ , consider the operator that performs one step of the projected gradient method
$$
\mathcal{T}(x,\nu):=\Pi_{\mathcal{N}}[\nu-\alpha\nabla_{\nu}{J_{N}}(x,\nu)], \tag{25}
$$
where $\alpha∈\mathbb{R}$ is a step size. Applying (25) iteratively $\ell_{k}∈\mathbb{N}_{+}$ times provides an approximation of the optimal input, and hence of the optimal policy. (The subscript of $\ell_{k}$ is dropped when it is taken to be a constant.) The combined dynamics of the system and the approximate optimizer are then given by
$$
u_{k}^{\mu}=\mathcal{T}^{\ell_{k}}(x_{k},u_{k-1}^{\mu}), \tag{26a}
$$
$$
x_{k+1}=Ax_{k}+\overline{B}u_{k}^{\mu}, \tag{26b}
$$
where $u_{-1}^{\mu}∈\mathbb{R}^{Nm}$ is an initialization vector, and for some $l∈\mathbb{N},x∈\mathbb{R}^{n}$ and $\nu∈\mathbb{R}^{Nm}$ , we define
$$
\mathcal{T}^{l}(x,\nu)=\mathcal{T}(x,\mathcal{T}^{l-1}(x,\nu)),\qquad\mathcal{T}^{0}(x,\nu)=\nu. \tag{27}
$$
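The operator (25) and its composition (27) are straightforward to prototype. The sketch below assumes the condensed cost $J_{N}(x,\nu)=\|(x,\nu)\|_{M}^{2}$ from (24), so that $\nabla_{\nu}J_{N}(x,\nu)=2(Gx+H\nu)$ ; the matrices `H` and `G` here are random placeholders rather than condensed MPC data, and the input set is left unconstrained so that the unconstrained minimizer $\nu^{\star}=-H^{-1}Gx$ is available for comparison:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical condensed-QP data (placeholders, not from a real MPC problem)
n, Nm = 2, 6
Hs = rng.standard_normal((Nm, Nm))
H = Hs @ Hs.T + Nm * np.eye(Nm)        # symmetric positive definite
G = rng.standard_normal((Nm, n))
x = rng.standard_normal(n)

lam = np.linalg.eigvalsh(H)
alpha = 1.0 / (lam[-1] + lam[0])       # step size of Theorem 8 below
eta = (lam[-1] - lam[0]) / (lam[-1] + lam[0])

def T(x, nu, u_max=np.inf):
    """One projected gradient step (25): grad_nu J_N = 2 (G x + H nu)."""
    return np.clip(nu - alpha * 2.0 * (G @ x + H @ nu), -u_max, u_max)

def T_ell(x, nu, ell, u_max=np.inf):
    """ell-fold composition (27)."""
    for _ in range(ell):
        nu = T(x, nu, u_max)
    return nu

nu_star = -np.linalg.solve(H, G @ x)   # unconstrained minimizer of J_N(x, .)
nu0 = rng.standard_normal(Nm)
```

With this step size, the unconstrained iteration contracts toward $\nu^{\star}$ at the linear rate $\eta$ , and the projection, being nonexpansive, preserves the rate in the constrained case.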
The dynamics under the suboptimal policy are obtained from (26b) by taking $u_{k}^{\mu}=\mu_{k}(x_{k})$ for all $k∈\mathbb{N}$ , i.e.
$$
x_{k+1}=\underbrace{Ax_{k}+\overline{B}{\mu}^{\star}(x_{k})}_{f(x_{k})}+\overline{B}\underbrace{\left(u_{k}^{\mu}-\mu^{\star}(x_{k})\right)}_{=d_{k}},\quad k\in\mathbb{N}.
$$
Note that $\mu_{k}$ is also a function of the previous input $u_{k-1}^{\mu}$ . However, since the closed-loop evolution is noise-free, $u_{k-1}^{\mu}$ can be uniquely determined given the initialization vector, the current time $k$ , and the current state. Hence, the dependence on $u_{k-1}^{\mu}$ is encoded in the subscript of $\mu_{k}$ . The suboptimal policy can in general be defined as a function of the information vector $\mathcal{I}_{k}=\{x_{k},u_{k-1},...,u_{0}\}$ ; as long as Assumption 2 is satisfied, the results in this manuscript hold.
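To make the combined system-optimizer dynamics (26) concrete, the following sketch closes the loop for a hypothetical constrained double integrator; all matrices and tuning values are illustrative assumptions, not data from this paper. The condensed matrices are assembled from the standard prediction matrices of (22):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical double-integrator data (illustrative only)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
P = solve_discrete_are(A, B, Q, R)           # terminal weight from the DARE
N, n, m, u_max = 5, 2, 1, 1.0

# Prediction matrices: stacked predicted state xi = Phi x + Gam nu
Phi = np.vstack([np.linalg.matrix_power(A, i + 1) for i in range(N)])
Gam = np.zeros((N * n, N * m))
for i in range(N):
    for j in range(i + 1):
        Gam[i * n:(i + 1) * n, j * m:(j + 1) * m] = (
            np.linalg.matrix_power(A, i - j) @ B)
Qbar = np.kron(np.eye(N), Q)
Qbar[-n:, -n:] = P                           # terminal cost
H = Gam.T @ Qbar @ Gam + np.kron(np.eye(N), R)
G = Gam.T @ Qbar @ Phi                       # grad_nu J_N = 2 (H nu + G x)

lam = np.linalg.eigvalsh(H)
alpha = 1.0 / (lam[-1] + lam[0])             # PGM step size (Theorem 8)

def T_step(x, nu):
    """One projected gradient step (25) on the box input set."""
    return np.clip(nu - alpha * 2.0 * (H @ nu + G @ x), -u_max, u_max)

# Combined system-optimizer dynamics (26), warm-started with u_{-1} = 0
x, nu, ell = np.array([1.0, 0.5]), np.zeros(N * m), 100
for k in range(150):
    for _ in range(ell):                     # u_k = T^ell(x_k, u_{k-1})
        nu = T_step(x, nu)
    x = A @ x + B.flatten() * nu[0]          # apply the first predicted input
```

Warm-starting each solve from $u_{k-1}^{\mu}$ is what makes a small, fixed $\ell$ viable: the optimizer only needs to track a slowly moving minimizer.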
5.1 Optimal MPC
In this subsection, we review the properties of the optimal mapping $\mu^{\star}(x)$ . As shown in [9, 50, 51], the system (23) is exponentially stable with the forward invariant ROA estimate
$$
\Gamma_{N}:=\{x\in\mathbb{R}^{n}\mid\psi(x)\leq r_{N}\},
$$
where $\psi(x):=\sqrt{J_{N}^{\star}(x)}$ , $\textstyle{d=c·{\lambda^{-}(Q)}/{\lambda^{+}(P)}}$ , $r_{N}=\sqrt{Nd+c}$ and $c>0$ is such that the following set
$$
\Omega=\{x\in\mathbb{R}^{n}\mid\|x\|_{P}^{2}\leq c\},
$$
also satisfies $\Omega⊂\{x∈\mathbb{R}^{n}\mid-Kx∈\mathcal{U}\}$ . The function $\psi(x)$ is a Lyapunov function for the optimal MPC algorithm, satisfying
$$
\|x\|_{P}\leq\psi(x)\leq\|x\|_{W},\qquad\psi\left(f(x)\right)\leq\beta\psi(x), \tag{28}
$$
where $\beta∈(0,1)$ is the exponential decay rate. The Lipschitz continuity of the optimal mapping is formalized in the following lemma.
**Lemma 5**
*[9, Corollary 2] For any $(x,y)∈\Gamma_{N}×\Gamma_{N}$ , the optimal solution mapping, $\mu^{\star}(x)$ , satisfies
$$
\|\mu^{\star}(x)-\mu^{\star}(y)\|\leq\|H^{-\frac{1}{2}}\|\|G(x-y)\|_{H^{-1}}\leq L\|x-y\|,
$$
with a Lipschitz constant $L:=\|H^{-\frac{1}{2}}\|·\|H^{-\frac{1}{2}}G\|$ .*
The proof follows from the parametric quadratic program structure of the MPC problem and is derived in [9] or [16] from an explicit MPC point of view.
5.2 Suboptimal MPC
The suboptimal policy in this setting is defined by (26a). The following well-established result shows the linear rate of convergence of the PGM method.
**Theorem 8**
*[39, Theorem 3.1] For any $x∈\mathbb{R}^{n}$ , $\nu∈\mathbb{R}^{Nm}$ , $\ell∈\mathbb{N}_{+}$ , and for $\alpha=\frac{1}{\lambda^{+}(H)+\lambda^{-}(H)}$
$$
\left\|\mathcal{T}^{\ell}(x,\nu)-\mu^{\star}(x)\right\|\leq\eta^{\ell}\|\nu-\mu^{\star}(x)\|,
$$
where $\eta=(\lambda^{+}(H)-\lambda^{-}(H))/(\lambda^{+}(H)+\lambda^{-}(H))$ .*
The suboptimal MPC scheme is treated by considering the combined evolution of the system-optimizer dynamics (26). The stability of such a scheme, also referred to as Time-Distributed MPC (TD-MPC) or as real-time implementation of MPC is shown in [24, 25, 9] for a fixed number of iterations $\ell$ and in [17] for a time-varying $\ell_{k}$ . In particular, if $\ell_{k}>\ell^{\star}$ for all $k∈\mathbb{N}$ , where
$$
\ell^{\star}=\frac{\log(1-\beta)-\log(\sigma\kappa+\omega(1-\beta))}{\log(\eta)},
$$
where $\beta$ is the same as in Section 5.1, $\omega=1+\|H^{-\frac{1}{2}}\|\|H^{-\frac{1}{2}}G\overline{B}\|$ , $\sigma=\|W^{\frac{1}{2}}\overline{B}\|$ , and
$$
\kappa=\|H^{-\frac{1}{2}}\|\|H^{-\frac{1}{2}}G(A-I)P^{-\frac{1}{2}}\|,
$$
then the combined dynamics (26) are exponentially stable in the following forward invariant ROA estimate
$$
\Sigma_{N}=\biggl\{(x,z)∈\Gamma_{N}×\mathcal{N}\mid~\psi(x)+\tau\|z-\mu^{\star}(x)\|≤ r_{N}\biggr\},
$$
recalling $r_{N}$ from Section 5.1. For the rest of the section we take $u_{-1}^{\mu}=0$ , with the assumption that the resulting $u_{0}^{\mu}=\mathcal{T}^{\ell}(x_{0},u_{-1}^{\mu})$ is such that $(x_{0},u_{0}^{\mu})∈\Sigma_{N}$ . For further analysis of the choice of the initial input and the forward invariant dynamics, we refer the interested reader to [50].
The exponential stability result from [17] is summarized in the following Lemma.
**Lemma 6**
*[17, Lemma 5] Given the dynamics (26), the following holds for all $(x_{0},u_{0}^{\mu})∈\Sigma_{N}$ , $k∈\mathbb{N}$ and $\ell_{k}>\ell^{\star}$
$$
\|x_{k}\|\leq h_{0}\|P^{-\frac{1}{2}}\|\cdot\|x_{0}\|_{W}\prod_{i=-1}^{k}\varepsilon_{i},
$$
where $\varepsilon_{k}:=\max\{\beta+\tau\kappa\eta^{\ell_{k}},\frac{\sigma+\tau\eta^{\ell_{k}}\omega}{\tau}\}∈(0,1)$ , $\varepsilon_{-1}:=1$ and $h_{0}=1+\tau\eta^{\ell_{0}}L\|W^{-\frac{1}{2}}\|$ .*
If the same number of optimization iterations is taken at all times, the bound reduces to the following.
**Corollary 3**
*[17, Corollary 3] Given the dynamics (26), the following holds for all $\ell>\ell^{\star}$ , $(x_{0},u_{0}^{\mu})∈\Sigma_{N}$ and $k∈\mathbb{N}$
$$
\|x_{k}\|\leq h\|P^{-\frac{1}{2}}\|\cdot\|x_{0}\|_{W}\cdot\varepsilon^{k},
$$
where $\varepsilon:=\max\{\beta+\tau\kappa\eta^{\ell},\frac{\sigma+\tau\eta^{\ell}\omega}{\tau}\}∈(0,1)$ and $h=1+\tau\eta^{\ell}L\|W^{-\frac{1}{2}}\|$ .*
5.3 Suboptimality Gap
We define the suboptimality vector, ${\tilde{\eta}}_{\ell}$ , for this use case as
$$
{\tilde{\eta}}_{\ell}:=\begin{bmatrix}\tilde{\eta}_{\ell,1}&\tilde{\eta}_{\ell,2}&\ldots&\tilde{\eta}_{\ell,T-1}\end{bmatrix},
$$
where $\tilde{\eta}_{\ell,k}:=\eta^{\ell_{k}}(1+\tilde{\eta}_{\ell,k+1}),\;k∈[0,T-2]$ , and $\tilde{\eta}_{\ell,T-1}:=\eta^{\ell_{T-1}}$ . We denote the Euclidean norm of the suboptimality vector by $\bar{\eta}_{\ell}:=\|{\tilde{\eta}}_{\ell}\|$ . The main result for the suboptimal MPC scheme is summarized in the following theorem.
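The backward recursion defining $\tilde{\eta}_{\ell}$ is immediate to evaluate. The sketch below uses illustrative values of $\eta$ , $\ell_{k}$ , and $T$ ; for a constant number of iterations, each entry reduces to a geometric sum:

```python
import numpy as np

def suboptimality_vector(eta, ells):
    """Backward recursion for tilde_eta_{ell,k} and the norm bar_eta_ell."""
    T = len(ells)
    tilde = np.empty(T)
    tilde[-1] = eta ** ells[-1]                 # tilde_eta_{ell, T-1}
    for k in range(T - 2, -1, -1):              # k = T-2, ..., 0
        tilde[k] = eta ** ells[k] * (1.0 + tilde[k + 1])
    # bar_eta_ell collects the entries tilde_eta_{ell,1}, ..., tilde_eta_{ell,T-1}
    return tilde, float(np.linalg.norm(tilde[1:]))

# Constant-iteration case (illustrative values)
eta, ell, T = 0.9, 5, 20
tilde, bar_eta = suboptimality_vector(eta, [ell] * T)
```

For constant $\ell$ , unrolling the recursion gives $\tilde{\eta}_{\ell,k}=\sum_{j=1}^{T-k}\eta^{j\ell}\leq\frac{\eta^{\ell}}{1-\eta^{\ell}}$ , which is how the $\mathcal{O}(\frac{\eta^{\ell}}{1-\eta^{\ell}}\|x_{0}\|)$ rate in the theorem below arises.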
**Theorem 9**
*Let Assumption 6 hold. Then, if $\ell_{k}>\ell^{\star}$ for all $k∈\mathbb{N}$ , the suboptimality gap of the suboptimal MPC (26) is given by
$$
\mathcal{R}_{T}(x_{0})=\mathcal{O}\left(\bar{\eta}_{\ell}\|x_{0}\|\right),
$$
for all $(x_{0},u_{0}^{\mu})∈\Sigma_{N}$ . Specifically, it is bounded by
$$
\mathcal{R}_{T}(x_{0})\leq\bar{M}\left(\tilde{\eta}_{\ell,0}\|\delta u_{0}\|+L{\Delta}^{\top}{\tilde{\eta}_{\ell}}\right),
$$
with $\bar{M},\Delta$ defined as in Theorem 1 and $\delta u_{0}=u_{-1}^{\mu}-\mu_{0}^{\star}(x_{0})$ . Moreover, if $\ell_{k}=\ell>\ell^{\star},∀ k∈\mathbb{N}$ , then, for all $(x_{0},u_{0}^{\mu})∈\Sigma_{N}$
$$
\mathcal{R}_{T}(x_{0})=\mathcal{O}\left(\frac{\eta^{\ell}}{1-\eta^{\ell}}\|x_{0}\|\right).
$$*
* Proof*
We start by showing that Assumptions 1 - 4 are satisfied in the MPC use case. In this setting, the linear dynamics to be controlled are given by
$$
x_{k+1}=Ax_{k}+\overline{B}\bar{u}_{k},\quad k\in\mathbb{N}, \tag{30}
$$
with the input $\bar{u}_{k}∈\mathbb{R}^{Nm}$ . First, we note that Assumption 3 is satisfied trivially with $L_{u}=\|\overline{B}\|$ . The benchmark controller is the optimal policy $\mu^{\star}$ that solves the POCP (22), and is given in (23). Taking $\mathcal{D}^{\star}$ to be the forward invariant ROA estimate $\mathcal{D}^{\star}=\Gamma_{N}$ , it follows from Lemma 5 that Assumption 1. i. is satisfied. Since the policy is time-invariant, Assumption 1. ii. is satisfied trivially with $a_{k}=0$ for all $k$ . To show that Assumption 1. iii. is satisfied, we use the analysis from Section 4. Exploiting the structure of the optimization problem (22), it has been shown in [16] that under Assumption 6, the solution of the MPC problem is piecewise-affine in the state. Moreover, the set $\Omega$ , which defines a forward invariant set around the origin for the dynamics (23) with inactive input constraints, is non-empty [4]. This and Lemma 5 imply that Assumption 5 is satisfied. As discussed in Section 5.1, the benchmark dynamics (23) are exponentially stable in $\mathcal{D}^{\star}$ ; therefore, by Corollary 2, we conclude that they are also E- $\delta$ -ISS in the same region. The suboptimal policy is given by (26a). The results in [17] show that for all $\ell_{k}≥\ell^{\star}$ , $\Sigma_{N}$ is a forward invariant ROA estimate for the combined dynamics (26). Given this and an initial $u_{-1}^{\mu}=0$ , consider $\mathcal{D}^{\mu}=\{x∈\mathbb{R}^{n}\ |\ (x,\mathcal{T}^{\ell}(x,u_{-1}^{\mu}))∈\Sigma_{N}\}$ . Then Assumption 2 is satisfied directly from Theorem 8 and the suboptimal policy definition. Finally, to reconcile the quadratic cost defined in (21) with the modified dynamics (30), we redefine the cost as
$$
J_{T}(x_{0},\bar{\boldsymbol{u}})=\|x_{T}\|^{2}_{P}+\sum_{k=0}^{T-1}\|x_{k}\|^{2}_{Q}+\|\bar{u}_{k}\|^{2}_{\overline{R}},
$$
where $\overline{R}=S^{\top}RS$ . For the quadratic costs in (21), and for any $(x,y)∈\mathcal{D}^{\mu}×\mathcal{D}^{\mu}$ and $(u,z)∈(\mathcal{N}×\mathcal{N})$
$$
|x^{\top}Qx-y^{\top}Qy+u^{\top}\overline{R}u-z^{\top}\overline{R}z|\leq 2x_{m}\max\{\|Q\|,\|P\|\}\|x-y\|+2u_{m}\|\overline{R}\|\|u-z\|,
$$
where $x_{m}$ and $u_{m}$ are such that $\|x\|≤ x_{m}$ and $\|u\|≤ u_{m}$ for all $x∈\mathcal{D}^{\mu},u∈\mathcal{N}$ . Then the condition in Assumption 4 is satisfied with $M_{x}=2x_{m}\max\{\|Q\|,\|P\|\}$ and $M_{u}=2u_{m}\|\overline{R}\|$ . Since Assumption 6 implies that Assumptions 1 - 4 are satisfied, we can invoke the bound in Theorem 1 for the suboptimality gap. As the suboptimal dynamics are exponentially stable, their pathlength is finite. In particular,
$$
\mathcal{S}_{T,2}^{2}=\sum_{k=1}^{T}\|x_{k}-x_{k-1}\|^{2}\leq 4\sum_{k=0}^{T}\|x_{k}\|^{2}\leq c_{0}^{2}\|x_{0}\|^{2}\sum_{k=0}^{T}\prod_{i=-1}^{k}\varepsilon_{i}^{2}\leq\frac{c_{0}^{2}}{1-\bar{\varepsilon}^{2}}\|x_{0}\|^{2},
$$
where the first inequality follows by the triangle inequality, the second from Lemma 6 and by denoting $c_{0}:=2h_{0}\|P^{-\frac{1}{2}}\|\|W^{\frac{1}{2}}\|$ , and the last one by bounding the geometric series and denoting $\bar{\varepsilon}:=\max\{\varepsilon_{i}\}_{i=0}^{T-1}$ . Noting that $a_{k}=0$ for all $k∈\mathbb{N}$ in this example, the suboptimality gap is bounded by $\mathcal{O}(\mathcal{S}_{T,2}\bar{\eta}_{\ell})=\mathcal{O}(\bar{\eta}_{\ell}\|x_{0}\|)$ . In the case when $\ell_{k}=\ell>\ell^{\star}$ for all $k∈\mathbb{N}$ , one can use the bound derived in Corollary 1, with a simplified expression for the pathlength
$$
\mathcal{S}_{T}\leq\frac{c}{1-\varepsilon}\|x_{0}\|.
$$
The above is obtained directly from the bound in Corollary 3 and by denoting $c:=2h\|P^{-\frac{1}{2}}\|\|W^{\frac{1}{2}}\|$ . ∎
The theorem shows that the suboptimality gap of the suboptimal MPC controller (26) scales with $\tilde{\eta}_{\ell}$ or $\eta^{\ell}$ , where the numbers of iterations $\ell_{k}$ are design parameters. The higher $\ell_{k}$ is, the more computation is required at each timestep, but the lower the suboptimality gap; in the limit, as $\ell→∞$ , the suboptimality gap vanishes. This is a tighter result than the one derived in [17], as the latter uses no incremental properties of the optimal controller. Specifically, in the limit case of $\eta^{\ell_{k}}=0,\;k∈\mathbb{N}$ , the suboptimality gap in [17] is strictly positive, while the bound in Theorem 9 vanishes, reflecting the exact matching of the suboptimal and benchmark trajectories. The derived bounds can be used by control designers to give a quantifiable measure of the finite-time suboptimality of the controller. This can then be utilized to find the best sequence of $\ell_{k}$ to deliver a desired tradeoff between suboptimality and computational power.
In practice, the suboptimal MPC can be asymptotically stable even when the number of optimization iterations $\ell$ is less than $\ell^{\star}$ . In this case, the existence of a forward invariant region of attraction $\mathcal{D}^{\mu}$ is not given, but the bounds in Theorem 1 and Corollary 1 still hold, as long as the closed-loop suboptimal system stays stable. This is shown in the following numerical example.
5.4 Numerical Example
The suboptimal TD-MPC scheme described in this section is implemented for the following linearized, continuous-time model of an inverted pendulum from [50], [17]
$$
A_{c}=\left[\begin{array}[]{cc}0&1\\
14.7&0\\
\end{array}\right],\>B_{c}=\left[\begin{array}[]{c}0\\
30\end{array}\right],
$$
where the state is $x=[\theta,~{}\dot{\theta}]^{→p}$ , $\theta$ is the angle relative to the equilibrium position and the control input is the applied torque. We consider the control of the discretized model of the plant with a sampling time of $T_{s}=0.1$ . The input constraint set is taken to be $\mathcal{U}=[-1,1]$ , the cost matrices are $Q=I_{2}$ , and $R=1$ and the initial state is $x_{0}=[-\pi/4~{}\pi/3]^{→p}$ .
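The discretized plant used in this example can be reproduced by zero-order-hold discretization via the augmented matrix exponential; ZOH sampling is an assumption here, as the text does not specify the discretization method:

```python
import numpy as np
from scipy.linalg import expm

# Continuous-time inverted pendulum model and sampling time from the example
A_c = np.array([[0.0, 1.0], [14.7, 0.0]])
B_c = np.array([[0.0], [30.0]])
Ts = 0.1

# Exact ZOH discretization: expm([[A_c, B_c], [0, 0]] * Ts) = [[A_d, B_d], [0, I]]
n, m = 2, 1
M = np.zeros((n + m, n + m))
M[:n, :n], M[:n, n:] = A_c, B_c
Md = expm(M * Ts)
A_d, B_d = Md[:n, :n], Md[:n, n:]
```

Since $A_{c}^{2}=14.7\,I$ , the discretized matrix has the closed form $A_{d}=\cosh(\omega T_{s})I+\frac{\sinh(\omega T_{s})}{\omega}A_{c}$ with $\omega=\sqrt{14.7}$ , which gives a convenient sanity check on the computation.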
Figure 3: The phase plot on the left shows two separate trajectories generated by applying respectively TD-MPC, with a constant $\ell=6$ (in green) and optimal MPC policies (in purple). The logarithmic scale plot on the right shows the empirical suboptimality gap (in green), as well as the order of the upper bound decrease (in red) as $\ell$ , and therefore the computation time is increased. The two curves on the right are parameterized by $\ell$ , ranging from $1$ to $5000$ .
The left panel of Figure 3 shows the evolution of two trajectories in closed loop with the TD-MPC policy with $\ell=6$ and with the optimal MPC. For this example $\ell^{\star}=849$ . However, even with the low value of $\ell=6$ , the closed-loop system stays stable, as also observed in [9, 50]. Although asymptotic/exponential stability cannot be proven, the finite-time bounds can still be computed online using only the suboptimal states, as per Corollary 1. The order of this upper bound, as well as the empirically observed suboptimality gap, $\mathcal{R}_{T}(x_{0})$ , are plotted in the right panel of Figure 3 for $T=30$ for a range of values of $\ell$ , increasing from $1$ to $5000$ . The decrease of the suboptimality gap for increasing values of $\ell$ is juxtaposed with the increase of simulation/computational time in the same figure. The simulation time for each $\ell$ is calculated as the sum of the times it takes to solve the TD-MPC for each timestep, over the horizon $T$ . To obtain an averaged value for this time, its average over $100$ repeated independent runs from the same initial conditions is taken. The initial states are initialized in $\Sigma_{N}$ following the procedure described in [50]. In the right panel of the figure, only the complexity $\frac{\eta^{\ell}}{1-\eta^{\ell}}\mathcal{S}_{T}$ of the suboptimality gap upper bound is plotted. The true constant multiplying this complexity term is much larger; our aim here is not to compare the very conservative theoretical bound with the practical performance, but to assess whether the bound captures the order correctly. Indeed, it can be observed from the figure that the order of the true suboptimality gap is approximately captured by the upper bound, albeit with an underestimation.
Among other possible uses of the closed-loop suboptimality analysis, the insights in the figure can be used to design the allocation of finite computational resources. The right-side plot in the figure can be used to estimate the relative gain in computational time and loss in optimality for a given change in $\ell$ . For example, a change from $\ell=6$ to $\ell=40$ , or equivalently from $\eta^{\ell}=0.92$ to $\eta^{\ell}=0.56$ , results in a $3.6$ times increase in computational time and a $9.3$ times decrease in the suboptimality gap bound. This provides an approximation for the variation of the true suboptimality gap, which decreased $8.7$ times in the same interval.
6 Conclusions
We study the finite-time suboptimality gap of policies for discrete-time nonlinear, time-varying systems. We show that when the benchmark policy is chosen to be exponentially incrementally stable, then given a geometric improvement condition on the suboptimal policy, its suboptimality gap scales with its pathlength and improvement factor. We further show that for non-smooth nonlinear systems, E- $\delta$ -ISS is implied by exponential stability under certain conditions. The assumptions are verified on the suboptimal linear-quadratic MPC use case and on a numerical example. The generality of the provided analysis enables the study of other examples where the suboptimality is due to unknown system parameters or cost functions, for example in the fields of adaptive and online control. We leave the analysis of these use cases, as well as the treatment of measurement and process noise, to future work.
Proof of Lemma 4
* Proof*
Given a state $\xi∈\mathcal{D}$ and some $N∈\mathbb{N}_{+}$ , let
$$
V(k_{0},\xi)=\sum_{k=0}^{N-1}\phi(k+k_{0},k_{0},\xi)^{\top}\phi(k+k_{0},k_{0},\xi).
$$
Then
$$
V(k_{0},\xi)=\|\xi\|^{2}+\sum_{k=1}^{N-1}\|\phi(k+k_{0},k_{0},\xi)\|^{2}\geq\|\xi\|^{2},
$$
and, from Definition 2
$$
V(k_{0},\xi)\leq\sum_{k=0}^{N-1}d^{2}\|\xi\|^{2}\lambda^{2k}\leq\frac{d^{2}}{1-\lambda^{2}}\|\xi\|^{2}.
$$
Thus, (13) is satisfied with $c_{1}=1$ and $c_{2}=\frac{d^{2}}{1-\lambda^{2}}$ . To show that (14) holds, consider
$$
V(k_{0}+1,f(k_{0},\xi))-V(k_{0},\xi)=\|\phi(N+k_{0},k_{0},\xi)\|^{2}-\|\xi\|^{2}\leq\left(d^{2}\lambda^{2N}-1\right)\|\xi\|^{2}.
$$
Choosing $N$ large enough such that $d^{2}\lambda^{2N}<1$ , ensures (14) holds with $\beta^{2}=1-\frac{\left(1-d^{2}\lambda^{2N}\right)}{c_{2}}∈(0,1)$ since $c_{2}≥ c_{1}=1$ and $c_{2}∈\mathbb{R}_{+}$ . Finally, for some $(\xi,\zeta)∈\mathcal{D}×\mathcal{D}$ and $k_{0}∈\mathbb{N}$ , denote $\Delta\phi(k,k_{0},\xi,\zeta):=\phi(k+k_{0},k_{0},\xi)-\phi(k+k_{0},k_{0},\zeta)$ and consider
$$
|V(k_{0},\xi)-V(k_{0},\zeta)|\leq\sum_{k=0}^{N-1}\left(\|\phi(k+k_{0},k_{0},\xi)\|+\|\phi(k+k_{0},k_{0},\zeta)\|\right)\|\Delta\phi(k,k_{0},\xi,\zeta)\|\leq\sum_{k=0}^{N-1}d\lambda^{k}L_{f}^{k}\left(\|\xi\|+\|\zeta\|\right)\|\xi-\zeta\|,
$$
where the last inequality follows from the exponential stability and $L_{f}$ -Lipschitz continuity of the nonlinear mapping. Taking $c_{3}=\sum_{k=0}^{N-1}d\lambda^{k}L_{f}^{k}$ completes the proof. ∎
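For completeness, the decrement step behind the choice of $\beta^{2}$ in the proof above can be written out explicitly: combining the decrease bound with the upper bound $V(k_{0},\xi)\leq c_{2}\|\xi\|^{2}$ , i.e. $\|\xi\|^{2}\geq V(k_{0},\xi)/c_{2}$ , gives
$$
V(k_{0}+1,f(k_{0},\xi))\leq V(k_{0},\xi)-\left(1-d^{2}\lambda^{2N}\right)\|\xi\|^{2}\leq\left(1-\frac{1-d^{2}\lambda^{2N}}{c_{2}}\right)V(k_{0},\xi)=\beta^{2}V(k_{0},\xi),
$$
so that (14) indeed holds with the stated $\beta^{2}∈(0,1)$ .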
Proof of Theorem 4
* Proof*
The condition in (15) implies the contraction of the dynamics within the contraction region $\mathcal{\tilde{D}}_{0}$ , as defined in [12]. We will first show that there exists an $r_{\mathcal{\tilde{D}}_{0}}$ such that for all initial states $x∈\mathcal{B}(r_{\mathcal{\tilde{D}}_{0}})$ the dynamics (11) evolve in a forward invariant set contained in $\mathcal{\tilde{D}}_{0}$ . To see this, for some $r∈\mathbb{R}_{+}$ and $k∈\mathbb{N}_{≥ k_{0}}$ , define $\mathcal{E}_{0}(k,r):=\{x∈\mathbb{R}^{n}\ |\ \|x\|_{P(k)}≤ r\}$ , and $r^{\star}:=\max\{r>0\ |\ \mathcal{E}_{0}(k,r)\subseteq\mathcal{\tilde{D}}_{0},\;∀ k∈\mathbb{N}_{≥ k_{0}}\}$ . This implies that if a state satisfies $x∈\mathcal{E}_{0}(k,r^{\star})$ , then necessarily $x∈\mathcal{\tilde{D}}_{0}$ . The largest ball contained in the intersection of the sets $\mathcal{E}_{0}(k,r^{\star}),\;k∈\mathbb{N}_{≥ k_{0}}$ , is then given by $\mathcal{B}(\frac{r^{\star}}{\bar{P}})$ , where $P(k)\preceq\bar{P}I$ for all $k∈\mathbb{N}_{≥ k_{0}}$ , and $\bar{P}>0$ exists since $P(k)$ is uniformly bounded. Consider now the differential displacement $\delta x$ for the dynamics (11), defining an infinitesimal displacement at a given time $k$ . The shortest path integral with respect to the metric $P(k)$ between $0$ and a given state $x_{k}$ at time $k$ is given by
$$
V_{\ell}(\delta x_{k},k):=\int_{0}^{x_{k}}\|\delta x\|_{P(k)}=\|x_{k}\|_{P(k)},
$$
where the second equality follows from the fact that the shortest path integral corresponds to the Riemannian distance (see e.g. [52, 11, 13]). Using the same arguments as in [13, Thm 2.8] or [11, Thm. 15], one can then show that for any $x_{k}∈\mathcal{B}(\frac{r^{\star}}{\bar{P}})$ , it holds that $V_{\ell}(\delta x_{k+1},k+1)≤\rho V_{\ell}(\delta x_{k},k)<r^{\star}$ . This, in turn, implies that $\|x_{k+1}\|_{P(k+1)}<r^{\star}\implies x_{k+1}∈\mathcal{E}_{0}(k+1,r^{\star})\implies x_{k+1}∈\mathcal{\tilde{D}}_{0}$ . Taking $r_{\mathcal{\tilde{D}}_{0}}=\frac{r^{\star}}{\bar{P}}$ completes the claim that the dynamics are forward invariant within $\mathcal{\tilde{D}}_{0}$ . For any initial state in $\mathcal{B}(r_{\mathcal{\tilde{D}}_{0}})$ , uniform local exponential incremental stability with rate $\rho$ then follows from standard Lyapunov arguments, see e.g. [13, Thm 2.8]. ∎
Proof of Theorem 5
The proof hinges on extending results from [44], [53], [10] and [11]. Before presenting the proof, we provide the following auxiliary definition and theorem.
**Definition 4**
*[53, 44] Given a closed, positively invariant set $\mathcal{A}⊂\mathbb{R}^{n}$ , the system (11) is uniformly exponentially stable in $\mathcal{D}$ with respect to $\mathcal{A}$ , if there exist constants $d∈\mathbb{R}_{+}$ and $\lambda∈(0,1)$ , such that for all $\xi∈\mathcal{D}$ and $k∈\mathbb{N}_{≥ k_{0}}$
$$
|\phi(k,k_{0},\xi)|_{\mathcal{A}}\leq d|\xi|_{\mathcal{A}}\lambda^{k-k_{0}}.
$$*
The extension of the converse Lyapunov results from [10] and [44] to the case of stability in a forward invariant region is included below for completeness.
**Theorem 10**
*If the system (11) is uniformly exponentially stable in $\mathcal{D}$ with respect to some closed set $\mathcal{A}$ , then there exists a function $V:\mathbb{N}_{≥ k_{0}}×\mathcal{D}→\mathbb{R}_{+}$ and constants $c_{1},c_{2}∈\mathbb{R}_{+}$ and $\beta∈(0,1)$ , such that for all $\xi∈\mathcal{D}$ , and $k∈\mathbb{N}_{≥ k_{0}}$
$$
c_{1}|\xi|^{2}_{\mathcal{A}}\leq V(k,\xi)\leq c_{2}|\xi|^{2}_{\mathcal{A}}, \tag{31}
$$
$$
V\left(k+1,f(k,\xi)\right)\leq\beta^{2}V(k,\xi). \tag{32}
$$*
* Proof*
Following the same line of arguments as in the proof of Lemma 4, consider a $\xi∈\mathcal{D}$ and some $N∈\mathbb{N}_{+}$ , and let
$$
V(k_{0},\xi)=\sum_{k=0}^{N-1}|\phi(k+k_{0},k_{0},\xi)|^{2}_{\mathcal{A}}, \tag{33}
$$
for some $k_{0}∈\mathbb{N}$ . Then,
$$
V(k_{0},\xi)=|\xi|^{2}_{\mathcal{A}}+\sum_{k=1}^{N-1}|\phi(k+k_{0},k_{0},\xi)|^{2}_{\mathcal{A}}\geq|\xi|^{2}_{\mathcal{A}},
$$
and, from Definition 4
$$
V(k_{0},\xi)\leq\frac{d^{2}}{1-\lambda^{2}}|\xi|^{2}_{\mathcal{A}}.
$$
Next, consider
$$
V(k_{0}+1,f(k_{0},\xi))-V(k_{0},\xi)=|\phi(N+k_{0},k_{0},\xi)|^{2}_{\mathcal{A}}-|\xi|^{2}_{\mathcal{A}}\leq\left(d^{2}\lambda^{2N}-1\right)|\xi|^{2}_{\mathcal{A}}.
$$
Choosing $N$ large enough such that $d^{2}\lambda^{2N}<1$ completes the proof with $c_{1}=1$ , $c_{2}=\frac{d^{2}}{1-\lambda^{2}}$ , and $\beta^{2}=1-\frac{\left(1-d^{2}\lambda^{2N}\right)}{c_{2}}∈(0,1)$ since $c_{2}≥ c_{1}=1$ and $c_{2}∈\mathbb{R}_{+}$ . ∎
* Proof of Theorem 5*
Consider the augmented system
$$
\left\{\begin{array}[]{l}\xi_{k+1}=f(k,\xi_{k})\\
\zeta_{k+1}=f(k,\zeta_{k}).\end{array}\right.
$$
We define the diagonal as the set $\Delta:=\{\left[\xi^{\top},\xi^{\top}\right]^{\top}∈\mathcal{D}×\mathcal{D}:\xi∈\mathcal{D}\}$ . Let $z:=[\xi^{\top},\zeta^{\top}]^{\top}∈\mathcal{D}×\mathcal{D}$ ; then it is shown in [10] that
$$
|z|_{\Delta}=\frac{\sqrt{2}}{2}\|\xi-\zeta\|. \tag{34}
$$
Then, considering the evolution of the combined system
$$
z_{k+1}=\mathcal{F}(k,z_{k}):=\begin{bmatrix}f(k,\xi_{k})\\
f(k,\zeta_{k})\end{bmatrix}, \tag{35}
$$
one can note that $z_{k}∈\mathcal{D}×\mathcal{D}$ for all $k∈\mathbb{N}_{≥ k_{0}}$ , for any $(\xi_{k_{0}},\zeta_{k_{0}})∈\mathcal{D}×\mathcal{D}$ , since $\mathcal{D}$ is forward invariant. It follows that system (11) is exponentially incrementally stable in $\mathcal{D}$ if and only if (35) is exponentially stable in $\mathcal{D}$ with respect to the diagonal $\Delta$ . Moreover, using Theorem 10, the combined system (35) admits a Lyapunov function satisfying (31)-(32). Given a $z∈\mathcal{D}×\mathcal{D}$ , it follows from the equivalence in (34), and the proof of Theorem 10, that (16)-(17) are satisfied with $c_{1}=\frac{\sqrt{2}}{2}$ , and $c_{2}=\frac{d^{2}}{\sqrt{2}-\lambda^{2}\sqrt{2}}$ for the following Lyapunov function
$$
V(k_{0},z)=\sum_{k=0}^{N-1}|\phi_{\mathcal{F}}(k_{0}+k,k_{0},z)|^{2}_{\Delta},
$$
where $\phi_{\mathcal{F}}:\mathbb{N}_{≥ k_{0}}×\mathbb{N}_{≥ k_{0}}×\mathbb{R}^{2n}→\mathbb{R}^{2n}$ is the state transition map for the system (35). We show the inequality (18) next. For any $(\xi,\zeta,\tilde{\xi},\tilde{\zeta})∈\mathcal{D}×\mathcal{D}×\mathcal{D}×\mathcal{D}$ , $k∈\mathbb{N}_{≥ k_{0}}$ , $w:=[\tilde{\xi}^{\top},\tilde{\zeta}^{\top}]^{\top}$ and $z$ defined as above
$$
\begin{split}&|V(k_{0},z)-V(k_{0},w)|\\
&=\left|\sum_{k=0}^{N-1}\left(|\phi_{\mathcal{F}}(k_{0}+k,k_{0},z)|^{2}_{\Delta}-|\phi_{\mathcal{F}}(k_{0}+k,k_{0},w)|^{2}_{\Delta}\right)\right|\end{split}
$$
where the last two inequalities follow from the Lipschitz continuity and exponential stability (with respect to $\Delta$ ) of the solutions. The first inequality follows from the following. For any two vectors $\xi,\zeta∈\mathcal{D}$ and closed, convex set $\mathcal{A}$
$$
|\xi|_{\mathcal{A}}-|\zeta|_{\mathcal{A}}=|\xi|_{\mathcal{A}}-\|\zeta-\zeta_{p}\|\leq\|\xi-\zeta_{p}\|-\|\zeta-\zeta_{p}\|\leq\|\xi-\zeta\|,
$$
where $\zeta_{p}∈\mathcal{A}$ is such that $|\zeta|_{\mathcal{A}}=\|\zeta-\zeta_{p}\|$ , which exists and is unique, since $\mathcal{A}$ is closed and convex. Finally, choosing $c_{3}=\sum_{k=0}^{N-1}\frac{d\lambda^{k}L^{k}}{\sqrt{2}}$ completes the proof. ∎
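The resulting 1-Lipschitz property of the point-to-set distance, $\big||\xi|_{\mathcal{A}}-|\zeta|_{\mathcal{A}}\big|\leq\|\xi-\zeta\|$, can be illustrated numerically. The sketch below (an illustration, not from the paper) takes $\mathcal{A}$ to be the closed unit Euclidean ball, for which the distance has the closed form $|x|_{\mathcal{A}}=\max(0,\|x\|-1)$:

```python
import math
import random

# Numeric check of | |xi|_A - |zeta|_A | <= ||xi - zeta|| for a closed convex
# set A, here the closed unit Euclidean ball, whose distance function is
# |x|_A = max(0, ||x|| - 1) with unique projection x / max(1, ||x||).

def dist_to_unit_ball(x):
    return max(0.0, math.hypot(*x) - 1.0)

random.seed(0)
ok = True
for _ in range(1000):
    xi = [random.uniform(-3, 3), random.uniform(-3, 3)]
    zeta = [random.uniform(-3, 3), random.uniform(-3, 3)]
    ok = ok and abs(dist_to_unit_ball(xi) - dist_to_unit_ball(zeta)) \
        <= math.dist(xi, zeta) + 1e-12
print(ok)  # True
```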
Proof of Theorem 6
*Proof:*
Given the nonlinear system (11), consider the evolution of two trajectories, respectively unperturbed and perturbed,
$$
x_{k+1}=f(k,x_{k}),\qquad y_{k+1}=f(k,y_{k})+w_{k},
$$
for some $(x_{k_{0}},y_{k_{0}})∈\mathcal{D}×\mathcal{D}$ , where $w_{k}∈\mathcal{B}(r_{w})$ , for some $r_{w}∈\mathbb{R}_{+}$ , is such that $y_{k}∈\mathcal{D}$ for all $k∈\mathbb{N}_{≥ k_{0}}$ . Since $f$ is uniformly exponentially incrementally stable in $\mathcal{D}$ , from Theorem 5 there exists a Lyapunov function $V:\mathbb{N}_{≥ k_{0}}×\mathcal{D}×\mathcal{D}→\mathbb{R}_{+}$ satisfying (16)-(18). Then, for any $(x,y)∈\mathcal{D}×\mathcal{D}$ , $k∈\mathbb{N}_{≥ k_{0}}$ , and $w∈\mathcal{B}(r_{w})$ ,
$$
\begin{split}
&V\left(k+1,f(k,x),f(k,y)+w\right)-V\left(k,x,y\right)\\
&=V\left(k+1,f(k,x),f(k,y)+w\right)-V\left(k+1,f(k,x),f(k,y)\right)\\
&\quad+V\left(k+1,f(k,x),f(k,y)\right)-V\left(k,x,y\right)\\
&\leq c_{3}\left(\|f(k,x)-f(k,y)-w\|+\|f(k,x)-f(k,y)\|\right)\|w\|-(1-\beta^{2})c_{1}\|x-y\|^{2}\\
&\leq c_{3}\left(2\|f(k,x)-f(k,y)\|+\|w\|\right)\|w\|-(1-\beta^{2})c_{1}\|x-y\|^{2},
\end{split}
$$
where the first inequality follows from Theorem 5, and the second from the triangle inequality. Completing the square and denoting $c_{4}:=(1-\beta^{2})c_{1}$ , it follows from the above that
$$
V\left(k+1,f(k,x),f(k,y)+w\right)-V\left(k,x,y\right)\leq\cdots\leq-\frac{c_{4}}{2c_{2}}V\left(k,x,y\right)+\frac{2c_{3}^{2}L_{f}^{2}}{c_{4}}\|w\|^{2},
$$
where the second inequality follows from the fact that $\left(a+b\right)^{2}≤ 2a^{2}+2b^{2}$ for any $a,b∈\mathbb{R}$ , and the last inequality from Theorem 5. Finally, rearranging the above Lyapunov inequalities, it follows that
$$
V\left(k+1,f(k,x),f(k,y)+w\right)\leq\rho^{2}V\left(k,x,y\right)+c_{5}\|w\|^{2},
$$
for all $k∈\mathbb{N}_{≥ k_{0}}$ , with $\rho^{2}:=1-\frac{c_{4}}{2c_{2}}∈(0,1)$ , since $c_{2}>c_{4}$ , and $c_{5}:=\frac{2c_{3}^{2}L_{f}^{2}}{c_{4}}$ . Unrolling the recursion and using (16)
$$
\begin{split}
c_{1}\|x_{k}-y_{k}\|^{2}&\leq V\left(k,x_{k},y_{k}\right)\\
&\leq\rho^{2(k-k_{0})}V\left(k_{0},x_{k_{0}},y_{k_{0}}\right)+c_{5}\sum_{j=k_{0}}^{k-1}\rho^{2(k-1-j)}\|w_{j}\|^{2}\\
&\leq c_{2}\rho^{2(k-k_{0})}\|x_{k_{0}}-y_{k_{0}}\|^{2}+\frac{c_{5}}{1-\rho^{2}}\sup_{j\geq k_{0}}\|w_{j}\|^{2}.
\end{split}
$$
Dividing by $c_{1}$ , taking the square root, and using $\sqrt{a+b}\leq\sqrt{a}+\sqrt{b}$ for $a,b\geq 0$ completes the proof:
$$
\|x_{k}-y_{k}\|\leq\sqrt{\frac{c_{2}}{c_{1}}}\rho^{(k-k_{0})}\|x_{k_{0}}-y_{k_{0}}\|+\sqrt{\frac{c_{5}}{c_{1}(1-\rho^{2})}}\sup_{j\geq k_{0}}\|w_{j}\|.
$$
∎
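The closing recursion $V_{k+1}\leq\rho^{2}V_{k}+c_{5}\|w_{k}\|^{2}$ and its unrolled geometric bound can be checked on toy numbers; in the sketch below, the constants and the disturbance sequence are illustrative placeholders, not values from the paper:

```python
# Toy check that iterating V_{k+1} <= rho^2 V_k + c5 ||w_k||^2 stays below the
# unrolled bound rho^(2k) V_0 + c5 / (1 - rho^2) * max_j ||w_j||^2.

rho2, c5 = 0.8, 1.5          # rho^2 in (0,1); placeholder constants
w2 = [0.3, 0.1, 0.25, 0.05]  # squared disturbance norms ||w_k||^2
V0 = 10.0                    # initial Lyapunov value V(k0, x_{k0}, y_{k0})

Vk, bound_ok = V0, True
for k, wk2 in enumerate(w2, start=1):
    Vk = rho2 * Vk + c5 * wk2                          # worst case: equality
    bound = rho2 ** k * V0 + c5 / (1 - rho2) * max(w2)
    bound_ok = bound_ok and Vk <= bound + 1e-12
print(bound_ok)  # True
```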
References
- [1] L. S. Pontryagin, Mathematical theory of optimal processes. CRC Press, 1987.
- [2] R. Bellman, “Dynamic programming,” Science, vol. 153, no. 3731, pp. 34–37, 1966.
- [3] D. Bertsekas, Lessons from AlphaZero for optimal, model predictive, and adaptive control. Athena Scientific, 2022.
- [4] D. Bertsekas, Dynamic programming and optimal control: Volume I, vol. 4. Athena Scientific, 2012.
- [5] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
- [6] K. J. Åström and B. Wittenmark, Adaptive control. Courier Corporation, 2013.
- [7] Y. Li, X. Chen, and N. Li, “Online optimal control with linear dynamics and predictions: Algorithms and regret analysis,” Advances in Neural Information Processing Systems, vol. 32, 2019.
- [8] E. Hazan, S. Kakade, and K. Singh, “The nonstochastic control problem,” in Algorithmic Learning Theory, pp. 408–421, PMLR, 2020.
- [9] D. Liao-McPherson, T. Skibik, J. Leung, I. Kolmanovsky, and M. M. Nicotra, “An analysis of closed-loop stability for linear model predictive control based on time-distributed optimization,” IEEE Transactions on Automatic Control, vol. 67, no. 5, pp. 2618–2625, 2021.
- [10] D. Angeli, “A Lyapunov approach to incremental stability properties,” IEEE Transactions on Automatic Control, vol. 47, no. 3, pp. 410–421, 2002.
- [11] D. N. Tran, B. S. Rüffer, and C. M. Kellett, “Convergence properties for discrete-time nonlinear systems,” IEEE Transactions on Automatic Control, vol. 64, no. 8, pp. 3415–3422, 2018.
- [12] W. Lohmiller and J.-J. E. Slotine, “On contraction analysis for non-linear systems,” Automatica, vol. 34, no. 6, pp. 683–696, 1998.
- [13] H. Tsukamoto, S.-J. Chung, and J.-J. E. Slotine, “Contraction theory for nonlinear stability analysis and learning-based control: A tutorial overview,” Annual Reviews in Control, vol. 52, pp. 135–169, 2021.
- [14] A. Davydov, S. Jafarpour, and F. Bullo, “Non-euclidean contraction theory for robust nonlinear stability,” IEEE Transactions on Automatic Control, vol. 67, no. 12, pp. 6667–6681, 2022.
- [15] F. Bullo, Contraction Theory for Dynamical Systems. Kindle Direct Publishing, 1.1 ed., 2023.
- [16] A. Bemporad, M. Morari, V. Dua, and E. N. Pistikopoulos, “The explicit linear quadratic regulator for constrained systems,” Automatica, vol. 38, no. 1, pp. 3–20, 2002.
- [17] A. Karapetyan, E. C. Balta, A. Iannelli, and J. Lygeros, “On the finite-time behavior of suboptimal linear model predictive control,” in 2023 62nd IEEE Conference on Decision and Control (CDC), pp. 5053–5058, IEEE, 2023.
- [18] N. Hovakimyan and C. Cao, $\mathcal{L}_{1}$ adaptive control theory: Guaranteed robustness with fast adaptation. SIAM, 2010.
- [19] N. M. Boffi, S. Tu, and J.-J. E. Slotine, “Regret bounds for adaptive nonlinear control,” in Learning for Dynamics and Control, pp. 471–483, PMLR, 2021.
- [20] A. Karapetyan, E. C. Balta, A. Tsiamis, A. Iannelli, and J. Lygeros, “On the regret of recursive methods for discrete-time adaptive control with matched uncertainty,” 2024 63rd IEEE Conference on Decision and Control (CDC), arXiv preprint arXiv:2404.02023, 2024.
- [21] G. Belgioioso, D. Liao-McPherson, M. H. de Badyn, S. Bolognani, R. S. Smith, J. Lygeros, and F. Dörfler, “Online feedback equilibrium seeking,” IEEE Transactions on Automatic Control, vol. 70, no. 1, pp. 203–218, 2025.
- [22] Z. He, S. Bolognani, J. He, F. Dörfler, and X. Guan, “Model-free nonlinear feedback optimization,” IEEE Transactions on Automatic Control, 2023.
- [23] M. Diehl, R. Findeisen, F. Allgöwer, H. G. Bock, and J. P. Schlöder, “Nominal stability of real-time iteration scheme for nonlinear model predictive control,” IEE Proceedings-Control Theory and Applications, vol. 152, no. 3, pp. 296–308, 2005.
- [24] A. Zanelli, Q. Dinh, and M. Diehl, “A Lyapunov function for the combined system-optimizer dynamics in nonlinear model predictive control,” Automatica, 2020.
- [25] D. Liao-McPherson, M. M. Nicotra, and I. Kolmanovsky, “Time-distributed optimization for real-time model predictive control: Stability, robustness, and constraint satisfaction,” Automatica, vol. 117, p. 108973, 2020.
- [26] M. N. Zeilinger, C. N. Jones, D. M. Raimondo, and M. Morari, “Real-time MPC-stability through robust MPC design,” in Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference, pp. 3980–3986, IEEE, 2009.
- [27] S. Richter, C. N. Jones, and M. Morari, “Computational complexity certification for real-time MPC with input constraints based on the fast gradient method,” IEEE Transactions on Automatic Control, vol. 57, no. 6, pp. 1391–1403, 2011.
- [28] M. N. Zeilinger, C. N. Jones, and M. Morari, “Real-time suboptimal model predictive control using a combination of explicit MPC and online optimization,” IEEE Transactions on Automatic Control, vol. 56, no. 7, pp. 1524–1534, 2011.
- [29] L. Grüne, J. Pannek, M. Seehafer, and K. Worthmann, “Analysis of unconstrained nonlinear MPC schemes with time varying control horizon,” SIAM Journal on Control and Optimization, vol. 48, no. 8, pp. 4938–4962, 2010.
- [30] A. Ferramosca, D. Limón, I. Alvarado, and E. F. Camacho, “Cooperative distributed MPC for tracking,” Automatica, vol. 49, no. 4, pp. 906–914, 2013.
- [31] K. P. Wabersich and M. N. Zeilinger, “Cautious Bayesian MPC: Regret analysis and bounds on the number of unsafe learning episodes,” IEEE Transactions on Automatic Control, vol. 68, no. 8, pp. 4896–4903, 2022.
- [32] Y. Lin, Y. Hu, G. Qu, T. Li, and A. Wierman, “Bounded-regret MPC via perturbation analysis: Prediction error, constraints, and nonlinearity,” Advances in Neural Information Processing Systems, vol. 35, pp. 36174–36187, 2022.
- [33] B. P. Demidovich, “Lectures on stability theory (in Russian),” 1967.
- [34] A. L. Dontchev, M. Krastanov, R. T. Rockafellar, and V. M. Veliov, “An Euler–Newton continuation method for tracking solution trajectories of parametric variational inequalities,” SIAM Journal on Control and Optimization, vol. 51, no. 3, pp. 1823–1840, 2013.
- [35] S. M. Robinson, “Strongly regular generalized equations,” Mathematics of Operations Research, vol. 5, no. 1, pp. 43–62, 1980.
- [36] A. L. Dontchev, I. Kolmanovsky, M. I. Krastanov, M. Nicotra, and V. M. Veliov, “Lipschitz stability in discretized optimal control with application to SQP,” SIAM Journal on Control and Optimization, vol. 57, no. 1, pp. 468–489, 2019.
- [37] M. N. Zeilinger, M. Morari, and C. N. Jones, “Soft constrained model predictive control with robust stability guarantees,” IEEE Transactions on Automatic Control, vol. 59, no. 5, pp. 1190–1202, 2014.
- [38] N. Chatzikiriakos, K. P. Wabersich, F. Berkel, P. Pauli, and A. Iannelli, “Learning soft constrained MPC value functions: Efficient MPC design and implementation providing stability and safety guarantees,” in 6th Annual Learning for Dynamics & Control Conference, pp. 387–398, PMLR, 2024.
- [39] A. B. Taylor, J. M. Hendrickx, and F. Glineur, “Exact worst-case convergence rates of the proximal gradient method for composite convex minimization,” Journal of Optimization Theory and Applications, vol. 178, no. 2, pp. 455–476, 2018.
- [40] R. Nishihara, L. Lessard, B. Recht, A. Packard, and M. Jordan, “A general analysis of the convergence of ADMM,” in International conference on machine learning, pp. 343–352, PMLR, 2015.
- [41] A. Srikanthan, A. Karapetyan, V. Kumar, and N. Matni, “Closed-loop analysis of ADMM-based suboptimal linear model predictive control,” IEEE Control Systems Letters, vol. 8, pp. 3195–3200, 2024.
- [42] H. Khalil, Nonlinear systems, 3rd ed., vol. 115. Upper Saddle River, NJ, USA: Prentice-Hall, 2002.
- [43] W. M. Haddad and V. Chellaboina, Nonlinear dynamical systems and control: a Lyapunov-based approach. Princeton university press, 2008.
- [44] Z.-P. Jiang and Y. Wang, “A converse Lyapunov theorem for discrete-time systems with disturbances,” Systems & control letters, vol. 45, no. 1, pp. 49–58, 2002.
- [45] A. Pavlov, A. Pogromsky, N. van de Wouw, and H. Nijmeijer, “Convergent dynamics, a tribute to Boris Pavlovich Demidovich,” Systems & Control Letters, vol. 52, no. 3-4, pp. 257–261, 2004.
- [46] Z.-P. Jiang and Y. Wang, “Input-to-state stability for discrete-time nonlinear systems,” Automatica, vol. 37, no. 6, pp. 857–869, 2001.
- [47] J. Köhler and M. N. Zeilinger, “Globally stable and locally optimal model predictive control using a softened initial state constraint–extended version,” arXiv preprint arXiv:2207.10216, 2022.
- [48] W. J. Rugh, Linear system theory. Prentice-Hall, Inc., 1996.
- [49] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. Scokaert, “Constrained model predictive control: Stability and optimality,” Automatica, vol. 36, no. 6, pp. 789–814, 2000.
- [50] J. Leung, D. Liao-McPherson, and I. V. Kolmanovsky, “A computable plant-optimizer region of attraction estimate for time-distributed linear model predictive control,” in 2021 American Control Conference (ACC), pp. 3384–3391, IEEE, 2021.
- [51] D. Limón, T. Alamo, F. Salas, and E. F. Camacho, “On the stability of constrained MPC without terminal constraint,” IEEE Transactions on Automatic Control, vol. 51, no. 5, pp. 832–836, 2006.
- [52] S. Singh, A. Majumdar, J.-J. Slotine, and M. Pavone, “Robust online motion planning via contraction theory and convex optimization,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 5883–5890, IEEE, 2017.
- [53] E. D. Sontag and Y. Wang, “New characterizations of input-to-state stability,” IEEE Transactions on Automatic Control, vol. 41, no. 9, pp. 1283–1294, 1996.