### Prediction of Dynamical Systems

It goes without saying, the prediction of dynamical systems is an essential area of scientific research and industry. Often, it is beneficial to engineers and scientists to develop surrogate models for systems that may otherwise require expensive simulations to resolve. The best way to think of a surrogate is a cheap and fairly accurate approximation to the true physics, something to get you in the ball park. However, for most problems of interest, the underlying dynamics tend to be extremely complex which makes modeling a challenge. The traditional approach would be to make approximations of the true physics such as the use of a closure model, yet this can introduce prediction error and uncertainty. This raised the question: Is there some method to project a dynamical system into space that simplifies the dynamics without physical approximations?

Koopman operator theory allows for precisely this, a method to simply the dynamics that can theoretically be applied to all physical phenomena. In recent years, Koopman subspaces have had a growing interest in academia due to its theoretical promises. Here we will discuss the foundations of Koopman operators and how machine learning can be used to discover such embedding to simplify complex dynamics.

### Koopman Operators

In the context of Koopman operators, we will focus on dynamical systems that can be modeled as: $\dot{\boldsymbol{x}} = f(\boldsymbol{x}),$ where $\boldsymbol{x}$ is an element of the state space $S\in\mathbb{R}^{n}$ (the current state of the system) and $f$ is a vector field that lies in same space as $S$. More often than not, dynamical systems are typically discretized for the sake of numerical analysis which leads to the following representation of the same system but with a discrete time map: $\boldsymbol{x}^{t+1} = T(\boldsymbol{x}^{t}),$ in which $T(\cdot) : S \rightarrow S$ can be referred to as the dynamic mapping which performs time-integration on the state variables. Given that we will be interested in the prediction of time-steps of the dynamical system, we will be generally interested in this discrete time formulation.

In 1931, Bernard Koopman proposed the existence of a compositional operator for all dynamical systems that can be expressed as the equations above [1]. Now known as the Koopman operator (or the left-adjoint of the Frobenius-Perron operator), this allows for a discrete dynamical system to be decomposed as follows: $Ug(\boldsymbol{x}) = g \circ T(\boldsymbol{x}),$ in which $U$ denotes the Koopman operator which is nothing more than a linear transformation. The continuous time version of the Koopman operator contains a single time parameter, but essentially follows the same form: $U(t)g(\boldsymbol{x}) = g \circ F(t,\boldsymbol{x}),$ in which $F$ is the flow map between the initial state to time $t$ in continuous space. Now you may be thinking “Wow, any dynamical system can be evolved using a linear transformation! That sounds fantastic!” Indeed it does, however the catch is that the Koopman operator updates some unknown vector of real valued observables of this dynamical system denoted by $g: S \rightarrow \mathbb{R}$. Typically, these values $g$ are referred to as the Koopman observables. To make matters even more difficult, theoretically, the vector space of these observables is infinite which practically speaking will need to be approximated. We can view the use of Koopman operators as a trade off from simple observables and complex dynamics to complex observables and simple dynamics.

At this point this may sound extremely silly. Why would I ever want to work with a dynamical representation with observables I don’t know and are theoretically infinite for any given system? A perfectly valid question and you would be in good company. There is a reason why this theory layed somewhat dormant until recently.

### Koopman Spectral Analysis

For the sake of discussion, lets just pretend we know the Koopman operator and observables. Typically, the Koopman operator and its observables are decomposed into eigenvalues and functions to better understand the underlying characteristics: $U \psi_{i} = \lambda_{i} \psi_{i}, \quad g(\boldsymbol{x}) = \sum_{i=1}^{\infty} \psi_{i}(\boldsymbol{x})c_{i},$ where $\left\{\lambda_{1}, \lambda_{2},…\right\}$ and $\left\{\psi_{1}, \psi_{2},…\right\}$ are the eigenvalues and eigenfunctions of the Koopman operator, respectively. With this decomposition, we suppose that the observables can also be expressed though the linear summation of the eigenfunctions and set of coefficients $\left\{c{1}, c{2},…\right\}$. Remember our Koopman operator is theoretically infinite dimensional, so theoretically we have an infinite set of orthogonal basis functions that we can project onto. A key insight is that $g(\boldsymbol{x})$ can be expressed at any time-step (we are dealing with discrete time here) though the simple summation: $g(\boldsymbol{x}_{t}) =\sum^{\infty}_{i=1} \lambda_{i}^{t}\psi_{i}(\boldsymbol{x}_{0})c_{i}.$ Time-step indexes have been switched to subscripts (i.e. $\boldsymbol{x}_{t}$ and $\boldsymbol{x}_{0}$) to not draw confusion with exponential operations. We arrive at this result by repeatedly multiplying $g(\boldsymbol{x}_{0})$ by the Koopman operator $U$ until the system is evolved to time-step $t$. This result tells us that $g(\cdot)$ is based on the Koopman modes $\psi_{i}(\boldsymbol{x}_{0})c_{i}$ which evolve with the frequency and decay rate of $\angle \lambda_{i}$ and $|\lambda_{i}|$. Note how these eigenvalues and Koopman modes are in fact invariant to time. This Koopman Mode Decomposition is a popular approach to model non-linear dynamics as both the eigenvalues and eigenvectors can reveal insightful phenomena of the underlying physics. To explicitly make sure this invariance is understood, the value of the Koopman observables at three different time-steps follows: $g(\boldsymbol{x}_{1}) =\sum^{\infty}_{i=1} \lambda_{i}\psi_{i}(\boldsymbol{x}_{0})c_{i}, \quad g(\boldsymbol{x}_{2}) =\sum^{\infty}_{i=1} \lambda_{i}\lambda_{i}\psi_{i}(\boldsymbol{x}_{0})c_{i}, \quad g(\boldsymbol{x}_{3}) =\sum^{\infty}_{i=1} \lambda_{i}\lambda_{i}\lambda_{i}\psi_{i}(\boldsymbol{x}_{0})c_{i}.$

### Machine Learning Koopman Embeddings

With an understanding of what the Koopman operator is and the physical insights it can provide, the objective is to now formulate a method for learning these Koopman observables, $g(\boldsymbol{x})$, as well as the Koopman operator itself. In the age of big data, machine learning methods have become a viable way for discovering these Koopman subspaces relatively efficiently. This is the exact reason there has been a resurgence in interest regarding Koopman operators in recent years. Of course, anyone with a background in machine learning knows there is a surplus of different methods to typically achieve the same goal and learning Koopman operators is no exception. Common methods include dynamic mode decomposition (DMD) as well as its similar variants such as extended DMD (EDMD) or sparse identification of non-linear dynamics (SINDy). More recently, the growing interest in deep learning has resulted in several works also exploring the use of deep neural networks for learning Koopman embeddings as well.

###### DMD/EDMD/SINDy:
• Williams, Matthew O., Ioannis G. Kevrekidis, and Clarence W. Rowley. “A data–driven approximation of the koopman operator: Extending dynamic mode decomposition.” Journal of Nonlinear Science 25.6 (2015): 1307-1346. [Link]
• Brunton, Steven L., et al. “Koopman invariant subspaces and finite linear representations of nonlinear dynamical systems for control.” PloS one 11.2 (2016). [Link]
• Korda, Milan, and Igor Mezić. “Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control.” Automatica 93 (2018): 149-160. [Link]
###### Deep Neural Networks:
• Takeishi, Naoya, Yoshinobu Kawahara, and Takehisa Yairi. “Learning Koopman invariant subspaces for dynamic mode decomposition.” Advances in Neural Information Processing Systems. (2017). [Link]
• Li, Qianxiao, et al. “Extended dynamic mode decomposition with dictionary learning: A data-driven adaptive spectral decomposition of the Koopman operator.” Chaos: An Interdisciplinary Journal of Nonlinear Science 27.10 (2017): 103111. [Link]
• Mardt, Andreas, et al. “VAMPnets for deep learning of molecular kinetics.” Nature communications 9.1 (2018): 1-11. [Link]
• Lusch, Bethany, J. Nathan Kutz, and Steven L. Brunton. “Deep learning for universal linear embeddings of nonlinear dynamics.” Nature communications 9.1 (2018): 1-10. [Link]
• Yeung, Enoch, Soumya Kundu, and Nathan Hodas. “Learning deep neural network representations for Koopman operators of nonlinear dynamical systems.” 2019 American Control Conference (ACC). IEEE, 2019. [Link]
• Otto, Samuel E., and Clarence W. Rowley. “Linearly recurrent autoencoder networks for learning dynamics.” SIAM Journal on Applied Dynamical Systems 18.1 (2019): 558-593. [Link]

The main draw back of traditional DMD based methods is that one needs to pre-specify a library of potential observables to learn the Koopman operator from. Since the true Koopman observables are not known, we must hope that our library is expansive enough. In the deep learning approaches, a neural network is used to provide a learnable function from the system’s state variables to the Koopman observables (and vise-versa). A DMD based algorithm can still used to find the approximate Koopman operator, yet there is no need to provide a library of observations since the neural network will learn them. A neural network is attractive since it allows for the Koopman observables to potentially be extremely complex with respect to the system states if needed. Additionally, such deep learning methods are easy, they tend to be simple fully connected models. Now no prior expertise is needed to try to guess good observables, instead you let the deep neural network and gradient descent figure it out for you.

### Deep Learning Koopman Operator of the Duffing Equation

Lets take a look at an example implementation of using deep neural networks for the estimation of the Koopman operator and observables. Consider the following second-order differential equation, known as the Duffing equation: $\ddot{x}=-\delta\dot{x} - x\left(\beta + \alpha x^{2}\right).$ Here we consider the parameters: $\alpha=1$, $\beta=-1$ and $\delta=0.5$ with the states being $\boldsymbol{\phi} = \left[x, \dot{x}\right]$. This results in the system having two sinks located at $[1,0]$ and $[-1,0]$. We selected this system because it has been studied in past Koopman works such as Williams et al. [3] and Takeishi et al. [4], so we know this is a good toy example. The solution of the duffing equation for two random initial states is shown below as well as scatter plot of the solution of a few hundred initial states.

To learn the Koopman operator and observables we will follow a similar methodology proposed in Lusch et al. [5] for which the Koopman operator will be learned through gradient descent rather than calculated though a DMD procedure. As previously discussed, our model will consist of two fully-connected neural networks. One to encode input states to the Koopman observables, the other to decode from the Koopman observables to state space. The input to our model is the $10$ previous time-steps of the system, $\boldsymbol{x}^{t} = \left[\boldsymbol{\phi}^{t-9},\boldsymbol{\phi}^{t-8},\boldsymbol{\phi}^{t-7},\boldsymbol{\phi}^{t-6},\boldsymbol{\phi}^{t-5},\boldsymbol{\phi}^{t-4},\boldsymbol{\phi}^{t-3},\boldsymbol{\phi}^{t-2},\boldsymbol{\phi}^{t-1},\boldsymbol{\phi}^{t}\right],$ which was found to improve learning compared to just using the current state similar to that of Takeishi et al. [4]. Of course, this now means we must provide the first $10$ time-steps to predict a given initial state which must be solved numerically, yet the intent here is not to computationally beat a numerical solver. Rather the focus is on the identification of the Koopman modes/operator.

Table of core model and training parameters. For more details please see the provided source code.
Model Parameters Training Parameters
Encoder Model $20\rightarrow 100\rightarrow 100\rightarrow 50$ Training Cases 200
Decoder Model $50\rightarrow 100\rightarrow 100\rightarrow 20$ Epochs 600
Activation Units ReLU Mini-batch Size 16
Learning Rate 0.001

We learn $50$ observables and the learnable Koopman operator is set to be a skew-symmetric matrix with a learnable positive diagonal. Thus for a set of $n$ observables there are only $n(n-1)$ learnable parameters for the $n\times n$ Koopman matrix.

$U = \begin{bmatrix} u_{11} & u_{12} & \dots & u_{1n} \\ -u_{12} & u_{22} & & u_{2n} \\ \vdots & & \ddots & \vdots \\ -u_{1n} & -u_{2n} &\dots & u_{nn} \end{bmatrix}, \quad \left\{u_{11},u_{22},...,u_{nn}\right\} \geq 0,$

This is done to enforce complex symmetry in the eigenvalues with positive real components. This is similar to Lusch et al. [5] where a block Jordan Koopman operator is learned but our model is more general here. The model was trained with $200$ different initial states each with $100$ time-steps. The loss of the model consists of four components: reconstruction error, the state prediction error, the Koopman observable error and Koopman operator decay error. The Koopman operator decay error helps promote some level of sparsity in $U$ and prevent its elements from getting too extreme. As with most loss functions, we have a whole collection of tunable mixing parameters. I am going to pretend these don’t exist but if you really care check out the code.

$\mathcal{L} = MSE\left(\boldsymbol{x}^{t}, \hat{\boldsymbol{x}}^{t}\right) + MSE\left(\boldsymbol{x}^{t+1}, \hat{\boldsymbol{x}}^{t+1}\right) + MSE\left(\boldsymbol{y}^{t+1}, \hat{\boldsymbol{y}}^{t+1}\right) + \frac{1}{n}\left\| U \right\|_{1}$

### Results

Enough about the details of the model and training, this is just a blog post after all not a journal paper. With the model trained, lets take a look at the results. First, lets just see if evolving the system in the Koopman space yields accurate predictions. This isn’t trivial as we are entirely relying on the accuracy of the Koopman observables and operator to describe the dynamics.

We have already shown the MSE of a test set of 64 initial states during training which converges to 0.74 after 600 epochs. To get a better qualitative understanding of how the model performs four random test initial conditions are plotted below. Over all the predictions are fairly accurate for the 120 time-steps of each test cases, considering that the system is evolved through just one matrix the performance is acceptable.

Digging deeper into the learned Koopman operator (i.e. the learned dynamics), we plot the final eigenvalues of the matrix. The major notes:

• The Koopman decay term forces sparsity of the eigenvalues.
• The eigenvalues with smaller real components have minimal influence past the first several time-steps.
• The eigenvalues are symmetric in the complex domain because our matrix is skew-symmetric.

Lastly, we plot the eigenvectors of the learned Koopman matrix. These Koopman eigenvectors are computed by $\psi\cdot g(\boldsymbol{x})$ at different locations in state-space. This reveals the underlying structure of the dynamics that have been extracted, so in some sense this is the end result. Since these eigenvalues are complex, the magnitude and angle of the eigenvectors corresponding to the top 5 eigenvalues are plotted below. The major notes:

• The most dominate eigenvector clearly has discovered the two basins of the duffing system.
• The most dominate complex eigenvectors are symmetric as expected.
• These two eigenvectors have a phase that corresponds to polar coordinates around each basin. Thus in some sense, these eigenvectors control the rotation near each sink! This result aligns with the findings of past literature [3].
• The last two eigenvectors are again symmetrical in the imaginary dimension and have clearly distinguished the two basins.
• Overall from the top 5 eigenvectors we can see a lot of rich dynamics of the duffing system represented.

### Final Thoughts

If you’ve made it this far, you’ve probably had enough of Koopman operators for now. Hopefully this was a good introduction to both Koopman operators and using some basic machine learning to learn them. I certainly learned a lot writing it. Some closing and critical notes I think people should be aware of:

• This was not super easy to train, this took me over two weeks to get this model tuned and working.
• Learning the Koopman operator through gradient descent is hard, you NEED to enforce constraints on its form.
• Maybe combining DNNs with a DMD methods would be more reliable.
• There is no way in hell using a discovered Koopman operator to predict a complex dynamical system will work reliably. There is a reason why Koopman literature is all focused on little ODE attractors like the duffing equation.
• This doesn’t mean Koopman isn’t useful! Koopman can allow for physical analysis and control of many non-linear systems, which is really cool.

Stay critical and thanks for reading!

### Code

You can grab the code for the model and data used in the post on Github.

### References

1. B.O. Koopman, Hamiltonian systems and transformation in Hilbert space, Proceedings of the National Academy of Sciences of the United States of America. 17 (1931) 315. [Link]
2. H. Arbabi, Introduction to Koopman operator theory of dynamical systems, University of California, Santa Barbara, 2020. [Link]
3. M.O. Williams, I.G. Kevrekidis, C.W. Rowley, A data–driven approximation of the koopman operator: Extending dynamic mode decomposition, Journal of Nonlinear Science. 25 (2015) 1307–1346. [Link]
4. N. Takeishi, Y. Kawahara, T. Yairi, Learning Koopman Invariant Subspaces for Dynamic Mode Decomposition, in: I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems 30, Curran Associates, Inc., 2017: pp. 1130–1140. [Link]
5. B. Lusch, J.N. Kutz, S.L. Brunton, Deep learning for universal linear embeddings of nonlinear dynamics, Nature Communications. 9 (2018) 1–10. [Link]