Darts: Time Series Made Easy in Python

Time series simply represent data points over time. They are thus everywhere in nature and in business: temperatures, heartbeats, births, population dynamics, internet traffic, stocks, inventories, sales, orders, factory production — you name it. In countless cases, efficient processing and forecasting of time series has the potential to provide decisive advantages. It can help businesses adapt their strategies ahead of time (e.g. if production can be planned in advance), or improve their operations (e.g. by detecting anomalies in complex systems). Although there exist many models and tools for time series, they are still often nontrivial to work with, because they each have their own intricacies and cannot always be used in the same way. At Unit8, we often work with time series and thus we started developing our own tool to make our lives simpler. We also decided to contribute to the community by open-sourcing it. In this article, we introduce Darts, our attempt at simplifying time series processing and forecasting in Python.

If you are a data scientist working with time series, you already know this: time series are special beasts. With regular tabular data, you can often just use scikit-learn for most ML tasks, from preprocessing to prediction and model selection. But with time series, the story is different. You can easily end up in situations where you need one library for preprocessing (e.g. Pandas to interpolate missing values and resample), another to detect seasonality (e.g. statsmodels), a third one to fit a forecasting model (e.g. Facebook Prophet), and finally, more often than not, you’ll have to implement your own backtesting and model selection routines. This can be quite tedious, as most libraries use different APIs and data types. And that’s not even mentioning cases involving more complex models based on neural networks, or problems involving external data and more dimensions. In such cases you’d likely have to implement the models yourself for your use case, for instance using libraries such as TensorFlow or PyTorch. Overall, we feel that the experience of doing machine learning on time series in Python is just not very smooth yet.

Darts is open source and available on GitHub. You can install it in your favourite Python environment with a single command: pip install darts

The basic data type in Darts is TimeSeries, which represents a multivariate (and possibly probabilistic) time series. It can be built very easily, for example from a Pandas DataFrame using TimeSeries.from_dataframe(). Compared to a DataFrame, a TimeSeries comes with some additional guarantees to ensure that it represents a well-formed time series with a proper time index. Behind the scenes, it can also optionally store several samples, which is a convenient way of representing the probabilistic outcomes of certain models.
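
To make the idea of storing several samples concrete, here is a minimal numpy sketch of one way to lay out a probabilistic multivariate series: a 3-D array of shape (time steps, components, samples). As far as we know, Darts' TimeSeries uses a comparable layout (its all_values() method returns such an array), but the stack_samples() helper below is hypothetical and is not Darts code.

```python
import numpy as np


def stack_samples(sample_paths):
    """Stack sampled trajectories into a (time, component, sample) array.

    Hypothetical helper (not part of Darts). Each element of sample_paths is a
    2-D array of shape (n_timesteps, n_components): one sampled trajectory.
    """
    arrays = [np.asarray(p, dtype=float) for p in sample_paths]
    if len({a.shape for a in arrays}) != 1:
        raise ValueError("all sample paths must have the same shape")
    # Stacking along a new last axis yields shape (time, component, sample)
    return np.stack(arrays, axis=-1)


rng = np.random.default_rng(0)
# 500 sampled trajectories of a 2-component series over 12 time steps
paths = [rng.normal(size=(12, 2)) for _ in range(500)]
values = stack_samples(paths)
print(values.shape)  # (12, 2, 500)

# A deterministic series is just the special case with a single sample
deterministic = stack_samples([np.zeros((12, 2))])
print(deterministic.shape)  # (12, 2, 1)
```

Keeping samples on the last axis means a deterministic series needs no special type: it simply has a sample dimension of size one.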

For the example in this article, we first read a DataFrame containing the air passengers dataset. We then build a (univariate) TimeSeries from it, specifying the time and value columns (Month and #Passengers, respectively).
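
The "well-formed" guarantees are mostly about the time index. The following pandas sketch shows the kind of checks involved: a hypothetical build_series() helper parses the time column, sorts it, rejects duplicate timestamps and requires a regular frequency. It illustrates the idea under these assumptions and is not how Darts implements TimeSeries.from_dataframe(). The column names follow the air passengers example, but the values are toy numbers, not the real dataset.

```python
import pandas as pd


def build_series(df, time_col, value_col):
    """Return a pandas Series with a sorted, regular DatetimeIndex.

    Hypothetical helper (not part of Darts) illustrating index validation.
    """
    index = pd.DatetimeIndex(pd.to_datetime(df[time_col]))
    series = pd.Series(df[value_col].to_numpy(dtype=float), index=index, name=value_col)
    series = series.sort_index()
    if series.index.has_duplicates:
        raise ValueError("duplicate timestamps in the time column")
    # infer_freq returns None when the spacing between timestamps is irregular
    freq = pd.infer_freq(series.index)
    if freq is None:
        raise ValueError("could not infer a regular frequency")
    # asfreq attaches the inferred frequency to the index
    return series.asfreq(freq)


# Toy values (not the real dataset), deliberately given out of order
toy = pd.DataFrame({"Month": ["1949-03", "1949-01", "1949-02", "1949-04"],
                    "#Passengers": [130, 110, 120, 140]})
s = build_series(toy, "Month", "#Passengers")
print(s.index.freqstr)  # MS
```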

Let’s now split our series into a training and a validation TimeSeries, and train an exponential smoothing model on the training series.
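
As far as we know, Darts' ExponentialSmoothing model wraps the Holt-Winters implementation from statsmodels and can handle trend and seasonality. To show what the split and the forecasting step boil down to, here is a deliberately simplified numpy sketch of level-only exponential smoothing with a fixed smoothing factor alpha. The class and function names are hypothetical and this is not Darts' implementation; a held-out tail of n_val points stands in for the split at a timestamp.

```python
import numpy as np


def train_val_split(values, n_val):
    """Hold out the last n_val points for validation (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    if not 0 < n_val < len(values):
        raise ValueError("n_val must be between 1 and len(values) - 1")
    return values[:-n_val], values[-n_val:]


class LevelSmoothing:
    """Level-only exponential smoothing (hypothetical, not Darts' model)."""

    def __init__(self, alpha=0.5):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.level_ = None

    def fit(self, series):
        series = np.asarray(series, dtype=float)
        level = series[0]
        for y in series[1:]:
            # New level: weighted average of the new observation and the old level
            level = self.alpha * y + (1.0 - self.alpha) * level
        self.level_ = level
        return self

    def predict(self, n):
        # Level-only smoothing produces a flat forecast at the final level
        return np.full(n, self.level_)


series = [10, 12, 14, 16, 18, 20]
train, val = train_val_split(series, n_val=2)
forecast = LevelSmoothing(alpha=0.5).fit(train).predict(len(val))
print(forecast)  # [14.25 14.25]
```

The flat forecast lags behind this trending toy series, which is exactly why the real model also supports trend and seasonal components.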

That’s it: we now have a forecast over our validation period, which we can plot along with the actual series.

Note that the plot shows an uncertainty band. By default, if a TimeSeries is probabilistic, Darts plots its 5th and 95th percentiles. Here the forecast is probabilistic because we called predict() with num_samples greater than 1, so it contains several sampled trajectories.
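
The band on the plot is computed from the samples stored in the forecast. A minimal numpy sketch, assuming a single-component forecast laid out as (time steps, samples): compute the 5th and 95th percentiles at each time step, plus the median as a central line (the median is our assumption; Darts' exact plotting logic may differ). The percentile_band() helper is hypothetical.

```python
import numpy as np


def percentile_band(samples, low=5.0, high=95.0):
    """Per-time-step lower, median and upper percentiles of a (time, sample) array.

    Hypothetical helper, not part of Darts.
    """
    samples = np.asarray(samples, dtype=float)
    lower = np.percentile(samples, low, axis=1)
    median = np.percentile(samples, 50.0, axis=1)
    upper = np.percentile(samples, high, axis=1)
    return lower, median, upper


# Toy forecast: 3 time steps, each with 101 samples equal to 0, 1, ..., 100
samples = np.tile(np.arange(101.0), (3, 1))
lower, median, upper = percentile_band(samples)
print(lower, median, upper)  # [5. 5. 5.] [50. 50. 50.] [95. 95. 95.]
```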

As you may have guessed, we are mimicking the scikit-learn fit() and predict() pattern for training models and making forecasts. The fit() function takes a training TimeSeries as argument, and the predict() function returns a new TimeSeries representing the forecast. This means that models consume and produce TimeSeries, which is pretty much the only data type manipulated in Darts. This makes it easy to swap and compare models. For example, we could just as easily have used an auto-ARIMA model (which wraps pmdarima behind the scenes).
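
Because every model consumes and produces the same type, comparison code can be written once and reused for any model. The sketch below illustrates this with a typing.Protocol and two toy models (a naive last-value model and a seasonal-naive model). The class names and the evaluate() helper are hypothetical; they only mimic the fit()/predict() convention, with plain numpy arrays standing in for TimeSeries.

```python
from typing import Protocol

import numpy as np


class Forecaster(Protocol):
    """Shared interface (hypothetical): fit on a series, predict n steps."""

    def fit(self, series: np.ndarray) -> "Forecaster": ...

    def predict(self, n: int) -> np.ndarray: ...


class NaiveLast:
    """Repeats the last observed value."""

    def fit(self, series):
        self.last_ = float(series[-1])
        return self

    def predict(self, n):
        return np.full(n, self.last_)


class SeasonalNaive:
    """Repeats the last observed season of length K."""

    def __init__(self, K):
        self.K = K

    def fit(self, series):
        self.season_ = np.asarray(series[-self.K:], dtype=float)
        return self

    def predict(self, n):
        reps = int(np.ceil(n / self.K))
        return np.tile(self.season_, reps)[:n]


def evaluate(model: Forecaster, train, val):
    """Fit on train, forecast len(val) steps, return the mean absolute error."""
    forecast = model.fit(train).predict(len(val))
    return float(np.mean(np.abs(forecast - val)))


series = np.array([1, 5, 1, 5, 1, 5, 1, 5], dtype=float)
train, val = series[:6], series[6:]
for model in (NaiveLast(), SeasonalNaive(K=2)):
    print(type(model).__name__, evaluate(model, train, val))
# NaiveLast 2.0
# SeasonalNaive 0.0
```

Swapping models only changes the object passed to evaluate(); the comparison loop itself stays the same.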

Basically, Darts is based on the following simple principles:

Darts already contains working implementations of the following forecasting models:

Darts has strong support for deep learning models. These models provide richer functionality, such as the ability to be trained on multiple series and covariates. This is done simply by calling fit() with a sequence of TimeSeries instead of a single TimeSeries. These models can scale to large datasets and use GPUs.
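
Training one model on several series typically means cutting every series into fixed-length input/output windows and pooling them into a single training set. Here is a minimal numpy sketch of that preprocessing step. The names make_windows, input_len and output_len are ours (Darts' neural network models expose the similar input_chunk_length and output_chunk_length parameters), and this is not Darts' internal dataset code.

```python
import numpy as np


def make_windows(series_list, input_len, output_len):
    """Pool (input, target) windows from several 1-D series into two arrays.

    Hypothetical helper, not part of Darts.
    """
    inputs, targets = [], []
    for series in series_list:
        series = np.asarray(series, dtype=float)
        n_windows = len(series) - input_len - output_len + 1
        # Series too short for one window contribute nothing
        for start in range(max(n_windows, 0)):
            split = start + input_len
            inputs.append(series[start:split])
            targets.append(series[split:split + output_len])
    if not inputs:
        raise ValueError("no series is long enough for the requested window sizes")
    return np.stack(inputs), np.stack(targets)


# Two series of different lengths contribute windows to the same training set
X, y = make_windows([np.arange(6), np.arange(100, 105)], input_len=3, output_len=1)
print(X.shape, y.shape)  # (5, 3) (5, 1)
```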

In addition, the library contains functionality to backtest forecasting and regression models, perform grid search on hyper-parameters, pre-process TimeSeries, evaluate residuals, and even perform automatic model selection. Finally, it also contains some filtering models (such as the Kalman filter and Gaussian processes), which can be used to perform probabilistic filtering and inference on time series.
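
To illustrate what a hyper-parameter grid search does, here is a sketch that picks the smoothing factor of a level-only exponential smoothing model by minimising the mean absolute error on a validation split. The model, the grid, the error criterion and the helper names are all assumptions for the example. Darts ships its own grid search utility (gridsearch() on its forecasting models), which this code does not use.

```python
import itertools

import numpy as np


def ses_forecast(train, n, alpha):
    """Level-only exponential smoothing: flat forecast at the final level."""
    level = float(train[0])
    for y in train[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return np.full(n, level)


def grid_search(train, val, grid):
    """Try every parameter combination in grid; return (best_params, best_error).

    Hypothetical helper, not Darts' gridsearch().
    """
    best_params, best_err = None, np.inf
    keys = list(grid)
    for combo in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        forecast = ses_forecast(train, len(val), **params)
        err = float(np.mean(np.abs(forecast - val)))
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


train = np.array([10.0, 10.0, 10.0, 20.0])
val = np.array([20.0, 20.0])
best, err = grid_search(train, val, {"alpha": [0.1, 0.5, 1.0]})
print(best, err)  # {'alpha': 1.0} 0.0
```

A single validation split is the simplest criterion; scoring each candidate with a backtest over many forecast origins is usually more robust.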

In our example above, we used Darts to obtain a single forecast over the next 36 months, starting in January 1958. However, forecasts often need to be updated as soon as new data becomes available. With Darts, it’s easy to compute the forecasts that such a process would have produced, using backtesting. Backtesting also makes it easy to compare several models on the same series.

The historical_forecasts() function is available on all models. It takes a time series, a starting point (for instance, half-way through the series) and a forecast horizon. It returns a TimeSeries containing the historical forecasts that would have been obtained when using the model to forecast the series with the specified forecast horizon (for instance, 3 months), starting at the specified timestamp and using an expanding window strategy.
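
The expanding-window procedure can be sketched in a few lines of numpy. The sketch assumes a model-agnostic make_forecast(history, horizon) callable (a hypothetical name) and a stride of one step. Like what we understand to be Darts' default behaviour, it keeps only the last point of each forecast, so the result is a single series aligned with the data. This is an illustration, not Darts' historical_forecasts() implementation.

```python
import numpy as np


def expanding_window_forecasts(values, make_forecast, start=0.5, horizon=3):
    """Expanding-window backtest keeping the last point of each forecast.

    Hypothetical helper. Returns (target_indices, forecasts): forecasts[i] is
    the prediction for values[target_indices[i]], made using only the data
    before the forecast origin.
    """
    values = np.asarray(values, dtype=float)
    first_origin = int(len(values) * start)
    target_idx, preds = [], []
    # Origin t: fit on values[:t] (expanding window), forecast steps t..t+horizon-1
    for t in range(first_origin, len(values) - horizon + 1):
        forecast = make_forecast(values[:t], horizon)
        target_idx.append(t + horizon - 1)
        preds.append(forecast[-1])
    return np.array(target_idx), np.array(preds)


def naive(history, horizon):
    # Toy model: repeat the last observed value
    return np.full(horizon, history[-1])


series = np.arange(10, dtype=float)
idx, fc = expanding_window_forecasts(series, naive, start=0.5, horizon=3)
print(idx)  # [7 8 9]
print(fc)   # [4. 5. 6.]
```

Each prediction only sees data available at its origin, which is what makes the resulting errors an honest estimate of live forecasting performance.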

The return type is a TimeSeries, so we can quickly compute error metrics on it, for instance the mean absolute percentage error (MAPE) using the mape() function from darts.metrics.
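
For reference, MAPE averages the absolute errors relative to the actual values and expresses the result as a percentage: MAPE = 100/n × Σ |y_t − ŷ_t| / |y_t|. Here is a minimal numpy version, assuming aligned arrays with no zero actual values (MAPE is undefined there; raising an error is our choice). It is a standalone sketch, not Darts' mape().

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (standalone sketch)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    if actual.shape != forecast.shape:
        raise ValueError("actual and forecast must be aligned")
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when actual values contain zeros")
    return 100.0 * float(np.mean(np.abs(actual - forecast) / np.abs(actual)))


# Relative errors are 10%, 10% and 0%, so MAPE is about 6.67%
print(mape([100, 200, 400], [110, 180, 400]))
```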

In addition, because historical_forecasts() returns a TimeSeries, its outputs can simply be used as feature series in regression models. This can serve to ensemble (stack) the forecasts made by several models, and potentially also to include external time series data. All the models also have a backtest() function that works similarly, but directly returns the distribution of errors (for a desired error function) instead.
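
Stacking can be sketched as a linear regression whose features are the historical forecasts of several base models, aligned on the same time steps as the target. This numpy sketch fits blending weights plus an intercept by ordinary least squares. The choice of linear regression, the helper names and the toy data are assumptions for illustration; this is not Darts' regression model API.

```python
import numpy as np


def _design(base_forecasts):
    """Design matrix: an intercept column followed by one column per base model."""
    n = len(base_forecasts[0])
    return np.column_stack([np.ones(n)] + [np.asarray(f, dtype=float) for f in base_forecasts])


def fit_stacker(base_forecasts, target):
    """Least-squares coefficients [intercept, w_1, ..., w_k] (hypothetical helper)."""
    X = _design(base_forecasts)
    coef, *_ = np.linalg.lstsq(X, np.asarray(target, dtype=float), rcond=None)
    return coef


def predict_stacker(coef, base_forecasts):
    """Combine new base-model forecasts with the fitted coefficients."""
    return _design(base_forecasts) @ coef


# Toy data: the target is exactly the average of two base forecasts
f1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
f2 = np.array([3.0, 2.0, 5.0, 4.0, 9.0])
target = 0.5 * f1 + 0.5 * f2
coef = fit_stacker([f1, f2], target)
print(np.round(coef, 3))
```

In a real backtest, the stacker should be fitted on earlier historical forecasts and evaluated on later ones, so that the blending weights never see the period they are scored on.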

There is a lot more that we did not cover here. We provide a series of example notebooks covering more material. For instance, you can look at the intro notebook, or see how to easily train RNN or TCN neural networks using the fit() and predict() pattern. We also recommend consulting the Darts documentation and watching our introductory video.

It is possible to train some Darts models on multiple time series, optionally using covariate time series as well. We have covered these features in more detail in a dedicated article.

We are actively developing Darts and adding new features. For instance here are a couple of things on our roadmap:

We welcome contributions and issues on GitHub. This is also the best place to get up-to-date information on the library.

Finally, Darts is one of the tools we are using internally during our day-to-day AI/ML work for several companies. If you think your company could benefit from time series solutions or have other data-centric issues, don’t hesitate to contact us.
