This is the first of a series of 4 articles I am writing about time series. You wonder why this topic? I come from neuroscience, and in this domain, signal processing of EEG data is a really interesting though difficult topic. Then, having become a developer and now an AI engineer, I deal with other kinds of time-series data (from body sensors, for example) and realised that these data are often poorly understood. I made (and will probably keep on making) mistakes due to this lack of understanding, and writing these articles is a way to share and keep track of what I have learned so far.
The importance of time series is prodigious. We are no longer, and have not been for a while, in a static world. Everything is moving, and it is moving faster than most of us can even think. This makes our brain quite bad at analysing the data coming from connected sources: there is too much ordered information to compute. This information can come from IoT devices, stock markets, online surveys on websites and so on. Fortunately, computing and machine learning methods are here to help us manipulate these data and give us insight into the information they carry.
Why is time series analysis important? Well, in life, what you are doing today can (and will) have an effect on what will happen tomorrow. As a data scientist you should know that with great power comes great responsibility. You have the duty to create models that properly explain your data. How will the data you keep preciously on your hard drive (because you don’t trust the cloud) influence the future? You have the power to predict which event the flap of a butterfly’s wings will produce! More seriously, time series are all around us, and analysing them involves understanding various aspects of the inherent nature of the world that surrounds us. Being able to manage them and get the best insight from them can really change our lives for the better… or not.
So before digging more seriously into this world, the time has come for some definitions, and the most important one is: time series. We keep talking about it and, even though this notion is known by a lot of people (and probably you), what if I asked you to define it?
Let’s do this together. Imagine a quantitative value, say the one measured by the accelerometer of this fancy sport watch you secretly want for Christmas. The value of its linear acceleration varies through time. Here we are, we have got time. What about series? If you’re familiar with the Python library pandas, a Series is a fancy kind of list. So, as an easy shortcut, a series is a list of values. A time series is thus a list of values of a specific quantity varying through time. Easy!
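To make this definition concrete, here is a minimal sketch of what such a list could look like with pandas. The readings, the start date and the names `timestamps` and `acceleration` are made up for the illustration; they do not come from a real watch.

```python
import pandas as pd

# Hypothetical acceleration readings (in m/s^2), one sample per second.
timestamps = pd.date_range(
    start="2024-01-01 08:00:00", periods=5, freq=pd.Timedelta(seconds=1)
)
acceleration = pd.Series(
    [0.0, 0.4, 1.1, 0.7, 0.2], index=timestamps, name="acceleration"
)

# The index carries the time, the values carry the measurement.
print(acceleration)

# Because the index is made of timestamps, we can select a window of time.
# Label-based slicing with .loc includes both ends of the window.
first_three_seconds = acceleration.loc["2024-01-01 08:00:00":"2024-01-01 08:00:02"]
print(first_three_seconds)
```

The point of the sketch is only that the values are tied to an ordered time index; this is what makes a plain list of values a time series.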
What now? We are not done yet. This definition is very simplistic, so let’s go a little deeper.
First, this kind of data is supposed to be continuous but, to be honest, even if time is continuous, we are not able to measure it, or any other value varying with it, in a continuous way. So what? We capture the value of a continuous signal at discrete moments in time, and these samples form the time series. The tricky part here is that if these discrete moments are too far from each other, we can miss useful information. On the contrary, if these moments are too close, we are likely to be overwhelmed with useless data. One can easily imagine that it would be better to be in the second case than the first one. This trade-off is addressed by the Nyquist-Shannon sampling theorem: to capture a signal without losing information, we must sample it at more than twice the highest frequency it contains.
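The effect of sampling too slowly can be sketched with NumPy. This is an illustrative example under my own assumptions: a pure 5 Hz sine wave, two arbitrary sampling rates (50 Hz and 6 Hz), and a hypothetical helper `dominant_frequency` that reads the strongest frequency from the FFT of the samples.

```python
import numpy as np

def dominant_frequency(signal_hz, sampling_hz, duration_s=2.0):
    """Sample a pure sine wave and return the strongest frequency seen in the samples."""
    n_samples = int(round(sampling_hz * duration_s))
    t = np.arange(n_samples) / sampling_hz            # discrete sampling instants
    samples = np.sin(2 * np.pi * signal_hz * t)
    spectrum = np.abs(np.fft.rfft(samples))           # magnitude of each frequency bin
    frequencies = np.fft.rfftfreq(n_samples, d=1.0 / sampling_hz)
    return float(frequencies[np.argmax(spectrum)])

# 50 Hz is well above twice the signal frequency: the 5 Hz wave is recovered.
well_sampled = dominant_frequency(signal_hz=5.0, sampling_hz=50.0)

# 6 Hz is below twice the signal frequency: the wave shows up at a wrong,
# lower frequency (this is called aliasing).
under_sampled = dominant_frequency(signal_hz=5.0, sampling_hz=6.0)

print(well_sampled, under_sampled)
```

With these settings, the well-sampled case should report 5 Hz while the under-sampled case should report 1 Hz, a frequency that is not in the original signal at all. This is the "missed information" case described above.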
Time series are, by definition, ordered. This means that the value of a given point in time is driven by the values of the points before it. This can be due to different components (this is why you will often hear about component analysis):
- Trend component: it has no cycle, it is “just” increasing or decreasing. This is mainly found in stock market analysis.
- Seasonal component: this is an easy one. Its value depends on the season, like the price of firewood or the price of a symphony orchestra ticket for a New Year’s Eve concert.
- Cyclic component: the seasonal component is a kind of cycle, but here we are more likely to find data measured on a long time scale, such as stock market crashes, epidemics and so on.
- Unpredictable component: these data or events are, by nature, stochastic. It is difficult (nearly impossible) to predict them.
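To see how these components add up, here is a small sketch on synthetic data. It is not a reference method, only one simple way to do it (a classical additive decomposition based on a moving average), and every number in it (a weekly period of 7 samples, the slope, the noise level) is an assumption chosen for the illustration. The long-scale cyclic component is left out to keep it short.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

period = 7                       # assumed: one season lasts 7 samples (e.g. a week)
n = 20 * period
t = np.arange(n)

# Build a synthetic series as trend + seasonal + unpredictable noise.
true_trend = 0.05 * t
true_seasonal = 2.0 * np.sin(2 * np.pi * t / period)
noise = rng.normal(scale=0.3, size=n)
series = true_trend + true_seasonal + noise

# Trend estimate: average over one full period, so the seasonal part cancels out.
window = np.ones(period) / period
trend_estimate = np.convolve(series, window, mode="valid")

# The moving average is centred, so it has no value for the first and last
# `half` samples; we drop them before removing the trend.
half = period // 2
detrended = series[half : n - half] - trend_estimate

# Seasonal estimate: average the detrended values that share the same phase.
phases = (np.arange(detrended.size) + half) % period
seasonal_estimate = np.array([detrended[phases == p].mean() for p in range(period)])

# What is left is the unpredictable component.
residual = detrended - seasonal_estimate[phases]

print(np.round(seasonal_estimate, 2))
print(round(float(residual.std()), 2))
```

On real data the components are of course not known in advance, and the period has to be chosen or estimated before this kind of decomposition makes sense.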
These characteristics are closely linked to what we call the period of a signal. The period of a cyclic signal is the time it takes to complete a full cycle. Two other characteristics go with it:
- Amplitude: the maximum displacement from the mean value.
- Frequency: this is 1/period. As we define the period as the number of time units it takes to perform a cycle, the frequency is the number of cycles per unit of time. A good example of frequencies in daily life is sound: the notes that compose more complex sounds have very specific frequencies.
Figure 1: Illustration of some time-series characteristics.
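These three quantities can be checked on a synthetic sound. The sketch below assumes a pure tone at 440 Hz (the note A4, picked only as an example), an arbitrary amplitude of 0.5 and a sampling rate of 44,100 samples per second; all variable names are mine.

```python
import numpy as np

frequency_hz = 440.0             # assumed example: the note A4
period_s = 1.0 / frequency_hz    # the period is the inverse of the frequency
amplitude = 0.5                  # arbitrary amplitude for the illustration
sampling_hz = 44_100             # samples per second
duration_s = 1.0

t = np.arange(int(sampling_hz * duration_s)) / sampling_hz
signal = amplitude * np.cos(2 * np.pi * frequency_hz * t)

# Amplitude: maximum displacement from the mean value.
centered = signal - signal.mean()
measured_amplitude = float(np.max(np.abs(centered)))

# Frequency: number of cycles per unit of time. Each cycle crosses the mean
# exactly once while going upward, so we count those crossings.
upward_crossings = int(np.sum((centered[:-1] < 0) & (centered[1:] >= 0)))
measured_frequency_hz = upward_crossings / duration_s
measured_period_s = 1.0 / measured_frequency_hz

print(measured_amplitude, measured_frequency_hz, measured_period_s)
```

Counting crossings works here because the signal is a single clean tone; on noisy or mixed signals, a spectral method would be a more reasonable choice.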
There are some other characteristics of a signal, such as wavelength, but I will address them when needed so as not to bother you with extra information too soon. If you want to play with the different characteristics of a time series, I invite you to go there in order to measure the effect of each of them on the data.
When talking about the medical field, lots of people agree that we are all different. Despite this, we are all using the same pills, therapies, etc. What if AI or ML could adapt medicine to each one of us, and create the perfect pill to cure our headache considering our age, gender, way of life and medical history? We would end up with the perfect pill and the perfect treatment recommendation (number of doses, for how many days and so on).
The very first application of math to a medical purpose did not come from physicians but from an insurance company. Their goal: predict whether a customer would be likely to die in the year to come. Yes, this is not such a surprise, right? The surprise, on the other hand, comes from the time at which it took place: the 17th century. This innovation came from a man called John Graunt. He is considered the creator of the life table and thus the originator of demography.
Figure 2: John Graunt’s actuarial tables.
John Graunt’s actuarial tables were one of the first results of time-series-style thinking applied to medical questions. The image is in the public domain and taken from the Wikipedia article on John Graunt.
Nowadays, time series are not the most studied data in medicine. Indeed, studies are more focused on visual data, to help medical teams detect cancer and so on. Plus, the lack of data sharing and the difficulty of working as distributed teams make it hard to aggregate a sufficient quantity of data. In this context, clinical studies remain the norm. However, some experiments mixing visual and time-series data are being conducted. This is how an AI is capable of predicting blues evolution more precisely than any practitioner. Lately, time series have been used as epidemiological predictors, and both local and international political decisions are being made to help this field develop. Nevertheless, we still have trouble anticipating the course of an epidemic.
The medical field in which time series have been widely used for more than a century now is neuroscience.
Indeed, physicians and researchers discovered the electrical activity of the brain (citation), and technology like EEG (electroencephalography) has been in use since the first quarter of the 20th century (source). It is no surprise, then, that attempts to fit mathematical models to the brain’s behaviour were soon made.
One of the problems is that, at first, EEG data came mainly from patients and so were related to a disease. It became important to measure brain electrical activity in a priori healthy people in order to compare and to understand which anatomical and electrical modifications are related to the loss of a given function. Since then, loads of students have been asked to have their brain activity recorded while doing numerous tasks, in order to compare them to patients. Nowadays, healthcare is benefiting from the digital revolution of the late 20th/early 21st century. With the advent of wearable sensors and smart electronic medical devices, we are entering an era where healthy adults are taking routine measurements, either automatically or with minimal manual input. The issue with a lot of devices is the precision of the data: a user might understate their weight a little and overstate their height. Plus, we cannot be sure that a device is worn by its actual owner, nor know the owner’s overall health if they decide not to share that they have diabetes or any other personal information. The medical field is no longer a physician’s world: several different actors are trying to forecast people’s biometric data, with more or less ethics.
The Western world has been shaken by several crises. These crises have left scars on our banking system and, in order to predict such big changes, models are applied to help bankers make the right decision at the right moment. Early banking systems relying on data forecasting and analysis gave rise to economic indicators, and most of them are still in use today. In the banking system, almost every decision relies on time-series data management. One of the first approaches is data visualization. This technique helps human beings handle ordered data by transposing them into an unordered world. Indeed, if machines are very good at processing huge amounts of data stored in databases or in enormous Excel files containing trillions of rows and thousands of columns, our brain is not meant for this. Our brain is made for images, sounds, touch. Our body is analog, while we are trying to feed it with binary data. Another approach is found in expert models, or models that can adapt to tendencies, evolve and give good insight. Wait a minute, isn’t that artificial intelligence?
Human beings have always wished to predict (and why not act on) the weather. After being a philosopher’s affair in antiquity, weather forecasting has been taken seriously by scientists, and today thousands of recording stations are spread all over the world in order to understand the phenomena driven by Mother Nature. Weather forecasting is nothing more than a time-series prediction game. While the tools used at the very beginning of this field relied purely on complicated algorithms, the tendency has been to simplify them in the name of the principle of economy. Then some machine learning was added to these expert algorithms in order to combine the results “automatically” and make a good decision. Finally, today’s attempts are more focused on deep learning techniques. Beyond the deep learning fashion, this has a real scientific interest, but let’s keep this for later.
I have talked a lot about using time series to make predictions, but this is not the only thing we can do with them. Time series can be used to encrypt or decrypt a signal, or to add noise to it. This is mainly used in signal management systems, such as in communication or espionage. Finally, time-series data can be used simply to understand what is around us, by trying to figure out the effect of one signal on another through simple statistical analysis of the properties of the data.