The previous blog post in this series provided a brief introduction to the SQL Window Function construct, which can be used as a basis for many sorts of ordered calculations within SQL. This post elaborates on how this query capability can be used for a specific type of problem that frequently shows up in time series analysis, which is the recognition of simple patterns of movement within a series. Time series data can be found in many real world applications, including click stream processing, financial analysis, and sensor data.
Within a financial time series, one common problem of interest is identifying a V pattern, e.g. a strictly falling price followed by a strictly rising price. To identify this pattern within a SQL query it is useful to build up to a solution using several layers of subquery to facilitate the processing. The first stage of the query marks each row as either rising or falling, which is done using the window function lag() to compare the current row to the preceding row.
SELECT ticker, ts, price, price - lag(price, 1) OVER (w) > 0 as rising, price - lag(price, 1) OVER (w) < 0 as falling FROM market_data WINDOW w as (PARTITION BY ticker ORDER BY ts) ORDER BY ticker, ts; ticker | ts | price | rising | falling --------+----------------------------+-------+--------+--------- abcd | 2014-01-14 11:00:44.863071 | 57.98 | | abcd | 2014-01-14 11:00:47.727358 | 57.96 | f | t abcd | 2014-01-14 11:00:50.631657 | 57.96 | f | f abcd | 2014-01-14 11:00:53.624203 | 57.95 | f | t abcd | 2014-01-14 11:00:56.584564 | 57.93 | f | t abcd | 2014-01-14 11:00:58.776706 | 57.94 | t | f abcd | 2014-01-14 11:01:01.99315 | 57.95 | t | f abcd | 2014-01-14 11:01:03.289345 | 57.94 | f | t
In this example, the ticker starts at 57.98, drops down to 57.93, and then rises to 57.95 before it drops again. This query identifies which trades are moving in which direction. The next step is to use this initial query as a subquery and filter out instances where the stock is holding steady, which can be done by adding an outer where clause, which will restrict the input where the stock is either rising or falling. It is also possible to begin the next stage of processing–detecting if next record is moving in the same direction as the current record.
WITH market_rising_falling as ( SELECT ticker, ts, price, price - lag(price, 1) OVER (w) > 0 as rising, price - lag(price, 1) OVER (w) < 0 as falling FROM market_data WINDOW w as (PARTITION BY ticker ORDER BY ts) ) SELECT ticker, ts, price, rising, falling, rising != lead(rising, 1) OVER (w2) as change FROM market_rising_falling WHERE rising is null or rising or falling WINDOW w2 as (PARTITION BY ticker ORDER BY ts); ticker | ts | price | rising | falling | change --------+----------------------------+-------+--------+---------+-------- abcd | 2014-01-14 11:00:44.863071 | 57.98 | | | abcd | 2014-01-14 11:00:47.727358 | 57.96 | f | t | f abcd | 2014-01-14 11:00:53.624203 | 57.95 | f | t | f abcd | 2014-01-14 11:00:56.584564 | 57.93 | f | t | t abcd | 2014-01-14 11:00:58.776706 | 57.94 | t | f | f abcd | 2014-01-14 11:01:01.99315 | 57.95 | t | f | t abcd | 2014-01-14 11:01:03.289345 | 57.94 | f | t |
In this query peaks are now marked as true; 57.93 is the low point of the V pattern and 57.95 is the right side peak. There are also two rows that are NULL with respect to the calculated change field, which are our first and last records within the window. For these rows it is not possible to know if they are peaks as the calculation of a peak requires knowing the value of the previous/next record in the series; these records will need some special case handling.
The next step is to make one more pass to filter out rows that are not changing direction, and then determine the low and high marks for direction change. This is accomplished using lag and lead to retrieve the previous/next peak after filtering out the non-peak records.
WITH market_rising_falling as ( SELECT ticker, ts, price, price - lag(price, 1) OVER (w) > 0 as rising, price - lag(price, 1) OVER (w) < 0 as falling FROM market_data WINDOW w as (PARTITION BY ticker ORDER BY ts) ), market_change as ( SELECT ticker, ts, price, rising, falling, rising != lead(rising, 1) OVER (w2) as change FROM market_rising_falling WHERE rising is null or rising or falling WINDOW w2 as (PARTITION BY ticker ORDER BY ts) ) SELECT ticker, ts, lag(price, 1) OVER (w3) as prev_peak, price, lead(price, 1) OVER (w3) as next_peak FROM market_change WHERE change is null or change WINDOW w3 as (PARTITION BY ticker ORDER BY ts); ticker | ts | prev_peak | price | next_peak --------+----------------------------+-----------+-------+----------- abcd | 2014-01-14 11:00:44.863071 | | 57.98 | 57.93 abcd | 2014-01-14 11:00:56.584564 | 57.98 | 57.93 | 57.95 abcd | 2014-01-14 11:01:01.99315 | 57.93 | 57.95 | 57.94 abcd | 2014-01-14 11:01:03.289345 | 57.95 | 57.94 |
This gives both “strictly rising then strictly falling” and “strictly falling then strictly rising” patterns. A simple filter will restrict this to just the “strictly falling then strictly rising” V pattern. A decision also needs to be made how to handle the edge conditions at the beginning and end of the window. In this next example the record will be kept so long as one of prev_peak or next_peak indicates that the record is the bottom of a V pattern.
WITH market_rising_falling as ( SELECT ticker, ts, price, price - lag(price, 1) OVER (w) > 0 as rising, price - lag(price, 1) OVER (w) < 0 as falling FROM market_data WINDOW w as (PARTITION BY ticker ORDER BY ts) ), market_change as ( SELECT ticker, ts, price, rising, falling, rising != lead(rising, 1) OVER (w2) as change FROM market_rising_falling WHERE rising is null or rising or falling WINDOW w2 as (PARTITION BY ticker ORDER BY ts) ), market_peaks as ( SELECT ticker, ts, lag(price, 1) OVER (w3) as prev_peak, price, lead(price, 1) OVER (w3) as next_peak FROM market_change WHERE change is null or change WINDOW w3 as (PARTITION BY ticker ORDER BY ts) ) SELECT * FROM market_peaks WHERE price < prev_peak or price < next_peak ORDER BY ticker, ts; ticker | ts | prev_peak | price | next_peak --------+----------------------------+-----------+-------+----------- abcd | 2014-01-14 11:00:56.584564 | 57.98 | 57.93 | 57.95 abcd | 2014-01-14 11:01:03.289345 | 57.95 | 57.94 |
If, instead, only the complete V patterns are desired, e.g. excluding the last partial result, then the “price < prev_peak or price < next_peak” expression can be changed to an AND expression “price < prev_peak AND price < next_peak”.
This concludes the introduction to Time Series Pattern Matching, part two of the Time Series Analysis blog posts. Stay tuned for the third article in the series, where resampling and interpolation will be discussed.