Introduction to Temporal Windows (Azure Stream Analytics)

In applications that process real-time events, it is common to perform some set-based computation (aggregation) or other operations over subsets of events that fall within some period of time. Because the concept of time is a fundamental necessity to complex event-processing systems, it’s important to have a simple way to work with the time component of query logic in the system. In Azure Stream Analytics, these subsets of events are defined through windows to represent groupings by time. This article describes windows and how they are defined, identifies the types of windows that are supported, and explains how you can use windows with various operators.

Understanding Windows

A window contains event data along a timeline and enables you to perform various operations against the events within that window. For example, you may want to sum the values of payload fields in a given window as shown in the following illustration.

Every window operation outputs event at the end of the window. The windows of Azure Stream Analytics are opened at the window start time and closed at the window end time. For example, if you have a 5 minute window from 12:00 AM to 12:05 AM all events with timestamp greater than 12:00 AM and up to timestamp 12:05 AM inclusive will be included within this window. The output of the window will be a single event based on the aggregate function used with a timestamp equal to the window end time. The timestamp of the output event of the window can be projected in the SELECT statement using the System.Timestamp property using an alias. Every window automatically aligns itself to the zeroth hour. For example, a 5 minute tumbling window will align itself to (12:00-12:05] , (12:05-12:10], …

There are four types of windows:

  1. Hopping Window
  2. Sliding Window
  3. Session Window
  4. Tumbling Window

Hopping Window

Unlike tumbling windows, hopping windows model scheduled overlapping windows. A hopping window specification consist of three parameters: the timeunit, the windowsize (how long each window lasts) and the hopsize (by how much each window moves forward relative to the previous one). Additionally, offsetsize may be used as an optional fourth parameter. Note that a tumbling window is simply a hopping window whose ‘hop’ is equal to its ‘size’.

The following illustration shows a stream with a series of events. Each box represents a hopping window and the events that are counted as part of that window, assuming that the ‘hop’ is 5, and the ‘size’ is 10.

Syntax

HOPPINGWINDOW ( timeunit  , windowsize , hopsize, [offsetsize] )   
HOPPINGWINDOW ( Duration( timeunit  , windowsize ) , Hop (timeunit  , windowsize ), [Offset(timeunit  , offsetsize)])  
  
Note

The Hopping Window can be used in the above two ways. If the windowsize and the hopsize has the same timeunit, you can use it without the Duration and Hop functions. The Duration function can also be used with other types of windows to specify the window size.

Arguments

timeunit

Is the unit of time for the windowsize or the hopsize. The following table lists all valid timeunitarguments.

Timeunit Abbreviations
day dd, d
hour hh
minute mi, n
second ss, s
millisecond ms
microsecond mcs

windowsize

A big integer which describes the size of the window. The windowsize is static and cannot be changed dynamically at runtime.

The maximum size of the window in all cases is 7 days.

hopsize

A big integer which describes the size of the Hop.

offsetsize

By default, hopping windows are inclusive in the end of the window and exclusive in the beginning – for example 12:05 PM – 1:05 PM window will include events that happened exactly at 1:05 PM, but will not include events that happened at 12:05:PM (these event will be part of 12:00 PM – 01:00 PM window).
The Offset parameter can be used to change behavior and include the events in the beginning of the window and exclude the ones that happened in the end.

Examples

SELECT System.TimeStamp AS WindowEnd, TollId, COUNT(*)  
FROM Input TIMESTAMP BY EntryTime  
GROUP BY TollId, HoppingWindow(Duration(hour, 1), Hop(minute, 5), Offset(millisecond, -1))  
  

Session window

Session windows group events that arrive at similar times, filtering out periods of time where there is no data. Session window function has three main parameters: timeout, maximum duration, and partitioning key (optional).

The following diagram illustrates a stream with a series of events and how they are mapped into session windows of 5 minutes timeout, and maximum duration of 10 minutes.

A session window begins when the first event occurs. If another event occurs within the specified timeout from the last ingested event, then the window extends to include the new event. Otherwise if no events occur within the timeout, then the window is closed at the timeout.

If events keep occurring within the specified timeout, the session window will keep extending until maximum duration is reached. Please note that the maximum duration checking intervals are set to be the same size as the specified max duration. For example, if the max duration is 10, then the checks on if the window exceed maximum duration will happen at t = 0, 10, 20, 30, etc.

Thus mathematically, our session window ends if the following condition is satisfied:

Stream Analytics session window 5 mins timeout & 10 mins maximum

When a partition key is provided, the events are grouped together by the key and session window is applied to each group independently. This is useful for cases where you need different session windows for different users or devices.

Syntax

SESSIONWINDOW(timeunit, timeoutSize, maxDurationSize) [OVER (PARTITION BY partitionKey)]

SESSIONWINDOW(Timeout(timeunit , timeoutSize), MaxDuration(timeunit, maxDurationSize)) [OVER (PARTITION BY partitionKey)]

Note

The Sliding Window can be used in the above two ways.

Arguments

timeunit Is the unit of time for the windowsize. The following table lists all valid timeunit arguments.

Timeunit Abbreviations
day dd, d
hour hh
minute mi, n
second ss, s
millisecond ms
microsecond mcs

timeoutsize

A big integer that describes the gap size of the session window. Data that occur within the gap size are grouped together in the same window.

maxdurationsize

If the total window size exceeds the specified maxDurationSize at a checking point, then the window is closed and a new window is opened at the same point. Currently, the size of the checking interval is equal to maxDurationSize.

partitionkey

An optional parameter that specifies the key that the session window operates over. If specified, the window will only group together events of the same key.

Examples

Suppose you have the following json data:

[
  // time: the timestamp when the user clicks on the link
  // user_id: the id of the user
  // url: the url the user clicked on
  {
    "time": "2017-01-26T00:00:00.0000000z",
    "user_id": 0,
    "url": "www.example.com/a.html"
  },
  {
    "time": "2017-01-26T00:00:20.0000000z",
    "user_id": 0,
    "url": "www.example.com/b.html"
  },
  {
    "time": "2017-01-26T00:00:55.0000000z",
    "user_id": 1,
    "url": "www.example.com/c.html"
  },
  // ...
]

To measure how long each user sessions are, you can use the following query:

CREATE TABLE localinput(time DATETIME, user_id BIGINT, url NVARCHAR(MAX))
SELECT
    user_id,
    MIN(time) AS window_start,
    System.Timestamp AS window_end,
    DATEDIFF(s, MIN(time), System.Timestamp) AS duration_in_seconds
FROM localinput TIMESTAMP BY time
GROUP BY user_id, SessionWindow(minute, 2, 60) OVER (PARTITION BY user_id)

The preceding query creates a session window with a timeout of 2 minutes, a maximum duration of 60 minutes and partitioning key of user_id. This means independent session windows will be created for each user_id. For each window, this query will generate output that contains the user_id, the start time of the window (window_start), the end of the window (window_end) and the total duration of the user session (duration_in_seconds).

Sliding Window

When using a sliding window, the system is asked to logically consider all possible windows of a given length. As the number of such windows would be infinite, Azure Stream Analytics instead outputs events only for those points in time when the content of the window actually changes, in other words when an event entered or exits the window.

The following diagram illustrates a stream with a series of events and how they are mapped into sliding windows of 10 seconds.

Syntax

SLIDINGWINDOW ( timeunit  , windowsize )   
SLIDINGWINDOW ( Duration( timeunit  , windowsize ) )  
  
Note

The Sliding Window can be used in the above two ways. To allow consistency with the Hopping Window, the Duration function can also be used with all types of windows to specify the window size.

Arguments

timeunit

Is the unit of time for the windowsize. The following table lists all valid timeunit arguments.

Timeunit Abbreviations
day dd, d
hour hh
minute mi, n
second ss, s
millisecond ms
microsecond mcs

windowsize

A big integer which describes the size of the window. The windowsize is static and cannot be changed dynamically at runtime.

The maximum size of the window in all cases is 7 days.

Examples

This example finds all toll booths which have served more than 3 vehicles in the last 5 minutes:

SELECT DateAdd(minute,-5,System.TimeStamp) AS WinStartTime, System.TimeStamp AS WinEndTime, TollId, COUNT(*)   
FROM Input TIMESTAMP BY EntryTime  
GROUP BY TollId, SlidingWindow(minute, 5)  
HAVING COUNT(*) > 3  

Tumbling Window

Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals. The following diagram illustrates a stream with a series of events and how they are mapped into 5-second tumbling windows.

Syntax

TUMBLINGWINDOW ( timeunit  , windowsize, [offsetsize] )  
TUMBLINGWINDOW ( Duration( timeunit  , windowsize ), [Offset(timeunit  , offsetsize)] )  
  
Note

The Tumbling Window can be used in the above two ways. To allow consistency with the Hopping Window, the Duration function can also be used with all types of windows to specify the window size.

Arguments

timeunit

Is the unit of time for the windowsize. The following table lists all valid timeunit arguments.

Timeunit Abbreviations
day dd, d
hour hh
minute mi, n
second ss, s
millisecond ms
microsecond mcs

windowsize

A big integer which describes the size of the window. The windowsize is static and cannot be changed dynamically at runtime.

The maximum size of the window is 7 days.

offsetsize

By default, tumbling windows are inclusive in the end of the window and exclusive in the beginning – for example 12:00 PM – 1:00 PM window will include events that happened exactly at 1:00 PM, but will not include events that happened at 12:00PM (these events will be part of 11:00 AM – 12:00 PM window).

The Offset parameter can be used to change this behavior and include the events in the beginning of the window and exclude the ones that happened in the end.

Examples

SELECT System.TimeStamp AS WindowEnd, TollId, COUNT(*)  
FROM Input TIMESTAMP BY EntryTime  
GROUP BY TollId, TumblingWindow(Duration(hour, 1), Offset(millisecond, -1))  

Note 

All windows should be used in a GROUP BY clause. The maximum size of the window in all cases is 7 days.

Leave a Reply