TIME MANAGEMENT – Data Sources and Ingestion

There are many scenarios where the timestamp of a message/event is critical to the success of an application or business process. Consider a banking transaction that applies credits and debits to an account in the order they are performed. Also consider scenarios where you would need to implement first in, first out (FIFO) or last in, first out (LIFO) operations. From a streaming perspective, knowing the timeframe in which a stream event was received is useful for aggregations and for comparing data between two streams. Azure Stream Analytics provides two options to handle time: the System.Timestamp() property and the TIMESTAMP BY clause.

As an event enters the stream and passes through Azure Stream Analytics, a timestamp is associated with it at every stage. This is the case for both Event Hubs and IoT Hub events. Consider the following query, followed by sample output:

SELECT READINGTIMESTAMP, READINGTYPE, System.Timestamp() t
FROM brainwaves
+---------------------------+-----------------+------------------------------+
| READINGTIMESTAMP          | READINGTYPE     | t                            |
+---------------------------+-----------------+------------------------------+
| 2022-03-17T14:00:00.0000  | Brainjammer-POW | 2022-03-17T14:00:01.9019671Z |
| 2022-03-17T15:01:15.0000  | Brainjammer-POW | 2022-03-17T15:01:17.9176555Z |
| ...                       | ...             | ...                          |
+---------------------------+-----------------+------------------------------+

Notice in the result that there is a small difference between the two timestamps. You need to determine which of the timestamps is most important in your solution: the time at which the event was generated at the source, or the time at which the event arrived at the Azure Stream Analytics stream. The other option is to use the TIMESTAMP BY clause, similar to the following:

SELECT READINGTIMESTAMP, READINGTYPE, System.Timestamp()
FROM brainwaves
TIMESTAMP BY DATEADD(millisecond, READINGTIMESTAMP, '1970-01-01T00:00:00Z')

The impact is that the platform uses the datetime value derived from the READINGTIMESTAMP column as the event timestamp instead of the arrival time. The point here is that the platform will do its best to process the streamed data events in the order they are received, using the default event timestamp. If your requirements call for something different, you have these two options to change the default behavior.
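To illustrate how the chosen timestamp flows into time-based aggregations, consider the following minimal sketch. It assumes the same brainwaves input and that READINGTIMESTAMP arrives as a datetime value, and it counts readings per type over 5-minute tumbling windows keyed on the application time supplied by TIMESTAMP BY rather than the arrival time:

SELECT READINGTYPE,
       COUNT(*) AS ReadingCount,
       System.Timestamp() AS WindowEnd
FROM brainwaves
TIMESTAMP BY READINGTIMESTAMP
GROUP BY READINGTYPE, TumblingWindow(minute, 5)

In a windowed query like this, System.Timestamp() returns the end of each window, so the aggregation boundaries follow the event time you selected rather than the time each event reached the stream.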

Outputs

The inputs are where the data streams come from, and the outputs are where the data is stored after the query has been performed on it. Figure 3.79 shows a few examples of the Azure products to which the data can be sent. If the data needs further transformation before it is ready for serving and consumption, a good location would be Azure Synapse Analytics. If the data is ready for consumption, it can be streamed in real time to a Power BI workspace. There will be more on this in Chapter 7.
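As a brief sketch of how a query routes results to a configured output, assume an output alias named powerBIoutput has been defined on the Stream Analytics job (the alias name here is illustrative). The INTO clause directs the query results to that output:

SELECT READINGTYPE,
       COUNT(*) AS ReadingCount,
       System.Timestamp() AS WindowEnd
INTO powerBIoutput
FROM brainwaves
GROUP BY READINGTYPE, TumblingWindow(second, 30)

A single job can contain multiple SELECT ... INTO statements, so the same input stream can be written to, for example, both a Power BI dataset and an Azure Synapse Analytics table.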
