You no longer need to be a genius to do pattern matching

Tapping into life – An Introduction to Stream Analytics

Dear Readers,

Welcome to a new stream (no pun intended) on Red Mavericks articles. This time, we’ll be doing an introduction on Oracle’s new Stream Analytics.

We’ll be guiding you through this new, and very cool, product showing what it is and what it can do to leverage this largely untapped resource which is event stream analysis. In fact, streams are everywhere and are becoming more and more open and accessible. If you “wiretap” these, listen to them and understand the behavioral patterns , you can build extremely valuable applications that will help you deliver more to your customers.

It’s a whole new ball game. I hope you find this interesting.

What is Oracle Stream Analytics?

Oracle Stream Analytics (previously Oracle Stream Explorer) is, in fact, an application builder platform, focused on applications that process events coming from the most various systems, internal or external to the organization, thus enabling Business Insight information and deriving relevant data from these events.

Stream Analytics - Login Screen

Stream Analytics – Welcome to Fast Data Business Insight

It works using an Event Processing Engine to perform Fast Data Analysis over a large number of events that typically appear in a given timeframe.

It also provides a run-time platform that will allow you to run and manage the applications you built.

It’s not a new Oracle Event Processor. It uses OEP as the underlying Event Processing Engine (you can also use Apache Spark as a processing engine, if you prefer. More on this in other articles)

The real power in Oracle Stream Analytics is, curiously, in its UI. As an application builder, it went to great lengths to keep the UI really easy to use. The result is, in my view, very well achieved, with enough simplicity to allow that Business Users, provided they have a bit of technical knowledge, can actually build  applications on their own or with little help from the IT.

Concepts and Ideas

But to be able to build these applications, you must first understand the concepts and rules behind them. We’ll explain these by mixing real-life concepts and their representations on the platform (Oracle Stream Analytics). Let’s start by the main concepts…

Event

An Event is the representation of something that happened in a particular time. This is most important, as events must always be correlated with a notion of time, of when it happened.

Shape

A Shape is the data structure representation of an event. It describes the actual information structure of an event, to ensure at least a minimum of data coherence between events that represent the same occurrence type. If you have a bit of technical knowledge, try to think of the shapes as the XSD of the event.

Events that represent the same type of occurrence should use the same Shape. Events that represent different types of occurrences should use different shapes.

Stream

A Stream is a sequence of data elements (in this case Events) made available over time. These data elements have shapes that must be known before hand to allow proper processing. The easiest way to visualize a Stream is to think of a food processing plant conveyor belt transporting vegetables from one point to another inside the plant.

A Bell Pepper Stream

A Bell Pepper “Stream” – Photo by the US Department of Agriculture

As the vegetables go through the conveyor belt they will be made available at a given time at the output of the belt. This will be the point where the person or the system will collect the bell peppers and process them.

Source

A Source represents the system that is making a given stream available. Typically it represents a system that is producing its own data streams or “proxying” data streams from other systems. Stream Analytics will connect to Sources by making Connections to them.

Target

A Target is a channel to where Stream Analytics will send the result of the event processing work. A Target will connect downstream to other systems and will obey to a given Shape.

Exploration

An Exploration is Stream Analytics‘ way to process events. It allows for events to be filtered, combined and enriched with additional data, as well as allowing for event data manipulation and conversion, when suited, thus producing their own events which are the result of all of this processing.

Explorations can use other the product of other Explorations as their inputs, as well as Streams and Reference Data Tables (called simply References in Stream Analytics), which are used to enrich the Exploration outputs.

For instance, a Stream can contain the status of a given vending machine, identified by an internal vending machine ID, while its GPS coordinates are stored in a reference database table. This way, the vending machine doesn’t have to send the GPS coordinates every 5 seconds along with the status, as this information will not change frequently or by itself.

Pattern

A Pattern is, well… a pattern 🙂 a repetitive regularity that can be identified by some means.

Stream Analytics allow to create new Explorations based on given patterns such as trends over time, geospace boundary checks, Top/Bottom N matches, etc… and, if there are matches, pass these on to Targets.

Stream Analytics - Patterns Palette

Stream Analytics – Patterns Palette

Timeframe

A Timeframe defines the time window reference for a given Exploration event processing. Stream Analytics allow you to define two characteristics of the Timeframe:

  • Range – The universe of events that will be considered when making Exploration processing, for instance by using aggregate functions. In plain English, the range is used to limit the events considered calculating averages, max values or event counts (e.g: Nr of Events of type A happening in the last 30 minutes). As there can be too much events, it’s essential to have some kind of boundaries in which the analysis makes sense
    • If a sensor states that is operating below a given threshold, it’s important to know that it’s not a sporadic event that happens once a year, but something that is happening every 2 minutes in the last hour.
  • Eval. Frequency – The frequency in which the events are passed on to the Exploration. Sometimes, it’s important to collect the data from the Exploration inputs not at every milisecond, but in bigger intervals. This will stipulate the cadence at which a given exploration produces results (and thus pushes them to Targets)

Although some of these concepts may seem confusing and unclear, as we go through the next articles and use them, they’ll become second nature.

 

So that’s a wrap on this article.

On our next article, we’ll start building our example application. Be prepared to have some fun doing it. Until then…

Maverick (José Rodrigues)

0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *