StreamSight is a query-driven framework for streaming analytics in edge computing. It supports users in composing analytic queries that are automatically translated and mapped to streaming operations suitable for execution on distributed processing engines deployed across wide coverage areas. Our reference implementation has been integrated with Apache Spark 2.1.1.
Data scientists and platform operators can compose complex analytic queries without any knowledge of the programming model of the underlying distributed processing engine, solely by using high-level directives and expressions. An example of such an analytic query from the domain of intelligent transportation is the following:
COMPUTE
ARITHMETIC_MEAN(bus_delay, 10 MINUTES)
BY city_segment
EVERY 5 SECONDS
The above query computes, over a 10-minute sliding window, the average bus delay per city segment, with new datapoints considered every 5 seconds. This is particularly useful for a traffic operator detecting congestion in each city segment. Furthermore, StreamSight gives users the ability to:
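To make the window semantics concrete, the following is a minimal plain-Python sketch of what the query means: a per-group sliding window over timestamped values, from which a mean is reported periodically. This is only an illustration of the semantics, not StreamSight's actual implementation; the `SlidingMean` class and its method names are hypothetical.

```python
from collections import deque

class SlidingMean:
    """Illustrative model of the query's semantics: a per-group
    sliding window holding (timestamp, value) pairs."""
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.points = {}  # group -> deque of (timestamp, value)

    def add(self, ts, group, value):
        self.points.setdefault(group, deque()).append((ts, value))

    def means(self, now):
        """Evict points older than the window and return the mean per group."""
        result = {}
        for group, dq in self.points.items():
            while dq and dq[0][0] <= now - self.window:
                dq.popleft()
            if dq:
                result[group] = sum(v for _, v in dq) / len(dq)
        return result

# Example: bus delays (seconds) in two city segments, 600 s (10 min) window
w = SlidingMean(600)
w.add(0, "segment_a", 120)
w.add(300, "segment_a", 180)
w.add(300, "segment_b", 60)
print(w.means(now=300))  # {'segment_a': 150.0, 'segment_b': 60.0}
```

In the real system, `EVERY 5 SECONDS` would correspond to calling `means` on each 5-second trigger, with eviction handled by the engine's windowing machinery.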
- Prioritize query execution so that results of high importance are always output on time (e.g., during high load influx), while low-priority insights are scheduled when resources are available
- Request query enforcement on a sample of the data stream for indicative but timely responses
- Define constraints such as the maximum tolerable upper error bounds for approximate query responses
COMPUTE
ARITHMETIC_MEAN(bus_delay, 10 MINUTES)
BY city_segment EVERY 5 SECONDS
WITH MAX_ERROR 0.005 AND CONFIDENCE 0.95
The above example extends the previous query so that approximate answers are provided, significantly improving response time by executing the query on a sample of the data stream.
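The idea behind such approximate answers can be sketched with a Bernoulli sample of the stream and a normal-approximation margin of error. This is a statistical illustration under our own assumptions, not StreamSight's sampling algorithm; the `approximate_mean` function and its parameters are hypothetical.

```python
import random
import statistics

def approximate_mean(stream, sample_rate, confidence_z=1.96):
    """Estimate the stream mean from a Bernoulli sample and report a
    normal-approximation margin of error (z=1.96 ~ 95% confidence)."""
    sample = [x for x in stream if random.random() < sample_rate]
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / len(sample) ** 0.5
    return mean, confidence_z * stderr

random.seed(42)
stream = [random.gauss(100, 10) for _ in range(100_000)]
est, margin = approximate_mean(stream, sample_rate=0.01)
print(f"{est:.1f} +/- {margin:.1f}")  # close to the true mean of 100
```

A system enforcing `MAX_ERROR` and `CONFIDENCE` constraints would, in effect, grow the sample until the reported margin falls below the requested bound.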
- Make sure Docker is installed on your machine.
- (Optionally) configure and submit your queries.
- Build the Docker image using the following command:
docker build -t streamsight -f Dockerfile .
Run StreamSight on your local machine:
docker run -p 4040:4040 -it streamsight
If you have an existing Spark Cluster, you only have to specify the address of the master node:
MASTER_NODE=spark://${spark-url}
spark-submit --master $MASTER_NODE ./StreamSight-0.0.1.jar
For convenience, we provide a number of scripts (located in the /cluster directory) to help you set up a Spark cluster via docker-compose. The only requirement is to have docker-compose installed on your system.
cd cluster/
# Create 1 Master and 4 Spark workers
./start-infra.sh
# You can scale to 8 workers with the following script:
./scale-workers.sh 8
# To submit StreamSight, just run:
./build-run-cluster.sh
The following figure depicts a high-level overview of the StreamSight framework. Users submit ad-hoc queries following the declarative query model, and the system compiles these queries into low-level streaming commands.
Once the executable artifact is produced, users can submit it to the underlying distributed processing engine. The raw monitoring metrics are fed into the processing engine where they are transformed into analytic insights and streamed to a high-availability queuing system.
In this section we present a few examples of useful insights from monitoring a cloud infrastructure.
CPU utilization is a metric that many companies need to monitor and act upon. The following insight returns the average CPU utilization of a service over a 30-second window, computed every 10 seconds.
cpu_utilization = COMPUTE (
ARITHMETIC_MEAN( cpu_user, 30 SECONDS )
+ ARITHMETIC_MEAN( cpu_sys, 30 SECONDS )
) EVERY 10 SECONDS ;
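The composite expression above simply adds two windowed means over the same 30-second window. A minimal sketch of that evaluation, with made-up sample values:

```python
# Samples of the two metrics observed in the current 30 s window
# (values are illustrative, not real measurements).
cpu_user = [0.42, 0.38, 0.40]
cpu_sys = [0.11, 0.09, 0.10]

def mean(xs):
    return sum(xs) / len(xs)

# cpu_utilization = ARITHMETIC_MEAN(cpu_user) + ARITHMETIC_MEAN(cpu_sys)
cpu_utilization = mean(cpu_user) + mean(cpu_sys)
print(round(cpu_utilization, 2))  # 0.5
```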
Free RAM can be crucial for some applications; thus, the following expression gives the average RAM usage over a 10-minute window, computed every 30 seconds.
ram_usage_per_service = COMPUTE
ARITHMETIC_MEAN( service:ram , 10 MINUTES )
EVERY 30 SECONDS ;
Next we present an insight for the maximum number of HTTP requests per second over a 10-minute window, computed every 30 seconds and grouped by region. For DevOps engineers and developers who work on web-based applications, the traffic peak in a specific region can be a critical factor.
http_requests_per_seconds_by_region = COMPUTE
MAX( service:requests_per_seconds , 10 MINUTES ) BY Region
EVERY 30 SECONDS ;
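The `MAX ... BY Region` part of the insight is a grouped maximum over the window's samples. A plain-Python sketch of that aggregation (region names and values are made up for illustration):

```python
# (region, requests-per-second) samples observed in the current window
window = [
    ("eu-west", 120), ("eu-west", 340),
    ("us-east", 95), ("us-east", 410),
]

# Grouped maximum: one peak value per region
peaks = {}
for region, rps in window:
    peaks[region] = max(peaks.get(region, 0), rps)

print(peaks)  # {'eu-west': 340, 'us-east': 410}
```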
Parameter | Default Value | Description |
---|---|---|
app | insights-test | The name of the StreamSight job |
insights.file | /home/insights.ins | The file containing insights |
dummy.input | true | When enabled, a producer pushes random measurements to the system |
dummy.interval | 100 | The periodicity of the dummy producer in ms |
dummy.dimensions | 5 | The number of metrics produced (e.g., m1,m2,...) |
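For reference, the file pointed to by `insights.file` can simply list insight definitions in the query language shown earlier. The example below reuses insights from this document (we assume here that the file is a plain-text list of semicolon-terminated definitions):

```
cpu_utilization = COMPUTE (
    ARITHMETIC_MEAN( cpu_user, 30 SECONDS )
    + ARITHMETIC_MEAN( cpu_sys, 30 SECONDS )
) EVERY 10 SECONDS ;

ram_usage_per_service = COMPUTE
    ARITHMETIC_MEAN( service:ram , 10 MINUTES )
    EVERY 30 SECONDS ;
```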
When using the framework please use the following reference to cite our work:
"StreamSight: A Query-Driven Framework for Streaming Analytics in Edge Computing", Z. Georgiou, M. Symeonides, D. Trihinas, G. Pallis, M. D. Dikaiakos, 11th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2018), Zurich, Switzerland, Dec 2018.
The framework is open-sourced under the Apache 2.0 License. The codebase of the framework is maintained by the authors for academic research and is therefore provided "as is".