The basis of Azure Data Factory

In the world of big data, raw, unorganized data is often stored in relational, non-relational, and other storage systems. However, on its own, raw data lacks the context needed to provide meaningful insights to analysts, data scientists, or business decision makers. Big data requires a service that can orchestrate and operationalize processes to …

How to Create an ARIMA Model for Time Series Forecasting in Python

A popular and widely used statistical method for time series forecasting is the ARIMA model. ARIMA is an acronym that stands for AutoRegressive Integrated Moving Average. It is a class of model that captures a suite of different standard temporal structures in time series data. In this tutorial, you will discover how to develop an …
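
As a small illustration of the "Integrated" part of the acronym, the sketch below differences a series and then reverses the operation. It is not taken from the tutorial itself; the helper names `difference` and `undifference` are hypothetical, and a full ARIMA fit would normally be done with a statistics library rather than by hand.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: series[i] - series[i - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def undifference(first_values, diffs, lag=1):
    """Rebuild the original series from its first `lag` values and the differences."""
    restored = list(first_values)
    for d in diffs:
        # Each new value is the value `lag` steps back plus the stored difference.
        restored.append(restored[-lag] + d)
    return restored


series = [10, 12, 15, 19, 24]
first_order = difference(series)        # one round of differencing (d = 1)
second_order = difference(first_order)  # two rounds of differencing (d = 2)
```

Differencing removes a trend so that the remaining series is closer to stationary; `undifference` shows that no information is lost as long as the first values are kept.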

Comparing Pulsar and Kafka: unified queuing and streaming

In previous blog posts, we described several reasons why Apache Pulsar is an enterprise-grade streaming and messaging system that you should consider for your real-time use cases. We also took a deep dive into enterprise features like real-time durable storage to prevent data loss, multi-tenancy, geo-replication, encryption, and security. These blog posts helped develop an understanding of Pulsar and generated a lot …

Why Apache Pulsar? Part 2

This is part 2 of a series of blog posts that highlights key features of Apache Pulsar (incubating). Apache Pulsar is a next-generation pub-sub messaging system developed at Yahoo. In part 1 of the series, we discussed how Pulsar supports a flexible messaging model, multi-tenancy, geo-replication, and durability. In this post, we’ll continue the discussion by showing how …

Why Apache Pulsar? Part 1

Apache Pulsar (incubating) is a next-generation pub/sub messaging system developed at Yahoo. Pulsar was developed from the ground up to address several shortcomings of existing open source messaging systems and has been running in production for three years, powering critical applications like Yahoo! Mail, Yahoo! Finance, Yahoo! Sports, Flickr, the Gemini Ads Platform, and Sherpa, Yahoo’s …

Introduction to the Apache Pulsar pub-sub messaging platform

Apache Pulsar (incubating) is an enterprise-grade publish-subscribe (aka pub-sub) messaging system that was originally developed at Yahoo. Pulsar was first open-sourced in late 2016, and is now undergoing incubation under the auspices of the Apache Software Foundation. At Yahoo, Pulsar has been in production for over three years, powering major applications like Yahoo! Mail, Yahoo! Finance, Yahoo! Sports, Flickr, the …

Introduction to Temporal Windows (Azure Stream Analytics)

In applications that process real-time events, it is common to perform some set-based computation (aggregation) or other operations over subsets of events that fall within some period of time. Because the concept of time is a fundamental necessity to complex event-processing systems, it’s important to have a simple way to work with the time component …
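
To make the idea of aggregating over time-bounded subsets of events concrete, here is a minimal sketch of a fixed-size, non-overlapping ("tumbling") window count in plain Python. It is a generic illustration, not Azure Stream Analytics query syntax; the function name `tumbling_window_counts`, the `(timestamp, payload)` event shape, and the choice of windows that include their start and exclude their end are all assumptions made for this example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window.

    `events` is an iterable of (timestamp_in_seconds, payload) pairs.
    Each window covers [start, start + window_seconds).
    """
    counts = defaultdict(int)
    for timestamp, _payload in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


events = [(1, "a"), (4, "b"), (11, "c"), (12, "d"), (29, "e")]
counts = tumbling_window_counts(events, window_seconds=10)
# counts == {0: 2, 10: 2, 20: 1}
```

Each event lands in exactly one window, which is what distinguishes this kind of window from overlapping ones.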

Microsoft Azure Data Lake Store: An Introduction

The Azure Data Lake Store service provides a platform for organizations to park, process, and analyse vast volumes of data in any format. With increasing volumes of data to manage, enterprises are looking for appropriate infrastructure models to help them apply analytics to their big data, or simply to store them for undetermined …

Sharding pattern in Azure

Divide a data store into a set of horizontal partitions or shards. This can improve scalability when storing and accessing large volumes of data. Distribution models: once the hardware resources (the server nodes) for deploying a distributed database are available, a distribution model should be chosen to leverage the cluster capacity. Roughly, there are two paths …
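
As a rough sketch of dividing records into horizontal partitions, the example below assigns each record to a shard by hashing its key. Hash-based assignment is only one possible strategy and is an assumption here, not something the excerpt specifies; the names `shard_for` and `partition`, and the `id` key field, are hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a string key to a shard index in the range [0, num_shards)."""
    # A stable hash keeps the mapping identical across processes and restarts.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(records, num_shards, key_field="id"):
    """Split a list of dict records into `num_shards` lists by hashed key."""
    shards = [[] for _ in range(num_shards)]
    for record in records:
        index = shard_for(str(record[key_field]), num_shards)
        shards[index].append(record)
    return shards


records = [{"id": i, "value": i * i} for i in range(20)]
shards = partition(records, num_shards=4)
```

One design consequence of plain modulo hashing is that changing the number of shards remaps most keys, so real deployments often plan the shard count or the remapping strategy up front.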

Eight new features in Azure Stream Analytics

This week at Microsoft Ignite 2018, we are excited to announce eight new features in Azure Stream Analytics (ASA). These new features include support for query extensibility with C# custom code in ASA jobs running on Azure IoT Edge, custom de-serializers in ASA jobs running on Azure IoT Edge, live data testing in Visual Studio, …