Introduction to Kubernetes

Kubernetes is a powerful open-source system, initially developed by Google, for managing containerized applications in a clustered environment. It aims to provide better ways of managing related, distributed components and services across varied infrastructure. In this article, we'll discuss some of Kubernetes' basic concepts. We will talk about the architecture of the system, the … Continue reading Introduction to Kubernetes

Java 10 Features

After the Java 9 release, Java 10 arrived very quickly. Unlike its predecessor, Java 10 does not have many exciting features, but it still has a few important updates that will change the way you code. Novelties in Java 10: local-variable type inference; root certificates for OpenJDK; changes in Java garbage collection; a garbage collector interface; an experimental Java-based JIT … Continue reading Java 10 Features

Spark study notes: core concepts visualized

Learning Spark is not easy for someone with little background in distributed systems. Even though I have been using Spark for quite some time, I find it time-consuming to get a comprehensive grasp of all the core concepts in Spark. The official Spark documentation provides a very detailed explanation, yet it focuses more … Continue reading Spark study notes: core concepts visualized

The basis of Azure Data Factory

In the world of big data, raw, unorganized data is often stored in relational, non-relational, and other storage systems. However, on its own, raw data doesn’t have the proper context or meaning to provide meaningful insights to analysts, data scientists, or business decision makers. Big data requires a service that can orchestrate and operationalize processes to … Continue reading The basis of Azure Data Factory

How to Create an ARIMA Model for Time Series Forecasting in Python

A popular and widely used statistical method for time series forecasting is the ARIMA model. ARIMA is an acronym that stands for AutoRegressive Integrated Moving Average. It is a class of model that captures a suite of different standard temporal structures in time series data. In this tutorial, you will discover how to develop an … Continue reading How to Create an ARIMA Model for Time Series Forecasting in Python

Comparing Pulsar and Kafka: unified queuing and streaming

In previous blog posts, we described several reasons why Apache Pulsar is an enterprise-grade streaming and messaging system that you should consider for your real-time use cases. We also took a deep dive into enterprise features like real-time durable storage to prevent data loss, multi-tenancy, geo-replication, encryption, and security. These blog posts helped develop an understanding of Pulsar and generated a lot … Continue reading Comparing Pulsar and Kafka: unified queuing and streaming

Why Apache Pulsar? Part 2

This is part 2 of a series of blog posts that highlights key features of Apache Pulsar (incubating). Apache Pulsar is a next-generation pub-sub messaging system developed at Yahoo. In part 1 of the series, we discussed how Pulsar supports a flexible messaging model, multi-tenancy, geo-replication, and durability. In this post, we’ll continue the discussion by showing how … Continue reading Why Apache Pulsar? Part 2