Simple Guide to the confusion matrix

A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known. The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing. Confusion matrix: A classification problem can be evaluated …
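As a minimal sketch of the idea in this excerpt, the four cells of a binary confusion matrix can be counted directly from true and predicted labels. The function name `confusion_counts`, the `positive` parameter, and the sample labels are illustrative assumptions, not taken from the linked post.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary classifier (hypothetical helper)."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: correct -> TP, wrong -> FP
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # Predicted negative: missed positive -> FN, correct -> TN
            counts["FN" if truth == positive else "TN"] += 1
    # Return all four cells, including any that stayed at zero
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


# Made-up labels, for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))
```

Each cell compares a known true value with the classifier's prediction, which is why the test data must have known labels.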

Overfitting in Machine Learning

In this guide, we’ll walk you through exactly what overfitting means, how to spot it in your models, and what to do if your model is overfitting. By the end, you’ll know how to deal with this tricky problem once and for all. Table of Contents: Examples of Overfitting; Signal vs. Noise; Goodness of fit …

Data Imputation Techniques in Machine Learning

Have you come across the problem of handling missing data/values for features in machine learning (ML) models at prediction time? This is different from handling missing data for features during the training/testing phase of ML models. Data scientists are expected to come up with an appropriate strategy to handle missing data during both the model training/testing phase and model prediction time …

Role Of Data Analysis In Business

In this blog post, we discuss the roles of data analysis in business, show how data are used in evaluating business performance, and introduce some fundamental issues of statistics and measurement, along with a support tool for data analysis and decision making. DATA IN THE BUSINESS ENVIRONMENT: Data are used in virtually every major function in business, …

Difference between classification and association algorithms

The term data mining refers loosely to finding relevant information or discovering knowledge from large volumes of data. Like knowledge discovery in artificial intelligence, data mining attempts to discover statistical rules and patterns automatically from data. Knowledge discovered from a database can be represented by a set of rules. The following is an example …

Deep Learning Resources

Online Courses: Andrew Ng’s Machine-Learning Class on Coursera; Geoff Hinton’s Neural Networks Class on Coursera (2012); U. Toronto: Introduction to Neural Networks (2015); Yann LeCun’s NYU Course; Ng’s Lecture Notes for Stanford’s CS229 Machine Learning; Nando de Freitas’s Deep Learning Class at Oxford (2015); Andrej Karpathy’s Convolutional Neural Networks Class at Stanford; Patrick Winston’s Introduction …

Summarize whole paragraph to sentence by Extractive Approach

To quickly grasp the idea of a long document, we often summarize as we read an article or book. In English, the first sentence (or first two sentences) of an article has a very high chance of representing the whole article. Of course, the topic sentence can be the last sentence in …
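The lead-sentence heuristic mentioned in this excerpt can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `lead_summary` is hypothetical, the regex-based sentence splitter is deliberately naive (it will mis-split abbreviations such as "e.g."), and the linked post may use a different method.

```python
import re


def lead_summary(text, n_sentences=1):
    """Return the first n_sentences of text as a crude extractive summary."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [s.strip() for s in parts if s.strip()]
    return " ".join(sentences[:n_sentences])


# Made-up article text, for illustration only
article = (
    "Docker packages software into containers. "
    "Containers share the host kernel. "
    "This makes them lightweight."
)
print(lead_summary(article))
print(lead_summary(article, n_sentences=2))
```

Taking the first one or two sentences is extractive because it selects existing sentences rather than generating new ones.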

Docker in a Nutshell

I want to start by tackling two very important questions that we are going to answer throughout this blog post: What is Docker? Why do we use Docker? Let’s first answer why we use Docker by going through a quick demo. Let’s have a look at this …

Introduction to Natural Language Processing with NLTK

What is Natural Language Processing? Natural Language Processing (NLP) helps computers (machines) "read and understand" text or speech by simulating human language abilities. In recent years, NLP has grown rapidly because of an abundance of data. Given that more and more unstructured data is available, NLP has gained immense popularity. Prerequisites: Python 3+; Jupyter Notebook; Natural …

How to Create an ARIMA Model for Time Series Forecasting in Python

A popular and widely used statistical method for time series forecasting is the ARIMA model. ARIMA is an acronym that stands for AutoRegressive Integrated Moving Average. It is a class of models that captures a suite of different standard temporal structures in time series data. In this tutorial, you will discover how to develop an …