In this blog post, we discuss the role of data analysis in business, explain how data are used to evaluate business performance, and introduce some fundamental issues of statistics and measurement, along with a support tool for data analysis and decision making.

**DATA IN THE BUSINESS ENVIRONMENT**

Data are used in virtually every major function in business, government, health care, education, and other nonprofit organizations. For example:

- Annual reports summarize data about companies' profitability and market share, both in numerical form and in charts and graphs, to communicate with shareholders.
- Financial analysts collect and analyze a variety of data to understand the contribution that a business provides to its shareholders. These typically include profitability, revenue growth, return on investment, asset utilization, operating margins, earnings per share, economic value added (EVA), shareholder value, and other relevant measures.
- Marketing researchers collect and analyze data to evaluate consumer perceptions of new products.
- Operations managers use data on production performance, manufacturing quality, delivery times, order accuracy, supplier performance, productivity, costs, and environmental compliance to manage their operations.
- Human resource managers measure employee satisfaction and track turnover, training costs, market innovation, training effectiveness, and skill development.
- Within the federal government, economists analyze unemployment rates, manufacturing capacity, and global economic indicators to provide forecasts and trends.
- Hospitals track many different clinical outcomes for regulatory compliance reporting and for their own analysis.
- Schools analyze test performance, and state boards of education use statistical performance data to allocate budgets to school districts.

Data support a variety of company purposes, such as planning, reviewing company performance, improving operations, and comparing company performance with competitors. The data that organizations use should focus on the critical success factors that lead to competitive advantage.

Data also provide key inputs to decision models. A **decision model** is a logical or mathematical representation of a problem or business situation that can be developed from theory or observation. Decision models establish relationships between actions that decision makers might take and results that they might expect, thereby allowing the decision makers to predict what might happen based on the model.
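As a minimal sketch of this idea, the hypothetical Python function below links one action (the price a decision maker sets) to one result (predicted profit). The linear demand curve, the function name `expected_profit`, and every number are illustrative assumptions, not figures from a real business.

```python
def expected_profit(price, unit_cost=4.0, base_demand=1000.0, demand_slope=50.0):
    """Hypothetical decision model: predict profit for a chosen price.

    Assumes demand falls linearly as price rises (an illustrative assumption).
    """
    demand = max(base_demand - demand_slope * price, 0.0)
    return (price - unit_cost) * demand


# Compare the predicted result of several candidate actions (prices).
candidate_prices = [6.0, 8.0, 10.0, 12.0, 14.0]
predictions = {price: expected_profit(price) for price in candidate_prices}
best_price = max(predictions, key=predictions.get)

for price, profit in predictions.items():
    print(f"price={price:5.2f} -> predicted profit={profit:8.2f}")
print("best candidate price:", best_price)
```

The model does not make the decision; it only lets the decision maker see what might happen under each candidate action before choosing one.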

**SOURCES AND TYPES OF DATA**

Data may come from a variety of sources: internal record-keeping, special studies, and external databases. Internal data are routinely collected by accounting, marketing, and operations functions of a business. External databases are often used for comparative purposes, marketing projects, and economic analyses.

**METRICS AND DATA CLASSIFICATION**

A **metric** is a unit of measurement that provides a way to objectively quantify performance. **Measurement** is the act of obtaining data associated with a metric. **Measures** are numerical values associated with a metric.

Metrics can be either discrete or continuous. A **discrete metric** is one that is derived from counting something. For example, an order is either complete or incomplete, so the number of complete orders is a discrete metric. **Continuous metrics** are based on a continuous scale of measurement. Any metrics involving dollars, length, time, volume, or weight are continuous.
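The distinction can be illustrated with a short Python sketch. The order records, field layout, and values below are invented for the example.

```python
# Hypothetical order records: (order_id, is_complete, delivery_time_hours)
orders = [
    ("A-101", True, 26.5),
    ("A-102", False, 49.0),
    ("A-103", True, 31.25),
    ("A-104", True, 22.0),
]

# Discrete metric: obtained by counting, so it is always a whole number.
complete_orders = sum(1 for _, is_complete, _ in orders if is_complete)

# Continuous metric: time is measured on a continuous scale,
# so the value can fall anywhere within a range.
average_delivery_hours = sum(hours for _, _, hours in orders) / len(orders)

print("complete orders:", complete_orders)
print("average delivery time (hours):", average_delivery_hours)
```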

When we deal with data, it is important to understand the type of data in order to select the appropriate statistical tool or procedure. One classification of data is the following:

- Types of data
  - **Cross-sectional**: data that are collected over a single period of time.
  - **Time series**: data collected over time.
- Number of variables
  - **Univariate**: data consisting of a single variable.
  - **Multivariate**: data consisting of two or more variables.
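To make these categories concrete, here is a small Python sketch with made-up store data. The variable names and the helper `count_variables` are hypothetical and exist only for this illustration.

```python
# All figures below are made-up illustrations.

# Cross-sectional, univariate: one variable (units sold) across stores in a single period.
units_sold_q4 = {"Store A": 120, "Store B": 95, "Store C": 143}

# Time series, univariate: one variable for a single store observed over successive periods.
store_a_units_by_quarter = [("Q1", 101), ("Q2", 110), ("Q3", 98), ("Q4", 120)]

# Cross-sectional, multivariate: two variables (units sold, returns) per store in one period.
store_metrics_q4 = {
    "Store A": {"units_sold": 120, "returns": 6},
    "Store B": {"units_sold": 95, "returns": 4},
    "Store C": {"units_sold": 143, "returns": 9},
}


def count_variables(records):
    """Return how many variables each record carries (1 = univariate, 2+ = multivariate)."""
    first = next(iter(records.values()))
    return len(first) if isinstance(first, dict) else 1


print(count_variables(units_sold_q4))     # univariate data set
print(count_variables(store_metrics_q4))  # multivariate data set
```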

Another classification of data is by the type of measurement scale. Failure to understand the differences in measurement scales can easily result in erroneous or misleading analysis. Data may be classified into four groups:

- **Categorical (nominal) data**, which are sorted into categories according to specified characteristics. For example, a firm's customers might be classified by their geographical region (North America, South America, Europe). The categories bear no quantitative relationship to one another, but we usually assign an arbitrary number to each category to ease the process of managing the data and computing statistics. Categorical data are usually counted or expressed as proportions or percentages.
- **Ordinal data**, which are ordered or ranked according to some relationship to one another. A common example in business is data from survey scales, such as rating a service as poor, average, good, very good, or excellent. Such data are categorical but also have a natural order, and consequently are ordinal. Ordinal data are more meaningful than categorical data because the observations can be compared to one another.
- **Interval data**, which are ordered and have a specified measure of the distance between observations but have no natural zero. Common examples are time of day and temperature measured in degrees Celsius or Fahrenheit. In contrast to ordinal data, interval data allow meaningful comparison of ranges, averages, and other statistics.
- **Ratio data**, which have all the properties of interval data and also have a natural zero. For example, dollar amounts have an absolute zero, so ratios of dollar figures are meaningful.

This classification is hierarchical in that each level includes all of the information content of the one preceding it. For example, ratio information can be converted to any of the other types of data. Interval information can be converted to ordinal or categorical data but cannot be converted to ratio data without the knowledge of the absolute zero point. Thus, a ratio scale is the strongest form of measurement.
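The following Python sketch illustrates the hierarchy using temperature. Celsius readings are interval data: they can be reduced to ordinal labels, but they become ratio data only once the absolute zero point is known. The cut-offs for "cold", "mild", and "hot" and the sample readings are arbitrary assumptions for this example.

```python
ABSOLUTE_ZERO_C = -273.15  # the absolute zero point, expressed in degrees Celsius


def celsius_to_kelvin(temp_c):
    """Interval -> ratio: only possible because the absolute zero point is known."""
    return temp_c - ABSOLUTE_ZERO_C


def to_ordinal(temp_c):
    """Interval -> ordinal: keep only the rank order (band cut-offs are arbitrary)."""
    if temp_c < 10:
        return "cold"
    if temp_c < 25:
        return "mild"
    return "hot"


readings_c = [5.0, 10.0, 20.0, 30.0]
ordinal_labels = [to_ordinal(t) for t in readings_c]

# A ratio of Celsius values is not meaningful: 20 C is not "twice as hot" as 10 C.
naive_ratio = 20.0 / 10.0
# On the Kelvin (ratio) scale, the same two temperatures differ by only a few percent.
kelvin_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

print(ordinal_labels)
print(naive_ratio, round(kelvin_ratio, 3))
```

Going down the hierarchy (interval to ordinal) discards information, which is why the reverse conversion is not possible from the labels alone.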

**STATISTICAL THINKING**

The importance of applying statistical concepts to make good business decisions and improve performance cannot be overemphasized. **Statistical thinking** is a philosophy of learning and action for improvement that is based on the following principles:

- All work occurs in a system of interconnected processes.
- Variation exists in all processes.
- Better performance results from understanding and reducing variation.

Work gets done in any organization through processes: systematic ways of doing things that achieve desired results. Understanding processes provides the context for determining the effects of variation and the proper type of action to be taken. Any process contains many sources of variation.

**Populations and Samples**

One of the most basic applications of statistics is drawing conclusions about populations from sample data. A **population** consists of all items of interest for a particular decision or investigation. It is important to understand that a population can be anything we define it to be, such as all customers who have purchased from Amazon over the past year or individuals who do not own a cell phone. A company like Amazon keeps extensive records on its customers, making it easy to retrieve data about the entire population of customers with prior purchases.

A **sample** is a subset of a population. For example, a list of individuals who purchased a CD from Amazon in the past year would be a sample from the population of all customers who purchased from the company. Whether this sample is representative of the population of customers—which depends on how the sample data are intended to be used—may be debatable; nevertheless, it is a sample. Sampling is desirable when complete information about a population is difficult or impossible to obtain.

**Statistics** are summary measures of population characteristics computed from samples. In business, statistical methods are used to present data in a concise and understandable fashion, to estimate population characteristics, to draw conclusions about populations from sample data, and to develop useful decision models for prediction and forecasting.
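The relationship between a population, a sample, and a statistic can be sketched in a few lines of Python using only the standard library. The population here is simulated: the order totals, the population size, and the sample size are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: order totals (in dollars) for 10,000 customers.
population = [round(random.gauss(60.0, 15.0), 2) for _ in range(10_000)]

# A sample is a subset of the population; here, a simple random sample of 200.
sample = random.sample(population, k=200)

# In practice the population mean is usually unknown; it is computed here
# only so the estimate can be compared against it.
population_mean = statistics.mean(population)

# Statistics: summary measures computed from the sample.
sample_mean = statistics.mean(sample)
sample_std = statistics.stdev(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"sample std dev:  {sample_std:.2f}")
```

The sample mean will generally differ somewhat from the population mean. How large that difference is likely to be is a question for statistical inference.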

The process of collection, organization, and description of data is commonly called **descriptive statistics**. **Statistical inference** refers to the process of drawing conclusions about unknown characteristics of a population based on sample data. Finally, **predictive statistics**—developing predictions of future values based on historical data—is the third major component of statistical methodology. In subsequent blog posts, we will cover each of these types of statistical methodology.