Difference between classification and association algorithms

The term data mining refers loosely to finding relevant information, or discovering knowledge, from large volumes of data. Like knowledge discovery in artificial intelligence, data mining attempts to discover statistical rules and patterns automatically from data.

Knowledge discovered from a database can be represented by a set of rules. The following is an example of a rule, stated informally: “Young women with annual incomes greater than $50,000 are the most likely people to buy small sports cars”.

We can discover rules from a database using one of two models:

  1. The user is involved directly in the process of knowledge discovery.
  2. The system is responsible for automatically discovering knowledge from the database, by detecting patterns and correlations in the data.

Knowledge representation using Rules:

Rules provide a common framework in which to express various types of knowledge. Rules are of the general form

∀ x, antecedent => consequent

where x is a list of one or more variables with associated ranges.

Let the database contain a relation buys that specifies what was bought in each transaction. The following is an example of a rule:

∀ transactions T, buys(T, bread) => buys(T, milk)

Here, T is a variable whose range is the set of all transactions. The rule says that, if there is a tuple (t, bread) in the relation buys, there must also be a tuple (t, milk) in the relation buys.

Rules have an associated support, as well as an associated confidence.

Support is a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule. For instance, if 0.001% of all transactions include the purchase of milk and bread, the support for the rule

∀ transactions T, buys(T, bread) => buys(T, milk)

is low.

The rule may not even be statistically significant; perhaps there was only a single transaction that purchased both bread and milk. Businesses are often not interested in rules that have low support, since such rules involve few customers and are not worth bothering about.

On the other hand, if 50% of all transactions involve the purchase of milk and bread, then support is relatively high, and the rule is worth attention. Exactly what minimum degree of support is considered desirable depends on the application.

Confidence is a measure of how often the consequent is true when the antecedent is true. For instance, the rule

∀ transactions T, buys(T, bread) => buys(T, milk)

has a confidence of 80% if 80% of the transactions that include the purchase of bread also include the purchase of milk. A rule with low confidence is not meaningful. In business applications, rules usually have confidence significantly less than 100%.
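These two measures are easy to sketch in code. The following is a minimal illustration, using a small hypothetical set of transactions (each transaction is the set of items bought; the item sets and resulting fractions are invented for the example):

```python
# Hypothetical transactions: each is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "sugar"},
    {"bread", "milk", "butter"},
]

def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction
    that also contain the consequent."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    both = sum(1 for t in with_antecedent if consequent <= t)
    return both / len(with_antecedent)

print(support(transactions, {"bread", "milk"}))       # 3 of 5 -> 0.6
print(confidence(transactions, {"bread"}, {"milk"}))  # 3 of 4 -> 0.75
```

Here the rule bread => milk has support 0.6 (three of the five transactions contain both items) and confidence 0.75 (of the four transactions containing bread, three also contain milk).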

Classes of Data-Mining Problems

Two important classes of problems in data mining are:

  1. Classification and
  2. Association

Classification involves finding rules that partition the given data into disjoint groups. For instance, suppose that a credit-card company wants to decide whether or not to give a credit card to an applicant. The company has a variety of information about the person – such as her age, educational background, annual income, current debts, and housing location – which it can use for making a decision.

Some of this information could be relevant to the creditworthiness of the applicant, whereas some may not be. To make the decision, the company assigns a creditworthiness level of excellent, good, average, or bad to each of a sample set of current customers. The assignment of creditworthiness is based on the customer’s payment history. Then, the company attempts to find rules that classify its current customers as excellent, good, average, or bad, based on the information about the person other than the actual payment history (which is unavailable for new customers). Let us consider just two attributes: education level (highest degree earned) and income. The rules may be of the following form:

∀ person P, P.degree = Masters and P.income >= 75000 => P.credit = excellent

∀ person P, P.degree = Bachelors or (P.income >= 25000 and P.income < 75000) => P.credit = good

Similar rules would also be present for the other credit worthiness levels (average and bad).
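The pair of rules above can be read directly as a simple classifier. A minimal sketch, treating the rules as ordinary conditionals over hypothetical person records (the attribute names and thresholds follow the rules in the text; the fallback level is a placeholder standing in for the remaining rules):

```python
from dataclasses import dataclass

@dataclass
class Person:
    degree: str
    income: float

def credit_level(p: Person) -> str:
    """Apply the classification rules from the text, in order."""
    if p.degree == "Masters" and p.income >= 75000:
        return "excellent"
    if p.degree == "Bachelors" or (25000 <= p.income < 75000):
        return "good"
    # Rules for the remaining levels would follow the same pattern;
    # "average" is used here only as a placeholder default.
    return "average"

print(credit_level(Person("Masters", 80000)))    # excellent
print(credit_level(Person("Bachelors", 30000)))  # good
```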

Other important uses of classification include making loan approvals, setting insurance premiums, deciding whether or not to stock a particular item in a shop based on statistical information about customers, and so on.

Retail shops are often interested in associations between different items that people buy. For instance, someone who buys bread is quite likely also to buy milk – an association represented by the rule that we saw earlier:

∀ transactions T, buys(T, bread) => buys(T, milk)

with an associated confidence level and support.

Association information can be used in several ways. A shop may decide to place bread close to milk, to help shoppers finish their task faster. Or the shop may place them at opposite ends of a row, and place other associated items in between to tempt people to buy those items as well as the shoppers walk from one end of the row to the other. A shop that offers discounts on one associated item may not offer a discount on the other, since the customer will probably buy the other anyway.
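As a sketch of how such associations might be surfaced in the first place, the following counts how often each pair of items is bought together and keeps the pairs whose support clears a chosen threshold (the transactions and threshold are hypothetical; the Apriori algorithm, covered in a later post, generalizes this idea to larger item sets):

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactions: each is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "sugar"},
    {"bread", "milk", "butter"},
]

# Count how often each unordered pair of items appears together.
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

min_support = 0.5  # a pair must appear in at least half the transactions
n = len(transactions)
frequent = {pair: count / n
            for pair, count in pair_counts.items()
            if count / n >= min_support}
print(frequent)  # {('bread', 'milk'): 0.75}
```

With these four transactions, only the pair (bread, milk) clears the threshold: it appears in three of the four transactions, for a support of 0.75.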

Now that we understand the difference between classification and association, the upcoming posts will discuss the topics below in detail.

Classification:
  1. Logistic Regression
  2. Decision Trees
  3. Random Forests
  4. SVM
  5. Naive Bayes
  6. Confusion Matrix

Association:
  1. Association rules
  2. Association rule parameters
  3. Apriori Algorithm
  4. Market Basket Analysis




