Key Concepts

  • Goal: Find relationships among items frequently purchased together.
  • Applications: Market basket analysis, recommendation systems, fraud detection.

Definitions

  • Item: e.g., milk, bread, beer.
  • Basket/Transaction: Set of items bought together.
  • Support: Number (or fraction) of baskets that contain every item of an itemset.
  • Frequent Itemset: An itemset whose support is at least s, the support threshold.
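
A minimal sketch of these definitions. The basket contents are hypothetical toy data (only the item names come from these notes), and the function names are illustrative.

```python
# Hypothetical toy baskets; each basket is a set of items.
BASKETS = [
    frozenset({"milk", "bread", "beer"}),
    frozenset({"milk", "diaper", "beer"}),
    frozenset({"bread", "milk", "coke"}),
    frozenset({"beer", "diaper"}),
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for basket in baskets if items <= basket)


def is_frequent(itemset, baskets, s):
    """True if the itemset appears in at least `s` baskets."""
    return support_count(itemset, baskets) >= s
```

With a threshold of s = 3 on this toy data, {milk} is frequent but {milk, beer} is not.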

Association Rules

  • Form: {X} → {Y} meaning “if X is bought, Y is likely to be bought.”
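  • Confidence: support(X ∪ Y) / support(X), the fraction of baskets containing X that also contain Y.
  • Interest: confidence(X → Y) − (fraction of all baskets containing Y); a value near 0 means X says little about Y.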
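
A small sketch of both measures, assuming the common definitions: confidence = support(X ∪ Y) / support(X), and interest = confidence minus the fraction of all baskets that contain Y. The baskets are hypothetical toy data and the function names are illustrative.

```python
# Hypothetical toy baskets (not from the notes).
BASKETS = [
    frozenset({"milk", "bread", "beer"}),
    frozenset({"milk", "diaper", "beer"}),
    frozenset({"bread", "milk", "coke"}),
    frozenset({"beer", "diaper"}),
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for basket in baskets if items <= basket)


def confidence(lhs, rhs, baskets):
    """support(lhs ∪ rhs) / support(lhs) for the rule lhs → rhs."""
    lhs_count = support_count(lhs, baskets)
    if lhs_count == 0:
        raise ValueError("left-hand side never occurs in the baskets")
    return support_count(set(lhs) | set(rhs), baskets) / lhs_count


def interest(lhs, rhs, baskets):
    """Confidence minus the overall fraction of baskets containing rhs."""
    baseline = support_count(rhs, baskets) / len(baskets)
    return confidence(lhs, rhs, baskets) - baseline
```

On this toy data, {diaper} → {beer} has confidence 2/2 and a positive interest, while {milk} → {beer} has confidence 2/3 and a slightly negative interest, because beer is already common overall.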

Compact Representations

  • Maximal Itemsets: Frequent itemsets with no frequent proper superset.
  • Closed Itemsets: Itemsets for which no proper superset has the same support count.
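
A sketch of how both representations could be extracted from a table of frequent itemsets, taking the usual definitions: maximal means no frequent proper superset, closed means no proper superset with the same support count. The support counts in `FREQ` are hypothetical, and the table is assumed to list every frequent itemset.

```python
# Hypothetical support counts for all frequent itemsets (threshold s = 2).
FREQ = {
    frozenset({"milk"}): 3,
    frozenset({"beer"}): 3,
    frozenset({"bread"}): 2,
    frozenset({"diaper"}): 2,
    frozenset({"milk", "bread"}): 2,
    frozenset({"milk", "beer"}): 2,
    frozenset({"beer", "diaper"}): 2,
}


def maximal_itemsets(freq):
    """Frequent itemsets that have no frequent proper superset."""
    return {i for i in freq if not any(i < j for j in freq)}


def closed_itemsets(freq):
    """Frequent itemsets with no proper superset of equal support.

    A superset with equal support is itself frequent, so searching
    within `freq` is sufficient when `freq` is complete.
    """
    return {
        i for i in freq
        if not any(i < j and freq[j] == freq[i] for j in freq)
    }
```

Here {bread} is neither maximal nor closed, because {milk, bread} is frequent and has the same count. {milk} is closed but not maximal, since its supersets are frequent with a lower count.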

Algorithms

1. Brute-Force

  • Generate all possible item combinations.
  • Count the support of each combination (expensive: n items give 2^n − 1 non-empty combinations).
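
A brute-force sketch on hypothetical toy baskets. It also returns how many candidates were examined, to make the exponential cost visible. The function name is illustrative.

```python
from itertools import combinations

# Hypothetical toy baskets (not from the notes).
BASKETS = [
    frozenset({"milk", "bread", "beer"}),
    frozenset({"milk", "diaper", "beer"}),
    frozenset({"bread", "milk", "coke"}),
    frozenset({"beer", "diaper"}),
]


def brute_force_frequent(baskets, s):
    """Count every non-empty item combination; keep those with count >= s."""
    items = sorted(set().union(*baskets))
    frequent = {}
    examined = 0
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            examined += 1
            itemset = frozenset(combo)
            count = sum(1 for basket in baskets if itemset <= basket)
            if count >= s:
                frequent[itemset] = count
    return frequent, examined
```

With only 5 distinct items this already examines 2^5 − 1 = 31 combinations, each requiring a scan of all baskets.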

2. A-Priori Algorithm

  • Idea: Downward closure (monotonicity) property: all subsets of a frequent itemset must also be frequent, so any itemset with an infrequent subset can be skipped.
  • Uses multiple passes:
    1. Find frequent individual items (L1).
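    2. Generate candidate pairs (C2) from L1: both items of a pair must be in L1.
    3. Count the candidate pairs across baskets; pairs with count at least s form L2.
    4. Repeat for k-tuples (Ck built from Lk-1) until no frequent sets are found.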
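
The passes above can be sketched as follows. This is a simplified in-memory version on hypothetical toy baskets, not a disk-based implementation. Candidates are built by joining frequent (k-1)-sets and pruning any candidate with an infrequent (k-1)-subset. The function name is illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets (not from the notes).
BASKETS = [
    frozenset({"milk", "bread", "beer"}),
    frozenset({"milk", "diaper", "beer"}),
    frozenset({"bread", "milk", "coke"}),
    frozenset({"beer", "diaper"}),
]


def apriori(baskets, s):
    """Return {itemset: count} for every itemset with count >= s."""
    # Pass 1: count single items and keep the frequent ones (L1).
    item_counts = Counter(item for basket in baskets for item in basket)
    current = {frozenset([i]): c for i, c in item_counts.items() if c >= s}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Build Ck: unions of two frequent (k-1)-sets that have size k and
        # whose (k-1)-subsets are all frequent (downward closure).
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Pass k: one scan over the baskets to count the candidates.
        counts = Counter()
        for basket in baskets:
            for cand in candidates:
                if cand <= basket:
                    counts[cand] += 1
        current = {c: n for c, n in counts.items() if n >= s}
        frequent.update(current)
        k += 1
    return frequent
```

On this toy data with s = 2, pass 2 counts 6 candidate pairs and keeps 3. No triple survives pruning, so the loop ends without counting any triples.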

Example

  • Baskets:
    {milk, bread, beer}, {milk, diaper, beer}, {bread, milk, coke}, etc.
  • Frequent sets (support threshold s = 3): {milk}, {beer}, {milk, beer}, etc.

Storage & Efficiency

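  • Naïve pair counting → O(n^2) memory for n distinct items.
  • A triangular matrix stores each unordered pair once (n(n−1)/2 counts); a hash table of (i, j, count) triples stores only the pairs that actually occur.
  • A-Priori reduces unnecessary counting: the second pass counts only pairs whose items are both frequent.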
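
A sketch of the triangular-matrix idea, assuming items have been renumbered 0..n-1. Each unordered pair {i, j} with i < j maps to one slot of a flat array of length n(n−1)/2. The item numbering and basket contents are hypothetical, and the function names are illustrative.

```python
from itertools import combinations

# Hypothetical item numbering and toy baskets (not from the notes).
ITEM_IDS = {"beer": 0, "bread": 1, "coke": 2, "diaper": 3, "milk": 4}
BASKETS = [
    {"milk", "bread", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "coke"},
    {"beer", "diaper"},
]


def pair_index(i, j, n):
    """Slot of the unordered pair {i, j} in a flat triangular array."""
    if i > j:
        i, j = j, i
    if not 0 <= i < j < n:
        raise ValueError("need two distinct item ids in range(n)")
    # Rows 0..i-1 hold (n-1) + (n-2) + ... + (n-i) slots in total.
    return i * (2 * n - i - 1) // 2 + (j - i - 1)


def count_pairs(baskets, item_ids):
    """Count every pair occurring in the baskets, one slot per pair."""
    n = len(item_ids)
    counts = [0] * (n * (n - 1) // 2)
    for basket in baskets:
        ids = sorted(item_ids[item] for item in basket)
        for i, j in combinations(ids, 2):
            counts[pair_index(i, j, n)] += 1
    return counts
```

The array needs no stored keys, which is why it beats a hash table when most pairs occur. When most pairs never occur, storing only the occurring pairs as triples uses less space.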

Summary

  • Identify strong correlations between items (association rules).
  • Prune the search using a support threshold and the downward closure property.
  • A-Priori = cornerstone algorithm for large-scale data mining.