Key Concepts
- Goal: Find relationships among items frequently purchased together.
- Applications: Market basket analysis, recommendation systems, fraud detection.
Definitions
- Item: e.g., milk, bread, beer.
- Basket/Transaction: Set of items bought together.
- Support: Number (or fraction) of baskets that contain every item of an itemset.
- Frequent Itemset: An itemset whose support is at least the support threshold s.
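The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `baskets` list is hypothetical toy data, and the helper names `support` and `is_frequent` are assumptions introduced here.

```python
# Hypothetical toy baskets (illustrative only).
baskets = [
    {"milk", "bread", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "coke"},
    {"milk", "beer"},
]

def support(itemset, baskets):
    """Count the baskets that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for basket in baskets if items <= basket)

def is_frequent(itemset, baskets, s):
    """An itemset is frequent if its support reaches the threshold s."""
    return support(itemset, baskets) >= s

print(support({"milk"}, baskets))                    # 4
print(support({"milk", "beer"}, baskets))            # 3
print(is_frequent({"bread", "beer"}, baskets, s=3))  # False
```

Dividing `support(...)` by `len(baskets)` gives the fractional form of support.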
Association Rules
- Form:
X → Y, where X and Y are itemsets, meaning “if a basket contains X, it is likely to contain Y as well.”
- Confidence:
\(\mathrm{conf}(X \to Y) = \dfrac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}\)
- Interest:
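\(\mathrm{interest}(X \to Y) = \mathrm{conf}(X \to Y) - P(Y)\), where \(P(Y)\) is the fraction of all baskets that contain Y.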
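A small Python sketch of confidence and interest, under the assumption that support is a raw basket count and P(Y) is the fraction of baskets containing Y. The toy `baskets` and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical toy baskets (illustrative only).
baskets = [
    {"milk", "bread", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "coke"},
    {"milk", "beer"},
]

def support(itemset, baskets):
    """Count the baskets that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for basket in baskets if items <= basket)

def confidence(x, y, baskets):
    """conf(X -> Y) = support(X u Y) / support(X)."""
    support_x = support(x, baskets)
    if support_x == 0:
        raise ValueError("confidence is undefined when X never occurs")
    return support(set(x) | set(y), baskets) / support_x

def interest(x, y, baskets):
    """interest(X -> Y) = conf(X -> Y) - P(Y)."""
    p_y = support(y, baskets) / len(baskets)
    return confidence(x, y, baskets) - p_y

print(confidence({"beer"}, {"milk"}, baskets))  # 1.0
print(interest({"beer"}, {"milk"}, baskets))    # 0.0
print(interest({"bread"}, {"beer"}, baskets))   # -0.25
```

In this toy data the rule {beer} → {milk} has perfect confidence but zero interest, because milk is in every basket anyway. That is the situation the interest measure is meant to expose.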
Compact Representations
- Maximal Itemsets: Frequent itemsets with no frequent proper superset.
- Closed Itemsets: Itemsets for which no proper superset has the same support count.
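To make the two representations concrete, here is a hedged sketch that filters a table of frequent itemsets. The `frequent` dictionary is hypothetical example data (itemset mapped to support count), and both function names are assumptions. Checking only frequent supersets is enough for closedness here, since a superset with the same support as a frequent itemset is itself frequent.

```python
# Hypothetical frequent itemsets with their support counts.
frequent = {
    frozenset({"milk"}): 4,
    frozenset({"beer"}): 3,
    frozenset({"milk", "beer"}): 3,
}

def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}

def closed_itemsets(frequent):
    """Frequent itemsets with no proper superset of equal support."""
    return {
        s for s in frequent
        if not any(s < t and frequent[t] == frequent[s] for t in frequent)
    }

print(sorted(sorted(s) for s in maximal_itemsets(frequent)))
# [['beer', 'milk']]
print(sorted(sorted(s) for s in closed_itemsets(frequent)))
# [['beer', 'milk'], ['milk']]
```

{beer} is not closed because {milk, beer} has the same support (3), while {milk} is closed because its only superset has lower support.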
Algorithms
1. Brute-Force
- Generate all possible item combinations.
- Count frequencies (expensive for large data).
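A minimal sketch of the brute-force approach, assuming a hypothetical `max_size` cap so the enumeration terminates quickly. The toy `baskets` and the function name are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy baskets (illustrative only).
baskets = [
    {"milk", "bread", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "coke"},
    {"milk", "beer"},
]

def brute_force_frequent(baskets, s, max_size):
    """Enumerate every item combination up to max_size and count it."""
    items = sorted(set().union(*baskets))
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            candidate = set(combo)
            count = sum(1 for basket in baskets if candidate <= basket)
            if count >= s:
                result[frozenset(combo)] = count
    return result

found = brute_force_frequent(baskets, s=3, max_size=3)
print(len(found))  # 3
```

Every combination is tested against every basket, so the work grows with the number of possible itemsets rather than with the number of frequent ones.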
2. A-Priori Algorithm
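- Idea: Downward closure (monotonicity) property.
  - All subsets of a frequent itemset must also be frequent, so any itemset with an infrequent subset can be skipped.
- Uses multiple passes over the baskets:
  - Pass 1: Count individual items and keep the frequent ones (L1).
  - Generate candidate pairs (C2) from L1.
  - Pass 2: Count the candidate pairs across baskets and keep the frequent ones (L2).
  - Continue for k-tuples, building candidates Ck from L(k−1), until no frequent sets are found.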
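The passes above can be sketched as follows. This is a simplified in-memory illustration, not an optimized implementation: the candidate step joins pairs of frequent (k−1)-sets and prunes any candidate with an infrequent (k−1)-subset. The toy `baskets` and the name `apriori` are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets (illustrative only).
baskets = [
    {"milk", "bread", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "coke"},
    {"milk", "beer"},
]

def apriori(baskets, s):
    """Return {itemset: support} for all itemsets with support >= s."""
    baskets = [frozenset(b) for b in baskets]

    # Pass 1: count single items and keep the frequent ones (L1).
    item_counts = Counter(item for b in baskets for item in b)
    current = {frozenset([i]): c for i, c in item_counts.items() if c >= s}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Candidate generation (Ck): unions of two frequent (k-1)-sets
        # of size k whose (k-1)-subsets are all frequent.
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Pass k: count only the surviving candidates.
        counts = Counter()
        for basket in baskets:
            for cand in candidates:
                if cand <= basket:
                    counts[cand] += 1

        current = {c: n for c, n in counts.items() if n >= s}
        frequent.update(current)
        k += 1
    return frequent

result = apriori(baskets, s=3)
for itemset, count in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
# ['beer'] 3
# ['beer', 'milk'] 3
# ['milk'] 4
```

With threshold 3, only milk and beer survive pass 1, so {milk, beer} is the single candidate pair that gets counted in pass 2.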
Example
- Baskets:
{milk, bread, beer}, {milk, diaper, beer}, {bread, milk, coke}, etc.
- Frequent sets (support threshold s = 3):
{milk}, {beer}, {milk, beer} etc.
Storage & Efficiency
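- Naïve pair counting → \(O(n^2)\) memory for n distinct items.
- A triangular matrix stores one count per unordered pair (\(n(n-1)/2\) entries); a hash table keyed by pair stores only the pairs that actually occur.
- A-Priori reduces counting further: only pairs whose items are both frequent need counters.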
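As an illustration of the triangular-matrix idea, the sketch below stores pair counts in a flat list of length n(n−1)/2. It assumes items have already been renumbered as integers 0..n−1. The names `pair_index` and `count_pairs` and the sample baskets are hypothetical.

```python
from itertools import combinations

def pair_index(i, j, n):
    """Map an unordered pair {i, j}, 0 <= i, j < n, i != j, to a flat index."""
    if i == j:
        raise ValueError("a pair needs two distinct items")
    if i > j:
        i, j = j, i
    # Row i starts after the i earlier rows, which hold
    # (n-1) + (n-2) + ... + (n-i) entries in total.
    return i * (2 * n - i - 1) // 2 + (j - i - 1)

def count_pairs(baskets, n):
    """Count every item pair using one counter per unordered pair."""
    counts = [0] * (n * (n - 1) // 2)
    for basket in baskets:
        for i, j in combinations(sorted(basket), 2):
            counts[pair_index(i, j, n)] += 1
    return counts

# Hypothetical baskets over items 0..3.
int_baskets = [{0, 1, 2}, {0, 2}, {1, 2, 3}]
pair_counts = count_pairs(int_baskets, n=4)
print(pair_counts)                        # [1, 2, 0, 2, 1, 1]
print(pair_counts[pair_index(2, 0, 4)])   # 2
```

A dictionary keyed by `(i, j)` would be the hash-based alternative. It tends to pay off when only a small share of all possible pairs appears in the data.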
Summary
- Identify strong correlations between items (association rules).
- Optimize computation using frequency thresholds.
- A-Priori = cornerstone algorithm for frequent-itemset mining on large datasets.