Lecture 7 - Large-Scale Machine Learning

Learning Types

Unsupervised

No predefined labels.
Goal: find structure (e.g., clustering).

Supervised

Data labeled with correct outputs.
Goal: learn a function $f(x)$ that predicts $y$.

Examples

Input: [height, weight] → Output: Dog breed.
Build classifier from training data.
Evaluate performance on unseen (test) data.
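A minimal Python sketch of the train-then-evaluate loop. Everything here is an illustrative assumption: the height/weight numbers are invented, the split simply holds out every fourth example, and a nearest-centroid classifier is swapped in only because it is short. Any classifier from this lecture could take its place.

```python
from statistics import mean

# Made-up toy data: ([height_cm, weight_kg], breed)
data = [
    ([20, 2], "chihuahua"), ([22, 3], "chihuahua"), ([18, 2], "chihuahua"), ([21, 3], "chihuahua"),
    ([57, 30], "labrador"), ([60, 34], "labrador"), ([55, 29], "labrador"), ([58, 32], "labrador"),
]

# Deterministic holdout split: every 4th example is reserved as unseen test data
train = [ex for i, ex in enumerate(data) if i % 4 != 3]
test = [ex for i, ex in enumerate(data) if i % 4 == 3]

def fit_centroids(examples):
    """Nearest-centroid classifier: store the mean feature vector of each class."""
    by_label = {}
    for x, label in examples:
        by_label.setdefault(label, []).append(x)
    return {label: [mean(col) for col in zip(*xs)] for label, xs in by_label.items()}

def predict(centroids, x):
    """Return the class whose centroid has the smallest squared Euclidean distance to x."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )

def accuracy(centroids, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(centroids, x) == label for x, label in examples)
    return correct / len(examples)

model = fit_centroids(train)      # build the classifier from training data only
acc = accuracy(model, test)       # evaluate on the held-out examples
```

On this toy split both held-out dogs are classified correctly, so `acc` is 1.0; a meaningful evaluation needs a much larger test set.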

Algorithms

1. Decision Trees

Split data based on feature conditions.
Each internal node represents a test; leaves represent predicted class.
Pros: interpretable; handles categorical data.
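The node/leaf structure can be sketched in Python as nested objects. This only shows prediction with a hand-built tree: the thresholds (30 cm, 20 kg) and breed labels are made up, and learning the splits from data (e.g. by information gain) is not shown. A categorical test would need another node type, for instance one that branches on equality.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    label: str          # predicted class

@dataclass
class Node:
    feature: int        # index of the feature this internal node tests
    threshold: float    # go left if x[feature] <= threshold, else right
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def predict(tree: Tree, x) -> str:
    # Walk from the root, applying one test per internal node, until a leaf is reached
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.label

# Hypothetical tree over [height_cm, weight_kg]
tree = Node(0, 30.0,
            Leaf("chihuahua"),
            Node(1, 20.0, Leaf("beagle"), Leaf("labrador")))
```

Each prediction costs one comparison per level, and the path taken (e.g. "height > 30 and weight <= 20") doubles as a human-readable explanation, which is where the interpretability comes from.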

2. k-Nearest Neighbors (k-NN)

Instance-based learning.
Predict the label of a query point $q$ as the majority label among its $k$ closest training points.
Requires:
- Distance metric (e.g., Euclidean).
- Choice of k.
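A short Python sketch of k-NN prediction with Euclidean distance (`math.dist`, Python 3.8+). The training points are invented, and `heapq.nsmallest` is just one convenient way to pick the k closest points without sorting the whole training set. Vote ties go to the label that appears first in distance order.

```python
import heapq
import math
from collections import Counter

# Made-up training data: ([height_cm, weight_kg], breed)
train = [
    ([20, 2], "chihuahua"), ([22, 3], "chihuahua"), ([18, 2], "chihuahua"),
    ([57, 30], "labrador"), ([60, 34], "labrador"), ([55, 29], "labrador"),
]

def knn_predict(train, q, k=3):
    """Return the majority label among the k training points closest to query q."""
    # k nearest (features, label) pairs by Euclidean distance, nearest first
    neighbors = heapq.nsmallest(k, train, key=lambda ex: math.dist(ex[0], q))
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

There is no training step: all work happens at query time, which is why the distance metric and the choice of k are the only knobs.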

Linear Models

Perceptron

Binary classifier:
$f(x) = \operatorname{sign}(w \cdot x - \theta)$
Update rule (learning rate $\eta$):
$w_{t+1} = w_t + \eta\, y_t x_t$ if $x_t$ is misclassified.
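One update step worked through with illustrative numbers (the vectors, $\eta = 0.5$ and $\theta = 0$ are arbitrary choices, not from the lecture):

```latex
\begin{aligned}
&w_t = (1, -1),\quad x_t = (1, 2),\quad y_t = +1,\quad \eta = 0.5,\quad \theta = 0 \\
&w_t \cdot x_t - \theta = 1 - 2 = -1 < 0 \;\Rightarrow\; f(x_t) = -1 \ne y_t \quad \text{(misclassified)} \\
&w_{t+1} = w_t + \eta\, y_t x_t = (1, -1) + 0.5\,(1, 2) = (1.5,\ 0) \\
&w_{t+1} \cdot x_t - \theta = 1.5 > 0 \;\Rightarrow\; f(x_t) = +1 = y_t
\end{aligned}
```

The update moves $w$ toward $x_t$ (because $y_t = +1$), which raises $w \cdot x_t$ by $\eta \lVert x_t \rVert^2 = 2.5$.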

Properties

Converges in a finite number of updates when the data is linearly separable.
Otherwise the weights never settle and keep oscillating (Perceptron Cycling Theorem).
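Both properties can be observed with a minimal Python sketch of the update rule. Assumptions: θ stays fixed at 0, the weights start at zero, an example counts as misclassified when $y_t (w \cdot x_t - \theta) \le 0$, and the two toy datasets are invented. One is linearly separable; the other is XOR encoded with ±1 features, which no linear separator can classify correctly.

```python
def train_perceptron(X, y, eta=1.0, theta=0.0, max_epochs=100):
    """Return (w, converged): converged is True once a full pass makes no mistakes."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x_t, y_t in zip(X, y):
            activation = sum(wi * xi for wi, xi in zip(w, x_t)) - theta
            if y_t * activation <= 0:  # wrong side of (or on) the boundary
                w = [wi + eta * y_t * xi for wi, xi in zip(w, x_t)]
                mistakes += 1
        if mistakes == 0:
            return w, True
    return w, False

# Linearly separable toy data
X_sep = [[2, 1], [1, 3], [-1, -2], [-2, -1]]
y_sep = [1, 1, -1, -1]

# XOR with +/-1 features: label is +1 exactly when the two features differ
X_xor = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y_xor = [-1, 1, 1, -1]

w_sep, ok_sep = train_perceptron(X_sep, y_sep)
w_xor, ok_xor = train_perceptron(X_xor, y_xor)
```

On the separable set the loop stops after the second pass with w = (2, 1). On the XOR set every pass makes four mistakes and the weights return to (0, 0) at the end of each pass, a concrete instance of the cycling behaviour.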

Winnow Algorithm

Works on binary features (0/1).
Maintains positive weights only.
Start with all weights $w_i = 1$ and threshold $\theta = d$ (the number of dimensions).

Rules

If $w \cdot x < \theta$ and $y = +1$ → double every $w_i$ with $x_i = 1$.
If $w \cdot x > \theta$ and $y = -1$ → halve every $w_i$ with $x_i = 1$.
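Suited for problems with many irrelevant features: because updates are multiplicative, the number of mistakes grows only logarithmically with the number of features ($O(k \log d)$ when the target is a disjunction of $k$ of the $d$ features).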
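A minimal Python sketch of these rules, assuming a prediction of +1 when $w \cdot x \ge \theta$ (the notes leave ties open) and a made-up dataset with d = 6 binary features in which only the first feature decides the label. Training repeats passes until one pass makes no update; the function names are hypothetical.

```python
def train_winnow(X, y, max_epochs=100):
    """Winnow on 0/1 features: weights start at 1, threshold = number of features."""
    d = len(X[0])
    w = [1.0] * d
    theta = float(d)
    for _ in range(max_epochs):
        updated = False
        for x_t, y_t in zip(X, y):
            score = sum(wi for wi, xi in zip(w, x_t) if xi == 1)  # w . x for binary x
            if y_t == 1 and score < theta:
                # Promotion: double the weights of active features
                w = [wi * 2 if xi == 1 else wi for wi, xi in zip(w, x_t)]
                updated = True
            elif y_t == -1 and score > theta:
                # Demotion: halve the weights of active features
                w = [wi / 2 if xi == 1 else wi for wi, xi in zip(w, x_t)]
                updated = True
        if not updated:
            break
    return w, theta

def winnow_predict(w, theta, x):
    return 1 if sum(wi for wi, xi in zip(w, x) if xi == 1) >= theta else -1

# Made-up data: label is +1 exactly when feature 0 is on
X = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
]
y = [1, -1, 1, -1]
w, theta = train_winnow(X, y)
```

Here the relevant weight grows to 8 while the three features that only ever appear in a negative example are halved to 0.5; all weights stay positive throughout.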

Threshold Variation

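Incorporate the threshold into the weight vector:
- Extend $x$ with a constant $-1$ component.
- Include $\theta$ as the last weight.
Since $(w, \theta) \cdot (x, -1) = w \cdot x - \theta$, the standard Perceptron update rule can be reused and $\theta$ is learned like any other weight.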
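A small Python sketch of the trick: each input gets a constant −1 appended, so the last learned weight acts as θ and the ordinary perceptron update trains it. The one-dimensional toy data (negatives at 1 and 2, positives at 4 and 5) and the function names are illustrative assumptions.

```python
def augment(x):
    """Append a constant -1 so that w_aug . x_aug = w . x - theta."""
    return list(x) + [-1.0]

def train_perceptron_with_threshold(X, y, eta=1.0, max_epochs=100):
    """Plain perceptron on augmented inputs; returns (w, theta)."""
    Xa = [augment(x) for x in X]
    w_aug = [0.0] * len(Xa[0])  # last entry plays the role of theta
    for _ in range(max_epochs):
        mistakes = 0
        for x_t, y_t in zip(Xa, y):
            activation = sum(wi * xi for wi, xi in zip(w_aug, x_t))
            if y_t * activation <= 0:
                w_aug = [wi + eta * y_t * xi for wi, xi in zip(w_aug, x_t)]
                mistakes += 1
        if mistakes == 0:
            break
    return w_aug[:-1], w_aug[-1]

# Made-up 1-D data that needs a non-zero threshold
X_1d = [[1], [2], [4], [5]]
y_1d = [-1, -1, 1, 1]
w, theta = train_perceptron_with_threshold(X_1d, y_1d)
```

On this data training settles after four passes on $w = 1$, $\theta = 3$, i.e. predict +1 when $x > 3$.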

Perceptron with Margin (Thick Separator)

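Improves robustness by introducing a margin $\gamma > 0$: examples that are classified correctly but lie too close to the boundary also trigger an update.
Update if $y = +1$ and $w \cdot x < \theta + \gamma$, or if $y = -1$ and $w \cdot x > \theta - \gamma$.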
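A minimal Python sketch of the margin rule, including the symmetric condition for $y = -1$. Assumptions: θ fixed at 0, zero initial weights, invented toy data, and γ measured in raw $w \cdot x$ units (w is not normalized, so γ only has meaning relative to the scale of w).

```python
def train_margin_perceptron(X, y, eta=1.0, theta=0.0, gamma=1.0, max_epochs=100):
    """Perceptron that also updates on correct-but-too-close examples."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        updated = False
        for x_t, y_t in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x_t))
            # y=+1 needs w.x >= theta + gamma; y=-1 needs w.x <= theta - gamma
            too_close = (y_t == 1 and score < theta + gamma) or (
                y_t == -1 and score > theta - gamma
            )
            if too_close:
                w = [wi + eta * y_t * xi for wi, xi in zip(w, x_t)]
                updated = True
        if not updated:
            return w, True
    return w, False

# Made-up linearly separable data
X = [[2, 1], [1, 3], [-1, -2], [-2, -1]]
y = [1, 1, -1, -1]
w, ok = train_margin_perceptron(X, y, gamma=5.0)
```

With `gamma=5.0`, training stops once every example satisfies $y\,(w \cdot x - \theta) \ge \gamma$, here at w = (3, 3) after two passes.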

Comparison

Algorithm	Strengths	Weaknesses
Perceptron	Robust, simple	Sensitive to irrelevant features
Winnow	Handles many irrelevant features	Basic version requires binary (0/1) input
Both	Converge for linearly separable data	Do not converge when data is not linearly separable

Online Learning (Optional)

Adapt models incrementally as new data arrives.
Examples: dynamic pricing, continuous user feedback.
Useful for streaming data applications.
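Perceptron and Winnow are already online algorithms: each update uses a single example. The Python sketch below wraps the perceptron rule in that style. The class name, the `observe` method, and the generator standing in for a live data feed are hypothetical, and the stream contents are made up.

```python
from typing import Iterator, List, Tuple

Example = Tuple[List[float], int]

class OnlinePerceptron:
    """Hypothetical online learner: sees one example at a time and does not store it."""

    def __init__(self, dim: int, eta: float = 1.0, theta: float = 0.0):
        self.w = [0.0] * dim
        self.eta = eta
        self.theta = theta
        self.mistakes = 0

    def predict(self, x: List[float]) -> int:
        score = sum(wi * xi for wi, xi in zip(self.w, x)) - self.theta
        return 1 if score > 0 else -1

    def observe(self, x: List[float], y: int) -> None:
        """Predict first, then update on the revealed label if the prediction was wrong."""
        if self.predict(x) != y:
            self.w = [wi + self.eta * y * xi for wi, xi in zip(self.w, x)]
            self.mistakes += 1

def example_stream() -> Iterator[Example]:
    """Stand-in for a live feed (made-up data); examples arrive one by one."""
    yield [2, 1], 1
    yield [-1, -2], -1
    yield [1, 3], 1
    yield [-2, -1], -1
    yield [3, 0], 1

model = OnlinePerceptron(dim=2)
for x, y in example_stream():
    model.observe(x, y)
```

After the five streamed examples the model has made one mistake and holds w = (2, 1); memory use stays constant no matter how long the stream runs.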

Summary

Supervised ML = learning from labeled data.
Linear models (Perceptron, Winnow) form the foundation.
Extensions (learned threshold, margin, online updates) improve robustness and adaptability in large-scale settings.