Led by Carl Linnaeus Simulacrum
K-means, hierarchical clustering, Apriori, and Eclat — finding structure in data that came without labels. Based on the K-means algorithm of Stuart Lloyd.
The question
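K-means converges to a local minimum that depends on where the initial centres are placed. K-means++ spreads the initial centres apart, which makes a poor local minimum less likely but does not rule it out. The Elbow Method suggests a value of k: the point beyond which adding clusters yields little further reduction in within-cluster sum of squares. But is "minimise within-cluster sum of squares" what you mean by "cluster"?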
Outcome
The student can implement K-means, apply the Elbow Method, and identify K-means' structural assumptions.
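A minimal sketch of the ideas in this unit, using only the Python standard library: k-means++ seeding, Lloyd's assign-and-update loop, and the within-cluster sum of squares (WCSS) that the Elbow Method plots against k. The function names and the nine sample points are invented for illustration, and the sketch assumes the data contain at least k distinct points.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each further centre is drawn with probability
    proportional to its squared distance from the nearest centre so far."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        weights = [min(sq_dist(p, c) for c in centres) for p in points]
        centres.append(rng.choices(points, weights=weights, k=1)[0])
    return centres

def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
            for p in points]

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centres = kmeans_pp_init(points, k, rng)
    for _ in range(max_iter):
        labels = assign(points, centres)
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centre.
            new_centres.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centres[j]
            )
        if new_centres == centres:  # no centre moved: a local minimum
            break
        centres = new_centres
    labels = assign(points, centres)
    wcss = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return centres, labels, wcss

# Invented data: three loose groups in the plane.
points = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6),
          (10.0, 10.0), (10.4, 9.8), (9.7, 10.3),
          (0.0, 10.0), (0.3, 9.6), (-0.2, 10.4)]

# Elbow Method input: WCSS for a range of k.
for k in range(1, 6):
    _, _, wcss = kmeans(points, k)
    print(k, round(wcss, 3))
```

Changing `seed` changes the initial centres and can change the local minimum reached, which is the dependence the question refers to.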
The question
Hierarchical clustering builds a full dendrogram without requiring k in advance. Ward linkage merges, at each step, the two clusters whose union causes the smallest increase in total within-cluster variance. Looking at the dendrogram: which is the longest vertical line, and what does it suggest about the natural number of clusters?
Outcome
The student can produce a dendrogram, choose k from it, and compare to K-means.
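A small sketch of Ward linkage in plain Python, to make the "longest vertical line" reading concrete. It records the cost of each merge and suggests k by cutting in the largest gap between consecutive merge costs. The function names and data are invented, the search over pairs is deliberately naive, and the merge cost is used directly as the height, which is an assumption of this sketch (libraries often plot a rescaled value).

```python
def centroid(cluster):
    """Mean of a list of points given as tuples."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def ward_cost(a, b):
    """Increase in total within-cluster sum of squares if a and b merge."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_merge_heights(points):
    """Merge clusters bottom-up; return the cost of each merge in order."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        cost, i, j = min(
            (ward_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        heights.append(cost)
    return heights

def suggest_k(heights):
    """Number of clusters left when cutting in the largest height gap."""
    n_points = len(heights) + 1
    gaps = [heights[m + 1] - heights[m] for m in range(len(heights) - 1)]
    m = max(range(len(gaps)), key=gaps.__getitem__)
    # m + 1 merges lie below the cut, so n_points - (m + 1) clusters remain.
    return n_points - (m + 1)

# Invented data: three loose groups in the plane.
points = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6),
          (10.0, 10.0), (10.4, 9.8), (9.7, 10.3),
          (0.0, 10.0), (0.3, 9.6), (-0.2, 10.4)]

heights = ward_merge_heights(points)
print([round(h, 2) for h in heights])
print("suggested k:", suggest_k(heights))
```

The gap rule is a heuristic for reading the dendrogram, not a test of whether the clusters are real; comparing its suggestion with the K-means elbow is the exercise this unit asks for.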
The question
Support, confidence, lift. A rule {A, B} → C with lift = 4 means that customers who buy A and B buy C four times as often as customers in general, that is, four times as often as expected if the purchases were independent. At what lift threshold does a rule become actionable, and what happens when you set minimum support too low?
Outcome
The student can implement Apriori, interpret the three metrics, and identify actionable rules.
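The three metrics can be computed directly from their definitions. The sketch below does so for a single rule; it is not the Apriori candidate search itself. The eight transactions are invented so that the rule {A, B} → C comes out with lift 4, and the function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    s_antecedent = support(transactions, antecedent)
    s_consequent = support(transactions, consequent)
    s_rule = support(transactions, set(antecedent) | set(consequent))
    confidence = s_rule / s_antecedent   # estimate of P(consequent | antecedent)
    lift = confidence / s_consequent     # ratio to the baseline P(consequent)
    return s_rule, confidence, lift

# Invented basket data, one set of items per transaction.
transactions = [
    {"A", "B", "C"}, {"A", "B", "C"}, {"A"}, {"B"},
    {"D"}, {"A", "D"}, {"B", "D"}, {"D"},
]

s, conf, lift = rule_metrics(transactions, {"A", "B"}, {"C"})
print(s, conf, lift)  # 0.25 1.0 4.0
```

The lift of 4 here rests on only two transactions. A minimum support set too low lets rules like this through, where a high lift may reflect a handful of coincidental baskets.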
The question
Eclat is faster than Apriori on some datasets but reports only the support of each itemset: no confidence, no lift. Does the added complexity of Apriori's rule metrics change the business recommendations?
Outcome
The student can implement Eclat and evaluate when Apriori's extra metrics add value.
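A compact sketch of the Eclat idea: store, for each item, the set of transaction ids that contain it, then grow itemsets depth-first by intersecting those id sets. The output is support counts only, which is the limitation the question points at. The function name, the count-based `min_support`, and the data are assumptions made for this illustration.

```python
def eclat(transactions, min_support):
    """Return {frozenset(itemset): count} for itemsets found in at least
    min_support transactions (min_support is a count, not a fraction)."""
    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # Every (item, tids) in candidates is frequent together with prefix.
        for idx, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            deeper = []
            for other, other_tids in candidates[idx + 1:]:
                shared = tids & other_tids  # transactions containing both
                if len(shared) >= min_support:
                    deeper.append((other, shared))
            if deeper:
                extend(itemset, deeper)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items()
         if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent

# Invented basket data, one set of items per transaction.
transactions = [
    {"A", "B", "C"}, {"A", "B", "C"}, {"A"}, {"B"},
    {"D"}, {"A", "D"}, {"B", "D"}, {"D"},
]

for itemset, count in sorted(eclat(transactions, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Confidence and lift can still be derived afterwards from these counts, so one way to approach the question is to ask whether ranking by support alone would have led to the same recommendations.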
The question
A high silhouette score does not by itself mean good clusters. Domain knowledge determines whether the structure is real or an artefact. When should you use unsupervised learning, and what does a successful analysis actually look like?
Outcome
The student can apply cluster quality metrics and take a defended position on unsupervised learning's appropriate uses.
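For reference, a plain-Python sketch of the mean silhouette score, assuming Euclidean distance. For each point, a is the mean distance to the other points in its own cluster and b is the smallest mean distance to the points of any other cluster; the point's score is (b - a) / max(a, b). The function name and the sample data are invented, and scoring singleton clusters as 0 is a common convention adopted here.

```python
import math

def mean_silhouette(points, labels):
    """Average silhouette score over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # The distance from p to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items() if key != lab
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

# Invented data: three loose groups in the plane.
points = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6),
          (10.0, 10.0), (10.4, 9.8), (9.7, 10.3),
          (0.0, 10.0), (0.3, 9.6), (-0.2, 10.4)]

by_group = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # labels follow the groups
scrambled = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # labels cut across the groups

print(round(mean_silhouette(points, by_group), 3))
print(round(mean_silhouette(points, scrambled), 3))
```

The score measures geometric separation only. Whether well-separated groups correspond to anything real in the domain is a judgement the metric cannot make, which is why the outcome asks for a defended position and not a number.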