Universitas Scholarium: A Community of Scholars
Tutorial Course

COMP 2306 · Data Science: Cluster Analysis

Led by Karl Pearson Simulacrum

5 modules · Computing · Updated 6 days ago

K-means, hierarchical clustering, and market segmentation — finding natural groups in unlabelled data. Led by the inventor of the correlation coefficient.

  1. Module 1

    What Is Clustering and Why Does It Matter?

    Led by Karl Pearson Simulacrum

    The question

    A cluster is a group of observations that are closer to each other than to members of other groups. Why must you standardise features before measuring distance — and when would you use clustering instead of classification?
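
    The effect of scale on distance can be seen in a minimal sketch in plain Python. The four customers and their values are invented for illustration, and z-scoring (mean 0, standard deviation 1) is only one of several ways to standardise; neither is taken from the course materials.

```python
import math
from statistics import mean, pstdev

# Hypothetical customers: (annual income in dollars, satisfaction on a 1-10 scale)
customers = {
    "A": (30_000, 9.0),
    "B": (31_000, 2.0),
    "C": (36_000, 9.0),
    "D": (60_000, 3.0),
}

def euclidean(p, q):
    """Straight-line distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def standardise(rows):
    """Rescale every column to mean 0 and population standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]

names = list(customers)
scaled = dict(zip(names, standardise([customers[n] for n in names])))

# Distances from customer A before and after standardisation
raw_ab = euclidean(customers["A"], customers["B"])
raw_ac = euclidean(customers["A"], customers["C"])
std_ab = euclidean(scaled["A"], scaled["B"])
std_ac = euclidean(scaled["A"], scaled["C"])

print(f"raw:          A-B = {raw_ab:.1f}, A-C = {raw_ac:.1f}")
print(f"standardised: A-B = {std_ab:.2f}, A-C = {std_ac:.2f}")
```

    On the raw values A appears nearest to B, because income is measured in dollars and swamps the satisfaction score. After standardisation A is nearest to C, the customer with the same satisfaction.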

    Outcome

    The student can explain what clustering is and why standardisation matters, and can distinguish clustering from classification.

    Sub-units

    1. 1.1 Clustering vs Classification
  2. Module 2

    K-Means Clustering

    Led by Karl Pearson Simulacrum

    The question

    K-means minimises the within-cluster sum of squares (WCSS): the total squared distance from each point to its cluster centre. The Elbow Method plots WCSS against k. What does the elbow actually represent, and what happens when there is no elbow?
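
    As a rough illustration of WCSS and the elbow, the sketch below runs a plain-Python version of the usual assign-and-update K-means loop on twelve made-up points. The data, the function names and the deterministic farthest-point initialisation are assumptions made here so the result is reproducible; library implementations typically use random or k-means++ starts instead.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    """Index of the centroid closest to p (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))

def farthest_point_init(points, k):
    """Assumed initialisation: start at points[0], then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Return (labels, centroids, wcss) after alternating assign/update steps."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centre
        if updated == centroids:
            break  # centres stopped moving: converged
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss

# Hypothetical data: three tight groups of four points each
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (1, 8), (1, 9), (2, 8), (2, 9)]

wcss_by_k = {k: kmeans(points, k)[2] for k in range(1, 5)}
for k, w in wcss_by_k.items():
    print(f"k={k}: WCSS={w:.2f}")
```

    On this data WCSS drops from roughly 267 to 104 to 6 as k goes from 1 to 3, then only to about 5.3 at k=4. That sharp flattening after k=3 is the elbow; data without well-separated groups would show a smooth curve with no such bend.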

    Outcome

    The student can implement K-means, apply the Elbow Method, and visualise clusters.

    Sub-units

    1. 2.1 Mall Segmentation
  3. Module 3

    Hierarchical Clustering

    Led by Karl Pearson Simulacrum

    The question

    K-means requires k in advance. Hierarchical clustering produces a dendrogram from which you choose the level of granularity after seeing the data. How do you read a dendrogram — and when do you cut it?
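
    The idea of building the tree first and choosing the cut afterwards can be sketched in plain Python. The example below assumes single linkage (the distance between two clusters is the distance between their closest members), which is only one of several linkage rules, and uses five invented points. The list of merges it returns holds the information a dendrogram draws: which groups join, and at what height.

```python
import math

def single_linkage(points):
    """Agglomerative clustering with single linkage.
    Returns merges as (height, left_members, right_members) in merge order."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((d, clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges

def cut(n_points, merges, height):
    """Clusters obtained by applying only the merges at or below `height`."""
    label = list(range(n_points))
    for d, left, right in merges:
        if d > height:
            break  # single-linkage heights never decrease, so we can stop
        for i in right:
            label[i] = label[left[0]]
    groups = {}
    for i, lab in enumerate(label):
        groups.setdefault(lab, []).append(i)
    return sorted(groups.values())

# Hypothetical data: two nearby pairs and one distant point
points = [(0, 0), (0, 1), (5, 0), (5, 1), (20, 0)]
merges = single_linkage(points)
heights = [d for d, _, _ in merges]

print("merge heights:", heights)
print("cut at 3: ", cut(len(points), merges, 3.0))
print("cut at 10:", cut(len(points), merges, 10.0))
```

    Cutting low (height 3) keeps three groups, while cutting higher (height 10) leaves two. A large gap between consecutive merge heights, here between 1 and 5 and again between 5 and 15, is the usual visual cue for where a cut is defensible.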

    Outcome

    The student can produce and read a dendrogram and compare hierarchical and K-means results.

    Sub-units

    1. 3.1 Dendrogram Analysis
  4. Module 4

    Clustering for Market Segmentation

    Led by Karl Pearson Simulacrum

    The question

    Four customer segments: Fans (satisfied and loyal), Roamers (satisfied but disloyal), Supporters (loyal but dissatisfied), and Alienated (neither satisfied nor loyal). What strategy does each require, and how do you validate that these segments are real?

    Outcome

    The student can execute market segmentation, interpret centroids as archetypes, and connect findings to strategy.

    Sub-units

    1. 4.1 Market Segmentation Analysis
  5. Module 5

    Cluster Evaluation and the Limits of Unsupervised Learning

    Led by Karl Pearson Simulacrum

    The question

    "How do I know these four groups really exist?" Three tests are on offer: silhouette score, stability, and business validation. What does each of these tests actually establish, and when should you trust a clustering result?
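
    Of the three tests, the silhouette score is the easiest to show in code. For each point it compares a, the mean distance to the other members of its own cluster, with b, the mean distance to the nearest other cluster, and scores (b - a) / max(a, b). The plain-Python sketch below assumes Euclidean distance, at least two clusters and no cluster of coincident points; the twelve points and both labellings are invented for illustration.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points.
    Points in single-member clusters are scored 0 by convention."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of p's own cluster
        # (p itself contributes 0 to the sum, so divide by len - 1)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: three tight groups of four points each
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (1, 8), (1, 9), (2, 8), (2, 9)]

natural = [0] * 4 + [1] * 4 + [2] * 4            # labels follow the groups
arbitrary = [i % 3 for i in range(len(points))]  # labels ignore the groups

good = silhouette_score(points, natural)
poor = silhouette_score(points, arbitrary)
print(f"natural labels:   {good:.2f}")
print(f"arbitrary labels: {poor:.2f}")
```

    A high score shows only that the groups are compact and well separated under the chosen distance. It says nothing about whether they recur on fresh data (stability) or mean anything to the business, which is why the other two tests are needed.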

    Outcome

    The student can evaluate clustering quality and explain what makes a clustering result trustworthy.

    Sub-units

    1. 5.1 Final Essay: Are These Clusters Real?