Universitas Scholarium: A Community of Scholars
Tutorial Course

COMP 2304 · Data Science: Python for Analysis

Led by Downeyian Computational Thinking Simulacrum

5 modules · Computing · Updated 1 week ago

Python for data analysis — from foundations through NumPy and pandas to exploratory data analysis. Based on the works of Allen Downey.

  1. Module 1

    Python Foundations

    The question

    A variable stores a value so you can reuse it. A loop repeats an operation. A function names a procedure. These are tools. What are the eight tools a data scientist must know to do anything useful with Python?

    Outcome

    The student can write Python functions and use comprehensions for data transformation.

    Sub-units

    1. 1.1 Write Three Functions
    2. 1.2 Loops and Comprehensions
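
    The outline does not say which three functions sub-unit 1.1 asks for, so the sketch below is only an illustration: `mean`, `variance` and `standardize` are hypothetical choices, and `heights_cm` is invented sample data. It shows a function naming a procedure, a loop repeating an operation, and a comprehension expressing the same loop in one line.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def variance(values):
    """Return the population variance (divides by n, not n - 1)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def standardize(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    m = mean(values)
    sd = variance(values) ** 0.5
    return [(x - m) / sd for x in values]


# Invented sample data, for illustration only.
heights_cm = [150, 160, 170, 180, 190]

# A loop repeats an operation: convert each height to metres.
heights_m_loop = []
for h in heights_cm:
    heights_m_loop.append(h / 100)

# The same transformation as a list comprehension.
heights_m = [h / 100 for h in heights_cm]

# A comprehension can also filter while it transforms.
tall = [h for h in heights_cm if h > 165]

z_scores = standardize(heights_cm)
```

    The loop and the comprehension produce the same list; the comprehension is usually preferred when the body is a single expression.
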
  2. Module 2

    NumPy: Arrays and Vectorised Computation

    The question

    Much of the Python data and ML ecosystem, including pandas, SciPy and scikit-learn, is built on NumPy arrays. Vectorised operations are often 10 to 100 times faster than the equivalent Python loops. Why is that, and what does broadcasting mean?

    Outcome

    The student can manipulate NumPy arrays and explain why vectorisation outperforms loops.

    Sub-units

    1. 2.1 Vectorised vs Loop
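
    A minimal sketch of the comparison sub-unit 2.1 points at, assuming NumPy is installed. The function names and the array size are illustrative choices, not part of the course material, and the measured speed-up depends on the machine, so no timing figure is claimed here. The second half shows broadcasting: a shape `(3,)` array is stretched across the rows of a shape `(2, 3)` array without being copied by hand.

```python
import timeit

import numpy as np

values = np.arange(100_000, dtype=np.float64)


def squares_loop(xs):
    """Square each element with an explicit Python loop."""
    out = []
    for x in xs:
        out.append(x * x)
    return out


def squares_vectorised(xs):
    """Square every element in one array expression, executed in compiled code."""
    return xs * xs


# Time both versions; the ratio varies by machine and array size.
loop_time = timeit.timeit(lambda: squares_loop(values), number=5)
vec_time = timeit.timeit(lambda: squares_vectorised(values), number=5)
print(f"loop: {loop_time:.4f}s  vectorised: {vec_time:.4f}s")

# Broadcasting: subtract each column's mean from every row.
table = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
column_means = table.mean(axis=0)   # shape (3,)
centred = table - column_means      # (2, 3) - (3,) broadcasts to (2, 3)
```

    The vectorised version avoids interpreting one Python bytecode loop iteration per element, which is where most of the loop's time goes.
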
  3. Module 3

    pandas: DataFrames for Data Analysis

    The question

    As a rule of thumb, eight operations cover about 80% of data wrangling: import, inspect, select, filter, handle missing values, transform, aggregate, merge. How is each one done in pandas, and why does each matter?

    Outcome

    The student can execute a complete data wrangling pipeline in pandas.

    Sub-units

    1. 3.1 Data Wrangling Pipeline
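
    The eight operations can be sketched end to end on a toy table. Everything in the data below (cities, years, temperatures, the `countries` lookup) is invented for illustration, and filling a missing value with the group mean is just one possible strategy, not necessarily the one the course teaches.

```python
import io

import pandas as pd

# Invented toy data; one temperature is deliberately missing.
csv_text = """city,year,temp_c
Oslo,2020,6.5
Oslo,2021,
Rome,2020,16.0
Rome,2021,17.0
"""

# 1. Import
df = pd.read_csv(io.StringIO(csv_text))

# 2. Inspect
shape = df.shape
missing = df.isna().sum()

# 3. Select columns
temps = df[["city", "temp_c"]]

# 4. Filter rows
recent = df[df["year"] == 2021]

# 5. Handle missing values: fill with the mean for the same city
city_mean = df.groupby("city")["temp_c"].transform("mean")
df["temp_c"] = df["temp_c"].fillna(city_mean)

# 6. Transform: derive a new column
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

# 7. Aggregate
city_means = df.groupby("city", as_index=False)["temp_c"].mean()

# 8. Merge with a lookup table
countries = pd.DataFrame({"city": ["Oslo", "Rome"],
                          "country": ["Norway", "Italy"]})
summary = city_means.merge(countries, on="city", how="left")
print(summary)
```

    Inspecting before filling matters: the `missing` count is the only record that a value was imputed once step 5 has run.
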
  4. Module 4

    Visualisation: Matplotlib and Seaborn

    The question

    A visualisation is an argument. The choice of chart is a choice about what claim you are making. What are the right charts for distributions, relationships, and comparisons — and what makes a chart misleading?

    Outcome

    The student can produce histograms, scatter plots, box plots, and heatmaps with interpretation.

    Sub-units

    1. 4.1 Four Charts
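
    A rough sketch of the four chart types named in the outcome, assuming Matplotlib and NumPy are installed. The data is synthetic, generated from a seeded random generator, and the heatmap is drawn with Matplotlib's `imshow` rather than Seaborn to keep the example to one plotting library. The `Agg` backend is selected so the script also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened

import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(size=200)
groups = [rng.normal(loc=m, size=100) for m in (0, 1, 2)]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Distribution of one variable
axes[0, 0].hist(x, bins=20)
axes[0, 0].set_title("Histogram: distribution of x")

# Relationship between two variables
axes[0, 1].scatter(x, y, s=8)
axes[0, 1].set_title("Scatter: y against x")

# Comparison across groups
axes[1, 0].boxplot(groups)
axes[1, 0].set_title("Box plot: three groups")

# Heatmap of a correlation matrix, with a fixed colour scale from -1 to 1
corr = np.corrcoef(np.vstack([x, y]))
image = axes[1, 1].imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
axes[1, 1].set_title("Heatmap: correlation")
fig.colorbar(image, ax=axes[1, 1])

fig.tight_layout()
# Call fig.savefig("four_charts.png") to write the figure to disk.
```

    Fixing `vmin` and `vmax` on the heatmap is one guard against a misleading chart: without it the colour scale stretches to fit the data, and weak correlations can look strong.
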
  5. Module 5

    Exploratory Data Analysis in Practice

    The question

    EDA is what happens before modelling. You look at your data, form hypotheses, and discover things not in the problem statement. What does a complete EDA look like, and why is building a model on data you have not looked at such a common modelling error?

    Outcome

    The student can conduct and write up a complete EDA on a real dataset.

    Sub-units

    1. 5.1 Final EDA Report
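
    As a starting point for a first look at a dataset, the sketch below collects a few basic checks into one report. `eda_summary` is a hypothetical helper, not a pandas function, and the table is invented so that it contains the kinds of surprises a problem statement rarely mentions: a missing value, a duplicated row and an implausible outlier. A complete EDA would go well beyond this.

```python
import pandas as pd

# Invented data with a missing age, a duplicated row and an income outlier.
df = pd.DataFrame({
    "age": [23, 35, None, 41, 35],
    "income": [30_000, 52_000, 48_000, 1_000_000, 52_000],
    "city": ["A", "B", "B", "A", "B"],
})


def eda_summary(frame):
    """Collect basic facts worth checking before any modelling (hypothetical helper)."""
    numeric = frame.select_dtypes("number")
    return {
        "shape": frame.shape,
        "dtypes": frame.dtypes.astype(str).to_dict(),
        "missing": frame.isna().sum().to_dict(),
        "duplicates": int(frame.duplicated().sum()),
        "numeric_summary": numeric.describe(),
        "correlations": numeric.corr(),
    }


report = eda_summary(df)
for name, value in report.items():
    print(f"--- {name} ---")
    print(value)
```

    In the numeric summary, the gap between the median and the maximum income is the kind of signal that prompts a closer look before a model is fitted.
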