ML ZOOMCAMP 2025 - Module 1

As a data science enthusiast, I am always looking for opportunities to get better in the field. This quest led me to the ML Zoomcamp, 2025 cohort.

I am looking forward to four months of learning, unlearning and re-learning, and to documenting my journey with key highlights and insights.

The pre-course Q&A session, streamed live on YouTube on August 19th, set the stage for the course.

The course officially kicked off on September 15th and will follow the outline below:

  1. Module 1: Introduction to Machine Learning
  2. Module 2: Machine Learning for Regression
  3. Module 3: Machine Learning for Classification
  4. Module 4: Evaluation Metrics for Classification
  5. Module 5: Deploying Machine Learning Models
  6. Module 6: Decision Trees & Ensemble Learning
  7. Midterm Project
  8. Module 7: Neural Networks & Deep Learning
  9. Module 8: Serverless Deep Learning
  10. Module 9: Kubernetes & TensorFlow Serving
  11. Capstone Project


Module 1: Introduction to Machine Learning

Module 1 began on 15th September, culminating in an assignment to be submitted by 30th September 2025.

This module covered the fundamentals: what ML is, when to use it, and how to approach ML problems using the CRISP-DM framework.

1 Introduction to Machine Learning with Cars Data

In machine learning, a model is trained on the feature variables in the data: it extracts patterns from them and can then be used to make predictions. We looked at data about cars, including their characteristics (features) and prices (target). An ML model can extract patterns from known information (data) about some cars and then predict the prices of other cars from their characteristics.
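
To make the features/target idea concrete, here is a minimal sketch with made-up car data. The numbers, the two features (age and horsepower) and the choice of a least-squares linear fit are all my own assumptions for illustration, not something prescribed in this module.

```python
import numpy as np

# Made-up data for illustration only: each row is one car.
# Columns (features): age in years, engine horsepower.
X = np.array([
    [1.0, 150.0],
    [3.0, 120.0],
    [5.0, 200.0],
    [8.0, 90.0],
    [10.0, 110.0],
])

# Target: price, generated here from a made-up linear rule
# (20000 - 1000 * age + 50 * horsepower) so the fit is easy to check.
y = 20000.0 - 1000.0 * X[:, 0] + 50.0 * X[:, 1]

# Add a column of ones so the model can learn a base price (bias term).
X_with_bias = np.column_stack([np.ones(len(X)), X])

# "Training": find the weights that best map features to prices (least squares).
weights, *_ = np.linalg.lstsq(X_with_bias, y, rcond=None)

def predict_price(age, horsepower):
    """Predict the price of a car the model has not seen."""
    return float(weights @ np.array([1.0, age, horsepower]))

predicted = predict_price(age=4.0, horsepower=130.0)
print(round(predicted))  # close to 22500 for this made-up rule
```

The model never sees the rule itself, only the example cars, yet it recovers a price for a new car from its characteristics.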

2 Rules-Based Systems vs. Machine Learning

Rules-based systems use a hand-written set of rules to determine an outcome (e.g. whether an email is spam or not), while machine learning models are trained on features and targets extracted from data, without the rules being explicitly programmed.

  • In rules-based systems, rules are manually converted into code using a programming language and then applied to data. This makes the process complex and challenging, especially when the rules keep changing and the code requires frequent updates and maintenance.
  • In machine learning, instead of manually coding rules, ML models automatically extract patterns from data using mathematics and statistics.
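
The contrast can be sketched in a few lines of plain Python. Both the hand-written rules and the tiny word-counting "model" below are hypothetical toys of my own, with made-up emails; a real spam filter would be far more involved.

```python
from collections import Counter

# --- Rules-based: a person writes and maintains the rules by hand. ---
def is_spam_rules(email):
    text = email.lower()
    return "free money" in text or "click here" in text

# --- Learned: word counts are derived from labelled examples. ---
# Tiny made-up training set of (text, label) pairs; label 1 = spam, 0 = not spam.
training_emails = [
    ("win a prize now", 1),
    ("claim your prize today", 1),
    ("meeting moved to monday", 0),
    ("lunch on monday?", 0),
]

def train(emails):
    """Count how often each word appears in spam vs. non-spam emails."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, label in emails:
        (spam_counts if label == 1 else ham_counts).update(text.lower().split())
    return spam_counts, ham_counts

def is_spam_learned(email, model):
    """Flag an email if its words were seen more often in spam than in non-spam."""
    spam_counts, ham_counts = model
    words = email.lower().split()
    spam_score = sum(spam_counts[w] for w in words)
    ham_score = sum(ham_counts[w] for w in words)
    return spam_score > ham_score

model = train(training_emails)
new_email = "you won a prize"
print(is_spam_rules(new_email))           # False: no hand-written rule matches
print(is_spam_learned(new_email, model))  # True: "a" and "prize" were only seen in spam
```

When spammers change their wording, the first function needs new code, while the second only needs new labelled examples.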

3 Supervised Machine Learning

In supervised learning, models learn from labelled data (examples with known outcomes) to make predictions on unseen data. Each training example pairs its features with a label (the target), and the model learns the relationship between them.

Supervised machine learning is divided into:

  • Regression: the output is a number (e.g. a car's price).
  • Classification: the output is a category (e.g. spam detection).
    • Binary classification: there are two categories (e.g. spam or not spam).
    • Multiclass classification: there are more than two categories (e.g. cat, dog, horse, donkey).
  • Ranking: the output is an ordering of items by their assigned scores (e.g. shirts ranking higher than sweaters). Examples include recommender systems.
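
The difference between these problem types shows up in what the final output looks like. The sketch below uses hard-coded, hypothetical scores (no model is trained), and the 0.5 cut-off is just a common default I assumed.

```python
# Hypothetical model outputs, hard-coded for illustration (no real model is trained).

# Regression: the prediction is a number.
predicted_price = 18250.0

# Binary classification: a probability is turned into one of two labels.
spam_probability = 0.83
is_spam = spam_probability >= 0.5   # 0.5 is a common but arbitrary cut-off

# Multiclass classification: pick the category with the highest score.
class_scores = {"cat": 0.10, "dog": 0.65, "horse": 0.20, "donkey": 0.05}
predicted_class = max(class_scores, key=class_scores.get)

# Ranking: order the items by score, highest first.
item_scores = {"sweater": 0.40, "shirt": 0.90, "jacket": 0.55}
ranking = sorted(item_scores, key=item_scores.get, reverse=True)

print(is_spam)          # True
print(predicted_class)  # dog
print(ranking)          # ['shirt', 'jacket', 'sweater']
```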

4 CRISP-DM (Cross Industry Standard Process for Data Mining)

This is a structured methodology for organizing machine learning projects. It follows the steps:

  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling (choosing and training models, then selecting the best one)
  • Evaluation
  • Deployment

The process is iterative, allowing for continuous improvement.

5 Model selection

The modelling step (the model selection process) involves splitting the data into training, validation, and test sets. Different models are trained on the training set and compared on the validation set. The best-performing one is selected and then checked on the test set.
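
Here is a rough sketch of that process with NumPy. The 60/20/20 split, the made-up data and the threshold "models" are my own assumptions, chosen only to keep the example short; real candidates would be fitted on the training set first.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Made-up data: one feature x, and a label that is 1 when x is above 0.5.
x = np.linspace(0.0, 1.0, 100)
y = (x > 0.5).astype(int)

# Shuffle the row indices, then cut them 60% / 20% / 20%.
indices = rng.permutation(len(x))
n_train, n_val = 60, 20
train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]

def accuracy(threshold, idx):
    """Share of rows in idx that a simple threshold 'model' labels correctly."""
    predictions = (x[idx] > threshold).astype(int)
    return float((predictions == y[idx]).mean())

# Candidate "models": hypothetical threshold classifiers standing in for real ones.
# A real model would be fitted on train_idx first; these have nothing to fit.
candidates = [0.5, 0.3, 0.7]

# Compare the candidates on the validation set and keep the best one.
val_scores = {t: accuracy(t, val_idx) for t in candidates}
best = max(val_scores, key=val_scores.get)

# Only the selected model is checked on the held-out test set.
test_score = accuracy(best, test_idx)
print(best, test_score)  # 0.5 1.0
```

Keeping the test set out of the comparison is the point: it gives one last check on data that played no part in choosing the model.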

6 Setting up the Environment

Here, the necessary tools such as Python, NumPy, Pandas, Matplotlib and Scikit-learn were installed. There are different ways and platforms for doing this. I used GitHub Codespaces, which is the recommended option for this class.
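
A quick way to confirm the setup worked is to check that each library can be found. This is a small helper of my own (the function name `check_environment` is hypothetical), using only the standard library.

```python
import importlib.util
import sys

# Import names of the libraries mentioned above (scikit-learn is imported as "sklearn").
packages = ["numpy", "pandas", "matplotlib", "sklearn"]

def check_environment(names):
    """Return a dict mapping each package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

status = check_environment(packages)
print("Python", sys.version.split()[0])
for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```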

7 Introduction to NumPy

This lesson introduced NumPy, a Python library that is crucial for manipulating numerical data. It provides efficient operations on arrays and matrices and is a very important tool in machine learning.
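
A few basic array operations, as a short sketch with values I made up:

```python
import numpy as np

# Create arrays from a list and with helper functions.
a = np.array([1, 2, 3])
zeros = np.zeros(3)            # array of three 0.0 values
steps = np.arange(0, 10, 2)    # 0, 2, 4, 6, 8

# Element-wise arithmetic works on whole arrays at once, without a loop.
doubled = a * 2                        # [2, 4, 6]
summed = a + np.array([10, 20, 30])    # [11, 22, 33]

# A two-dimensional array (matrix) and a few common operations.
m = np.array([[1, 2], [3, 4]])
print(m.shape)        # (2, 2)
print(m[:, 0])        # first column: [1 3]
print(m.sum(axis=0))  # column sums: [4 6]
print(a[a > 1])       # boolean filtering: [2 3]
```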

8 Linear Algebra

This lesson reviewed linear algebra concepts, including vector operations such as addition and multiplication, and matrix topics such as matrix-vector multiplication, the identity matrix and the inverse of a matrix, among others.
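
These operations map directly onto NumPy. The vectors and the matrix below are small examples I picked so the results can be checked by hand.

```python
import numpy as np

u = np.array([2.0, 4.0])
v = np.array([1.0, 3.0])

print(u + v)     # vector addition: [3. 7.]
print(2 * u)     # scalar multiplication: [4. 8.]
print(u @ v)     # dot product: 2*1 + 4*3 = 14.0

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])

print(A @ v)     # matrix-vector multiplication: [5. 4.]

I = np.eye(2)                 # 2x2 identity matrix
A_inv = np.linalg.inv(A)      # inverse exists because det(A) = 1 is non-zero

# Multiplying a matrix by its inverse gives the identity (up to rounding).
print(np.allclose(A @ A_inv, I))   # True
```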

9 Introduction to Pandas

This lesson introduced Pandas, a Python library for processing and analyzing tabular data efficiently. It is built around the DataFrame. The Pandas operations covered included element-wise operations, filtering and grouping.
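
The three kinds of operations can be sketched on a small DataFrame. The car records and column names here are made up for illustration.

```python
import pandas as pd

# Made-up car records for illustration.
df = pd.DataFrame({
    "make": ["Toyota", "Toyota", "Ford", "Ford", "BMW"],
    "year": [2015, 2018, 2012, 2020, 2019],
    "price": [12000, 18000, 8000, 25000, 40000],
})

# Element-wise operation: applied to every value in the column.
df["price_thousands"] = df["price"] / 1000

# Filtering: keep only the rows that satisfy a condition.
recent = df[df["year"] >= 2018]

# Grouping: average price per make.
mean_price = df.groupby("make")["price"].mean()

print(len(recent))            # 3
print(mean_price["Toyota"])   # 15000.0
```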

The module ended with a homework assignment that was to be submitted by 30th September 2025.


