Learn to clean the Netflix dataset with Python and Pandas.
In this project, you will load the Netflix Movies and TV Shows dataset from Kaggle and clean it using Pandas. The dataset has missing values, wrong data types, and mixed-type columns — exactly the kind of mess you find in real data.
The goal is not just to drop nulls, but to understand why values are missing and make deliberate decisions about each column.
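A first pass at the "why" is to measure how much of each column is missing and whether the gaps cluster in one kind of title. The sketch below assumes the Kaggle column names (`type`, `director`, `country`) and uses a tiny made-up frame instead of the real file. `missing_report` and `missing_by_group` are hypothetical helper names, not part of Pandas.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "pct": (counts / len(df) * 100).round(1),
    })
    return report.sort_values("missing", ascending=False)


def missing_by_group(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Share of missing values in each column, split by a grouping column."""
    # isna() gives booleans; the mean of booleans per group is the missing share.
    return df.drop(columns=group_col).isna().groupby(df[group_col]).mean().round(2)


if __name__ == "__main__":
    # Toy rows shaped like the Kaggle file; not real Netflix data.
    sample = pd.DataFrame({
        "type": ["Movie", "Movie", "TV Show", "TV Show"],
        "director": ["Jane Doe", "John Roe", None, None],
        "country": ["United States", None, "India", None],
    })
    print(missing_report(sample))
    print(missing_by_group(sample, "type"))
```

If a column's gaps concentrate in one group, that points to a structural reason (the field may not apply, or may not be recorded, for that kind of title) rather than random noise, which argues for a placeholder or a group-specific rule over a blanket drop.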
Download the Netflix dataset from Kaggle
Inspect the DataFrame with .info(), .describe(), and .head()
Identify and handle missing values column by column
Fix mixed-type columns (e.g., duration stored as text such as "90 min" for movies and "2 Seasons" for TV shows)
Parse date columns into proper datetime objects
Export the cleaned DataFrame to a new CSV file
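The steps above might be strung together roughly as follows. This is a sketch, not the walkthrough's code: it assumes the Kaggle column names (`director`, `cast`, `country`, `rating`, `date_added`, `duration`), that `duration` holds strings such as "90 min" or "2 Seasons", and that `date_added` holds text dates such as "September 25, 2021". The "Unknown" placeholder, the choice of which columns to drop on, and the function name `clean_netflix` are illustrative assumptions.

```python
import pandas as pd


def clean_netflix(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a Netflix titles DataFrame (illustrative sketch)."""
    df = df.copy()

    # Credit/metadata columns: a gap means "not recorded", so keep the row
    # and make the gap explicit. "Unknown" is an arbitrary placeholder.
    for col in ["director", "cast", "country"]:
        df[col] = df[col].fillna("Unknown")

    # Assumption: rows missing these fields are rare enough to drop.
    df = df.dropna(subset=["date_added", "rating", "duration"])

    # Split text like "90 min" or "2 Seasons" into a number and a unit.
    parts = df["duration"].str.strip().str.extract(r"^(\d+)\s*(\D+)$")
    df["duration_value"] = pd.to_numeric(parts[0], errors="coerce").astype("Int64")
    df["duration_unit"] = (
        parts[1].str.strip().str.lower().replace({"season": "seasons"})
    )

    # Strip stray whitespace, then parse dates like "September 25, 2021";
    # anything that does not match the format becomes NaT.
    df["date_added"] = pd.to_datetime(
        df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    return df


if __name__ == "__main__":
    # Toy rows shaped like the Kaggle file; not real Netflix data.
    raw = pd.DataFrame({
        "title": ["A", "B", "C"],
        "director": ["Jane Doe", None, "John Roe"],
        "cast": [None, "Actor One", "Actor Two"],
        "country": ["United States", "India", None],
        "date_added": ["September 25, 2021", " August 4, 2017", None],
        "rating": ["PG-13", "TV-MA", "R"],
        "duration": ["90 min", "1 Season", "100 min"],
    })
    raw.info()                      # inspect dtypes and non-null counts first
    cleaned = clean_netflix(raw)
    cleaned.info()
    # With no path, to_csv returns the CSV text; pass a filename to write a file.
    print(cleaned.to_csv(index=False))
```

On the real download you would start from `pd.read_csv(...)` on the Kaggle CSV (check its actual filename) and finish with something like `cleaned.to_csv("netflix_titles_clean.csv", index=False)`, where the output name is arbitrary. Comparing `.info()` before and after is a quick check that `date_added` became a datetime column and that `errors="coerce"` did not silently turn many values into NaT.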
Tools: Python, Pandas, Jupyter Notebook
You will practice making real decisions about messy data, not just running .dropna() and moving on. You will also get comfortable reading a dataset before transforming it, which is a habit that matters a lot in real projects.
A full walkthrough of this project is available on Towards Data Science: "How to Clean Your Data in Python".