Explore the UCI Online Retail Dataset with Pandas for business insights.
In this project, you will work with the UCI Online Retail Dataset, a real transactions dataset from a UK-based online store with over 500,000 rows. You will clean it, filter it, and compute your first business metric.
This project is closer to real work than most beginner datasets. The data is not clean, and some rows make no sense: you have to deal with that before doing any analysis.
Download the UCI Online Retail Dataset (available on Kaggle or the UCI ML Repository)
Sample 10% of rows to keep things manageable
Clean the data: remove nulls, fix data types, and filter out returns and free items
Convert InvoiceDate to a proper datetime object
Create a Revenue column (Quantity × UnitPrice)
Find the top 10 countries by total revenue and plot the result
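The steps above can be sketched roughly as follows. This is a minimal outline, not a reference solution. It assumes the column names used in the public dataset (`Quantity`, `UnitPrice`, `InvoiceDate`, `CustomerID`, `Description`, `Country`) and runs on a tiny made-up sample so it works without the download. The helper names `clean_transactions` and `top_countries` are hypothetical, as is the file name in the comments.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the transactions with a Revenue column."""
    # Drop rows with no customer or no product description.
    out = df.dropna(subset=["CustomerID", "Description"]).copy()
    # Fix data types: timestamps as datetimes, customer IDs as integers.
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])
    out["CustomerID"] = out["CustomerID"].astype(int)
    # Keep only real sales: positive quantity (no returns), positive price (no free items).
    out = out[(out["Quantity"] > 0) & (out["UnitPrice"] > 0)].copy()
    # Derived metric: revenue per line item.
    out["Revenue"] = out["Quantity"] * out["UnitPrice"]
    return out


def top_countries(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Total revenue per country, largest first."""
    return df.groupby("Country")["Revenue"].sum().nlargest(n)


# With the real file you might start like this (file name is an assumption):
# raw = pd.read_csv("online_retail.csv")
# raw = raw.sample(frac=0.1, random_state=42)  # 10% sample, reproducible

# Made-up rows standing in for the real data.
raw = pd.DataFrame({
    "Description": ["MUG", "LAMP", "MUG", None, "FREE GIFT"],
    "Quantity": [6, 2, -1, 3, 1],
    "InvoiceDate": ["2010-12-01 08:26:00"] * 5,
    "UnitPrice": [2.5, 10.0, 2.5, 1.0, 0.0],
    "CustomerID": [17850.0, 13047.0, 17850.0, None, 12583.0],
    "Country": ["United Kingdom", "France", "United Kingdom",
                "United Kingdom", "France"],
})

cleaned = clean_transactions(raw)
top = top_countries(cleaned)
print(top)
```

On the real data, calling `top.plot(kind="bar")` in a notebook with Matplotlib installed is one simple way to draw the chart.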
Python
Pandas
Matplotlib / Seaborn
Jupyter Notebook
You will practice cleaning a large, realistic dataset and computing a derived metric. You will also understand why negative quantities and zero prices exist in real transaction data, and how to handle them without deleting useful rows.
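One possible way to handle negative quantities and zero prices without deleting rows is to flag them and filter per metric. The sketch below is an illustration under assumptions: the column names `IsReturn`, `IsFree`, and `LineValue`, and the helper `flag_special_rows`, are hypothetical, and the sample rows are made up.

```python
import pandas as pd


def flag_special_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Mark returns and free items instead of dropping them."""
    out = df.copy()
    out["IsReturn"] = out["Quantity"] < 0   # negative quantity: goods sent back
    out["IsFree"] = out["UnitPrice"] == 0   # zero price: giveaway or adjustment
    out["LineValue"] = out["Quantity"] * out["UnitPrice"]
    return out


# Made-up rows: one normal sale, one return, one free item, one more sale.
orders = pd.DataFrame({
    "Quantity": [6, -2, 1, 4],
    "UnitPrice": [2.5, 2.5, 0.0, 1.0],
})

flagged = flag_special_rows(orders)

# Gross revenue counts only ordinary sales.
sales = flagged[~flagged["IsReturn"] & ~flagged["IsFree"]]
gross = sales["LineValue"].sum()

# The flagged return rows are still available to measure refunds.
refunds = -flagged.loc[flagged["IsReturn"], "LineValue"].sum()
net = gross - refunds
print(gross, refunds, net)
```

Keeping the flags lets you compute both gross and net figures from the same table, and you can still inspect the unusual rows later.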
A full walkthrough of this project is available on Towards Data Science: 🔗 EDA in Public: Cleaning and Exploring Sales Data with Pandas