Welcome! In this tutorial, we will perform sales analysis using DuckDB and Python.
Sales analysis is a vital process for any business that wants to optimize its sales performance, identify new opportunities, and understand customer behavior.
However, sales analysis can be challenging when dealing with large and complex datasets, especially if you want to perform fast and interactive queries.
That’s where DuckDB comes in handy. DuckDB is an in-process SQL OLAP database management system that is designed for analytical query processing.
It is fast, simple, feature-rich, and free to use. It also has a Python API that allows you to integrate it with your existing data analysis tools and workflows.
In this article, we will cover the following topics:
- How to install and import DuckDB in Python (Colab)
- How to load and query data from CSV files using DuckDB
- How to perform common sales analysis tasks such as calculating revenue, conversion rate, margin analysis, and more
About the dataset
This dataset comprises details of daily sales transactions carried out by sales staff within a retail establishment, covering the period from 2017 to 2022. You can download the dataset from here.
Attributes of dataset
- Seller: Salesperson’s name.
- Department: Department to which the salesperson belongs.
- Revenue: Revenue generated by the salesperson on the respective day.
- Revenue Goal: Salesperson’s revenue goal for the respective day.
- Margin: Gross profit margin achieved by the salesperson on the respective day.
- Margin Goal: Salesperson’s profit margin goal for the respective day.
- Date: Date on which the sales were recorded.
- Sales Quantity: Number of customers who actually made a purchase.
- Customers: Total number of customers served.
Install and import DuckDB in Colab
We will use DuckDB in Google Colab for this tutorial. You may use your own system or any other cloud Python environment.
!pip install duckdb
Load and Query Data from CSV Files using DuckDB
Next, we will load the CSV file of the sales dataset using DuckDB and run a few SQL queries.
df = duckdb.read_csv('department_store_dataset.csv') # query data duckdb.sql('SELECT * FROM df LIMIT 5')
1. Revenue Analysis
Let’s aggregate revenue by sellers using DuckDB.
duckdb.sql('SELECT Seller, ROUND(SUM(Revenue)/1000000, 2) AS Total_Revenue FROM df GROUP BY Seller ORDER BY Total_Revenue DESC')
Let’s also check the average monthly revenue (in millions), year-wise.
query = """ SELECT Year, ROUND(AVG(Monthly_Revenue)/1000000, 3) AS "Average Monthly Revenue" FROM ( SELECT EXTRACT(YEAR FROM Date) AS "Year", EXTRACT(MONTH FROM Date) AS "Month", SUM(Revenue) AS "Monthly_Revenue" FROM df GROUP BY "Year", "Month" ) AS SubQuery GROUP BY "Year" ORDER BY "Year" """ duckdb.sql(query)
2. Margin Analysis
Average gross profit margin department-wise for year 2021
query = """ SELECT Department, ROUND(AVG(Margin),2) AS "Average_Margin" FROM df WHERE EXTRACT(YEAR FROM Date) = 2021 GROUP BY Department ORDER BY Average_Margin DESC """ duckdb.sql(query)
Average gross profit margin month-wise for year 2021.
query = """ SELECT MONTH(Date) AS "Sales_Month", ROUND(AVG(Margin),2) AS "Average_Margin" FROM df WHERE EXTRACT(YEAR FROM Date) = 2021 GROUP BY Sales_Month ORDER BY Sales_Month """ duckdb.sql(query)
3. Salesperson Performance Analysis
Now we will evaluate the performance of individual salespersons.
query = """ SELECT Seller, ROUND(SUM(Revenue)/1000000,2) AS Total_Revenue, ROUND(AVG(Margin),2) AS Average_Margin, ROUND((SUM(Revenue)/SUM(Revenue_Goal)) * 100, 2) AS Revenue_Goal_Achievement_Percentage, ROUND(AVG(Margin/Margin_Goal) * 100, 2) AS Margin_Goal_Achievement_Percentage FROM df GROUP BY Seller """ duckdb.sql(query)
4. Customer Conversion Analysis
Customer conversion rates per salesperson.
query = """ SELECT Seller, (SUM(Sales_Quantity) / SUM(Customers)) * 100 AS Conversion_Rate_Percentage FROM df GROUP BY Seller ORDER BY Conversion_Rate_Percentage DESC; """ duckdb.sql(query)
|What is DuckDB||An in-process SQL OLAP database management system for analytical query processing|
|Why use DuckDB for sales analysis||Fast, simple, feature-rich, free, and integrates well with Python|
|How to use DuckDB for sales analysis||Load and query data from CSV files using DuckDB’s Python API and perform common sales analysis tasks using SQL|