Sales Analysis using DuckDB and Python

Introduction

Welcome! In this tutorial, we will perform sales analysis using DuckDB and Python.

Sales analysis is a vital process for any business that wants to optimize its sales performance, identify new opportunities, and understand customer behavior.

However, sales analysis can be challenging when dealing with large and complex datasets, especially if you want to perform fast and interactive queries.

That’s where DuckDB comes in handy. DuckDB is an in-process SQL OLAP database management system that is designed for analytical query processing.

It is fast, simple, feature-rich, and free to use. It also has a Python API that allows you to integrate it with your existing data analysis tools and workflows.

In this article, we will cover the following topics:

  • How to install and import DuckDB in Python (Colab)
  • How to load and query data from CSV files using DuckDB
  • How to perform common sales analysis tasks such as calculating revenue, conversion rate, margin analysis, and more

About the dataset

This dataset comprises details of daily sales transactions carried out by sales staff within a retail establishment, covering the period from 2017 to 2022. You can download the dataset from here.

Attributes of dataset

  • Seller: Salesperson’s name.
  • Department: Department to which the salesperson belongs.
  • Revenue: Revenue generated by the salesperson on the respective day.
  • Revenue Goal: Salesperson’s revenue goal for the respective day.
  • Margin: Gross profit margin achieved by the salesperson on the respective day.
  • Margin Goal: Salesperson’s profit margin goal for the respective day.
  • Date: Date on which the sales were recorded.
  • Sales Quantity: Number of customers who actually made a purchase.
  • Customers: Total number of customers served.

Install and import DuckDB in Colab

We will use DuckDB in Google Colab for this tutorial. You may use your own system or any other cloud Python environment.

!pip install duckdb
import duckdb

Load and Query Data from CSV Files using DuckDB

Next, we will load the CSV file of the sales dataset using DuckDB and run a few SQL queries.

df = duckdb.read_csv('department_store_dataset.csv')

# query data
duckdb.sql('SELECT * FROM df LIMIT 5')

Output:

sales analysis using duckdb and python

1. Revenue Analysis

Let’s aggregate revenue by sellers using DuckDB.

duckdb.sql('SELECT Seller, ROUND(SUM(Revenue)/1000000, 2) AS Total_Revenue FROM df GROUP BY Seller ORDER BY Total_Revenue DESC')

Output:

sales analysis using duckdb and python

Let’s also check the average monthly revenue (in millions), year-wise.

query = """
SELECT
  Year,
  ROUND(AVG(Monthly_Revenue)/1000000, 3) AS "Average Monthly Revenue"
FROM
  (
    SELECT
      EXTRACT(YEAR FROM Date) AS "Year",
      EXTRACT(MONTH FROM Date) AS "Month",
      SUM(Revenue) AS "Monthly_Revenue"
    FROM
      df
    GROUP BY
      "Year", "Month"
  ) AS SubQuery
GROUP BY
  "Year"
ORDER BY
  "Year"
"""

duckdb.sql(query)

Output:

sales analysis using duckdb and python

2. Margin Analysis

Average gross profit margin department-wise for year 2021

query = """
SELECT
    Department,
    ROUND(AVG(Margin),2) AS "Average_Margin"
FROM
    df
WHERE
    EXTRACT(YEAR FROM Date) = 2021
GROUP BY
    Department
ORDER BY
    Average_Margin DESC
"""

duckdb.sql(query)

Output:

sales analysis using duckdb and python

Average gross profit margin month-wise for year 2021.

query = """
SELECT
    MONTH(Date) AS "Sales_Month",
    ROUND(AVG(Margin),2) AS "Average_Margin"
FROM
    df
WHERE
    EXTRACT(YEAR FROM Date) = 2021
GROUP BY
    Sales_Month
ORDER BY
    Sales_Month
"""

duckdb.sql(query)

Output:

sales analysis using duckdb and python

3. Salesperson Performance Analysis

Now we will evaluate the performance of individual salespersons.

query = """
SELECT 
    Seller, 
    ROUND(SUM(Revenue)/1000000,2) AS Total_Revenue,
    ROUND(AVG(Margin),2) AS Average_Margin,
    ROUND((SUM(Revenue)/SUM(Revenue_Goal)) * 100, 2) AS Revenue_Goal_Achievement_Percentage,
    ROUND(AVG(Margin/Margin_Goal) * 100, 2) AS Margin_Goal_Achievement_Percentage
FROM 
    df 
GROUP BY 
    Seller
"""

duckdb.sql(query)

Output:

sales analysis using duckdb and python

4. Customer Conversion Analysis

Customer conversion rates per salesperson.

query = """
SELECT 
    Seller, 
    (SUM(Sales_Quantity) / SUM(Customers)) * 100 AS Conversion_Rate_Percentage
FROM 
    df 
GROUP BY 
    Seller
ORDER BY 
    Conversion_Rate_Percentage DESC;
"""

duckdb.sql(query)

Output:

sales analysis using duckdb and python

Key Takeaways

TopicSummary
What is DuckDBAn in-process SQL OLAP database management system for analytical query processing
Why use DuckDB for sales analysisFast, simple, feature-rich, free, and integrates well with Python
How to use DuckDB for sales analysisLoad and query data from CSV files using DuckDB’s Python API and perform common sales analysis tasks using SQL

Leave a Reply

Your email address will not be published. Required fields are marked *