Brazilian E-Commerce Public Dataset by Olist

nhtram911

Introduction

This is my personal project, use dataset from Kaggle.com to practice Python skills, data visualization and business knowledge.

Link Kaggle: https://www.kaggle.com/code/trmnguynhi/brazilian-e-commerce-public-eda

Business case

This dataset was generously provided by Olist, the largest department store in Brazilian marketplaces. Olist connects small businesses from all over Brazil to channels without hassle and with a single contract. Those merchants are able to sell their products through the Olist Store and ship them directly to the customers using Olist logistics partners.

After a customer purchases the product from Olist Store a seller gets notified to fulfill that order. Once the customer receives the product, or the estimated delivery date is due, the customer gets a satisfaction survey by email where he can give a note for the purchase experience and write down some comments.

The dataset has 8 CSV files, with the relationship as capture below

This analysis will contain four steps as below:

  • Define business task
  • Data Preparation
  • Data Processing
  • Data Analysis

In this analysis, I will focus on three below factors to have overall overview of the business and see which issue need to be solved:

  • Delivery Performance
  • Sales Performance
  • Customer Experience Indicator

Data Preparation

Import the libraries & datasets

import pandas as pd
import numpy as np import matplotlib.pyplot as plt import seaborn as sns
from pandas.tseries.offsets import DateOffset
from dateutil.relativedelta import relativedelta %matplotlib inline customers = pd.read_csv("/kaggle/input/brazilian-ecommerce/olist_customers_dataset.csv")
geolocation= pd.read_csv("/kaggle/input/brazilian-ecommerce/olist_geolocation_dataset.csv")
order_items = pd.read_csv("/kaggle/input/brazilian-ecommerce/olist_order_items_dataset.csv")
order_payments = pd.read_csv("/kaggle/input/brazilian-ecommerce/olist_order_payments_dataset.csv")
order_reviews = pd.read_csv("/kaggle/input/brazilian-ecommerce/olist_order_reviews_dataset.csv")
date_cols = ['order_purchase_timestamp','order_approved_at','order_delivered_carrier_date','order_delivered_customer_date','order_estimated_delivery_date']
orders = pd.read_csv("/kaggle/input/brazilian-ecommerce/olist_orders_dataset.csv",parse_dates=date_cols)
products = pd.read_csv("/kaggle/input/brazilian-ecommerce/olist_products_dataset.csv")
sellers = pd.read_csv("/kaggle/input/brazilian-ecommerce/olist_sellers_dataset.csv")

Data Processing

Explore datasets

def overview_data(dataset_name):
                #   , null_amount, name_null_cols, num_null_cols):
    nrows=dataset_name.shape[0]
    ncols=dataset_name.shape[1]
    null_values = dataset_name.isnull().sum()    
    name_nul_cols = []
    for i in range(len(null_values)):
        if null_values[i] >0:
            name_nul_cols.append(null_values.index[i])    
    num_nul_cols = len(name_nul_cols)
    return nrows, ncols , name_nul_cols,num_nul_cols
detailed_db = pd.DataFrame()
list_dataset= [customers,geolocation,order_items,order_payments,order_reviews,orders,products,sellers]
list_dataset_name= ['customers','geolocation','order_items','order_payments','order_reviews','orders','products','sellers']
detailed_db['dataset'] = [i for i in list_dataset_name]
detailed_db['nrows'] = [overview_data(i)[0] for i in list_dataset]
detailed_db['ncols'] = [overview_data(i)[1] for i in list_dataset]
detailed_db['names_of_null_cols'] = [', '.join(overview_data(i)[2]) for i in list_dataset]
detailed_db['num_null_cols'] = [overview_data(i)[3] for i in list_dataset]

 
orders.info()
order_items.info()

Data Analysis

1. Delivery Performance

 1.1 Ontime Delivery

orders_info = orders.merge(customers, on = 'customer_id', how = 'left' )
orders_info['order_delivered_customer_month'] = orders_info['order_delivered_customer_date'].dt.strftime('%Y-%m')
orders_info['order_purchase_month'] = orders_info['order_purchase_timestamp'].dt.strftime('%Y-%m')
orders_info['order_purchase_year'] = orders_info['order_purchase_timestamp'].dt.strftime('%Y')
orders_info['Delivery Time'] = ((orders_info['order_delivered_customer_date'] - orders_info['order_delivered_carrier_date']))/24

orders_info['is_deliver_ontime']=np.where( orders_info['order_delivered_customer_date'] > orders_info['order_estimated_delivery_date'],0,1)
delivery_ontime_pivot = orders_info[orders_info['order_delivered_customer_date'].isna() == False].groupby('order_delivered_customer_month')['is_deliver_ontime'].agg(TotalDelivered='count',TotalOntime='sum').reset_index()
delivery_ontime_pivot['Ontime Delivery Rate'] = (delivery_ontime_pivot['TotalOntime'] / delivery_ontime_pivot['TotalDelivered'])*100
fig, ax1 = plt.subplots(figsize=(12,6))
sns.lineplot(data = delivery_ontime_pivot[delivery_ontime_pivot['Ontime Delivery Rate' ]>0], x = 'order_delivered_customer_month', y='Ontime Delivery Rate', marker='o', ax=ax1)
ax2 = ax1.twinx()
sns.barplot(data = delivery_ontime_pivot[delivery_ontime_pivot['Ontime Delivery Rate' ]>0], x= 'order_delivered_customer_month', y='TotalDelivered', alpha=0.5, ax=ax2)
ax1.tick_params(axis='x', labelrotation = 45)

This is shown that the delivery ontime is decreased from 2016 to 2028 with the increasing of orders volumn ( not count for undelivered orders)
delivery_ontime_by_state = orders_info[orders_info['order_delivered_customer_date'].isna() == False].groupby(['customer_state'])['is_deliver_ontime'].agg(TotalDelivered = 'count',TotalOntime='sum').reset_index()
delivery_ontime_by_state['Ontime Delivery Rate'] =  (delivery_ontime_by_state['TotalOntime'] / delivery_ontime_by_state['TotalDelivered'])*100
delivery_ontime_by_state=delivery_ontime_by_state.sort_values(by='Ontime Delivery Rate')
plt.figure(figsize=(12, 7))
sns.barplot(data =delivery_ontime_by_state, x = 'customer_state', y= 'Ontime Delivery Rate',color='skyblue')
 
Top 10 States have the lowest ontime rate need to be reviewed and enhanced

1.2 Average Delivery Time

avg_delivery_time = orders_info[orders_info['order_delivered_customer_date'].isna() == False].groupby(['order_delivered_customer_month'])['Delivery Time'].agg('mean').reset_index()
delivery_time_and_ontime_rate = avg_delivery_time.merge(delivery_ontime_pivot[delivery_ontime_pivot['Ontime Delivery Rate' ]>0],on='order_delivered_customer_month',how='right')
fig, ax1 = plt.subplots(figsize=(12,6))
ax1=sns.lineplot(data= delivery_time_and_ontime_rate,x='order_delivered_customer_month',y='Delivery Time', ax=ax1, color = 'Red',marker='o')
ax2=ax1.twinx()
ax2=sns.lineplot(data = delivery_time_and_ontime_rate, x = 'order_delivered_customer_month', y='Ontime Delivery Rate', marker='o',ax=ax2)
ax1.tick_params(axis='x', labelrotation = 45)
The relationship between delivery time and ontime delivery rate

2. Sales Performance

orders_info['purchase_month'] = orders_info['order_purchase_timestamp'].dt.strftime('%m')
sales_qty_by_month=orders_info.groupby('purchase_month')['order_id'].agg(TotalSales='count').reset_index()
sales_qty_by_year=orders_info.groupby('order_purchase_year')['order_id'].agg(TotalSales='count')
# sales_qty_by_month.head()
sns.set_style('whitegrid')
fig, axes = plt.subplots(2, 1, figsize=(15, 10))
axes1 = sns.lineplot(data=sales_qty_by_month, x= 'purchase_month',y='TotalSales',ax=axes[0])
axes2=sns.lineplot(data=sales_qty_by_year, x= 'order_purchase_year',y='TotalSales',ax=axes[1])
orders_payment_info=orders_info.merge(order_payments, how ='left', on ='order_id')
orders_payment_info_pivot=orders_payment_info.groupby('order_purchase_month')['payment_value'].agg(TotalOrder='count',TotalAmount='sum').reset_index()
orders_payment_info_pivot['Avg Value per Order'] = orders_payment_info_pivot['TotalAmount']/orders_payment_info_pivot['TotalOrder']
orders_payment_info_pivot
sns.lineplot(data=orders_payment_info_pivot,x='order_purchase_month',y='Avg Value per Order',marker = 'o')
plt.tick_params(axis='x', labelrotation =45)
Avg Value Per Order stay around 150 USD from 2016 to 2018

3. Customer Experience

3.1 Customer Reviews

order_reviews_info = order_reviews.groupby('review_score').agg('count').reset_index()
order_reviews_info.rename(columns ={'review_id':'total_review','review_comment_message':'no_review_have_cmt'},inplace=True)
order_reviews_info=order_reviews_info[['review_score','total_review','no_review_have_cmt']]
order_reviews_info['comment_review_rate']=(order_reviews_info['no_review_have_cmt']/order_reviews_info['total_review'])*100
order_reviews_info
plt.title("Distribution of Review Score")
plt.pie(order_reviews_info['total_review'],labels = order_reviews_info['review_score'], autopct='%.0f%%')
plt.show()
Review Score 5 has highest rate compared with other scores
print("Rate of no message review:",(order_reviews_info['no_review_have_cmt'].sum()/order_reviews_info['total_review'].sum())*100,"%")
order_reviews_info.head()

Customers have tend to give message for orders that they are not satisfied

There is nearly a half of review that customers didnot leave a message. Let’s find why they chosse score 1 or 5 (mad or happy)

orders_info_2=orders_info.merge(order_reviews, on ='order_id',how = 'left')
orders_info_2_no_message= orders_info_2[orders_info_2['review_comment_message'].isna()]
orders_info_2_no_message.head()
orders_info_2_no_message_pivot = orders_info_2_no_message.groupby('review_score').agg({'Delivery Time':'mean','is_deliver_ontime':'sum','order_id':'count'}).reset_index()
orders_info_2_no_message_pivot.rename(columns={'order_id':'total_orders'},inplace=True)
orders_info_2_no_message_pivot['ontime_delivery_rate']=orders_info_2_no_message_pivot['is_deliver_ontime']/orders_info_2_no_message_pivot['total_orders']
orders_info_2_no_message_pivot

As we can see, the lower the review score is, the longer the delivery time is so Delivery Time is one of the factor that affects satisfactory of customers

3.2. Customer Churn Rate

data_churn=orders_info.groupby(['customer_unique_id','order_purchase_year'])['order_id'].agg('count').reset_index()
data_churn['filter']=np.where(data_churn['order_id']==1,1,0)
data_churn_2=data_churn.groupby('order_purchase_year')['filter'].agg(NewCustomers='sum',TotalCustomers='count').reset_index()
data_churn_2['churn rate']=data_churn_2['NewCustomers']/data_churn_2['TotalCustomers']*100
data_churn_2['Existing Customer']=data_churn_2['TotalCustomers']-data_churn_2['NewCustomers']
data_churn_2=data_churn_2[data_churn_2['order_purchase_year'].isin(['2017','2018'])]
data_churn_2.index=data_churn_2['order_purchase_year']
data_churn_2[['NewCustomers','Existing Customer']].plot( stacked = True, kind = 'bar')
plt.tick_params(axis='x', labelrotation =360)

There are nearly 3% old customers come back to buy products in this website in 2017 & 2018 ( in my understanding, customer unique id point to an actual customer in reality). Not count for 2016 because 2016 is considered as the first year in data

 

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *

grandpashabet güncel giriş grandpashabet giriş grandpashabet grandpashabet güncel giriş grandpashabet giriş grandpashabet interbahis güncel giriş interbahis giriş interbahis interbahis pokerklas güncel giriş pokerklas giriş pokerklas pokerklas casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom grandpashabet giriş grandpashabet grandpashabet güncel giriş grandpashabet giriş grandpashabet grandpashabet güncel giriş grandpashabet giriş grandpashabet grandpashabet giriş grandpashabet piabet güncel giriş piabet giriş piabet piabet mobilbahis güncel giriş mobilbahis giriş mobilbahis mobilbahis marsbahis twitter marsbahis güncel giriş marsbahis giriş marsbahis tarafbet güncel giriş tarafbet giriş tarafbet tarafbet meritking güncel giriş meritking giriş meritking meritking güncel giriş meritking giriş meritking piabellacasino güncel giriş piabellacasino giriş piabellacasino piabellacasino güncel giriş piabellacasino giriş piabellacasino galabet giriş galabet galabet güncel giriş galabet giriş galabet galabet güncel giriş galabet giriş galabet galabet giriş galabet galabet giriş galabet galabet güncel giriş galabet giriş galabet galabet güncel giriş galabet giriş galabet galabet giriş galabet galabet giriş galabet galabet güncel giriş galabet giriş galabet galabet güncel giriş galabet giriş galabet galabet giriş galabet casinoas güncel giriş casinoas giriş casinoas marsbahis güncel giriş marsbahis giriş marsbahis casibom casibom Online SMS Alma Geçici Telefon Numarası SMS Doğrulama Sanal Numara sms onay marsbahis güncel giriş marsbahis giriş marsbahis marsbahis güncel giriş marsbahis giriş marsbahis kingroyal güncel giriş kingroyal giriş kingroyal kralbet güncel giriş kralbet giriş kralbet perabet güncel giriş perabet giriş perabet perabet güncel giriş perabet giriş perabet matbet güncel giriş matbet giriş matbet matbet güncel giriş matbet giriş matbet pokerklas güncel giriş pokerklas giriş pokerklas pokerklas dedebet güncel giriş dedebet giriş dedebet maxwin güncel giriş maxwin giriş maxwin kalitebet güncel giriş kalitebet giriş kalitebet casinomilyon güncel giriş casinomilyon giriş casinomilyon pokerklas güncel giriş pokerklas giriş pokerklas pokerklas betjuve güncel giriş betjuve giriş betjuve casibom güncel giriş casibom giriş casibom herabet güncel giriş herabet giriş herabet herabet güncel giriş herabet giriş herabet herabet güncel giriş herabet giriş herabet matadorbet güncel giriş matadorbet giriş matadorbet matadorbet betturkey güncel giriş betturkey giriş betturkey pokerklas güncel giriş pokerklas giriş pokerklas pokerklas casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom primebahis güncel giriş primebahis giriş primebahis queenbet güncel giriş queenbet giriş queenbet queenbet güncel giriş queenbet giriş queenbet betkanyon güncel giriş betkanyon giriş betkanyon pokerklas güncel giriş pokerklas giriş pokerklas pokerklas güncel giriş pokerklas giriş pokerklas pokerklas primebahis güncel giriş primebahis giriş primebahis primebahis güncel giriş primebahis giriş primebahis herabet güncel giriş herabet giriş herabet casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom meritking güncel giriş meritking giriş meritking meritking güncel giriş meritking giriş meritking jojobet güncel giriş jojobet giriş jojobet jojobet güncel giriş jojobet giriş jojobet pokerklas güncel giriş pokerklas giriş pokerklas pokerklas casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom betturkey güncel giriş betturkey giriş betturkey betturkey sahabet sahabet güncel giriş sahabet giriş sahabet betturkey güncel giriş betturkey giriş betturkey betturkey herabet güncel giriş herabet giriş herabet herabet güncel giriş herabet giriş herabet teslabahis güncel giriş teslabahis giriş teslabahis teslabahis güncel giriş teslabahis giriş teslabahis jojobet güncel giriş jojobet giriş jojobet betist güncel giriş betist giriş betist betist güncel giriş betist giriş betist betparibu güncel giriş betparibu giriş betparibu betparibu fixbet güncel giriş fixbet giriş fixbet fixbet betparibu güncel giriş betparibu giriş betparibu betparibu borsa takip borsa analiz borsa anlik borsa restbet güncel giriş restbet giriş restbet restbet casibom güncel giriş casibom giriş casibom betist güncel giriş betist giriş betist betist güncel giriş betist giriş betist pusulabet güncel giriş pusulabet giriş pusulabet pusulabet güncel giriş pusulabet giriş pusulabet betturkey güncel giriş betturkey giriş betturkey betturkey primebahis güncel giriş primebahis giriş primebahis casibom güncel giriş casibom giriş casibom tophillbet güncel giriş tophillbet giriş tophillbet tophillbet güncel giriş tophillbet giriş tophillbet casibom güncel giriş casibom giriş casibom primebahis güncel giriş primebahis giriş primebahis hititbet güncel giriş hititbet giriş hititbet primebahis güncel giriş primebahis giriş primebahis bahsegel güncel giriş bahsegel giriş bahsegel casibom güncel giriş casibom giriş casibom milanobet güncel giriş milanobet giriş milanobet milanobet primebahis güncel giriş primebahis giriş primebahis bahsegel güncel giriş bahsegel giriş bahsegel restbet güncel giriş restbet giriş restbet orisbet güncel giriş orisbet giriş orisbet orisbet casinoas güncel giriş casinoas giriş casinoas casinoas casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom restbet güncel giriş restbet giriş restbet casibom güncel giriş casibom giriş casibom casibom güncel giriş casibom giriş casibom markajbet güncel giriş markajbet giriş