This notebook was created by Jean de Dieu Nyandwi for the love of the machine learning community. For any feedback, errors, or suggestions, he can be reached by email (johnjw7084 at gmail dot com), Twitter, or LinkedIn.

Pandas for Data Visualization

Pandas, which we used for data analysis and manipulation, can also be used to visualize data, and doing so is simple.

To step back a bit, Matplotlib is the primary visualization library in Python. Both Seaborn and the Pandas plotting methods are built on top of Matplotlib.
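Because Pandas plotting sits on top of Matplotlib, a Pandas plot call hands back a regular Matplotlib Axes object that can be customized further. Here is a minimal sketch of that idea. The `sales` DataFrame and its column names are made up for illustration and are not part of this notebook's datasets.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data, not one of the notebook's datasets
sales = pd.DataFrame({"month": [1, 2, 3, 4], "revenue": [10.0, 12.5, 9.0, 14.0]})

# DataFrame.plot() draws with Matplotlib and returns a Matplotlib Axes
ax = sales.plot(x="month", y="revenue")

# Because it is a regular Axes, the usual Matplotlib methods apply
ax.set_title("Revenue by month")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue")
```

This is why Matplotlib knowledge carries over directly: anything you can do to an Axes, you can do to a Pandas plot.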

Contents:

1. Imports and Loading Datasets
2. Basic Plots
3. More Plots
4. Further Learning

1. Imports and Loading Datasets

In [1]:

            
# Imports 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:

            
# Loading dataset

titanic = sns.load_dataset('titanic')
tips = sns.load_dataset('tips')

In [3]:

            
titanic.head()

Out[3]:

	survived	pclass	sex	age	sibsp	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True

In [4]:

            
tips.head()

Out[4]:

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4

In [5]:

            
# Checking if the dataset is a Pandas DataFrame
type(tips)

Out[5]:

pandas.core.frame.DataFrame

2. Basic Plots

In [6]:

            
tips[['tip', 'total_bill']].plot()

Out[6]:

<AxesSubplot:>

In [7]:

            
titanic['age'].hist()

Out[7]:

<AxesSubplot:>

We can change the style of a plot with plt.style.use('style_name') to create beautiful visualizations. The style stays active for all later plots until another style is set.

In [8]:

            
plt.style.use('ggplot')

# The 'ggplot' style mimics the look of ggplot, a popular visualization library for the R language

In [9]:

            
titanic['age'].hist()

Out[9]:

<AxesSubplot:>

In [10]:

            
# In Matplotlib 3.6 and later, this style is named 'seaborn-v0_8-talk'
plt.style.use('seaborn-talk')
titanic['age'].hist()

Out[10]:

<AxesSubplot:>

In [11]:

            
plt.style.use('dark_background')
titanic['age'].hist()

Out[11]:

<AxesSubplot:>

In [12]:

            
plt.style.use('grayscale')
titanic['age'].hist()

Out[12]:

<AxesSubplot:>

In [13]:

            
plt.style.use('fivethirtyeight')
titanic['age'].hist()

Out[13]:

<AxesSubplot:>

There are more great style sheets that you should check out if you are interested in creating attractive visualizations.

Learn more about style sheets in the Matplotlib documentation.
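To find out which styles your Matplotlib installation offers, you can inspect `plt.style.available`. The sketch below also uses `plt.style.context()`, which applies a style only inside a `with` block instead of changing it globally. The `ages` Series is hand-typed toy data used as a stand-in for a real column.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Names of all style sheets bundled with the installed Matplotlib
available_styles = plt.style.available
print(available_styles)

# Hypothetical toy data standing in for a real column
ages = pd.Series([22, 38, 26, 35, 35, 54, 2, 27, 14, 4], name="age")

# plt.style.context() applies a style only inside the with-block,
# so the global style set by plt.style.use() is left untouched
with plt.style.context("ggplot"):
    ax = ages.hist()
```

A temporary style like this is handy when you want to try a look on one plot without affecting the rest of the notebook.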

3. More Plots

We can use plot() to create many more plot types, either through the kind argument or through the matching method such as plot.bar(). Here are the types that we are going to see in this notebook:

Bar plots
Histograms
Box plots
Area plots
Kernel Density Estimation (KDE) plots
Scatter plots
Hexagonal bin plots
Pie charts
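Each plot type can be requested in two equivalent ways: `plot(kind='...')` or the matching accessor method such as `plot.bar()`. Here is a small sketch that draws the same bar chart both ways, side by side. The `scores` Series is made-up toy data, and the explicit `ax=` argument is used only so that the two charts land on separate axes.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data
scores = pd.Series([3, 7, 5], index=["a", "b", "c"])

# Two axes side by side so that each call draws on its own axes
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

# These two calls request the same bar chart
scores.plot(kind="bar", ax=left, title="plot(kind='bar')")
scores.plot.bar(ax=right, title="plot.bar()")

# The bar heights are identical in both charts
left_heights = [bar.get_height() for bar in left.patches]
right_heights = [bar.get_height() for bar in right.patches]
print(left_heights == right_heights)
```

Which form to use is a matter of taste. The accessor methods tend to work better with editor autocompletion.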

A. Bar Plot

In [14]:

            
# In Matplotlib 3.6 and later, this style is named 'seaborn-v0_8-dark'
plt.style.use('seaborn-dark')

# First 20 values of total_bill
top_20 = tips['total_bill'][0:20]

top_20.plot(kind='bar')

# Same as
# top_20.plot.bar()

Out[14]:

<AxesSubplot:>

We can also create stacked bar plots by setting stacked to True.

In [15]:

            
top_30_rows = tips[0:30]

# Only the numeric columns (total_bill, tip, size) are plotted
top_30_rows.plot(kind='bar', stacked=True)

Out[15]:

<AxesSubplot:>
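To see what stacking does, here is a minimal sketch on made-up data. The `orders` DataFrame and its column names are hypothetical. With `stacked=True`, each column's bar is placed on top of the previous one, so the full height of a bar equals the row total.

```python
import pandas as pd

# Hypothetical toy data, not the tips dataset
orders = pd.DataFrame(
    {"food": [20.0, 35.0, 15.0], "drinks": [5.0, 10.0, 8.0]},
    index=["table_1", "table_2", "table_3"],
)

# stacked=True places each column's bar on top of the previous one,
# so the full height of a bar is the row total
ax = orders.plot(kind="bar", stacked=True)

# Row totals, for comparison with the bar heights in the figure
row_totals = orders.sum(axis=1)
print(row_totals)
```

Stacking only makes sense when the columns share a unit and their sum is meaningful, as with food and drinks on the same bill.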

Use kind='barh' (or the equivalent .plot.barh() method) to create horizontal bar charts.

In [16]:

            
first_30_pasesengers = titanic[0:30]
first_30_pasesengers.plot(kind='barh',stacked=True)

Out[16]:

<AxesSubplot:>

B. Histogram

In [17]:

            
titanic['age'].plot(kind='hist')

Out[17]:

<AxesSubplot:ylabel='Frequency'>

In [18]:

            
first_30_pasesengers.plot(kind='hist',stacked=True, bins=20)

Out[18]:

<AxesSubplot:ylabel='Frequency'>

In [19]:

            
first_30_pasesengers.plot(kind='hist',stacked=True, bins=20, orientation='horizontal')

Out[19]:

<AxesSubplot:xlabel='Frequency'>

You can also create histograms easily with the hist() method, which is available on both a single column (Series) and a whole DataFrame. We saw this at the beginning.

In [20]:

            
tips['size'].hist()

Out[20]:

<AxesSubplot:>
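The hist() method behaves a little differently on a single column than on a whole DataFrame. The sketch below illustrates this on a handful of hand-typed rows. The `bills` DataFrame is toy data for illustration, not the full tips dataset.

```python
import pandas as pd

# Hypothetical toy data: a few rows typed in by hand
bills = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    }
)

# Series.hist() draws a single histogram and returns one Axes
single_ax = bills["tip"].hist(bins=5)

# DataFrame.hist() draws one histogram per numeric column
# and returns an array of Axes, one per subplot
axes = bills.hist(bins=5)
titles = sorted(ax.get_title() for ax in axes.ravel())
print(titles)
```

Calling hist() on the DataFrame is a quick way to scan the distribution of every numeric column at once.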

C. Box Plots

In [21]:

            
top_30_rows.plot(kind='box')

Out[21]:

<AxesSubplot:>

In [22]:

            
# You can also use dataframe.boxplot()

top_30_rows.boxplot()

Out[22]:

<AxesSubplot:>

D. Area Plots

In [23]:

            
size_top_bill = tips[['size','tip', 'total_bill']]
size_top_bill.plot(kind='area')

Out[23]:

<AxesSubplot:>

By default, area plots are stacked. You can disable stacking by setting stacked to False.

In [24]:

            
# Only displaying the first 30 rows for clarity

size_top_bill[0:30].plot(kind='area', stacked=False)

Out[24]:

<AxesSubplot:>

E. Kernel Density Estimation (KDE) Plots

In [25]:

            
# KDE plots require SciPy to be installed
titanic['age'].plot.kde()

Out[25]:

<AxesSubplot:ylabel='Density'>

F. Scatter Plots

In [26]:

            
tips.plot.scatter(x='tip', y='total_bill')

Out[26]:

<AxesSubplot:xlabel='tip', ylabel='total_bill'>

In [27]:

            
tips.plot.scatter(x='tip', y='total_bill', color='Blue')

Out[27]:

<AxesSubplot:xlabel='tip', ylabel='total_bill'>

G. Hexagonal Plots

In [28]:

            
tips.plot.hexbin(x='tip', y='total_bill',gridsize=30)

Out[28]:

<AxesSubplot:xlabel='tip', ylabel='total_bill'>

H. Pie Plots

In [29]:

            
df = pd.DataFrame({'qty': [10, 20, 30],
                  'sales': [200, 700, 500]},
                  index=['Apple', 'Orange','Lemon'])

df.plot(kind='pie', y='qty')

Out[29]:

<AxesSubplot:ylabel='qty'>

In [30]:

            
df.plot(kind='pie', subplots=True);

4. Further Learning
