This notebook was created by Jean de Dieu Nyandwi for the love of the machine learning community. For feedback, errors, or suggestions, he can be reached by email (johnjw7084 at gmail dot com), Twitter, or LinkedIn.
Data Visualization with Seaborn¶
Seaborn is a fantastic, easy-to-use Python visualization library built on top of Matplotlib.
For a quick look, check out the gallery page.
To be covered:
In this lab, we will use real-world datasets that Seaborn makes available through sns.load_dataset().
1. Relational Plots¶
These kinds of plots are used to analyze the relationship between features.
- Scatter Plots
- Line Plots
Imports¶
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Loading the datasets to be used in this lab
titanic = sns.load_dataset('titanic')
fmri = sns.load_dataset('fmri')
tips = sns.load_dataset('tips')
flights = sns.load_dataset('flights')
titanic.head()
|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
fmri.head(3)
|   | subject | timepoint | event | region | signal |
|---|---|---|---|---|---|
| 0 | s13 | 18 | stim | parietal | -0.017552 |
| 1 | s5 | 14 | stim | parietal | -0.080883 |
| 2 | s12 | 18 | stim | parietal | -0.081033 |
tips.head(3)
|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
flights.head(3)
|   | year | month | passengers |
|---|---|---|---|
| 0 | 1949 | Jan | 112 |
| 1 | 1949 | Feb | 118 |
| 2 | 1949 | Mar | 132 |
Scatter Plots¶
To visualize the relationship between two numeric features, a scatter plot is usually the go-to plot.
We will use sns.scatterplot(data, x, y, hue, style, palette, size, sizes, legend, markers...) and also sns.relplot().
sns.scatterplot(data=titanic, x='age', y='fare')
<AxesSubplot:xlabel='age', ylabel='fare'>
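The text above also mentions sns.relplot(). As a minimal sketch of how it differs from sns.scatterplot(): relplot is figure-level, so it returns a FacetGrid and can split the data into one panel per category with col. The small sample frame below is an assumption made for portability; it copies the first five rows shown by titanic.head() earlier so the snippet runs without downloading anything. In the notebook you would pass titanic itself.

```python
import pandas as pd
import seaborn as sns

# First five rows of the titanic dataset, copied from titanic.head() above,
# so that this snippet does not need to download the full dataset.
sample = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "fare": [7.2500, 71.2833, 7.9250, 53.1000, 8.0500],
    "sex": ["male", "female", "female", "female", "male"],
    "pclass": [3, 1, 3, 1, 3],
})

# relplot is figure-level: it returns a FacetGrid, not a single Axes.
# col="sex" draws one scatter panel per value of the sex column.
grid = sns.relplot(data=sample, x="age", y="fare", col="sex",
                   kind="scatter", height=4)
```

With the full dataset, `sns.relplot(data=titanic, x='age', y='fare', col='sex')` should give one panel for men and one for women.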
With the hue parameter, we can map another feature to the color of the points.
sns.scatterplot(data=titanic, x='age', y='fare', hue='sex')
<AxesSubplot:xlabel='age', ylabel='fare'>
You can see that it makes the plot clearer. In this titanic dataset, for example, you can directly see that women tended to pay higher fares than men.
To further highlight the difference between the hue classes, we can also vary the marker style as follows.
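A visual impression like this one is worth checking with numbers. The sketch below groups fares by sex and summarizes them. The helper name fare_summary_by is hypothetical, and the sample frame only holds the five rows shown by titanic.head() earlier, so it illustrates the call and proves nothing on its own; pass titanic to check the claim on the full data.

```python
import pandas as pd

def fare_summary_by(df, group_col="sex", value_col="fare"):
    """Return count, mean, and median of value_col for each level of group_col."""
    return df.groupby(group_col)[value_col].agg(["count", "mean", "median"])

# First five rows of the titanic dataset, copied from titanic.head() above.
sample = pd.DataFrame({
    "sex": ["male", "female", "female", "female", "male"],
    "fare": [7.2500, 71.2833, 7.9250, 53.1000, 8.0500],
})

summary = fare_summary_by(sample)
print(summary)
```

In the notebook, `fare_summary_by(titanic)` gives the same table for all passengers.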
sns.scatterplot(data=titanic, x='age', y='fare', hue='sex', style='sex')
<AxesSubplot:xlabel='age', ylabel='fare'>
Increasing the figure size and using a different feature (pclass) for style...
plt.figure(figsize=(8,6))
sns.scatterplot(data=titanic, x='age', y='fare', hue='sex', style='pclass')
<AxesSubplot:xlabel='age', ylabel='fare'>
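plt.figure(figsize=...) works because sns.scatterplot() draws on the current axes. An alternative sketch, assuming you prefer to hold on to the figure and axes explicitly, is to create them with plt.subplots() and pass the axes through the ax parameter. The sample frame again copies the five rows shown by titanic.head() earlier so the snippet is self-contained.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# First five rows of the titanic dataset, copied from titanic.head() above.
sample = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "fare": [7.2500, 71.2833, 7.9250, 53.1000, 8.0500],
    "sex": ["male", "female", "female", "female", "male"],
    "pclass": [3, 1, 3, 1, 3],
})

# Create the figure and axes first, then tell seaborn where to draw.
fig, ax = plt.subplots(figsize=(8, 6))
returned_ax = sns.scatterplot(data=sample, x="age", y="fare",
                              hue="sex", style="pclass", ax=ax)

# Because we hold the axes object, further tweaks are plain Matplotlib calls.
ax.set_title("Fare vs. age")
```

Keeping fig and ax around makes it easier to adjust titles and labels afterwards, or to save the figure with fig.savefig().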
You can also choose specific markers with the style and markers parameters.
plt.figure(figsize=(8,6))
markers = {1:'P', 2:'X', 3:'D'} # pclass value -> Matplotlib marker: P (filled plus), X (filled x), D (diamond)
sns.scatterplot(data=titanic, x='age', y='fare', hue='sex', style='pclass', markers=markers)
<AxesSubplot:xlabel='age', ylabel='fare'>
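The single-letter marker codes come from Matplotlib. As a small sketch, you can look up what each code stands for in the MarkerStyle.markers dictionary; the variable marker_names is only introduced here for illustration.

```python
from matplotlib.markers import MarkerStyle

# Same mapping as in the cell above: pclass value -> marker code
markers = {1: "P", 2: "X", 3: "D"}

# MarkerStyle.markers maps every built-in marker code to a descriptive name
marker_names = {code: MarkerStyle.markers[code] for code in markers.values()}
print(marker_names)
```

Printing `MarkerStyle.markers` itself lists every built-in code you can use in the markers dictionary.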
Mapping the same feature to both hue and size makes the plot more meaningful. sizes controls the range of marker areas used for size.
sns.scatterplot(data=titanic, x='age', y='fare', hue='sex', size='sex', sizes=(20,200))
<AxesSubplot:xlabel='age', ylabel='fare'>
sns.scatterplot(data=titanic, x='age', y='fare', hue='sex', size='pclass', sizes=(20,200))
<AxesSubplot:xlabel='age', ylabel='fare'>