Data Visualization

Max Murphy

The Power of Data Visualization

Introduction

  • Data visualization is a vital tool for understanding and communicating insights from data
  • In this presentation, we will explore the importance and utility of data visualization techniques

Why Data Visualization?

  • Humans are visual creatures; we process visual information more effectively than numbers or text
  • Data visualization helps us comprehend complex patterns, relationships, and trends in data
  • It enables us to make informed decisions and communicate findings with clarity

Key Benefits of Data Visualization

  1. Insight Generation: Visual representations can reveal hidden patterns and insights that may go unnoticed in raw data.
  2. Storytelling: Visualization facilitates storytelling by presenting data in a compelling and memorable way.
  3. Simplification: Complex data can be simplified and transformed into intuitive visual forms, aiding comprehension.

Exploring Data with Visualizations

Let’s explore the gapminder dataset using various visualization techniques.

  • The gapminder dataset contains socio-economic indicators for different countries over time.
  • It records life expectancy, GDP per capita, and population for each country and year.
  • The dataset covers 142 countries, observed every five years from 1952 to 2007.

gapminder Dataset

# Load the gapminder dataset
data(gapminder, package = "gapminder")

# Display the first few rows of the dataset
head(gapminder)
# A tibble: 6 × 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       1952    28.8  8425333      779.
2 Afghanistan Asia       1957    30.3  9240934      821.
3 Afghanistan Asia       1962    32.0 10267083      853.
4 Afghanistan Asia       1967    34.0 11537966      836.
5 Afghanistan Asia       1972    36.1 13079460      740.
6 Afghanistan Asia       1977    38.4 14880372      786.

  • The dataset contains 1,704 observations and 6 variables
  • The variables are:
    • country: Country name
    • continent: Continent name
    • year: Year
    • lifeExp: Life expectancy at birth
    • pop: Population
    • gdpPercap: GDP per capita
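
These counts can be checked directly from the data. A minimal sketch, assuming the gapminder package is installed; the helper names n_countries and n_years are introduced here for illustration only:

```r
# Load the data without attaching the package
data(gapminder, package = "gapminder")

# Number of rows (observations) and columns (variables)
dim(gapminder)

# Column names and types
str(gapminder)

# Distinct countries and distinct survey years
n_countries <- length(unique(gapminder$country))
n_years <- length(unique(gapminder$year))
```

If every country is observed in every survey year, n_countries * n_years should equal the number of rows.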

Summary Statistics

# Display summary statistics for the dataset
summary(gapminder)
        country        continent        year         lifeExp     
 Afghanistan:  12   Africa  :624   Min.   :1952   Min.   :23.60  
 Albania    :  12   Americas:300   1st Qu.:1966   1st Qu.:48.20  
 Algeria    :  12   Asia    :396   Median :1980   Median :60.71  
 Angola     :  12   Europe  :360   Mean   :1980   Mean   :59.47  
 Argentina  :  12   Oceania : 24   3rd Qu.:1993   3rd Qu.:70.85  
 Australia  :  12                  Max.   :2007   Max.   :82.60  
 (Other)    :1632                                                
      pop              gdpPercap       
 Min.   :6.001e+04   Min.   :   241.2  
 1st Qu.:2.794e+06   1st Qu.:  1202.1  
 Median :7.024e+06   Median :  3531.8  
 Mean   :2.960e+07   Mean   :  7215.3  
 3rd Qu.:1.959e+07   3rd Qu.:  9325.5  
 Max.   :1.319e+09   Max.   :113523.1  
                                       

Exploring the Data

Let’s explore the data and try to answer some questions:

  • How does life expectancy relate to GDP per capita?

# Load ggplot2 for plotting
library(ggplot2)

# Plot life expectancy vs. GDP per capita
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
    geom_point() +
    labs(x = "GDP per capita", y = "Life expectancy")

There’s a lot of data in this plot. Let’s simplify it by looking at the data for a single year.

library(dplyr)  # provides filter() for data frames
gapminder_2007 <- gapminder |> filter(year == 2007)

# Plot life expectancy vs. GDP per capita
ggplot(gapminder_2007, aes(x = gdpPercap, y = lifeExp)) +
    geom_point() +
    labs(x = "GDP per capita", y = "Life expectancy")
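
GDP per capita is strongly right-skewed: in the summary statistics its median is about 3,532 while its maximum is about 113,523, so most points crowd the left edge of a linear axis. One common option, which is an addition here and not part of the original slides, is to put the x-axis on a log scale. A minimal sketch, assuming ggplot2, dplyr, and gapminder are installed; the name p_log is illustrative:

```r
library(ggplot2)
library(dplyr)

data(gapminder, package = "gapminder")
gapminder_2007 <- gapminder |> filter(year == 2007)

# Same scatter plot as before, with GDP per capita on a log10 axis
p_log <- ggplot(gapminder_2007, aes(x = gdpPercap, y = lifeExp)) +
    geom_point() +
    scale_x_log10() +
    labs(x = "GDP per capita (log scale)", y = "Life expectancy")

p_log
```

On a log axis, equal horizontal distances correspond to equal ratios of GDP per capita, which spreads out the low-income countries.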

  • How do population, GDP per capita, and life expectancy vary across countries?

ggplot(gapminder_2007, 
    aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
    geom_point() +
    labs(x = "GDP per Capita", y = "Life Expectancy", 
         size = "Population", color = "Continent")

We can also look at data over time by plotting year on the x-axis.

  • How does life expectancy vary over time for each country? For each continent?

ggplot(gapminder, 
  aes(x = year, y = lifeExp, color = continent, group = country)) +
  geom_line() +
  labs(x = "Year", y = "Life Expectancy", color = "Continent")
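
The plot above draws one line per country, so the continent-level question is answered only through color. One way to address it directly is to average life expectancy within each continent and year and plot one line per continent. A minimal sketch, assuming ggplot2, dplyr, and gapminder are installed; the names continent_trend, mean_lifeExp, and p_trend are illustrative, and the mean is unweighted, so every country counts equally regardless of population:

```r
library(ggplot2)
library(dplyr)

data(gapminder, package = "gapminder")

# Average life expectancy per continent and year
# (unweighted: each country contributes equally)
continent_trend <- gapminder |>
    group_by(continent, year) |>
    summarise(mean_lifeExp = mean(lifeExp), .groups = "drop")

# One line per continent
p_trend <- ggplot(continent_trend,
    aes(x = year, y = mean_lifeExp, color = continent)) +
    geom_line() +
    labs(x = "Year", y = "Mean life expectancy", color = "Continent")

p_trend
```

A population-weighted mean, for example weighted.mean(lifeExp, pop), would instead describe the average person on each continent and not the average country.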

Conclusion

  • Data visualization is an important tool for exploring data, communicating information, and answering questions
  • It can reveal patterns, tell compelling stories, and help us make informed decisions