A ggplot2 tutorial

This is, once again, a lesson on ggplot2.
R
ggplot2
Data Visualization
Author
Published: December 24, 2023

Modified: May 28, 2024

0. Preparation


I need to install the following packages:

# install CRAN packages
# (grid ships with base R, so it cannot and need not be installed from CRAN;
#  dplyr is used later for dplyr::filter())
pkg_install <- c("ggplot2", "tibble", "tidyr", "dplyr", "forcats", "purrr", "prismatic", "corrr",
    "cowplot", "ggforce", "ggrepel", "ggridges", "ggsci", "ggtext", "ggthemes",
    "gridExtra", "patchwork", "rcartocolor", "scico", "showtext",
    "shiny", "plotly", "highcharter", "echarts4r")
# install.packages(pkg_install)
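Because the list is long, it can help to install only the packages that are missing. This is a minimal sketch that uses base R's installed.packages(). The helper missing_pkgs() and the shortened vector pkgs_needed are hypothetical names chosen for illustration:

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed yet
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical short list for illustration; pass pkg_install from above for the full set
pkgs_needed <- c("ggplot2", "tibble", "tidyr", "dplyr", "patchwork")
to_install <- missing_pkgs(pkgs_needed)
to_install

# Uncomment to install only the missing packages:
# if (length(to_install) > 0) install.packages(to_install)
```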

I ran into an error when installing devtools:

# install from GitHub since not on CRAN
# install.packages('devtools')
# devtools::install_github("JohnCoene/charter")

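Next, I tried updating my installed packages with update.packages(). The code is commented out because it only needs to run once. Note that update.packages() updates packages, not R itself: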

# update.packages(repos='http://cran.rstudio.com/', ask=FALSE, checkBuilt=TRUE)

→ That did not help.

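Oops, I should have written the package name as a string in plain straight quotes, install.packages('devtools'), and not with typographic quotes like “devtools”. Problem solved! In R, 'devtools' and "devtools" are equivalent, but curly quotes copied from a web page or word processor are not valid string delimiters.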
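A small base R check of why the quoting matters. The string smart is only an illustrative example:

```r
# Straight single and double quotes build exactly the same string in R
identical('devtools', "devtools")

# Curly "smart" quotes (U+201C / U+201D) are not string delimiters,
# so code containing them fails to parse at all
smart <- "install.packages(\u201cdevtools\u201d)"
res <- tryCatch(parse(text = smart), error = function(e) e)
inherits(res, "error")
```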

1. The Dataset


I am using a Chicago subset of the National Morbidity and Mortality Air Pollution Study (NMMAPS).

Install the readr package first:

# install.packages('readr')
# install.packages("quarto")

Import data

The double colon :: accesses a function from a package's namespace. This lets you use the function without attaching the package via library(), although the package still has to be installed.

chic <- readr::read_csv("https://cedricscherer.com/data/chicago-nmmaps-custom.csv")
Rows: 1461 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): city, season, month
dbl  (6): temp, o3, dewpoint, pm10, yday, year
date (1): date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
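The readr::read_csv() call above uses the :: operator. Here is a minimal sketch that tries it in isolation with another readr function, parse_number(). It assumes readr is installed:

```r
# Call readr's parse_number() through its namespace with `::`
# (readr has to be installed, but library(readr) is not needed)
price <- readr::parse_number("$1,234")
price                              # 1234

# readr's namespace is now loaded, but the package is not attached,
# so parse_number() on its own would not be found (unless library(readr) was called)
"readr" %in% loadedNamespaces()    # TRUE
"package:readr" %in% search()
```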

Let's view some of the data:

tibble::glimpse(chic)
Rows: 1,461
Columns: 10
$ city     <chr> "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chic…
$ date     <date> 1997-01-01, 1997-01-02, 1997-01-03, 1997-01-04, 1997-01-05, …
$ temp     <dbl> 36.0, 45.0, 40.0, 51.5, 27.0, 17.0, 16.0, 19.0, 26.0, 16.0, 1…
$ o3       <dbl> 5.659256, 5.525417, 6.288548, 7.537758, 20.760798, 14.940874,…
$ dewpoint <dbl> 37.500, 47.250, 38.000, 45.500, 11.250, 5.750, 7.000, 17.750,…
$ pm10     <dbl> 13.052268, 41.948600, 27.041751, 25.072573, 15.343121, 9.3646…
$ season   <chr> "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "…
$ yday     <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ month    <chr> "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan"…
$ year     <dbl> 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1…

head(chic, 10)
# A tibble: 10 × 10
   city  date        temp    o3 dewpoint  pm10 season  yday month  year
   <chr> <date>     <dbl> <dbl>    <dbl> <dbl> <chr>  <dbl> <chr> <dbl>
 1 chic  1997-01-01  36    5.66    37.5  13.1  Winter     1 Jan    1997
 2 chic  1997-01-02  45    5.53    47.2  41.9  Winter     2 Jan    1997
 3 chic  1997-01-03  40    6.29    38    27.0  Winter     3 Jan    1997
 4 chic  1997-01-04  51.5  7.54    45.5  25.1  Winter     4 Jan    1997
 5 chic  1997-01-05  27   20.8     11.2  15.3  Winter     5 Jan    1997
 6 chic  1997-01-06  17   14.9      5.75  9.36 Winter     6 Jan    1997
 7 chic  1997-01-07  16   11.9      7    20.2  Winter     7 Jan    1997
 8 chic  1997-01-08  19    8.68    17.8  33.1  Winter     8 Jan    1997
 9 chic  1997-01-09  26   13.4     24    12.1  Winter     9 Jan    1997
10 chic  1997-01-10  16   10.4      5.38 24.8  Winter    10 Jan    1997

3. The {ggplot2} Package


A ggplot is built up from a few basic elements:

  1. Data;
  2. Geometries geom_: the geometric shape that will represent the data;
  3. Aesthetics aes(): aesthetics of the geometric or statistical objects, such as position, color, size, shape, and transparency;
  4. Scales scale_: map between the data and the aesthetic dimensions, such as data range to plot width or factor values to colors;
  5. Statistical transformations stat_: statistical summaries of the data, such as quantiles, fitted curves, and sums;
  6. Coordinate system coord_: the transformation used for mapping data coordinates into the plane of the data rectangle;
  7. Facets facet_: the arrangement of the data into a grid of plots;
  8. Visual themes theme(): the overall visual defaults of a plot, such as background, grids, axes, default typeface, sizes, and colors.

🚀 Not every element has to be used in a plot, and each element can also be called multiple times.
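To see all eight building blocks in a single call, here is a sketch that uses the mpg dataset shipped with ggplot2, so it runs without downloading chic. The variables, palette, and y-range are arbitrary choices for illustration:

```r
library(ggplot2)

p <- ggplot(data = mpg,                                        # 1. data
            mapping = aes(x = displ, y = hwy, color = drv)) +  # 3. aesthetics
  geom_point() +                                               # 2. geometry
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +    # 5. statistical transformation
  scale_color_brewer(palette = "Dark2") +                      # 4. scale
  coord_cartesian(ylim = c(10, 45)) +                          # 6. coordinate system
  facet_wrap(~ year) +                                         # 7. facets
  theme_minimal()                                              # 8. theme

p
```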

4. A default ggplot


Load the package to make its functions available:

library(ggplot2)

A default ggplot needs three things that you have to specify: the data, aesthetics, and a geometry.

  • start defining a plot with ggplot(data = df);

  • since in most cases we want to plot two variables, add positional aesthetics with aes(x = var1, y = var2).

🚀 The data is specified outside aes(), while the variables are specified inside aes().

For example:

(g <- ggplot(chic, aes(x = date, y = temp)))

This gives just an empty panel with axes, because ggplot2 does not yet know how to plot the data: we still need to provide a geometry.

🚀 ggplot2 lets us save a ggplot object to a variable, in this case g. We can extend g later by adding layers.

🚀 Wrapping an assignment in parentheses () prints the assigned object immediately.
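Here is a small sketch of saving and extending a plot object. It uses a hypothetical toy data frame in place of chic so it runs offline:

```r
library(ggplot2)

# Hypothetical toy data standing in for chic (date + temperature)
toy <- data.frame(
  date = seq(as.Date("1997-01-01"), by = "day", length.out = 30),
  temp = round(30 + 10 * sin(1:30 / 5), 1)
)

# Wrapping the assignment in () assigns AND prints in one step
(g <- ggplot(toy, aes(x = date, y = temp)))

# Adding a geom returns a new plot object; g itself is left unchanged
g_points <- g + geom_point()
length(g$layers)         # 0
length(g_points$layers)  # 1
```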

There are many different geometries to choose from. They are called geoms because each function usually starts with geom_. For example, to draw a scatter plot:

g + geom_point()

Or a line plot, which our managers always like:

g + geom_line()

Cool, but the plot does not look optimal. We can also combine multiple geometry layers, which is where the magic and fun start:

g + geom_point() + geom_line()

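Note that the order of the layers matters. Layers are drawn in the order they are added, so here the line is drawn on top of the points. g + geom_line() + geom_point() would draw the points on top of the line instead.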
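This sketch shows that ggplot2 keeps the layers in the order they are added. It uses hypothetical toy data, and the line is thick and grey so the overlap is visible when the plots are printed. It inspects the plot's layers field, which is handy for exploring:

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 4))

line_on_top  <- ggplot(toy, aes(x, y)) +
  geom_point(size = 4) +
  geom_line(color = "grey60", linewidth = 2)
point_on_top <- ggplot(toy, aes(x, y)) +
  geom_line(color = "grey60", linewidth = 2) +
  geom_point(size = 4)

# The first layer is drawn first, i.e. at the bottom
class(line_on_top$layers[[1]]$geom)[1]   # "GeomPoint"
class(point_on_top$layers[[1]]$geom)[1]  # "GeomLine"
```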

Change properties of geometries

Turn all points into large, firebrick-red diamonds:

g + geom_point(color = 'firebrick', shape = 'diamond', size = 2)

🚀 ggplot2 understands color, colour, and col alike.

🐱‍🏍 You can use built-in color names, hex codes, or even RGB/RGBA colors via the rgb() function. For example:

g + geom_point(color = "#b22222", shape = "diamond", size = 2)

g + geom_point(color = rgb(178, 34, 34, maxColorValue = 255), shape = "diamond", size = 2)
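This sketch confirms that the three spellings above describe the same color. It also contrasts setting a color outside aes() with mapping one inside aes(). It uses hypothetical toy data, and layer_data() returns the data ggplot2 actually draws for a layer:

```r
library(ggplot2)

# The same red written three ways: a color name, a hex code, and rgb()
col2rgb("firebrick")[, 1]                    # red 178, green 34, blue 34
rgb(178, 34, 34, maxColorValue = 255)        # "#B22222"
# A fourth value is alpha (opacity): 128 out of 255 is about 50%
rgb(178, 34, 34, 128, maxColorValue = 255)   # "#B2222280"

# Hypothetical toy data
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 4))

# Setting a constant OUTSIDE aes() uses that color as-is ("col" works as an alias) ...
p_set <- ggplot(toy, aes(x, y)) + geom_point(col = "firebrick")
# ... while putting a constant INSIDE aes() treats it as a data value:
# the points get a default palette color and a legend labelled "firebrick"
p_mapped <- ggplot(toy, aes(x, y)) + geom_point(aes(color = "firebrick"))

unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
```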

Replacing the default ggplot2 theme

If you call e.g. theme_set(theme_bw()), all following plots use the same black-and-white theme.

theme_set(theme_bw())

g + geom_point(color = 'firebrick')

🚀theme() is also a useful function to modify all kinds of theme elements (texts, rectangles, and lines).
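This sketch overrides single elements with theme() without changing the global default. It adds the partial theme to theme_bw() and reads back the resolved values with calc_element(). The element choices are arbitrary examples:

```r
library(ggplot2)

# theme() creates a partial theme; adding it to a complete theme (here theme_bw())
# overrides only the listed elements and keeps everything else
my_theme <- theme_bw() +
  theme(axis.title       = element_text(face = "bold"),    # a text element
        panel.border     = element_rect(color = "grey40"), # a rectangle element
        panel.grid.minor = element_blank())                # a line element, removed

# calc_element() returns the final, resolved value of an element
calc_element("axis.title", my_theme)$face      # "bold"
calc_element("panel.border", my_theme)$colour  # "grey40"
```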

5. Axes


Change Axis Titles

Use labs() to assign a character string to each label:

ggplot(chic, aes(x = date, y = temp)) +
    geom_point(color = 'firebrick') +
    labs(x = 'Year', y = 'Temperature (°F)')

You can also use xlab() and ylab():

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  xlab("Year") +
  ylab("Temperature (°F)")

🐱‍🏍 With expression(), we can add not only the degree symbol before F but also a superscript:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = expression(paste("Temperature (", degree ~ F, ")"^"(Hey, why should we use metric units?!)")))

Increase Space between Axis and Axis Titles

Overwrite the default element_text() within the theme() call:

ggplot(chic, aes(x = date, y = temp)) +
    geom_point(color = 'firebrick') +
    labs(x = 'Year', y = 'Temperature (°F)') +
    theme(axis.title.x = element_text(vjust = 0, size = 30),
         axis.title.y = element_text(vjust = 2, size = 30))

vjust refers to the vertical justification of the text. Alternatively, we can change the distance by specifying the margins of both text elements:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(axis.title.x = element_text(margin = margin(t = 10), size = 15),
        axis.title.y = element_text(margin = margin(r = 10), size = 15))

t and r in margin() stand for top and right. margin() has four arguments: margin(t, r, b, l).

🚀A good way to remember the order of the margin sides is “t-r-ou-b-l-e”.
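A quick check of the t-r-b-l order. The values are arbitrary:

```r
library(ggplot2)

# Unnamed values are read in the order top, right, bottom, left
m_named      <- margin(t = 10, r = 0, b = 0, l = 5)
m_positional <- margin(10, 0, 0, 5)
as.numeric(m_positional)   # 10 0 0 5 (the default unit is "pt")

# Sides that are not given default to 0
as.numeric(margin(t = 10)) # 10 0 0 0
```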

Change Aesthetics of the Axis Titles

Again, we use the theme() function and modify axis.title and/or the subordinate elements axis.title.x and axis.title.y. Within element_text() we can modify the defaults for size, color, and face.

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(axis.title = element_text(size = 15, color = "firebrick",
                                  face = "italic"))

The face argument can be used to make the font bold, italic, or even bold.italic:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(axis.title.x = element_text(color = "sienna", size = 15, face = 'bold'),
        axis.title.y = element_text(color = "orangered", size = 15, face = 'bold.italic'))

🐱‍🏍 You could also use a combination of axis.title and axis.title.y, since axis.title.x inherits the values from axis.title. For example:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(axis.title = element_text(color = "sienna", size = 15),
        axis.title.y = element_text(color = "orangered", size = 15))

You can set some properties for both axis titles and others for only one of them:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(axis.title = element_text(color = "sienna", size = 15, face = "bold"),
        axis.title.y = element_text(face = "bold.italic"))
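You can inspect the inheritance with calc_element(), which resolves an element the same way ggplot2 does when drawing. This sketch reuses the settings of the last plot:

```r
library(ggplot2)

# axis.title.x and axis.title.y inherit every property they do not set from axis.title
th <- theme_bw() +
  theme(axis.title   = element_text(color = "sienna", size = 15, face = "bold"),
        axis.title.y = element_text(face = "bold.italic"))

x_title <- calc_element("axis.title.x", th)
y_title <- calc_element("axis.title.y", th)
c(x_title$colour, x_title$face)   # "sienna" "bold"
c(y_title$colour, y_title$face)   # "sienna" "bold.italic"
```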

Change Aesthetics of Axis Text

As with the titles, we can change the appearance of the axis text (the tick labels) with axis.text and/or the subordinate elements axis.text.x and axis.text.y:

ggplot(chic, aes(x = date, y = temp)) +
    geom_point(color = 'firebrick') +
    labs(x= "Year", y = expression(paste("Temperature(",degree ~ F, ")"))) +
    theme(axis.text = element_text(color = "dodgerblue", size = 13),
         axis.text.x = element_text(face = 'italic'))

Rotate Axis Text

Specifying an angle lets us rotate any text element. With hjust and vjust we can then adjust the position of the text horizontally (0 = left, 1 = right) and vertically (0 = bottom, 1 = top).

ggplot(chic, aes(x = date, y = temp)) +
    geom_point(color = 'firebrick') +
    labs(x= "Year", y = expression(paste("Temperature(",degree ~ F, ")"))) +
    theme(axis.text.x = element_text(angle = 50, vjust = 1, hjust = 1, size = 13))

(The 50 means 50 degrees, not 50% =)))
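Instead of setting angle, hjust, and vjust by hand, you can use guide_axis(angle = ...). It rotates the tick labels and picks the justification heuristically. This sketch uses hypothetical toy data with month names as categories:

```r
library(ggplot2)

# Hypothetical toy data with text labels on the x axis
toy <- data.frame(
  month = c("January", "February", "March", "April"),
  temp  = c(24, 28, 39, 50)
)

p <- ggplot(toy, aes(x = month, y = temp)) +
  geom_col() +
  guides(x = guide_axis(angle = 50))  # rotate labels; hjust/vjust are chosen automatically

p
```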

Remove Axis Text & Ticks

There is rarely a reason to do this, but this is how it works:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(axis.ticks.y = element_blank(),
        axis.text.y = element_blank())

🚀 To get rid of a theme element, always use element_blank().

Remove Axis Titles

We could again use element_blank() but it is way simpler to just remove the label in the labs() (or xlab()) call:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = NULL, y = "")

Note that NULL removes the element (similarly to element_blank()) while empty quotes "" will keep the spacing for the axis title and simply print nothing.

Limit Axis Range

Sometimes you want to take a closer look at some range of your data. You can do this without subsetting your data:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  ylim(c(0, 50))
Warning: Removed 777 rows containing missing values or values outside the scale range
(`geom_point()`).

Alternatively, you can use scale_y_continuous(limits = c(0, 50)), which is what ylim(c(0, 50)) is shorthand for, or coord_cartesian(ylim = c(0, 50)). The former removes all data points outside the range (subsetting). The latter only adjusts the visible area (zooming) and keeps all data points.
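The difference matters as soon as statistics are computed from the data. This sketch uses hypothetical toy data where a boxplot's median changes under scale limits but not under coord_cartesian():

```r
library(ggplot2)

# Hypothetical toy data: the values 1 to 10, with a median of 5.5
toy <- data.frame(group = "all", y = 1:10)

# Scale limits (what ylim() is shorthand for) drop values outside 0-5 BEFORE the
# statistics are computed, so the boxplot summarises only 1:5 (with a warning)
p_scale <- ggplot(toy, aes(x = group, y = y)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(0, 5))

# coord_cartesian() only zooms: all 10 values still feed the statistics
p_coord <- ggplot(toy, aes(x = group, y = y)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 5))

layer_data(p_scale)$middle   # 3
layer_data(p_coord)$middle   # 5.5
```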

Force Plot to Start at the Origin

chic_high <- dplyr::filter(chic, temp > 25, o3 > 20)

ggplot(chic_high, aes(x = temp, y = o3)) +
  geom_point(color = "darkcyan") +
  labs(x = "Temperature higher than 25°F",
       y = "Ozone higher than 20 ppb") +
  expand_limits(x = 0, y = 0)

🐱‍🏍 Using coord_cartesian(xlim = c(0, NA), ylim = c(0, NA)) will lead to the same result.

chic_high <- dplyr::filter(chic, temp > 25, o3 > 20)

ggplot(chic_high, aes(x = temp, y = o3)) +
  geom_point(color = "darkcyan") +
  labs(x = "Temperature higher than 25°F",
       y = "Ozone higher than 20 ppb") +
  coord_cartesian(xlim = c(0, NA), ylim = c(0, NA))

But we can also force it to literally start at the origin!

ggplot(chic_high, aes(x = temp, y = o3)) +
  geom_point(color = "darkcyan") +
  labs(x = "Temperature higher than 25°F",
       y = "Ozone higher than 20 ppb") +
  expand_limits(x = 0, y = 0) +
  coord_cartesian(expand = FALSE, clip = "off")

🚀 The argument clip = "off", available in any coordinate system (the functions starting with coord_*), allows drawing outside of the panel area. We use it here to make sure that the tick marks at c(0, 0) are not cut off.
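You can also control the padding that expand = FALSE removes for each scale separately with expansion(). This sketch uses hypothetical toy data. The 5% default in the comment applies to continuous scales:

```r
library(ggplot2)

# Default padding of continuous scales: 5% of the data range on each side,
# stored as (lower mult, lower add, upper mult, upper add)
expansion(mult = 0.05)    # 0.05 0.00 0.05 0.00

# Hypothetical toy data
toy <- data.frame(temp = c(30, 45, 60), o3 = c(22, 35, 41))

p <- ggplot(toy, aes(x = temp, y = o3)) +
  geom_point(color = "darkcyan") +
  expand_limits(x = 0, y = 0) +
  scale_x_continuous(expand = expansion(mult = 0)) +          # no padding on x
  scale_y_continuous(expand = expansion(mult = c(0, 0.05)))   # no padding at the bottom only

p
```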

Axes with Same Scaling

Use coord_equal() with the default ratio = 1 to ensure the units are equally scaled on the x-axis and on the y-axis. We can set the aspect ratio of a plot with coord_fixed() or coord_equal(). Both use ratio = 1 (1:1) as the default.

ggplot(chic, aes(x = temp, y = temp + rnorm(nrow(chic), sd = 20))) +
  geom_point(color = "sienna") +
  labs(x = "Temperature (°F)", y = "Temperature (°F) + random noise") +
  xlim(c(0, 100)) + ylim(c(0, 150)) +
  coord_fixed()
Warning: Removed 54 rows containing missing values or values outside the scale range
(`geom_point()`).

Ratios higher than one make units on the y axis longer than units on the x-axis, and vice versa:

ggplot(chic, aes(x = temp, y = temp + rnorm(nrow(chic), sd = 20))) +
  geom_point(color = "sienna") +
  labs(x = "Temperature (°F)", y = "Temperature (°F) + random noise") +
  xlim(c(0, 100)) + ylim(c(0, 150)) +
  coord_fixed(ratio = 1/5)
Warning: Removed 51 rows containing missing values or values outside the scale range
(`geom_point()`).
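To make the ratio concrete: coord_fixed() makes the panel's height-to-width ratio equal to ratio * (y range / x range). The default padding is the same fraction on both axes, so it cancels out. This small arithmetic sketch uses a hypothetical helper, panel_aspect(), and assumes the limits set by xlim() and ylim() in the two plots above:

```r
# Hypothetical helper: panel height / width under coord_fixed(ratio)
panel_aspect <- function(xlim, ylim, ratio = 1) {
  ratio * diff(ylim) / diff(xlim)
}

panel_aspect(c(0, 100), c(0, 150))               # 1.5: taller than wide
panel_aspect(c(0, 100), c(0, 150), ratio = 1/5)  # 0.3: much wider than tall
```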

Use a Function to Alter Labels

This is useful when you want to format the labels (e.g. add a % sign) without changing the data:

ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = NULL) +
  # the function receives the break values and returns one label per break
  scale_y_continuous(labels = function(x) paste(x, "Degrees Fahrenheit"))
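A labelling function receives the break values and must return one string per break. This sketch uses two hypothetical helpers and toy data. The scales package also ships ready-made labellers such as scales::label_percent():

```r
library(ggplot2)

# Hypothetical labelling helpers
deg_labels <- function(x) paste(x, "Degrees Fahrenheit")
pct_labels <- function(x) paste0(x * 100, "%")   # for proportions between 0 and 1

deg_labels(c(0, 50))      # "0 Degrees Fahrenheit" "50 Degrees Fahrenheit"
pct_labels(c(0.1, 0.25))  # "10%" "25%"

# Hypothetical toy data with proportions on the y axis
toy <- data.frame(day = 1:3, share = c(0.1, 0.25, 0.4))
p <- ggplot(toy, aes(x = day, y = share)) +
  geom_col() +
  scale_y_continuous(labels = pct_labels)

p
```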

6. Titles


Add a Title

We can add a title via the ggtitle() function:

ggplot(chic, aes(x = date, y = temp)) +
    geom_point(color = "firebrick") +
    labs(x = "Year", y = "Temperature (°F)") +
    ggtitle("Temperatures in Chicago")

Alternatively, we can use labs(), which accepts several additional arguments for the plot's metadata (a subtitle, a caption, and a tag):

ggplot(chic, aes(x = date, y = temp)) +
    geom_point(color = "firebrick") +
    labs(x = "Year", y = "Temperature (°F)",
        title = "Temperatures in Chicago",
        subtitle = "Seasonal pattern of daily temperatures from 1997 to 2001",
        caption = "Data: NMMAPS",
        tag = "Fig 1")

Make title bold & add a space at the baseline
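Here is one way to do this, as a sketch that uses hypothetical toy monthly data in place of chic. face = "bold" makes the title bold, and a bottom margin adds space between the title's baseline and the panel:

```r
library(ggplot2)

# Hypothetical toy data standing in for chic
toy <- data.frame(
  date = seq(as.Date("1997-01-01"), by = "month", length.out = 24),
  temp = round(50 + 25 * sin((1:24 - 4) * pi / 6), 1)
)

# Bold title with 10pt of extra space below it (b = bottom in margin(t, r, b, l))
title_theme <- theme(plot.title = element_text(face = "bold", size = 14,
                                               margin = margin(b = 10)))

ggplot(toy, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)", title = "Temperatures in Chicago") +
  title_theme
```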

7. Legends


8. Backgrounds & Grid Lines


9. Margins


10. Multi-panel Plots


11. Colors


12. Themes


13. Lines


14. Text


15. Coordinates


16. Chart Types


17. Ribbons (AUC, CI, etc.)


18. Smoothings


19. Interactive Plots


20. Remarks, Tips & Resources


References


Source: https://www.cedricscherer.com/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/