Data Visualization Week 5 Notes

School: Virginia Commonwealth University
Subject: Economics
Date: Feb 20, 2024

Facet Wrap:

- Facet wraps are a useful way to view each category of a variable in its own panel.

Example 1 - Obtain a scatterplot with the ‘facet_wrap()’ function:

    library(ggplot2)
    library(gapminder)
    View(gapminder)
    ggplot(data = gapminder,
           mapping = aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
      geom_point(alpha = 0.7) +
      facet_wrap(~year)

This R code uses the ‘ggplot2’ package to create a scatter plot from the ‘gapminder’ dataset. The ‘gapminder’ dataset contains information about various countries, including their GDP per capita (‘gdpPercap’), life expectancy (‘lifeExp’), population (‘pop’), continent, and year.

Breakdown of the code:
- library(ggplot2) and library(gapminder): Load the necessary libraries, ‘ggplot2’ for creating plots and ‘gapminder’ for accessing the dataset.
- View(gapminder): Opens the ‘gapminder’ dataset in a spreadsheet-style viewer. Note the capital V: base R provides View(), while a lowercase view() is only available when the tibble package (or the tidyverse) is attached.
- ggplot(): Initiates the creation of a new ggplot object.
- data = gapminder: Specifies the dataset to be used, which is gapminder.
- mapping = aes(...): Defines the aesthetic mappings. Here:
  - x = gdpPercap: GDP per capita on the x-axis.
  - y = lifeExp: Life expectancy on the y-axis.
  - size = pop: The size of each point is determined by the population.
  - color = continent: Points are colored by continent.
- geom_point(alpha = 0.7): Adds points to the plot with a transparency (alpha) of 0.7, so that overlapping points remain distinguishable.
- facet_wrap(~year): Creates one panel per year. The tilde ~ introduces the variable to facet by, here year.

So, the final result is a set of scatter plots, one panel per year, in which each point represents a country, with GDP per capita on the x-axis, life expectancy on the y-axis, point size based on population, and point color based on continent.
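As a small sketch of how the panels are laid out, the code below counts the distinct years (one panel is drawn per distinct value of the faceting variable) and arranges the panels in a fixed number of columns. The choice of `ncol = 4` and the names `n_panels` and `p` are illustrative assumptions, not part of the original notes.

```r
library(ggplot2)
library(gapminder)

# One panel is drawn for each distinct value of the faceting variable
n_panels <- length(unique(gapminder$year))
print(n_panels)

# Same plot as in Example 1, with the panels arranged in 4 columns
# (ncol = 4 is an illustrative choice, not from the original notes)
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.7) +
  facet_wrap(~year, ncol = 4)

# Draw the plot
print(p)
```

The gapminder data covers 12 years (1952 to 2007 in five-year steps), so four columns give a 3-by-4 grid of panels.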
Example 2: Using the `scale_x_log10()` function to transform gdpPercap into log10 scale:

    ggplot(data = gapminder,
           mapping = aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
      geom_point(alpha = 0.7) +
      facet_wrap(~year) +
      scale_x_log10()

1. ggplot() and geom_point(): These functions are the same as in the previous code and are used to set up the basic structure of the scatter plot.
2. facet_wrap(~year): This part creates separate panels for each year, as in the previous example.
3. scale_x_log10(): This function applies a logarithmic scale with base 10 to the x-axis. This is often used when dealing with data that spans several orders of magnitude, such as GDP per capita, to make the visualization more interpretable.
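To illustrate why a log10 axis helps with values spanning several orders of magnitude, here is a minimal sketch in base R. The three GDP per capita values are made up for illustration and do not come from the gapminder data.

```r
# Made-up GDP per capita values, each 10 times the previous one (illustrative only)
gdp <- c(300, 3000, 30000)

# On the raw scale, the gap between neighbours grows tenfold at each step
raw_gaps <- diff(gdp)

# On the log10 scale, each tenfold increase is the same distance
log_gaps <- diff(log10(gdp))

print(raw_gaps)
print(log_gaps)
```

On a linear axis the poorer countries would be squeezed together near zero, whereas on the log10 axis each tenfold increase in GDP per capita takes up the same horizontal distance.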
So, the addition of scale_x_log10() in this code indicates that the x-axis (GDP per capita) will be displayed on a logarithmic scale, providing a clearer representation of the data when there is a wide range of values.

Line Graphs (aka time series graphs):

Line graphs, also known as time series graphs, are a type of data visualization that represents data points over a continuous interval or time span. These graphs are particularly useful for showing trends, patterns, and relationships in data that evolve over time. Time series graphs typically have time on the x-axis (horizontal axis) and a variable of interest on the y-axis (vertical axis).
Example 1: Time series graph for lifeExp by year for the two countries in the continent Oceania

This R code uses the ggplot2 package to create a plot from the gapminder dataset, restricted to the Oceania continent.

    library(ggplot2)
    library(gapminder)
    help(gapminder, package = "gapminder")
    library(dplyr)
    plotdata <- gapminder %>%
      filter(continent == "Oceania")
    plotdata
    ggplot(data = plotdata,
           mapping = aes(x = year, y = lifeExp, color = country, size = pop)) +
      geom_point(alpha = 0.7) +
      geom_line(linewidth = 1)

1. library(ggplot2): Loads the ggplot2 library, which is a powerful data visualization package in R.
2. library(gapminder): Loads the gapminder library, which contains the gapminder dataset. This dataset includes information about countries, continents, population, life expectancy, and GDP per capita over time.
3. help(gapminder, package = "gapminder"): Displays help information for the gapminder dataset.
4. library(dplyr): Loads the dplyr library, which is used for data manipulation and filtering.
5. plotdata <- gapminder %>% filter(continent == "Oceania"): Creates a new data frame called plotdata by filtering the gapminder dataset to include only rows where the continent is "Oceania."
6. plotdata: Displays the filtered data frame, showing only the data for countries in the Oceania continent.
7. The ggplot function is used to create a plot:
   a. data = plotdata: Specifies the data frame to be used for plotting.
   b. mapping = aes(...): Maps the aesthetics (variables) to the visual elements of the plot.
      i. x = year: Maps the x-axis to the "year" variable.
      ii. y = lifeExp: Maps the y-axis to the "lifeExp" (life expectancy) variable.
      iii. color = country: Colors the points and lines based on the "country" variable.
      iv. size = pop: Sizes the points based on the "pop" (population) variable.
8. geom_point(alpha = 0.7): Adds a scatter plot layer with points, where alpha controls the transparency of the points.
9. geom_line(linewidth = 1): Adds a line plot layer connecting the points, with a specified line width of 1.

The code creates a combined scatter plot and line plot of life expectancy over time for the countries in the Oceania continent. Each point represents one country in one year, colored by country and sized by that year's population. The transparency of the points is set to 0.7, and a line connects the points for each country.

Example 2: Time series graph of medianLifeExp by year for the five continents

This R code also uses the ggplot2, gapminder, and dplyr libraries, here to create a summarized dataset and display it.

    library(ggplot2)
    library(gapminder)
    library(dplyr)
    plotdata <- gapminder %>%
      group_by(year, continent) %>%
      summarize(medianLifeExp = median(lifeExp))
    plotdata
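The notes stop after the summary step, so the plotting code for this example is not available. As a hedged sketch only, the summarized data could be drawn as one line per continent in the same style as the Oceania example. The plotting layers below, the object name `p`, and the `.groups = "drop"` argument (added to suppress dplyr's grouping message) are assumptions, not the original notes' code.

```r
library(ggplot2)
library(gapminder)
library(dplyr)

# Median life expectancy for every year-continent combination
plotdata <- gapminder %>%
  group_by(year, continent) %>%
  summarize(medianLifeExp = median(lifeExp), .groups = "drop")

# 12 years x 5 continents gives one row per line-graph point
print(nrow(plotdata))

# Assumed plotting step: one colored line per continent, with points marking each year
p <- ggplot(data = plotdata,
            mapping = aes(x = year, y = medianLifeExp, color = continent)) +
  geom_point(alpha = 0.7) +
  geom_line(linewidth = 1)

print(p)
```

Summarizing first keeps the graph readable: plotting all countries would give well over a hundred lines, while the median reduces each continent to a single series.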