Data Wrangling and Visualization Course

library(tidyverse)
## -- Attaching packages ---------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0     v purrr   0.3.0
## v tibble  2.0.1     v dplyr   0.8.0.1
## v tidyr   0.8.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(rio)
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(maps)
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
##     map
library(USAboundaries)
#library(albersusa)
shp <- st_read("shp/County-AK-HI-Moved-USA-Map.

Investment Comparison

library(tidyverse)  # %>% and group_by()
library(tidyquant)  # tq_get() and tq_transmute()
library(lubridate)  # today()

# Note: despite the object's name, period = "daily" yields daily returns.
stock_returns_monthly <- c("AAPL", "GOOG", "NFLX", "COST", "AMZN", "MSFT") %>%
  tq_get(get = "stock.prices", from = "2017-10-01", to = today()) %>%
  group_by(symbol) %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               period = "daily",
               col_rename = "Ra")

Background
My hypothetical friend and I want to compare our two portfolios. I am interested in the big companies in Washington State, while my friend is all about the big tech companies in Silicon Valley. The following portfolio allocations will be put to the test to see which one earns more money from October 1, 2017, to today.
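As a rough illustration of how an allocation can be "put to the test", the sketch below compounds per-period returns for each stock and weights the result by a buy-and-hold allocation. The symbols `AAA` and `BBB`, the returns, the 60/40 weights, and the $1,000 starting amount are all made up for the example; they are not the portfolios from the post. Base R is used so the snippet runs without extra packages.

```r
# Hypothetical returns for two stocks over three periods (made-up numbers).
returns <- data.frame(
  symbol = rep(c("AAA", "BBB"), each = 3),
  Ra     = c(0.01, -0.02, 0.03, 0.00, 0.01, -0.01)
)

# Hypothetical allocation: 60% in AAA, 40% in BBB.
weights <- c(AAA = 0.6, BBB = 0.4)
initial <- 1000

# Compound the returns per stock: prod(1 + Ra) is the growth of $1.
growth <- tapply(1 + returns$Ra, returns$symbol, prod)

# Buy-and-hold value: each slice of the initial amount grows with its stock.
final <- sum(initial * weights[names(growth)] * growth)
final
```

Running the same calculation for each portfolio and comparing the two `final` values gives the "which one earns more" answer under a simple no-rebalancing assumption.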

Counting Words in Scripture

scrips <- read_csv("lds-scriptures.csv")
names <- read_rds(gzcon(url("https://byuistats.github.io/M335/data/BoM_SaviorNames.rds")))
verse <- read_lines("https://byuistats.github.io/M335/data/2nephi2516.txt")

Background
In 1978, Susan Easton Black calculated the average number of verses per mention of Christ’s name for each book in the Book of Mormon. She found that Christ’s name is mentioned about every 1.7 verses. But what is the average number of words between references to Christ, outside the context of books, chapters, and verses?

Data Analysis
names <- names %>% arrange(desc(nchar))   # Longest names first; prevents splitting inside of larger references
names2 <- names$name                      # From tibble column to character vector
names3 <- str_c(names2, collapse = "|")   # One string with all references separated by "or" (|)

BoM <- scrips %>%
  filter(volume_id == 3) %>%       # Filter for just the Book of Mormon
  pull(scripture_text) %>%         # We just want the scripture text, as a character vector
  str_c(collapse = " ") %>%        # One string containing the whole Book of Mormon
  str_split(names3)                # Split the string at every reference

#map(function(x) str_count(x, "\\w"))
for (split in BoM) {                   # Iterate over the list returned by str_split()
  count <- str_count(split, "\\w+")    # Count the words in each piece, assign to count
}

count_tbl <- tibble(y = count,                 # Turn vector into tibble
                    x = seq_along(y)) %>%      # Create index variable
  filter(x !
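The split-and-count idea above can be tried on a toy sentence. This is a minimal sketch in base R rather than the post's stringr pipeline; the sentence and the three names are invented stand-ins for the scripture text and the name list. `perl = TRUE` is used so that alternatives are tried in the order given, which is why sorting the names longest-first matters.

```r
# Toy stand-ins for the scripture text and the name list (both hypothetical).
text <- "and the Lord God spake unto them and the Holy One of Israel was with them always"
savior_names <- c("Lord", "Lord God", "Holy One of Israel")

# Sort longest first so "Lord God" is tried before the shorter "Lord".
names_sorted <- savior_names[order(nchar(savior_names), decreasing = TRUE)]
pattern <- paste(names_sorted, collapse = "|")

# Split the text at every reference, leaving the stretches between references.
gaps <- strsplit(text, pattern, perl = TRUE)[[1]]

# Count the words in each stretch.
word_counts <- lengths(regmatches(gaps, gregexpr("[[:alnum:]']+", gaps)))

word_counts        # words between consecutive references
mean(word_counts)  # average gap length in words
```

If the names were not sorted, "Lord" would match first and leave a stray "God" in the following gap, inflating that gap's word count by one.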

Worldwide Height

Background
Scientific American suggests that we have been getting taller over the years. Is this true? Let’s dig into some data and find out for ourselves. The purpose of this activity is to demonstrate the ability to combine and make use of data from many sources.

Data Wrangling
# worldwide_height %>%
#   select_if(colSums(!is.na(.)) > 0) %>%
#   mutate(rna = rowSums(!is.na(.))) %>%
#   filter(rna > 2) %>%
#   gather(3:21, key = year, value = height) %>%
#   mutate(year2 = year,
#          year = parse_integer(year)) %>%
#   rename(year_decade = year,
#          country = `Continent, Region, Country`,
#          height.

Exploring 2014 Gun Deaths

data <- read_csv("https://github.com/fivethirtyeight/guns-data/blob/master/full_data.csv?raw=true")

Background - Gun Deaths In America
In 2014, there were more than 33,000 gun deaths in America. Data from the CDC, FBI, Mother Jones database, Global Terrorism database, and the U of M IPUMS project were all combined into one dataset here. Information includes death counts, homicides, police fatalities, mass shootings, terrorism gun deaths, and population totals.

Data Visualizations based on 33,599 gun deaths in 2014 in the U.

Child Mortality

devtools::install_github("drsimonj/ourworldindata")
finance <- ourworldindata::financing_healthcare

Question
What is the relationship between healthcare expenditure and child mortality over the years?

finance %>%
  select(year, country, health_exp_total, child_mort) %>%
  filter(country == "United States") %>%
  na.omit() %>%
  mutate(child_mort = child_mort/100) %>%
  ggplot(aes(health_exp_total, child_mort)) +
  geom_point(size = 2) +
  geom_text(aes(label = year), vjust = 1.3) +
  scale_x_continuous(labels = scales::dollar) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "\nTotal Health Expenditure",
       y = "Child Mortality\n",
       title = "Child Mortality Declines with Increased Health Expenditure in the U.

Prevalence of Obesity

obese <- read_csv("prevalence-of-obesity-in-adults-by-region.csv")

Background
Data comes from Our World in Data. Obesity has been a growing problem in many parts of the world. I want to visualize this trend by region, assuming that wealthier regions are leading the world. Then, I look at how child mortality is affected by education level.

Obesity
obese <- obese %>%
  rename(continent = Entity,
         year = Year,
         prevalence = `Prevalence of obesity in adults (18+ years old) (FAO (2017)) (%)`)

label <- obese %>% filter(year == 2014)

ggplot(obese, aes(year, (prevalence)/100, group = continent, col = continent)) +
  geom_point() +
  geom_line() +
  theme_minimal() +
  scale_y_continuous(limits = c(0,.

Wealth & Life Expectancy

Gapminder