R programming

devtools::install_version("dplyr", version = "0.8.5", repos = "http://cran.us.r-project.org", lib="C:/Users/zhao752/Documents/R/win-library/OLDPKGs")
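A quick follow-up sketch, assuming the pinned dplyr 0.8.5 installed above should later be loaded from that custom library path (lib.loc is the standard library() argument for this):

library(dplyr, lib.loc = "C:/Users/zhao752/Documents/R/win-library/OLDPKGs")
packageVersion("dplyr")  # should report 0.8.5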



Economic distance

People live in a 3-D world, because human beings cannot really grasp the logic of higher dimensions.

Gravity modeling in trade economics is useful for understanding higher-dimensional meanings of distance, which need not be physical.

For example, a credit card promotes spending by decreasing the distance (e.g., processing time, security, risk, and rewards) of the monetary flow.

Likewise, today I bought a DataCamp subscription to decrease the virtual distance to learning advanced R programming.

Let's see how this goes.

There are 13 courses with roughly 50 hours of learning time on the basic R skill track.


Finished Introduction to R today (~4hr)

Pretty basic, but I still learned a few new things:

  • Lists are very useful; [[ ]] should be used for selecting elements from a list (see the sketch after this list)

  • str() is short for structure

  • tbl_df vs. data.frame
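
A minimal sketch of those three points, using a made-up list and the built-in iris data (assuming dplyr is attached for as_tibble()):

l <- list(name = "abc", scores = c(90, 85, 70))
l[["scores"]]    # [[ ]] returns the element itself (a numeric vector)
l["scores"]      # single [ ] returns a one-element list
str(l)           # str() = structure: compact summary of any object
library(dplyr)
as_tibble(iris)  # a tbl_df/tibble prints a neat preview, unlike a plain data.frame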

Intermediate R (~6hr)

  • && and || only evaluate the first element of a vector, unlike & and |, which are vectorized (see the sketch below)
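
A quick sketch of that difference with made-up vectors; note that since R 4.3, && and || error on inputs longer than one, so the scalar comparison is written out explicitly here:

x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)
x & y          # vectorized: TRUE FALSE FALSE
x[1] && y[1]   # && compares single values only: TRUE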

awards <- c("Won 1 Oscar.",

"Won 1 Oscar. Another 9 wins & 24 nominations.",

"1 win and 2 nominations.",

"2 wins & 3 nominations.",

"Nominated for 2 Golden Globes. 1 more win & 2 nominations.",

"4 wins & 1 nomination.")


sub(".*\\s([0-9]+)\\snomination.*$", "\\1", awards)

Sys.Date()

Sys.time()


dplyr & Tidyverse (<4hr)

count(var, sort = T)
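
For context, count() is roughly shorthand for group_by() + summarize(n = n()), and sort = TRUE orders by n descending; a sketch using the course's counties_selected data with a placeholder region column:

counties_selected %>% count(region, sort = TRUE)
# roughly equivalent to:
counties_selected %>%
  group_by(region) %>%
  summarize(n = n()) %>%
  arrange(desc(n))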


counties_selected %>%
  group_by(state) %>%
  top_n(1, population)


counties %>%
  select(state, county, drive:work_at_home)

counties %>%
  select(state, county, starts_with("income"))


contains()

starts_with()

ends_with()

last_col()
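
A small sketch of these select helpers on the built-in iris data (dplyr attached):

iris %>% select(starts_with("Sepal"))  # Sepal.Length, Sepal.Width
iris %>% select(ends_with("Width"))    # Sepal.Width, Petal.Width
iris %>% select(contains("Petal"))     # Petal.Length, Petal.Width
iris %>% select(last_col())            # Species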


counties %>%
  transmute(state, county, fraction_men = men / population)


geom_point() + expand_limits(y = 0)
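
That fragment only works inside a full plot; a self-contained sketch with the built-in mtcars data:

library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  expand_limits(y = 0)   # force the y-axis to include zero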



library(broom)

tidy(model)

library(tidyr)

library(purrr)

by_year_country %>%
  nest(-country) %>%
  mutate(models = map(data, ~ lm(percent_yes ~ year, .))) %>%
  mutate(tidied = map(models, tidy)) %>%
  unnest(tidied)

https://stackoverflow.com/questions/22713325/fitting-several-regression-models-with-dplyr


example <- c("apple", "banana", "apple", "orange")
recode(example,
       apple = "plum",
       banana = "grape")




v <- list(1, 2, 3)

map(v, ~ . * 10)


expand.grid()
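
expand.grid() builds a data frame from all combinations of its inputs; a tiny base-R sketch:

expand.grid(size = c("S", "M", "L"), color = c("red", "blue"))
# 6 rows: every size paired with every color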



.data %>% is.na() %>% colSums() counts the missing values per column.

dplyr::pull() gives a vector, like df[["Sepal.Length"]]

select() gives a data frame, like df["Sepal.Length"]


library(simputation)

# Impute Height and Weight using all other variables as predictors
nhanes_imp <- impute_lm(nhanes, Height + Weight ~ .)


impute_logreg <- function(df, formula) {
  # Extract name of response variable
  imp_var <- as.character(formula[2])
  # Save locations where the response is missing
  missing_imp_var <- is.na(df[imp_var])
  # Fit logistic regression model
  logreg_model <- glm(formula, data = df, family = binomial)
  # Predict the response
  preds <- predict(logreg_model, type = "response")
  # Sample the predictions from a binomial distribution
  preds <- rbinom(length(preds), size = 1, prob = preds)
  # Impute missing values with predictions
  df[missing_imp_var, imp_var] <- preds[missing_imp_var]
  return(df)
}


library(VIM)  # hotdeck() comes from the VIM package
tao_imp <- hotdeck(tao)


# Create boolean masks for where is_hot and humidity are missing
missing_is_hot <- tao_imp$is_hot_imp
missing_humidity <- tao_imp$humidity_imp

for (i in 1:3) {
  # Set is_hot to NA in places where it was originally missing and re-impute it
  tao_imp$is_hot[missing_is_hot] <- NA
  tao_imp <- impute_logreg(tao_imp, is_hot ~ sea_surface_temp)
  # Set humidity to NA in places where it was originally missing and re-impute it
  tao_imp$humidity[missing_humidity] <- NA
  tao_imp <- impute_lm(tao_imp, humidity ~ sea_surface_temp + air_temp)
}



Cluster analysis

# Calculate the distance matrix
dist_players <- dist(lineup, method = 'euclidean')

# Perform the hierarchical clustering using complete linkage
hc_players <- hclust(dist_players, method = 'complete')

# Calculate the assignment vector with a k of 2
clusters_k2 <- cutree(hc_players, 2)

# Create a new data frame storing these results
lineup_k2_complete <- mutate(lineup, cluster = clusters_k2)

# Plot the two clusters
ggplot2::ggplot(lineup_k2_complete) +
  ggplot2::geom_point(ggplot2::aes(x, y, color = cluster, size = cluster))


# Prepare the distance matrix
dist_players <- dist(lineup)

# Generate hclust for complete, single & average linkage methods
hc_complete <- hclust(dist_players, method = "complete")
hc_single <- hclust(dist_players, method = "single")
hc_average <- hclust(dist_players, method = "average")

# Plot & label the 3 dendrograms side-by-side
# Hint: to see these side-by-side, run the 4 lines together as one command
par(mfrow = c(1, 3))
plot(hc_complete, main = 'Complete Linkage')
plot(hc_single, main = 'Single Linkage')
plot(hc_average, main = 'Average Linkage')


library(dendextend)

# Create a dendrogram object from the hclust variable
dend_players <- as.dendrogram(hc_players)

# Plot the dendrogram
plot(color_branches(dend_players, h = 20))
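
To get the cluster assignments that match that height-based coloring, cutree() also accepts a height h instead of k; a sketch reusing hc_players from above:

clusters_h20 <- cutree(hc_players, h = 20)  # cut the tree at height 20
table(clusters_h20)                         # cluster sizes at that height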


library(ggdendro)
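
A minimal sketch of how ggdendro is typically used with the hclust object from above:

ggdendrogram(hc_players, rotate = FALSE)  # ggplot2-based dendrogram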







Did a test

  • map(values, ~.x + 5)

  • map_dbl(vectors, mean); lapply(vectors, mean)

  • case_when(day == "Saturday" ~ "Weekend",
              day == "Sunday" ~ "Weekend",
              TRUE ~ "Weekday")

  • microbenchmark::microbenchmark() for timing expressions (see the sketch below)
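
A small sketch tying the last two bullets together, timing map_dbl() against lapply() with microbenchmark() on a made-up list of vectors:

library(purrr)
library(microbenchmark)
vectors <- replicate(100, rnorm(1000), simplify = FALSE)
microbenchmark(
  purrr = map_dbl(vectors, mean),
  base  = unlist(lapply(vectors, mean)),
  times = 50
)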


Shell script

Wildcards in the shell

  • ? matches a single character, so 201?.txt will match 2017.txt or 2018.txt, but not 2017-01.txt.

  • [...] matches any one of the characters inside the square brackets, so 201[78].txt matches 2017.txt or 2018.txt, but not 2016.txt.

  • {...} matches any of the comma-separated patterns inside the curly brackets, so {*.txt,*.csv} (no space after the comma) matches any file whose name ends with .txt or .csv, but not files whose names end with .pdf.

To create a shell variable, you simply assign a value to a name:

training=seasonal/summer.csv

without any spaces before or after the = sign

for filetype in gif jpg png; do echo $filetype; done

chmod +x script-name-here.sh

./script-name-here.sh

bash script-name-here.sh


$ echo "Welcome To The Geek Stuff" | sed 's/\(\b[A-Z]\)/\(\1\)/g'

(W)elcome (T)o (T)he (G)eek (S)tuff

https://regex101.com/


which bash


The name cron comes from the Greek word for time, chronos.

Unix has a bewildering variety of text editors.