6 How to do simple re-coding
At work, we sometimes need to re-code a categorical variable into another one according to some mapping rules; this is the so-called “re-coding”. Below is an R function that I wrote for re-coding.
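```r
simple_recoding <- function(v, from, to, mapping_rule_data = NULL) {
  L <- length(v)
  N <- length(to)
  # Row j of mapping_rule is a (start, end) pair of indices into `from`:
  # the values from[start:end] are re-coded to to[j].
  if (is.null(mapping_rule_data)) {
    # Default: one-to-one mapping, i.e. from[j] -> to[j]
    mapping_rule <- matrix(rep(seq_len(N), each = 2), N, 2, byrow = TRUE)
  } else {
    mapping_rule <- matrix(mapping_rule_data, N, 2, byrow = TRUE)
  }
  the_result <- rep("", L)  # values not covered by any rule stay ""
  for (i in seq_len(L)) {   # seq_len() is safe when v is empty
    for (j in seq_len(N)) {
      a <- mapping_rule[j, 1]
      b <- mapping_rule[j, 2]
      if (v[i] %in% from[a:b]) the_result[i] <- to[j]
    }
  }
  return(the_result)
}
```

Let me use two examples to explain.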
Example 1:
```r
(x <- sample(letters, 30, replace = TRUE))
## [1] "v" "e" "l" "e" "y" "e" "d" "h" "a" "q" "b" "h" "w" "x" "j" "c" "m"
## [18] "g" "f" "z" "v" "c" "s" "o" "l" "t" "z" "s" "x" "a"
```
We want to re-code `x` to `y`, where `x` contains lowercase letters and `y` the corresponding uppercase letters. In this case, `from = letters` and `to = LETTERS`. Since this is a one-to-one mapping, we can keep the default `mapping_rule_data = NULL`.
```r
(y <- simple_recoding(x, from = letters, to = LETTERS))
## [1] "V" "E" "L" "E" "Y" "E" "D" "H" "A" "Q" "B" "H" "W" "X" "J" "C" "M"
## [18] "G" "F" "Z" "V" "C" "S" "O" "L" "T" "Z" "S" "X" "A"
```
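With the default `mapping_rule_data = NULL`, the function builds a rule matrix whose rows are (1, 1), (2, 2), ..., (N, N), so each `from[k]` is simply re-coded to `to[k]`. For this one-to-one case, base R's `match()` can give the same codes in one vectorized step: `match(v, from)` returns the position of each value in `from`, and indexing `to` with those positions picks the new code. This is a minimal sketch; `recode_one_to_one` is a hypothetical helper, not part of the original code, and unlike `simple_recoding()` it returns `NA` rather than `""` for values that are not in `from`.

```r
# Hypothetical helper (not part of simple_recoding): one-to-one re-coding
# via match(). Assumes from[k] should become to[k] for every k.
recode_one_to_one <- function(v, from, to) {
  stopifnot(length(from) == length(to))
  to[match(v, from)]  # values not found in `from` become NA
}

recode_one_to_one(c("v", "e", "l"), from = letters, to = LETTERS)
# [1] "V" "E" "L"
```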
Example 2:
```r
(u <- sample(letters, 30, replace = TRUE))
## [1] "l" "h" "t" "a" "q" "m" "x" "t" "g" "b" "z" "n" "u" "i" "o" "i" "k"
## [18] "v" "b" "b" "h" "p" "j" "n" "d" "t" "t" "k" "d" "a"
```
We want to re-code `u` to `w` according to the following mapping rule:

\[
\begin{array}{ccc}
\left\{\hbox{a, b, c, d, e}\right\} & \longrightarrow & \left\{\hbox{A}\right\}\\
\left\{\hbox{f, g, h, i, j}\right\} & \longrightarrow & \left\{\hbox{B}\right\}\\
\left\{\hbox{k, l, m, n, o}\right\} & \longrightarrow & \left\{\hbox{C}\right\}\\
\left\{\hbox{p, q, r, s, t}\right\} & \longrightarrow & \left\{\hbox{D}\right\}\\
\left\{\hbox{u, v, w, x, y}\right\} & \longrightarrow & \left\{\hbox{E}\right\}\\
\left\{\hbox{z}\right\} & \longrightarrow & \left\{\hbox{Z}\right\}
\end{array}
\]

In this case, `from = letters` and `to = c(LETTERS[1:5], "Z")`, but note that we now need `mapping_rule_data = c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26)`. The numbers are read in consecutive pairs, one (start, end) pair per element of `to`: `letters[1:5]` are mapped to “A”, `letters[6:10]` to “B”, and so on, until `letters[26:26]` (that is, “z”) is mapped to “Z”.
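To see how these pairs are read, the sketch below repeats the reshaping step that `simple_recoding()` performs internally, `matrix(mapping_rule_data, N, 2, byrow = TRUE)`, and lists the letters each row covers. The variable `groups` is introduced here only for illustration.

```r
mapping_rule_data <- c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26)
to <- c(LETTERS[1:5], "Z")
# One row per new code: column 1 is the first index into `from`,
# column 2 the last index (same reshaping as inside simple_recoding())
mapping_rule <- matrix(mapping_rule_data, length(to), 2, byrow = TRUE)

# Illustration only: the old values covered by each row
groups <- lapply(seq_len(nrow(mapping_rule)),
                 function(j) letters[mapping_rule[j, 1]:mapping_rule[j, 2]])
names(groups) <- to

groups$C  # [1] "k" "l" "m" "n" "o"
groups$Z  # [1] "z"
```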
```r
(w <- simple_recoding(u, from = letters, to = c(LETTERS[1:5], "Z"),
                      mapping_rule_data = c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26)))
## [1] "C" "B" "D" "A" "D" "C" "E" "D" "B" "A" "Z" "C" "E" "B" "C" "B" "C"
## [18] "E" "A" "A" "B" "D" "B" "C" "A" "D" "D" "C" "A" "A"
```
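For every value of `v`, the double loop in `simple_recoding()` scans all the groups in `to`. When the (start, end) pairs are in order, contiguous, and together cover all of `from`, as they do in this example, an alternative is to expand them once into a named lookup vector and then index it with `v` in a single step. This is a sketch under exactly that assumption; `recode_by_ranges` is a hypothetical name, and it returns `NA` (not `""`) for values outside `from`.

```r
# Hypothetical helper: range-based re-coding through a named lookup vector.
# Assumes the (start, end) pairs are in order, contiguous, and together
# cover every element of `from` exactly once (checked by stopifnot below).
recode_by_ranges <- function(v, from, to, mapping_rule_data) {
  ranges <- matrix(mapping_rule_data, ncol = 2, byrow = TRUE)
  n <- nrow(ranges)
  stopifnot(n == length(to),
            ranges[1, 1] == 1,
            ranges[n, 2] == length(from),
            all(ranges[-1, 1] == ranges[-n, 2] + 1))
  # Repeat each new code once per old value in its range, e.g. "A" five times
  lookup <- rep(to, times = ranges[, 2] - ranges[, 1] + 1)
  names(lookup) <- from
  unname(lookup[as.character(v)])  # values not in `from` become NA
}

recode_by_ranges(c("l", "h", "t", "z"), from = letters,
                 to = c(LETTERS[1:5], "Z"),
                 mapping_rule_data = c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26))
# [1] "C" "B" "D" "Z"
```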
Exercise:
```r
fk_data <- data.frame(my_colors = c("red", "orange", "yellow", "green", "blue"))
```

Create a new variable called “RGB” following the mapping rules

\[
\begin{array}{ccc}
\left\{\hbox{red, orange, yellow}\right\} & \longrightarrow & \left\{\hbox{R}\right\}\\
\left\{\hbox{green}\right\} & \longrightarrow & \left\{\hbox{G}\right\}\\
\left\{\hbox{blue}\right\} & \longrightarrow & \left\{\hbox{B}\right\}
\end{array}
\]
Answer to the Exercise:
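```r
rm(list = ls())
# load package
library(dplyr)
# load simple_recoding(), assuming it has been saved in simple_recoding.R
source("simple_recoding.R")
# create a fake data set
fk_data <- data.frame(my_colors = c("red", "orange", "yellow", "green", "blue"))
# `from` lists the categories in rule order; it is spelled out explicitly
# because a data column may contain repeated values
fk_data_1 <-
  fk_data %>%
  mutate(RGB = simple_recoding(my_colors,
                               from = c("red", "orange", "yellow", "green", "blue"),
                               to = c("R", "G", "B"),
                               mapping_rule_data = c(1, 3, 4, 4, 5, 5)))
fk_data_1
```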
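The same exercise can also be done without `dplyr` or the sourced file by storing the mapping rule itself in a named vector: the old values are the names, the new codes are the values, and indexing by name returns the code. This is a sketch of a common base-R idiom rather than the function above; `rgb_lookup` is a name chosen here for illustration, and colours missing from it would become `NA`.

```r
# Base-R sketch: the mapping rule stored directly as a named vector,
# with old values as names and new codes as values
fk_data <- data.frame(my_colors = c("red", "orange", "yellow", "green", "blue"))
rgb_lookup <- c(red = "R", orange = "R", yellow = "R", green = "G", blue = "B")

# as.character() also covers R < 4.0, where data.frame() creates factors
# by default and indexing with a factor would use its integer codes
fk_data$RGB <- unname(rgb_lookup[as.character(fk_data$my_colors)])
fk_data
```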