6 How to do simple re-coding

At work, we sometimes need to recode a categorical variable into another one according to some mapping rules. This is the so-called “re-coding”. Below is an R function that I wrote for re-coding.

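simple_recoding <- function(v, from, to, mapping_rule_data = NULL)
{
  L <- length(v)   # number of values to re-code
  N <- length(to)  # number of target categories

  # Each row of mapping_rule holds the start and end positions in `from`
  # of the group that is mapped to the corresponding element of `to`.
  # By default, from[j] is mapped to to[j] (one-to-one mapping).
  if (is.null(mapping_rule_data)) {
    mapping_rule <- matrix(rep(1:N, each = 2), N, 2, byrow = TRUE)
  } else {
    mapping_rule <- matrix(mapping_rule_data, N, 2, byrow = TRUE)
  }

  # values that match no group are left as ""
  the_result <- rep("", L)

  # seq_len() also handles an empty v safely, unlike 1:L
  for (i in seq_len(L))
    for (j in seq_len(N))
    {
      a <- mapping_rule[j, 1]
      b <- mapping_rule[j, 2]
      if (v[i] %in% from[a:b]) the_result[i] <- to[j]
    }
  return(the_result)
}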
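The double loop compares every value of v with every group, which is fine for small data. As a hedged alternative (a sketch, not the author's function), the same mapping can be expressed as a lookup vector indexed with base R's match(). The name simple_recoding_vec is hypothetical, and the sketch assumes that from contains no duplicated values.

```r
# Hypothetical loop-free variant of simple_recoding(); assumes `from` has no duplicates.
simple_recoding_vec <- function(v, from, to, mapping_rule_data = NULL) {
  N <- length(to)
  # default: one-to-one mapping, i.e. from[j] -> to[j]
  if (is.null(mapping_rule_data)) mapping_rule_data <- rep(seq_len(N), each = 2)
  mapping_rule <- matrix(mapping_rule_data, N, 2, byrow = TRUE)

  # lookup[k] is the new code for from[k]; later groups overwrite earlier ones,
  # as in the original loop
  lookup <- rep("", length(from))
  for (j in seq_len(N)) {
    lookup[mapping_rule[j, 1]:mapping_rule[j, 2]] <- to[j]
  }

  # position of each value of v in `from` (NA if absent)
  pos <- match(as.character(v), as.character(from))
  result <- lookup[pos]
  result[is.na(pos)] <- ""   # unmatched values stay "", as in the original
  result
}

simple_recoding_vec(c("a", "z", "f"), from = letters, to = c(LETTERS[1:5], "Z"),
                    mapping_rule_data = c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26))
# [1] "A" "Z" "B"
```

The trade-off is that match() only finds the first occurrence of each value in from, which is why the sketch assumes from has no duplicates.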

Let me use two examples to explain.

Example 1:

(x <- sample(letters, 30, replace = TRUE))

##  [1] "v" "e" "l" "e" "y" "e" "d" "h" "a" "q" "b" "h" "w" "x" "j" "c" "m"
## [18] "g" "f" "z" "v" "c" "s" "o" "l" "t" "z" "s" "x" "a"
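Because sample() draws at random, re-running this line gives a different x from the one shown. If you want reproducible examples, you can fix the random seed first. The seed value 2024 below is arbitrary, and the draws it produces will not match the output printed above.

```r
# fix the seed so that the same random letters are drawn each time
set.seed(2024)
x1 <- sample(letters, 30, replace = TRUE)

set.seed(2024)
x2 <- sample(letters, 30, replace = TRUE)

identical(x1, x2)  # TRUE: same seed, same draws
```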

We want to re-code x to y, where x contains lowercase letters and y contains the corresponding uppercase letters. In this case, from = letters and to = LETTERS. Since this is a one-to-one mapping (each letter in from is mapped to the letter in the same position of to), we can use the default mapping_rule_data = NULL.
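To see why the default works here, the sketch below rebuilds the mapping_rule matrix that simple_recoding() creates when mapping_rule_data = NULL, using the same matrix(rep(1:N, each = 2), N, 2, byrow = TRUE) call. Each row (j, j) says that from[j] alone is mapped to to[j]. N = 4 is chosen only to keep the printout short; for letters/LETTERS, N would be 26.

```r
N <- 4
# rep(1:N, each = 2) gives 1 1 2 2 3 3 4 4; filling by row yields rows (j, j)
default_rule <- matrix(rep(1:N, each = 2), N, 2, byrow = TRUE)
default_rule
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    2
# [3,]    3    3
# [4,]    4    4
```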

(y <- simple_recoding(x, from = letters, to = LETTERS))

##  [1] "V" "E" "L" "E" "Y" "E" "D" "H" "A" "Q" "B" "H" "W" "X" "J" "C" "M"
## [18] "G" "F" "Z" "V" "C" "S" "O" "L" "T" "Z" "S" "X" "A"
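For a one-to-one mapping like this one, base R can do the same job without a loop: match() finds the position of each value in from, and indexing to with those positions picks the new codes. For this particular case, toupper() also works. A short vector stands in for x below.

```r
x <- c("v", "e", "l", "e", "y")
# positions of x in `letters`, then the letters at the same positions in LETTERS
y <- LETTERS[match(x, letters)]
y
# [1] "V" "E" "L" "E" "Y"

# for lowercase-to-uppercase specifically, toupper() gives the same result
identical(y, toupper(x))  # TRUE
```

One difference to keep in mind: values not found in from become NA with match(), whereas simple_recoding() leaves them as "".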

Example 2:

(u <- sample(letters, 30, replace = TRUE))

##  [1] "l" "h" "t" "a" "q" "m" "x" "t" "g" "b" "z" "n" "u" "i" "o" "i" "k"
## [18] "v" "b" "b" "h" "p" "j" "n" "d" "t" "t" "k" "d" "a"

We want to re-code u to w using the following mapping rule:

\[
\begin{array}{ccc}
\{\text{a, b, c, d, e}\} & \longrightarrow & \{\text{A}\}\\
\{\text{f, g, h, i, j}\} & \longrightarrow & \{\text{B}\}\\
\{\text{k, l, m, n, o}\} & \longrightarrow & \{\text{C}\}\\
\{\text{p, q, r, s, t}\} & \longrightarrow & \{\text{D}\}\\
\{\text{u, v, w, x, y}\} & \longrightarrow & \{\text{E}\}\\
\{\text{z}\} & \longrightarrow & \{\text{Z}\}
\end{array}
\]

In this case, from = letters and to = c(LETTERS[1:5], "Z"). Because this mapping is many-to-one, we also need to supply

mapping_rule_data = c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26)

Each consecutive pair of numbers gives the start and end positions, in from, of the group that is mapped to the corresponding element of to: letters[1:5] are mapped to “A”, letters[6:10] to “B”, and so on, and letters[26:26] (that is, “z”) is mapped to “Z”.
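Inside simple_recoding(), this vector is turned into a matrix with one row per element of to. The sketch below repeats that step outside the function and lists which letters fall into each group, which is a quick way to check a mapping_rule_data vector before using it. The object name groups is only for illustration.

```r
mapping_rule_data <- c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26)
to <- c(LETTERS[1:5], "Z")

# one row per target code: column 1 = start, column 2 = end position in `from`
mapping_rule <- matrix(mapping_rule_data, length(to), 2, byrow = TRUE)

# the source letters covered by each row
groups <- lapply(seq_along(to), function(j) letters[mapping_rule[j, 1]:mapping_rule[j, 2]])
names(groups) <- to
groups$A  # "a" "b" "c" "d" "e"
groups$Z  # "z"
```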

(w <- simple_recoding(u, from = letters, to = c(LETTERS[1:5], "Z"),
                      mapping_rule_data = c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26)))

##  [1] "C" "B" "D" "A" "D" "C" "E" "D" "B" "A" "Z" "C" "E" "B" "C" "B" "C"
## [18] "E" "A" "A" "B" "D" "B" "C" "A" "D" "D" "C" "A" "A"

Exercise:

fk_data <- data.frame(my_colors = c("red", "orange", "yellow", "green", "blue"))

Create a new variable called “RGB” following the mapping rules \[ \begin{array}{ccc} \left\{\hbox{red, orange, yellow}\right\} &\longrightarrow& \left\{\hbox{R}\right\}\\ \left\{\hbox{green}\right\} &\longrightarrow& \left\{\hbox{G}\right\}\\ \left\{\hbox{blue}\right\} &\longrightarrow& \left\{\hbox{B}\right\} \end{array} \]

Answer to the Exercise:

rm(list = ls())

# load package
library(dplyr)

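# load simple_recoding() from the file where it is saved
source("simple_recoding.R")

# create a fake data set
fk_data <- data.frame(my_colors = c("red", "orange", "yellow", "green", "blue"))

# list `from` explicitly so that the positions in mapping_rule_data
# do not depend on the order of the rows in fk_data:
# positions 1-3 -> "R", position 4 -> "G", position 5 -> "B"
fk_data_1 <- 
  fk_data %>% 
  mutate(RGB = simple_recoding(my_colors, 
                               from = c("red", "orange", "yellow", "green", "blue"), 
                               to = c("R", "G", "B"),
                               mapping_rule_data = c(1, 3, 4, 4, 5, 5)))

fk_data_1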
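For a small mapping like this, a named character vector can also serve as the lookup table in base R, without simple_recoding() or dplyr. This is an alternative sketch, not the answer above, and the name rgb_lookup is illustrative. The as.character() call guards against my_colors being a factor, which data.frame() produces by default in R versions before 4.0.0.

```r
fk_data <- data.frame(my_colors = c("red", "orange", "yellow", "green", "blue"))

# hypothetical lookup table: names are the original colors, values the new codes
rgb_lookup <- c(red = "R", orange = "R", yellow = "R", green = "G", blue = "B")

# index the lookup by name; unname() drops the names carried over from rgb_lookup
fk_data$RGB <- unname(rgb_lookup[as.character(fk_data$my_colors)])
fk_data
```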