In my last post, I explained how
tribbles make it easy to write data frames
as a sequence of key-value pairs. But how can
I make these data frames act as lookup tables?
By using the base R function `match`.

This is how it works. First, I’ll make a tibble:

    dict <- tribble(
      ~key, ~value,
      'a',  'A',
      'b',  'B',
      'c',  'C'
    )

This gives me a two-column table where each key is in the same row as its value:

    # A tibble: 3 x 2
      key   value
      <chr> <chr>
    1 a     A
    2 b     B
    3 c     C

The values in the second column represent the translations of the keys in the first column.

Now, suppose I want to translate the string `'b'`. It’s in row two of column 1. Its translation is in row two of column 2. Generalising, if I want to translate string `s`, I find out which row `r` of column 1 it’s in, and then treat row `r` of column 2 as its translation. I can find its row using `match`.

Here are three examples of `match` looking up a string in a vector of strings:

    > match( 'a', c('a','b','c') )
    [1] 1
    > match( 'b', c('a','b','c') )
    [1] 2
    > match( 'c', c('a','b','c') )
    [1] 3
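Two edge cases matter for lookup tables: a string that isn't in the vector at all, and a string that occurs more than once. The sketch below uses a made-up vector `keys` (not part of my dictionary code) to show what base R's `match` returns in each case.

```r
# Illustrative vector only; 'b' is deliberately repeated.
keys <- c('a', 'b', 'c', 'b')

# A value that is absent from 'keys' yields NA rather than an error.
missing_pos <- match( 'z', keys )

# When a value occurs more than once, match reports only the first position.
first_pos <- match( 'b', keys )

print( missing_pos )
print( first_pos )
```

So a lookup built on `match` will quietly return `NA` for unknown keys, and will use the first row if a key is duplicated.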

Because the columns of tibbles (and data frames) are vectors, I can use `match` on these. Therefore, I can define my lookup function in this way:

    lookup <- function( dict, v ) {
      keys <- dict[[ 1 ]]
      indices <- match( v, keys )
      translations <- dict[[ 2 ]]
      result_col <- translations[ indices ]
      result_col
    }
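As a quick check, here is a minimal sketch of `lookup` being called. To keep it runnable without the tidyverse, I assume a base `data.frame` in place of the tibble; `dict[[ 1 ]]` and `dict[[ 2 ]]` extract columns from either. The names `found` and `missing` are only for illustration.

```r
# Same definition as in the text, repeated so this runs on its own.
lookup <- function( dict, v ) {
  keys <- dict[[ 1 ]]
  indices <- match( v, keys )
  translations <- dict[[ 2 ]]
  translations[ indices ]
}

# Assumption: a base data frame stands in for the tibble here.
dict <- data.frame(
  key   = c('a', 'b', 'c'),
  value = c('A', 'B', 'C'),
  stringsAsFactors = FALSE
)

found   <- lookup( dict, 'b' )   # key present in column 1
missing <- lookup( dict, 'z' )   # key absent, so match gives NA

print( found )
print( missing )
```

Indexing `translations` with an `NA` index gives `NA`, so an unknown key comes back as `NA` instead of raising an error.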

There’s a subtlety here. Many R functions are “vectorised”. To quote from the language definition:

R deals with entire vectors of data at a time, and most of the elementary operators and basic mathematical functions like log are vectorized (as indicated in the table above). This means that e.g. adding two vectors of the same length will create a vector containing the element-wise sums, implicitly looping over the vector index. This applies also to other operators like `-`, `*`, and `/` as well as to higher dimensional structures.
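To make the quoted passage concrete, here is a small sketch of element-wise arithmetic on two vectors of the same length. The vectors `x` and `y` are invented for this example.

```r
# Two example vectors of equal length.
x <- c(1, 2, 3)
y <- c(10, 20, 30)

# Each operator works on corresponding elements, with no explicit loop.
sums     <- x + y
products <- x * y

# Mathematical functions such as log are applied to every element as well.
logs <- log( c(1, exp(1)) )

print( sums )
print( products )
print( logs )
```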

One of the built-in functions that’s vectorised is `match`. So if I pass a *vector* as its first argument, it will look up *each element thereof* in the second argument:

    > match( c('b','c','a','b'), c('a','b','c') )
    [1] 2 3 1 2

This is why I gave my variables plural names. My function is operating on a *vector*, the *entire first column* of a lookup table, and passing that to `match`.
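To show what the vectorised call saves me, this sketch compares it with an explicit loop that looks up one element at a time. The vectors `keys`, `values` and `v` are stand-ins for the two columns of a lookup table and the strings to translate; the loop is only for comparison and is not how I would write it.

```r
# Stand-ins for column 1, column 2, and the strings to translate.
keys   <- c('a', 'b', 'c')
values <- c('A', 'B', 'C')
v      <- c('b', 'c', 'a', 'b')

# Vectorised: one call to match handles every element of v.
vectorised <- values[ match( v, keys ) ]

# Explicit loop: the same result, one element at a time.
looped <- character( length( v ) )
for ( i in seq_along( v ) ) {
  looped[ i ] <- values[ match( v[ i ], keys ) ]
}

print( vectorised )
print( identical( vectorised, looped ) )
```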
I’ll finish with a complete listing of my code and a demo. Here’s the listing:

    # dictionaries.R

    library( tidyverse )

    # Returns a dictionary.
    # This is implemented as a tibble with
    # 'key' and 'value' columns.
    #
    dictionary <- function( ... ) {
      tribble( ~key, ~value, ... )
    }

    # Translates vector v by looking up
    # each element in dictionary 'dict'. The
    # result is a vector whose i'th element
    # is a translation of the i'th element of
    # v.
    #
    lookup <- function( dict, v ) {
      keys <- dict[[ 1 ]]
      indices <- match( v, keys )
      #
      # 'indices' will become a vector whose
      # i'th element is the position p of
      # the i'th element of v in 'keys'.
      # The corresponding element in
      # 'translations' will be its translation.
      translations <- dict[[ 2 ]]
      result_col <- translations[ indices ]
      result_col
    }

The three dots near the top may puzzle some. They denote all the arguments to `dictionary`, which get passed to `tribble`.
Patrick Burns has some examples in
“The three-dots construct in R”.
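
For a self-contained illustration of the same idea, here is a sketch in which a wrapper forwards all of its arguments to base R's `paste`. The function `shout` is hypothetical and exists only for this example; it plays the role that `dictionary` plays for `tribble`.

```r
# Hypothetical wrapper: whatever the caller supplies in ... is handed
# on to paste unchanged, including named arguments such as sep.
shout <- function( ... ) {
  toupper( paste( ... ) )
}

greeting <- shout( 'hello', 'world' )
joined   <- shout( 'a', 'b', sep = '-' )

print( greeting )
print( joined )
```
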
And here, mimicking the Python with which I began, is a demo using this code.

    > tel <- dictionary( 'jack', 4098, 'sape', 4139 )
    > tel
    # A tibble: 2 x 2
      key   value
      <chr> <dbl>
    1 jack   4098
    2 sape   4139
    > lookup( tel, 'jack' )
    [1] 4098