From Python Dictionaries to Tribbles I: How I Implemented Lookup Tables in R for Numeric Data Codes

As regular readers will know, I’ve been translating an economic model from Python into R. It reads data about the income and expenditure of British households, from sources such as the Family Resources Survey and Family Expenditure Survey. Much of this data is coded as numbers, and the model has to translate these into something intelligible. The Python version uses a kind of built-in lookup table called a “dictionary”, but these don’t exist in R, so I had to implement an equivalent. It was important that my colleague and I be able to initialise the table by writing it as key-value pairs. So I used tribbles…

I’ll explain what Python does first. Here’s an example taken from python.org’s “Dictionaries” tutorial, run on PythonAnywhere’s interactive interpreter:

In [1]: tel = { 'jack': 4098, 'sape': 4139 }
In [2]: tel
Out[2]: { 'jack': 4098, 'sape': 4139 }
In [3]: tel['guido'] = 4127
In [4]: tel
Out[4]: { 'guido': 4127, 'jack': 4098, 'sape': 4139 }
In [5]: tel['jack']
Out[5]: 4098
The first statement creates a dictionary, using curly brackets around its contents. The third statement adds an element and the fifth looks one up, both using keys in square brackets. It’s an easy notation.

Our Python model’s dictionaries look more like the one below, which translates region codes to names, but the idea is the same:

{ 1: 'North_East',
  2: 'North_West_and_Merseyside',
  4: 'Yorks_and_Humberside',
  5: 'East_Midlands',
  6: 'West_Midlands',
  7: 'Eastern',
  8: 'London',
  9: 'South_East',
 10: 'South_West',
 11: 'Wales',
 12: 'Scotland',
 13: 'Northern_Ireland'
}
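
To make the translation step concrete, here is a minimal sketch of how a dictionary like this might be used to turn a numeric code into a name. The names REGION_NAMES and region_name, and the fallback string 'Unknown_region', are my own illustration and are not taken from the model. Note that the table has no entry for code 3, so a lookup needs some way to cope with codes that are absent.

```python
# Region codes to names, copied from the dictionary shown above.
# The name REGION_NAMES is hypothetical, chosen for this sketch.
REGION_NAMES = {
    1: 'North_East',
    2: 'North_West_and_Merseyside',
    4: 'Yorks_and_Humberside',
    5: 'East_Midlands',
    6: 'West_Midlands',
    7: 'Eastern',
    8: 'London',
    9: 'South_East',
    10: 'South_West',
    11: 'Wales',
    12: 'Scotland',
    13: 'Northern_Ireland',
}


def region_name(code, default='Unknown_region'):
    """Hypothetical helper: return the name for a numeric region code.

    Falls back to `default` when the code is not in the table,
    instead of raising KeyError as REGION_NAMES[code] would.
    """
    return REGION_NAMES.get(code, default)


print(region_name(8))     # a code that is in the table
print(region_name(3))     # code 3 is absent, so the default is returned
print(len(REGION_NAMES))  # number of key-value pairs
```

Whether a missing code should quietly return a default, as here, or raise an error is a design choice. For survey data, failing loudly may be the safer option.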

So I needed a data structure that did the same job in R, and a way to initialise it by writing key-value pairs. But whereas lookup tables are built into Python, they aren’t built into R. There are contributed packages for them, such as hashmap and hash. But I decided to implement lookup tables as data frames, as that might give me more control if I needed to do anything odd that these packages didn’t allow.

In fact, I used tibbles instead of ordinary data frames. Tibbles, as Hadley Wickham says in the “Tibbles” chapter of R for Data Science, are data frames, but tweaked to make life a little easier. Importantly for me, “make life easier” includes making it easier to enter small amounts of data in a program by using key-value notation. This is done via the function tribble. This call:

tribble(
  ~x, ~y, ~z,
  "a", 2, 3.6,
  "b", 1, 8.5
)
creates a tibble with columns named x, y and z, and the two rows shown under these names just above. R prints it like this:
# A tibble: 2 x 3
      x     y     z
  <chr> <dbl> <dbl>
1     a     2   3.6
2     b     1   8.5

And this call:

tribble(
   ~key, ~value, 
    1  , 'North_East',
    2  , 'North_West_and_Merseyside',
    4  , 'Yorks_and_Humberside',
    5  , 'East_Midlands',
    6  , 'West_Midlands',
    7  , 'Eastern',
    8  , 'London',
    9  , 'South_East',
    10 , 'South_West',
    11 , 'Wales',
    12 , 'Scotland',
    13 , 'Northern_Ireland'
  )
creates a tibble with two columns named key and value, and 12 rows (there is no code 3). Here’s how R prints this one:
# A tibble: 12 x 2
     key                     value
   <dbl>                      <chr>
 1     1                North_East
 2     2 North_West_and_Merseyside
 3     4      Yorks_and_Humberside
 4     5             East_Midlands
 5     6             West_Midlands
 6     7                   Eastern
 7     8                    London
 8     9                South_East
 9    10                South_West
10    11                     Wales
11    12                  Scotland
12    13          Northern_Ireland

So the Tidyverse has made it easy to enter key-value pairs in Python-dictionary-style notation and turn them into tibbles. How do I make these act as lookup tables? See my next post. By the way, the name “tribble” stands for “transposed tibble”.
