Two Kinds of Conditional Expression: ifelse(A,B,C) versus if (A) B else C

One of the many confusing things about R is that it has two kinds of conditional expression. There’s ifelse(), and there’s the if statement. It’s important to know which one to use, as I found when trying to write a conditional expression that chose between lists.

The first thing to appreciate is that if can be used as a conditional expression as well as a conditional statement. Probably most programmers use it as a statement, like this:

> greet_or_leave <- 'GREET'
> if ( greet_or_leave == 'GREET' ) cat('HELLO') else cat('GOODBYE')
HELLO> 

But you can equally well use it as an expression:

> greeting <- if ( greet_or_leave == 'GREET' ) 'HELLO' else 'GOODBYE'
> greeting
[1] "HELLO"
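Because if is an expression, it can also be the value a function returns, and an if with no else branch still has a value when its condition is FALSE: NULL, returned invisibly. A minimal sketch; the function name greeting_for() is hypothetical and only for illustration:

```r
# Hypothetical helper: the value of the if expression is the function's result.
greeting_for <- function(greet_or_leave) {
  if (greet_or_leave == 'GREET') 'HELLO' else 'GOODBYE'
}

greeting_for('GREET')   # "HELLO"
greeting_for('LEAVE')   # "GOODBYE"

# With no else branch, a FALSE condition gives NULL rather than an error.
no_else <- if (FALSE) 'HELLO'
is.null(no_else)        # TRUE
```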

It’s this second use that I’m interested in here. How does it compare with ifelse()? For simple uses, they seem to do the same thing:

> ifelse( TRUE, 1, 2 )
[1] 1
> if ( TRUE ) 1 else 2
[1] 1
> ifelse( FALSE, 1, 2 )
[1] 2
> if ( FALSE ) 1 else 2
[1] 2
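The equivalence only holds while the condition has exactly one element. ifelse() is vectorised over its test, whereas if expects a single TRUE or FALSE; since R 4.2.0 a longer condition is an error (earlier versions warned and used the first element). This sketch catches either outcome so that it runs on old and new versions of R:

```r
# ifelse() returns one element per element of test.
vectorised <- ifelse(c(TRUE, FALSE), 1, 2)
vectorised   # 1 2

# if wants a single logical value; a longer condition signals a condition
# (an error from R 4.2.0 onwards, a warning before that).
outcome <- tryCatch(
  if (c(TRUE, FALSE)) 1 else 2,
  error   = function(e) "error",
  warning = function(w) "warning"
)
outcome
```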

But this equivalence breaks down when you ask them to return a list rather than a scalar. ifelse() returns a one-element list holding only the first element, stripped of its name. To get the whole list back, you have to use if:

> ifelse( TRUE, list(a=1,b=2), list(a=1,b=2) )
[[1]]
[1] 1

> if ( TRUE ) list(a=1,b=2) else list(a=1,b=2)
$a
[1] 1

$b
[1] 2

This bit me when I was using recode() from dplyr, part of the Tidyverse. This function takes a vector and translates each element by looking it up in the name-replacement pairs given by its remaining arguments. Thus, if codes is c( 'a', 'b', 'c' ), the call

recode( codes, a=1, b=2, c=3 )

returns c(1,2,3). I wanted a version of recode() which takes all the replacements in one argument. I implemented it by using !!! to splice them into the call, as demonstrated in the “Capturing multiple variables” section of “Programming with dplyr”:

library(dplyr)   # provides recode()

recode_with_list <- function( x, other_args )
{
  # !!! splices each name=value pair of other_args in as a separate argument
  recode( x, !!! other_args )
}

So the call

recode_with_list( codes, list( a=1, b=2, c=3 ) )

also returns c(1,2,3).
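For readers without dplyr to hand, the same name-based lookup can be sketched in base R. lookup_with_list() is a hypothetical helper, not part of dplyr, and unlike recode() it simply returns NA for any value whose name has no match:

```r
# Hypothetical base-R stand-in for recode_with_list(); not dplyr's recode().
lookup_with_list <- function(x, translation_list) {
  lookup <- unlist(translation_list)   # named vector: names are the old values
  unname(lookup[as.character(x)])     # index by name; unmatched names give NA
}

codes <- c('a', 'b', 'c')
lookup_with_list(codes, list(a = 1, b = 2, c = 3))   # 1 2 3
lookup_with_list(c('a', 'z'), list(a = 1))           # 1 NA
```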

I used this when translating data about households in our economic model. Each household has a numeric field indicating its region. We need to convert this to a meaningful string, such as “London”, “Scotland”, or “North East”. That’s easy to do with recode_with_list() and a translation list mapping codes to region names. But unfortunately, different data sets use different coding conventions, so I needed conditionals to select between translation lists. Initially, I did this with ifelse(), like this:

translation_list_1 <- 
  list( '1000'='London', '1001'='Scotland', '1002'='North East' )

translation_list_2 <- 
  list( '1'='London', '2'='Scotland', '3'='North East' )

dataset <- tribble( ~id, ~region_codes
                  ,   1,          1000
                  ,   2,          1001
                  ,   3,          1000
                  )

dataset_follows_convention_1 <- TRUE

dataset$regions <-
  recode_with_list( dataset$region_codes
                  , ifelse( dataset_follows_convention_1
                          , translation_list_1
                          , translation_list_2
                          )
                  )

But I found that recode_with_list() complained “Unreplaced values treated as NA as .x is not compatible. Please specify replacements exhaustively”. This must have been because ifelse() was returning only one list element, stripped of its name. Without names, recode() matches a numeric vector's values by position rather than by name, so none of the codes 1000, 1001 and so on was replaced. After a bit of thought and experimentation, I realised that I could rewrite the call as:

dataset$regions <-
  recode_with_list( dataset$region_codes
                  , if ( dataset_follows_convention_1 )
                      translation_list_1
                    else
                      translation_list_2
                  )
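The difference is easy to see by looking at what each expression hands to recode_with_list(), without involving dplyr at all. A minimal sketch, with the translation lists repeated so that it stands alone:

```r
translation_list_1 <-
  list( '1000'='London', '1001'='Scotland', '1002'='North East' )
translation_list_2 <-
  list( '1'='London', '2'='Scotland', '3'='North East' )
dataset_follows_convention_1 <- TRUE

# What ifelse() passes on: a single element, with no name.
via_ifelse <- ifelse( dataset_follows_convention_1
                    , translation_list_1, translation_list_2 )
str(via_ifelse)

# What if passes on: the whole named list.
via_if <- if ( dataset_follows_convention_1 ) translation_list_1 else translation_list_2
names(via_if)   # "1000" "1001" "1002"
```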

This worked, but why didn’t ifelse()? The documentation says that ifelse(test, yes, no) returns a value with the same shape as test, filled with elements selected from either yes or no depending on whether the corresponding element of test is TRUE or FALSE. The “same shape” part is what matters, because it means that test, not yes or no, determines how many elements ifelse(test, yes, no) returns. In my case, dataset_follows_convention_1 had only one element (R scalars are, in reality, single-element vectors), so ifelse() returned just one element: the first element of translation_list_1, without its name.
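The “same shape as test” rule shows up with plain vectors too. In this sketch each position is filled from yes or no according to test, and the result keeps test’s attributes, including its names, while any names on yes and no are dropped:

```r
# Each position is taken from yes (where test is TRUE) or no (where FALSE).
picked <- ifelse(c(TRUE, FALSE, TRUE), c(1, 2, 3), c(10, 20, 30))
picked   # 1 20 3

# Names come from test; the names a, b, c, d on yes and no are lost.
named <- ifelse(c(x = TRUE, y = FALSE), c(a = 1, b = 2), c(c = 3, d = 4))
named    # x = 1, y = 4
```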

You can see the influence of the “shape” below. As test gets longer, so does the result:

> ifelse( TRUE, translation_list_1, translation_list_2 )
[[1]]
[1] "London"

> ifelse( c(TRUE,FALSE), translation_list_1, translation_list_2 )
[[1]]
[1] "London"

[[2]]
[1] "Scotland"

> ifelse( c(TRUE,FALSE,TRUE), translation_list_1, translation_list_2 )
[[1]]
[1] "London"

[[2]]
[1] "Scotland"

[[3]]
[1] "North East"

I’m not the only person to have been bitten by this, as “Ryogi”’s Stack Overflow question “if-else vs ifelse with lists” shows. There are probably other things to beware of too. I notice that the results just above have lost the names from my lists. Moreover, the documentation warns that if(test) yes else no is much more efficient and often much preferable to ifelse(test, yes, no) whenever test has length 1. That’s presumably because ifelse() will waste a lot of time selecting and discarding elements. Indeed, there’s also a warning that “Sometimes it is better to use a construction such as

(tmp <- yes; tmp[!test] <- no[!test]; tmp)

, possibly extended to handle missing values in test”.
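Here is that construction wrapped in a hypothetical helper, yes_no_select(), and applied to the translation lists. It does not behave like ifelse(): the result takes its length and names from yes, and only the values at FALSE positions come from no. So the names survive, but values taken from translation_list_2 end up under translation_list_1’s names, which means it is no substitute for if when choosing between whole lists. The sketch leaves out the missing-value handling the documentation mentions:

```r
# Hypothetical helper wrapping the construction from ?ifelse.
# NA values in test are not handled here.
yes_no_select <- function(test, yes, no) {
  tmp <- yes               # start from yes: its length and names are kept
  tmp[!test] <- no[!test]  # overwrite the FALSE positions with values from no
  tmp
}

translation_list_1 <-
  list( '1000'='London', '1001'='Scotland', '1002'='North East' )
translation_list_2 <-
  list( '1'='London', '2'='Scotland', '3'='North East' )

picked <- yes_no_select(c(TRUE, FALSE, TRUE), translation_list_1, translation_list_2)
names(picked)   # "1000" "1001" "1002"

# With a single FALSE, every value comes from no, but the names still come from yes.
all_no <- yes_no_select(FALSE, translation_list_1, translation_list_2)
names(all_no)   # "1000" "1001" "1002", not "1" "2" "3"
```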

This is not good. The whole point of a high-level language is to provide notations that enable you to express your problem clearly and concisely. It’s the language’s responsibility to compile them into efficient code, not the programmer’s. R designers, please note.
