I keep doing experiments with R, and with its Tidyverse package, to discover whether these do what I think they’re doing. Am I justified in spending the time?
I’ve said before that the Tidyverse follows rather different conventions from those of base R. This is something Bob Muenchen wrote about in “The Tidyverse Curse”. Dare I add that he first published this when updating an article called “Why R is Hard to Learn”? I’ve decided that it’s worth putting up with these differences. They are outweighed by the Tidyverse’s benefits. But it does mean I have to understand its specification thoroughly. If I don’t, my code might be wrong. But the understanding is difficult, because the documentation sometimes lacks detail.
Moreover, I find myself second-guessing it, because I’m never sure how much clever processing might be being done by non-standard evaluation, or even by base-R assignment and vectorisation.
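To show the kind of thing I mean (my own sketch, not taken from any Tidyverse documentation): even base R lets a function capture its argument unevaluated, inspect it, and only then decide what to do with it, via substitute():

```r
# A base-R function can receive its argument as unevaluated code,
# examine it, and evaluate it later -- the essence of
# non-standard evaluation.
show_code <- function(expr) {
  code <- substitute(expr)            # capture the argument unevaluated
  cat("You passed:", deparse(code), "\n")
  eval(code, envir = parent.frame())  # then evaluate it in the caller
}

show_code(1 + 2)   # prints 'You passed: 1 + 2' and returns 3
```

If a plain three-line function can do that, a package as elaborate as the Tidyverse can clearly do far more behind my back.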
It doesn’t help that I’ve used over 20 other programming languages, some of which enable you to extend the language by defining macros: functions that run code while your program is being compiled. In Prolog, a language I once taught Artificial Intelligence with, built-in functions can look over your code and replace it by other code. You can make them read shorthand notations that describe a problem, and expand those notations into sequences of function calls to solve that problem. A well-known application built into most Prologs is “definite clause grammars” or DCGs. With these, you can write rules defining the grammar of, for example, English. These rules get rewritten by macro expansion, ending up as code that parses strings and discovers whether they are grammatically correct. Markus Triska’s “Prolog DCG Primer” shows what DCGs look like, while his “Prolog Macros” explains the general working of such things.
In Poplog, another language I taught with, there is a very sophisticated macro system, as John Gibson shows in “POP-11 COMPILER PROCEDURES”. You can even make the compiler put machine instructions into your code.
Given my experience of these and of compiler-writing, plus the knowledge that R can do weird and wonderful things with non-standard evaluation, it’s not surprising that I ask myself how much the Tidyverse is hacking my code behind the scenes. For instance, the Tidyverse has an n() function, which “can only be used from within filter()”. It “returns the number of observations (rows) in each group”.
That document is actually wrong, because n() can also be used elsewhere.
But apart from that, I wonder whether n() really is a function, or whether some clever bit of non-standard evaluation recognises the name and replaces it by the number of observations, or by instructions to calculate it. With n(), that probably doesn’t matter much. It takes no arguments, so I don’t have to worry about how it processes them.
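To make the question concrete, here is the kind of use I have in mind (a sketch assuming dplyr is loaded; the little tibble is invented for illustration):

```r
library(dplyr)

df <- tibble(g = c("a", "a", "b"), x = 1:3)

# n() inside summarise(): one row per group, holding the group size.
df %>% group_by(g) %>% summarise(count = n())

# n() inside filter(): keep only rows whose group has more than one row.
df %>% group_by(g) %>% filter(n() > 1)
```

In both calls, n() appears with no arguments at all, which is why its exact status, function or recognised name, makes so little practical difference.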
But summarise(), mutate() etc. can also take ordinary-looking functions. Now, the documentation for summarise() has a section named “Useful functions”. This lists a few, including mean(), min(), max(), and some others. Are these the same as the mean() and max() I know in base R? Or are they more like special versions that the Tidyverse provides under the same names? Is the Tidyverse recognising the names as special, or the code pointers or whatever identifies a function, so that it treats calls to them as calls to a home-grown function with the same definition? And whatever these functions are, exactly what is the Tidyverse passing as their arguments, and is it hacking their results?
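One experiment that bears on this (again my own sketch, assuming dplyr; it is not from the documentation) is to shadow a base function and see which version summarise() runs. My understanding is that tidy evaluation looks names up by ordinary lexical scoping, so the shadowing definition should win, which would suggest the names are not being treated as special:

```r
library(dplyr)

# Shadow base R's mean() with a home-grown function.
mean <- function(x, ...) -1

# If summarise() recognised the *name* 'mean' as special, the shadowing
# definition would be ignored; if it uses ordinary scoping, the
# home-grown version runs and m comes out as -1.
tibble(x = 1:3) %>% summarise(m = mean(x))

rm(mean)  # tidy up: un-shadow base::mean
```

This is exactly the sort of experiment the rest of this post argues for: the answer is checkable in a few lines, and I would rather check than trust my guess.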
Lest I seem over-cautious, remember again that I want my code to be reliable, and that the semantics of R and the Tidyverse are not stated anywhere as precisely as, say, those of Pascal. Moreover, there’s the strange behaviour of c() referred to in connection with summarise( lists=list(value) ) in my post “Experiments with count(), tally(), and summarise(): how to count and sum and list elements of a column in the same call”.
So in answer to my original question: yes, I believe I am justified.