Second-Guessing R

I keep doing experiments with R, and with the Tidyverse collection of packages, to discover whether they do what I think they’re doing. Am I justified in spending the time?

I’ve said before that the Tidyverse follows rather different conventions from those of base R. This is something Bob Muenchen wrote about in “The Tidyverse Curse”. Dare I add that he first published this when updating an article called “Why R is Hard to Learn”? I’ve decided that it’s worth putting up with these differences, because they are outweighed by the Tidyverse’s benefits. But it does mean I have to understand its specification thoroughly. If I don’t, my code might be wrong. And that understanding is hard to come by, because the documentation sometimes lacks detail.

Moreover, I find myself second-guessing it, because I’m never sure how much clever processing is being done by non-standard evaluation, or even by base-R assignment and vectorisation.

It doesn’t help that I’ve used over 20 other programming languages, some of which let you extend the language by defining macros: that is, functions that run code while your program is being compiled. In Prolog, a language I once taught Artificial Intelligence with, hook predicates named term_expansion and goal_expansion can look over your code and replace it by other code. You can make them read shorthand notations that describe a problem, and expand those notations into sequences of calls that solve that problem. A well-known application built into most Prologs is “definite clause grammars” or DCGs. With these, you can write rules defining the grammar of, for example, English. These rules get rewritten by term_expansion, ending up as code that parses strings and discovers whether they are grammatically correct. Markus Triska’s “Prolog DCG Primer” shows what DCGs look like, while his “Prolog Macros” explains the general working of such things.
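To give a flavour of this kind of rewriting to readers who have not met Prolog, here is a rough sketch in Python. It is only an analogy, not how any Prolog system implements term_expansion: the tiny grammar and the names GRAMMAR, expand and is_grammatical are invented for illustration. Grammar rules written as plain data are expanded, once, into ordinary recogniser functions, which is the spirit of what happens to DCG rules.

```python
# Toy illustration only: rules written as data are expanded into functions.
# Each nonterminal maps to a list of alternatives; anything that is not a
# nonterminal is treated as a literal word.
GRAMMAR = {
    "sentence": [["noun_phrase", "verb_phrase"]],
    "noun_phrase": [["determiner", "noun"]],
    "verb_phrase": [["verb", "noun_phrase"], ["verb"]],
    "determiner": [["the"], ["a"]],
    "noun": [["cat"], ["dog"]],
    "verb": [["sees"], ["sleeps"]],
}


def expand(grammar):
    """Turn each rule into a function: (words, start) -> set of end positions."""
    parsers = {}

    def make_parser(alternatives):
        def parse(words, start):
            ends = set()
            for alternative in alternatives:
                positions = {start}
                for symbol in alternative:
                    next_positions = set()
                    for pos in positions:
                        if symbol in parsers:
                            # Nonterminal: call the function generated for it.
                            next_positions |= parsers[symbol](words, pos)
                        elif pos < len(words) and words[pos] == symbol:
                            # Literal word: consume it if it matches.
                            next_positions.add(pos + 1)
                    positions = next_positions
                ends |= positions
            return ends

        return parse

    for name, alternatives in grammar.items():
        parsers[name] = make_parser(alternatives)
    return parsers


# The expansion happens once, before any sentence is looked at.
PARSERS = expand(GRAMMAR)


def is_grammatical(text, start_symbol="sentence"):
    """True if the whole text can be derived from start_symbol."""
    words = text.split()
    return len(words) in PARSERS[start_symbol](words, 0)
```

Calling is_grammatical("the cat sees a dog") runs only the generated functions; the rule table is never consulted again. This sketch would loop forever on a left-recursive rule, so it should not be taken as a general parser.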

In Poplog, another language I taught with, there is a very sophisticated macro system, as John Gibson shows in “POP-11 COMPILER PROCEDURES”. You can even make the compiler put machine instructions into your code.

Given my experience of these and of compiler-writing, plus the knowledge that R can do weird and wonderful things with non-standard evaluation, it’s not surprising that I ask myself how much the Tidyverse is hacking my code behind the scenes. For instance, the Tidyverse has an n() function, which “can only be used from within summarise(), mutate() and filter()”. It “returns the number of observations (rows) in each group”. That document is actually wrong, because n() can be used from transmute() too. But apart from that, I wonder whether n() really is a function, or whether some clever bit of non-standard evaluation recognises the name and replaces it by the number of observations, or by instructions to calculate it.

With n(), that probably doesn’t matter much. It takes no arguments, so I don’t have to worry about how it processes them. But summarise(), mutate() etc. can also take functions such as sum(). Now, the documentation for summarise() has a section named “Useful functions”. This lists a few names: mean(), median(), sd(), min(), max(), and some others. Are these the same as the mean() and min() and max() I know in base R? Or are they more like the functions the Tidyverse describes as Select helpers? Is the Tidyverse recognising the names as special, or the code pointers or whatever identifies a function, so that it treats sum differently to a home-grown function with the same definition? And whatever these functions are, exactly what is the Tidyverse passing as their arguments, and is it hacking their results?
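One way to picture the worry about names versus code pointers is the toy below, written in Python rather than R. Nothing in it is taken from the Tidyverse: summarise, SPECIAL_CASES, _framework_sum and my_sum are hypothetical names, and the special-casing is invented purely to show how a framework could treat a built-in differently from a home-grown function with the same definition, and why an experiment on clean data alone would not reveal it.

```python
def my_sum(values):
    """A home-grown function that behaves like the built-in sum."""
    total = 0
    for v in values:
        total += v
    return total


def _framework_sum(values):
    """Hypothetical replacement that also silently skips missing values."""
    return sum(v for v in values if v is not None)


# Hypothetical framework table: functions recognised by identity (the
# "code pointer"), not by what they compute.
SPECIAL_CASES = {sum: _framework_sum}


def summarise(groups, func):
    """Apply func to each group's values, swapping in any special case."""
    func = SPECIAL_CASES.get(func, func)
    return {key: func(values) for key, values in groups.items()}


# Experiment 1: on clean data the two functions are indistinguishable.
clean = {"a": [1, 2, 3], "b": [10, 20]}
same_on_clean_data = summarise(clean, sum) == summarise(clean, my_sum)

# Experiment 2: with a missing value, only the recognised built-in survives.
dirty = {"a": [1, None, 3]}
special_result = summarise(dirty, sum)
try:
    summarise(dirty, my_sum)
    home_grown_failed = False
except TypeError:
    home_grown_failed = True
```

The point of the second experiment is that the difference only shows up on awkward inputs, which is why it seems worth probing with more than one kind of data.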

Lest I seem over-cautious, remember again that I want my code to be reliable, and that the semantics of R and the Tidyverse are not stated anywhere as precisely as, say, those of Pascal. Moreover, there’s the strange behaviour of list() and c() referred to in connection with summarise( lists=list(value) ) in my post “Experiments with count(), tally(), and summarise(): how to count and sum and list elements of a column in the same call”.

So in answer to my original question: yes, I believe I am justified.

Generalised Inverses, Adjunctions, Aesthetic Balance, and Too Many Cartoon Bricks

In the essay “Drawing as Translation” which is the topic of my previous post, I had this little diagram:

This represents the process of restoring the “balance” of a drawing, if one were first to copy it mechanically from the original scene. Imagine a pen-and-ink drawing of a house. If the drawing showed every brick and every slate, it would look too “busy”. There would be far too much density of line on the walls and roof, as opposed to neighbouring areas such as windows and doors. In reality, the bricks don’t stand out as much as our drawing would make them do, and we need to restore that balance. We can’t do it by thinning or greying the lines around them — the language of pen and ink doesn’t allow that — so we do the next best thing. We erase some patches of brick altogether.

That’s illustrated in the diagram. The double arrow, which as it were has its head in one language and its tail in another, represents the original mechanically-made translation from scene to drawing. Then the single horizontal arrow, which has its head and tail in the same language, represents “debricking” or “detexturing”. They compose to give the diagonal arrow, which represents going from the tonal balance in the original scene, to that in the final drawing. In mathematics, this would be called a generalised inverse: it’s an operation which undoes another, not exactly, but as closely as the circumstances allow.

Or, and with more mathematical potential, it might be possible to see this as the operation that category theorists call “adjunction”. I say more about this in “Drawing as Translation II”. Adjunctions are extremely powerful mathematical tools. It would be interesting if they could be used to define aesthetic measure.

Drawing as Translation

I’ve subtitled this blog “What a web developer does”, and most of my recent posts have been about web development, mainly in WordPress. But I do other things too. One is drawing cartoons, which I blogged about in “How to Make Pencil on Tracing Paper Look Good with Gimp”. I recently went to the Oxford Literary Festival, and to a talk by Matthew Reynolds, Professor of English and Comparative Criticism. He was introducing his book Translation: A Very Short Introduction. Inspired by this, I wrote an essay on “Drawing as Translation”.

My essay is a companion to a talk I gave to Workshop Thales in 2015 on the semantics of line drawings. It uses some of the same examples, but with a different emphasis. I argue that it isn’t wrong to omit or exaggerate, and indeed that doing so may be unavoidable, given the constraints of the visual language we’re translating into.

I also look briefly at “blooming”, which I came across in the Language Log post “Blooming, embellishment, and bombs” by Victor Mair (17 August 2015). That post refers to a comment by Judith Strauser (3 August 2015) on an earlier post by Victor Mair, “French vs. English” (2 August 2015). Blooming is the increase in size of a translation relative to its original. There may be more than one reason for it; it’s not just because some languages are more concise than others. I discuss blooming in drawing; it would be interesting to see examples of blooming in translating one programming language to another.

A Good Review II

Here’s another nice review, written for me by Andrew Moore. To the general public, Andrew may be known for recent features such as Evidently Cochrane’s “Paracetamol: widely used and largely ineffective” and (with Nicholas Moore) the European Journal of Hospital Pharmacy’s “Paracetamol and pain: the kiloton problem”. But these are just the tip of a research iceberg representing more than 40 years, 500 scientific and clinical publications, 200 systematic reviews, 100 Cochrane reviews, and a number of books on evidence-based medicine and pain. I was introduced to Andrew at the Oxford Pain Research Group in 2008, and have since helped him with many data analyses. With his colleague Sebastian Straube, I wrote about this work for Dr Dobb’s in our post “Trials and tribulations: measuring drug efficacy in clinical trials, plotting graphs in Java with gnuplot, and reading Excel with JExcelAPI”. That was in Java, but I’ve done our more recent work in R, because of its conciseness and the huge number of library functions available for reading, reformatting, analysing and writing such data. And I’ve also hosted his Bandolier evidence-based medicine website. The text below is Andrew’s.

The spreadsheet is a terrific boon for science and medicine. It allows huge amounts of information to be processed and analysed.

And that is fine when you are following a well known process, down a road well-travelled.

But the cutting edge of science and medicine is, by definition, off that road. Being at the front involves asking awkward questions — those for which there are no answers or processes.

Now large spreadsheets can be the barrier, because transforming them from something designed for one purpose into something useful for a different purpose is hard and fraught with potential error.

That’s where Jocelyn can help — helping researchers make better use of the tools they have to answer questions they didn’t think they could answer.

The three examples below come from clinical trials in acute and chronic pain, where analysis at the level of the individual patient allowed better insights into trial design and patient benefit.

The following papers used or were inspired by Jocelyn’s data analyses:

  • Validating speed of onset as a key component of good analgesic response in acute pain.
    Moore RA, Derry S, Straube S, Ireson-Paine J, Wiffen PJ.
    Eur J Pain. 2015 Feb;19(2):187-92. doi: 10.1002/ejp.536.

  • Faster, higher, stronger? Evidence for formulation and efficacy for ibuprofen in acute pain.
    Moore RA, Derry S, Straube S, Ireson-Paine J, Wiffen PJ.
    Pain. 2014 Jan;155(1):14-21. doi: 10.1016/j.pain.2013.08.013.

  • Interference with work in fibromyalgia: effect of treatment with pregabalin and relation to pain response.
    Straube S, Moore RA, Paine J, Derry S, Phillips CJ, Hallier E, McQuay HJ.
    BMC Musculoskelet Disord. 2011 Jun 3;12:125. doi: 10.1186/1471-2474-12-125.

  • Minimum efficacy criteria for comparisons between treatments using individual patient meta-analysis of acute pain trials: examples of etoricoxib, paracetamol, ibuprofen, and ibuprofen/paracetamol combinations after third molar extraction.
    Moore RA, Straube S, Paine J, Derry S, McQuay HJ.
    Pain. 2011 May;152(5):982-9. doi: 10.1016/j.pain.2010.11.030.

  • Pregabalin in fibromyalgia–responder analysis from individual patient data.
    Straube S, Derry S, Moore RA, Paine J, McQuay HJ.
    BMC Musculoskelet Disord. 2010 Jul 5;11:150. doi: 10.1186/1471-2474-11-150.

  • Fibromyalgia: Moderate and substantial pain intensity reduction predicts improvement in other outcomes and substantial quality of life gain.
    Moore RA, Straube S, Paine J, Phillips CJ, Derry S, McQuay HJ.
    Pain. 2010 May;149(2):360-4. doi: 10.1016/j.pain.2010.02.039.

How to List Blog Posts from outside WordPress

On my website, I’ve got two kinds of page. One kind is like my home page: coded directly as HTML. These pages are static, in that they are files which never change unless I edit them. The other kind of page belongs to this blog. These pages are implemented in WordPress, and are dynamic. When your browser asks for a WordPress page, it sends a web address to my server. The server looks for a PHP script at that address and runs it, and the script decides what HTML to send there and then, based on the contents of WordPress’s database. A good example is the page at which lists my blog posts. But what should I do if I want to list these posts outside WordPress, for example on my home page? There’s an answer at “How to display recent posts outside WordPress” by Paul Green.

It’s the same kind of problem that I solved in “How to Run PHP under WordPress with Justyn’s Magic Includer”. There, I needed to stand outside WordPress and run a script that added information to its database about the names and times and venues of a teacher’s classes, so that they could be displayed by the Promenade theme. Here, I need to stand outside it and run a script that loops through the database returning the text of each and every blog post. In both cases, the scripts need to know where to find the WordPress functions they must call to do the job. In terms of the analogy I used in my Justyn’s Magic Includer post, I need to tell my scripts that to find the WordPress tools, they’ll have to rummage around behind that pile of motorbike spares at the back of my garage.

Here’s a demonstration. The script is below, a shortened version of the one in Paul Green’s post, and also similar to the “Standard Loop” examples in “Class Reference/WP Query” from the authoritative WordPress Codex. You can see what its output looks like by going to


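<?php
/* posts_demo.php */

/*
A simple script that demonstrates
looping through blog posts and
displaying each one.
*/

// Load WordPress so that its functions can be called from outside it.
require( $_SERVER['DOCUMENT_ROOT'] . '/blog/wp-load.php' );

// -1 asks for every post rather than a single page of them.
$args = array( 'posts_per_page' => -1 );
$latest_posts = new WP_Query( $args );

while ( $latest_posts->have_posts() ) {
  // Move on to the next post; without this call the loop never ends.
  $latest_posts->the_post();
  echo "<BR>\n";
  the_title();    // Assumed: show the post's title before its date.
  echo "<BR>\n";
  the_time( 'l jS F, Y' );
  echo "<BR>\n";
  the_content();  // Assumed: show the text of the post.
  echo "<BR><BR>\n";
}

// Restore WordPress's global post data after the custom loop.
wp_reset_postdata();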




A Good Review

Here’s a very nice review one of my customers sent. His site is still confidential, so I can’t show it here, but I can say that the WordPress theme he was talking about is a premium theme that works with WP Job Manager. The rest of the text below is his.

Like the majority of business owners, the idea of creating a website can be like stepping off a plane in a busy unfamiliar city where no one speaks your language. You have to trust that someone will understand your basic attempts at communication or you’ll end up spending money on something that you didn’t ask for. Having had a bad experience with a web developer in the past, I decided to purchase a WordPress template so that I could choose the basic functions, look and feel of the site from the outset.

I wrote a brief for Jocelyn in layman’s terms and I was very impressed to see an email in my inbox with a decoded brief from Jocelyn with questions regarding how to tackle some of the shortcomings of the template.

The template needed quite extensive editing in areas and it was clear that Jocelyn had understood exactly what we needed from the site. The users would need to spend time entering information into a database and it was crucial that this process was easy and simple to navigate. Jocelyn’s interpretation of our brief was excellent. Jocelyn integrated a number of plugins, which prevented the need to code custom scripts, saving me money. Jocelyn also created a login feature which prevents the website from loading until a registered username and password have been entered.

Jocelyn integrated our logos, colour scheme, graphics and text in a very attractive and well-considered manner linking nicely with the overall look and feel of the site. He also helped us move our domains to our host server, uploaded the site and thoroughly tested the site before he handed it over. I have been very impressed with the time Jocelyn takes to clearly explain what he is doing and when he encounters a problem and needs input from myself. We have asked Jocelyn to manage our site for us which is testament to our trust and confidence in his ability.

Thanks again Jocelyn!