Friday, July 11, 2014

Example 47,385 of poor coding being blamed on R

I am beginning to think that speed tests of programming languages are just about useless. An NBER working paper, A Comparison of Programming Languages in Economics (un-gated version here), only reinforces the point.

The authors run a common macroeconometric model in a handful of languages and compare speeds. There is a twist, however. 

To make the comparison as unbiased as possible, we coded the same algorithm in each language (which could reflect more about our knowledge of each language than its objective virtues.)

In my mind, this is a poor choice that makes the comparison more biased, not less. Part of choosing a programming language is choosing your programming paradigm. A fairer comparison would adopt the dominant paradigm of each language and compare speeds that way. This choice essentially pits poorly written R code against well-written C++ code. How is this helpful?

The authors also make the following claims:

Issues such as avoiding loops through vectorization, which could help Matlab or R, are less important in our case. With 17,820 entries in a vector, vectorization rarely helps much in comparison with standard loops.

We did not explore the possibility of mixing language programming such as Rcpp in R. While such alternatives are often useful (although cumbersome to implement), a detailed analysis falls beyond the scope of this paper.

First, I have found that the larger the vector, the more important vectorization becomes in R. (Edit 7/14/2014 - In fairness, I am not sure that the authors' procedure would vectorize well anyway.) Second, I do not buy the implication that coding in C++ on its own is somehow much less cumbersome than calling a single Rcpp function to import that C++ code as a function to be called from R.
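To make the vectorization point concrete, here is a toy sketch in R. It is not the paper's model or code; the function names (clip_loop, clip_vec), the cutoff, and the random data are all made up for illustration. It writes the same operation twice, once with an "if" inside a "for" loop and once as a single vectorized call, and checks that the two agree.

```r
# Toy illustration only: NOT the paper's algorithm. All names are hypothetical.
set.seed(1)
x <- runif(17820)  # same length as the vector quoted from the paper

# Loop style: an "if" inside a "for", handling one element at a time
clip_loop <- function(x, cutoff) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    if (x[i] > cutoff) {
      out[i] <- cutoff
    } else {
      out[i] <- x[i]
    }
  }
  out
}

# Vectorized style: one call, with the element-wise work done in compiled code
clip_vec <- function(x, cutoff) pmin(x, cutoff)

# Both versions should produce the same result
stopifnot(isTRUE(all.equal(clip_loop(x, 0.5), clip_vec(x, 0.5))))

# Timings depend on the machine and R version, so none are quoted here
system.time(for (k in 1:100) clip_loop(x, 0.5))
system.time(for (k in 1:100) clip_vec(x, 0.5))
```

How big the gap is on your machine, and whether a given economic model can be rewritten this way at all, is a separate question. This only shows what the two styles look like side by side.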

Their R code, by the way, had an "if" statement in the middle of 3 nested "for" loops which were themselves nested in a "while" loop. Ummmm.... yeah.

Oh, yes. I do believe that R code written that way was "500 to 700 times slower than C++." Imagine that...
