Readability vs speed in R

I have bad news for those of you trying to produce lucid code!

In his blog Radford M. Neal, Professor at the University of Toronto, published an article with the headline Two Surprising Things about R. He worked out, that parentheses in mathematical expression slow down the run-time dramatically! In contrast it seems to be less time consuming to use curly brackets. I verified these circumstances to be true:

> x=10
> f <- function (n) for (i in 1:n) 1/(1*(1+x))
> g <- function (n) for (i in 1:n) (((1/(((1*(((1+x)))))))))
> system.time(f(10^6))
   user  system elapsed 
  2.231   0.000   2.232 
> system.time(g(10^6))
   user  system elapsed 
  3.896   0.000   3.923 
> # in contrast with curly brackets
> h <- function (n) for (i in 1:n) 1/{1*{1+x}}
> i <- function (n) for (i in 1:n) {{{1/{{{1*{{{1+x}}}}}}}}}
> system.time(h(10^6))
   user  system elapsed 
  1.974   0.000   1.974 
> system.time(i(10^6))
   user  system elapsed 
  3.204   0.000   3.228

As you can see adding extra parentheses is not really intelligent concerning run-time, and not in a negligible way. This fact shocked me, because I always tried to group expressions to increase the readability of my code! Using curly brackets speeds up the execution in comparison to parentheses. Both observations are also surprising to me! So the conclusion is: Try to avoid redundant parentheses and/or brackets!

To learn more about the why you are referred to his article. He also found a interesting observation about squares. In a further article he presents some patches to speed up R.

Damn scoping in R

Ok, R is very well-considered in certain respects, but there are also some things annoying me… This time it’s scoping…

Let’s have a look to the following code:

	if (runif(1) > .5)
		x = 1

First it looks damn unspectacular. But wait, whats that:

> x=0
> fun()
[1] 1
> fun()
[1] 0

Taking a closer look to the function shows that the returned value is randomly chosen from local ( runif(1) > .5 ) or global scope ( runif(1) <= .5 ). So you can’t expect a result from this function. Nasty, especially while debugging external code, isn’t it? :-)

> sum(sapply(1:10^6, function (null) fun()))/10^6
[1] 0.499681

So again my advise: Think about such specific features! This won’t happen in any sensible language…

Auth issues

Sitting on an almost well configured host, I experienced some authentication issues the last few days…

So for example I’m using xtrlock as default X locking mechanism, but if I try to run it on this machine I got the following error:

/tmp % xtrlock
password entry has no pwd
1 /tmp %

Mmh, that is crap. My workaround to temporarily avoid this problem: Connecting to another host via SSH, running xtrlock within a GNU screen session ;-) But that’s no solution for a longer time… So I started debugging. First of all I grabbed the sources from the apt repository and searched for this error message. Turned out to be this piece of code (beginning with line 94 of xtrlock.c ):

errno=0;  pw= getpwuid(getuid());
  if (!pw) { perror("password entry for uid not found"); exit(1); }
  sp = getspnam(pw->pw_name);
  if (sp)
    pw->pw_passwd = sp->sp_pwdp;

  /* logically, if we need to do the following then the same 
     applies to being installed setgid shadow.  
     we do this first, because of a bug in linux. --jdamery */
  /* we can be installed setuid root to support shadow passwords,
     and we don't need root privileges any longer.  --marekm */

  if (strlen(pw->pw_passwd) < 13) {
    fputs("password entry has no pwd\\n",stderr); exit(1);

Ok, seems that the provided password(-hash) is shorter than 13 characters… Going on debugging, the content of pw comes from getpwuid(getuid()) and seems to be ok (matches my users profile like it can be found in /etc/passwd ). At this time (line 1) pw->pw_passwd contains only an single x , more information can’t be retrieved from the passwd -file.. Next the code checks whether SHADOW_PWD is defined, means whether we use an additional shadow -file. Since thats the case this code is executed and the variable sp gets the broken-out fields of the record in the shadow password database that matches the username pw->pw_name (validated, my user). Checking this sp variable I recognized that it is null ! So pw->pw_passwd won’t be updated and still contains the single x from the passwd entry… First I thought about a bug in the getspnam () function, such things might happen due to the Debian unstable release I’m using, but after some further thoughts I checked the shadow file itself:

/tmp % l /etc/shadow
-rw-r----- 1 root root 2673 Feb 16 15:49 /etc/shadow

In comparison with other systems with working xtrlock instances I figured out, that this file shouldn’t only be owned by root. Instead the group has to be shadow! So here is the solution to this issue:

/tmp % chgrp shadow /etc/shadow

And everything is working fine again. Have no idea what or who changed the permissions for the shadow-file…

Update: By the way, afterwards I tried to use Xscreensaver instead of xtrlock, but I wasn’t able to unlock the screen when the shadow rights are wrong. The /var/log/auth.log held messages like that:

Feb 17 10:14:32 HOST xscreensaver: pam_unix(xscreensaver:auth): conversation failed
Feb 17 10:14:32 HOST xscreensaver: pam_unix(xscreensaver:auth): auth could not identify password for [USER]

But this is just for google-searchers ;-)

Open Source DNA

Yesterday I was a bit confused when I read this tweet. Manu Sporny, founder and CEO of Digital Bazaar, announced in his blog that he has published his genome..

He send some saliva to 23andme, they analyzed his DNA and provided his genetic code to him (let’s neglect the discussion whether data from 23andme-chips represent a fully sequenced genome..). This process is very smart and not expensive, so this part of his announcement is not spectacular. Lot’s of people are doing so.

The interesting part of this article: He published the results (roughly 1 million SNP markers) from 23andme as open source project to github, licensed under CC0! So he has released all his rights on this data.

In general a very impressing step, he might be the first person who published its DNA under such a license. His intentions are more than exemplary, providing access to genetic data to everyone that wants to work with it, i.e. researchers.

So far, so good, but there are some disadvantages, he still dealt with some of it. For example, what if anybody uses this information against him? I.e. healthcare provider, they might deny him to avoid high costs because they detected some pre-existing conditions in his DNA. It may also affect employment and can lead to discrimination. His reaction:

I’ve thought long and hard about each of those questions and the many more that you ask yourself before publishing this sort of personal data. There are large privacy implications in doing this. However, speaking solely for myself, I think the benefits outweigh the drawbacks.

Very nice, but there are also some ugly implications he apparently didn’t thought about! All these disadvantages don’t only affect himself, they may also affect relatives (children, parents, siblings..). Did they all agree with this publication?

I can’t see the advantages to an anonymously publication. Attach some demographic information like age, gender, educational background and everyone is satisfied. Then you don’t have to bear any consequences with bugs in your DNA.

With all due respect for his engagement, I think this step is not really sophisticated.

Valentine's Day

Yes, it’s that time again, Feb 14th.. It’s Valentine’s Day.

Don’t know who has told my wife, but now I have to do some love, uuurgh..

How ever, this one is for my little valentine:

'            01110000  01101111
                     10                                   '

Love you soo much, of course! ;-)

PS. If you are able to catch one of these flower or praline seller: beat the living daylights out of them!!

Martin Scharm

stuff. just for the records.

Do you like this page?
You can actively support me!