ShortCut[RegEx]: x-modifier

Regardless of your programming experience, you have probably learned that regular expressions are more or less write-only.

Write-only? What is he talking about!? Well, I recently revisited some Perl code with a relatively short reg-ex. Do you think I was able to understand what I was thinking when I wrote that piece of code? Not in the slightest!

But there is a smart modifier that enables you to comment your regular expressions: x. With /x whitespace in the pattern is ignored (unless it is escaped or inside a character class), and after an unescaped # the rest of the line is treated as a comment. I found a nice example. What do you think this expression is for:

/^1?$|^(11+?)\1+$/

No idea? Don’t even bother, I’m also stumped… Here is the solution: It’s used to check for prime numbers ;-) Using the x-mod the explanation looks much more readable (via Neil Kandalgaonkar):

/
  ^1?$   # matches beginning, optional 1, ending.
         # thus matches the empty string and "1".
         # this matches the cases where N was 0 and 1
         # and since it matches, will not flag those as prime.
|   # or...
  ^                # match beginning of string
    (              # begin first stored group
     1             # match a one
      1+?          # then match one or more ones, minimally.
    )              # end storing first group
    \1+            # match the first group, repeated one or more times.
  $                # match end of string.
/x

So you see, it’s really helpful to use the x-modifier. At least for your own understanding :-P

A bit more explanation can be found on Perl.com.
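
The same trick isn't limited to Perl. As a minimal sketch, assuming Java's Pattern.COMMENTS flag as the counterpart to /x, the commented expression can be tried out like this. The class name UnaryPrimeCheck and the helper isPrime are made up for illustration; the number n is written as a string of n ones before matching:

```java
import java.util.regex.Pattern;

final class UnaryPrimeCheck {
    // Pattern.COMMENTS is Java's counterpart to Perl's /x: whitespace in the
    // pattern is ignored and an unescaped # starts a comment up to end of line.
    private static final Pattern NOT_PRIME = Pattern.compile(
        "  ^1?$          # empty string or a single 1, i.e. N = 0 or N = 1\n"
      + "|               # or ...\n"
      + "  ^(11+?)       # a group of two or more ones (a candidate divisor)\n"
      + "  \\1+          # repeated at least once more, so N = divisor * k\n"
      + "  $             # and nothing left over\n",
        Pattern.COMMENTS);

    /** Returns true if n is prime; n must not be negative. */
    static boolean isPrime(int n) {
        // Write n in unary: a string of n ones.
        StringBuilder unary = new StringBuilder();
        for (int i = 0; i < n; i++) {
            unary.append('1');
        }
        // The pattern matches exactly the non-primes, so negate the result.
        return !NOT_PRIME.matcher(unary).matches();
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 20; n++) {
            if (isPrime(n)) {
                System.out.println(n + " is prime");
            }
        }
    }
}
```

Don't use this for anything but fun: the backtracking gets slow quickly as n grows.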

Talking R through Java

Today I played a bit with JRI as part of rJava, a Java-R-interface. Here you can learn how to set it up on Debian/Ubuntu and the like.

Installation

Assuming you have a running version of Java and GNU’s R, you have to install r-cran-rjava :

aptitude install r-cran-rjava

Shell environment

To talk to R through Java you have to specify three more environment variables. First of all you need to publish your R installation path, my R is found in /usr/lib64/R :

export R_HOME=/usr/lib64/R

If you don’t, or if the path is wrong, you’ll run into trouble:

R_HOME is not set. Please set all required environment variables before running this program.

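Second, the $CLASSPATH needs an update. To be precise, you have to add the archives JRIEngine.jar , JRI.jar and REngine.jar . In my case all of them can be found in /usr/lib/R/site-library/rJava/jri/ . A bare directory on the classpath only picks up .class files, not the jars inside it, so list the archives themselves:

export CLASSPATH=.:/usr/lib/R/site-library/rJava/jri/JRIEngine.jar:/usr/lib/R/site-library/rJava/jri/JRI.jar:/usr/lib/R/site-library/rJava/jri/REngine.jar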

If the $CLASSPATH isn’t defined correctly you won’t be able to compile your Java code.

Last but not least you have to add the native JRI library to your $LD_LIBRARY_PATH . By default this lib is located in the same directory as the jars:

export LD_LIBRARY_PATH=/usr/lib/R/site-library/rJava/jri/

If the $LD_LIBRARY_PATH isn’t set correctly you’ll experience errors like this:

Cannot find JRI native library!
Please make sure that the JRI native library is in a directory listed in java.library.path.

java.lang.UnsatisfiedLinkError: no jri in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1734)
        at java.lang.Runtime.loadLibrary0(Runtime.java:823)
        at java.lang.System.loadLibrary(System.java:1028)
        at org.rosuda.JRI.Rengine.<clinit>(Rengine.java:19)

To avoid repeating this every time, you might add these export lines to your .bashrc or .zshrc respectively.
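
Before starting the R engine it can help to check these settings from within Java. The following is only a rough sketch: the class JriPreflight and its helper foundOnPath are hypothetical names, and it assumes that the JVM picks up $LD_LIBRARY_PATH in its java.library.path property, which is what I see on Linux:

```java
import java.io.File;

final class JriPreflight {
    /** Returns true if some directory in the path list contains the named file. */
    static boolean foundOnPath(String pathList, String fileName) {
        if (pathList == null || pathList.isEmpty()) {
            return false;
        }
        for (String dir : pathList.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as a trailing ':'
            }
            if (new File(dir, fileName).isFile()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // R_HOME has to point to the R installation directory.
        String rHome = System.getenv("R_HOME");
        System.out.println("R_HOME = " + rHome);
        if (rHome == null || !new File(rHome).isDirectory()) {
            System.out.println("  -> R_HOME is unset or not a directory");
        }

        // mapLibraryName turns "jri" into the platform's file name,
        // e.g. libjri.so on Linux.
        String jriLib = System.mapLibraryName("jri");
        String libPath = System.getProperty("java.library.path");
        System.out.println("java.library.path = " + libPath);
        if (!foundOnPath(libPath, jriLib)) {
            System.out.println("  -> " + jriLib + " not found on java.library.path");
        }
    }
}
```

A wrong $CLASSPATH doesn't need such a check, the compiler complains about that one on its own.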

Eclipse setup

Of course in Eclipse you’ll also have to define these three things. Where are the jars located? Add them to your libraries in Project > Properties > Java Build Path > Libraries. Instead of the $LD_LIBRARY_PATH you can set the java.library.path in Run > Run Configurations > Arguments. Add -Djava.library.path=.:/usr/lib/R/site-library/rJava/jri/ to the VM arguments (modify the path to match your setup). The R_HOME can be published in Run > Run Configurations > Environment. Create a new variable with the name R_HOME and the value /usr/lib64/R (or an equivalent path). That’s it. If something fails, see the section above to identify what went wrong.

Netbeans setup

Two of these three parts are also straightforward in Netbeans. First publish the location of the jars: right-click on your project and choose Properties > Libraries. In the Compile tab click Add JAR/Folder and search for the jar files. The next task is to adjust the library path: right-click on your project and choose Properties > Run. Add -Djava.library.path=.:/usr/lib/R/site-library/rJava/jri/ to the VM Options (modify the path to match your setup). The third step is a little tricky. As far as I know there is no way to change the environment from within Netbeans, so you can’t create the variable R_HOME after Netbeans is started. In my opinion you have two options:

  1. Export the variable before starting Netbeans:
   usr@srv $ export R_HOME=/usr/lib64/R
   usr@srv $ netbeans

     You might want to write a wrapper script that does this step for you, or include the export in one of the resource files that are read before Netbeans starts (e.g. your .bashrc ).

  2. Change the environment from within your project. At stackoverflow you can find a workaround, but I think this is a very lousy solution.

If you have further suggestions please let me know! Meanwhile George Bull published a setup guide for Netbeans on Windows hosts. It seems to be worth a look ;-)

Testcase

If you defined your environment properly, you should be able to utilize the REngine. I have a small program for you to test whether everything is fine:

package de.binfalse.martin;

import org.rosuda.JRI.Rengine;

public class JRItest
{
  public static void main (String[] args)
  {
    // new R-engine
    Rengine re=new Rengine (new String [] {"--vanilla"}, false, null);
    if (!re.waitForR())
    {
      System.out.println ("Cannot load R");
      return;
    }

    // print a random number from uniform distribution
    System.out.println (re.eval ("runif(1)").asDouble ());

    // done...
    re.end();
  }

}

You should be able to compile and run it. Afterwards you’ll see a random number from a uniform distribution. Congratulations, well done :-P

For more information see the JRI and rJava sites at RForge.net.


Readability vs speed in R

I have bad news for those of you trying to produce lucid code!

In his blog Radford M. Neal, Professor at the University of Toronto, published an article with the headline Two Surprising Things about R. He worked out that parentheses in mathematical expressions slow down the run-time dramatically! In contrast, curly brackets seem to be less time-consuming. I was able to confirm both observations:

> x=10
> f <- function (n) for (i in 1:n) 1/(1*(1+x))
> g <- function (n) for (i in 1:n) (((1/(((1*(((1+x)))))))))
> system.time(f(10^6))
   user  system elapsed 
  2.231   0.000   2.232 
> system.time(g(10^6))
   user  system elapsed 
  3.896   0.000   3.923 
> 
> # in contrast with curly brackets
> h <- function (n) for (i in 1:n) 1/{1*{1+x}}
> i <- function (n) for (i in 1:n) {{{1/{{{1*{{{1+x}}}}}}}}}
> system.time(h(10^6))
   user  system elapsed 
  1.974   0.000   1.974 
> system.time(i(10^6))
   user  system elapsed 
  3.204   0.000   3.228

As you can see, adding extra parentheses costs run-time, and not a negligible amount: g took about 75% longer than f. This fact shocked me, because I always tried to group expressions to increase the readability of my code! Using curly brackets instead speeds up the execution in comparison to parentheses. Both observations are surprising to me! So the conclusion is: Try to avoid redundant parentheses and/or brackets!

To learn more about the why, I refer you to his article. He also made an interesting observation about squares. In a further article he presents some patches to speed up R.

Damn scoping in R

Ok, R is very well-considered in certain respects, but there are also some things that annoy me… This time it’s scoping…

Let’s have a look at the following code:

fun=function()
{
	if (runif(1) > .5)
		x = 1
	x
}

At first it looks damn unspectacular. But wait, what’s that:

> x=0
> fun()
[1] 1
> fun()
[1] 0

Taking a closer look at the function shows that the returned value is randomly taken from local scope (if runif(1) > .5 ) or global scope (if runif(1) <= .5 ). So you can’t predict the result of this function. Nasty, especially while debugging external code, isn’t it? :-)

> sum(sapply(1:10^6, function (null) fun()))/10^6
[1] 0.499681

So again my advice: Keep such language-specific features in mind! This won’t happen in any sensible language…

Auth issues

Sitting on an almost well configured host, I experienced some authentication issues the last few days…

So for example I’m using xtrlock as default X locking mechanism, but when I tried to run it on this machine I got the following error:

/tmp % xtrlock
password entry has no pwd
1 /tmp %

Mmh, that is crap. My workaround to temporarily avoid this problem: connecting to another host via SSH and running xtrlock within a GNU screen session ;-) But that’s no long-term solution… So I started debugging. First of all I grabbed the sources from the apt repository and searched for this error message. It turned out to be this piece of code (beginning at line 94 of xtrlock.c ):

errno=0;  pw= getpwuid(getuid());
  if (!pw) { perror("password entry for uid not found"); exit(1); }
#ifdef SHADOW_PWD
  sp = getspnam(pw->pw_name);
  if (sp)
    pw->pw_passwd = sp->sp_pwdp;
  endspent();
#endif

  /* logically, if we need to do the following then the same 
     applies to being installed setgid shadow.  
     we do this first, because of a bug in linux. --jdamery */
  setgid(getgid());
  /* we can be installed setuid root to support shadow passwords,
     and we don't need root privileges any longer.  --marekm */
  setuid(getuid());

  if (strlen(pw->pw_passwd) < 13) {
    fputs("password entry has no pwd\n",stderr); exit(1);
  }

Ok, it seems that the provided password(-hash) is shorter than 13 characters… Debugging further: the content of pw comes from getpwuid(getuid()) and looks fine (it matches my user’s entry in /etc/passwd ). At this point (the first line of the snippet) pw->pw_passwd contains only a single x ; the passwd file holds no more information than that. Next the code checks whether SHADOW_PWD is defined, i.e. whether an additional shadow file is in use. Since that’s the case, this code is executed and the variable sp should receive the broken-out fields of the record in the shadow password database that matches the username pw->pw_name (validated, it’s my user). Checking this sp variable I noticed that it is NULL! So pw->pw_passwd is never updated and still contains the single x from the passwd entry… At first I suspected a bug in the getspnam() function, since such things can happen on the Debian unstable release I’m using, but after some more thought I checked the shadow file itself:

/tmp % l /etc/shadow
-rw-r----- 1 root root 2673 Feb 16 15:49 /etc/shadow

Comparing with other systems where xtrlock works, I figured out that this file shouldn’t belong to root alone: its group has to be shadow! So here is the solution to this issue:

/tmp % chgrp shadow /etc/shadow

And everything is working fine again. Have no idea what or who changed the permissions for the shadow-file…
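
If you want to keep an eye on this, the group of the file can be checked programmatically. Here is a minimal sketch using Java's NIO file attributes. The class name ShadowGroupCheck and the helper groupOf are made up for this example, it only works on POSIX file systems, and the hint it prints assumes the locker is installed setgid shadow, as the comment in the source above suggests:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;

final class ShadowGroupCheck {
    /** Returns the name of the group that owns the given file (POSIX only). */
    static String groupOf(Path file) throws IOException {
        return Files.readAttributes(file, PosixFileAttributes.class).group().getName();
    }

    public static void main(String[] args) throws IOException {
        // Check /etc/shadow unless another file is given as first argument.
        Path shadow = Paths.get(args.length > 0 ? args[0] : "/etc/shadow");
        String group = groupOf(shadow);
        System.out.println(shadow + " belongs to group " + group);
        if (!"shadow".equals(group)) {
            System.out.println("  -> expected group shadow, setgid-shadow programs cannot read the file");
        }
    }
}
```

This only reads the file's metadata, so it doesn't need permission to read the shadow file itself.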


Update: By the way, afterwards I tried Xscreensaver instead of xtrlock, and I wasn’t able to unlock the screen either while the group of the shadow file was wrong. The /var/log/auth.log held messages like these:

Feb 17 10:14:32 HOST xscreensaver: pam_unix(xscreensaver:auth): conversation failed
Feb 17 10:14:32 HOST xscreensaver: pam_unix(xscreensaver:auth): auth could not identify password for [USER]

But this is just for google-searchers ;-)

Open Source DNA

Yesterday I was a bit confused when I read this tweet. Manu Sporny, founder and CEO of Digital Bazaar, announced in his blog that he has published his genome..

He sent some saliva to 23andme, they analyzed his DNA and provided his genetic code to him (let’s neglect the discussion whether data from 23andme-chips represent a fully sequenced genome..). This process is very smart and not expensive, so this part of his announcement is not spectacular. Lots of people are doing so.

The interesting part of this article: He published the results (roughly 1 million SNP markers) from 23andme as open source project to github, licensed under CC0! So he has released all his rights on this data.

In general a very impressive step, he might be the first person who published his DNA under such a license. His intentions are more than exemplary: providing access to genetic data to everyone who wants to work with it, e.g. researchers.

So far, so good, but there are some disadvantages, and he has already dealt with some of them. For example, what if somebody uses this information against him? A healthcare provider, for instance, might turn him down to avoid high costs because it detected some pre-existing conditions in his DNA. It may also affect employment and can lead to discrimination. His reaction:

I’ve thought long and hard about each of those questions and the many more that you ask yourself before publishing this sort of personal data. There are large privacy implications in doing this. However, speaking solely for myself, I think the benefits outweigh the drawbacks.

Very nice, but there are also some ugly implications he apparently didn’t think about! All these disadvantages don’t only affect himself, they may also affect relatives (children, parents, siblings..). Did they all agree with this publication?

I can’t see what advantage this has over an anonymous publication. Attach some demographic information like age, gender and educational background, and everyone is satisfied. Then you don’t have to bear any consequences for bugs in your DNA.

With all due respect for his commitment, I think this step is not really well thought out.

Valentine's Day

Yes, it’s that time again, Feb 14th.. It’s Valentine’s Day.

Don’t know who has told my wife, but now I have to do some love, uuurgh..

However, this one is for my little valentine:

'            01110000  01101111
            01101100011011100110
           1001011100110110001101
            10100001100101001000
              0001101110011101
                010111010001
                 1101000110
                   010101
                    1011
                     10                                   '

Love you soo much, of course! ;-)

PS. If you are able to catch one of these flower or praline sellers: beat the living daylights out of them!!

java.lang.OutOfMemoryError: Java heap space

I was just contacted concerning this Java memory problem, here is how you can get rid of it.

The amount of RAM for a Java application is limited by the JVM. To provide more memory to a single application you can start your Java process with two more parameters, like:

java -Xms1024m -Xmx1024m YOUR_JAVA_CALL

This allows Java to use up to 1024 MB. Here -Xms specifies the initial heap size, while -Xmx determines the maximum size. On machines with much more memory you can use g instead of m to give the size in gigabytes, so -Xmx10g limits the heap to 10 GB.
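
To see whether the parameters actually arrived, you can ask the JVM itself. A minimal sketch (HeapInfo and toMiB are just names I picked); note that the reported maximum may be a bit smaller than the -Xmx value, depending on the garbage collector in use:

```java
final class HeapInfo {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the upper limit the JVM will try to use (-Xmx),
        // totalMemory() is the heap currently reserved (starts near -Xms).
        System.out.println("max heap:     " + toMiB(rt.maxMemory()) + " MiB");
        System.out.println("current heap: " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("free in heap: " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Run it once plain and once with java -Xms1024m -Xmx1024m HeapInfo and compare the numbers.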

Of course it’s annoying to apply these parameters to all your Java runs, so to change this behavior user-wide, you may create an alias like:

alias java='java -Xms1024m -Xmx1024m'

or better: Tell it to the Java Plugin Control Panel! Using XFCE you can find this tool in your panel’s menu in the Settings section. Gnome users may look in System > Preferences. If you don’t want to move your mouse you can also run ControlPanel from your terminal. This opens a window; default parameters can be set in the Java tab: click View… and add your parameters to the Runtime Parameters column. The tool afterwards writes something like the following line to $HOME/.java/deployment/deployment.properties :

deployment.javaws.jre.0.args=-Xmx9234m -Xms9234m

So advanced users craving for trouble may edit this file on their own :-P

MySQL upgrade failed

Still upgrading some of our servers from lenny to squeeze, I just ran into MySQL trouble…

While upgrading from the package mysql-server 5.0.51a-24+lenny5 -> 5.1.49-3 aptitude told me the following:

Setting up mysql-server-5.1 (5.1.49-3) ...
Stopping MySQL database server: mysqld.
Starting MySQL database server: mysqld . . . . . . . . . . . . . . failed!
invoke-rc.d: initscript mysql, action "start" failed.
dpkg: error processing mysql-server-5.1 (--configure):
 subprocess installed post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of mysql-server:
 mysql-server depends on mysql-server-5.1; however:
  Package mysql-server-5.1 is not configured yet.
dpkg: error processing mysql-server (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 mysql-server-5.1
 mysql-server

Mmh, a look into the /var/log/syslog pointed to the following errors:

Feb 11 20:50:11 HOST /etc/init.d/mysql[13219]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in
Feb 11 20:50:11 HOST /etc/init.d/mysql[13219]: ^G/usr/bin/mysqladmin: connect to server at 'localhost' failed
Feb 11 20:50:11 HOST /etc/init.d/mysql[13219]: error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Feb 11 20:50:11 HOST /etc/init.d/mysql[13219]: Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
Feb 11 20:50:11 HOST /etc/init.d/mysql[13219]:
[...]
Feb 11 20:50:59 HOST mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Feb 11 20:50:59 HOST mysqld: 110211 20:50:59 [Note] Plugin 'FEDERATED' is disabled.
Feb 11 20:50:59 HOST mysqld: /usr/sbin/mysqld: Table 'mysql.plugin' doesn't exist
Feb 11 20:50:59 HOST mysqld: 110211 20:50:59 [ERROR] Can't open the mysql.plugin table. Please run mysql_upgrade to create it.
Feb 11 20:50:59 HOST mysqld: 110211 20:50:59  InnoDB: Started; log sequence number 0 657837804
Feb 11 20:50:59 HOST mysqld: 110211 20:50:59 [ERROR] /usr/sbin/mysqld: unknown option '--skip-bdb'
Feb 11 20:50:59 HOST mysqld: 110211 20:50:59 [ERROR] Aborting
Feb 11 20:50:59 HOST mysqld:
Feb 11 20:50:59 HOST mysqld: 110211 20:50:59  InnoDB: Starting shutdown...
[...]
Feb 11 20:51:05 HOST mysqld: 110211 20:51:05  InnoDB: Shutdown completed; log sequence number 0 657837804
Feb 11 20:51:05 HOST mysqld: 110211 20:51:05 [Note] /usr/sbin/mysqld: Shutdown complete
Feb 11 20:51:05 HOST mysqld:
[...]
Feb 11 20:51:05 HOST mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended
Feb 11 20:51:14 HOST /etc/init.d/mysql[13584]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in
Feb 11 20:51:14 HOST /etc/init.d/mysql[13584]: ^G/usr/bin/mysqladmin: connect to server at 'localhost' failed
Feb 11 20:51:14 HOST /etc/init.d/mysql[13584]: error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Feb 11 20:51:14 HOST /etc/init.d/mysql[13584]: Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
Feb 11 20:51:14 HOST /etc/init.d/mysql[13584]:

Many messages at once.. To make a long story short the main problem is this line:

Feb 11 20:50:59 HOST mysqld: 110211 20:50:59 [ERROR] /usr/sbin/mysqld: unknown option '--skip-bdb'

So edit your /etc/mysql/my.cnf and comment out the following line (in my configuration it’s line 94):

skip-bdb

That’s it. Retry configuring the new version and everything will turn out all right.
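
If you have to repeat this on several servers, the edit can be scripted. The following is only a sketch of the idea, not a tool I actually use: MyCnfPatcher and commentOut are hypothetical names, and it works on a list of lines, so reading and writing /etc/mysql/my.cnf (and making a backup first!) is left to you:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MyCnfPatcher {
    /**
     * Returns a copy of the config lines in which every active line that
     * sets the named option (e.g. "skip-bdb") is prefixed with "#".
     * Lines that are already commented out stay untouched.
     */
    static List<String> commentOut(List<String> lines, String option) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            boolean setsOption = trimmed.equals(option)
                    || trimmed.startsWith(option + "=")
                    || trimmed.startsWith(option + " ");
            out.add(setsOption ? "#" + line : line);
        }
        return out;
    }

    public static void main(String[] args) {
        // A made-up excerpt of a my.cnf, just for demonstration.
        List<String> sample = Arrays.asList("[mysqld]", "skip-bdb", "key_buffer = 16M");
        for (String line : commentOut(sample, "skip-bdb")) {
            System.out.println(line);
        }
    }
}
```

The match is deliberately strict: it doesn't catch spellings like skip_bdb, so check the result by hand.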

Apache not parsing PHP files

I just had a confusing problem: instead of interpreting PHP-scripts in our webserver’s userdir apache serves them for download!

It’s caused by an upgrade from lenny to squeeze and I spent a lot of hours fixing it.

This is a really serious problem. The sites are unreadable for the people, search engines etc. that browse them and, worse, if clients can access the PHP code of our students/staff they might discover security issues or passwords stored in these PHP files. So first of all I disabled public access to the webserver.

So what was the problem? When I noticed that phpMyAdmin and other stuff not related to userdirs still worked fine, I searched for settings that differ for userdirs. At long last I took a look into the libapache2-mod-php5 config file located in /etc/apache2/mods-available/php5.conf :

<IfModule mod_php5.c>
    <FilesMatch "\.ph(p3?|tml)$">
        SetHandler application/x-httpd-php
    </FilesMatch>
    <FilesMatch "\.phps$">
        SetHandler application/x-httpd-php-source
    </FilesMatch>
    # To re-enable php in user directories comment the following lines
    # (from <IfModule ...> to </IfModule>.) Do NOT set it to On as it
    # prevents .htaccess files from disabling it.
    <IfModule mod_userdir.c>
        <Directory /home/*/public_html>
            php_admin_value engine Off
        </Directory>
    </IfModule>
</IfModule>

As you can see, PHP is disabled in user directories as soon as the userdir module is enabled… Disgusting! Commenting out these lines (from the inner <IfModule ...> to </IfModule>) switched PHP on again for users. Very annoying!