## ShortCut[xtrlock]: Avoid Xscreensaver

By default Xfce provides screen-locking via Xscreensaver. Here is how you change it.

Xfce runs a script called xflock4 to lock the screen, to change the default behavior just foist another script on Xfce! The default path settings for searching for this executable shows, that /usr/local/bin has higher priority than /usr/bin (here is the original xflock4 located). The rest should be clear!

E.g. to use xtrlock instead of Xscreensaver you just have to link to the binary:

On a multiuser system you may allow each user to use it’s own locking-solution. So just write a script that checks if \$HOME/.screenlock is executable and runs it or falls back to a default screensaver:

Save it executable as /usr/local/bin/xflock4 - done…

## Homage to floating points

I recently got very close to the floating point trap, again, so here is a little tribute with some small examples!

Because Gnu R is very nice in suppressing these errors, all examples are presented in R.

Those of you that are ignorant like me, might think that 0.1 equals 0.1 and expect 0.1==0.1 to be true, it isn’t! Just see the following:

You might think it comes from the division, so you might expect seq(0, 1, by=0.1) == 0.3 contains exactly one vale that is TRUE !? Harrharr, nothing like that!

Furthermore, what do you think is the size of unique(c(0.3, 0.4 - 0.1, 0.5 - 0.2, 0.6 - 0.3, 0.7 - 0.4)) !? Is it one? Not even close to it:

Your machine is that stupid, that it isn’t able to save such simple numbers ;) And another example should show you how these errors sum up:

As you can see, R tells you that you summed up to exactly one, suppressing the small numerical error. This error will increase with larger calculations! So be careful with any comparisons. To not fail the next time, for example use the R build-in function all.equal for comparison:

Or, if you’re dealing with integers, you should use round or as.integer to make sure they really are integers.

I hope I could prevent some of you falling into this floating point trap! So stop arguing about numerical errors and start caring for logical fails ;-)

Those of you interested in further wondering are referred to [Mon08].

# References

[Mon08]
David Monniaux. The pitfalls of verifying floating-point computations. ACM Trans. Program. Lang. Syst., 30(3):1–41, 2008. http://hal.archives-ouvertes.fr/hal-00128124/en/

## ShortCut[R]: locator

Welcome to my new category: ShortCut! Here I’ll shortly explain some smart features, unknown extensions or uncommon pathways of going for gold. Today it’s about the Gnu R tool locator.

With locator you are able to detect the mouse position inside you plot. Just run locator() and click some points, when you’re finished click the right button and locator will print the x - and y -values of the clicked positions. With this tool it’s possible to visually validate some numerical calculation.

With a little bit more code, you can output the coordinates right into you plot:

With a click into the plot you’ll be able to create a result like figure 1.

## RNA-Seq - introducing Galaxy

I’m actually attending a lecture with the great name RNA-Seq, dealing with next generation sequencing (NGS). I think the lecture is more or less addressed to biological scientist and people who are working with genome analyzers, but I think there is no harm in visiting this lecture and to get to know the biologists point of view.

These scientists are using different sequencing platforms. Some popular examples are Roche 454, Illumina/Solexa, ABI SOLiD, Pacific Biosciences PacBio RS, Helicos HeliScope™ Single Molecule Sequencer or Polonator, but there are much of more such platforms. If you are interested in these different techniques, you are referred to [Met09]. There is no standard, so all these machines produce output in different formats and quality. In general you’ll get a fastq file as result of sequencing. This file contains roughly more or less small reads of sequences and a quality score of each recognized nucleotide. The quality score is encoded in ASCII characters and contains four line types. Here is an example of such a file:

As you can see, in general the file contains an identifier line, starting with @ , the recognized sequence, a comment, starting with + , followed by the quality score for each base. It’s a big problem that there is no common standard for these quality scores, they differ in domain depending on the sequencing platform. So the original Sanger format uses PHRED scores ([EG98] and [EHWG98]) in an ASCII range 33-126 ( ! - ~ ), Solexa uses Solexa scores encoded in ASCII range 59-126 ( ; - ~ ) and with Illumina 1.3+ they introdused PHRED scores in an ASCII range 64-126 ( @ - ~ ). So you sometimes won’t be able to determine which format your fastq file comes from, the Illumina scores can be observed by all of this three example. If you want to learn more about fastq files and formats you are referred to [CFGHR10]. Interested readers are free to translate the ASCII coded quality scores of my small example to numerical quality scores and post the solution to the comment!

There is a great tool established to work with these resulting fastq files (this is just a small field of application): Galaxy. It is completely open source and written in Python. Those who already worked with it told me that you can easily extend it with plug-ins. You can choose wheter to run your own copy of this tool or to use the web platform of the Penn State. There’s a very huge ensemble of tools, I just worked with a small set of it, but I like it. It seems that you are able to upload unlimited size of data and it will never get deleted!? Not bad guys! You can share your data and working history and you can create workflows to automatize some jobs. Of course I’m excited to write an en- and decoder for other data like videos or music to and from fastq - let’s see if there’s some time ;-)

But this platform also has some inelegance’s. So there is often raw data presented in an raw format. Have a look at figure 1, you can see there is a table, columns are separated by tabs, but if one word in a column is much smaller/shorter as another one in this column this table looses the human readability! Here I’ve colorized the columns, but if the background is completely white, you have no chance to read it.

So instead of getting angry I immediately wrote a user-script. It adds a button on the top of pages with raw data and if it is clicked, it creates an HTML table of this data. You can see a resulting table in figure 2. If you think it is nice, just download it at the end of this article.

All in all I just can estimate what’s coming next!

# References

[CFGHR10]
Peter J. A. Cock, Christopher J. Fields, Naohisa Goto, Michael L. Heuer, and Peter M. Rice. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research, 38(6):1767–1771, April 2010. http://nar.oxfordjournals.org/content/38/6/1767.abstract
[EG98]
Brent Ewing and Phil Green. Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities. Genome Research, 8(3):186–194, March 1998. http://www.ncbi.nlm.nih.gov/pubmed/9521922
[EHWG98]
Brent Ewing, LaDeana Hillier, Michael C. Wendl, and Phil Green. Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment. Genome Research, 8(3):175–185, March 1998. http://www.ncbi.nlm.nih.gov/pubmed/9521921
[Met09]
Michael L. Metzker. Sequencing technologies — the next generation. Nature Reviews Genetics, 11(1):31–46, December 2009. http://www.nature.com/nrg/journal/v11/n1/full/nrg2626.html

## Resort search results in SquirrelMail

Apart from an IMAP/POP service we provide a webmail front end to interact with our mail server via SquirrelMail. This tool has a very annoying feature, search results are ordered by date, but in the wrong way: From old to new!

SquirrelMail is a very simple to administrate front end, not very nice, but if my experimental icedove doesn’t work I use it too. Furthermore we have staff members, who only use this tool and aren’t impressed by real user-side clients like icedove or sylpheed.. What ever, I had to resort these search results!

Searching for a solutions doesn’t result in a solution, so I had three options: Modifying the SquirrelMail code itself (very bad idea, I know), providing a plugin for SquirrelMail, or writing a userscript.

Ok, hacking the core of SquirrelMail is deprecated, writing a plugin is to much work for now, so I scripted some JavaScript.

The layout of this website is lousy! I think the never heard of div’s or CSS, everything is managed by tables in tables of tables and inline layout specifications -.- So detecting of the right table wasn’t that easy. I had to find the table that contains a headline with the key From :

If I’ve found such a table, all the rows have to be sorted from last to first. Except the first ones defining the headline of that table. So I modified the DOM:

Ok, that’s it! Using this script the search results are ordered in the correct way. Let’s wait for a response from these nerdy SquirrelMail-user ;-)

## Welcome Rumpel!

Ladies and gentlemen, waiting is finally over. I’m proud to introduce Rumpel!

After more than one year of high frequently power of persuasion he finally set up his own blog named RforRocks. A few minutes after release there are even lots of myths surrounding the origin of that name. May it stand for:

Who knows? Me not! So try to grill Rumpel about this issue. And by the way, he’s in any case worthy to follow ;-) (so Rumpel, time for my payoff)

## Reminded by unforgotten

Who doesn’t know, yesterday was your wife’s birthday and you didn’t bought some flowers yet!? Sounds like trouble, but these days are gone!

Some days ago I read an article at chruetertee.ch about the BSD tool calendar. I immediately realized, that this tool is the answer to one of my biggest questions: How to remember all the dates!?

Today I found some time to implement a small bash script that sends mails based on the output of the calendar tool. So it helps you remembering this periodical stuff.

On the software page for this script you can learn how to use it. It is more than simple ;-)

Now I need to collect all important dates, so if YOU want me to wish you all the best at your birthday you have to leave a comment or write me an email!

## Hacking Shibboleth

Today I attended a workshop on Shibboleth, organized by the AAI team of the DFN. There are several problems I’ll explain in this posting.

## What the hell is Shibboleth!?

### Basics

Shibboleth is a system to provide a single sign-on (SSO) solution for different services. It is split into two modules, the Identity Provider (IdP), that knows the authentication stuff by an Identity Management (IdM) (e.g. a database like LDAP), and the Service Provider (SP), that has (generally) no knowledge about accounts of the users that make use of its services. One example may be a university (school, company etc.) as IdP, that provides accounts of its students and staff members, and a scientific journal (mail provider, library, e-learning platform etc.) as SP, that will offer copies to students. So the journal has to verify that the requesting user is a student or a staff. The actual system is either based on user authentication on each SP or on IP restrictions (e.g. only user from 141.48.x.x are allowed to download), so the users have to manage a lot of different accounts for any service or otherwise the SP’s have to maintain IP black- or whitelists. Of course this is an unsatisfying behavior. Here comes Shibboleth! It provides the communication between the IdP’s and SP’s, so a single user just has to have only one account at its IdP and is able to use all services of the SP’s that have arrangements with the users IdP.

### Working principle

I don’t want to go into detail. Just for notice, it is based on XML messages via web, can be implemented via JAVA/C++/PHP, verification goes by certificates, a lot of restrictions… However, figure 1 illustrates the working principle. First the user requests a service of a SP (1), there are two possibilities:

1. There is no active session on the SP, so the user is linked to a Discovery Service (DS) (2).
• This DS lets the user choose its IdP in a pool of known IdP's (3). The DC may be implemented by this SP or it is provided by someone like the DFN.
• The user chooses one of the IdP's and is linked to the website of this IdP (4) to authenticate itself (5).
• The IdP decides whether it is a valid user or not (authentication by form, session based recognition or something like that), so again two possibilities (6):
1. If it is a valid user, the IdP sends some user related stuff to the SP, so the SP knows it is a valid user.
2. Otherwise the IdP informs the SP that the authentication has failed.
2. If there is an active session (7), the SP already knows whether the user is allowed to request anything...

## Problems

I’m not the person to evaluate that code, didn’t yet saw any, but I see some other problems not concerned to code exploits.

### Centralizing IdM

The first problem isn’t that critical, but the current situation is that each SP (library, mail provider, computer pools etc.) has a single account for every user. Due to historical reason they are all disconnected, so it will be a hard job to combine all of them. But nevertheless it’s possible.

### Privacy policies

Basically (yes, the instructors always said basically) the SP’s shouldn’t know anything about the user, except the validity. But they also mentioned that e.g. an e-learning platform might want to know whether the user has a prediploma or something like this, so they have to receive this information from the IdP. Of course this has to be controlled via contracts, but what if the SP wants also to know some grades or an mail address to communicate with a user!? You may not want to provide that much information to any SP.. That situation isn’t considered anywhere. It’s also a terrible thing, that the user doesn’t know what kind of information is offered by the IdP. In their demonstration one could see, that the SP perfectly receives the information from a IdP, by displaying this information (consisting of role at the IdP, mail address, given name, sure name and so on). So the possibility to send all LDAP attributes to the SP is undeniable there, who can promise that not all information will be transferred!? Remember: The provided data is verified! No chance for trashmail or something like this! I think a much better solution would be, if the IdP tells the user what attributes are requested by which SP before a user authenticates itself and thus commits the access to this data.

### Fishing

Yes, the good old fishing problem. I think it would be a very interesting experiment to build a fake SP, maybe called bamja, and pretend to offer music for free to students. Just authenticate as member of an university.. Yeah, cool thing! Just log in and I get any music I want!? But of course also the DS is faked, and why not, even the university website (maybe found at something like auth.uni-halle.de.whatever.de). We all know, that not every student has that technological knowledge like us, so I think that try will catch a lot of people. And if there is really some music behind the faked authentication page, this user will probably tell its friend about this cool feature and you are able to catch a lot of accounts in a short period of time. Ahh, and, because we have this new feature, with this account you can access their mail accounts, you can request books in a library, you can buy lectures in an e-learning tool and so on! Maybe you can quit their university register!?

### DDoS

Micha immediately recognized the high DDoS potential. Imagine one single IdP (e.g. a university) and hundreds of SP’s (journals, libraries, software provider etc.). Every time you request something from one of these SP’s they send a big XML message to the IdP, containing lots of data (certs, web addresses and so on). So you just have to request some stuff from different clients to any of these SP’s and they will attack the IdP with that much data, that the IdP may fail parsing everything. The SP’s don’t recognize each other, and the IdP just sees different SP’s until it parses the XML, so there is no chance to block a request!? Isn’t that a nice scenario? ;-)

## Conclusion

Of course the idea of SSO is very smart, but I don’t like what they build… And, by the way, I don’t really want that much cookie trash in my browser.

## Reassembling logical operations on boolean vectors in Gnu R

I just had some problems with computing a boolean vector as a result of applying AND to two boolean vectors:

As you can see, it’s a nice result, but not what I want.. My hack was the following:

When Rumpel, my personal R-freak, saw that hack, he just laughed and told me the short version of this hack:

Nice, isn’t it ;)

## Export R data to tex code

We often use Gnu R to work on different things and to solve various exercises. It’s always a disgusting job to export e.g. a matrix with probabilities to a $\LaTeX$ document to send it to our supervisors, but Rumpel just gave me a little hint.

The trick is called xtable and it can be found in the deb repository:

It’s an add on for R and does right that what I need:

It is not only limited to matrices and doesn’t only export to latex, but for further information take a look at ?xtable ;)

Btw. I just noticed that the GeSHi acronym for Gnu R syntax highlighting is rsplus