Too much at once

I just installed a new Grml system, annoyingly from a rather outdated image, so aptitude fails to handle all pending upgrades at once…

Here is the error:

Reading package fields... 52%/usr/lib/ruby/1.8/debian/utils.rb:47:in 'pipe': Too many open files (Errno::EMFILE)
        from /usr/lib/ruby/1.8/debian/utils.rb:47:in 'pipeline'
        from /usr/lib/ruby/1.8/debian/utils.rb:86:in 'tar'
        from /usr/lib/ruby/1.8/debian.rb:142:in 'load'
        from /usr/lib/ruby/1.8/debian/utils.rb:75:in 'gunzip'
        from /usr/lib/ruby/1.8/debian/utils.rb:40:in 'pipeline'
        from /usr/lib/ruby/1.8/debian/utils.rb:72:in 'gunzip'
        from /usr/lib/ruby/1.8/debian.rb:141:in 'load'
        from /usr/lib/ruby/1.8/debian/ar.rb:150:in 'open'
        from /usr/lib/ruby/1.8/debian/ar.rb:147:in 'each'
        from /usr/lib/ruby/1.8/debian/ar.rb:147:in 'open'
        from /usr/lib/ruby/1.8/debian.rb:140:in 'load'
        from /usr/lib/ruby/1.8/debian.rb:82:in 'field'
        from /usr/share/apt-listbugs/apt-listbugs/logic.rb:733:in 'field'
        from /usr/share/apt-listbugs/apt-listbugs/logic.rb:751:in 'create'
        from /usr/share/apt-listbugs/apt-listbugs/logic.rb:743:in 'each_index'
        from /usr/share/apt-listbugs/apt-listbugs/logic.rb:743:in 'create'
        from /usr/sbin/apt-listbugs:323
/usr/lib/ruby/1.8/debian.rb:198:in 'parseFields': E: required field Package not found in  (Debian::FieldError)
        from /usr/lib/ruby/1.8/debian.rb:196:in 'each'
        from /usr/lib/ruby/1.8/debian.rb:196:in 'parseFields'
        from /usr/lib/ruby/1.8/debian.rb:439:in 'initialize'
        from /usr/lib/ruby/1.8/debian.rb:150:in 'new'
        from /usr/lib/ruby/1.8/debian.rb:150:in 'load'
        from /usr/lib/ruby/1.8/debian.rb:82:in 'field'
        from /usr/share/apt-listbugs/apt-listbugs/logic.rb:733:in 'field'
        from /usr/share/apt-listbugs/apt-listbugs/logic.rb:751:in 'create'
        from /usr/share/apt-listbugs/apt-listbugs/logic.rb:743:in 'each_index'
        from /usr/share/apt-listbugs/apt-listbugs/logic.rb:743:in 'create'
        from /usr/sbin/apt-listbugs:323
E: Failed to fetch http://cdn.debian.net/debian/pool/main/k/krb5/libgssrpc4_1.8.3+dfsg-4_i386.deb: 404  Not Found
E: Sub-process /usr/sbin/apt-listbugs apt || exit 10 returned an error code (10)
E: Failure running script /usr/sbin/apt-listbugs apt || exit 10
A package failed to install.  Trying to recover:
Press return to continue.

Aha, too many open files. So I had to install everything piece by piece, which was rather tedious.

Btw. updating iptables 1.4.6-2 -> 1.4.10-1 before xtables-addons-common 1.23-1 -> 1.26-2 is a bad idea and fails for some reason. So try to do it the other way round.

Crypto off

If you haven’t noticed yet: SSL is turned off…

Of course it isn’t really turned off, all content is still available through encrypted connections (all links are still working), but it’s disabled by default.

But why!? I got a lot of mails during the last weeks telling me that there is a problem with my SSL cert. Yes, your browser is completely right: my cert isn’t valid because I signed it myself. Getting a trusted certificate that your browser accepts as valid is very expensive. Even a cheap one costs about $100, which is neither worth it nor affordable for me and my private blog. But I’m always interested in offering secure mechanisms where possible, so I tried to provide SSL.

Another reason for SSL was my auth stuff. Wordpress doesn’t support SSL and SSL-free access side by side. In an installation you have to decide whether to use https://... or http://... for URLs, so either all links point to SSL-encrypted content or the very next click is unencrypted. Don’t ask me why they don’t check whether SSL was used for the last request and then choose the scheme for all further links accordingly. However, I didn’t want to authenticate myself unencrypted, and so I enabled SSL by default.

To be kind to my visitors I turned off SSL until somebody sponsors a valid certificate. There are also many annoying tools that have problems with my website, so it might be the better way to deliver unencrypted content. The information on my site isn’t that secret ;-)

As a consequence you aren’t able to register/login anymore. I scripted a little bit to find a secure way of authenticating myself, but you aren’t allowed to take this path :-P Nevertheless, comments are still open and don’t require any authentication.

If you can find any SSL zombies please inform me!

ShortCut[siblings]: tail and its derivatives

Every text-tool-user should know about tail! You can print the last few lines of a file or watch it growing. But there are three improved derivatives, just get into it.

I think there is no need for further explanation of tail itself, so let’s begin with the first derivative.

colortail

colortail is based on tail with support for colors, so it helps to keep track of important content. Its common options and parameters closely resemble those of tail, so tail fans won’t have to adjust much. The content that it presents is of course the same as what tail would print, but colorized ;) With -k you can additionally pass a configuration file that defines some regular expressions and their colors. On Debian some examples can be found in /usr/share/doc/colortail/examples/ . In figure 1 you can see an example output of colortail on the syslog of a virtual machine.

multitail

The second tool in this article is multitail. Like colortail it can colorize the output, but everything is presented in an ncurses based user interface, so it is able to create multiple windows on your console. If you open a file in multitail it’s automatically in follow mode ( -f in case of tail and colortail). If you are monitoring multiple log files your console is split horizontally or vertically or a mix of both. You can pause the output, search for regular expressions and a lot more. Press F1 to get a small help window. Figure 2 presents a sample output. Its project page offers much more information.

logtail

logtail pursues a different goal. It’s not interested in prettifying the output. Instead it remembers the content that was already displayed and just prints what has been added since the last run. So it is an ideal tool for log analyzers: log messages don’t have to be parsed multiple times. logtail is written in Perl, and you can also monitor logfiles on different machines.
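To illustrate the idea of only printing what is new since the last run, here is a minimal sketch in JavaScript (Node). It is not logtail’s actual code: the function name readNewContent and the separate offset file are assumptions made for this example, and a shrunken file is simply treated as rotated.

```javascript
const fs = require('fs');

// Return everything appended to logPath since the previous call.
// offsetPath is a hypothetical state file that stores the last byte offset.
function readNewContent(logPath, offsetPath) {
  let offset = 0;
  if (fs.existsSync(offsetPath)) {
    offset = parseInt(fs.readFileSync(offsetPath, 'utf8'), 10) || 0;
  }

  const size = fs.statSync(logPath).size;
  // A file smaller than the stored offset was probably rotated: start over.
  if (size < offset) offset = 0;

  const length = size - offset;
  const buffer = Buffer.alloc(length);
  let bytesRead = 0;
  if (length > 0) {
    const fd = fs.openSync(logPath, 'r');
    try {
      bytesRead = fs.readSync(fd, buffer, 0, length, offset);
    } finally {
      fs.closeSync(fd);
    }
  }

  // Remember how far we got for the next run.
  fs.writeFileSync(offsetPath, String(offset + bytesRead));
  return buffer.toString('utf8', 0, bytesRead);
}
```

Calling it twice in a row on an unchanged file returns an empty string the second time, which is exactly what a log analyzer wants.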

I hope I could give you some smart inspirations.

OpenNIC DNS network

DNS look-ups are a very sensitive topic. Of course you want very fast name-to-IP resolutions, but should you always use Google’s DNS server? After all they can build a profile of all your network activity unless you are surfing by IP! Today I read about the OpenNIC Project and ran some speed tests. It’s very interesting and worth knowing about!

The project about itself:

OpenNIC (a.k.a. "The OpenNIC Project") is an organization of hobbyists who run an alternative DNS network. [...] Our goal is to provide you with quick and reliable DNS services and access to domains not administered by ICANN.

Ok, I gave it a try and implemented a Perl script that checks the speed. It rolls a die to pick one of my often used domains and digs¹ each of my predefined DNS servers to record the query time. I tested the following DNS servers:

  • 178.63.26.173 : one server of the OpenNIC project, located in Germany
  • 217.79.186.148 : one server of the OpenNIC project, located in Germany (NRW)
  • 8.8.8.8 : Google’s public DNS server, proven to be fast and reliable
  • 172.16.20.53 : my ISP’s server
  • 141.48.3.3 : name server of our university

Find the Perl code attached.
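For readers who don’t want to read Perl, the procedure described above can be sketched in JavaScript (Node) as follows. This is not the attached script, just an illustration under some assumptions: dig is installed, and the function names parseQueryTime, timeQuery and benchmark are made up for this example.

```javascript
const { execFileSync } = require('child_process');

// Extract the "Query time" value (in ms) from dig's output, or null if absent.
function parseQueryTime(digOutput) {
  const match = digOutput.match(/Query time: (\d+) msec/);
  return match ? Number(match[1]) : null;
}

// Ask one server for one domain via dig and return the reported query time.
function timeQuery(server, domain) {
  const output = execFileSync('dig', ['@' + server, domain], { encoding: 'utf8' });
  return parseQueryTime(output);
}

// Sum up the query times per server over a number of random look-ups.
// The query function is injectable so the loop can be tried without network.
function benchmark(servers, domains, rounds, query = timeQuery) {
  const totals = {};
  for (const server of servers) totals[server] = 0;

  for (let i = 0; i < rounds; i++) {
    // "Roll the die": pick one of the domains at random.
    const domain = domains[Math.floor(Math.random() * domains.length)];
    for (const server of servers) {
      totals[server] += query(server, domain) || 0;
    }
  }
  return totals;
}
```

A call like benchmark(['8.8.8.8', '178.63.26.173'], ['example.org'], 10) would then return the accumulated milliseconds per server.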

And here are the results after 10000 queries:

IP               Provider             10000 queries
172.16.20.53     my ISP               131989 ms
217.79.186.148   OpenNIC              259382 ms
8.8.8.8          Google               270300 ms
178.63.26.173    OpenNIC              304094 ms
141.48.3.3       NS of uni-halle.de   394134 ms

As you can see, my ISP’s DNS server is the fastest; they may have optimized their internal infrastructure to provide very fast look-ups to their customers. But it is also nice to see that there is one OpenNIC server that is faster than Google! And this server comes with another feature: it doesn’t keep any logs! Isn’t that great!?

To find some servers near you just check their server list. Some of them don’t record logs or anonymize them, and of course all of them are independent from ICANN administrations.

I can’t recommend any particular DNS server, but I advise you to test them and find the best one for your demands! Feel free to post your own test results via comment or trackback.

¹ dig is part of the larger ISC BIND distribution

Download: Perl: pipapo/scripts/dns-bench.pl (Please take a look at the man-page. Browse bugs and feature requests.)

Boot messages to service console

You may have heard about management consoles!? If a server is dead you can revive it via service console without driving the long way to the data center (often miles away).

While logged into the service console you of course have the chance to reboot the machine itself. To find out what it is doing while booting you may want to see all the messages that are usually printed to the terminal on the attached monitor. Unfortunately you aren’t next to the machine, and there may be no monitor attached to it anyway, but you can force grub to print all messages both to the terminal and to the service console.

First of all you have to setup the serial console:

serial --unit=0 --speed=57600 --word=8 --parity=no --stop=1

The --unit parameter determines the COM port. Here it’s COM1; if you need COM2 you should use --unit=1 . --speed defines a baud rate of 57600 bps; check your hardware manual for the right value. To learn more about the other parameters you are referred to the Grub manual for serial. Next you have to tell Grub where to write the output:

terminal --timeout=5 console serial

This line tells grub that there are two devices, the typical console on the attached screen and our previously defined serial console. With this directive Grub waits 5 seconds for any input from the serial console or the attached keyboard and will print its menu to the device where the input was generated. That means if you’re at home and press any key, Grub will send all output to your serial connection, but your student assistant (who had to go to the server, by bike while raining!!) isn’t able to see what’s happening. But if your assistant is faster than you and hits a key on the physically attached keyboard, he’ll see everything and you’ll look at a black window… If nobody produces any input, the output is written to the device that is listed first.

Last but not least you have to modify the kernel sections of the boot menu and append something like this at the end of every kernel line:

console=tty0 console=ttyS0

That tells the kernel that all its messages should be printed to both the real console of the attached screen and the serial console. Keep in mind to modify ttyS0 to match your serial port (here it is COM1). The kernel picks the device that is listed last to also carry stdin/stdout/stderr of the init process, which means only the last device will act as interactive terminal. E.g. the output of fsck is only printed to the last device, so stay calm if nothing happens for a long time on the other one ;-)

Here is a valid example for copy and paste:

# init serial console
serial --unit=0 --speed=57600 --word=8 --parity=no --stop=1
# what device to use for grub menu!?
terminal --timeout=5 console serial
# ....
title           Debian GNU/Linux, LOCAL CONSOLE
root            (hd0,0)
kernel          /vmlinuz-SOMEWHAT-openvz-amd64 root=UUID=AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE ro console=ttyS0 console=tty0
initrd          /initrd.img-SOMEWHAT-openvz-amd64

title           Debian GNU/Linux, SERIAL CONSOLE
root            (hd0,0)
kernel          /vmlinuz-SOMEWHAT-openvz-amd64 root=UUID=AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE ro console=tty0 console=ttyS0
initrd          /initrd.img-SOMEWHAT-openvz-amd64

Here both Grub entries boot the same kernel, but the first one uses the local console as interactive terminal (console=tty0 is listed last), whereas the other entry uses the serial console for interaction.

ShortCut[xtrlock]: Avoid Xscreensaver

By default Xfce provides screen-locking via Xscreensaver. Here is how you change it.

Xfce runs a script called xflock4 to lock the screen. To change the default behavior, just foist another script on Xfce! The default PATH setting used to look up this executable gives /usr/local/bin higher priority than /usr/bin (where the original xflock4 is located). The rest should be clear!

E.g. to use xtrlock instead of Xscreensaver you just have to link to the binary:

% ln -s /usr/bin/xtrlock /usr/local/bin/xflock4

On a multiuser system you may want to allow each user to use their own locking solution. So just write a script that checks whether $HOME/.screenlock is executable and runs it, or otherwise falls back to a default screen locker:

#!/bin/bash

# default
DO=/usr/bin/xtrlock

# does user want smth else?? (quoted, so paths with spaces survive)
[ -x "$HOME/.screenlock" ] && DO="$HOME/.screenlock"

"$DO"

Save it as /usr/local/bin/xflock4, make it executable, done…

Homage to floating points

I recently got very close to the floating point trap, again, so here is a little tribute with some small examples!

Because GNU R is very good at hiding these errors, all examples are presented in R.

Those of you who are as ignorant as me might think that 0.3/3 equals 0.1 and expect 0.3/3 == 0.1 to be true. It isn’t! Just see the following:

> a=0.1
> b=0.3/3
> a
[1] 0.1
> b
[1] 0.1
> a==b
[1] FALSE

You might think it comes from the division, so you might expect that seq(0, 1, by=0.1) == 0.3 contains exactly one value that is TRUE !? Harrharr, nothing like that!

> seq(0, 1, by=0.1) == 0.3
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Furthermore, what do you think is the size of unique(c(0.3, 0.4 - 0.1, 0.5 - 0.2, 0.6 - 0.3, 0.7 - 0.4)) !? Is it one? Not even close to it:

> unique(c(0.3, 0.4 - 0.1, 0.5 - 0.2, 0.6 - 0.3, 0.7 - 0.4))
[1] 0.3 0.3 0.3

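Your machine is so stupid that it isn’t able to store such simple numbers ;) Most decimal fractions, 0.1 among them, have no exact binary representation, so only the nearest representable value is stored. Another example shows how these errors add up: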

> sum=0
> for (i in 1:100) sum = sum + 0.01
> sum
[1] 1
> print(sum, digits=16)
[1] 1.000000000000001

As you can see, R tells you that you summed up to exactly one, suppressing the small numerical error. This error will increase with larger calculations! So be careful with any comparisons. To not fail the next time, use for example the R built-in function all.equal, which compares two values up to a small tolerance. Pass it exactly two values; further positional arguments are not treated as values to compare:

> unique(c(0.3, 0.4 - 0.1, 0.5 - 0.2, 0.6 - 0.3, 0.7 - 0.4))
[1] 0.3 0.3 0.3
> all.equal(0.3, 0.4 - 0.1)
[1] TRUE

Or, if you’re dealing with integers, you should use round or as.integer to make sure they really are integers.
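The trap is not specific to R. As a small illustration, here is the same tolerance idea sketched in JavaScript. The function nearlyEqual and its default tolerance are assumptions for this example (the value is roughly the default of all.equal), not a standard library function.

```javascript
// Compare two numbers up to a relative tolerance instead of exactly.
// The default tolerance is an assumption; pick one that fits your data.
function nearlyEqual(a, b, tolerance = 1.5e-8) {
  const scale = Math.max(Math.abs(a), Math.abs(b), 1);
  return Math.abs(a - b) <= tolerance * scale;
}

console.log(0.3 / 3 === 0.1);            // false: exact comparison fails
console.log(nearlyEqual(0.3 / 3, 0.1));  // true: tolerant comparison works

// Errors add up: 100 times 0.01 is not exactly 1.
let sum = 0;
for (let i = 0; i < 100; i++) sum += 0.01;
console.log(sum === 1);            // false
console.log(nearlyEqual(sum, 1));  // true
```

Scaling the tolerance by the magnitude of the operands keeps the check meaningful for large values as well as for values around one.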

I hope I could prevent some of you from falling into this floating point trap! So stop arguing about numerical errors and start caring about logical failures ;-)

Those of you interested in further wondering are referred to [Mon08].

References

[Mon08]
David Monniaux. The pitfalls of verifying floating-point computations. ACM Trans. Program. Lang. Syst., 30(3):1–41, 2008. http://hal.archives-ouvertes.fr/hal-00128124/en/

ShortCut[R]: locator

Welcome to my new category: ShortCut! Here I’ll briefly explain some smart features, unknown extensions or uncommon pathways of going for gold. Today it’s about the GNU R tool locator.

With locator you are able to detect the mouse position inside your plot. Just run locator() and click some points; when you’re finished, click the right button and locator will print the x- and y-values of the clicked positions. With this tool it’s possible to visually validate some numerical calculation.

With a little bit more code, you can output the coordinates right into your plot:

> x<-seq (0, 10, .01)
> plot (x, dgamma (x, rnorm (1, 2, 0.5), rnorm (1, 1, 0.5)), t='l', main='any curve', ylab='y')
> text (p<-locator (1), paste (p, collapse="\n"), adj=0)

With a click into the plot you’ll be able to create a result like figure 1.

RNA-Seq - introducing Galaxy

I’m currently attending a lecture with the great name RNA-Seq, dealing with next generation sequencing (NGS). I think the lecture is more or less addressed to biological scientists and people who work with genome analyzers, but I think there is no harm in attending this lecture and getting to know the biologists’ point of view.

These scientists are using different sequencing platforms. Some popular examples are Roche 454, Illumina/Solexa, ABI SOLiD, Pacific Biosciences PacBio RS, Helicos HeliScope™ Single Molecule Sequencer or Polonator, but there are many more such platforms. If you are interested in these different techniques, you are referred to [Met09]. There is no standard, so all these machines produce output in different formats and quality. In general you’ll get a fastq file as the result of sequencing. This file contains a large number of more or less short sequence reads and a quality score for each recognized nucleotide. The quality scores are encoded as ASCII characters, and each read takes up four lines. Here is an example of such a file:

@SRR039651.1 HWUSI-EAS291:8:1:1:356
TTTTGGTTTTANTTTTTAATAGGTAAATNNNNNNNT
+
BCCBAABCCC=!/=BCABB%%%%%%%%%!!!!!!!%
@SRR039651.2 HWUSI-EAS291:8:1:1:410
TGGTTTGGTTGNTATTGTGATGTATTTANNNNNNNT
+
BBB?@BBB@A0!0<B?.4B?BA?%%%%%!!!!!!!%
@SRR039651.3 HWUSI-EAS291:8:1:1:1018
TTAGTAGTGTTNGTAGAATTTTATTTGTNNNNNNNT
+
BBBB;AB?B@=!,5@B=@ABBB=B%%%%!!!!!!!%

As you can see, each record consists of four lines: an identifier line starting with @ , the recognized sequence, a comment line starting with + , and the quality scores, one character per base. It’s a big problem that there is no common standard for these quality scores; they differ in range depending on the sequencing platform. The original Sanger format uses PHRED scores ([EG98] and [EHWG98]) in an ASCII range 33-126 ( ! - ~ ), Solexa uses Solexa scores encoded in ASCII range 59-126 ( ; - ~ ) and with Illumina 1.3+ they introduced PHRED scores in an ASCII range 64-126 ( @ - ~ ). Because these ranges overlap, you sometimes won’t be able to determine which format your fastq file comes from: a file whose characters all fall into the Illumina range is also valid in the other two encodings. If you want to learn more about fastq files and formats you are referred to [CFGHR10]. Interested readers are free to translate the ASCII coded quality scores of my small example to numerical quality scores and post the solution in the comments!
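As a starting point for that exercise, here is a minimal JavaScript sketch of the decoding step. The function names are made up for this example. The offset of 33 assumes the Sanger encoding, which fits the small example above because it contains characters such as ! and % that lie below the Solexa and Illumina ranges.

```javascript
// Turn a FASTQ quality string into numeric scores.
// offset 33 is the Sanger encoding; Illumina 1.3+ would use 64.
function decodeQuality(quality, offset = 33) {
  return Array.from(quality, (ch) => ch.charCodeAt(0) - offset);
}

// A PHRED score Q corresponds to an error probability of 10^(-Q/10).
function phredToErrorProbability(q) {
  return Math.pow(10, -q / 10);
}

// First few characters of the first read's quality line.
console.log(decodeQuality('BCCBAABCCC=!/='));
```

With the Sanger offset the character ! decodes to 0, % to 4 and B to 33, so the long runs of ! and % in the example mark bases the sequencer was not confident about.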

There is a great tool established to work with these resulting fastq files (this is just a small field of application): Galaxy. It is completely open source and written in Python. Those who have already worked with it told me that you can easily extend it with plug-ins. You can choose whether to run your own copy of this tool or to use the web platform hosted by Penn State. There’s a huge collection of tools; I just worked with a small set of them, but I like it. It seems that you are able to upload an unlimited amount of data and it will never get deleted!? Not bad guys! You can share your data and working history and you can create workflows to automate some jobs. Of course I’m excited to write an en- and decoder for other data like videos or music to and from fastq - let’s see if there’s some time ;-)

But this platform also has some inelegances. Data is often presented in a raw format. Have a look at figure 1: you can see there is a table whose columns are separated by tabs, but if one word in a column is much shorter than another one in the same column, the table loses its human readability! Here I’ve colorized the columns, but if the background is completely white, you have no chance to read it.

So instead of getting angry I immediately wrote a user-script. It adds a button on the top of pages with raw data and if it is clicked, it creates an HTML table of this data. You can see a resulting table in figure 2. If you think it is nice, just download it at the end of this article.
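The core of such a conversion is small. The following is only a sketch of the idea, not the code of the user-script offered for download; tsvToHtmlTable and escapeHtml are hypothetical names, and the button handling and page detection are left out.

```javascript
// Escape characters that would otherwise be interpreted as markup.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Convert tab-separated raw data into the markup of an HTML table.
function tsvToHtmlTable(raw) {
  const lines = raw.split('\n').filter((line) => line.length > 0);
  const rows = lines.map((line) => {
    const cells = line.split('\t').map((cell) => '<td>' + escapeHtml(cell) + '</td>');
    return '<tr>' + cells.join('') + '</tr>';
  });
  return '<table>' + rows.join('') + '</table>';
}
```

Inside a user-script the returned string could then be placed into the page, so that the browser takes care of aligning the columns.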

All in all, I can only guess what’s coming next!

References

[CFGHR10]
Peter J. A. Cock, Christopher J. Fields, Naohisa Goto, Michael L. Heuer, and Peter M. Rice. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research, 38(6):1767–1771, April 2010. http://nar.oxfordjournals.org/content/38/6/1767.abstract
[EG98]
Brent Ewing and Phil Green. Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities. Genome Research, 8(3):186–194, March 1998. http://www.ncbi.nlm.nih.gov/pubmed/9521922
[EHWG98]
Brent Ewing, LaDeana Hillier, Michael C. Wendl, and Phil Green. Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment. Genome Research, 8(3):175–185, March 1998. http://www.ncbi.nlm.nih.gov/pubmed/9521921
[Met09]
Michael L. Metzker. Sequencing technologies — the next generation. Nature Reviews Genetics, 11(1):31–46, December 2009. http://www.nature.com/nrg/journal/v11/n1/full/nrg2626.html
Download: JavaScript: galaxydatasetimprover.user.js (Please take a look at the man-page. Browse bugs and feature requests.)

Re-sort search results in SquirrelMail

Apart from an IMAP/POP service we provide a webmail front end to interact with our mail server via SquirrelMail. This tool has a very annoying feature, search results are ordered by date, but in the wrong way: From old to new!

SquirrelMail is a front end that is very simple to administrate. It’s not very nice, but if my experimental icedove doesn’t work I use it too. Furthermore we have staff members who only use this tool and aren’t impressed by real user-side clients like icedove or sylpheed. Whatever, I had to re-sort these search results!

Searching for a solution didn’t turn one up, so I had three options: modifying the SquirrelMail code itself (very bad idea, I know), providing a plugin for SquirrelMail, or writing a userscript.

Ok, hacking the core of SquirrelMail is discouraged and writing a plugin is too much work for now, so I scripted some JavaScript.

The layout of this website is lousy! I think they never heard of divs or CSS; everything is managed by tables in tables of tables and inline layout specifications -.- So detecting the right table wasn’t that easy. I had to find the table that contains a headline with the key From :

// look for the table cell that holds the "From" column header
var tds = document.getElementsByTagName ('td');
var table = null;
for (var i = 0; i < tds.length; i++)
{
	if (tds[i].innerHTML.match (/^\s*<b>From<\/b>\s*$/))
	{
		// td -> tr -> parent of all rows
		table = tds[i].parentNode.parentNode;
		break;
	}
}

If I’ve found such a table, all the rows have to be sorted from last to first. Except the first ones defining the headline of that table. So I modified the DOM:

if (table)
{
	// deep copy, its rows are taken from last to first
	var old = table.cloneNode (true);
	// becomes true once the header row has been passed
	var tru = false;
	var oldi = old.childNodes.length - 1;
	var tablelen = table.childNodes.length;
	for (var i = 0; i < tablelen; i++)
	{
		// don't sort the head to the end...
		if (!tru)
		{
			if (table.childNodes[i].innerHTML && table.childNodes[i].innerHTML.replace (/\n/g,'').match (/<b>From<\/b>.*<b>Date<\/b>.*<b>Subject<\/b>/))
				tru = true;
			continue;
		}
		// replace row i with the next row from the end of the copy
		table.replaceChild (old.childNodes[oldi--], table.childNodes[i]);
	}
}
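The reordering idea itself can be tried without a browser. The following is only a sketch on plain arrays, not part of the user-script; reverseKeepingHeader and isHeader are hypothetical names chosen for this example.

```javascript
// Reverse all rows that follow the header row; everything up to and
// including the header keeps its position.
function reverseKeepingHeader(rows, isHeader) {
  const headerIndex = rows.findIndex(isHeader);
  // Without a header row nothing is touched, as in the DOM version above.
  if (headerIndex === -1) return rows.slice();

  const head = rows.slice(0, headerIndex + 1);
  const body = rows.slice(headerIndex + 1).reverse();
  return head.concat(body);
}

console.log(reverseKeepingHeader(
  ['From Date Subject', 'oldest', 'newer', 'newest'],
  (row) => row.startsWith('From')
));
```

Working on a copy and returning a new array keeps the function free of side effects, which makes it easy to check before touching the real page.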

Ok, that’s it! Using this script the search results are ordered in the correct way. Let’s wait for a response from these nerdy SquirrelMail users ;-)

Download: JavaScript: squirrelmail_search_reorder (Please take a look at the man-page. Browse bugs and feature requests.)