Reminded by unforgotten

Who doesn’t know, yesterday was your wife’s birthday and you didn’t bought some flowers yet!? Sounds like trouble, but these days are gone!

Some days ago I read an article at about the BSD tool calendar. I immediately realized, that this tool is the answer to one of my biggest questions: How to remember all the dates!?

Today I found some time to implement a small bash script that sends mails based on the output of the calendar tool. So it helps you remembering this periodical stuff.

On the software page for this script you can learn how to use it. It is more than simple ;-)

Now I need to collect all important dates, so if YOU want me to wish you all the best at your birthday you have to leave a comment or write me an email!

Download: Bash: unforgotten (Please take a look at the man-page. Browse bugs and feature requests.)

Hacking Shibboleth

Today I attended a workshop on Shibboleth, organized by the AAI team of the DFN. There are several problems I’ll explain in this posting.

What the hell is Shibboleth!?


Shibboleth is a system to provide a single sign-on (SSO) solution for different services. It is split into two modules, the Identity Provider (IdP), that knows the authentication stuff by an Identity Management (IdM) (e.g. a database like LDAP), and the Service Provider (SP), that has (generally) no knowledge about accounts of the users that make use of its services. One example may be a university (school, company etc.) as IdP, that provides accounts of its students and staff members, and a scientific journal (mail provider, library, e-learning platform etc.) as SP, that will offer copies to students. So the journal has to verify that the requesting user is a student or a staff. The actual system is either based on user authentication on each SP or on IP restrictions (e.g. only user from 141.48.x.x are allowed to download), so the users have to manage a lot of different accounts for any service or otherwise the SP’s have to maintain IP black- or whitelists. Of course this is an unsatisfying behavior. Here comes Shibboleth! It provides the communication between the IdP’s and SP’s, so a single user just has to have only one account at its IdP and is able to use all services of the SP’s that have arrangements with the users IdP.

Working principle

I don’t want to go into detail. Just for notice, it is based on XML messages via web, can be implemented via JAVA/C++/PHP, verification goes by certificates, a lot of restrictions… However, figure 1 illustrates the working principle. First the user requests a service of a SP (1), there are two possibilities:

  1. There is no active session on the SP, so the user is linked to a Discovery Service (DS) (2).
    • This DS lets the user choose its IdP in a pool of known IdP's (3). The DC may be implemented by this SP or it is provided by someone like the DFN.
    • The user chooses one of the IdP's and is linked to the website of this IdP (4) to authenticate itself (5).
    • The IdP decides whether it is a valid user or not (authentication by form, session based recognition or something like that), so again two possibilities (6):
      1. If it is a valid user, the IdP sends some user related stuff to the SP, so the SP knows it is a valid user.
      2. Otherwise the IdP informs the SP that the authentication has failed.
  2. If there is an active session (7), the SP already knows whether the user is allowed to request anything...


I’m not the person to evaluate that code, didn’t yet saw any, but I see some other problems not concerned to code exploits.

Centralizing IdM

The first problem isn’t that critical, but the current situation is that each SP (library, mail provider, computer pools etc.) has a single account for every user. Due to historical reason they are all disconnected, so it will be a hard job to combine all of them. But nevertheless it’s possible.

Privacy policies

Basically (yes, the instructors always said basically) the SP’s shouldn’t know anything about the user, except the validity. But they also mentioned that e.g. an e-learning platform might want to know whether the user has a prediploma or something like this, so they have to receive this information from the IdP. Of course this has to be controlled via contracts, but what if the SP wants also to know some grades or an mail address to communicate with a user!? You may not want to provide that much information to any SP.. That situation isn’t considered anywhere. It’s also a terrible thing, that the user doesn’t know what kind of information is offered by the IdP. In their demonstration one could see, that the SP perfectly receives the information from a IdP, by displaying this information (consisting of role at the IdP, mail address, given name, sure name and so on). So the possibility to send all LDAP attributes to the SP is undeniable there, who can promise that not all information will be transferred!? Remember: The provided data is verified! No chance for trashmail or something like this! I think a much better solution would be, if the IdP tells the user what attributes are requested by which SP before a user authenticates itself and thus commits the access to this data.


Yes, the good old fishing problem. I think it would be a very interesting experiment to build a fake SP, maybe called bamja, and pretend to offer music for free to students. Just authenticate as member of an university.. Yeah, cool thing! Just log in and I get any music I want!? But of course also the DS is faked, and why not, even the university website (maybe found at something like We all know, that not every student has that technological knowledge like us, so I think that try will catch a lot of people. And if there is really some music behind the faked authentication page, this user will probably tell its friend about this cool feature and you are able to catch a lot of accounts in a short period of time. Ahh, and, because we have this new feature, with this account you can access their mail accounts, you can request books in a library, you can buy lectures in an e-learning tool and so on! Maybe you can quit their university register!?


Micha immediately recognized the high DDoS potential. Imagine one single IdP (e.g. a university) and hundreds of SP’s (journals, libraries, software provider etc.). Every time you request something from one of these SP’s they send a big XML message to the IdP, containing lots of data (certs, web addresses and so on). So you just have to request some stuff from different clients to any of these SP’s and they will attack the IdP with that much data, that the IdP may fail parsing everything. The SP’s don’t recognize each other, and the IdP just sees different SP’s until it parses the XML, so there is no chance to block a request!? Isn’t that a nice scenario? ;-)


Of course the idea of SSO is very smart, but I don’t like what they build… And, by the way, I don’t really want that much cookie trash in my browser.

Reassembling logical operations on boolean vectors in Gnu R

What a headline.. It’s about combining boolean vectors in R.

I just had some problems with computing a boolean vector as a result of applying AND to two boolean vectors:

> x <- c(FALSE, TRUE, FALSE)
> y <- c(TRUE, TRUE, FALSE)
> x&&y

As you can see, it’s a nice result, but not what I want.. My hack was the following:

> # logical AND
> as.logical(x*y)
> # logical OR
> as.logical(x+y)

When Rumpel, my personal R-freak, saw that hack, he just laughed and told me the short version of this hack:

> # logical AND
> x&y
> # logical OR
> x|y

Nice, isn’t it ;)

Export R data to tex code

We often use Gnu R to work on different things and to solve various exercises. It’s always a disgusting job to export e.g. a matrix with probabilities to a document to send it to our supervisors, but Rumpel just gave me a little hint.

The trick is called xtable and it can be found in the deb repository:

aptitude install r-cran-xtable

It’s an add on for R and does right that what I need:

> library('xtable')
> m=matrix(rnorm(25,5,1),5,5)
> m
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 5.223797 4.921448 4.775009 5.253216 5.002215
[2,] 5.111304 6.761457 5.561525 5.693226 3.857417
[3,] 3.868195 3.759403 5.971332 4.240052 4.328775
[4,] 5.009473 4.624340 7.367284 3.844524 4.888032
[5,] 4.923996 5.239990 5.336282 5.264121 3.130824
> xtable(m)
% latex table generated in R 2.11.1 by xtable 1.5-6 package
% Tue Oct 12 11:35:50 2010
 & 1 & 2 & 3 & 4 & 5 \\
1 & 5.22 & 4.92 & 4.78 & 5.25 & 5.00 \\
  2 & 5.11 & 6.76 & 5.56 & 5.69 & 3.86 \\
  3 & 3.87 & 3.76 & 5.97 & 4.24 & 4.33 \\
  4 & 5.01 & 4.62 & 7.37 & 3.84 & 4.89 \\
  5 & 4.92 & 5.24 & 5.34 & 5.26 & 3.13 \\

It is not only limited to matrices and doesn’t only export to latex, but for further information take a look at ?xtable ;)

Btw. I just noticed that the GeSHi acronym for Gnu R syntax highlighting is rsplus

Lost my gitosis' post-update

Don’t ask me why neither how, but I’ve lost the gitosis’ post-update on my server. So, among others, the .gitosis.conf wasn’t updated anymore…

Ok, let’s start at the beginning. I’m hosting some git repositories on my server using gitosis. Today I tried to create a new repository by editing the gitosis.conf in the gitosis-admin repo, but I couldn’t push the new created repo to the server:

~ % git push origin master:refs/heads/master
ERROR:gitosis.serve.main:Repository read access denied
fatal: The remote end hung up unexpectedly

Damn, looking at the server all rights through the gitosis home seem to be ok. But I was wondering why the $HOME/.gitosis.conf of the gitosis user wasn’t updated!? How could this happen?

After some thoughts I found out, that the link $HOME/repositories/gitosis-admin.git/hooks/post-update was pointing to nirvana:

lrwxrwxrwx 1 gitosis gitosis 97 Aug 23  2009 /home/git/repositories/gitosis-admin.git/hooks/post-update -> /usr/share/python-support/gitosis/gitosis-0.2-py2.5.egg/gitosis/templates/admin/hooks/post-update

Seems that the file /usr/share/python-support/gitosis/gitosis-0.2-py2.5.egg/gitosis/templates/admin/hooks/post-update was deleted through an update (one week ago I updated gitosis 0.2+20090917-9 -> 0.2+20090917-10 ).. Didn’t find an announce of it or any workaround, but I identified a template in /usr/share/pyshared/gitosis/templates/admin/hooks/post-update . So the solution (hack) is to link to that file:

ln -s /usr/share/pyshared/gitosis/templates/admin/hooks/post-update $GITOSISHOME/repositories/gitosis-admin.git/hooks/post-update

Just replace $GITOSISHOME with the home of your gitosis user, mine lives for example in /home/git .

Now every think works fine. If anybody has a better solution please tell me!

Suffix trees

Suffix trees are an application to particularly fast implement many important string operations like searching for a pattern or finding the longest common substring.

Introduction and definitions

I already introduced the Z-Algorithm which optimizes the searching for a pattern by preprocessing the pattern. It is very useful if you have to search for one single pattern in a large number of words. But often you’ll try to find many patterns in a single text. So the preprocessing of each pattern is ineffective. Suffix trees come up with a preprocessing of the text, to speed up the search for any pattern.

As expected a suffix tree of the word (length ) is represented in a data structure of a rooted tree. Every path from root to a leave represents a suffix of $ with . The union . Every inner node (except root) has at least two children. Every edge is labeled with a string of , labels of leaving edges at a single node start with different symbols and each leaf is indexed with . The concatenation of all edge labels on a path from root to a leaf with index represents the suffix .

Definition 1: For each node :

The path from root to is called

  • The union of the labels at all edges on is
  • is label of path ()
  • is path label to ()
  • instead of we can call this node

Definition 2: A pattern exists in suffix tree of (further called ) if and only if there is a , so that contains a node with ().

Definition 3: A substring ends in node or in an edge to with .

Definition 4: A edge to a leaf is called leaf-edge.

The tree contains all suffixes of the word extended with $. This tree is visualized in figure 1.

Building suffix trees: Write only top down

The write-only, top-down (WOTD) algorithm constructs the suffix tree in a top-down fashion.


Let be a node in , then denotes the concatenation of all edge labels on the path to (). Each node in the suffix tree represents the set of all suffixes that have the prefix . So the set of pathlabels to leafs below can be written as (all suffixes of the set of suffixes that start with ).

This set is splitted in equivalence classes for each symbol with is the -group of .

Case 1: For groups that contain only one suffix we create a leaf with the index and connect it to with an edge containing label . Case 2: In groups with a size of at least two we compute their longest common prefix that starts with and create a node . The connecting edge between and gets the label and we continue recursively with this algorithm in node with


Exact pattern matching

All paths from the root of the suffix tree are labeled with the prefixes of path labels. That is, they’re labeled with prefixes of suffixes of the string . Or, in other words, they’re labeled with substrings of . To search for a pattern in , just go through , following paths labeled by the characters of . At any node with is prefix of find the edge with label that starts with symbol . If such an edge doesn’t exists, isn’t a substring of . Otherwise try to match the pattern with to node . If is not a prefix of you’ll either get a mismatch denoting that isn’t a substring of , or you ran out of caracters of and found it in the tree. If is a prefix of continue searching at node . If you were able to find in , contains at any position denoted by the indexes of leafs below your point of discovery.

Minimal unique substrings

is a minimal unique substring if and only if contains exactly once and any prefix of can be found at least two times in .

To find such a minimal unique substring walk through the tree to nodes with a leaf-edge . A minimal unique substring is with is the first char of , because its prefix isn’t unique ( has at least two leaving edges) and every extended version has a prefix that is also unique.

Maximal repeats

A maximal pair is a tuple , so that , but and . A maximal repeat is the string represented by such tuple. If is a maximal repeat there is a node in . To find the maximal repeats do a DFS on the tree. Label each leaf with the left character of the suffix that it represents. For each internal node:

  • If at least one child is labeled with c, then label it with c
  • Else if its children’s labels are diverse, label with c.
  • Else then all children have same label, copy it to current node.

Path labels to left-diverse nodes are maximal repeats.

Generalized suffix trees

An extension of suffix trees are generalized suffix trees. With it you can represent multiple words in one single tree. Of course you have to modify the tree, so that you know which leaf index corresponds to which word. Just a little bit more to store in the leafs ;) A generalized suffix tree is printed in figure 2 of page one.

Further applications

There are a lot of other applications for a suffix tree structure. For example finding palindromes, search for regular expressions, faster computing of the Levenshtein distance, data compression and so on…


I’ve implemented a suffix tree in Java. The tree is constructed via WOTD and finds maximal repeats and minimal unique substrings. I also wanted pictures for this post, thus, I added a functionality that prints GraphViz code that represents the tree.

Download: Java: Code from GitHub (Please take a look at the man-page. Browse bugs and feature requests.)

Cracked the centenary!

Wow, the new kernel is the 100th.

~ % grep '^kernel' /boot/grub/menu.lst | wc -l

(That was the reason why the dist-upgrade took more than an hour!^^)

Bye sidux - welcome aptosid!

I just dist-upgraded to aptosid, sidux is gone.

Yesterday the sidux team announced that sidux is dead, but the good news just followed:

As I am sure you are all aware, there have been interesting times for sidux recently. The bad news is that the sidux project is dead. The good news is that aptosid has been aptly born like a phoenix from the ashes and will provide a smooth upgrade for sidux systems. In many ways nothing has changed but our name.

After a long period of silence I already change the OS on my notebook from sidux to grml. It’s a very good alternative, but I decided to wait for an official announcement before deleting my main OS.

Now, the official announce is released, I upgraded my system to aptosid. First of all I created a aptosid.list in my /etc/apt/sources.list.d/ containing:

deb sid main fix.main contrib fix.contrib non-free fix.non-free vdr
deb-src sid main fix.main contrib fix.contrib non-free fix.non-free vdr

deb sid main fix.main contrib fix.contrib non-free fix.non-free vdr
deb-src sid main fix.main contrib fix.contrib non-free fix.non-free vdr

After wards calling aptitude update to reread the package listings. It notifies me that the new servers couldn’t be verified, I had to install the new keyring:

aptitude install aptosid-archive-keyring

It’s time to upgrade:

aptitude update
aptitude dist-upgrade

That fails at the first time:

Running update-initramfs.
update-initramfs: Generating /boot/initrd.img-2.6.35-4.slh.9-aptosid-amd64
) points to /boot/initrd.img-2.6.35-4.slh.9-aptosid-amd64
 (/boot/initrd.img-2.6.35-4.slh.9-aptosid-amd64) -- doing nothing at /var/lib/dpkg/info/linux-image-2.6.35-4.slh.9-aptosid-amd64.postinst line 348.
) points to /boot/vmlinuz-2.6.35-4.slh.9-aptosid-amd64
 (/boot/vmlinuz-2.6.35-4.slh.9-aptosid-amd64) -- doing nothing at /var/lib/dpkg/info/linux-image-2.6.35-4.slh.9-aptosid-amd64.postinst line 348.
Running /usr/sbin/update-grub.
Can't exec "/usr/sbin/update-grub": No such file or directory at /var/lib/dpkg/info/linux-image-2.6.35-4.slh.9-aptosid-amd64.postinst line 770.
User postinst hook script [/usr/sbin/update-grub] failed to execute: No such file or directory
dpkg: error processing linux-image-2.6.35-4.slh.9-aptosid-amd64 (--configure):
 subprocess installed post-installation script returned error exit status 255

As you see, /usr/sbin/update-grub wasn’t found. It is in grub-pc , but I have no idea why there isn’t a dependency so that grub-pc is installed by default!? Not fine but doesn’t matter, just install it:

aptitude install grub-pc
aptitude dist-upgrade

If this is done, just reboot and join the new kernel version 2.6.35 (slh is the greatest!!).

~ % uname -r

Jabber meets Twitter

This evening I implemented a XMPP bridge to twitter. So I’ll get all news via IM and can update my status by sending an IM to a bot.

Nothing new, I don’t like the twitter web interface. Neither to read, nor to write messages. So I developed some scripts to tweet from command line. These tools are still working, but not that comfortable as preferred.

Today I had a great thought. At Gajim I’m online at least 24/7, talking with people, getting news etc. So the comparison with twitter is obvious.

After some research how to connect to twitter and jabber I decided to implement the bot in Perl. I still worked a little bit with Net::Twitter, so one side of the connection is almost done. For the other side I used the module Net::Jabber::Bot to implement a bot listening for messages or commands and sending twitter news via IM to my jabber account. The call for the jabber bot looks like:

my $bot = Net::Jabber::Bot->new ({
	server => $j_serv
	, port => $j_port
	, username => $j_user
	, password => $j_pass
	, alias => $j_user
	, message_function => \\&messageCheck
	, background_function => \\&updateCheck
	, loop_sleep_time => 40
	, process_timeout => 5
	, forums_and_responses => {}
	, ignore_server_messages => 1
	, ignore_self_messages => 1
	, out_messages_per_second => 20
	, max_message_size => 1000
	, max_messages_per_hour => 1000});

$bot->SendPersonalMessage($j_auth_user, "hey i'm back again!");

Most of it should be clear, the function messageCheck is called when a new message arrives the bot’s jabber account. There I parse the text whether it starts with ! (then it’s a command) otherwise the bot schould take the message to update the twitter status. updateCheck is the background function, it’s called when the bot idles. Here is time to check for news at twitter. It is called loop_sleep_time secs.

The rest is merely a matter of form. News from twitter are jabber’ed, IM’s from the authorized user are twitter’ed. Cool, isn’t it!?

Just download the tool, create a new jabber account for the bot (you’ll get one for example from and update the jmt.conf file with your credentials. Of course you need the additional Perl modules, if you also experience various problems with Net::Jabber::Bot try to use the latest code from git://

The bot could simply be launched by running the Perl script. Send !help to the bot to get some information about known commands. Just start it at any server/PC that has a network connection.

What comes next? If anyone would provide a server I would like to implement a multiuser tool, maybe with database connectivity!?

Download: Perl: jmt.tgz please see GitHub for the latest version (Please take a look at the man-page. Browse bugs and feature requests.)

Advanced searching via Z-Algorithm

I’m actually learning some stuff related to algorithms on sequences. The naive search for a pattern in a long string is of course very slow and comes with a lot of unintelligent compares. The Z-Algorithm improves the searching by preprocessing the pattern.

Naive searching

A simple search algorithm written in java may look like

public void search (String needle, String haystack)
	for (int off = 0; off < haystack.length () - needle.length () + 1; off++)
		boolean found = true;
		for (int p = 0; p < needle.length (); p++)
			if (needle.charAt (p) != haystack.charAt (off + p))
				found = false;
		if (found) System.out.println ("Fount pattern at position " + off);

This code reliably finds any existence of needle in haystack in , with length of needle and length of haystack. That screams for improvements ;)


The first algorithm that I want to present in this series is called Z-Algorithm. First of all we need some definitions.

Definition 1: In the following we denote as the substring of beginning at position and ending at position . We can also leave one of the limits clear, so that is the substring and means .

Definition 2: So is the length of the longest prefix of the suffix that is also prefix of itself. To abbreviate is further on mentioned as .

Definition 3: The set for a is called Z-Box at position .

Definition 4: is the set of limits of all Z-Box’es that start at the left-handed side of . Consider .

Definition 5: If and , defines the rightest Z-Box that starts before respectively at position . Consider .


In the following will denote the actual position we are looking for, and describe the current respectively last found of a Z-Box. First of all we set the values and to zero because we haven’t found any Z-Box yet. of our text is according to Definition 2 the length of the longest prefix of that is also prefix of itself. If we found a first Z-Box and update the limits to and .

Now we have to run through the word , so with defines the length of .

Case 1: Let’s assume position is outside of the last found Z-Box or we didn’t find any Z-Box yet (). We find by comparing the prefixes of and . If we’ve found a new Z-Box and need to update the limits to and .

Case 2: If the current position is inside of a current Z-Box () we try to find the equivalent position at the beginning of . The position we are searching for is steps off the beginning of (we are steps behind and has the same prefix as ). Case 2a: If we don’t break out of the current Z-Box by creating another Z-Box with the length of the box at position (, so position is not behind position ), we can simply apply this Z-Box to the current position and . Case 2b: Otherwise, if we would leave the actual Z-Box () we have to recheck the prefix conditions of and . We know that equals , so we only have to find the length of the longest prefix of that equals the prefix of . Now we can apply the new Z-Box such that and of course we update the Z-Box limits to and .

If we reached the end of all Z-Boxes are found in .

Pseudo code

l = r = 0
Z[2] = prefix (S, S[2 ..]).length
if Z[2] > 0 then
	l = 2
	r = 2 + Z[2] - 1

for i = 3..|S| do
	if i > r then 										'(case 1)'
		Z[i] = prefix (S, S[i ..]).length
		if Z[i] > 0 then
			l = i
			r = i + Z[i] - 1

	else 												'(case 2)'
		k = i - l + 1
		if Z[k] < r - i + 1 then 						'(case 2a)'
			Z[i] = Z[k]

		else											'(case 2b)'
			p = prefix (S[r - i + 2 ..], S[r + 1 ..]).length
			Z[i] = r - i + 1 + p
			l = i
			r = i + Z[i] - 1


Let me demonstrate the algorithm with a small example. Let’s take the word . First we start with and at position 2. is the length of the shared prefix of () and (). Easy to see the prefix is with a length of 1. So , and . At the beginning of our for-loop the program’s status is:


At the first round in the loop , so because . So we meet case 1 and have to find the length of the prefix of () and (). Of course it’s zero, nothing to do.


Next round, we’re at position 4 and again (case 1). So we have to compare and . The longest prefix of both words is with a length of 2. So we start a new Z-Box at 4 with a size of 2, so and .


With and we reach case 2 for the first time. so our similar position at the beginning of is position 2. and so we are in case 2b and have to find the shared prefix of () and (). It’s , so and . and .


Next round brings us , therefor we’re in case 2. Equivalent position is again , but now and we’re in case 2a and can just set .


The last round we have to process is , case 2. Equivalent position is and , so case 2a and .


That’s it. The Z-Box’es we’ve found are visualized in the image.


To search for a pattern in a text just calculate the Z-Boxes of with . These calculations are done in . For any : If means is prefix of , so is found at position in .


Of course I’m providing an implementation, see attachment.

Download: Java: (Please take a look at the man-page. Browse bugs and feature requests.)