## Cracked the centenary!

Wow, the new kernel is the 100th.

(That was the reason why the dist-upgrade took more than an hour!^^)

## Bye sidux - welcome aptosid!

I just dist-upgraded to aptosid, sidux is gone.

Yesterday the sidux team announced that sidux is dead, but the good news just followed:

As I am sure you are all aware, there have been interesting times for sidux recently. The bad news is that the sidux project is dead. The good news is that aptosid has been aptly born like a phoenix from the ashes and will provide a smooth upgrade for sidux systems. In many ways nothing has changed but our name.

After a long period of silence I already change the OS on my notebook from sidux to grml. It’s a very good alternative, but I decided to wait for an official announcement before deleting my main OS.

Now, the official announce is released, I upgraded my system to aptosid. First of all I created a aptosid.list in my /etc/apt/sources.list.d/ containing:

After wards calling aptitude update to reread the package listings. It notifies me that the new servers couldn’t be verified, I had to install the new keyring:

That fails at the first time:

As you see, /usr/sbin/update-grub wasn’t found. It is in grub-pc , but I have no idea why there isn’t a dependency so that grub-pc is installed by default!? Not fine but doesn’t matter, just install it:

If this is done, just reboot and join the new kernel version 2.6.35 (slh is the greatest!!).

This evening I implemented a XMPP bridge to twitter. So I’ll get all news via IM and can update my status by sending an IM to a bot.

Nothing new, I don’t like the twitter web interface. Neither to read, nor to write messages. So I developed some scripts to tweet from command line. These tools are still working, but not that comfortable as preferred.

Today I had a great thought. At Gajim I’m online at least 24/7, talking with people, getting news etc. So the comparison with twitter is obvious.

After some research how to connect to twitter and jabber I decided to implement the bot in Perl. I still worked a little bit with Net::Twitter, so one side of the connection is almost done. For the other side I used the module Net::Jabber::Bot to implement a bot listening for messages or commands and sending twitter news via IM to my jabber account. The call for the jabber bot looks like:

Most of it should be clear, the function messageCheck is called when a new message arrives the bot’s jabber account. There I parse the text whether it starts with ! (then it’s a command) otherwise the bot schould take the message to update the twitter status. updateCheck is the background function, it’s called when the bot idles. Here is time to check for news at twitter. It is called loop_sleep_time secs.

The rest is merely a matter of form. News from twitter are jabber’ed, IM’s from the authorized user are twitter’ed. Cool, isn’t it!?

Just download the tool, create a new jabber account for the bot (you’ll get one for example from jabber.ccc.de) and update the jmt.conf file with your credentials. Of course you need the additional Perl modules, if you also experience various problems with Net::Jabber::Bot try to use the latest code from git://github.com/toddr/perl-net-jabber-bot.git.

The bot could simply be launched by running the Perl script. Send !help to the bot to get some information about known commands. Just start it at any server/PC that has a network connection.

What comes next? If anyone would provide a server I would like to implement a multiuser tool, maybe with database connectivity!?

I’m actually learning some stuff related to algorithms on sequences. The naive search for a pattern in a long string is of course very slow and comes with a lot of unintelligent compares. The Z-Algorithm improves the searching by preprocessing the pattern.

## Naive searching

A simple search algorithm written in java may look like

This code reliably finds any existence of needle in haystack in $$O(m \cdot n)$$, with $$m=$$ length of needle and $$n=$$ length of haystack. That screams for improvements ;)

## Definitions

The first algorithm that I want to present in this series is called Z-Algorithm. First of all we need some definitions.

Definition 1: In the following we denote $$S[i\dots j]$$ as the substring of $$S$$ beginning at position $$i$$ and ending at position $$j$$. We can also leave one of the limits clear, so that $$S[i\dots]$$ is the substring $$S[i\dots |S|]$$ and $$S[\dots j]$$ means $$S[1\dots j]$$.

Definition 2: $$Z_i(S) := \max \{p | S[i \dots i+p-1] = S[1 \dots p]\}$$ So $$Z_i(S)$$ is the length of the longest prefix of the suffix $$S[i\dots]$$ that is also prefix of $$S$$ itself. To abbreviate $$Z_i(S)$$ is further on mentioned as $$Z_i$$.

Definition 3: The set $$[i,i+Z_i-1]$$ for a $$Z_i > 0$$ is called Z-Box at position $$i$$.

Definition 4: $$V_i := \{[a_j, b_j] | [a_j, b_j] \text{ is Z-Box at } a_j \wedge a_j < i\}$$ $$V_i$$ is the set of limits of all Z-Box’es that start at the left-handed side of $$i$$. Consider $$i<j \Rightarrow V_i \subseteq V_j$$.

Definition 5: $$[l_i,r_i] := \begin{cases} \underset{b_j}{\arg\max} \ [a_j,b_j] \in V_i, & \text{if } V_i \ne \varnothing\\ [0,0] & \text{else}\end{cases}$$ If $$l_i>0$$ and $$r_i>0$$, $$[l_i,r_i]$$ defines the rightest Z-Box that starts before respectively at position $$i$$. Consider $$i<j \Rightarrow r_i\le r_j$$.

## Algorithm

In the following $$i$$ will denote the actual position we are looking for, $$l$$ and $$r$$ describe the current respectively last found of a Z-Box. First of all we set the values $$l$$ and $$r$$ to zero because we haven’t found any Z-Box yet. $$Z_2$$ of our text $$S$$ is according to Definition 2 the length of the longest prefix of $$S[2\dots]$$ that is also prefix of $$S$$ itself. If $$Z_2>0$$ we found a first Z-Box and update the limits to $$l=2$$ and $$r=2+Z_2-1$$.

Now we have to run through the word $$S$$, so $$i=3\dots \|S\|$$ with $$\|S\|$$ defines the length of $$S$$.

Case 1: Let’s assume position $$i$$ is outside of the last found Z-Box or we didn’t find any Z-Box yet ($$i>r$$). We find $$Z_i$$ by comparing the prefixes of $$S$$ and $$S[i\dots]$$. If $$Z_i>0$$ we’ve found a new Z-Box and need to update the limits to $$l=i$$ and $$r=i+Z_i-1$$.

Case 2: If the current position $$i$$ is inside of a current Z-Box ($$i\le r$$) we try to find the equivalent position at the beginning of $$S$$. The position we are searching for is $$k=i-l+1$$ steps off the beginning of $$S$$ (we are $$i-l+1$$ steps behind $$l$$ and $$S[l\dots]$$ has the same prefix as $$S$$). Case 2a: If we don’t break out of the current Z-Box by creating another Z-Box with the length of the box at position $$k$$ ($$Z_k<r-i+1$$, so position $$i+Z_k$$ is not behind position $$r$$), we can simply apply this Z-Box to the current position and $$Z_i=Z_k$$. Case 2b: Otherwise, if we would leave the actual Z-Box ($$i + Z_k>r$$) we have to recheck the prefix conditions of $$S[i\dots]$$ and $$S$$. We know that $$S[i\dots r]$$ equals $$S[1\dots r-i+1]$$, so we only have to find the length of the longest prefix $$p$$ of $$S[r-i+2\dots]$$ that equals the prefix of $$S[r+1\dots]$$. Now we can apply the new Z-Box such that $$Z_i=r-i+1+p$$ and of course we update the Z-Box limits to $$l=i$$ and $$r=i+Z_i-1$$.

If we reached the end of $$S$$ all Z-Boxes are found in $$\Theta(\|S\|)$$.

## Example

Let me demonstrate the algorithm with a small example. Let’s take the word $$S=aabaaab$$. First we start with $$l=0$$ and $$r=0$$ at position 2. $$Z_2$$ is the length of the shared prefix of $$S$$ ($$aabaaab$$) and $$S[2\dots]$$ ($$abaaab$$). Easy to see the prefix is $$a$$ with a length of 1. So $$Z_2=1$$, $$l=2$$ and $$r=2$$. At the beginning of our for-loop the program’s status is:

$$T$$ a $$i$$ $$Z_i$$ $$l$$ $$r$$ a b a a a b 1 2 1 2 2

At the first round in the loop $$i=3$$, so $$i>r$$ because $$r=2$$. So we meet case 1 and have to find the length of the prefix of $$S$$ ($$aabaaab$$) and $$S[3\dots]$$ ($$baaab$$). Of course it’s zero, nothing to do.

$$T$$ b $$i$$ $$Z_i$$ $$l$$ $$r$$ a a a a a b 1 2 3 1 0 2 2 2 2

Next round, we’re at position 4 and again $$i>r$$ (case 1). So we have to compare $$aabaaab$$ and $$aaab$$. The longest prefix of both words is $$aa$$ with a length of 2. So we start a new Z-Box at 4 with a size of 2, so $$l=4$$ and $$r=5$$.

$$T$$ a $$i$$ $$Z_i$$ $$l$$ $$r$$ a a b a a b 1 2 3 4 1 0 2 2 2 4 2 2 5

With $$i=5$$ and $$r=5$$ we reach case 2 for the first time. $$k=i-l+1=2$$ so our similar position at the beginning of $$S$$ is position 2. $$Z_2=1$$ and $$r-i+1=1$$ so we are in case 2b and have to find the shared prefix of $$S[2 ..]$$ ($$abaaab$$) and $$S[6 ..]$$ ($$ab$$). It’s $$ab$$, so $$p=2$$ and $$Z_5=r-i+1+p=3$$. $$l=5$$ and $$r=7$$.

$$T$$ a $$i$$ $$Z_i$$ $$l$$ $$r$$ a a b a a b 1 2 3 4 5 1 0 2 3 2 2 4 5 2 2 5 7

Next round brings us $$i=6<r$$, therefor we’re in case 2. Equivalent position is again $$k=i-l+1=2$$, but now $$Z_2=1<r-i+1=2$$ and we’re in case 2a and can just set $$Z_6=1$$.

$$T$$ a $$i$$ $$Z_i$$ $$l$$ $$r$$ a a b a a b 1 2 3 4 5 6 1 0 2 3 1 2 2 4 5 5 2 2 5 7 7

The last round we have to process is $$i=7<r$$, case 2. Equivalent position is $$k=i-l+1=3$$ and $$Z_3=0<r-i+1=1$$, so case 2a and $$Z_7 = 0$$.

 $$T$$ b $$i$$ $$Z_i$$ $$l$$ $$r$$ a a b a a a 1 2 3 4 5 6 7 1 0 2 3 1 0 2 2 4 5 5 5 2 2 5 7 7 7

That’s it. The Z-Box’es we’ve found are visualized in the image.

## Searching

To search for a pattern $$P \in A^*$$ in a text $$T \in A^*$$ just calculate the Z-Boxes of $$P\T$$ with $$\\notin A$$. These calculations are done in $$\Theta(|T|)$$. For any $$i>|P|$$: If $$Z_i=|P|$$ means $$P\T[i\dots i+|P|-1]$$ is prefix of $$P\T$$, so $$P$$ is found at position $$i-(|P|+1)$$ in $$T$$.

## Code

Of course I’m providing an implementation, see attachment.

If you for example want to kill a hanging SSH session just type ~. . With ~^Z you can suspend a running session and get back to your local machine. To reactivate it just type fg (yes, the SSH session is also just a job). All supported escape sequences will be listed with ~? :