## Lost my gitosis' post-update

Don’t ask me why neither how, but I’ve lost the gitosis’ post-update on my server. So, among others, the .gitosis.conf wasn’t updated anymore…

Ok, let’s start at the beginning. I’m hosting some git repositories on my server using gitosis. Today I tried to create a new repository by editing the gitosis.conf in the gitosis-admin repo, but I couldn’t push the new created repo to the server:

Damn, looking at the server all rights through the gitosis home seem to be ok. But I was wondering why the $HOME/.gitosis.conf of the gitosis user wasn’t updated!? How could this happen? After some thoughts I found out, that the link $HOME/repositories/gitosis-admin.git/hooks/post-update was pointing to nirvana:

Seems that the file /usr/share/python-support/gitosis/gitosis-0.2-py2.5.egg/gitosis/templates/admin/hooks/post-update was deleted through an update (one week ago I updated gitosis 0.2+20090917-9 -> 0.2+20090917-10 ).. Didn’t find an announce of it or any workaround, but I identified a template in /usr/share/pyshared/gitosis/templates/admin/hooks/post-update . So the solution (hack) is to link to that file:

Just replace $GITOSISHOME with the home of your gitosis user, mine lives for example in /home/git . Now every think works fine. If anybody has a better solution please tell me! ## Suffix trees Suffix trees are an application to particularly fast implement many important string operations like searching for a pattern or finding the longest common substring. ## Introduction and definitions I already introduced the Z-Algorithm which optimizes the searching for a pattern by preprocessing the pattern. It is very useful if you have to search for one single pattern in a large number of words. But often you’ll try to find many patterns in a single text. So the preprocessing of each pattern is ineffective. Suffix trees come up with a preprocessing of the text, to speed up the search for any pattern. As expected a suffix tree of the word $S\in\Sigma^+$ (length $|S|=m$) is represented in a data structure of a rooted tree. Every path from root to a leave represents a suffix of $S\$$ with $\\notin\Sigma$. The union $\Sigma\cup\=\tilde{\Sigma}$. Every inner node (except root) has at least two children. Every edge is labeled with a string of $\tilde{\Sigma}^+$, labels of leaving edges at a single node start with different symbols and each leaf is indexed with $1\dots m$. The concatenation of all edge labels on a path from root to a leaf with index $i$ represents the suffix $S[i\dots m]$.

Definition 1: For each node $n$:

The path from root to $n$ is called $p$

• The union of the labels at all edges on $p$ is $\alpha\in\tilde{\Sigma}^*$
• $\alpha$ is label of path $p$ ($\alpha=\text{label}(p)$)
• $\alpha$ is path label to $n$ ($\alpha=\text{pathlabel}(n)$)
• instead of $n$ we can call this node $\overline \alpha$

Definition 2: A pattern $\alpha\in\Sigma^*$ exists in suffix tree of $S\in\Sigma^+$ (further called $ST(S)$) if and only if there is a $\beta\in\tilde{\Sigma}^*$, so that $ST(S)$ contains a node $n$ with $\alpha\beta=\text{pathlabel}(n)$ ($n=\overline{\alpha\beta}$).

Definition 3: A substring $\alpha$ ends in node $\overline \alpha$ or in an edge to $\overline{\alpha\beta}$ with $\beta\in\tilde{\Sigma}^*$.

Definition 4: A edge to a leaf is called leaf-edge.

The tree $ST(ababc)$ contains all suffixes of the word $ababc$ extended with $\$$. This tree is visualized in figure 1. ## Building suffix trees: Write only top down The write-only, top-down (WOTD) algorithm constructs the suffix tree in a top-down fashion. ### Algorithm Let $\overline \alpha$ be a node in $ST(S)$, then $\alpha$ denotes the concatenation of all edge labels on the path to $\overline \alpha$ ($\alpha=\text{pathlabel}(\overline \alpha)$). Each node $\overline \alpha$ in the suffix tree represents the set of all suffixes that have the prefix $\alpha$. So the set of pathlabels to leafs below $\overline \alpha$ can be written as $R(\overline \alpha)=\{\beta|\beta \in \tilde{\Sigma}^* \wedge \alpha\beta \text{ is suffix of }S\}$ (all suffixes of the set of suffixes that start with $\alpha$). This set is splitted in equivalence classes for each symbol $c$ with $G(\overline \alpha, c)=\{c\beta\|c\in \tilde{\Sigma}\wedge c\beta \in R(\overline \alpha)\}$ is the $c$-group of $R(\overline \alpha)$. Case 1: For groups $G(\overline \alpha, c)$ that contain only one suffix $\beta$ we create a leaf $\overline{\alpha\beta}$ with the index $|S| - |\alpha\beta|$ and connect it to $\overline \alpha$ with an edge containing label $\beta$. Case 2: In groups $G(\overline \alpha, c)$ with a size of at least two we compute their longest common prefix $\gamma$ that starts with $c$ and create a node $\overline{\alpha\gamma}$. The connecting edge between $\overline \alpha$ and $\overline{\alpha\gamma}$ gets the label $\gamma$ and we continue recursively with this algorithm in node $\overline{\alpha\gamma}$ with $R(\overline{\alpha\gamma})=\{\beta|\beta \in \tilde{\Sigma}^* \wedge \alpha\gamma\beta \text{ is suffix of }S\}=\{\beta|\gamma\beta\in R(\overline \alpha)\}$ ## Applications ### Exact pattern matching All paths from the root of the suffix tree are labeled with the prefixes of path labels. That is, they’re labeled with prefixes of suffixes of the string $S$. Or, in other words, they’re labeled with substrings of $S$. To search for a pattern $\delta$ in $S$, just go through $ST(S)$, following paths labeled by the characters of $\delta$. At any node $\overline\alpha$ with $\alpha$ is prefix of $\delta$ find the edge with label $\beta$ that starts with symbol $\delta[|\alpha|+1]$. If such an edge doesn’t exists, $\delta$ isn’t a substring of $S$. Otherwise try to match the pattern with $\beta$ to node $\overline{\alpha\beta}$. If $\alpha\beta$ is not a prefix of $\delta$ you’ll either get a mismatch denoting that $\delta$ isn’t a substring of $S$, or you ran out of caracters of $\delta$ and found it in the tree. If $\alpha\beta$ is a prefix of $\delta$ continue searching at node $\overline{\alpha\beta}$. If you were able to find $\delta$ in $ST(S)$, $S$ contains $\delta$ at any position denoted by the indexes of leafs below your point of discovery. ### Minimal unique substrings $\alpha$ is a minimal unique substring if and only if $S$ contains $\alpha$ exactly once and any prefix $\beta$ of $\alpha$ can be found at least two times in $S$. To find such a minimal unique substring walk through the tree to nodes $\overline\alpha$ with a leaf-edge $f$. A minimal unique substring is $\alpha c$ with $c$ is the first char of $label(f)$, because its prefix $\alpha$ isn’t unique ($\overline\alpha$ has at least two leaving edges) and every extended version $\alpha c \beta$ has a prefix that is also unique. ### Maximal repeats A maximal pair is a tuple $(p_1,p_2,l)$, so that $S[p_1\dots p_1 + l - 1] = S[p_2\dots p_2 + l - 1]$, but $S[p_1-1] \neq S[p_2-1]$ and $S[p_1+l] \neq S[p_2+l]$. A maximal repeat is the string represented by such tuple. If $\alpha$ is a maximal repeat there is a node $\overline\alpha$ in $ST(S)$. To find the maximal repeats do a DFS on the tree. Label each leaf with the left character of the suffix that it represents. For each internal node: • If at least one child is labeled with c, then label it with c • Else if its children’s labels are diverse, label with c. • Else then all children have same label, copy it to current node. Path labels to left-diverse nodes are maximal repeats. ### Generalized suffix trees An extension of suffix trees are generalized suffix trees. With it you can represent multiple words in one single tree. Of course you have to modify the tree, so that you know which leaf index corresponds to which word. Just a little bit more to store in the leafs ;) A generalized suffix tree is printed in figure 2 of page one. ### Further applications There are a lot of other applications for a suffix tree structure. For example finding palindromes, search for regular expressions, faster computing of the Levenshtein distance, data compression and so on… ## Implementation I’ve implemented a suffix tree in Java. The tree is constructed via WOTD and finds maximal repeats and minimal unique substrings. I also wanted pictures for this post, thus, I added a functionality that prints GraphViz code that represents the tree. Download: Java: Code from GitHub (Please take a look at the man-page. Browse bugs and feature requests.) ## Cracked the centenary! Wow, the new kernel is the 100th. (That was the reason why the dist-upgrade took more than an hour!^^) ## Bye sidux - welcome aptosid! I just dist-upgraded to aptosid, sidux is gone. Yesterday the sidux team announced that sidux is dead, but the good news just followed: As I am sure you are all aware, there have been interesting times for sidux recently. The bad news is that the sidux project is dead. The good news is that aptosid has been aptly born like a phoenix from the ashes and will provide a smooth upgrade for sidux systems. In many ways nothing has changed but our name. After a long period of silence I already change the OS on my notebook from sidux to grml. It’s a very good alternative, but I decided to wait for an official announcement before deleting my main OS. Now, the official announce is released, I upgraded my system to aptosid. First of all I created a aptosid.list in my /etc/apt/sources.list.d/ containing: After wards calling aptitude update to reread the package listings. It notifies me that the new servers couldn’t be verified, I had to install the new keyring: It’s time to upgrade: That fails at the first time: As you see, /usr/sbin/update-grub wasn’t found. It is in grub-pc , but I have no idea why there isn’t a dependency so that grub-pc is installed by default!? Not fine but doesn’t matter, just install it: If this is done, just reboot and join the new kernel version 2.6.35 (slh is the greatest!!). ## Jabber meets Twitter This evening I implemented a XMPP bridge to twitter. So I’ll get all news via IM and can update my status by sending an IM to a bot. Nothing new, I don’t like the twitter web interface. Neither to read, nor to write messages. So I developed some scripts to tweet from command line. These tools are still working, but not that comfortable as preferred. Today I had a great thought. At Gajim I’m online at least 24/7, talking with people, getting news etc. So the comparison with twitter is obvious. After some research how to connect to twitter and jabber I decided to implement the bot in Perl. I still worked a little bit with Net::Twitter, so one side of the connection is almost done. For the other side I used the module Net::Jabber::Bot to implement a bot listening for messages or commands and sending twitter news via IM to my jabber account. The call for the jabber bot looks like: Most of it should be clear, the function messageCheck is called when a new message arrives the bot’s jabber account. There I parse the text whether it starts with ! (then it’s a command) otherwise the bot schould take the message to update the twitter status. updateCheck is the background function, it’s called when the bot idles. Here is time to check for news at twitter. It is called loop_sleep_time secs. The rest is merely a matter of form. News from twitter are jabber’ed, IM’s from the authorized user are twitter’ed. Cool, isn’t it!? Just download the tool, create a new jabber account for the bot (you’ll get one for example from jabber.ccc.de) and update the jmt.conf file with your credentials. Of course you need the additional Perl modules, if you also experience various problems with Net::Jabber::Bot try to use the latest code from git://github.com/toddr/perl-net-jabber-bot.git. The bot could simply be launched by running the Perl script. Send !help to the bot to get some information about known commands. Just start it at any server/PC that has a network connection. What comes next? If anyone would provide a server I would like to implement a multiuser tool, maybe with database connectivity!? Download: Perl: jmt.tgz please see GitHub for the latest version (Please take a look at the man-page. Browse bugs and feature requests.) ## Advanced searching via Z-Algorithm I’m actually learning some stuff related to algorithms on sequences. The naive search for a pattern in a long string is of course very slow and comes with a lot of unintelligent compares. The Z-Algorithm improves the searching by preprocessing the pattern. ## Naive searching A simple search algorithm written in java may look like This code reliably finds any existence of needle in haystack in $O(m \cdot n)$, with $m=$ length of needle and $n=$ length of haystack. That screams for improvements ;) ## Definitions The first algorithm that I want to present in this series is called Z-Algorithm. First of all we need some definitions. Definition 1: In the following we denote $S[i\dots j]$ as the substring of $S$ beginning at position $i$ and ending at position $j$. We can also leave one of the limits clear, so that $S[i\dots]$ is the substring $S[i\dots |S|]$ and $S[\dots j]$ means $S[1\dots j]$. Definition 2: $Z_i(S) := \max \{p | S[i \dots i+p-1] = S[1 \dots p]\}$ So $Z_i(S)$ is the length of the longest prefix of the suffix $S[i\dots]$ that is also prefix of $S$ itself. To abbreviate $Z_i(S)$ is further on mentioned as $Z_i$. Definition 3: The set $[i,i+Z_i-1]$ for a $Z_i > 0$ is called Z-Box at position $i$. Definition 4: $% $ $V_i$ is the set of limits of all Z-Box’es that start at the left-handed side of $i$. Consider $% $. Definition 5: $% $ If $l_i>0$ and $r_i>0$, $[l_i,r_i]$ defines the rightest Z-Box that starts before respectively at position $i$. Consider $% $. ## Algorithm In the following $i$ will denote the actual position we are looking for, $l$ and $r$ describe the current respectively last found of a Z-Box. First of all we set the values $l$ and $r$ to zero because we haven’t found any Z-Box yet. $Z_2$ of our text $S$ is according to Definition 2 the length of the longest prefix of $S[2\dots]$ that is also prefix of $S$ itself. If $Z_2>0$ we found a first Z-Box and update the limits to $l=2$ and $r=2+Z_2-1$. Now we have to run through the word $S$, so $i=3\dots \|S\|$ with $\|S\|$ defines the length of $S$. Case 1: Let’s assume position $i$ is outside of the last found Z-Box or we didn’t find any Z-Box yet ($i>r$). We find $Z_i$ by comparing the prefixes of $S$ and $S[i\dots]$. If $Z_i>0$ we’ve found a new Z-Box and need to update the limits to $l=i$ and $r=i+Z_i-1$. Case 2: If the current position $i$ is inside of a current Z-Box ($i\le r$) we try to find the equivalent position at the beginning of $S$. The position we are searching for is $k=i-l+1$ steps off the beginning of $S$ (we are $i-l+1$ steps behind $l$ and $S[l\dots]$ has the same prefix as $S$). Case 2a: If we don’t break out of the current Z-Box by creating another Z-Box with the length of the box at position $k$ ($% $, so position $i+Z_k$ is not behind position $r$), we can simply apply this Z-Box to the current position and $Z_i=Z_k$. Case 2b: Otherwise, if we would leave the actual Z-Box ($i + Z_k>r$) we have to recheck the prefix conditions of $S[i\dots]$ and $S$. We know that $S[i\dots r]$ equals $S[1\dots r-i+1]$, so we only have to find the length of the longest prefix $p$ of $S[r-i+2\dots]$ that equals the prefix of $S[r+1\dots]$. Now we can apply the new Z-Box such that $Z_i=r-i+1+p$ and of course we update the Z-Box limits to $l=i$ and $r=i+Z_i-1$. If we reached the end of $S$ all Z-Boxes are found in $\Theta(\|S\|)$. ## Pseudo code ## Example Let me demonstrate the algorithm with a small example. Let’s take the word $S=aabaaab$. First we start with $l=0$ and $r=0$ at position 2. $Z_2$ is the length of the shared prefix of $S$ ($aabaaab$) and $S[2\dots]$ ($abaaab$). Easy to see the prefix is $a$ with a length of 1. So $Z_2=1$, $l=2$ and $r=2$. At the beginning of our for-loop the program’s status is: $$T$$ a $$i$$ $$Z_i$$ $$l$$ $$r$$ a b a a a b 1 2 1 2 2 At the first round in the loop $i=3$, so $i>r$ because $r=2$. So we meet case 1 and have to find the length of the prefix of $S$ ($aabaaab$) and $S[3\dots]$ ($baaab$). Of course it’s zero, nothing to do. $$T$$ b $$i$$ $$Z_i$$ $$l$$ $$r$$ a a a a a b 1 2 3 1 0 2 2 2 2 Next round, we’re at position 4 and again $i>r$ (case 1). So we have to compare $aabaaab$ and $aaab$. The longest prefix of both words is $aa$ with a length of 2. So we start a new Z-Box at 4 with a size of 2, so $l=4$ and $r=5$. $$T$$ a $$i$$ $$Z_i$$ $$l$$ $$r$$ a a b a a b 1 2 3 4 1 0 2 2 2 4 2 2 5 With $i=5$ and $r=5$ we reach case 2 for the first time. $k=i-l+1=2$ so our similar position at the beginning of $S$ is position 2. $Z_2=1$ and $r-i+1=1$ so we are in case 2b and have to find the shared prefix of $S[2 ..]$ ($abaaab$) and $S[6 ..]$ ($ab$). It’s $ab$, so $p=2$ and $Z_5=r-i+1+p=3$. $l=5$ and $r=7$. $$T$$ a $$i$$ $$Z_i$$ $$l$$ $$r$$ a a b a a b 1 2 3 4 5 1 0 2 3 2 2 4 5 2 2 5 7 Next round brings us $% $, therefor we’re in case 2. Equivalent position is again $k=i-l+1=2$, but now $% $ and we’re in case 2a and can just set $Z_6=1$. $$T$$ a $$i$$ $$Z_i$$ $$l$$ $$r$$ a a b a a b 1 2 3 4 5 6 1 0 2 3 1 2 2 4 5 5 2 2 5 7 7 The last round we have to process is $% $, case 2. Equivalent position is $k=i-l+1=3$ and $% $, so case 2a and $Z_7 = 0$.  $$T$$ b $$i$$ $$Z_i$$ $$l$$ $$r$$ a a b a a a 1 2 3 4 5 6 7 1 0 2 3 1 0 2 2 4 5 5 5 2 2 5 7 7 7 That’s it. The Z-Box’es we’ve found are visualized in the image. ## Searching To search for a pattern $P \in A^*$ in a text $T \in A^*$ just calculate the Z-Boxes of $P\T$ with $\\notin A$. These calculations are done in $\Theta(|T|)$. For any $i>|P|$: If $Z_i=|P|$ means $P\T[i\dots i+|P|-1]$ is prefix of $P\T$, so $P$ is found at position $i-(|P|+1)$ in $T$. ## Code Of course I’m providing an implementation, see attachment. Download: Java: Zbox.java (Please take a look at the man-page. Browse bugs and feature requests.) ## SSH escape sequences Such as telnet the SSH protocol also has a control character, it’s the tilde (~). If you for example want to kill a hanging SSH session just type ~. . With ~^Z you can suspend a running session and get back to your local machine. To reactivate it just type fg (yes, the SSH session is also just a job). All supported escape sequences will be listed with ~? : All sequences are of course only understood after a newline ;) ## First HTML5 experiences Although I have too much to do it’s in the nick of time to try some stuff with HTML5. You should all have heard about HTML5, next generation of web ;) I still saw a lot of new features, some are still not supported in many browsers but all in all I’m looking forward. Here I played a little bit with the canvas stuff and created a binary clock: Wasn’t that difficult, just created an HTML element of type canvas with enough space in it to draw the clock: and via JavaScript I draw the clock in it: After wards just called init (); , that calls clock(); once a second to draw the clock. Please tell me whether it works in your browser ;) If anybody is interested, here is the code: html5_clock. If you also want to deal with it, Mozilla has a good tutorial. I hope this new age of web will delete all the flash trash out there! Download: Javascript: html5_clock.js (Please take a look at the man-page. Browse bugs and feature requests.) ## Umlauts on English keyboards Micha is just sitting next to me, writing a new blog post. He’s writing in German with an English keyboard, so he has to encode umlauts like ä with an &auml; . I can not watch any longer, here is the trick. Still blogged about it, you can create such additional keys with Xmodmap. So choose a key, get its key code for example with xbindkeys -k and create a file $HOME/.Xmodmap with the following syntax:

XXX ist the code of your key and YYY is that what should happen. For example:

That gives you an ä/Ä on the key with code 137 and so on. To let the file take effect just run xmodmap \$HOME/.Xmodmap . Btw xmodmap -pke will give you the actual running keymap. So Micha, no need to type to much ;)

Some of you may have recognized that twitter has disabled the so called Basic Authentication. So my previous twitter-tools don’t work anymore. But don’t bury your head in the sand, here are the newer versions.

But the new methods of API calls are more complicated (called “OAuthcalypse”) and I really don’t like them. But whoever listens to me?

If you now want to interact with the twitter API, you have to register your tool as new twitter tool. Don’t ask me why, but you have to choose an unique name (all over the twitter world) for your application and get some random strings. For example for a Perl script you need the ones called Consumer key and Consumer secret.

If you want to interact with twitter, you have to do the following:

<li>send the combination of <em>Consumer key</em> and <em>Consumer secret</em> to the server
<li>receive an URL from the server where the user itself can find a pin code (when (s)he is logged into twitter)
<li>send this code to the server again and the user is verified
<li>receive some more authentication information from the server, store it for the next time, so the user don't have to authenticate again


Very annoying method, but there is no alternative method and at least your account is more save against hijacker.

By the way I found a Perl module called Net::Twitter that helps a lot.

Here is my snippet to solve this authentication stuff:

Ok, you see it’s not impossible to solve this problem. And there is another advantage, with these two scripts I don’t have to provide my username/passwort any more.

Here is the script to tweet from command line and this script dumps the actual news to the console.

To use my tools just download them to your machine, rename them as you want and then just run it:

• To tweet something call tweet-v2.pl with your status message as argument.
• To get latest informations from the people you are following just call twitstat-v2.pl with an optional argument defining the maximal number of messages you want to see.

For the first time you’ll see a link where you’ll get your pin (open the link with your browser), after wards the tools will store your credentials in [toolname].credentials . Just try it, won’t (hopefully) break anything :P

Download: Perl: tweet-v2.pl (tweet from command line) Perl: twitstat-v2.pl (get latest news) (Please take a look at the man-page. Browse bugs and feature requests.)