I’m currently learning some stuff related to algorithms on sequences. The naive search for a pattern in a long string is of course very slow and performs a lot of redundant comparisons. The Z-Algorithm improves the search by preprocessing the pattern.

## Naive searching

A simple search algorithm written in Java may look like

This code reliably finds every occurrence of needle in haystack in $O(m \cdot n)$ time, with $m=$ length of needle and $n=$ length of haystack. That screams for improvements ;)
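As a minimal sketch of the naive approach (written in Python rather than Java, and not the listing referred to above; the function name `naive_search` is only illustrative), every start position is tried and compared character by character:

```python
def naive_search(needle: str, haystack: str) -> list[int]:
    """Return all 0-based start positions of needle in haystack."""
    m, n = len(needle), len(haystack)
    positions = []
    # Try every start position that leaves room for the whole needle.
    for start in range(n - m + 1):
        matched = 0
        # Compare character by character: up to m comparisons per start.
        while matched < m and haystack[start + matched] == needle[matched]:
            matched += 1
        if matched == m:
            positions.append(start)
    return positions
```

The two nested loops are where the $O(m \cdot n)$ bound comes from. Note that the positions returned here are counted from 0.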

## Definitions

The first algorithm that I want to present in this series is called Z-Algorithm. First of all we need some definitions.

Definition 1: In the following, $S[i\dots j]$ denotes the substring of $S$ beginning at position $i$ and ending at position $j$, with positions counted from 1. Either limit may be omitted, so that $S[i\dots]$ is the substring $S[i\dots |S|]$ and $S[\dots j]$ means $S[1\dots j]$.

Definition 2: $Z_i(S) := \max \{p \mid S[i \dots i+p-1] = S[1 \dots p]\}$. So $Z_i(S)$ is the length of the longest prefix of the suffix $S[i\dots]$ that is also a prefix of $S$ itself. For brevity, $Z_i(S)$ is written as $Z_i$ from here on.

Definition 3: The interval $[i,i+Z_i-1]$ for a $Z_i > 0$ is called the Z-Box at position $i$.

Definition 4: $V_i := \{(a,b) \mid [a,b] \text{ is a Z-Box} \wedge a < i\}$. $V_i$ is the set of limits of all Z-Boxes that start to the left of $i$.

Definition 5: $(l_i, r_i) := \arg\max_{(a,b) \in V_{i+1}} b$ if $V_{i+1} \neq \emptyset$, and $(l_i, r_i) := (0,0)$ otherwise. If $l_i>0$ and $r_i>0$, $[l_i,r_i]$ is the rightmost Z-Box that starts at or before position $i$.

## Algorithm

In the following, $i$ denotes the current position, while $l$ and $r$ are the limits of the most recently found Z-Box. First of all we set $l$ and $r$ to zero because we haven’t found any Z-Box yet. According to Definition 2, $Z_2$ of our text $S$ is the length of the longest prefix of $S[2\dots]$ that is also a prefix of $S$ itself. If $Z_2>0$ we have found a first Z-Box and update the limits to $l=2$ and $r=2+Z_2-1$.

Now we have to run through the word $S$, so $i=3\dots |S|$, where $|S|$ is the length of $S$.

Case 1: Let’s assume position $i$ is outside of the last found Z-Box or we didn’t find any Z-Box yet ($i>r$). We find $Z_i$ by comparing the prefixes of $S$ and $S[i\dots]$. If $Z_i>0$ we’ve found a new Z-Box and need to update the limits to $l=i$ and $r=i+Z_i-1$.

Case 2: If the current position $i$ is inside the current Z-Box ($i\le r$), we look up the equivalent position at the beginning of $S$. That position is $k=i-l+1$ (we are $i-l$ steps behind $l$, and $S[l\dots]$ has the same prefix as $S$).

Case 2a: If copying the Z-Box at position $k$ does not take us out of the current Z-Box ($i+Z_k \le r$, so position $i+Z_k$ is not behind position $r$), we can simply apply this Z-Box to the current position and $Z_i=Z_k$.

Case 2b: Otherwise, if we would leave the current Z-Box ($i + Z_k>r$), we have to recheck the prefix conditions of $S[i\dots]$ and $S$. We know that $S[i\dots r]$ equals $S[1\dots r-i+1]$, so we only have to find the length $p$ of the longest common prefix of $S[r-i+2\dots]$ and $S[r+1\dots]$. Now we can apply the new Z-Box such that $Z_i=r-i+1+p$, and of course we update the Z-Box limits to $l=i$ and $r=i+Z_i-1$.

Once we reach the end of $S$, all Z-Boxes have been found in $\Theta(|S|)$: $r$ never decreases, and every successful comparison beyond it moves $r$ further to the right.
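The cases above can be sketched in Python as follows. This is only an illustration, not the attached implementation; the name `z_values` is made up, and indices are 0-based, so `z[i]` corresponds to $Z_{i+1}$ in the text and `z[0]` is simply left at 0.

```python
def z_values(s: str) -> list[int]:
    """Return z with z[i] = length of the longest common prefix of s and s[i:]."""
    n = len(s)
    z = [0] * n
    l = r = -1  # limits of the most recently found Z-Box (none yet)
    for i in range(1, n):
        if i > r:
            # Case 1: outside every known Z-Box, compare from scratch.
            length = 0
        else:
            k = i - l  # equivalent position at the beginning of s
            if i + z[k] <= r:
                # Case 2a: the copied box stays inside [l, r].
                z[i] = z[k]
                continue
            # Case 2b: s[i..r] is known to match, continue behind r.
            length = r - i + 1
        while i + length < n and s[length] == s[i + length]:
            length += 1
        z[i] = length
        if length > 0:
            l, r = i, i + length - 1
    return z
```

For the word `aabaaab` this returns `[0, 1, 0, 2, 3, 1, 0]`.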

## Example

Let me demonstrate the algorithm with a small example. Let’s take the word $S=aabaaab$. We start with $l=0$ and $r=0$ at position 2. $Z_2$ is the length of the shared prefix of $S$ ($aabaaab$) and $S[2\dots]$ ($abaaab$). It is easy to see that this prefix is $a$ with a length of 1. So $Z_2=1$, $l=2$ and $r=2$. At the beginning of our for-loop the program’s state is:

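| $T$ | a | **a** | b | a | a | a | b |
|---|---|---|---|---|---|---|---|
| $i$ | 1 | 2 | | | | | |
| $Z_i$ | | 1 | | | | | |
| $l$ | | 2 | | | | | |
| $r$ | | 2 | | | | | |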

In the first round of the loop $i=3$, so $i>r$ because $r=2$. We are in case 1 and have to find the length of the shared prefix of $S$ ($aabaaab$) and $S[3\dots]$ ($baaab$). Of course it’s zero, nothing to do.

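| $T$ | a | a | **b** | a | a | a | b |
|---|---|---|---|---|---|---|---|
| $i$ | 1 | 2 | 3 | | | | |
| $Z_i$ | | 1 | 0 | | | | |
| $l$ | | 2 | 2 | | | | |
| $r$ | | 2 | 2 | | | | |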

Next round, we’re at position 4 and again $i>r$ (case 1). So we have to compare $aabaaab$ and $aaab$. The longest common prefix of both words is $aa$ with a length of 2. So we start a new Z-Box at 4 with a size of 2, so $l=4$ and $r=5$.

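| $T$ | a | a | b | **a** | a | a | b |
|---|---|---|---|---|---|---|---|
| $i$ | 1 | 2 | 3 | 4 | | | |
| $Z_i$ | | 1 | 0 | 2 | | | |
| $l$ | | 2 | 2 | 4 | | | |
| $r$ | | 2 | 2 | 5 | | | |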

With $i=5$ and $r=5$ we reach case 2 for the first time. $k=i-l+1=2$, so the equivalent position at the beginning of $S$ is position 2. $Z_2=1$ and $r-i+1=1$, so $i+Z_2=6>r$ and we are in case 2b. We have to find the shared prefix of $S[2\dots]$ ($abaaab$) and $S[6\dots]$ ($ab$). It’s $ab$, so $p=2$ and $Z_5=r-i+1+p=3$. $l=5$ and $r=7$.

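| $T$ | a | a | b | a | **a** | a | b |
|---|---|---|---|---|---|---|---|
| $i$ | 1 | 2 | 3 | 4 | 5 | | |
| $Z_i$ | | 1 | 0 | 2 | 3 | | |
| $l$ | | 2 | 2 | 4 | 5 | | |
| $r$ | | 2 | 2 | 5 | 7 | | |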

Next round brings us $i=6 \le r=7$, therefore we’re in case 2. The equivalent position is again $k=i-l+1=2$, but now $i+Z_2=7 \le r$ and we’re in case 2a and can just set $Z_6=Z_2=1$.

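| $T$ | a | a | b | a | a | **a** | b |
|---|---|---|---|---|---|---|---|
| $i$ | 1 | 2 | 3 | 4 | 5 | 6 | |
| $Z_i$ | | 1 | 0 | 2 | 3 | 1 | |
| $l$ | | 2 | 2 | 4 | 5 | 5 | |
| $r$ | | 2 | 2 | 5 | 7 | 7 | |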

The last round we have to process is $i=7 \le r=7$, case 2. The equivalent position is $k=i-l+1=3$ and $i+Z_3=7 \le r$, so case 2a and $Z_7 = Z_3 = 0$.

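| $T$ | a | a | b | a | a | a | **b** |
|---|---|---|---|---|---|---|---|
| $i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| $Z_i$ | | 1 | 0 | 2 | 3 | 1 | 0 |
| $l$ | | 2 | 2 | 4 | 5 | 5 | 5 |
| $r$ | | 2 | 2 | 5 | 7 | 7 | 7 |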

That’s it. The Z-Box’es we’ve found are visualized in the image.

## Searching

To search for a pattern $P \in A^*$ in a text $T \in A^*$, just calculate the Z-Boxes of $P\$T$ with $\$ \notin A$. These calculations are done in $\Theta(|P|+|T|)$. For any $i>|P|$: $Z_i=|P|$ means that $(P\$T)[i\dots i+|P|-1]$ is a prefix of $P\$T$, so $P$ is found at position $i-(|P|+1)$ in $T$.
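A minimal Python sketch of this search, again only an illustration with made-up names (`z_values`, `find_all`). Instead of reserving a character $\$ \notin A$, it concatenates lists and uses `None` as the separator, which never compares equal to a character. Indices are 0-based.

```python
def z_values(s):
    """Z values of any indexable sequence (0-based, z[0] stays 0)."""
    n = len(s)
    z = [0] * n
    l = r = -1
    for i in range(1, n):
        # Reuse what the most recent Z-Box [l, r] already tells us, if anything.
        length = min(z[i - l], r - i + 1) if i <= r else 0
        while i + length < n and s[length] == s[i + length]:
            length += 1
        z[i] = length
        if length > 0 and i + length - 1 > r:
            l, r = i, i + length - 1
    return z


def find_all(pattern: str, text: str) -> list[int]:
    """Return all 0-based positions in text where pattern occurs."""
    m = len(pattern)
    if m == 0:
        return []
    # None plays the role of the separator $.
    combined = list(pattern) + [None] + list(text)
    z = z_values(combined)
    # Index i in combined corresponds to index i - (m + 1) in text.
    return [i - (m + 1) for i in range(m + 1, len(combined)) if z[i] == m]
```

`find_all("ab", "aabaaab")` returns `[1, 5]`. Because of the separator no Z value can exceed the pattern length, so testing for equality is enough.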

## Code

Of course I’m providing an implementation, see attachment.

## SSH escape sequences

Like telnet, SSH also has an escape character: the tilde (~).

If, for example, you want to kill a hanging SSH session, just type ~. (a tilde followed by a dot). With ~^Z you can suspend a running session and get back to your local machine. To resume it just type fg (yes, the SSH session is also just a job). All supported escape sequences are listed with ~? .

All sequences are of course only recognized directly after a newline ;)

## First HTML5 experiences

Although I have too much to do, it’s high time to try some stuff with HTML5.

You should all have heard about HTML5, the next generation of the web ;) I have already seen a lot of new features; some are still not supported in many browsers, but all in all I’m looking forward to it.

Here I played a little bit with the canvas stuff and created a binary clock:

Wasn’t that difficult, just created an HTML element of type canvas with enough space in it to draw the clock:

and via JavaScript I draw the clock in it:

Afterwards I just call init(); , which calls clock(); once a second to draw the clock. Please tell me whether it works in your browser ;)
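The drawing code itself is not shown here, but the core of any binary clock is turning the current time into bits. A small Python sketch of that step, assuming one row of six plain-binary bits each for hours, minutes and seconds (the layout and the name `binary_rows` are assumptions, not taken from the original script):

```python
import datetime


def binary_rows(moment: datetime.time, bits: int = 6) -> list[list[int]]:
    """Return one row of bits (most significant first) for hours, minutes, seconds."""
    rows = []
    for value in (moment.hour, moment.minute, moment.second):
        # Shift each bit position down to the lowest bit and mask it out.
        rows.append([(value >> shift) & 1 for shift in range(bits - 1, -1, -1)])
    return rows
```

For 13:37:05 this gives `[[0, 0, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1], [0, 0, 0, 1, 0, 1]]`; each 1 would be drawn as a lit cell on the canvas.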

If anybody is interested, here is the code: html5_clock. If you also want to deal with it, Mozilla has a good tutorial.

I hope this new age of web will delete all the flash trash out there!

## Umlauts on English keyboards

Micha is just sitting next to me, writing a new blog post. He’s writing in German with an English keyboard, so he has to encode umlauts like ä with an &auml; . I can’t watch this any longer, here is the trick.

As I already blogged, you can create such additional keys with Xmodmap. So choose a key, get its key code, for example with xbindkeys -k , and create a file $HOME/.Xmodmap with the following syntax: XXX is the code of your key and YYY is what should happen. For example: That gives you an ä/Ä on the key with code 137 and so on. To let the file take effect just run xmodmap $HOME/.Xmodmap . Btw xmodmap -pke will give you the currently active keymap. So Micha, no need to type too much ;)

Some of you may have noticed that Twitter has disabled the so-called Basic Authentication. So my previous twitter-tools don’t work anymore. But don’t bury your head in the sand, here are the newer versions.

The new methods of API calls are more complicated (called “OAuthcalypse”) and I really don’t like them. But whoever listens to me?

If you now want to interact with the Twitter API, you have to register your tool as a new Twitter application. Don’t ask me why, but you have to choose a unique name (all over the Twitter world) for your application, and you get some random strings. For a Perl script, for example, you need the ones called Consumer key and Consumer secret.

If you want to interact with Twitter, you have to do the following:

1. send the combination of *Consumer key* and *Consumer secret* to the server
2. receive a URL from the server where the user can find a PIN code (when (s)he is logged into Twitter)
3. send this code to the server again and the user is verified
4. receive some more authentication information from the server and store it for the next time, so the user doesn’t have to authenticate again


A very annoying method, but there is no alternative, and at least your account is safer against hijackers.

By the way I found a Perl module called Net::Twitter that helps a lot.

Here is my snippet to solve this authentication stuff:

Ok, you see it’s not impossible to solve this problem. And there is another advantage: with these two scripts I don’t have to provide my username/password any more.

Here is the script to tweet from command line and this script dumps the latest news to the console.

To use my tools just download them to your machine, rename them as you like and then just run them:

• To tweet something call tweet-v2.pl with your status message as argument.
• To get the latest updates from the people you are following just call twitstat-v2.pl with an optional argument defining the maximum number of messages you want to see.

The first time you’ll see a link where you’ll get your PIN (open the link with your browser); afterwards the tools will store your credentials in [toolname].credentials . Just try it, it (hopefully) won’t break anything :P

Download: Perl: tweet-v2.pl (tweet from command line) Perl: twitstat-v2.pl (get latest news) (Please take a look at the man-page. Browse bugs and feature requests.)

## User interaction with Perl

Until today I scripted user interaction in Perl on my own, but now I’ve learned there is an easier way to interact with the user.

The old way was something like this:

That does what I want it to do, but if you want more complex operations it’s somewhat difficult to hack. If you want the user to choose something from a menu or to give you an integer, you have to write lots of code and verify the input on your own. There is a Perl module called IO::Prompt to simplify this ( aptitude install libio-prompt-perl ). For example, to get an integer from the user you can use this piece of code:

The function prompt prints the string and waits for input. When the user enters something, it chomps the input and verifies it against your condition (here it tests whether the input is an integer). If the test fails it prints an error and gives the user another chance to type a correct value, until the conditions are met. So you can be sure that the returned value is definitely an integer! Of course you can tell prompt to check more difficult conditions, such as a regular expression. For example, to get a hexadecimal value you can use this:

With -req this function expects a hash; each of its entries must match the input, or the corresponding key is printed as an error message. As values you can pass functions that return true if the input is correct, regular expressions that must match, and so on (see IO::Prompt). Here I’m using a regular expression that matches hexadecimal input, and if the user enters a correct value it’s converted to base 10. An example run might look like this:
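For readers without Perl at hand, the ask-until-valid behaviour described above can be imitated in a few lines of Python. This is a rough analogue, not IO::Prompt itself: the names `prompt` and `prompt_hex`, the error text and the injectable `read`/`report` parameters are all assumptions made for the sketch.

```python
import re
from typing import Callable, Mapping


def prompt(message: str,
           requirements: Mapping[str, Callable[[str], bool]],
           read: Callable[[str], str] = input,
           report: Callable[[str], None] = print) -> str:
    """Ask until the answer passes every check, then return the stripped answer."""
    while True:
        answer = read(message).strip()
        failed = [hint for hint, check in requirements.items() if not check(answer)]
        if not failed:
            return answer
        # Show the key of the first failed requirement and ask again.
        report(failed[0])


def prompt_hex(read: Callable[[str], str] = input,
               report: Callable[[str], None] = print) -> int:
    """Ask for a hexadecimal value and return it converted to base 10."""
    def is_hex(text: str) -> bool:
        return re.fullmatch(r"[0-9A-Fa-f]+", text) is not None

    answer = prompt("hex value: ", {"need a hexadecimal value": is_hex}, read, report)
    return int(answer, 16)
```

Entering `xyz` and then `ff` prints the error once and returns 255. Passing `read` and `report` as parameters is only there so the loop can be tried without a terminal.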

Even menus are simple to realize. For example:

The freaks among you will try more complex menus. You are allowed to use hashes in hashes in arrays for your menu and prompt will lead the user through your options. You should know where to find further information about this :P

## Show all tags in WP when creating new post

I was annoyed that WordPress by default shows just the 45 most used tags on the Add New Post page, and found a solution to display all tags.

After I create a new post in this blog I usually tag it. WordPress provides a very helpful widget that displays the most used tags, but I want to see all tags that I’ve created in the past. Some research on the net didn’t turn up a solution, so I had to walk through the code on my own. It wasn’t very difficult: it was clear that the tags are loaded into the page via Ajax, and I found the code in wordpress/wp-admin/admin-ajax.php on line 616 (WordPress 3.0.1) or wordpress/wp-admin/includes/ajax-actions.php on line 666 (WordPress 3.6, see comments):

That is what gets fetched via JavaScript. To get more tags just change this line to something like this:

You can also edit wordpress/wp-admin/includes/meta-boxes.php , original is:

If you change it to:

the link to get the tags will be called All Tags, not Choose from the most used tags.

I hope this could help some of you. With the next WordPress update these changes will be lost, but you should be able to do it again and maybe I’ll blog about it ;)

### Update for WordPress 3.6

You need to edit:

• wordpress/wp-admin/includes/ajax-actions.php line 666
• wordpress/wp-admin/includes/meta-boxes.php

(thanks to Gustavo Barreto)

### Update for WordPress 3.8.1

You need to edit:

• wordpress/wp-admin/includes/ajax-actions.php line 691
• wordpress/wp-admin/includes/meta-boxes.php line 381

(thanks to August for reminder)

### Update for WordPress 3.9.1

You need to edit:

• wordpress/wp-admin/includes/ajax-actions.php line 702
• wordpress/wp-admin/includes/meta-boxes.php line 410

### Update for WordPress 4.1

You need to edit:

• wordpress/wp-admin/includes/ajax-actions.php line 836
• wordpress/wp-admin/includes/meta-boxes.php line 431

## Increasing anonymity with Tor

I was shocked to notice that some of you don’t know Tor!? Here is a little intro, so you don’t have to die stupid.

When you, for example, request a website, the server that provides this site learns your IP address, and with this address it’s able to estimate your real location. It also gets to know your User-Agent and a lot of other things like that. So the other side of your connection knows quite a lot about you: which system you’re working on, which browser you use, which website you came from and so on. But is it essential to let the world know so much about you!? Of course not! By the way, think about the security issue ;)

So what to do!? One option is not to use the internet, or to connect only to servers you trust. But the better solution is to use Tor! Tor is software for anonymous network connections. It works like a big proxy. There are Tor servers all around the world. When you try to connect to a webserver you don’t do it directly; instead you connect to a Tor entry node, which connects to further nodes until an exit node is reached. This exit node sends your initial request to the webserver, waits for the response and sends this response back to your machine through the Tor network. The connections between the Tor nodes are encrypted and the path is chosen randomly, so it is very hard for anybody to trace the way your requests took through the Tor nodes. This process is called onion routing and is much more complicated than I described here, but it’s too much to talk about in detail.

## Setting up Tor

The setup is very easy. Just add the Tor repositories to your sources.list:

I for example added the following to my /etc/apt/sources.list.d/3rdparty.list :

After that add the GPG-Key of this repository:

And install the software:

If you now start Tor with /etc/init.d/tor start it listens on 127.0.0.1:9050 . You also need a small proxy like privoxy:

Its configuration is very easy, just tell privoxy to send the packets to Tor with the following in /etc/privoxy/config :

The rest of this file should be configured correctly.

That’s it! Everything that now reaches your proxy is finding its anonymous way through the Tor-network.

## Configuring client software

Now you have to force your software to use the proxy. The most important client software is probably your browser. In Firefox (or Iceweasel), for example, you find the settings in Edit->Preferences->Advanced->Network->Settings, where you check Manual proxy configuration. Your proxy is 127.0.0.1 (or rather localhost) on port 8118 . Now you’re more anonymous, just ask a website where you come from. (At the moment I’m using an exit node from the Russian Federation and the webserver recognizes me as a Windows 7 user with Firefox 3.6 while I’m using sidux and Iceweasel 3.5.11.) Here you can verify that your Tor configuration is working. There are also some add-ons for Firefox that make life easier, for example Torbutton or FoxyProxy. With them you can enable or disable the usage of Tor with a single mouse click.
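The same proxy settings work from a script. A minimal Python sketch using only the standard library, assuming privoxy really listens on 127.0.0.1:8118 as configured above (the name `build_tor_opener` is made up for this example):

```python
import urllib.request

# Assumption: privoxy listens here and forwards everything to Tor.
PROXY = "http://127.0.0.1:8118"


def build_tor_opener(proxy: str = PROXY) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)
```

Calling `build_tor_opener().open(url)` then fetches a page through the proxy instead of connecting directly; the call fails if privoxy is not running.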

But Tor is not only designed for browsers. You can configure a lot of software to go through Tor, for example gajim in Edit->Accounts->Your Account->Connection, or Opera with Settings->Preferences->Advanced->Network->Proxy Servers…. Nearly everything that is able to connect to the internet should be able to use your proxy. You can also activate the usage of your proxy by default by including the following line in your .bashrc or .zshrc or whatever:

## Conclusion

Tor is a very nice project; for further reading you may take a look at the project’s website. If you run a publicly reachable server you should think about providing an onion node on it! It’s very easy, but you should know about the legal stuff.

Ok, when Micha saw my tiny hack he changed his implementation (as promised) and told me I’m not able to hack it again… Micha, your captcha failed again :P

Let’s have another look at his code:

First of all he renamed the fields, so of course my last attack will fail :P The next problem is that the hash for a 7 isn’t the hash for another 7 of a different calculation. Maybe he’s using the arithmetic problem or the time or other things for his hash calculation. So if we don’t know how he calculates the hash, the last attack is pointless… Btw he told me that he’s using encryption. If you’re bored, try to break it, it’s too much for me. But Micha bets three beers that I’m not able to. So no chance to quit!

In my last post I had another idea to crack the captcha: parse the formula. Ok, I wrote about parsing the URI to the external server that produces the picture, but of course it’s much easier to parse the title or alt attribute of the image! These fields are human readable to make the site accessible. It is of course commendable that he provides these fields! So, after some reloads I had a rough idea of what kind of problems I have to deal with:

• Simple calculations: something like $\sqrt{49} + 24 - 4^2$, just calculate the solution
• Conversions: like $x + 82 = 192$, first rearrange the formula before calculating the solution
• Sums: for example $\displaystyle\sum^{4}_{n=1} (2 \cdot n + 1)$, first rewrite the sum symbol, then calculate the solution

That’s the theory, the code is this time a little bit longer:

As you can see, it’s a little bit tricky and just works for some mathematical formulas that are of interest. If he combines the converting problem with brackets or something like that, this code fails.. But the algorithm is easy to modify for such changes ;)
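To illustrate the three kinds of problems, here is a small Python sketch. It is not the script discussed above: it assumes the formulas have already been brought into a plain-text form such as `sqrt(49) + 24 - 4^2`, and the helper names `evaluate`, `solve_for_x` and `solve_sum` are made up for this example.

```python
import ast
import math
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "sin": math.sin}


def evaluate(expr, variables=None):
    """Evaluate a small arithmetic expression; ^ is read as power."""
    names = {"pi": math.pi, **(variables or {})}

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return names[node.id]
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp):
            return _OPS[type(node.op)](walk(node.operand))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            return _FUNCS[node.func.id](*[walk(arg) for arg in node.args])
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr.replace("^", "**"), mode="eval").body)


def solve_for_x(equation):
    """Solve 'x + a = b' or 'x - a = b' by moving a to the right-hand side."""
    match = re.fullmatch(r"\s*x\s*([+-])\s*(\d+)\s*=\s*(\d+)\s*", equation)
    if match is None:
        raise ValueError("unsupported equation")
    sign, a, b = match.group(1), int(match.group(2)), int(match.group(3))
    return b - a if sign == "+" else b + a


def solve_sum(lower, upper, term, index="n"):
    """Rewrite the sum symbol as a loop over the index variable."""
    return sum(evaluate(term, {index: value}) for value in range(lower, upper + 1))
```

With these helpers, `evaluate("sqrt(49) + 24 - 4^2")` gives 15.0, `solve_for_x("x + 82 = 192")` gives 110 and `solve_sum(1, 4, "2 * n + 1")` gives 24. Walking the syntax tree instead of calling `eval` keeps text from a foreign page from running as code.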

But respect: to crack my captcha you don’t need that much intelligence, it’s feasible in much less code. I hope he doesn’t rewrite his plugin again, I don’t want to calculate that stuff in my head…

Micha just implemented his own captcha plugin for WordPress, and I cracked it some minutes later ;)

This version is deprecated, see Cracked next Captcha…

Micha was annoyed by his previous captcha plugin, neither valid nor beautiful, so he decided to write his own tool for killing bots.

When I saw his new captchas I was wondering whether he will get any further comments. His captchas ask for the solution of mathematical problems like $\sqrt{121} + 95$ or $228 \div 19$ or $\frac{136 - 61}{\sin(\pi \div 2)}$.. Who the hell wants to calculate that stuff!? Not me! ;)

So I developed a little userscript that solves this problem. When you take a look at the source code of his website you’ll find something like this:

So you see, there is an image created by an external server, an input field where you can put the solution and an input field of the type hidden with a cryptic value (seems like a hash^^). Most of you will see several ways to hack this:

1. Parse the string of the image like the external server does to create the $\LaTeX$-image. So you’ll get an arithmetic problem, easy to solve.
2. Find out what kind of hash is in the value of the secret hidden input-field and try to find a number that matches that hash, maybe via brute force.
3. Solve one captcha and fake the rest ;)

Of course the last solution is the easiest one. So I solved one captcha, the solution was 7 and the secret key was 9ee4251f80923e6239ae66ab50a357daa6039f04 , hack done!
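Why does replaying a single solved captcha work at all? Micha’s actual check is not shown, so the following Python sketch only assumes, purely for illustration, a stateless server that derives the hidden value from the solution and a secret. The secret, the hash construction and the field names `answer` and `token` are all hypothetical.

```python
import hashlib

SERVER_SECRET = b"hypothetical-secret"  # assumption: unknown to the attacker


def issue_token(solution: int) -> str:
    """What a stateless server might put into the hidden field."""
    return hashlib.sha1(SERVER_SECRET + str(solution).encode()).hexdigest()


def verify(answer: str, token: str) -> bool:
    """Stateless check: nothing ties the token to one particular challenge."""
    return answer.isdigit() and issue_token(int(answer)) == token


# The attacker solves a single captcha by hand and records the pair ...
recorded_answer, recorded_token = "7", issue_token(7)


def fake_submission() -> dict:
    """... and replays it with every later form, ignoring the new challenge."""
    return {"answer": recorded_answer, "token": recorded_token}
```

As long as the check depends only on the submitted pair, the recorded answer and token pass every time without knowing the secret. Binding the token to the specific challenge, or letting it expire, breaks the replay.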

The development of the userscript was more than simple:

I think that this script won’t work for a long time, so there is no download available ;) If you want to use it, copy&paste, you know.

Um, before anybody starts to blame me, a similar workaround also kills my captcha solution… :P