I consume quite a lot of input from the internet every day. A substantial amount of information arrives through podcasts, but much more essential are the 300+ RSS feeds that I’m subscribed to. I love RSS, it’s one of the best inventions of the World Wide Web!
However, there are alarming rumors and activities aimed at getting rid of RSS… We should probably all get our news filtered by Facebook or something..!? The importance of RSS, which allows users to keep track of updates on many different websites, seems to be continuously ignored.. The new website of our University ignores it as well: official RSS feeds aren’t provided anymore :(
Apparently, many people had already asked for RSS feeds of the University’s webpage. At least that’s what they told me when I asked… But the company that built the pages won’t integrate RSS anymore - it probably wasn’t listed in the requirements.. And the University wouldn’t touch the expensive website.
“Fortunately,” they stayed with Typo3 as the CMS, which we had been using as well before we decided to switch. And this Typo3 platform can output a page’s content as an RSS feed out of the box, you just need to know how! ;-)
And… I’ll tell you: Just append ?type=9818 to the URL.
That’s it! Really. It’s so easy.
Here are a few examples:
- Press releases as RSS feed: https://www.uni-rostock.de/universitaet/aktuelles/pressemeldungen/?type=9818
- Events as RSS feed: https://www.uni-rostock.de/universitaet/aktuelles/veranstaltungen/?type=9818
- Open positions as RSS feed: https://www.uni-rostock.de/stellen/wissenschaftliches-und-nichtwissenschaftliches-personal/?type=9818
- Open professorships as RSS feed: https://www.uni-rostock.de/stellen/professuren/?type=9818
- Events of the institute of computer science as RSS feed: https://www.informatik.uni-rostock.de/veranstaltungen/alle-veranstaltungen/?type=9818
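If you want to derive feed URLs for a whole bunch of pages, appending the parameter can be scripted. Here is a minimal Python sketch. The helper name as_rss_url is made up for illustration, and it simply assumes that the page in question honours type=9818:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

RSS_PAGE_TYPE = "9818"  # the page type mentioned above


def as_rss_url(page_url: str) -> str:
    """Return page_url with type=9818 set, keeping all other query parameters."""
    parts = urlsplit(page_url)
    # Drop an already existing "type" parameter, so it does not end up twice.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "type"
    ]
    query.append(("type", RSS_PAGE_TYPE))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(as_rss_url("https://www.uni-rostock.de/stellen/professuren/"))
    # prints https://www.uni-rostock.de/stellen/professuren/?type=9818
```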
Sure, it doesn’t work everywhere. If the editors maintain news as static HTML pages, Typo3 fails to export a proper RSS feed. It’s still better than nothing. And maybe it helps a few people…
The RSS icon was adapted from commons:Generic Feed-icon.svg.
Static websites are great and popular, see for example Brunch, Hexo, Hugo, Jekyll, Octopress, Pelican, and …. They are easy to maintain and their performance is hard to beat. But… As they are static, they cannot dynamically handle user input, which is an obvious requirement for any search function.
Outsource the task
Lucky us, there are already others doing the search stuff pretty convincingly. So it makes sense not to reinvent the wheel, but instead to make use of their services. There are a number of search engines, e.g. Baidu, Bing, Dogpile, Ecosia, Google, StartPage, Yahoo, Yippy, and more (list sorted alphabetically, see also Wikipedia::List of search engines). They all have pros and cons, but typically it boils down to a trade-off between coverage, up-to-dateness, monopoly, and privacy. You probably also have your favourite. However, it doesn’t really matter. While this guide focusses on DuckDuckGo, the proposed solution is basically applicable to all search engines.
The idea is that you add a search form to your website, but do not handle the request yourself and instead redirect to an endpoint of a public search engine.
All the search engines have some way to provide the search phrase encoded in the URL.
Typically, the search phrase is stored in the GET variable
q, for example
example.org/?q=something would search for
Thus, your form would redirect to
However, that would of course start a search for the given phrase on the whole internet!
Instead, you probably want to restrict the search results to pages from your domain.
Fortunately, the search engines typically also provide means to limit search results to a domain, or similar.
In case of DuckDuckGo it is for example the
site: operator, see also DuckDuckGo’s syntax.
That is, for my blog I’d prefix the search phrase with
Implementing the workaround is no magic, even though you need to touch your webserver’s configuration.
The first thing you need to do is add a search form to your website. That form may look like this:
As you see, the form just consists of a text field and a submit-button.
The data will be submitted to
/search on your website.
/search doesn’t exist on your website (if it exists you need to use a different endpoint), but we’ll configure your web server to do the remaining work.
The web server needs to do two things: (1) it needs to prefix the phrase with
site:your.domain and (2) it needs to redirect the user to the search engine of your choice.
The configuration of course differs depending on the web server you’re using.
My Nginx configuration, for example, looks like this:
So it sends the user to duckduckgo.com, with the query string site:binfalse.de concatenated to the submitted search phrase ($arg_q = the q variable of the original GET request).
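To make the redirect logic a bit more tangible, here is the same idea as a small Python sketch. It is not my web server configuration, just an illustration of the two steps (prefix the phrase with the site: operator, then URL-encode it into the q parameter). The function name search_redirect_url is made up:

```python
from urllib.parse import urlencode

SEARCH_ENGINE = "https://duckduckgo.com/"
SITE = "binfalse.de"  # the domain the results should be restricted to


def search_redirect_url(phrase: str, site: str = SITE) -> str:
    """Build the URL to which a request to /search?q=phrase gets redirected."""
    # (1) prefix the phrase with the site: operator,
    # (2) URL-encode it and hand it to the search engine as q.
    return SEARCH_ENGINE + "?" + urlencode({"q": f"site:{site} {phrase}"})


if __name__ == "__main__":
    print(search_redirect_url("docker"))
    # prints https://duckduckgo.com/?q=site%3Abinfalse.de+docker
```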
If you’re running an Apache web server, you probably know how to achieve the same over there.
Otherwise it’s a good opportunity to look again into the manual ;-)
Furthermore, the results pages of DuckDuckGo can be customised to look more closely like your site.
You just need to send a few more URL parameters with the query, such as kj for the header color or k7 for the background color.
The full list of configuration options is available from DuckDuckGo settings via URL parameters.
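Adding such settings just means sending more key-value pairs along with the query. A hedged Python sketch: the function name is made up, and the color values used in the example are mere placeholders. Please check DuckDuckGo's list of URL parameters for the value format that is actually accepted:

```python
from urllib.parse import urlencode


def themed_search_url(phrase: str, site: str, header_color: str, background_color: str) -> str:
    """Build a DuckDuckGo search URL that also carries look-and-feel settings."""
    params = {
        "q": f"site:{site} {phrase}",
        "kj": header_color,      # header color (placeholder value, see lead-in)
        "k7": background_color,  # background color (placeholder value, see lead-in)
    }
    return "https://duckduckgo.com/?" + urlencode(params)


if __name__ == "__main__":
    print(themed_search_url("docker", "binfalse.de", "333333", "ffffff"))
```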
In conclusion, if you use my search form to search for
docker, you’ll be guided to
The Nginx delivering my website will then redirect you to
https://duckduckgo.com/?q=site%3Abinfalse.de+docker, try it yourself:
search for docker!
I’ve been using it for my calendars and address books for more than 4 years now. However, I initially installed it as a plain PHP application with a MySQL database. The developers also announced quite early that they are working on a Docker image, but there is nothing useful as of mid 2018. So far they just provide a quite inconvenient how-to and a list of issues that apparently prevent them from providing a proper Docker image. Thus, I just dockerised the application myself :)
The Docker image
Actually, creating a Docker image for Baïkal was super easy. In the end, it is “only” a PHP application ;-) The corresponding Dockerfile can be found in the root directory of Baïkal’s git repository (at least in my fork). The latest version at the time of writing is:
So, it basically
- installs some dependencies through
- installs the PDO-MySQL extension,
- installs composer,
- adds the Baikal sources into the image,
- and finally installs remaining Baikal dependencies through composer.
I distribute the image as binfalse/baikal.
Using the Docker image
Using the image is fairly simple.
Basically, you only need to mount some persistent space to
docker run -it --rm -p 80:80 -v /path/to/persistent:/var/www/Specific binfalse/baikal
Please make sure that the directory
/path/to/persistent has proper permissions.
In the container an Apache2 is serving the contents, so make sure the user
33) is allowed to
rwx that directory.
To start with, you can use the original Specific directory from the Baïkal repository.
Then head to your Baikal instance (which will probably redirect to BASEURL/admin/install), and set up your server.
Every configuration will be stored in the mounted volume at
To support encrypted connections you would need to mount the certificates as well as a modified Apache configuration into the container. However, I recommend running it behind a reverse proxy, such as binfalse/nginx-proxy, and letting the proxy handle all SSL connections (as for all other containers). This way, you just need one proper SSL configuration.
The default SQLite database is perfect for a first test, but it is slow and only allows a limited number of SQL variables. If you for example have more than 999 contacts, the first sync of a clean WebDAV device will result in an exception such as:
PDOException: SQLSTATE[HY000]: General error: 1 too many SQL variables
Thus, for production you may want to switch to a proper database, such as MariaDB. Lucky you, the Docker image supports MySQL! ;-)
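If you want to see the variable limit in action without a Baikal instance, it can be reproduced with a few lines of Python and an in-memory SQLite database. This is just a sketch under assumptions: the function name is made up, the query is a dummy, and lowering the limit to 999 via setlimit() requires Python 3.11 or newer (newer SQLite builds ship with a higher default):

```python
import sqlite3


def hits_variable_limit(n_variables: int, limit: int = 999) -> bool:
    """Return True if a query with n_variables placeholders gets rejected."""
    conn = sqlite3.connect(":memory:")
    try:
        # setlimit() exists since Python 3.11; without it the compiled-in limit applies.
        if hasattr(conn, "setlimit"):
            conn.setlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER, limit)
        placeholders = ",".join("?" * n_variables)
        try:
            conn.execute(
                f"SELECT 1 WHERE 1 IN ({placeholders})", list(range(n_variables))
            )
        except sqlite3.OperationalError as exc:
            return "too many SQL variables" in str(exc)
        return False
    finally:
        conn.close()
```

With the limit set to 999, a query with 1000 placeholders is rejected, while one with 999 placeholders still passes.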
To reproducibly assemble both containers, I recommend Docker-Compose.
Here is a sample config with two containers
This assumes that your Baikal configuration can be found in
The database will be stored in
Also note the database credentials for configuring Baikal.
If you’re not running a reverse proxy in front of the application, you also need to add some port forwarding for the
PLEASE NOTE: sSMTP is not maintained anymore! Please switch to
msmtp, for example, as I explained in Migrating from sSMTP to msmtp.
In a typical Docker environment you’ll have plenty of containers (probably in multiple networks?) on the same machine. Let’s assume you need to debug some problems of a container, e.g. because it doesn’t send mails anymore.. What would you do? Correct, you’d go and check the logs.
By default, Docker logs the messages of every container into a JSON file.
On a Debian-based system you’ll probably find the file at
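Each line of such a file is a small JSON object with the keys log, stream, and time. If you ever need to process the file directly, a Python sketch like the following may help. The function name and the sample line are made up for illustration:

```python
import json


def parse_log_line(line: str) -> tuple:
    """Split one line of a json-file log into (time, stream, message)."""
    entry = json.loads(line)
    # The message usually carries the trailing newline of the original output.
    return entry["time"], entry["stream"], entry["log"].rstrip("\n")


if __name__ == "__main__":
    sample = '{"log":"hello\\n","stream":"stdout","time":"2018-01-01T00:00:00.000000000Z"}'
    print(parse_log_line(sample))
```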
However, to properly look into the logs you would use Docker’s logs tool.
This will print the logs, just as you would expect
cat to dump the logs in
docker-logs can also filter for time spans using
--until, and it is able to emulate a
tail -f with
However, the logs are only available for existing containers.
That means, if you recreate the application (i.e. you recreate the container), you’ll typically lose the log history…
If your workflow includes the
--rm, you will immediately trash the log of a container when it’s stopped.
Fortunately, Docker provides other logging drivers, to e.g. log to AWS, fluentd, GCP, and to good old syslog! :)
Here I’ll show how to use the host’s syslog to manage the logs of your containers.
Log to Syslog
Telling Docker to log to the host’s syslog is really easy.
You just need to use the built-in
Voilà, the container will log to the syslog and you’ll probably find the messages in
Here is an example of an Nginx, that I just started to serve my blog on my laptop:
By default, the syslog driver uses the container’s ID as the syslog tag (here it is
but you can further configure the logging driver and, for example, set a proper syslog tag:
This way, it is easier to distinguish between messages from different containers and to track the logs of an application even if the container gets recreated:
Here, I configured an Nginx that just serves the contents from
The interesting part is, however, that the container uses the
syslog driver and the syslog tag
I always prefix the tag with
docker/, to distinguish between log entries of the host machine and entries from Docker containers..
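If you start containers from scripts, the tagging convention is easy to automate. The following Python sketch assembles a docker run call using the --log-driver and --log-opt tag=… options. The helper name as well as the image and container names in the example are made up for illustration:

```python
import shlex


def docker_run_command(image: str, name: str, tag_prefix: str = "docker") -> list:
    """Assemble a docker run call that logs to syslog with a readable tag."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--log-driver", "syslog",
        # prefix the tag, so entries of containers can be told apart from host entries
        "--log-opt", f"tag={tag_prefix}/{name}",
        image,
    ]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run() to execute it.
    print(shlex.join(docker_run_command("nginx", "website")))
```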
Store Docker logs separately
The workaround so far will probably spam your /var/log/syslog substantially, which may become very annoying… ;-)
Therefore, I recommend writing Docker’s logs to a separate file. If you’re for example using Rsyslog, you may want to add the following configuration:
Just dump the snippet to a new file
/etc/rsyslog.d/docker.conf and restart Rsyslog.
This rule tells Rsyslog to write messages that are tagged with
/var/log/docker, and not to the default syslog file anymore.
/var/log/syslog stays clean and it’s easier to monitor the Docker containers.
Disentangle the Container logs
Since version 8.25, Rsyslog can also be used to split the docker logs into individual files based on the tag.
So you can create separate log files, one per container, which is even cleaner!
The idea is to use the tag name of containers to implement the desired directory structure.
That means, I would tag the webserver of a website with
docker/website/webserver and the database with
We can then tell Rsyslog to allow slashes in program names (see the programname section at www.rsyslog.com/doc/master/configuration/properties.html) and create a template target path for Docker log messages, which is based on the programname:
Using that configuration, our website will log to
Neat, isn’t it? :)
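The mapping from tag to log file can be sketched in a few lines of Python. Please note the assumptions: the function name is made up, and I assume here that the docker/ prefix is replaced by /var/log/docker/ and that .log gets appended, which matches the logrotate patterns used in this article. Adapt it if your Rsyslog template differs:

```python
from pathlib import PurePosixPath

LOG_ROOT = PurePosixPath("/var/log/docker")


def log_file_for_tag(tag: str) -> str:
    """Map a syslog tag such as docker/website/webserver to its log file."""
    parts = PurePosixPath(tag).parts
    # Only accept tags of the form docker/<something>, without path tricks.
    if len(parts) < 2 or parts[0] != "docker" or ".." in parts:
        raise ValueError(f"not a Docker tag: {tag!r}")
    return str(LOG_ROOT.joinpath(*parts[1:])) + ".log"


if __name__ == "__main__":
    print(log_file_for_tag("docker/website/webserver"))
    # prints /var/log/docker/website/webserver.log
```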
Even though all the individual logfiles will be smaller than a combined one, they will still grow in size. So we should tell logrotate of their existence!
Fortunately, this is easy as well.
Just create a new file
/etc/logrotate.d/docker containing something like the following:
This will rotate the files ending in
/var/log/docker/ and its subdirectories every day and keep compressed logs for 7 days. Here I’m using a maximum depth of 3 subdirectories – if you need to create a deeper hierarchy of directories just add another
/var/log/docker/*/*/*/*.log etc to the beginning of the file.
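If you do not want to type the patterns by hand, they can be generated. A tiny Python sketch, assuming one pattern per directory level below /var/log/docker (the function name is made up):

```python
def logrotate_patterns(max_depth: int, root: str = "/var/log/docker") -> list:
    """Glob patterns for *.log files in root and up to max_depth levels of subdirectories."""
    return [root + "/*" * depth + "/*.log" for depth in range(max_depth + 1)]


if __name__ == "__main__":
    # One pattern per line, ready to be pasted at the top of the logrotate file.
    print("\n".join(logrotate_patterns(3)))
```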
This article is based on Contao 3. There is a new version, see Dockerising Contao 4
A central idea of Docker is to install the application in an image and mount persistent files into a running container. Thus, you can just throw away an instance of the app and start a new one very quickly (e.g. with an updated version of the app). Unfortunately, using Contao it’s not that straightforward – at least when using the image described earlier.
Here I’m describing how I fought the issues:
Issues with Cron
The first issue was Contao’s Poor-Man-Cron. This cron works as follows:
- The browser requests a file
cron.txt, which is supposed to contain the timestamp of the last cron run.
- If the timestamp is “too” old, the browser will also request a
cron.php, which then runs overdue jobs.
- If a job was run, the timestamp in cron.txt will be updated, so cron.php won’t be run every time.
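The decision described in the list above can be sketched as follows. This is not Contao's code, just an illustration in Python. The function name and the max_age_seconds threshold are made up, and I assume that a missing or unreadable cron.txt is treated like a very old timestamp:

```python
from typing import Optional


def cron_php_needed(cron_txt: Optional[str], now: int, max_age_seconds: int) -> bool:
    """Decide whether cron.php should be requested after fetching cron.txt."""
    if cron_txt is None:
        return True  # cron.txt does not exist (the request ended in a 404)
    try:
        last_run = int(cron_txt.strip())
    except ValueError:
        return True  # unreadable content, better run the cron
    # Run cron.php only if the last run is "too" old.
    return now - last_run > max_age_seconds
```

For example, a cron.txt containing 0 always counts as too old, so the first visitor triggers cron.php.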
Good, but that means the
cron.txt will only be written, if a cron job gets executed.
But let’s assume the next job will only be run next weekend!?
The last cron-run-time is stored in the database, but the
cron.txt won’t exist by default.
That means, even if the
cron.php is run, it will know that there is no cron job to execute and, therefore, exit without creating/updating the
Especially when using Docker you will hit such a scenario every time you start a new container..
Thus, every user creates a 404 error (as there is no
cron.txt), which is of course ugly and spams the logs..
I fixed the issue by extending the Contao source code.
The patch is already merged into the official release of Contao 3.5.33.
In addition, I’m initialising the
cron.txt in my Docker image with a time stamp of
0, see the Dockerfile.
Issues with Proxies
A typical Docker infrastructure (at least for me) consists of a bunch of containers orchestrated in various networks etc.. Usually, you’ll have at least one (reverse) proxy, which distributes HTTP requests to the container in charge. However, I experienced a few issues with my proxy setup:
HTTPS vs HTTP
While the connection between client (user, web browser) and reverse proxy is SSL-encrypted, the proxy and the webserver talk plain HTTP.
As it’s the same machine, there is no big need to waste time on encryption.
But Contao has a problem with that setup.
Even though the reverse proxy properly sends the HTTP_X_FORWARDED_PROTO, Contao only sees incoming HTTP traffic and uses http://-URLs in all documents…
Even if you ignore the mixed-content issue and/or implement a rewrite of HTTP to HTTPS at the web-server-layer, this will produce twice as many connections as necessary!
The solution is however not that difficult.
Contao does not understand
HTTP_X_FORWARDED_PROTO, but it recognises the
Thus, to fix that issue you just need to add the following to your
system/config/initconfig.php (see also Issue 7542):
In addition, this will generate URLs including the port number (e.g.
https://example.com:443/etc), but they are perfectly valid. (Not like
https://example.com:80/etc or something that I saw during my tests… ;-)
This workaround doesn’t work for Contao 4 anymore! To fix it see Dockerising Contao 4
URL encodings in the Sitemap
The previous fix brought up just another issue: The URL encoding in the sitemap breaks when using the port component (
rawurlencode to encode all URLs before writing them to the sitemap.
rawurlencode encodes quite a lot!
Among others, it converts
Thus, all URLs in my sitemap looked like this:
https://example.com%3A443/etc - which is obviously invalid.
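The effect is easy to reproduce outside of Contao. In the following Python sketch, quote() with safe="" serves as a stand-in for PHP's rawurlencode, as both leave only letters, digits and -_.~ untouched. The two function names are made up, and encoding only the path is just one possible way around the problem, not necessarily what ends up in Contao:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_including_authority(url: str) -> str:
    """Mimic the broken behaviour: host:port gets percent-encoded as well."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{quote(parts.netloc, safe='')}{quote(parts.path)}"


def encode_path_only(url: str) -> str:
    """Leave scheme, host and port alone and only encode the path."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path)))


if __name__ == "__main__":
    print(encode_including_authority("https://example.com:443/etc"))
    # prints https://example.com%3A443/etc
    print(encode_path_only("https://example.com:443/etc"))
    # prints https://example.com:443/etc
```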
Issues with Cache and Assets etc
Cache, assets, sitemaps etc. are a more delicate issue. Contao’s backend comes with convenient buttons to clear/regenerate these files and to create the search index. Yet, you don’t always want to log in to the backend when recreating the Docker container.. Sometimes you simply can’t - for example, if the container needs to be recreated overnight.
Basically, that is not a big issue. Assets and cache will be regenerated once they are needed. But the sitemaps, for instance, will only be generated when interacting with the backend.
Thus, we need a solution to create these files as soon as possible, preferably in the background after a container is created.
Most of the stuff can be done using the
Automator tool, but I also have some personal scripts developed by a company, that require other mechanisms and are unfortunately not properly integrated into Contao’s hooks landscape.
And if we need to touch code anyways, we can also generate all assets and rebuild the search index manually (precreating necessary assets will later on speed up things for users…).
To generate all assets (images and scripts etc), we just need to access every single page at the frontend.
This will then trigger Contao to create the assets and cache, and subsequent requests from real-life users will be much faster!
The best hack that I came up with so far looks like the following script, which I uploaded as /files/initialiser.php to the Contao instance:
The first 3 lines initialise the Contao environment.
Here I assume that
../system/initialize.php exists (i.e. the script is saved in the
The next few lines purge existing cache using the Automator tool and subsequently regenerate the cache – just to be clean ;-)
Finally, the script
(i) collects all “searchable pages” using the
(ii) enriches this set of pages with additional pages that may be hooked-in by plugins etc through
and then (iii) uses cURL to iteratively request each page.
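The third step is not specific to PHP at all. Just to illustrate the idea, here is a Python sketch of such a warm-up loop. The function name is made up, the list of URLs has to come from somewhere else (e.g. your sitemap), and the fetch parameter only exists to make the loop testable without a network:

```python
from urllib.request import urlopen


def warm_up(urls, fetch=None, timeout=30):
    """Request every URL once; return a dict mapping each URL to a success flag."""
    if fetch is None:
        def fetch(url):
            # Read the whole response, so the server really renders the page.
            with urlopen(url, timeout=timeout) as response:
                response.read()

    results = {}
    for url in dict.fromkeys(urls):  # skip duplicates, keep the order
        try:
            fetch(url)
            results[url] = True
        except OSError:  # urllib's URLError and HTTPError are subclasses of OSError
            results[url] = False
    return results
```

A single failing page does not stop the loop, so the remaining pages still get warmed up.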
The first part should be reasonably fast, so clients may be willing to wait until the cache stuff is recreated. Accessing every frontend page, however, may require a significant amount of time! Especially for larger web pages.. Thus, I embedded everything in the following skeleton, which advises the browser to close the connection before we start the time-consuming tasks:
Here, the browser is told to close the connection after a certain content size arrived.
I buffer the content that I want to transfer using
ob_end_flush, so I know how big it is (using
ob_get_length can safely be ignored by the client, and the connection can be closed.
(You cannot be sure that the browser really closes the connection. I saw curl doing it, but also some versions of Firefox still waiting for the script to finish… Nevertheless, the important content will be transferred quickly enough.)
In addition, I created some
mod_rewrite to automatically regenerate missing files.
For example, for the sitemaps I added the following to the vhost config (or
That means, if for example
/share/sitemap.xml does not exist yet, the user gets automagically redirected to our
In addition, I added some request parameters (
?target=sitemap&sitemap=$1), so that the
initialiser.php knows which file was requested.
It can then regenerate everything and immediately output the new content! :)
For example, my snippet to regenerate and serve the sitemap looks similar to this:
Thus, the request to
/share/somesitemap.xml will never fail.
If the file does not exist, the client will be redirected to
/files/initialiser.php?target=sitemap&sitemap=somesitemap, the file
/share/somesitemap.xml will be regenerated, and the new contents will immediately be served.
So the client will eventually get the desired content :)
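The pattern behind it is simply "serve the file if it exists, otherwise regenerate it first". As a language-neutral illustration, here is a small Python sketch. The function name and the regenerate callback are made up and stand in for whatever rebuilds the file in your setup:

```python
from pathlib import Path


def serve_or_regenerate(path, regenerate) -> str:
    """Return the content of path; call regenerate() first if the file is missing."""
    path = Path(path)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        # regenerate() is expected to return the new file content as a string
        path.write_text(regenerate(), encoding="utf-8")
    return path.read_text(encoding="utf-8")
```

The first request pays for the regeneration, all subsequent requests are served from the existing file.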
Please be aware that this script is easily DOS-able! Attackers may produce a lot of load by accessing the file. Thus, I added some simple DOS protection to the beginning of the script, which makes sure the whole script is not run more than once per hour (3600 seconds):
true, it won’t regenerate cache etc, but still serve the sitemap and other files if requested..
However, if there is also no
$_GET['target'] defined, we don’t know what to serve anyway and can
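Such a once-per-hour guard can be built in many ways. One simple option is a stamp file that remembers the time of the last run. The following Python sketch shows that idea. Both the stamp file and the function name are assumptions for illustration, not necessarily how my script does it:

```python
import time
from pathlib import Path


def should_skip_heavy_work(stamp_file, min_interval=3600, now=None) -> bool:
    """Return True if the heavy part already ran within the last min_interval seconds."""
    now = time.time() if now is None else now
    stamp = Path(stamp_file)
    try:
        last_run = float(stamp.read_text())
    except (OSError, ValueError):
        last_run = None  # no (readable) stamp yet, so this is the first run
    if last_run is not None and now - last_run < min_interval:
        return True
    # Remember this run before starting the time-consuming tasks.
    stamp.write_text(str(now))
    return False
```

Note that two requests arriving at the very same moment may both pass the check. For a hack like this that is probably acceptable, otherwise you would need proper file locking.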
You could include the script at the footer of your webpage, e.g. using
/*...*/ or something…)
This way you would make sure that every request produces a fully initialised system. However, this will probably also create unnecessary load every hour… You could increase the time span in the DOS-protection-hack, but I guess it should be sufficient to run the script only if a missing file is requested. Earlier requests then need to wait for pending assets etc, but to be honest, that should not be too long (or you have a different problem anyway…).
And if your website provides an RSS feed, you could subscribe to it using your default reader, which will regularly make sure that the RSS feed is generated if missing.. (and thus trigger all the other stuff in our
– A feed reader as the poorest-man-cron ;-)
As I said earlier, my version of the script contains plenty of personalised stuff. That’s why I cannot easily share it with you.. :(
However, if you have trouble implementing it yourself just let me know :)