WordPress WordCloud with R
These days one frequently reads about wordclouds created with R, a trend kicked off by the release of the wordcloud package by Ian Fellows on July 23rd. So here are my two cents.
I thought about creating a wordcloud of a complete blog history, so I built a script that connects to a MySQL database and grabs all published posts and pages. All articles are combined into one huge text which, once stripped of tags and special characters, is visualized as a wordcloud:
[cc lang="rsplus" lines="-1" file="pipapo/R/wordpress-wordcloud.R"][/cc]
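The steps described above can be roughly sketched as follows. This is only a minimal illustration, not the original script: the `posts` vector is a hypothetical stand-in for the rows fetched from MySQL, and the helper names `clean_text` and `word_freq` are made up for this example. The cleaning rules (drop tags, entities, and everything that is not a letter) are an assumption about what "purging" could look like.

```r
# Hypothetical stand-in for the posts fetched from MySQL; a real script would
# query the WordPress database for published posts and pages instead.
posts <- c(
  "<p>R makes <b>wordclouds</b> easy &amp; fun.</p>",
  "<h2>Wordclouds</h2> with R: plotting words from a blog!"
)

# Combine all articles into one text, then strip tags, entities and
# every character that is neither a letter nor whitespace.
clean_text <- function(x) {
  txt <- paste(x, collapse = " ")
  txt <- gsub("<[^>]+>", " ", txt)                 # HTML tags
  txt <- gsub("&[#[:alnum:]]+;", " ", txt)         # entities such as &amp;
  txt <- gsub("[^[:alpha:][:space:]]", " ", txt)   # digits, punctuation, ...
  tolower(txt)
}

# Split the text on whitespace and count how often each word occurs.
word_freq <- function(txt) {
  words <- strsplit(txt, "[[:space:]]+")[[1]]
  words <- words[nchar(words) > 0]
  sort(table(words), decreasing = TRUE)
}

freq <- word_freq(clean_text(posts))

# Draw the cloud only if the wordcloud package is installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(freq), as.integer(freq), min.freq = 1)
}
```

With a real blog, `posts` would hold one entry per article, and `min.freq` would probably be raised so that rare words do not clutter the picture.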
Enough code, here is the result for my small blog:
Nice image, isn't it? Unfortunately, it takes about 30 seconds to generate; otherwise it would be cool to create such a cloud live, for example using rApache.
4 comments
VERY cool - thank you for this post!
BTW, it might be worth removing some of the “that” “this” etc words, using the tm package…
Cheers, Tal
[…] last I came across an interesting post by Martin Scharm on how to make a wordcloud in R on one of the blogs I read frequently, the R […]
Very cool. 2 followups: How did you shape the cloud to be round? When I generate my cloud, the words are very spread out, any tips?
Hi Ray, thanks for your interest :) Unfortunately, I have no idea what might be wrong with your code, especially since you did not publish it. Moreover, it has been some time since I developed this small piece of code, so the R packages may have changed their behavior. I hope someone else is able to help you.