2009-06-15 10:52 Logster vs. Wikipedia, Round 1
We wanted to find out how much data our Logster code can conveniently visualize. So we downloaded the entire revision history of the English Wikipedia, available at http://download.wikimedia.org/ in easily parseable XML. Processing it and sorting it by time was a breeze. Thanks, Wikipedia people.
The idea was to show the edits on a world map. The history dump contains an IP address for each anonymous edit and a username for each non-anonymous one. There was no way for us, as outsiders, to get our hands on the locations of non-anonymous Wikipedia editors. Which is a good thing, really. But we could use the IP addresses to geolocate the anonymous edits. All in all, there were about 50 000 000 entries we could use.
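A minimal sketch of how the extraction step might look in Python, not the actual Logster code. It assumes the usual layout of a MediaWiki XML dump, where each revision has a timestamp and a contributor that carries either an ip or a username element. The function name `anonymous_edits` and the sample data are made up for illustration, and the geolocation lookup itself is left out.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def anonymous_edits(xml_stream):
    """Yield (timestamp, ip) for every revision made from a bare IP address."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "revision":
            timestamp = ip = None
            for child in elem.iter():
                child_name = local_name(child.tag)
                if child_name == "timestamp":
                    timestamp = child.text
                elif child_name == "ip":
                    ip = child.text
            if timestamp and ip:
                yield timestamp, ip
            elem.clear()  # drop the revision's contents to keep memory flat
        elif name == "page":
            elem.clear()  # drop the finished page as well


# Tiny illustrative sample (documentation-range IPs, invented timestamps).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Example</title>
    <revision>
      <timestamp>2004-03-02T10:00:00Z</timestamp>
      <contributor><username>SomeUser</username></contributor>
    </revision>
    <revision>
      <timestamp>2003-07-01T08:30:00Z</timestamp>
      <contributor><ip>192.0.2.7</ip></contributor>
    </revision>
  </page>
  <page>
    <title>Other</title>
    <revision>
      <timestamp>2002-01-15T12:00:00Z</timestamp>
      <contributor><ip>198.51.100.23</ip></contributor>
    </revision>
  </page>
</mediawiki>"""

# ISO 8601 UTC timestamps sort chronologically as plain strings.
edits = sorted(anonymous_edits(io.BytesIO(SAMPLE)))
print(edits)
```

Sorting in memory is fine for a sample like this. For tens of millions of entries one would more likely write the pairs to disk and use an external sort.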
To spice things up a bit we added the German and Spanish Wikipedias to the mix, as they seem to be quite active. The resulting video shows how the English, German and Spanish edits are distributed geographically over time, indicated by color. If an area is colored red, it's dominated by English edits; blue, German; green, Spanish. The coloring is also proportional, so if some area is e.g. purple, it's roughly evenly divided between English and German edits.
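One way such proportional coloring could be computed is sketched below. This is an assumption about the blending, not the method used for the video: each language's share of the edits in an area is mapped straight to one RGB channel. The function name `blend` is hypothetical.

```python
def blend(english, german, spanish):
    """Return an (r, g, b) tuple, 0-255 per channel, for one map area.

    Each channel is the language's share of all edits in the area:
    red = English, green = Spanish, blue = German.
    """
    total = english + german + spanish
    if total == 0:
        return (0, 0, 0)  # no edits, leave the area dark
    return tuple(round(255 * n / total) for n in (english, spanish, german))


print(blend(30, 0, 0))   # only English edits: pure red
print(blend(10, 10, 0))  # half English, half German: purple
```

Normalizing by the total keeps the hue independent of how busy an area is. Brightness could then be driven separately by the edit count.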
Two seconds of video represent roughly one month of Wikipedia history, so you can pinpoint events fairly accurately, for example the moments the German and Spanish Wikipedias are born. Check it out at http://www.youtube.com/watch?v=ZT54tICMpqw. The color legend:
- red = English
- blue = German
- green = Spanish
- purple = red + blue = English + German
(Or download the higher quality version.)
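The pacing of two seconds per month can be turned into a frame number for each edit. The sketch below is only an illustration: the frame rate, the start date and the name `frame_for` are assumptions, and the position within a month is approximated.

```python
from datetime import datetime

SECONDS_PER_MONTH = 2.0  # from the post: 2 s of video per month of history
FPS = 25                 # assumed frame rate


def frame_for(timestamp, start):
    """Map an edit's timestamp to the video frame in which it appears."""
    months = (timestamp.year - start.year) * 12 + (timestamp.month - start.month)
    months += (timestamp.day - 1) / 31  # rough position within the month
    return int(months * SECONDS_PER_MONTH * FPS)


start = datetime(2001, 1, 1)  # assumed first month shown in the video
print(frame_for(datetime(2001, 3, 1), start))
```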
The nice thing is that we now have the pruned and sorted data and the code for handling it. Should the need arise, we can try all kinds of other visualizations based on it. Any good ideas? Contact us at email@example.com for great justice!
-- ?jvi 2009-06-15 09:17:10