Saturday, 16 April 2011

MongoDB - usable for O2?

Most programmers are familiar with relational database management systems such as MySQL, MSSQL, PostgreSQL or Oracle. In these systems the tables we create expect their data in a very specific format, and an object is normally not stored in just one table; rather, its data are distributed across several tables. For example, in O2, a Member object uses four tables for its data: O2_OBJ_OBJECT, O2_OBJ_PERSON, O2_OBJ_MEMBER and O2_OBJ_OBJECT_VARCHAR.

MongoDB is a so-called document-oriented database management system, and it is supposed to be faster than relational database management systems and to scale better. But it doesn't have all the functionality of relational systems; most importantly, it doesn't support joins.

MongoDB doesn't have tables or rows like "normal" databases. Instead it has something called collections and documents, but you can think of a collection as sort of a table, and of a document as a row.

When using Perl, as we do, a document is a hash ref: it has keys and corresponding values. A value doesn't have to be a scalar; it can also be a reference to another hash, or a reference to an array. MongoDB doesn't care what that structure looks like - you are free to insert data structures with totally different keys into the same collection. Whether that is a smart thing to do is another matter... We would probably store similar data structures in each separate collection ("table"). If we were to start using MongoDB in O2, one object would likely correspond to one document in the collection representing that object's class.
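To make this concrete, here is a minimal sketch of what such a document could look like. The field names and values are purely illustrative, not O2's real schema, and the example is in Python rather than Perl (a hash ref and a dict play the same role here):

```python
# A MongoDB document is just a nested structure - a hash ref in Perl,
# a dict in Python. Field names below are hypothetical.
member_doc = {
    "_id": 1042,
    "className": "O2::Obj::Member",
    "name": "Jane Doe",
    # a nested document instead of a separate O2_OBJ_PERSON row
    "address": {"street": "Main St 1", "city": "Oslo"},
    # an array instead of rows in a separate O2_OBJ_OBJECT_VARCHAR table
    "roles": ["editor", "admin"],
}

# With pymongo and a running server, this structure would be stored as-is,
# e.g.: db.member.insert_one(member_doc)
```

Everything that the relational schema spreads over four tables can live in one document, which is what makes one object per document a natural mapping.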

Objects in O2 can be related to each other. The way to do this is to set the type of a field to a class name - the class name of objects that can be stored in that field. In the database, only the IDs of those objects are stored in the "main" object. This means that when we search for objects through related objects in a relational database, we have to join tables in the search query, which can have a severe impact on performance, especially when we have to join more than a couple of tables, which sometimes happens in O2. MongoDB has something called a DBRef, which is a reference to another document. I first thought it would be possible to search for fields in the related object through the main object and the DBRef, but it turns out that's not possible after all. Even if it were possible, it might not have been very efficient.

I believe the main reason O2 is sometimes slow is that searches in related objects can take a long time, due to several joins, and because MySQL sometimes doesn't execute the joins in the most efficient order. One way to mitigate this with MongoDB could be to store whole serialized objects instead of just object IDs. This, however, poses two challenges: 1) duplication of data, and 2) how to update all the serialized copies when the actual object is updated.

We have to choose between fast response times and no duplication - we can't have both. At least not without (other forms of) caching. But caching makes things harder to debug, so the less caching we need, I think, the better.

When it comes to updating all the serialized copies, I think this is possible if we store, in the actual object, the IDs of the objects that contain serialized copies of it. When the object is saved, we go through all of these other objects and update them as well. This will make saving objects slower than today; the question is by how much. If it is very slow, it might be possible to do it asynchronously, so that the user doesn't have to wait for it.
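The back-reference scheme described above can be sketched as follows. All names are hypothetical, and the object store is a plain dict standing in for MongoDB collections:

```python
# Each object records which other objects hold a serialized copy of it
# ("copiedInto"); saving fans the update out to those copies.
store = {
    "person:7": {
        "name": "Ola",
        "copiedInto": ["article:1", "article:2"],  # back-references
    },
    "article:1": {"title": "A", "author": {"name": "Ola"}},
    "article:2": {"title": "B", "author": {"name": "Ola"}},
}

def save_person(person_id, new_fields):
    person = store[person_id]
    person.update(new_fields)
    # Fan out: refresh every serialized copy. This loop is what makes
    # saving slower, and what could be pushed to an asynchronous job.
    for container_id in person["copiedIntо" if False else "copiedInto"]:
        store[container_id]["author"] = {"name": person["name"]}

save_person("person:7", {"name": "Kari"})
# store["article:1"]["author"] is now {"name": "Kari"}
```

The write cost grows with the number of copies, which is exactly the trade against fast, join-free reads.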

If we were going to represent Member and Person objects in MongoDB, we would probably create a Person collection and a Member collection. A Member inherits from Person, so the question is whether we should insert data into both the Person collection and the Member collection, or whether a Member object should be inserted only into Member. And if we decide to insert into Person as well, do we insert the entire Member object or just the part that's relevant for Persons?

My intuition tells me it would be best to insert Member objects only into Member, not Person - which means that searching for Persons must search both the Person and the Member collections.
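That search-both-collections approach might look like this sketch (collections simulated as lists; with pymongo each would be a `find()` on a real collection, and the field names are illustrative):

```python
# Members live only in the Member collection, so a search for Persons
# must scan both collections and merge the results.
person_coll = [{"_id": 1, "name": "Ola"}]
member_coll = [{"_id": 2, "name": "Kari", "memberSince": 2010}]

def find_persons(predicate):
    """A Member is-a Person, so both collections are searched."""
    return [doc
            for coll in (person_coll, member_coll)
            for doc in coll
            if predicate(doc)]

names = sorted(d["name"] for d in find_persons(lambda d: True))
# names == ["Kari", "Ola"]
```

The price of avoiding duplication across Person and Member is one extra query per subclass whenever we search at the superclass level.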

In conclusion, MongoDB is a more natural data store for O2 than a relational database management system, and, I think, a better one. It might even be possible to make searching really fast in MongoDB, if we can tolerate duplication in the database and longer save times.

Thursday, 18 March 2010

Exceptionally cool things you can do with LinkPulse: The Red Box

We already knew that our customers are cool innovators with great ideas. That's why they use LinkPulse. But we were nonetheless awestruck when we found out that one of them had created this red-box click counter with data from LinkPulse.

Per Åstrøm is Technical Manager, New Platforms, at Tv4.se, and together with David Hall he made this during a "hack day" at Bonnier (Tv4's owner company).

Basically, it shows a number known in LP as "State Now", based on traffic over the past 15 minutes, along with an up- or down-pointing arrow indicating whether traffic is rising or falling.

Data is fed by an XML feed from the LinkPulse application.

We love it!!

Tuesday, 26 January 2010

Nettavisen increases traffic after changing the frontpage

The Norwegian news outlet and LinkPulse customer Nettavisen yesterday announced a 14% increase in pageviews and a 40% increase in clickthrough after changing their frontpage layout.

Clickthrough is basically the conversion of page views to clicks; i.e. if you go to the front page, do you click on an article?

We believe that such a dramatic change could only come as a result of a dedicated effort to see what users do and what users don't do on their frontpage.

We are happily convinced that LinkPulse was an important part of the project and send our warmest congratulations to the staff over at Nettavisen.

Cheers!

Thursday, 21 January 2010

klikk.no receives a Golden Tag for 2009

Our congratulations go to our customer klikk.no (Hjemmet Mortensen) who received a diploma for best website in the 2009 Golden Tag Awards.

Klikk.no is one of the fastest growing internet magazines in Norway, and we are pleased to know that LinkPulse is used every day to make that happen.

For instance, we are grateful to klikk.no for beta-testing our new browser Toolbar, which will surely change everyone's lives when we release it this spring!

Keep up the good work!

Tuesday, 19 January 2010

Remoteless: Remote control for Spotify and iPhone - I tested it.

Disclaimer: My friend is on the development team of Remoteless.

Last night I participated in a very exciting informal user test of an upcoming iPhone app called Remoteless, scheduled to arrive in the App Store around March 1, 2010. It's a remote control for Spotify (on Windows).

First of all, I must say I'm impressed by the job done: since Spotify offers no open API to control it, they had to use image processing and interaction simulation to communicate with the Spotify client. It requires you to install a little program on your Windows computer, and for now, that's the only platform they support.

Since I am a Mac user I don't reckon I will buy the app, but having tried it, I would certainly recommend it to Windows users out there who need an easy way to change the music playing on their computer without getting off the couch.

As opposed to other apps that let you control the computer remotely, this app actually communicates with Spotify's public metadata API to search for artists, albums and songs; when you are ready to play a song, its URI is sent to Spotify along with some double-click events and the like.

It even sports the ability to save tracks, albums and even artists as favorites, something I've been missing from Spotify!

Scheduled for release around the same time as Spotify's arrival in the US, I suppose this app will do well, and I think it deserves it!

Wednesday, 18 November 2009

In beta: Web analytics tools market share in Nordic news outlets

I was inspired by KAIZEN Analytics' recent post on web analytics tool market shares in the automotive industry to do a similar study on online news outlets in the Nordic countries, specifically Norway, Sweden, Denmark and Finland.

Like Kaizen, I will use WASP to ask each web site what tools it uses. Moreover, I only care about core web analytics tools, not ad trackers and the like. I have also taken the liberty of adding the LinkPulse numbers (WASP does not recognize it, but I know who they are).

I would like to emphasize at this point that this is a research project in the making, and I'm publishing preliminary results to see if there's interest in these numbers out there. For now, I have far more data for Norway than for the other countries, because it's much easier for me to decide which sites to count and which not to count. Basically, I've tried to include web sites that correspond to daily newspapers, as well as online-only sites centered on the dissemination of news, which may be portals or niche sites covering, say, sports or economics. Another criterion I've been considering is the amount of traffic according to official metrics, but I haven't followed this strictly so far.

Further work on this research will include establishing more rigid selection criteria and gathering more data from the other Nordic countries. Other things to consider are groups of sites that buy tools collectively because of common ownership, as well as making use of the fact that most sites have more than one tool.

Another interesting possibility is to factor in each site's official traffic numbers. That way we could see which tools account for the most traffic, or some such.

As of now, I have counted the tools on 66 sites, of which a little under half are Norwegian.

One difficulty is accounting for at least two "disturbing" factors in the data. One is that one of the tools, Google Analytics, is free and therefore has a far lower threshold for use than the other tools. The other is that each of the Nordic countries has one tool that is used across the board, due to a common agreement to provide official metrics.

I haven't decided how to account for some of these issues, and therefore present the data with a big juicy footnote: consider them largely incomplete. Nonetheless, I think they give an interesting view of what is generally being used on news sites.


Nordic countries
The first pie shows the usage of web analytics tools in Norway, Sweden, Denmark and Finland, weighted so that it totals 100% even though some sites use more than one tool. These numbers are skewed by the fact that almost half the data are from Norwegian sites. Another difficulty is that TNS Metrix and Gemius are "forced" tools in some countries, but used voluntarily in others. This is an issue that I will work to solve.

Norway only
The second pie focuses on the Norwegian data only. I have also removed both Google Analytics and TNS Metrix: Google Analytics is free and therefore used by virtually everyone, and TNS Metrix is the official tool in Norway, therefore used by literally everyone (at least all the sites I gathered data about). The usage of these may be interesting in and of itself, but it may overshadow interesting facts about the usage of the other tools.

No conclusions

Since this endeavor has just started, and since there are still so many issues to be resolved, I refrain from drawing any conclusions from the data so far. I just let the pies stand as they are.

What I would like comments on is whether these sorts of numbers are interesting for anyone out there to follow, as well as suggestions on how to resolve some of the issues I have raised that may skew the data.

Also, if you know of anyone else doing similar research, I'd love to know about it, especially if it's related to online news media.

Of course, if there's anything else you have on your mind about any of this, feel free to leave a note.

Saturday, 7 November 2009

Agile journalism and Web analytics are friends

I had been thinking about different ways to approach web analytics for news publishers, and specifically journalists, for a while when I came across Eric T. Peterson's white paper "The Coming Revolution in Web Analytics", where he discusses what the future holds for web analytics, and what third-generation tools need to do to make it happen.

Peterson's emphasis on making decisions in real time resonates strongly with the agile software development methodology. Iterative processes, high information saturation and a high degree of freedom have proven to be successful catalysts of creativity in this domain.

I wanted to check whether anyone had written about the parallels between agile methods in software development and the new working environment of journalists. Google is my friend, so I searched for 'agile journalism' and got about a thousand hits. So it's not an entirely new idea.

Actually, the first use of the term 'agile journalism' that I can find (on Google) is in Arthur Symons' "Studies on modern painters", where he describes Helleu's style as "superficial draughtsmanship", but at least "far more alive", "chic in all its hasty expressiveness a wholly Parisian art, hardly more serious than agile journalism, but how clever of its kind!" Symons recognizes, perhaps somewhat prematurely, the value of agility in reporting from a scene.

Several bloggers have also commented on the similarities between the shift towards agile methods in software development and the media's "shift" from print to online publishing. Newspapers have "a gigantic machine with many small cogs, devoted to producing something that is frozen in time", but enter the web and journalists must "change their way of thinking".

Florin Duroiu recently even pointed out his epiphany that "process journalism really is agile journalism".

Others feel, more bluntly, that "the future of journalism is agile". But is anyone linking 'agile journalism' to web analytics? Googling 'agile journalism' + 'web analytics' gave me no hits, and a quick scan of the initial search results didn't seem to include any discussion of how web analytics can help journalists be agile.

I have come to appreciate that journalists and front-page editors really can benefit from having access to certain metrics, if it enables them to react fast and see the response. The focus is not on the tool, but on the questions that arise instantly from looking at data that, for instance, don't make sense compared with the past. Can I change the picture, or the headline, to drive more traffic? Are there any related articles I can link to in my text to drive down the bounce rate?

Of course, web analytics can't tell you about people's emotional response. But it can tell you that you're doing something wrong, and enable you to make up for it fast. I think the revolution in web analytics will come, at least for journalists and editors, when they get easy access to reports that are tailored for them and an assurance that anyone can be an analytics expert.

Agility comes with the willingness to experiment, and web analytics may just be the safeguard to allow that to happen.


Image: Wikimedia Commons - Paul Helleu "Madame Helleu Sur Son Yacht Letoile" 1898-1900