Wednesday, April 22, 2015

Multilayer networks

Monday last week, on the 13th of April 2015, the 1999 Nobel laureate Günter Grass passed away. He grew up in what was then the free city of Danzig. Although Königsberg was farther east in Eastern Prussia, many of the memories of his youth he describes in the Tin Drum are familiar to me from my relatives' reminiscences. In fact, my grandfather emigrated from the Bernese Oberland to Königsberg to work as an agrarian engineer, ending up marrying a local girl.

A couple of centuries earlier another Swiss—Leonhard Euler from Basel—had also emigrated to Königsberg. Downtown, the river Pregel forms two large islands and seven bridges cross the river. In 1735 Euler posed himself the problem of whether it is possible to follow a path that crosses each bridge exactly once and returns to the starting point. In the process of tackling this problem, Euler invented graph theory.

Back in February 1996, I was given a week to write a blurb on what a company could do in terms of technology to monetize the freshly commercialized Internet. You can make money by making people more efficient, which can be achieved by allowing them to do their own work, or by allowing them to do new more valuable work. My employer lost interest and published the blurb, another company picked up the idea and ran with it.

The idea was that the World Wide Web is based on hypertext and therefore is a graph. You add value by adding weights to the edges of the digraph (Fig. 4 on p. 16). Once you have added the weights, you can build the minimum spanning tree (Fig. 5 on the following page). While it is very easy to get lost in a graph that is not minimally connected, resulting in tedium and frustration, in a tree there is a root (you can always go back to the beginning) and between any two nodes there is just one path. As opposed to a list, you can categorize the nodes by putting them in the same subtree, thus reducing the cognitive load.

At the time, I thought that you would have many instances of the graph, each one with different weights representing different criteria, like reading difficulty, subject matter, detail level, etc. Each one would yield a different minimum spanning tree and depending on what you would want to accomplish, you use one or the other.

In 1998 Brin and Page published their paper on "The Anatomy of a Large-Scale Hypertextual Web Search Engine." Pregel is Google's system for large-scale graph processing. There is an open-source counterpart to Pregel in Apache Giraph, although you would not expect to encounter a giraffe on the Pregel in Königsberg, not even in the zoo opened in 1896.

The following decade, as an associate editor for Optical Engineering, I often read papers where data from various frequency domains was fused to obtain much more useful images from remote sensing. Also in my activities at the Interactive Multimodal Information Management (IM2) Competence Center I learned that when dealing with hidden Markov models, by fusing various modes you could build much more powerful models.

Therefore, forget simple weighted digraphs. What you want is a multiplex network and you want to look at the trajectories of ergodic random walkers on multiplex networks. If you thing this is difficult, go back to the Baltic Sea where at Umeå University on the Gulf of Bothnia (more romantically, Bottnischer Meerbusen in German) they have developed Infomap, which makes it a child's play to reveal community structure by taking random walks.

detail of a visualization created with Infomap

Compared with conventional network analysis, multiplex Infomap applied to multilayer networks uncovers interplay between network layers and reveals smaller modules with more overlap that better capture the actual organization. Shoehorning multiplex networks into conventional community-detection algorithms can obscure important structural information, and earlier attempts at generalizing conventional community detection methods to identify communities across layers have proven problematic. In contrast, thanks to the map equation’s intrinsic ability to capture that sets of nodes across layers represent the very same physical objects in multiplex networks, the framework generalizes straightforwardly.