Graph libraries – D3.js

One of the best-known visualization frameworks is D3.js. It was written by Mike Bostock, the data visualization superstar of whom Edward Tufte said that he “will become one of the most important people for the future of data visualisation” (according to Wikipedia). And who am I to object to Edward Tufte?

At its core, D3 is an SVG-handling library around which a number of so-called layouts for many different types of diagrams are grouped. One of them is the force-directed graph layout, which is exactly what we will use here.

How does it look?

This is a real physical simulation (although with some shortcuts to reduce calculation time), which means that the arrangement of the nodes is a new one every time the page is reloaded, and dragging one node around makes the others wobble. One of the shortcuts is that the repulsive force is only active between connected nodes, so dangling ends can cross edges in the rest of the graph. Nevertheless, playing around with the mesh can be fun for some time. The complete source code can be found in the GitHub repository.

How does it work?

Once again it all starts with including the libraries:

Then we add some style for nodes and text:

Next we define some variables:

The first line defines the width and height of the SVG element; the second line defines n, the number of iterations to run (I’ll explain later), along with the global variables force, svg and drag.

Next we need to define the nodes and edges or links. For the sake of simplicity I’ll do that once again inline. In a real application you would load the dataset in the background:

Then I wrapped up the whole generation and execution of the graph into a function called performGraph():

Lines 2 to 7 set up the force layout engine with some physical parameters, the width and height and an event handler for the ticks. Ticks are D3’s simulation steps. For every tick, a function is called and the positions of the elements are recalculated. We’ll discuss the tick function further down.

Then we define a drag event listener in lines 8 to 10, binding its events to two functions called dragstart and dragend. More on those later. Lines 11 to 14 show how to manipulate the DOM using D3 by adding an SVG element with some attributes to the body.

Lines 15 and 16 handle one of the (for me) most counterintuitive features of D3: we select DOM elements by class that aren’t there yet. Line 17 attaches the nodes and links datasets to the force layout. Line 18 kicks off the physical simulation, line 22 stops it. The for-loop in between, counting down from 10000 (the n variable, remember?), calls the force.tick() function 10000 times. We could also have started the simulation and let it run indefinitely, but this start-stop handling gives a steady graph instead of one that always wiggles around a bit.
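To illustrate the start, tick n times, stop pattern outside of D3, here is a minimal toy simulation in Python. It is not D3’s algorithm: the force constants, the step limit and the link list are all assumptions made up for this sketch. It only shows why running a fixed number of ticks up front yields a layout that then stays put.

```python
import math
import random


def run_layout(nodes, links, ticks=300, seed=None):
    """Run a toy force simulation for a fixed number of ticks, then stop.

    NOT D3's implementation: all constants here are hypothetical.
    """
    rng = random.Random(seed)
    # Random start positions: without a seed every run looks different.
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}

    for _ in range(ticks):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 0.01)
                push = 0.01 / (dist * dist)
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Spring attraction along each link towards a rest length of 1.
        for a, b in links:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = max(math.hypot(dx, dy), 0.01)
            pull = 0.05 * (dist - 1.0)
            force[a][0] += pull * dx / dist
            force[a][1] += pull * dy / dist
            force[b][0] -= pull * dx / dist
            force[b][1] -= pull * dy / dist
        # Move each node by a bounded step so the toy model cannot explode.
        for n in nodes:
            pos[n][0] += max(-0.1, min(0.1, force[n][0]))
            pos[n][1] += max(-0.1, min(0.1, force[n][1]))
    # After the loop nothing moves any more: the layout is frozen.
    return pos


# Hypothetical six-node graph, used only for this illustration.
nodes = [1, 2, 3, 4, 5, 6]
links = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6)]
layout = run_layout(nodes, links, ticks=300)
```

Passing a fixed seed makes the start positions, and therefore the result, reproducible; leaving it out mimics the “new arrangement on every reload” behaviour.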

Lines 23 to 44 define what to do with the links by appending a g (for group) SVG element to the SVG canvas and adding a line to the group. Lines 45 to 53 do something similar with the nodes, adding a group and a transformation function. In lines 54 to 62 we add an SVG circle element using the nodecolor attribute from the nodes dataset. Lines 63 to 71 finally add the text label to the nodes. Line 72 eventually binds the nodes to the drag event object to allow nodes to be dragged with the mouse.

Now we need to define all the helper functions bound to event listeners in the code above. First the tick() function to do the simulation steps:

And the dragging functions:

The last thing to do: kick it to action by calling performGraph():

Why exactly did I put all the code into a function just to call it at the end? In a real application you would have some additional GUI elements, for example to switch different types of nodes or labels on and off. Every time this is done we need to recalculate the graph. Now this can be done with one function call.

Innovation means breaking the rules

Today I paged through a book by Martin Gaedt called “Rock your idea”. Despite the English title, the book is written in German. One of the first and central ideas presented there was:

Innovation means to break the rules.

I’ve been a scientific and innovation enabler for several projects and companies now, and this is nearly all you need to know about injecting new ideas or technology into companies.

If you want to work out a totally different and optimal solution for a given problem, or even want to create a new area of business, you have to break the rules.

  • The rules defining what is possible. Maybe someone just invented the technology needed to build your idea. You might simply not know it yet.
  • The rules about what we think customers need or want. Ever heard the argument “None of our customers asked for that! Why should we build that?”
  • The rules of the existing business. Ever heard that your idea doesn’t fit into the portfolio of the company? WHAT THE HECK? The portfolio is where the money is, for God’s sake!
  • And sometimes even the rules of hierarchy. Keeping an innovator ‘on a short leash’ often means preventing innovation. Sometimes managerial control is the worst thing you can do to your company.

So, to reformulate the negative arguments as positive, actionable items:

  • When you have a new idea, don’t instantly think about existing technology to build it. When you need it, there will be a solution. There always was one when I needed it. Finding technical solutions is the job of people like me.
  • When you don’t know whether customers will buy into your idea, TALK TO THEM. Sometimes you will get a pilot customer on board for an idea that is nothing more than that: an idea. Sometimes you will have to invest a small amount of money and time in a prototype or mockup to win a customer’s interest.
  • If you only look for ideas that fit into an existing portfolio, I’m sorry to tell you that your company will have no chance of surviving a future disruptive change in its very own field of business. There will be someone else with a similar idea and no reluctance to realize it, portfolio or not.
  • If you have people working for you on innovative ideas, trust them. At least a bit further than you feel comfortable with. Micromanaging people is a great way to make them look for another company that offers a broader field of work. If you expect your researchers to follow working instructions to the letter, you’re missing the point entirely.

Graph libraries – Cytoscape.js with Cola.js

Now it’s time for my favourite graph visualization framework. As an introduction I cite from the Cytoscape website:

Cytoscape is an open source software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data.

Cytoscape.js, a side project, is a JavaScript library designed for building graph-theory-related data evaluation and visualization applications on the web. It comes with some layout renderers but is open to other visualization libraries such as Cola.js. Cola is a constraint-based layout library with some very interesting features. In this context, we will use it only as a layout engine for the graph.

How does it look?

When playing around with this graph you will probably notice two things:

  1. When dragging a single node around, all the other nodes of the graph remain calm and don’t wiggle around as in most physics-based layouts.
  2. When reloading this page you will always see the same arrangement of nodes. The graph is stable across reloads.

These two features result from the way Cola.js lays out the graph and are very useful for real-world applications. While wobbling and dragging nodes around is great fun, in a real application you would always want to see the same graph when reloading the same data. And manipulating the arrangement should not affect nodes you haven’t touched. The complete source code can be found in the GitHub repository.

How does it work?

Once again we start by including the stuff needed: the two libraries and an adaptor to use Cola.js from within Cytoscape.js:

The adaptor can also be found on GitHub.

The HTML body contains a div to render the graph into:

The remainder of the index.html file consists of a script:

Lines 2 to 7 define the nodes with their name/caption and colour. Lines 8 to 54 define the edges in a similar way. Lines 56 to 70 define the Cytoscape container and attach dynamic properties to the nodes and edges. Finally, line 71 kicks off the layout rendering. Note: up to that last line we specified no rendering or geometry information at all! By changing the name of the layout renderer from cola to one of the built-ins (circle or cose are fun to play with!) you can get a completely different output.

As a last remark, I strongly recommend having a look at the graph-theoretical and practical routines and algorithms the Cytoscape.js library has to offer.
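The separation between graph data and geometry can be sketched in a few lines of Python. This is not how Cytoscape.js implements its circle layout; the function, the node ids and the small layout registry are hypothetical and only show that positions can be derived entirely from the data by a layout chosen by name.

```python
import math


def circle_layout(node_ids, radius=100.0, center=(0.0, 0.0)):
    """Place nodes evenly on a circle; edges play no role in this layout."""
    count = len(node_ids)
    positions = {}
    for index, node_id in enumerate(node_ids):
        angle = 2.0 * math.pi * index / count
        positions[node_id] = (
            center[0] + radius * math.cos(angle),
            center[1] + radius * math.sin(angle),
        )
    return positions


# Graph data only: no coordinates anywhere (node ids are hypothetical).
node_ids = ["n1", "n2", "n3", "n4", "n5", "n6"]

# Hypothetical registry: swapping the layout means changing one name.
LAYOUTS = {"circle": circle_layout}
positions = LAYOUTS["circle"](node_ids)
```

Adding another entry to the registry would change the picture without touching the node and edge definitions, which is the point of the remark above.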

Graph libraries – Arbor.js

Arbor.js was written by Christian Swinehart of Samizdat Drafting Co. The documentation is concise, even though the main page navigation, an animated tree layout, always drives me crazy :) The example I present here only depends on arbor.js and jQuery. For brevity and clarity, the graph is defined by calls to the addNode and addEdge methods of the ParticleSystem.

How does it look?

As you can see, the physical simulation is much more fluid and undamped than with Alchemy.js. I’m pretty sure this can be changed to some degree with a different initialization of the particle system, as described in the reference documentation. Once again, the complete source code, in this case consisting only of an index.html file, can be found in the GitHub repository.

How does it work?

Once again in the header we start by loading the stuff needed:

arbor-tween.js is not really needed for this example, since it only provides gradual transitions in the data objects of nodes and edges. Arbor.js renders its graph in an HTML5 canvas by default, so we need to define one in the body:

The remainder of the file consists of a function defining and binding the renderer object to the canvas and constructing the graph:

Lines 2 to 73 define the Renderer, in detail:

  • get the HTML5 canvas (line 3)
  • grab its 2D context (line 4)
  • an init function for the particle system (lines 7 to 12)
  • a redraw function actually drawing the graph (lines 13 to 33) by iterating over all nodes and edges
  • defining a pretty basic mouse handling (lines 34 to 69)

In line 75 we get a particle system, set the rendering to the canvas in line 76 and define nodes (lines 76 to 81) and edges (lines 82 to 87). That’s pretty much all.

Graph libraries – Alchemy.js

Alchemy.js is a library developed by graphAlchemist (site defunct). The documentation could be better and more detailed, but it works. Like many other libraries, it leverages other libraries to do “the basic stuff”. In this case it depends on:

Alchemy uses GraphJSON (similar to GeoJSON) as its input format, which is quite nice but undocumented since the website doesn’t exist any more. It looks as if support for it ceased with the graphAlchemist web site …

How does it look?

Like all the demos I’ll show, it’s a fully functional mini demo, so you can drag the nodes around. I’ll present excerpts of the code to explain. The complete code can be found in a GitHub repository.

How does it work?

Our example consists of an index.html file, a GraphJSON data file and the library stuff. In the index.html we first load the stuff needed:

The body contains a div to put the graph in and the include for the base library itself:

Finally there’s the code producing the graph:

As you can see, nearly everything is done in a configuration object. Line 2 defines the source of the GraphJSON file, lines 3 & 4 set the dimensions of the SVG element. The setting of nodeCaptionsOnByDefault in line 6 switches on the display of the node titles. Otherwise they would only show up when hovering with the mouse. Lines 7 to 9 define a callback function defining what to display as a caption / title. Lines 10 to 15 define valid values for grouping edges and nodes. Nodes are grouped by the role attribute, edges by caption. Lines 16 to 37 define how these groups of edges and nodes should be formatted. There are other, more direct ways to style a graph in Alchemy.js but I wanted to show the (for me) most elegant way. Finally line 39 kicks off the rendering of the graph with the given configuration.

Graph libraries – Introduction

This post starts a small series of articles that will present some JavaScript graph plotting libraries and show how to draw more or less the same simple demo graph with each of them.

What is a graph?

Graph theory is a mathematical discipline that studies networks of nodes and links or edges connecting them. Graphs are useful for many applications: social media networks, streets connecting cities, transfers connecting bank accounts or networks of people, addresses, bank accounts and cars connected via insurance claims and events. We’ll deal with some of those applications later on when discussing the application of network and graph paradigms to visualizing fraudulent actions.

Which libraries have I examined?

The internet is full, I mean literally FULL, of libraries to display networked graphs. I had to choose some and discard others. First I decided to ignore commercial libraries that don’t have an open source edition. Then I stopped evaluating a library as soon as I realized that its documentation was too poor or its functionality too limited to investigate further. So here is the list in alphabetical order:

Libraries I didn’t evaluate due to licensing constraints:

Libraries I gave up on:

  • Dracula Graph Library (Graph Dracula … you understand … haha)
    Needs a raphael.js renderer to display custom node shapes. Too much fuss for a very short test.
  • sigma.js
    Poorly documented. I didn’t manage to build a test candidate in a short (!) time.

How did I test?

I always used the same test network. It consists of 6 nodes and 6 edges. Some edges are red and one node is green. The nodes are labelled “Node 1” to “Node 6”. Colouring and labelling are used to test how easy it is to add some basic customization to the layout.

On the right side you can see a hand-drawn (well, not really hand-drawn) picture of the demo network. The examples I present are fully functional, not screenshots, and I will briefly go through the source code.
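As a library-neutral illustration, the test network could be written down as plain data, here as a minimal Python sketch. Only the counts, the labels and the existence of red edges and one green node come from the description above; which node is green, which edges exist, which of them are red, and the default colours are assumptions.

```python
# Demo network: 6 nodes, 6 edges, one green node, some red edges.
# Which node is green and which edges exist or are red is an assumption.
nodes = [
    {"id": i, "label": f"Node {i}", "color": "green" if i == 1 else "grey"}
    for i in range(1, 7)
]
edges = [
    {"source": 1, "target": 2, "color": "red"},
    {"source": 1, "target": 3, "color": "red"},
    {"source": 2, "target": 3, "color": "black"},
    {"source": 3, "target": 4, "color": "black"},
    {"source": 4, "target": 5, "color": "black"},
    {"source": 4, "target": 6, "color": "black"},
]


def degree(node_id):
    """Count the edges touching a node (the graph is treated as undirected)."""
    return sum(node_id in (e["source"], e["target"]) for e in edges)


for node in nodes:
    print(node["label"], node["color"], "degree:", degree(node["id"]))
```

Each library in this series needs essentially this information, just in its own input format.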


The first part for Alchemy.js can be found here.

The second post on Arbor.js is here.

The third part with Cytoscape.js and Cola.js is here.

Lean systems as reincarnation of large ones


Some software systems are designed to process massive amounts of data in a very short time: banking systems, fraud detection, billing systems. Let’s pick one I worked on for a long time: billing systems (for telecom or internet providers, for example).

Most of those systems are very large and complex, designed to bill millions of customers per month. Some examples are Kenan Arbor (bought by Lucent) or Amdocs. Since these systems need to process vast amounts of data very fast, they are built as compiled binaries. Binary software is not easily customizable, so most of these systems are made widely customizable via configuration files. Taking into account all the options possible in dynamic configurations results in even more compiled code. I think you get the idea.

What actually is a billing system?

To give an impression which steps are required in a typical billing scenario, here is a short non-exhaustive list:

  1. Preprocessing
    Collect billable items from external systems, translate or reformat the data and put it in a database.
    Additionally, for mobile telecom billing: import GSM TAP3 (Transferred Account Procedure version 3) roaming data.
  2. First step: Rating
    Put a “price tag” on every billable item for the billing period in question.
  3. Second step: Billing
    Collect rated items as invoice items per customer.
  4. Third step: Invoicing
    Create invoices, on paper or digitally.
  5. Fourth step: Payment
    Withdraw money via the saved payment option per customer.
    Alternatively: subtract the invoice total from a prepaid deposit.
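The steps above can be sketched as a small pipeline in Python. This is only an illustration under made-up assumptions: the record fields, the price list and all function names are hypothetical, the database is replaced by in-memory lists, and the TAP3 import is left out.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical price list: price per unit for each kind of billable item.
PRICES = {"call_minute": Decimal("0.10"), "sms": Decimal("0.05")}


def preprocess(raw_records):
    """Preprocessing: normalise raw records from external systems."""
    return [
        {"customer": r["cust"], "kind": r["type"], "units": int(r["qty"])}
        for r in raw_records
    ]


def rate(items):
    """Rating: put a price tag on every billable item."""
    return [{**i, "amount": PRICES[i["kind"]] * i["units"]} for i in items]


def bill(rated_items):
    """Billing: collect rated items as invoice items per customer."""
    per_customer = defaultdict(list)
    for item in rated_items:
        per_customer[item["customer"]].append(item)
    return dict(per_customer)


def invoice(per_customer):
    """Invoicing: create one invoice with a total per customer."""
    return {
        customer: {"items": items, "total": sum(i["amount"] for i in items)}
        for customer, items in per_customer.items()
    }


def pay_from_deposit(deposit, invoice_total):
    """Payment: subtract the invoice total from a prepaid deposit."""
    return deposit - invoice_total


# Made-up input records as they might arrive from an external system.
raw = [
    {"cust": "A", "type": "call_minute", "qty": "12"},
    {"cust": "A", "type": "sms", "qty": "3"},
    {"cust": "B", "type": "sms", "qty": "10"},
]
invoices = invoice(bill(rate(preprocess(raw))))
```

Decimal is used instead of float so that amounts of money do not pick up binary rounding errors.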

The problem

Any of these steps could be “special” for any customer. Think of a subscription service. Every customer pays $5 per month. But at one point the company had an introductory offer of 20% off for the first year. So some customers are not only paying just $4, they also switch to $5 after 12 months. Now take into account that a typical mobile telecom company has something like 20-30 different contract types or rates. Wow, lots of options. That is not a problem for a multi-million dollar company, but it is for smaller companies with, let’s say, 100 to 100,000 customers.
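To make the introductory offer concrete, here is a minimal Python sketch of that single rule. The function and constant names are hypothetical; the numbers are the ones from the example ($5 per month, 20% off during the first 12 months).

```python
from decimal import Decimal

BASE_PRICE = Decimal("5.00")      # regular monthly price from the example
INTRO_DISCOUNT = Decimal("0.20")  # 20% off ...
INTRO_MONTHS = 12                 # ... during the first year


def monthly_price(contract_month, has_intro_offer):
    """Price for a given contract month (1 = first month of the contract)."""
    if has_intro_offer and contract_month <= INTRO_MONTHS:
        return BASE_PRICE * (1 - INTRO_DISCOUNT)
    return BASE_PRICE


for month in (1, 12, 13):
    print(month, monthly_price(month, has_intro_offer=True))
```

One such rule is trivial. The trouble starts when 20 to 30 contract types each bring a handful of rules like this one.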

A solution?

Now what if a billing system were implemented in a scripting language? Admittedly it would be a bit slower (would it? I don’t really know) than a solution in C or C++. But it could be customized very quickly and flexibly, if well documented (we developers love to write documentation, don’t we?). Management summary dashboards would also be much more flexible than prebuilt solutions like QlikView (which would also cost additional license fees).

I could envision, for example, a solution in Python. This way it would be fast compared to some other scripting languages and could leverage the massive number of financial and mathematical software components. Build an administration and dashboard component with Flask or Django and run Python scripts on a PostgreSQL database. If more speed is needed you could switch to an Apache Spark architecture, which would also be scriptable in Python via PySpark.

Start on a small budget but don’t let decisions limit your options!

Assholes and code of conduct manifests

Recently someone pointed out that the perceived number of assholes is rising. This means that anti-social behavior is encountered more often. I responded that there seems to be some sort of sociological mechanism that makes anti-social behavior more accepted than before.

Someone else asked what that mechanism might be, so here is my amateurish point of view.

Social interaction is often accomplished through communication, so examining forms of anti-social or disruptive communicative behavior might help to clarify some points. One form of disruptive communication is so-called “interactive vandalism” (Anthony Giddens, Sociology). Giddens points out that effective communication or interaction is based on cooperative behavior by the participants. If one party to an interaction deliberately behaves in a non-cooperative way, the other participants often perceive this as an aggressive attitude. But this is a stylistic device, not an explanation.

Another perspective is that of Erving Goffman’s “The Presentation of Self in Everyday Life”: people behave as if they were acting. And as in a theater, there is a stage and a backstage area. In the front region (stage) they act mostly according to common-sense rules. In the back region (backstage) they can “give vent to feelings and styles of behavior they keep in check when on stage”. So acting more anti-social might mean transferring behavioral patterns from back to front.

In traditional social settings this would in general have been completely unacceptable behavior, so why is it not judged that way now? There is a general process at work transferring the private into the public: reality shows on TV, social media, the liberalization of professional situations. Don’t get me wrong: I don’t judge those processes, but if they are not backed by a so-called “good education”, things can go wrong unnoticed. If this happens and some sort of “invisible control” doesn’t come into effect, openly visible regulation is a way to prevent unwanted situations.

An area where this currently happens is public conferences. More and more conference hosts issue code of conduct manifests. I got into some serious discussions because I objected that these rules are pretty obvious and stating them so explicitly might be sort of embarrassing for “well-behaved visitors”. Conference organizers assured me that these rules are not so obvious anymore. Maybe in the future we’ll see explicitly stated rules for human interaction more often. While being liberated from unnecessarily rigid forms of social behavior is a good thing, this feels like a cultural loss to me. But then again, I just might be getting old.

Science and technology links [22-05-2016]


Here are some interesting links connecting technology and science.

Open Scholar

Open Scholar is an open-source content management system based on Drupal. This means it’s written in PHP. It is developed at Harvard University at the Institute for Quantitative Social Science.

Open Science Framework

OSF is a management system for scientific projects. It can be used as a hosted service right at the site or self-hosted. OSF is written in Python and developed by the Center for Open Science.

The Dataverse Project

The Dataverse Project is research data repository software. At the time of this writing there are 17 interconnected installations around the world. The software is written in Java and is developed by the aforementioned Institute for Quantitative Social Science at Harvard.


Figshare

Figshare is a repository for research output (mainly publications or papers). It’s a closed-source hosted service developed by Digital Science which also developed Overleaf. More on scientific writing solutions in a later post.

How to read whatever you want into a statistic

It’s a commonplace that you can read nearly any result you want into a statistic. You just have to optimize your mathematical model or cut short the reasoning about the data. There is currently a JAMA publication from the American Medical Association that is cited in many magazines and newspapers (even in the German Spiegel) as “22% less risk of colorectal cancer for vegetarians”. No ordinary reader of these reports will look at the original numbers in the publication, since it is not freely available (yet another reason for open publication …). But here they are:

  • Vegetarian participants: 40367
    Cancer cases: 252
  • Nonvegetarian participants: 37292
    Cancer cases: 238

This makes for the following relative case numbers:

  • Vegetarian: 0.624 / 100 participants
  • Nonvegetarian: 0.638 / 100 participants

Or a difference of 0.014 cases per 100 people. This means that if you eat meat, your risk of coming down with a form of colorectal cancer increases by 0.014 percentage points. This reads quite differently, doesn’t it?
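For anyone who wants to check the arithmetic, here is a minimal Python sketch. It computes only crude rates from the four numbers listed above and makes no assumption about whatever statistical adjustment the publication itself applies; the function and variable names are made up for this example.

```python
def rate_per_100(cases, participants):
    """Cases per 100 participants (crude, unadjusted)."""
    return 100.0 * cases / participants


vegetarian = rate_per_100(252, 40367)
nonvegetarian = rate_per_100(238, 37292)

# Absolute difference in percentage points.
absolute_difference = nonvegetarian - vegetarian
# Crude relative difference, as a fraction of the nonvegetarian rate.
relative_difference = absolute_difference / nonvegetarian

print(f"vegetarian:          {vegetarian:.3f} per 100")
print(f"nonvegetarian:       {nonvegetarian:.3f} per 100")
print(f"absolute difference: {absolute_difference:.3f} percentage points")
print(f"relative difference: {relative_difference:.1%}")
```

The same raw counts yield very different-sounding numbers depending on whether the absolute or the relative difference is reported.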