Separating structure and semantics

There is a great and simple rule, known as “Micha’s Golden Rule”, which goes back to Micha Gorelick (@mynameisfiber). It states:

Do not store data in the keys of a JSON blob.

This means, that instead of writing a JSON dataset holding people and their gaming scores like this:

{
  "Volker": 100,
  "John": 300
}

you should use something like:

[
  {
    "name": "Volker",
    "score": 100
  },
  {
    "name": "John",
    "score": 300
  }
]

First of all, it is good practice, to separate data and its meaning. Second it simplifies software development. And here is why:

One reason is, that in the first form you have no idea, what exactly the number associated with the name means. And when accessing the data you need to know the keys. But the keys are part of the data. So you first have to parse the whole file, separate the keys and iterate over them. In the second case you can iterate over a set of completely identical structured data sets and fetch names and scores.

This rule not only holds true for JSON but for any structured data like XML or yaml. Consider the following modified XML example from the SimpleXML section of the PHP manual:



 
  PHP: Behind the Parser
  Rasmus Lerdorf
  1

In PHP you would access the director in this way:

movie[0]->director;
?>

Now if you would like to use the directors name as a key to get the number of oscars he won, it would look like:



 
  PHP: Behind the Parser
  1

This is perfectly valid but stupid XML. And to access the data you need to know the name of the director:

movie[0]->RasmusLerdorf;
?>

Doesn’t make too much sense, hm? One additional drawback I didn’t mention but that you nevertheless saw: the keys of a data structure language often are subject to several limitaions. In XML element names e.g. there can’t be spaces. So you have to work around that e.g. by camelcasing the name. To get the name back in readable form, you would have to parse it and insert spaces at the correct positions. Which can be impossible with human names, since there are camelcased names like “DeLorean”.

Considering this rule is not always obvious but can save you a lot of nerves. Take care!

Technology scout

Finding a way through

Separating structure and semantics

Leave a Reply Cancel reply