Book Review: ‘Fundamentals of Data Visualization’ by Claus O. Wilke
About the Author:
Claus O. Wilke is a computational and evolutionary biologist and chair of the Department of Integrative Biology at the University of Texas at Austin, where he studies the evolution of molecules and viruses using theoretical and computational methods. He is also the author of the cowplot and ggridges plotting packages for R.
A whole book about graphs? But why?!
To begin with, let’s just forget for a moment that the book is full of beautiful charts that are a pleasure to look at. I have been working in Data Science for years now, and it is common knowledge that the field lies at the intersection of four disciplines: statistics, computer science, machine learning and visualization (if you are unfamiliar with this concept, I recommend reading some of our earlier blog posts).
However, it is the last of these disciplines that is often forgotten. Trust me, I know what I’m talking about: I graduated in statistics and currently work at an IT company that vehemently supports education in IT above all. Obviously, machine learning is a big passion of mine. But gosh, what about visualization?!
When this book was published two months ago, there was a lot of praise going around. That’s why I recommended it to Profinit’s internal library and got my hands on it as soon as it arrived. It was definitely worth it!
Part I: The ZOO
The first part of the book is a more or less theoretical introduction to how data are mapped onto different aesthetics (x, y, colour, size, transparency, etc.). If you have ever read ‘The Grammar of Graphics’ (by Leland Wilkinson) or if you use the ggplot2 package for your visualizations (in which Hadley Wickham adapted the same concept for R), you will feel very much at home.
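For readers who have not met the concept before, the idea of mapping data onto x, y, colour and so on can be sketched in a few lines. The snippet below is only a toy illustration in plain Python, not code from the book and not the ggplot2 API: the function names (`linear_scale`, `discrete_scale`, `map_aesthetics`), the sample records, the pixel ranges and the hex colours are all made up for this example.

```python
# Toy illustration of aesthetic mapping: every data column is assigned to a
# visual channel, and a "scale" converts data values into channel values.
# All names and numbers here are hypothetical.

def linear_scale(domain, output_range):
    """Return a function that maps `domain` linearly onto `output_range`."""
    d0, d1 = domain
    r0, r1 = output_range

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


def discrete_scale(categories, palette):
    """Return a function that maps each category to one palette entry."""
    lookup = dict(zip(categories, palette))

    def scale(value):
        return lookup[value]

    return scale


# Made-up data: one record per observation.
records = [
    {"month": 1, "temperature": 0.0, "city": "Prague"},
    {"month": 7, "temperature": 20.0, "city": "Prague"},
    {"month": 12, "temperature": 10.0, "city": "Austin"},
]

# The mapping itself: aesthetic -> (data column, scale).
scales = {
    "x": ("month", linear_scale((1, 12), (0, 110))),
    "y": ("temperature", linear_scale((0, 20), (0, 100))),
    "colour": ("city", discrete_scale(["Prague", "Austin"], ["#0072B2", "#D55E00"])),
}


def map_aesthetics(record, scales):
    """Convert one data record into the visual properties of one point."""
    return {aes: scale(record[column]) for aes, (column, scale) in scales.items()}


points = [map_aesthetics(record, scales) for record in records]

for point in points:
    print(point)
```

Swapping which column feeds which aesthetic (for example, putting `city` on x and `month` on colour) changes the chart without touching the data, which is the kind of flexibility the grammar-of-graphics approach is built around.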
Most of Part I, however, could be described as a ‘Zoo’, containing perhaps all conceivable (certainly all reasonably applicable) charts. Moreover, in my opinion, the metaphor of a zoo is also apt for describing its varied readership.
The first group will be newcomers to the industry, who keep running from cage to cage, from paddock to paddock, and are super enthusiastic about the beauty and strangeness of the creatures presented to them.
The second group might consist of curious schoolchildren, who start to realize that the theory from biology lessons is actually grounded in reality. They are already excited about using the newly acquired knowledge in their next school project.
The third group is the true nature lovers, who spend long hours carefully studying information boards and observing the species on display. They aim to refine their knowledge about nature so that they can accurately pass this information on to anyone who might ask.
Last but not least are the artists, who return again and again to the same bench in front of their favourite cage in order to perfectly capture the right shade and light of those colourful wings in their sketches.
Not to mention all the other zoo visitors. A locksmith, a florist, or a chef will each look at the caged birds from a different perspective. In short, there is something for everyone.
Parts II and III: When two things aren’t equal
Originally, I wanted to start this section by citing another piece of work:
All that is gold does not glitter,
Not all those who wander are lost.
– J. R. R. Tolkien
But this Czech proverb will do just as well: “When two people do the same thing, it’s not the same.” This is also true of charts. How many times have you rolled your eyes after opening the newspaper *) because the infographics shown were over the top, simply ugly, or, worse, misleading and nonsensical? This happens to me all the time.
Parts II and III of the book contain tips on how to present your data so that the reader grasps the message (and ideally does not suffer aesthetic shock). How do you resist the temptation to use overly complicated charts and instead tell the story with simpler yet fitting illustrations? What do you need to do to respect the needs of readers with colour vision deficiency (which, according to estimates, affects 8% of men and 0.5% of women)? Which colour scale suits which occasion, and how do you make sure that the key information does not get lost in all that rainbow madness? And much more…
*) By the way, at this point other groups of people who should read this book come to mind, e.g., (data) journalists, BI specialists, etc.
For the Data gourmets – GitHub version with source code
If you are still hesitant about giving the book a chance, its electronic version is freely available here.
A concluding remark: the book was written using the bookdown package for R (by Yihui Xie), and its source code is available on GitHub. This means that, within the limits of the licence, you can adjust it to your liking and also take a closer look at how each chart was created. The book itself does not contain the source code for each visualization; the author says that such technical details would only distract readers from the bigger picture.