swatinem.de – die Welt ist nicht gerecht (the world is not fair) – The deranged thoughts of a tech junkie

29.05.2015 about virtual dom

Well, the concept of having a virtual dom – a DOM that is represented as a tree of plain (arbitrarily complex) JS objects instead of real HTML elements – has gotten a lot of traction recently. React has become more and more popular, with React Native bringing the same ideas to native iOS app development. And of course there are alternatives left and right, wherever you want to look.

I actually wrote a virtualdom library some time ago. And more recently I used mercury for a very interactive app, but I abandoned it in favor of traditional DOM manipulation, mainly for performance reasons that I will rant about here.

But nevertheless I really like the idea of having a virtual dom, and I might come back to using one of the many libraries out there sooner or later. Or decide to write my own once again.

This post will be a summary of my experiences with libraries of this sort, the ideas I have for them, what use cases I want to have covered, and how I think the giant goal of performance can be achieved.

Performance

Speaking of performance, the virtual dom approach is always advertised as being extremely fast. But actually, this is simply not true. Or well, the right answer is: it depends.

It sure is faster than concatenating strings and using innerHTML to replace your previous element. But it is a lot slower than a bunch of .className = and .data = (for TextNodes).

If you want raw speed, there is nothing faster than those atomic operations like setting a class or changing the text content of a certain element. With HTML5, you will even want to use the template tag and do those operations on its content before you importNode it into your actual document.

But you honestly don’t want to do this. Why? Because it couples your update logic extremely tightly to your template. Wanna move your element around? Wanna add another level of nesting? Well you just broke your page. Until you remember you also have to update your data-updating function.

At work, we currently use a system similar to what I described. But instead of hardcoding a path on which to set the textContent, we use querySelector with some hardcoded class names. Which kind of exposes our class names as a public API, and still has the same problem of what happens if that element is not even included in the template. Also, we have no good solution to the problem of structural changes.
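Just to make the coupling concrete, here is a rough sketch of that template-plus-querySelector approach (the markup, class names and the renderRow helper are made up for illustration, not our actual code):

    // build a <template> holding the markup (normally it would live in the HTML)
    const tpl = document.createElement('template');
    tpl.innerHTML = '<li><span class="name"></span> <span class="status"></span></li>';

    function renderRow(data) {
      // set the values on the template content first...
      tpl.content.querySelector('.name').textContent = data.name;
      tpl.content.querySelector('.status').className = 'status ' + data.status;
      // ...then import a copy into the live document
      return document.importNode(tpl.content, true);
    }

    document.body.appendChild(renderRow({ name: 'foo', status: 'active' }));

Fast, sure. But move the span around or rename the class and renderRow breaks without any warning.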

Structural changes vs content changes

There are actually two kinds of templating changes that you might want to take care of. I call them structural and content changes.

Structural changes might mutate the complete DOM tree of an element, depending on some kind of state you want to represent. You want to add or remove some child element, or even display a completely different subtree. For that use case, a virtual dom based templating system is actually the most convenient thing you might use. But for content changes, where the only thing that actually changes is a className or a textContent, a virtual dom is actually overkill. Performance-wise.

Performance, again

So let's dig deeper into what actually happens with all those libraries. Essentially, the naïve approach is to recreate your whole dom tree as a virtual dom tree consisting of nested js objects. Then the library will diff that against your previous tree and return a set of patches, which are then applied. Essentially, those patches are commands like move node A, change textContent of node B, or things like that.
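A minimal sketch of that cycle, not modeled on any concrete library (the h and diff helpers and the patch format are invented for illustration):

    // a virtual node is just a plain (arbitrarily nested) js object
    function h(tag, props, children) {
      return { tag, props: props || {}, children: children || [] };
    }

    // diff two virtual trees into a flat list of patch commands
    function diff(oldNode, newNode, path) {
      path = path || [];
      if (oldNode === undefined) return [{ type: 'CREATE', path, node: newNode }];
      if (newNode === undefined) return [{ type: 'REMOVE', path }];
      if (typeof oldNode === 'string' || typeof newNode === 'string') {
        return oldNode === newNode ? [] : [{ type: 'REPLACE', path, node: newNode }];
      }
      if (oldNode.tag !== newNode.tag) return [{ type: 'REPLACE', path, node: newNode }];

      const patches = [];
      const keys = new Set([...Object.keys(oldNode.props), ...Object.keys(newNode.props)]);
      for (const key of keys) {
        if (oldNode.props[key] !== newNode.props[key]) {
          patches.push({ type: 'PROPS', path, key, value: newNode.props[key] });
        }
      }
      // naïve: children are compared by index, no keyed reordering
      const len = Math.max(oldNode.children.length, newNode.children.length);
      for (let i = 0; i < len; i++) {
        patches.push(...diff(oldNode.children[i], newNode.children[i], path.concat(i)));
      }
      return patches;
    }

    // every state change re-creates the whole virtual tree, then diffs it
    const before = h('div', { className: 'item' }, ['hello']);
    const after = h('div', { className: 'item active' }, ['hello']);
    console.log(diff(before, after));
    // -> [{ type: 'PROPS', path: [], key: 'className', value: 'item active' }]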

When I said before that the virtual dom approach is inherently slow, I was referring to these steps. Recreating the whole tree, even if it's just in JS, every time you change a single value of your state is incredibly slow. It creates a ton of new objects that put pressure on the GC. And depending on the library you are using, the functions to create those virtual dom objects might themselves be incredibly slow. I told you I tried using mercury for our project. And actually mercury's h function (or more precisely a library used inside of that function) was happily eating 30% or more of the whole app's performance.

As a side note: The virtualdom library I wrote a year ago worked with plain flat js objects, without any overhead of a constructor function. React will take the same approach in the future by having jsx inline those objects. If I decide to write a new virtualdom library myself, I will surely choose that same design principle as well. But probably with a very, very light wrapper constructor just for the jsx transpiler's sake.
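In other words, instead of going through some createElement-style function at runtime, the compiled jsx would just emit the object literal directly (a sketch of the idea, not React's actual output):

    // what the h('div', …) call from the sketch above produces, written out as a flat literal
    const vnode = {
      tag: 'div',
      props: { className: 'item' },
      children: ['hello'],
    };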

Coming back to the issue of slowness: What can you do about this? Of course you split up the big problem into a set of smaller ones. Or more specifically in the case of virtual dom, you split the giant tree up into subtrees. I will call this approach componentization, as component is the most widely used term for this. Mercury calls them thunks.

So the idea is that you start re-creating your dom tree from the root down, but when you get to a component boundary you stop if the actual data (state) of the component did not change. Or you do not start at the root at all, but at individual components – more on that later.

Ok, so you are at a component boundary. And instead of recreating and diffing and patching its complete subtree every single time, you only diff its state. It's a tradeoff, since diffing the state is faster than creating and diffing a complete virtual dom tree.
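Sketched out, such a boundary check could look like this – the shallowEqual and renderComponent helpers are made up for illustration, not mercury's actual thunk API:

    // compare the data behind the component instead of its whole virtual subtree
    function shallowEqual(a, b) {
      const aKeys = Object.keys(a);
      const bKeys = Object.keys(b);
      if (aKeys.length !== bKeys.length) return false;
      return aKeys.every((key) => a[key] === b[key]);
    }

    function renderComponent(render, props, previous) {
      // same data as last time: reuse the previously rendered subtree as-is
      if (previous && shallowEqual(previous.props, props)) {
        return previous;
      }
      return { props, vnode: render(props) };
    }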

But you can do better than that even. Dive into:

Immutable Datastructures

One of the things I really hate about JS is that it has no structural equality. There are multiple proposals to add typed structs or immutable records and tuples to js. Mozilla experimented some time ago with adding typed objects to SpiderMonkey, but overall the proposals are far from getting into the language. But there are libraries that do something similar right now.

Anyway, what that means is that {a: 'a'} == {a: 'a'} will never ever be true. Even though the objects are equal, they are not one and the same. To compare them, you would have to manually and recursively walk the whole object. That is hella slow. It would be a lot faster if you could use referential equality, as in: the objects are one and the same, and the comparison is extremely cheap.

Those immutable datastructure libraries do exactly that for you. They make a shallow copy of your state tree every time one nested property changes, so everything that did not change keeps its identity and can be compared by reference.
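The core trick, sketched with plain objects and no particular library: only the objects along the changed path are copied, everything else is shared, so unchanged subtrees stay referentially equal.

    const state1 = {
      scene: { objects: [{ x: 0 }, { x: 1 }] },
      ui: { selected: 0 },
    };

    // "change" scene.objects[1].x without mutating state1
    const state2 = {
      ...state1,
      scene: {
        ...state1.scene,
        objects: state1.scene.objects.map((obj, i) => (i === 1 ? { ...obj, x: 2 } : obj)),
      },
    };

    console.log(state1 === state2);                                   // false: the root was copied
    console.log(state1.ui === state2.ui);                             // true: untouched subtree is shared
    console.log(state1.scene.objects[0] === state2.scene.objects[0]); // true: shared as well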

But that is a tradeoff as well. With componentization, you traded creating and comparing a virtual dom subtree for comparing only the data behind it. And with immutable data structures, you trade comparing the data for creating a shallow copy of the data whenever you actually change it.

One component of mercury is called observ, which implements such an immutable data structure. And I was also following mercury's best practices, as in using one giant state atom as an observ object.

But guess what: It is hella slow. It was shallowly copying my state object over and over again. And following mercury's best practices, I had a fairly deep state object, so there was a lot of copying. And I had n copies of one object inside the state tree, so it was copying n times. And dealing with transforms, I was changing the translation, rotation and scale of an object, and with no support for transactions in observ, it was happily doing three shallow copies of the entire tree for the three properties I was changing. Not cool! That added another 30% or so of runtime overhead. By now I was spending way more time in mercury than in my own code.

Well, that's what you get when you follow best practices. (Just a word of advice: Be extra cautious if something has the best practices label on it. Think for yourself before blindly following that advice.) But oh well, mercury didn't exactly make things easy anyway.

props vs state

So coming back to the component idea, we can actually think one step further and split things up into what we want to visualize and how we want to visualize it. So that's the raw data and our view state. Or we can call it global state and local state. Or props and state, like it's called in React and some other libraries. Ranting more about mercury: its answer to this is basically non-existent.

So props are propagated from the root to the subtrees, with all the optimizations we discussed before. And state is internal to a component, which re-renders itself without the parent knowing about it.
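A sketch of how that split could look in code – the Counter component and the scheduleRender hook are pure assumptions, not any particular library's API:

    // stand-in for a real refresh loop: just log the new virtual tree
    function scheduleRender(component) {
      console.log(component.render());
    }

    class Counter {
      constructor(props) {
        this.props = props;          // comes from the parent: the what
        this.state = { count: 0 };   // owned by the component: the how
      }
      increment() {
        this.state.count += 1;
        scheduleRender(this);        // the parent never hears about this
      }
      render() {
        return {
          tag: 'button',
          props: { onclick: () => this.increment() },
          children: [this.props.label + ': ' + this.state.count],
        };
      }
    }

    scheduleRender(new Counter({ label: 'clicks' }));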

Summary

So I was going on about how the various virtual dom libraries work and how they try to improve performance. I was ranting a lot about mercury (sorry), and I was clearing up some concepts in my own mind as well, since writing an article about these things actually makes you understand them better.

What is still an open question for me is how to actually make internal state work nicely, to really avoid walking a nested tree of components. And still have them update inside a requestAnimationFrame – more precisely, the same rAF tick in which my animation code changed their local transform state.

Since components should be self contained, they need some kind of way to signal a global refresh loop that they need to be updated, preferably in the currently running tick. Oh well. I'm still bashing my head against the table on this one.
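One way it could work – and this is purely me sketching an assumption, not how mercury or anyone else solves it – is to let components mark themselves dirty and have the animation loop flush them at the end of its own rAF handler:

    const dirty = new Set();

    function invalidate(component) {
      dirty.add(component);       // components only mark themselves, nobody walks the tree
    }

    function runAnimations(time) {
      // placeholder for the app's animation code, which mutates local
      // transform state and calls invalidate() on the affected components
    }

    function tick(time) {
      runAnimations(time);
      for (const component of dirty) {
        component.render();       // flushed in the same tick, right after the animations
      }
      dirty.clear();
      requestAnimationFrame(tick);
    }
    requestAnimationFrame(tick);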

So I'm looking for, or planning to write, a virtual dom library that has the following:

Revisiting content changes

Since the whole virtual dom strategy of re-rendering and diffing is really quite expensive, contrary to what most people claim, I would really love to have a mode that completely avoids those steps if all I really need is to set a className or a style property. But then again, most of the content is actually computed and does not directly map to props or state. Oh well. Maybe all that is not possible after all. Or it's not even really necessary if you make creating and diffing the virtual dom actually fast enough. And no, for most libraries, it simply isn't!
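A rough sketch of what such a content-change fast path could look like – the bindText helper is made up, wishful thinking rather than an existing API:

    function bindText(el, getText) {
      let last;
      return (state) => {
        const next = getText(state);
        if (next !== last) {       // only touch the DOM when the value actually changed
          el.textContent = next;
          last = next;
        }
      };
    }

    const el = document.createElement('span');
    const update = bindText(el, (state) => String(state.count));
    update({ count: 1 });  // sets textContent
    update({ count: 1 });  // second call does no DOM work at all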

15.12.2013 Tax Evasion

So apparently in capitalism, the richer you are, the less taxes you have to pay. Just look at all the big companies that pay less than 1% in taxes on their revenue, even though the corporate tax rate is usually much higher. They use clever tricks with different subsidiaries and shell corporations in some foreign countries that have loopholes or simply no corporate tax at all.

This is absurd in itself, and as a mere mortal who pays a lot of taxes, I feel cheated.

I was thinking: Why is it only the rich who can benefit from such – apparently legal, but certainly amoral – wizardry? What if there were some tax advisor (more like tax evasion advisor) who would take care of all this for you, like creating bogus shell corporations in foreign countries, and who could guarantee that everything is still legal? Someone who just said: “I can guarantee you that you pay no more than 2% tax and that you cannot be touched by your country's income revenue service (Finanzamt).” For that service, he could charge you 1% of all your income and make tons of money himself (for which of course he himself does not have to pay taxes, haha).

How about someone just starts up such a system as a service (SaaS) for every single small to mid-sized company, freelancer or simple employee? So that every single one could benefit from the same cheats that the super rich are using to rip us off. And if everyone starts doing it, maybe the country's leaders suddenly realize that the system is broken as fuck and start fixing it for good.


So why doesn’t anyone do something like this? And if something like this already exists, point me to it.
