Sunday, January 06, 2013

The Power Of Getters

While the examples in this post are implemented in JavaScript, the concepts discussed here about getters are, in my experience, universally valid.
Whether we are programming for the client or the server, getters can be full of wins, and if you think getters are a bad practice because of performance, keep reading: you might realize getters can be a good practice for performance too.
In summary, this entry is about getters and a few patterns you might not know or might never have thought about but, yeah, it is a long one so ... grab a coffee, open a console if you want to test some snippets, and enjoy!
Update
JavaScript allows inherited getters to be overridden inline at runtime, so that a property can be redefined as an own property of the instance when and if necessary.
This is not possible, or not so easy, with Java or other classical OOP languages.
This update is for those who think this topic has already been discussed somewhere else and there's nothing more to learn about it ... well, they might be wrong! ;)

What

Generally speaking, a getter is a method invoked behind the scenes, transparently. It does not require arguments and it looks exactly like any other property.
Historically exposed through the __defineGetter__ Object method in older browsers (except IE), getters can be defined in ES5 via the more elegant and powerful Object.defineProperty method, while older IE could use, when and if necessary, VBScript madness. However, consider that today all mobile and desktop browsers support getters, as do server-side JS implementations, as shown in this table. Here is the most basic getter example:
var o = Object.defineProperty(
  {},    // a generic object
  "key", // a generic property name
  // the method invoked every time
  {get: function () {
    console.log('it works!');
    return 123;
  }}
);

// will log 'it works!'
o.key; // 123

// this will throw a TypeError,
// since o.key evaluates to 123
o.key();
// getters are not methods!
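To make the "method invoked behind the scenes" part concrete, here is a minimal sketch that counts how many times the getter body runs; the calls counter is just an illustrative name. By default nothing is cached: every read executes the function again.

```javascript
// count how many times the getter body runs
var calls = 0;

var o = Object.defineProperty({}, "key", {
  get: function () {
    calls++;        // executed on every single read
    return 123;
  }
});

o.key;
o.key;
o.key;
console.log(calls); // 3: nothing is cached by default
```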
I know, pretty boring so far and nothing new so ... let's move on!

Why

Getters are usually behind special behaviors such as read-only but non-constant properties, as HTMLElement#firstChild could be, or reactive, mutable properties such as Array#length.
// DOM
document
  .body
  // equivalent of
  // function () {
  //   return document
  //     .getElementsByTagName(
  //       "body"
  //      )[0];
  // }
  .firstChild
  // equivalent of
  // function () {
  //   return this.childNodes[0] || null;
  // }
;

// Array#length
var a = [1, 2, 3];
a.length; // getter: 3
a.length = 1; // setter
a; // [1]
a.length; // getter: 1
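Array#length is implemented natively, but we can sketch a similar get/set pair on our own objects. The List constructor below is hypothetical and only handles shrinking, unlike real arrays, whose length can also grow.

```javascript
// A hypothetical list-like object whose "length"
// behaves like Array#length: reading it reports the
// current size, writing a smaller value truncates.
function List() {
  this.items = [];
}

Object.defineProperty(List.prototype, "length", {
  get: function () {
    return this.items.length;
  },
  set: function (value) {
    // drop every item from index "value" onward
    this.items.splice(value);
  }
});

var list = new List;
list.items.push(1, 2, 3);
console.log(list.length); // 3
list.length = 1;          // setter truncates
console.log(list.items);  // [ 1 ]
console.log(list.length); // 1
```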

A First Look At Performance

If we perform an expensive operation like the one described above every time we need the body, performance cannot be that good, of course. The thing is, the engine might have to perform it every time because the result must be reliable whenever we ask for the body again: it is just a node inside the documentElement and, like any other node, it can be replaced at any time.
However, even the engine could optimize widely used accessors such as firstChild, and this is what we can do with our own getters as well (and if you are wondering how to speed up document.body access, well ... just use var body = document.body; on top of your closure if you are sure no other script will ever replace that node, which is 99% of use cases, I guess ... drop that script otherwise :D)
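Since there is no DOM in a plain JS console like Node, the following sketch uses a hypothetical fakeDocument object whose body getter searches its nodes on every access, and counts the lookups. Reading it once on top of a closure means the search runs once, no matter how many times the local reference is used afterwards.

```javascript
// A hypothetical stand-in for document: its "body" getter
// performs a search every time it is read.
var lookups = 0;
var fakeDocument = Object.defineProperty({
  documentElement: {
    childNodes: [{nodeName: "HEAD"}, {nodeName: "BODY"}]
  }
}, "body", {
  get: function () {
    lookups++;
    var nodes = this.documentElement.childNodes;
    for (var i = 0; i < nodes.length; i++) {
      if (nodes[i].nodeName === "BODY") return nodes[i];
    }
    return null;
  }
});

(function () {
  // read once on top of the closure ...
  var body = fakeDocument.body;
  var name;
  // ... then reuse the local reference as often as needed
  for (var i = 0; i < 1000; i++) {
    name = body.nodeName;
  }
}());

console.log(lookups); // 1
```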

A DOM Element Example

Use cases for getters are many but, for the sake of explaining this topic, I have chosen a classic DOM simulation example. Here is the very basic constructor:
// A basic DOM Element
function Element() {
  this.children = [];
}
That is quite a common constructor for many other use cases too, right? What if I told you that there is already something inefficient in that simple constructor?

Already Better With Getters

So here is the thing: when we create an object, it might have many properties that are objects, arrays, or any other sort of instance, right? Now ask yourself: am I going to use all those objects by default, or right away?
I believe that, most of the time, the answer will be: NO!
function Element() {}
// lazy boosted getter
Object.defineProperty(
  // defined once, on the shared prototype
  Element.prototype,
  // when the children property is accessed
  "children", {
  get: function () {
    // redefine it with the array
    // as an own property of the instance,
    // shadowing the inherited getter
    return Object.defineProperty(
      this, "children", {value: []}
    ).children;
  }
});

// example
var el = new Element;
var otherElement = new Element;
// later on, when/if necessary
el.children.push(otherElement);
The benchmark results for the snippet above can be seen here. In the real world, the boost we get on each instance creation, together with the lazy initialization of many object properties, will make the benchmark even more meaningful.
Moreover, what jsperf never shows is the lower amount of RAM used when adopting this getter-based pattern. It is true that we have a bigger overhead in the code itself but, unless every instance uses those properties, the number of objects to handle is reduced by N for each instance created, and this is a win for Garbage Collector operations too.
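We can verify the laziness with hasOwnProperty, which does not trigger the inherited getter. This is a self-contained sketch of the snippet above; a and b are just example instances.

```javascript
function Element() {}

// lazy getter on the shared prototype: the array is
// created only when an instance reads "children"
Object.defineProperty(Element.prototype, "children", {
  get: function () {
    // shadow the inherited getter with an own data property
    return Object.defineProperty(
      this, "children", {value: []}
    ).children;
  }
});

var a = new Element;
var b = new Element;

console.log(a.hasOwnProperty("children")); // false: no array yet
a.children.push(b);
console.log(a.hasOwnProperty("children")); // true: created on first access
console.log(a.children === a.children);    // true: same array afterwards
console.log(b.hasOwnProperty("children")); // false: b never paid for it
```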

Recycling The Pattern

OK, that looks like a lot of overhead for such a common pattern, when it comes to properties holding objects, so how could we reuse it? Here is an example:
function defineLazyAccessor(
  proto,        // the generic prototype
  name,         // the generic property name
  getNewValue,  // the callback that returns the value
  notEnumerable // optional non-enumerability
) {
  var descriptor = Object.create(null);
  descriptor.enumerable = !notEnumerable;
  Object.defineProperty(proto, name, {
    enumerable: !notEnumerable,
    get: function () {
      descriptor.value = getNewValue();
      return Object.defineProperty(
        this, name, descriptor
      )[name];
    }
  });
}

// so that we can obtain the same via
defineLazyAccessor(
  Element.prototype,
  "children",
  // the new value per each instance
  function () {
    return [];
  }
);
The callback that produces the value is a compromise on raw performance, but it is worth it. One extra call per property, made only once, should never be a problem, while RAM, GC operations, and per-instance initialization, especially when many instances are created, could be quite a bottleneck.
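A quick way to convince ourselves that the callback runs once per instance, and only for instances that actually read the property, is to count its calls. The Widget constructor, the cache property, and the created counter are hypothetical; the helper is restated so the snippet runs on its own, defining the getter on the proto argument.

```javascript
function defineLazyAccessor(proto, name, getNewValue, notEnumerable) {
  // shared descriptor, reused for every instance
  var descriptor = Object.create(null);
  descriptor.enumerable = !notEnumerable;
  Object.defineProperty(proto, name, {
    enumerable: !notEnumerable,
    get: function () {
      descriptor.value = getNewValue();
      return Object.defineProperty(this, name, descriptor)[name];
    }
  });
}

var created = 0;
function Widget() {}
defineLazyAccessor(Widget.prototype, "cache", function () {
  created++;
  return {};
});

var w1 = new Widget, w2 = new Widget, w3 = new Widget;
w1.cache.x = 1;   // first read: callback runs
w1.cache.y = 2;   // second read: own property, no callback
w2.cache.x = 3;   // callback runs for w2
// w3 never touches "cache"
console.log(created);               // 2
console.log(w1.cache !== w2.cache); // true
```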
Now, back to the main constructor :)

The Element Behavior

For the sake of this post, we would like to simulate appendChild(childNode), as well as firstChild and lastChild. In theory, the method itself could be the best place to obtain this behavior, something like this:
Element.prototype.appendChild = function (el) {
  this.children.push(el);
  this.firstChild = this.children[0];
  // to make the code meaningful with the logic
  // implemented later on ... this is:
  this.lastChild = this.children[
    this.children.length - 1
  ];
  // instead of this.lastChild = el;
  return el;
};
The snippet above is compared, in this benchmark, with another one we'll see later on.

Faster But Unreliable

Yes, it is faster, but what happens if someone uses another method, such as replaceChild(), passing, for example, a document fragment so that the number of children changes? And what if that other method changes the firstChild or the lastChild?
In short, inline property assignment is not an option in this case so, let's try to understand what we should do in order to obtain those properties and easily clean them up from other methods.
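To see the problem concretely, here is a sketch of the eager version plus a hypothetical removeLastChild() method, standing in for replaceChild() or any other method that touches children without updating the two stored properties.

```javascript
// Eager version: appendChild stores firstChild/lastChild
// as plain properties.
function Element() {
  this.children = [];
}

Element.prototype.appendChild = function (el) {
  this.children.push(el);
  this.firstChild = this.children[0];
  this.lastChild = this.children[this.children.length - 1];
  return el;
};

// hypothetical extra method that forgets about lastChild
Element.prototype.removeLastChild = function () {
  return this.children.pop(); // lastChild is now stale
};

var parent = new Element;
var a = parent.appendChild(new Element);
var b = parent.appendChild(new Element);
parent.removeLastChild();

console.log(parent.children.length); // 1
console.log(parent.lastChild === b); // true, but b is gone
```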

An Improved defineLazyAccessor()

If we want to be able to reconfigure a property or reuse the inherited getter, the function we have seen before needs some changes:
var defineLazyAccessor = function() {
  var
    O = Object,
    defineProperty = O.defineProperty,
    // be sure no properties can be inherited
    // reused descriptor for prototypes
    dProto = O.create(null),
    // reused descriptor for properties
    dThis = O.create(null)
  ;
  // must be able to be removed
  dThis.configurable = true;
  return function defineLazyAccessor(
    proto, name, getNewValue, notEnumerable
  ) {
    dProto.enumerable = !notEnumerable;
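    dProto.get = function () {
      // compute the value first: getNewValue may read
      // other lazy properties, reusing dThis meanwhile
      var value = getNewValue.call(this);
      dThis.enumerable = !notEnumerable;
      dThis.value = value;
      return defineProperty(this, name, dThis)[name];
    };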
    defineProperty(proto, name, dProto);
  };
}();
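Here is a sketch exercising the helper above: deleting the own property makes the inherited getter kick in again on the next read. The Basket constructor, its total property, and the computed counter are hypothetical. The helper is restated so the snippet runs on its own, and it computes the value before touching the shared dThis descriptor, because getNewValue may itself read other lazy properties that reuse that same descriptor.

```javascript
var defineLazyAccessor = function () {
  var defineProperty = Object.defineProperty,
      dProto = Object.create(null),   // reused for prototypes
      dThis = Object.create(null);    // reused for instances
  dThis.configurable = true;          // so that delete can remove it
  return function defineLazyAccessor(proto, name, getNewValue, notEnumerable) {
    dProto.enumerable = !notEnumerable;
    dProto.get = function () {
      var value = getNewValue.call(this);
      dThis.enumerable = !notEnumerable;
      dThis.value = value;
      return defineProperty(this, name, dThis)[name];
    };
    defineProperty(proto, name, dProto);
  };
}();

// hypothetical example: a cached total over a list of prices
var computed = 0;
function Basket() { this.prices = []; }
defineLazyAccessor(Basket.prototype, "total", function () {
  computed++;
  return this.prices.reduce(function (sum, p) { return sum + p; }, 0);
});

var basket = new Basket;
basket.prices.push(2, 3);
console.log(basket.total, basket.total); // 5 5 (computed once)
basket.prices.push(5);
delete basket.total;                     // drop the cached own property
console.log(basket.total);               // 10 (recomputed via the inherited getter)
console.log(computed);                   // 2
```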
At this point we are able to define firstChild and lastChild, and remove them any time we call appendChild():
// firstChild
defineLazyAccessor(
  Element.prototype,
  "firstChild",
  function () {
    return this.children[0];
  }
);

// lastChild
defineLazyAccessor(
  Element.prototype,
  "lastChild",
  function () {
    return this.children[
      this.children.length - 1
    ];
  }
);

// the method to appendChild
Element.prototype.appendChild = function(el) {
  // these properties might be different
  // if these were not defined or no children
  // were present
  delete this.firstChild;
  // and surely the last one is different
  // after we push the element
  delete this.lastChild;

  // current logic for this method
  this.children.push(el);
  return el;
};

Optimize ... But What?

It is really important to understand what we are trying to optimize here: not the appendChild(el) method, but repeated access to firstChild and lastChild, assuming every single method, as well as the rest of the surrounding code, will somehow use these properties.
Accordingly, we want these properties to be dynamic, but also assigned once and never again until some change is performed. This benchmark shows the performance gap between an always-computed getter and the optimization suggested here. It must be said that V8 does an excellent job optimizing repeated getters, but we also need to consider that everyday code is, I believe, much more complex than what I am showing here.
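The "assigned once and never again until some change" behavior is easy to observe with a counter. This sketch uses a simplified lazy helper that builds a fresh, configurable descriptor on each call instead of the shared ones above; the computations counter is hypothetical.

```javascript
// minimal lazy helper (configurable so delete works)
function defineLazyAccessor(proto, name, getNewValue) {
  Object.defineProperty(proto, name, {
    get: function () {
      return Object.defineProperty(this, name, {
        configurable: true,
        value: getNewValue.call(this)
      })[name];
    }
  });
}

var computations = 0;
function Element() { this.children = []; }

defineLazyAccessor(Element.prototype, "firstChild", function () {
  computations++;
  return this.children[0];
});
defineLazyAccessor(Element.prototype, "lastChild", function () {
  computations++;
  return this.children[this.children.length - 1];
});

Element.prototype.appendChild = function (el) {
  delete this.firstChild; // invalidate cached values
  delete this.lastChild;
  this.children.push(el);
  return el;
};

var parent = new Element, seen;
var a = parent.appendChild(new Element);
for (var i = 0; i < 100; i++) seen = parent.firstChild && parent.lastChild;
console.log(computations); // 2: one per property
var b = parent.appendChild(new Element);
for (i = 0; i < 100; i++) seen = parent.firstChild && parent.lastChild;
console.log(computations); // 4: recomputed once after the change
```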

Avoid Boring Patterns

The repeated delete thing is already annoying, and we have only two properties. An easy utility could be this one:
function cleanUp(self) {
  for(var
    // could be created somewhere else once
    name = [
      "lastChild",
      "firstChild" // and so on
    ],
    i = name.length; i--;
    delete self[name[i]]
  );
  return self;
}
We could use the function above in this way:
Element.prototype.appendChild = function(el) {
  cleanUp(this).children.push(el);
  return el;
};

Still Boring ...

We could also automate the creation of the cleanUp() function, which also simplifies the definition of all these lazy accessors. So, how about this?
function defineLazyAccessors(proto, descriptors) {
  for (var
    key, curr, length,
    keys = Object.keys(descriptors),
    i = 0; i < keys.length;
  ) {
    curr = descriptors[
      key = keys[i++]
    ];
    defineLazyAccessor(
      proto,
      key,
      curr.get,
      !curr.enumerable
    );
    if (curr.preserve) keys.splice(--i, 1);
  }
  length = keys.length;
  return function cleanUp(self) {
    self || (self = this);
    for(i = 0; i < length; delete self[keys[i++]]);
    return self;
  }
}

var cleanUp = defineLazyAccessors(
  Element.prototype, {
  children: {
    preserve: true,
    enumerable: true,
    get: function () {
      return [];
    }
  },
  firstChild: {
    get: function () {
      return this.children[0];
    }
  },
  lastChild: {
    get: function() {
      return this.children[
        this.children.length - 1
      ];
    }
  }
});
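As a sketch of how the pieces fit together, the following restates both helpers in a condensed form (the cleanup list is built with forEach instead of splice, with the same resulting keys) and checks that cleanUp() drops firstChild and lastChild while children survives.

```javascript
function defineLazyAccessor(proto, name, getNewValue, notEnumerable) {
  Object.defineProperty(proto, name, {
    enumerable: !notEnumerable,
    get: function () {
      var value = getNewValue.call(this);
      return Object.defineProperty(this, name, {
        configurable: true,
        enumerable: !notEnumerable,
        value: value
      })[name];
    }
  });
}

function defineLazyAccessors(proto, descriptors) {
  var toClean = [];
  Object.keys(descriptors).forEach(function (key) {
    var curr = descriptors[key];
    defineLazyAccessor(proto, key, curr.get, !curr.enumerable);
    if (!curr.preserve) toClean.push(key); // preserved keys are never removed
  });
  return function cleanUp(self) {
    self || (self = this);
    for (var i = 0; i < toClean.length; i++) delete self[toClean[i]];
    return self;
  };
}

function Element() {}
var cleanUp = defineLazyAccessors(Element.prototype, {
  children: {preserve: true, enumerable: true, get: function () { return []; }},
  firstChild: {get: function () { return this.children[0]; }},
  lastChild: {get: function () { return this.children[this.children.length - 1]; }}
});

Element.prototype.appendChild = function (el) {
  cleanUp(this).children.push(el);
  return el;
};

var parent = new Element;
var a = parent.appendChild(new Element);
console.log(parent.lastChild === a);   // true
var b = parent.appendChild(new Element);
console.log(parent.firstChild === a, parent.lastChild === b); // true true
console.log(Object.keys(parent));      // [ 'children' ]
```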

Benchmark All Together

OK, it's time to test what we have optimized so far. The test simulates an environment where the most common operations are creating Element instances and accessing firstChild and lastChild:
// last value read in the inner loop
var result;
function benchIt(Element) {
  // 200 instances
  for (var i = 0; i < 200; i++) {
    var el = new Element;
    // 5 appendChild of new Element per instance
    for (var j = 0; j < 5; j++) {
      el.appendChild(new Element);
    }
    // 100 firstChild and lastChild access
    for (j = 0; j < 100; j++) {
      result = el.firstChild && el.lastChild;
    }
  }
}
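To run benchIt() locally, a rough harness could look like the one below. It compares an always-computed getter version against the cached one; GetterElement, LazyElement, lazy(), and time() are hypothetical names, and Date.now() timings are coarse and vary with engine and machine, so treat the numbers as indicative only.

```javascript
// hypothetical timing helper
function time(label, fn, runs) {
  var start = Date.now();
  for (var i = 0; i < runs; i++) fn();
  var ms = Date.now() - start;
  console.log(label + ": " + ms + "ms");
  return ms;
}

// version 1: firstChild/lastChild recomputed on every access
function GetterElement() { this.children = []; }
Object.defineProperty(GetterElement.prototype, "firstChild", {
  get: function () { return this.children[0]; }
});
Object.defineProperty(GetterElement.prototype, "lastChild", {
  get: function () { return this.children[this.children.length - 1]; }
});
GetterElement.prototype.appendChild = function (el) {
  this.children.push(el);
  return el;
};

// version 2: computed once, cached as own properties until appendChild
function lazy(proto, name, compute) {
  Object.defineProperty(proto, name, {
    get: function () {
      return Object.defineProperty(this, name, {
        configurable: true, value: compute.call(this)
      })[name];
    }
  });
}
function LazyElement() { this.children = []; }
lazy(LazyElement.prototype, "firstChild", function () { return this.children[0]; });
lazy(LazyElement.prototype, "lastChild", function () {
  return this.children[this.children.length - 1];
});
LazyElement.prototype.appendChild = function (el) {
  delete this.firstChild;
  delete this.lastChild;
  this.children.push(el);
  return el;
};

// benchIt as above
var result;
function benchIt(Element) {
  for (var i = 0; i < 200; i++) {
    var el = new Element;
    for (var j = 0; j < 5; j++) el.appendChild(new Element);
    for (j = 0; j < 100; j++) result = el.firstChild && el.lastChild;
  }
}

time("always getter", function () { benchIt(GetterElement); }, 50);
time("lazy cached", function () { benchIt(LazyElement); }, 50);
```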
For some reason, and I believe it's a sort of false positive due to the nature of the benchmark, Chrome is able to optimize those repeated getters more than expected. It is, however, the only one faster in this bench, and this is kind of irrelevant if you understood the point of this post ... let's summarize it!

Getters Are Powerful Because

  • they can be inherited, and manipulated to improve performance when and if necessary
  • they can help complex objects initialize one or more heavy properties later on, and only if necessary
  • they can be tracked more easily, simply by adding a notification mechanism each time the getter is accessed
  • they make APIs look cleaner and more transparent to their users
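The tracking point deserves a small sketch: since a getter is just a function, we can wrap an existing one and notify a listener on each access. track() and Point are hypothetical names, and the sketch assumes the original getter lives on the prototype and is configurable.

```javascript
// wrap an existing getter so that every access
// notifies a listener before returning the value
function track(proto, name, onAccess) {
  var original = Object.getOwnPropertyDescriptor(proto, name);
  Object.defineProperty(proto, name, {
    configurable: true,
    enumerable: original.enumerable,
    get: function () {
      onAccess(this, name);
      return original.get.call(this);
    }
  });
}

function Point(x, y) { this.x = x; this.y = y; }
Object.defineProperty(Point.prototype, "length", {
  configurable: true,
  get: function () { return Math.sqrt(this.x * this.x + this.y * this.y); }
});

var log = [];
track(Point.prototype, "length", function (obj, name) {
  log.push(name);
});

var p = new Point(3, 4);
console.log(p.length); // 5
console.log(p.length); // 5
console.log(log);      // [ 'length', 'length' ]
```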
Enough said? Your comments are welcome :)

5 comments:

Bryan said...

Getters can be useful in Node.js as well. I sometimes use them to lazy-load required modules. Since Node caches all requires, any call to the getter function after the first is extremely fast.

Andrea Giammarchi said...

Bryan, in that case you might consider Proxy too to lazy load every module when/if necessary ;-)

Adrien Risser said...

Andrea, every post of yours is a brainfuck!
Understanding barely most of what you describe, I can't say I see how I would use all of or just a part of it in any project of mine.
Nonetheless, keep the good posts coming, appreciating the good examples you provide. Cheers

jonz said...

Right now this syntax seems like obfuscation but the patterns it supports are what I've always wanted, I wonder if it will ever become familiar.

Andrea Giammarchi said...

jonz, here is a tiny utility whose aim is to simplify all of this madness :-)