[Code Index] by Mike Marynowski

programming for fun and work

Value Type Box Cache

We've recently been building a .NET object database that necessarily stores a lot of data in object[] arrays. Most of the values in these arrays are default values (i.e. 0 for numbers), small integers, boolean values or enums. On a loaded instance there can be many millions of these boxes values sitting in memory and they are constantly being created and collected, which got me thinking...we could probably cache and reuse the boxes for a lot of these values, and thus the BoxCache was born!

Here is a quick rundown of the features:

  • Supports any value type
  • Supports nullable types and shares the same boxes as their non-nullable equivalents
  • 100% thread-safe and very fast during box requests due to lock-free design
  • Automatically adds the default value for every type
  • Automatically adds all defined enum values

There are three main methods you will want to use:

  • object BoxCache<T>.GetBox(T value) - Gets a box for the value if one is cached otherwise it boxes it for you.
  • object BoxCache<T>.GetOrAddBox(T value) - Gets a box for the value if one is cached otherwise it adds one into the cache and returns it.
  • void AddValues<T>(IEnumerable<T> values) - Adds boxes for the specified values into the cache.

GetOrAddBox() has the exact same performance as GetBox() if the cached value exists so we use it whenever we are working with flag enums that are likely to hold values other than the named values (but are still limited in range of values), or if we know the value we are working with will be a good candidate for reuse (i.e. if we're trying to get a box for a default field value). It would be a rather bad idea to use this method for unconstrained int or float values for example, because it would be caching a box for every single possible value it encounters, increasing memory usage each time. Furthermore, anytime a new box is cached a COPY of the cache is modified for thread safety, and you can imagine what that means if you are adding values one-by-one when the cache is large. Adding items to the cache is slow and should generally be done only during application startup. This was a conscious design decision to make fetching boxes as fast as possible by eliminating locks.

To give you an example of how we use it, here is how we set values and default values on our objects:

object[] _storage = new object[PropertyCount];

protected static Property<T> RegisterProperty<T>(T defaultValue)
{
    return new Property<T>(typeof(T).IsValueType ? BoxCache.GetOrAddBox(defaultValue) : defaultValue);
}

protected void Set<T>(Property<T> property, T value)
{
    T oldValue = (T)_storage[property.Ordinal];

    if (!EqualityComparer<T>.Default.Equals(value, oldValue)) {
        _storage[property.Ordinal] = property.IsValueType ? BoxCache.GetBox(value) : value;
        OnPropertyChanged(property);
    }
}

As you can see, the Set<T>() method lets you pass in a strongly typed value, and if it matches one of the common values that have cached boxes, it returns the shared box instead of allocating a new boxed value in memory. Because properties are all registered on application startup and the default value will be used over and over any time we create a new object, we use BoxCache.GetOrAddBox<T>() to ensure the value gets cached and the overhead of caching the box the first time is well worth the memory usage reduction for the rest of the application lifetime.

This is what the initialization method looks like , which loads up the cache with the defaults we've settled on that seems to work well for us:

internal static void IntializeDefaults()
{
    if (Interlocked.CompareExchange(ref _intialized, 1, 0) == 0)
    {
        // If you are only caching a default value then don't bother doing it here,
        // it gets automatically added in the generic BoxCache<T> initializer.

        AddValues(false, true);

        AddValues(Enumerable.Range(0, 255).Select(v => (byte)v));
        AddValues(Enumerable.Range(-128, 127).Select(v => (sbyte)v));
        AddValues(Enumerable.Range(-10, 256).Select(v => (short)v));
        AddValues(Enumerable.Range(0, 256).Select(v => (ushort)v));
        AddValues(Enumerable.Range(-10, 256).Select(v => (int)v));
        AddValues(Enumerable.Range(0, 256).Select(v => (uint)v));
        AddValues(Enumerable.Range(-10, 256).Select(v => (long)v));
        AddValues(Enumerable.Range(0, 256).Select(v => (ulong)v));

        AddValues<float>(-1, 0, 1);
        AddValues<double>(-1, 0, 1);
        AddValues<decimal>(-1, 0, 1);
    }
}

You can verify the cache is working via tests like so:

Assert.AreNotSame(BoxCache.GetBox<int>(10000), BoxCache.GetBox<int>(10000));

Assert.AreSame(BoxCache.GetBox(50), BoxCache.GetBox(50));
Assert.AreEqual(50, BoxCache.GetBox(50));

Assert.AreSame(BoxCache.GetBox<int?>(50), BoxCache.GetBox(50));
Assert.AreEqual(50, BoxCache.GetBox<int?>(50));

You should profile your application to see if it would benefit from cached boxes but if you store a lot of value types in object arrays this technique can definitely be a huge win, especially in terms of memory usage. Ours dropped by almost 40% for certain data sets, which meant we could cache more data in memory to reduce our disk I/O, our CPU cache can work much more effectively and we spend a LOT less time in the garbage collector to boot! You should adjust the range of default values to suit your particular application of course.

Here is the link to the full source code: Value Type Box Cache

Happy boxing :)

 

 Hi, I am Mike, overlord of Singulink :)

This is where I post mostly software development related stuff I think might be useful and interesting.

Posts by Date