[Pharo-dev] [ANN] Neo-Caching

Sven Van Caekenberghe sven at stfx.eu
Tue Dec 10 10:54:23 EST 2013


Hi,

Caching is an important technique to trade time (speed of execution) for space (memory consumption) in computation. Used correctly, caching can significantly improve an application’s performance.

Given the key/value nature of a cache, a Dictionary is often the first data structure used for implementing it. This is however not a good idea, since a plain Dictionary is unbounded by definition and will sooner or later lead to explosive cache growth.

Quick and dirty hacks to turn a Dictionary into a least-recently-used (LRU) and/or time-to-live (TTL) based limited cache are often incorrect and inefficient.

The LRUCache implementation currently in Pharo is too simple and not efficient, being O(n) on its main operation.
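
To make this concrete, here is a minimal sketch of the kind of quick hack meant above (purely illustrative, not the actual LRUCache code): a Dictionary holding the values plus an ordered list of keys. Every hit then has to remove the key from the middle of that list, which is O(n):

| values order limit key |
values := Dictionary new.
order := OrderedCollection new.  "keys, least recently used first"
limit := 512.
key := 42.
(values includesKey: key)
  ifTrue: [
    order remove: key.  "linear removal from the middle of the list"
    order addLast: key ]
  ifFalse: [
    order size >= limit
      ifTrue: [ values removeKey: order removeFirst ].
    values at: key put: key asWords.  "stand-in for the real computation"
    order addLast: key ].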

Pharo deserves and needs a good LRU/TTL cache implementation that can and should be used uniformly in various places of the system and in code built on top of it. That is why I implemented one as a proposal.

Neo-Caching is an implementation of both a good LRU cache and a TTL extension of it. The caches are easy to use, yet have a number of interesting features. The implementation is properly O(1) on its main operations and is 2 to 3 times faster than LRUCache. The package also contains a doubly linked list implementation. The code comes with a full set of unit tests. There are class and method comments and the code is easy to read.
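
For reference, the standard recipe for O(1) behaviour (this is just a sketch of the general technique, not the actual NeoLRUCache code) is to map each key to a node of a doubly linked list kept in most-recently-used order: a hit then only unlinks and relinks a single node instead of scanning a collection, and eviction simply drops the tail node:

| nodes head tail touch |
nodes := Dictionary new.  "maps each key to its node; a node here is { key. value. prev. next }"
head := tail := nil.
touch := [ :node |  "move an already linked node to the front in constant time"
  node == head ifFalse: [
    | prev next |
    prev := node at: 3.
    next := node at: 4.
    prev ifNotNil: [ prev at: 4 put: next ].
    next ifNotNil: [ next at: 3 put: prev ].
    node == tail ifTrue: [ tail := prev ].
    node at: 3 put: nil.
    node at: 4 put: head.
    head ifNotNil: [ head at: 3 put: node ].
    head := node ] ].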

The code (which has no dependencies) can be found here:

  http://mc.stfx.eu/Neo
  http://www.smalltalkhub.com/mc/SvenVanCaekenberghe/Neo/main

There is only one package to load. The code has been written in Pharo 3.0 but it should run on older versions as well.
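
For example, loading from the first repository could look like this (assuming the package is called 'Neo-Caching', after the title of this announcement):

Gofer it
  url: 'http://mc.stfx.eu/Neo';
  package: 'Neo-Caching';
  load.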


Code


Let’s create an ordered collection of about 16K keys that contains some repetitions with a reasonable distribution:

data := Array streamContents: [ :out |
  1 to: 4096 do: [ :each |
    each primeFactorsOn: out.
    out nextPut: each ] ].
data := data collect: [ :each | each asWords ].
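
A quick sanity check on the generated keys (the exact numbers vary, this only illustrates the repetition mentioned above):

data size.        "total number of keys, around 16K"
data asSet size.  "number of distinct keys, considerably smaller"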

By using #asWords the keys have better hash properties. Now, we put all this in a cache:

cache := NeoLRUCache new.
cache maximumWeight: 512.
cache factory: [ :key | key ].
data do: [ :each | cache at: each ].
cache.

==> a NeoLRUCache(#512 512/512 [ 1 ] [ :key | key ] 72%)

We had a 72% hit ratio, which is good. The cache is of course full: the weight is 512 out of a maximum of 512, and there are 512 entries (weight and entry count are not necessarily the same thing, see the Features section below).

We can now benchmark this:

[ data do: [ :each | cache at: each ] ] bench.

==> '35.8 per second.'

And compare it to LRUCache:

cache := LRUCache size: 512 factory: [ :key | key ].

[ data do: [ :each | cache at: each ] ] bench.

==> '12.6 per second.'


Features


Instead of just counting the number of entries, which is the default behaviour, the concept of weight is used to determine when a cache is full. For each cached value, a block or selector computes its weight. When adding a new entry causes the total weight to exceed the maximum, the least recently used item(s) are evicted until the weight is again below the maximum.

cache 
  computeWeight: #sizeInMemory;
  maximumWeight: 16*1024.

This will keep the cache below 16 KB in real memory usage, which can be very useful when caching variously sized images or Monticello packages, for example.
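
As an illustration, a hypothetical cache of file contents bounded by total size rather than by entry count could look like this (the path is made up, and #size on the contents string is only an approximation of real memory use):

fileCache := NeoLRUCache new
  factory: [ :path | path asFileReference contents ];
  computeWeight: #size;
  maximumWeight: 4 * 1024 * 1024;  "keep at most about 4 MB of contents"
  yourself.
fileCache at: 'some/cached/data.txt'.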

By default, no concurrent access protection takes place, but optionally a semaphore for mutual exclusion can be used. This slows down access.

cache useSemaphore.
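
With the semaphore enabled, the same cache instance can safely be shared between processes; a simple smoke test, reusing the cache and data variables from the example above, would be:

4 timesRepeat: [
  [ data do: [ :each | cache at: each ] ] fork ].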

NeoTTLCache extends NeoLRUCache by maintaining a timestamp for each cached value. Upon a cache hit, there is a check to see whether this timestamp is older than the allowed time to live. If it is, the value is considered stale and will be recomputed.

(cache := NeoTTLCache new)
  timeToLive: 10 minutes;
  factory: [ :key | ZnEasy get: key ];
  maximumWeight: 32*1024;
  computeWeight: #contentLength.

cache at: 'http://zn.stfx.eu/zn/numbers.txt'.

==> a ZnResponse(200 OK text/plain;charset=utf-8 71B)

cache

==> a NeoTTLCache(#1 71/32768 #contentLength [ :key | ZnEasy get: key ] 0% 0:00:10:00)

This would be a simple HTTP cache, keeping the responses of resolved URLs in memory for at most 10 minutes before refreshing them. It uses #contentLength on ZnResponse to compute the weight. You can see from the print string that there is now 1 entry, while the weight is 71 bytes out of 32768.
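
Asking for the same URL again within the 10 minute window should then answer the cached ZnResponse without touching the network; once the entry’s timestamp is older than the time to live, the factory block runs again and the entry is refreshed:

cache at: 'http://zn.stfx.eu/zn/numbers.txt'.  "answered from the cache while still fresh"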


These are the main points; please have a look at the code. Feedback is welcome.

Regards,

Sven



