Week 07/02/2017 - 07/09/2017 Rewrite Memcached Cluster Accessor

Spent the week rewriting the memcached accessor and configuration for our Java server to support clustering.

Motivation for this project:
Multiple memcached clusters are used in my project, and some of them need to be accessed by multiple services written in different languages, one of which is PHP.
The old accessor does not support clustering at all, let alone ketama consistent hashing (which is on the roadmap). While other services do use clustering for memcached, the Java server accessor creates one memcached client per host in the cluster and runs a multi-write and/or multi-read. On top of that, manual hashing decides which cluster a particular key goes to, based on an additional parameter such as the user id.

Old format:
memcached.name=<cluster_name>
memcached.hosts.0=<mc00, mc01>
memcached.hosts.1=<mc02, mc03>
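For illustration, here is a minimal sketch of how a key gets routed under the old format, assuming each `memcached.hosts.N` entry is one group of hosts and the group is chosen by taking the user id modulo the number of groups. The class name `OldFormatRouting` and the modulo rule are hypothetical; the real accessor code isn't shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of the old scheme: every memcached.hosts.N entry is one group
// of hosts, and the group is picked by hashing an extra parameter (here a user id).
final class OldFormatRouting {
    private final List<List<String>> groups = new ArrayList<>();

    OldFormatRouting(Properties props) {
        // Read memcached.hosts.0, memcached.hosts.1, ... until the first missing index.
        for (int i = 0; props.getProperty("memcached.hosts." + i) != null; i++) {
            List<String> hosts = new ArrayList<>();
            for (String host : props.getProperty("memcached.hosts." + i).split(",")) {
                hosts.add(host.trim());
            }
            groups.add(hosts);
        }
    }

    int groupCount() {
        return groups.size();
    }

    // Manual hashing: the key plays no part, only the user id selects the group.
    // The accessor then holds one client per host in that group and multi-writes/reads.
    List<String> hostsForUser(long userId) {
        int index = (int) Math.floorMod(userId, (long) groups.size());
        return groups.get(index);
    }
}
```

With `memcached.hosts.0=mc00, mc01` and `memcached.hosts.1=mc02, mc03`, user 42 lands on `mc00, mc01` and user 7 on `mc02, mc03`; a service in another language would have to reproduce this rule by hand.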

To make full use of the clustering support in the memcached library, I proposed a new format:

New format:
accessor.<accessor_name>.main=<cluster0_name>
accessor.<accessor_name>.drain=<cluster1_name>
accessor.<accessor_name>.multi=<cluster2_name>

clusters.0.name=<cluster0_name>
clusters.0.hosts=<mc00, mc01>
clusters.0.hash=<standard / ketama>

With the new format, cluster definitions are separated from accessors, and draining and multi-writing are handled at a different level (when needed). More importantly, the way memcached clients are created has been changed to support clustering, instead of creating one instance per host.
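A minimal sketch of reading the new format with `java.util.Properties`, assuming `standard` is the default when `clusters.N.hash` is absent and that an accessor referring to an unknown cluster should fail fast. `NewFormatConfig`, `clusterFor` and that validation rule are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Hypothetical reader for the new format: clusters are defined once under clusters.N.*
// and accessors only refer to them by name (accessor.<name>.<role>=<cluster name>).
final class NewFormatConfig {
    enum Hash { STANDARD, KETAMA }

    static final class Cluster {
        final String name;
        final List<String> hosts;
        final Hash hash;

        Cluster(String name, List<String> hosts, Hash hash) {
            this.name = name;
            this.hosts = hosts;
            this.hash = hash;
        }
    }

    private final Map<String, Cluster> clusters = new HashMap<>();
    // accessor name -> role ("main", "drain", "multi") -> cluster name
    private final Map<String, Map<String, String>> accessors = new HashMap<>();

    NewFormatConfig(Properties props) {
        // clusters.0.*, clusters.1.*, ... until the first index without a name.
        for (int i = 0; props.getProperty("clusters." + i + ".name") != null; i++) {
            String prefix = "clusters." + i + ".";
            String name = props.getProperty(prefix + "name").trim();
            List<String> hosts = new ArrayList<>();
            for (String host : props.getProperty(prefix + "hosts", "").split(",")) {
                if (!host.trim().isEmpty()) {
                    hosts.add(host.trim());
                }
            }
            // Assumption: "standard" when clusters.N.hash is not set.
            String hashName = props.getProperty(prefix + "hash", "standard").trim();
            Hash hash = Hash.valueOf(hashName.toUpperCase(Locale.ROOT));
            clusters.put(name, new Cluster(name, hosts, hash));
        }
        for (String key : props.stringPropertyNames()) {
            String[] parts = key.split("\\.");
            if (parts.length == 3 && parts[0].equals("accessor")) {
                String clusterName = props.getProperty(key).trim();
                if (!clusters.containsKey(clusterName)) {
                    throw new IllegalArgumentException(key + " refers to unknown cluster " + clusterName);
                }
                accessors.computeIfAbsent(parts[1], k -> new HashMap<>()).put(parts[2], clusterName);
            }
        }
    }

    // Returns null when the accessor has no cluster configured for that role.
    Cluster clusterFor(String accessor, String role) {
        String name = accessors.getOrDefault(accessor, new HashMap<>()).get(role);
        return name == null ? null : clusters.get(name);
    }
}
```

Because accessors only hold cluster names, the same cluster can back several accessors, and a drain or multi-write target is just another role lookup.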
I also fixed the clean-up/reinitialization logic in the configuration reader, so that when the configuration file changes, we no longer hold connections to old hosts.
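One way to get that behavior is to build the clients for the new configuration first, publish them atomically, and only then close every client from the previous configuration. This is a minimal sketch assuming one client per cluster; `ClusterClients` and its factory parameter are hypothetical, and with spymemcached the `close()` of a wrapper would presumably delegate to `MemcachedClient.shutdown()`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical holder of one client per cluster name. On reload, new clients are
// built first, swapped in atomically, and the previous clients are then closed so
// that no connections to removed hosts stay open.
final class ClusterClients<C extends AutoCloseable> {
    private final Function<List<String>, C> factory;
    private final AtomicReference<Map<String, C>> current =
            new AtomicReference<>(Collections.<String, C>emptyMap());

    ClusterClients(Function<List<String>, C> factory) {
        this.factory = factory;
    }

    // clusterHosts: cluster name -> host list, e.g. taken from the parsed configuration.
    void reload(Map<String, List<String>> clusterHosts) {
        Map<String, C> fresh = new HashMap<>();
        clusterHosts.forEach((name, hosts) -> fresh.put(name, factory.apply(hosts)));
        Map<String, C> old = current.getAndSet(Collections.unmodifiableMap(fresh));
        for (C client : old.values()) {
            try {
                client.close();
            } catch (Exception e) {
                // A failed close must not stop the remaining old clients from closing.
                System.err.println("failed to close client: " + e);
            }
        }
    }

    // Returns null if the cluster is not in the current configuration.
    C get(String clusterName) {
        return current.get().get(clusterName);
    }
}
```

Rebuilding every client on each reload is the simplest choice; a caller that fetched a client just before a reload may see it shut down, so a real implementation might keep old clients alive for a short grace period.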

Difficulties:
I'm using the spymemcached library (https://github.com/dustin/java-memcached-client), and I ran into an issue when I started unit testing: writing multiple key-value pairs to a cluster of 2 instances and verifying the values from a different language (PHP, for convenience).
The test failed, and I spent a day reading the source code of the memcached libraries to debug the inconsistent behavior. It turned out that both libraries do the same thing for the same hash algorithm, and the mistake was mine: when extending the DefaultConnectionFactory class, the hash algorithm used inside createNodeLocator() must match the one returned by getHashAlg(). With that fixed, everything works as expected.
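A cheap safeguard against this kind of mismatch is a unit test that routes the same keys with two hash functions and reports the keys that land on different hosts. Below is a minimal sketch; `HashAgreementCheck` is a hypothetical name, and the two example functions (`String.hashCode()` as an unsigned 32-bit value, and the plain CRC32 checksum from `java.util.zip`) are arbitrary stand-ins, not the exact pair involved in my bug.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;
import java.util.zip.CRC32;

// Hypothetical test helper: route each key with two hash functions using
// "hash mod host count" and collect the keys on which the two routes disagree.
final class HashAgreementCheck {
    static List<String> disagreements(List<String> keys, int hostCount,
                                      ToLongFunction<String> hashA,
                                      ToLongFunction<String> hashB) {
        List<String> mismatched = new ArrayList<>();
        for (String key : keys) {
            // Both functions are expected to return non-negative values.
            long hostA = hashA.applyAsLong(key) % hostCount;
            long hostB = hashB.applyAsLong(key) % hostCount;
            if (hostA != hostB) {
                mismatched.add(key);
            }
        }
        return mismatched;
    }

    // Example hash 1: Java's String.hashCode() as an unsigned 32-bit value.
    static long javaHashCode(String key) {
        return key.hashCode() & 0xffffffffL;
    }

    // Example hash 2: the plain CRC32 checksum of the UTF-8 key bytes.
    static long plainCrc32(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

In a cross-language test, the expectation would be an empty list when comparing the function your locator uses with the one the PHP client is configured for.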

For anyone curious how a key is mapped to a host in a cluster, this is the file to look at:
https://github.com/dustin/java-memcached-client/blob/master/src/main/java/net/spy/memcached/DefaultHashAlgorithm.java

By default, the connection factory uses NATIVE (DefaultHashAlgorithm.NATIVE_HASH), which translates to:
hash = <your_key>.hashCode() 
Assuming you use an ArrayModNodeLocator, that hash value is taken modulo the number of hosts to pick the host.
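Put together, NATIVE hashing with an ArrayModNodeLocator boils down to the sketch below. It mirrors my reading of `DefaultHashAlgorithm` and `ArrayModNodeLocator` (the hash is kept as an unsigned 32-bit value before the modulo) but is a simplified re-implementation, not the library code; `NativeModRouting` is a hypothetical name.

```java
import java.util.List;

// Simplified re-implementation (not library code) of NATIVE hashing plus
// array-mod host selection.
final class NativeModRouting {
    // NATIVE: String.hashCode(), kept as an unsigned 32-bit value so it is never negative.
    static long nativeHash(String key) {
        return key.hashCode() & 0xffffffffL;
    }

    // Array-mod: the hash modulo the number of hosts is an index into the host list,
    // so every client sharing the cluster must list the hosts in the same order.
    static String hostFor(String key, List<String> hosts) {
        int index = (int) (nativeHash(key) % hosts.size());
        return hosts.get(index);
    }
}
```

For the key "a" on hosts `mc00, mc01`, the hash is 97 and 97 mod 2 = 1, so the key goes to `mc01`. A PHP client would have to reproduce Java's `String.hashCode()` to land on the same host.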
By default, PHP supports only two hash algorithms: CRC32 and FNV-1A. I've verified that the implementations in the PHP memcached libraries are the same as the ones in spymemcached. If you need to share data between Java and PHP through a memcached cluster, configure both to use the same hash algorithm.
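For cross-checking values in a test, here is a minimal re-implementation of the two algorithms as I read them in spymemcached's `DefaultHashAlgorithm` (`CRC_HASH` and `FNV1A_32_HASH`). Treat it as a sketch, not a replacement for the library: the CRC variant keeps only bits 16 to 30 of the checksum, and the FNV loop works on Java chars, so verify both against the linked source and your PHP client's settings. `SharedHashes` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sketch of two hash functions as read from spymemcached's DefaultHashAlgorithm.
// Verify against the library source before relying on exact values.
final class SharedHashes {
    private static final long FNV_32_INIT = 2166136261L;
    private static final long FNV_32_PRIME = 16777619L;

    // CRC_HASH: CRC32 of the UTF-8 key bytes, keeping bits 16..30 (value 0..32767).
    static long crcHash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (crc.getValue() >> 16) & 0x7fff;
    }

    // FNV1A_32_HASH: xor in each char, then multiply by the FNV prime, truncating to
    // 32 bits. Works on Java chars, so it equals byte-wise FNV-1a only for ASCII keys.
    static long fnv1a32(String key) {
        long hash = FNV_32_INIT;
        for (int i = 0; i < key.length(); i++) {
            hash ^= key.charAt(i);
            hash = (hash * FNV_32_PRIME) & 0xffffffffL;
        }
        return hash;
    }
}
```

Either value is what an array-mod locator would take modulo the host count, so besides the algorithm, the Java and PHP clients also need the same server list in the same order.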
