Monday, July 13, 2015

A light and fast type for serializing to a byte slice in Go

Sometimes we need to serialize some data structure and form a byte slice ([]byte). The builtin library provides a type in bytes package named Buffer. While in GolangPlus, a type named ByteSlice can be an better alternative in github.com/golangplus/bytes package.

Actually, ByteSlice is nothing but a renamed []byte, i.e. simply

  type ByteSlice []byte

while a Buffer contains much more fields:

  type Buffer struct {
    buf []byte
    off int
    runeBytes [utf8.UTFMax]byte
    bootstrap [64]byte
    lastRead  readOp
  }

which means much more memory usage. In some situation, this could be a problem.

Preparing a ByteSlice is much lighter than preparing a Buffer. Here is the benchmark for serializing a 10-byte data:

  BenchmarkByteSliceWrite10       20000000 101 ns/op
  BenchmarkBytesBufferWrite10_New  3000000 460 ns/op
  BenchmarkBytesBufferWrite10_Def  3000000 474 ns/op

BenchmarkBytesBufferWrite10_New initializes the buffer with bytes.NewBuffer and a 10-byte-long byte slice and BenchmarkBytesBufferWrite10_Def just defines a zero value bytes.Buffer variable. The more than 4 times advantage of ByteSlice over Buffer is caused by the difference of intializing the object.

Writing to a *ByteSlice is appending to the slice. For example, (*ByteSlice).WriteByte is implemented as follow:

  func (s *ByteSlice) WriteByte(c byte) error {
    *s = append(*s, c)
    return nil
  }

Comparing it with the implementation of Buffer.WriteByte:

  func (b *Buffer) WriteByte(c byte) error {
    b.lastRead = opInvalid
    m := b.grow(1)
    b.buf[m] = c
    return nil
  }

Which is much more complicated. Here is the benchmark showing the efficiency difference:

  BenchmarkByteSliceWrite1k   200000  9971 ns/op
  BenchmarkBytesBufferWrite1k 100000 11933 ns/op

At the same time, Buffer doesn't create the overhead for nothing. It supports UnreadByte and UnreadRune which are not supported by ByteSlice (they do need extra memory to support). But if one doesn't need them, which is most of the case for me, ByteSlice is obvious a better choise.

Wednesday, June 17, 2015

An alternative design for "container/heap" in Go.

Here is an alternative design of the heap package in Go:


The main difference is that elements need not be converted to interface{}'s for pushing and popping. This reduces the overhead of convertion from value to interface and vice verser. The trick is to use the last element as the in/out place so the interface doesn't need to touch the element. PushPop and Remove are replaced with PushLastPopToLast and RemoveToLast, respectively.

An example of a heap with integers is like this:

type IntHeap []int
func (h *IntHeap) Pop() int {
  heap.PopToLast((*sort.IntSlice)(h))
  res := (*h)[len(*h) - 1]
  *h = (*h)[:len(*h) - 1]

  return res
}

func (h *IntHeap) Push(x int) {
  *h = append(*h, x)
  heap.PushLast((*sort.IntSlice)(h))
}

Pushing and poping the elements are done by calling the corresponding methods in the type IntHeap other than calling heap.Push(h, el). The code looks much clearer for me.

In the case where the element is a struct, the benchmark shows about 10~15% of performance increase by removing the convertion:


BenchmarkDataHeap_Plus    100    10541648 ns/op
BenchmarkDataHeap_Pkg     100    11921974 ns/op

where each element is of the type:

type Data struct {
Value    string
Priority int
}

The closure (func) version of functions are also defined as PushLastFPopToLast and RemoveToLast, respectively. Using them can make the calling faster than using the interface.

The benchmark shows a 20~30% peformance improvement by using closure version vs the builtin heap:

BenchmarkDataHeap_F       200    8875933 ns/op

Heaps of int/string/float64 are predefined with support of customized less function. Here is the benchmark results (similar performance increase):

BenchmarkBuiltinIntHeap   300    5680077 ns/op
BenchmarkPlusIntHeap      300    4049263 ns/op
BenchmarkPlusIntHeap_Less 300    4933297 ns/op

Tuesday, February 24, 2015

A lock-free Java object pool

It's a little bit surprising for me that I could not find an article or library of a totally lock-free Java object pool by googling. The closet one is Lock Less Java Object Pool. Here comes my version of a totally lock-free Java object pool.

The code is shared as  a public project at Github. Fell free to share or fork it.

The idea is simple. All returned objects (which need to be cached) are stored in a stack, which is held in an AtomicReferenceArray instance and an AtomicInteger-typed variable stores the top position of the stack. allocation or freeing of an object contains two steps:
  1. Firstly, a change of the top variable needs to be reserved,
  2. Then we try get/put the element in the reserved position,
  3. Since some other thread may also reserve the same place because the stack could expand and shrink, we may have to start over if this happens, i.e. Step 2 failed.
Step 1 utilizes the method of compareAndSet of AtomicInteger in a loop and step 2 uses getAndSet/compareAndSet of AtomicReferenceArray. Both are in a lock-free style.

This framework is also known as an optimisitic locking. It is useful for cases when contentions are relatively rare events.

Here are some code fragments:

    public E alloc() {
        while (true) {
            // Try reserve a cached object in objects
            int n;
            do {
                n = top.get();
                if (n == 0) {
                    // No cached oobjects, allocate a new one
                    return factory.alloc();
                }
            } while (!top.compareAndSet(n, n - 1));
            // Try fetch the cached object
            E e = objects.getAndSet(n, null);
            if (e != null) {
                return e;
            }
            // It is possible that the reserved object was extracted before
            // the current thread tried to get it. Let's start over again.
        }
    }
    
    public void free(E e) {
        while (true) {
            // Try reserve a place in this.objects for e.
            int n;
            do {
                n = top.get();
                if (n == objects.length()) {
                    // the pool is full, e is not cached.
                    factory.free(e);
                }
            } while (!top.compareAndSet(n, n + 1));
            // Try put e at the reserved place.
            if (objects.compareAndSet(n + 1, null, e)) {
                return;
            }
            // It is possible that the reserved place was occupied before
            // the current thread tried to put e in it. Let's start over again.
        }
    }

Any comments are welcome!