Most programming languages provide sorting APIs in their built-in libraries. Java and Go use different sorting algorithms, I implemented Java’s algorithms in Go with different approaches as an exercise. It’s a fun way to understand some design choices in both languages.
TLDR
- Go uses a mixed algorithm with quicksort, heap sort, and shell sort for unstable sorts.
- Go uses insertion sort and stable minimum storage merging for stable sorts.
- Java uses dual-pivot sort for primitives, which is an unstable sort.
- Java uses timsort, a stable sorting algorithm for objects.
- I created a Go package
jsort
by porting the algorithms from Java. It can be imported from github.com/openaphid/jsort- Public APIs for sorting slices are compatible with Go’s built-in
sort
package, such asjsort.Sort
,jsort.Stable
,jsort.Slice
,jsort.SliceStable
, etc. They are all stable sorts using timsort, which could be 2-5x times faster thansort.Stable
in the cost of extra space. - Specialized functions for primitive slices are provided using dual-pivot sort, such as
jsort.Ints
,jsort.Int64s
, etc. They could be up to 3x times faster thansort.Sort
and don’t use extra space as well.
- Public APIs for sorting slices are compatible with Go’s built-in
Go’s Sorting Algorithms
Sorting APIs need to support arbitrary slice types whose elements can be compared. It’s usually straightforward to achieve it if a language supports generics. Go doesn’t have generics for now and it’s expected to be included in v1.18 betas by the end of 2021. There are alternatives ways for generics in Go: code generation, type assertions, reflection, and interface abstraction.
Go’s built-in sort
package mainly uses the interface abstraction approach combined with other techniques. The sorting algorithms depend on an interface sort.Interface
for three basic operations: Len, Less, and Swap.
1
2
3
4
5
6
7
package sort
type Interface interface {
Len() int
Less(i, j int) bool
Swap(i, j int)
}
sort.Slice
function is present if we don’t want to implement the interface sometimes. It leverages reflection to derive Len and Swap operations.
1
2
3
4
5
6
7
8
9
10
11
func Slice(slice interface{}, less func(i, j int) bool) {
rv := reflectValueOf(slice)
swap := reflectSwapper(slice)
length := rv.Len()
quickSort_func(lessSwap{less, swap}, 0, length, maxDepth(length))
}
type lessSwap struct {
Less func(i, j int) bool
Swap func(i, j int)
}
quickSort_func
function can be found in zfuncversion.go that is generated from the Interface
version of quicksort
function by genzfunc.go.
1
2
3
4
5
6
7
// sort.go
//go:generate go run genzfunc.go
func quickSort(data Interface, a, b, maxDepth int) {}
// zfuncversion.go, generated by genzfunc.go from sort.go
func quickSort_func(data lessSwap, a, b, maxDepth int) {}
//
“sort.go” implements two hybrid algorithms for unstable and stable sorts. The unstable algorithm in sort.Sort
has the following characteristics:
- It’s a recursive in-place algorithm with logarithmic additional stack space, no explicit
make
calls for temporary space. - The implementation loosely follows
qsort
in glibc. - The recursion depth is bound by
2*ceil(lg(n+1))
. It switches to heap sort once reaching the limit. - If a subproblem has fewer than 12 elements, it runs one shell sort pass with gap 6, then applies insertion sort.
The stable algorithm in sort.Stable
uses another approach:
- It’s also a recursive in-place algorithm with logarithmic additional stack space.
- It divides the input data into blocks with size 20, then runs insertion sort on all blocks.
- Sorted blocks are merged by
symMerge
function, which implements the stable minimum storage merging algorithm.
JDK’s Sorting Algorithms
In old JDK versions, Java uses a tuned quicksort for primitive arrays. Sorting Object[]
uses a modified merge sort. Starting from JDK 1.7, the quicksort was upgraded to dual-pivot sort and the merge sort was replaced by timsort.
Dual-pivot sort was first introduced to be faster in practice than the single-pivot variant by Yaroslavskiy-Bloch-Bentley in 2009. It doesn’t reduce the number of compare or swap operations. It’s faster because of fewer CPU cache misses.
Timsort is an adaptive stable merge sort that takes advantage of existing orders in its input. It was invented by Time Peters in 2002 for Python. Besides Java, it’s also adopted by V8, Swift, and Rust. It’s proved to be faster in practice with a requirement of O(n/2) extra space. It’s also a relatively complex sorting algorithm, several implementations were found to be broken.
jsort
is built upon the code of DualPivotQuickSort.java and TimSort.java in OpenJDK 11.
Porting from Java to Go
Besides porting the algorithms in Go’s idiomatic ways, I also played with some experimental solutions for evaluation only, such as using type assertions and Go2 generics.
Primitive Types
DualPivotQuicksort.java replicates the algorithm code for each primitive type as Java doesn’t support specializations. It’s easier to accomplish it by using go generate
in Go. A template file sort_primitive.go implements the algorithm by utilizing some handy Go features:
1
2
3
4
5
6
7
8
// +build ignore
//go:generate go run sort_primitive_gen.go
type primitive = int64 // Type placeholder to be replaced
func Sort(a []primitive) {} // dual-pivot sort
func IsSorted(a []primitive) bool {}
The // +build ignore
flag excludes the template from the normal go build
, and the alias type primitive
creates a placeholder that’s easy to be replaced by sort_primitive_gen.go. The generator creates an internal package with specialized code and tests per primitive type, such as internal/sort_int, internal/sort_int64, etc. Then sort_export.go exports the internal symbols, such as jsort.Ints
and jsort.Int64s
functions. One beautiful feature of go generate
is that generated code can be formatted before writing to files.
Generic Slice Types
#1 Attempt: Type Assertions
To support sorting generic slice types, I first experimented with type assertions in packages sort_slice_dps_ts and sort_slice_tim_ts. The sort functions expect the data parameter to be a []interface{}
and a compare function. It looks like using Object[]
and Comparator<Object>
in Java.
1
2
3
type CompareFunc = func(i, j interface{}) int
func SortInterfaces(a []interface{}, compare CompareFunc) {}
In Java, Object[]
and SomeClass[]
can be cast directly to each other without overhead costs. It’s not the case for Go as []interface{}
and []SomeStruct
are two different types. We need a temporary space to help with the conversion by reflection:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
func Sort(a interface{}, compare CompareFunc) {
rt := reflect.TypeOf(a)
rv := reflect.ValueOf(a)
elm := rt.Elem()
n := rv.Len()
interfaces := make([]interface{}, n)
for i := 0; i < n; i++ {
interfaces[i] = rv.Index(i).Interface()
}
SortInterfaces(interfaces, compare)
for i := 0; i < n; i++ {
rv.Index(i).Set(reflect.ValueOf(interfaces[i]).Convert(elm))
}
}
Besides the additional space of []interface{}
, it requires extra cast operations from interface{}
to the concrete type in the compare function, which is not cost-free. The two packages are not exported as using type assertions is not ideal both in speed and memory.
#2 Approach: Interface Abstraction
My first thought was that timsort can’t be built upon the built-in sort.Interface
as the algorithm depends on more operations, such as indexed read and write, typed array creation, etc.
After reading the source code of Go’s sort
package more thoroughly, I realized that the key to interface abstraction is to not leak data type into sort algorithms. I came up with a better idea in the package sort_slice_tim_interface for timsort to use sort.Interface
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// sort_slice_tim_interface.go
func Sort(data sort.Interface) {
// 1. Create a temporary `[]int` to hold the indices.
n := data.Len()
indices := make([]index, n)
for i, _ := range indices {
indices[i] = i
}
// 2. Run timsort on indices with data.Less function.
sortInternal(indices, 0, n, data, nil, 0, 0)
// 3. Rearrange data with data.Swap according to the sorted indices.
for i, j := range indices {
if i == j {
continue
}
for indices[j] < 0 {
j = -indices[j]
}
if i != j {
data.Swap(i, j)
indices[i] = -j
}
}
}
Now the function signature of jsort.Sort
is the same as Go’s sort.Sort
. We can extend it to support sort.Slice
with a helper struct lessSwap
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
func Slice(data interface{}, less func(i, j int) bool) {
rv := reflect.ValueOf(data)
swap := reflect.Swapper(data)
Sort(lessSwap{
length: rv.Len(),
less: less,
swap: swap,
})
}
type lessSwap struct {
length int
less func(i, j int) bool
swap func(i, j int)
}
func (l lessSwap) Len() int {
return l.length
}
func (l lessSwap) Less(i, j int) bool {
return l.less(i, j)
}
func (l lessSwap) Swap(i, j int) {
l.swap(i, j)
}
var _ sort.Interface = (*lessSwap)(nil)
Interface abstraction is usually the idiomatic techniques to solve tasks that need generics in Go. I’m happy that jsort
can export timsort using the same APIs as Go’s sort
package. One thing to note is that jsort.Sort
and jsort.Stable
are the same functions of timsort while sort.Sort
and sort.Stable
are different, the rule applies to the functions of Slice
and SliceStable
too.
#3 Experiment: Go2 Generics
Go team released an experimentation tool go2go in June 2020. I built the tool locally by checking out a dev branch of Go, and wrote the two algorithms with generics syntax. It’s impressive that I can build a full binary of a language in less than 5 minutes.
The go2go tool translates a “.go2” code file to a type specialized version in “.go” files. It is a heterogeneous translation, where different specializations have no typing relationship with each other. The translated code can be built and ran by Go 1.x.
For example, we can declare the sort functions for dual-pivot sort and timsort with generics:
1
2
3
4
5
6
7
8
9
10
11
12
// sort_slice_dps_go2.go2
type Primitive interface {
type int, int8, int16, int32, int64,
uint, uint8, uint16, uint32, uint64, uintptr,
float32, float64,
string
}
func DualPivotSort[T Primitive](a []T) {}
// sort_slice_tim_go2.go2
func Timsort[T any](a []T, compare func(i, j T) int) {}
As I wrote tests by calling the functions with []int
or []Person
, go2go generates specialized functions:
1
2
3
4
5
6
7
// sort_slice_go2_test.go
// specialized dual-pivot sort for int
func instantiate୦୦DualPivotSort୦int(a []int,)
// timsort for Person
func instantiate୦୦Timsort୦sort_slice_go2୮aPerson(a []Person, compare func(i, j Person,) int){}
Oriya digit zero ୦ (U+0B66) and Oriya digit eight ୮ (U+0B6E) are used as separators by go2go. The two characters are valid to use in Go’s identifiers, go2go assumes no one uses these in their own code.
You can find the “.go2” code and the translated “.go” code in the package sort_slice_go2. For this specific task, writing in Go’s new generics syntax feels the same way as using Java’s generics.
As go2go turns off GO111MODULE
, it’s not easy to make the go2 files share code with other packages. I can only play with the code inside the package.
Mistakes
Most parts of porting were straightforward, I did make some mistakes that are worth noting:
++
and--
operators. The Java code usesa++
and++a
in many places. As increment and decrement operators are not expressions in Go, it needs to be represented by two statements in Go. For example, the following Java code embeds an increment operator in an if statement:
1
2
if (c.compare(a[runHi++], a[lo]) < 0) {
}
As the semantic is to increase runHi
and use the old value in the if condition, the equivalent Go code is:
1
2
3
runHi++
if c(a[runHi-1], a[lo]) < 0 {
}
- loops. Go has only one keyword for loops while Java has several alternatives. It gets trickier when it comes with increment and decrement operators. The following code is the insertion sort path in Java’s dual-pivot sort:
1
2
3
4
5
6
7
8
9
10
for (int i = left, j = i; i < right; j = ++i) {
double ai = a[i + 1];
while (ai < a[j]) {
a[j + 1] = a[j];
if (j-- == left) {
break;
}
}
a[j + 1] = ai;
}
The afterthought part has an increment operation and an assignment on two variables, it’s translated to Go as below:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
for i, j := left, left; i < right; {
var ai = a[i+1]
for ai < a[j] {
a[j+1] = a[j]
j--
if j+1 == left { // the condition needs the value before decrement
break
}
}
a[j+1] = ai
// afterthought part
i++
j = i
}
- switch. Switch statements have implicit fall through behavior in Java, Go requires it to be explicitly declared. The following piece of timsort is an optimization for array copy when n is small.
1
2
3
4
5
6
switch (n) {
case 2: a[left + 2] = a[left + 1];
case 1: a[left + 1] = a[left];
break;
default: System.arraycopy(a, left, a, left + 1, n);
}
I was fooled by it and wrote the wrong code in Go. The correct version is:
1
2
3
4
5
6
7
8
9
10
11
switch {
case n <= 2:
if n == 2 {
a[left+2] = a[left+1]
}
if n != 0 {
a[left+1] = a[left]
}
default:
copy(a[left+1:], a[left:left+n])
}
>>>
operator. There is no equivalent in Go for the unsigned right shift operation. There are two scenarios in Java code with it. One is to get the middle point without integer overflow, which is a famous bug existing for years in history:
1
int mid = (left + right) >>> 1
In Go, it should be converted to unsigned int first and then converted back:
1
int(uint(left+right) >> 1)
Another place in Java is to compute the next smallest power of 2:
1
2
// Compute smallest power of 2 > minCapacity
int newSize = -1 >>> Integer.numberOfLeadingZeros(minCapacity);
I don’t know how to do it in one line Go, and use the idea from bit twiddling hacks:
1
2
3
4
5
6
7
newSize := minCapacity
newSize |= newSize >> 1
newSize |= newSize >> 2
newSize |= newSize >> 4
newSize |= newSize >> 8
newSize |= newSize >> 16
newSize++
Benchmark
I must say that writing benchmarks in Go is more natural as it’s built into Go’s toolchain, while Java needs to setup jmh manually.
sort_benchmark_test.go setups benchmarks on sorting int slice and struct slice. Two types of datasets are using in each benchmark scenario. One is random data, which means the elements are mostly unsorted with few duplicated data. The other uses XOR operation to compute data from index values, which makes the data partially sorted. The size of a dataset ranges from 256 to 65536.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
// random data
func prepareRandomInts(src []int) {
rand.Seed(time.Now().Unix())
for i := range src {
src[i] = rand.Int()
}
}
// xor data, eg. [716 717 718 719 712 713 714 715 708 709 710 711...]
func prepareXorInts(src []int) {
for i := range src {
src[i] = i ^ 0x2cc
}
}
#1 Benchmark: Number of Compares
Let’s first check out the number of compares for each sorting algorithm by sort_op_test.go. You can run make operation_stats
to see the numbers by yourself. We can find out that:
- Timsort always uses fewer compares than others. Its advantage is more significant in xor dataset benchmarks.
- Dual-Pivot sort needs more compares than others.
sort.Sort
is not adaptive to data patterns. It uses almost the same amount of comparisons on random and xor datasets.sort.Stable
uses fewer compares thansort.Sort
on xor dataset and a bit more on random dataset.
Theoretically speaking, timsort seems faster than others. But performance in the real world is affected by many other metrics. Let’s run performance benchmarks to look into it.
#2 Benchmark: Sort Ints
The first set of benchmarks are for int slices. I tried to test all applicable APIs:
- Dps-SpecializedInts uses
jsort.Ints
which is the specialized dual-pivot sort for int. - BuiltinSort-SpecializedInts uses a specialized version of Go’s built-in unstable sorting algorithm, which I created manually in sort_builtin_specialized_test.go.
- BuiltinSort-Slice uses the built-in
sort.Slice
function. - BuiltinSort-Sort uses
sort.Sort
function. - TimSort-Interface calls
jsort.Sort
which is timsort. - TimSort-Slice calls
jsort.Slice
function. - BuiltinSort-SliceStable uses
sort.SliceStable
function. It’s present for evaluation only as it doesn’t make sense to use stable sort for primitives. - BuiltinSort-Stable uses
sort.Stable
function. - Dps-TypeAssertion and TimSort-TypeAssertion use the type assertions variants of dual-pivot sort and timsort. They are expected to be less efficient.
“Dps-SpecializedInts” and “BuiltinSort-Sort” are the recommended APIs to sort int slice in practice. I highlighted them in the chart.
The numbers on sorting random ints could give us some findings:
jsort.Ints
can be 3x times faster than Go’ssort.Ints
function. Both use zero extra space as they sort in-place. It explains well why Java adopted dual-pivot sort.- The performance hit of interface abstraction can not be ignored. “BuiltinSort-SpecializedInts” is much faster by shedding the overhead, although it’s 10%~20% slower than
jsort.Ints
. - Timsort is not faster on random data when the comparison cost is low.
- Using type assertions is the slowest as expected, and it also consumes the most extra space due to the type conversions.
sort.Slice
is faster thansort.Sort
on int slices while the result is reversed onjsort.Slice
andjsort.Sort
. I don’t know the reason yet.sort.Stable
is the slowest.
Tests on xor data tell us more stories:
jsort.Ints
can be 5x times faster thansort.Ints
.- Both dual-pivot sort and timsort perform much better. Timsort can be faster than the built-in sorting algorithms.
#3 Benchmark: Sort Structs
Another set of benchmarks are sorting slice of a struct that has one int field and one string field:
1
2
3
4
type Person struct {
Age int
Name string
}
The benchmarks test all APIs that can sort []Person
by the Name
field, its value is a string from random int or XOR computed int.
- Unstable-BuiltinSort-Sort uses Go’s
sort.Sort
, which is an unstable sort. - Stable-TimSort-Interface uses
jsort.Sort
, which is a stable sort. - Unstable-BuiltinSort-Slice uses
sort.Slice
, which is also an unstable sort. - Stable-TimSort-Slice calls
jsort.Slice
. - Stable-BuiltinSort-Stable calls
sort.Stable
. - Unstable-Dps-TypeAssertion and Stable-TimSort-TypeAssertion test the type assertions version of the two algorithms.
- Stable-BuiltinSort-SliceStable calls
sort.SliceStable
.
Stable sorts are preferred for structs in practice. They are highlighted in the chart.
From the random names dataset benchmarks, we can collect some insights:
jsort.Sort
is the fastest among all stable sort APIs. It’s 15% slower thansort.Sort
which is an unstable sort, and it’s 50%~100% faster thansort.Stable
.- Timsort needs extra space.
sort.SliceStable
can be 100% slower thansort.Stable
, while the gap is 20% forjsort.Slice
andjsort.Sort
.
On the xor names dataset, jsort.Sort
is the clearing winner on speed. It’s even faster than sort.Sort
.
#4 Benchmark: Go2 Generics
How does the experimental Go2 generics perform on the same tasks? As GO111MODULE is turned off by go2go, I can’t import jsort
package in a go2 file. The benchmark only compares the go2 version of dual-pivot sort and timsort to the built-in sort functions.
- The translated code of “Dps-Go2” is basically the same as “Dps-SpecializedInts”, it’s no surprise that it’s the fastest for int slice.
- The speed of “Timsort-Go2” is on par with “Stable-TimSort-Interface”, but it uses more extra space as it internally allocates
[]Person
instead of[]int
. If the size of the structPerson
increases, more space would be needed.
Conclusion
Go’s sort functions can be improved for better performance by using other algorithms. If sorting speed is critical to your application, please give jsort package a try.