Bucket Sort
BasisPrerequisites
Counting sort achieves linear time by mapping each integer key directly to an index. Bucket sort generalises this idea: instead of one slot per key value, you allocate a small number of buckets, each covering a range of values, then sort the small buckets independently. When values are uniformly distributed across the range, each bucket gets elements on average, and sorting all buckets together costs .
The algorithm
Given floating-point values uniformly distributed in :
- Create empty buckets, where bucket covers the interval .
- Distribute each element into bucket .
- Sort each bucket with insertion sort (buckets are small on average).
- Concatenate the buckets in order.
const std = @import("std");
fn insertionSortF64(arr: []f64) void {
var i: usize = 1;
while (i < arr.len) : (i += 1) {
const key = arr[i];
var j = i;
while (j > 0 and arr[j - 1] > key) : (j -= 1) {
arr[j] = arr[j - 1];
}
arr[j] = key;
}
}
/// Sorts arr in-place. All values must be in [0, 1).
fn bucketSort(arr: []f64, allocator: std.mem.Allocator) !void {
const k = arr.len;
if (k == 0) return;
// Allocate k buckets, each an expandable list.
const buckets = try allocator.alloc(std.ArrayList(f64), k);
defer {
for (buckets) |*b| b.deinit();
allocator.free(buckets);
}
for (buckets) |*b| b.* = std.ArrayList(f64).init(allocator);
// Distribute elements into buckets.
for (arr) |x| {
const idx: usize = @intFromFloat(x * @as(f64, @floatFromInt(k)));
try buckets[@min(idx, k - 1)].append(x);
}
// Sort each bucket and write back.
var pos: usize = 0;
for (buckets) |*b| {
insertionSortF64(b.items);
for (b.items) |x| {
arr[pos] = x;
pos += 1;
}
}
}
The @min(idx, k - 1) guard handles the edge case where x is exactly 1.0 due to floating-point rounding; it clamps the index to the last valid bucket.
Why buckets?
Setting gives each bucket an expected load of 1 element under a uniform distribution. The analysis below uses this choice, but any fixed works — smaller means fewer (larger) buckets, which is preferable when memory is tight.
Average-case analysis
Assume the input values are drawn independently and uniformly from . Let be the number of elements in bucket . The expected sorting cost is:
Each element falls in bucket with probability , so follows a distribution. For this distribution:
Summing over all buckets: total expected time is .
Worst-case behaviour
The uniform distribution assumption is crucial. If all elements fall into the same bucket, that bucket takes to sort with insertion sort, making the total . Bucket sort is only efficient when values are spread across the buckets.
Extending to other key types
The range is conventional but not mandatory. Any range works by normalising: . For integer keys, bucket sort with one element per key value degenerates to counting sort — counting sort is the special case of bucket sort where every bucket holds at most one distinct value.
Complexity summary
| Metric | Complexity |
|---|---|
| Expected time | (uniform input, ) |
| Worst-case time | |
| Extra space | |
| Stable | Yes (insertion sort within buckets preserves order) |
The extra space covers the bucket structures and the bucket contents. With , this is .
When to use bucket sort
- Input is approximately uniformly distributed over a known range.
- You can afford extra memory.
- The key type supports a fast mapping to a bucket index (floating-point multiplication, integer division).
When the distribution is skewed, bucket sort degrades. Use counting sort for integer keys with a small range, or quick sort / merge sort for general comparison-based needs.
Summary
- Bucket sort distributes elements into buckets by key range, sorts each bucket, then concatenates.
- With buckets and uniformly distributed input, expected time is .
- Worst-case time is when all elements cluster in one bucket.
- Space is ; stability depends on the per-bucket sort algorithm (insertion sort is stable).
- Counting sort is the special case of bucket sort with one slot per distinct key value.