Sparse Vector API

When most elements in your vector are zero, storing all of them wastes memory. SparseVector stores only the non-zero elements with their positions. For a 10,000-element vector with 90% zeros, this saves 10x memory and makes operations 10x faster.

Trinity uses ternary vectors ({-1, 0, +1}). Many operations -- masking, gating, thresholding -- produce vectors dominated by zeros. SparseVector exploits this by keeping two sorted arrays: indices (where non-zero elements live) and values (what those elements are). All lookups use binary search. All VSA operations use merge-join algorithms that skip zeros entirely.

Source: src/sparse.zig

When to Use Sparse vs Dense

Rule of thumb

Use SparseVector when more than half your elements are zero. Below 50% zeros, the index overhead negates the memory savings.

Density        Memory (10K dim)                  bind speed          Recommendation
5% non-zero    Sparse: ~2.5KB vs Dense: ~10KB    Sparse ~20x faster  Use Sparse
33% non-zero   Sparse: ~16KB vs Dense: ~10KB     Similar             Use Dense
66% non-zero   Sparse: ~33KB vs Dense: ~10KB     Dense ~2x faster    Use Dense
warning

JIT operations require dense vectors. Convert with toDense() before passing to JitVSAEngine. See the JIT API for details.

Density After Operations

Different VSA operations change the density of your vectors. Plan accordingly:

Operation   Output density                    Explanation
bind        density_a x density_b             Only positions where both inputs are non-zero survive. Output is always sparser.
bundle      density_a + density_b - overlap   Positions from either input survive. Output is usually denser.
permute     Same as input                     Elements move to new positions; no zeros are created or removed.

For example, binding two 10%-dense vectors produces roughly 1%-dense output (0.1 x 0.1 = 0.01). Bundling two 10%-dense vectors produces roughly 19%-dense output (0.1 + 0.1 - 0.01 = 0.19).
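The arithmetic above can be checked directly. This is plain Python for illustration, independent of the Zig API:

```python
# Density after bind and bundle for two 10%-dense inputs.
da = db = 0.10

bind_density = da * db              # both inputs must be non-zero at a position
bundle_density = da + db - da * db  # union of non-zero positions (inclusion-exclusion)

print(round(bind_density, 4))    # 0.01
print(round(bundle_density, 4))  # 0.19
```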

SparseVector

Construction

init(allocator: Allocator, dimension: u32) SparseVector

Creates an empty sparse vector with the given total dimension. No memory is allocated until elements are added.

const sparse = @import("sparse");

var vec = sparse.SparseVector.init(allocator, 10000);
defer vec.deinit();

deinit(self: *SparseVector) void

Frees the internal index and value arrays.

random(allocator: Allocator, dimension: u32, density: f64, seed: u64) !SparseVector

Creates a random sparse vector with the specified density (fraction of non-zero elements). Non-zero values are uniformly chosen as +1 or -1.

// 10% density: ~1000 non-zero elements in a 10000-dim vector
var vec = try sparse.SparseVector.random(allocator, 10000, 0.10, 42);
defer vec.deinit();
// vec.nnz() = ~1000

fromDense(allocator: Allocator, dense: *HybridBigInt) !SparseVector

Converts a dense HybridBigInt to sparse representation. Iterates over all elements and stores only non-zero trits.

var dense_vec = vsa.randomVector(1000, 42);
var sparse_vec = try sparse.SparseVector.fromDense(allocator, &dense_vec);
defer sparse_vec.deinit();
// sparse_vec.nnz() = ~667 (random ternary vectors are ~66% non-zero)

toDense(self: *const SparseVector) HybridBigInt

Converts back to a dense HybridBigInt. Initializes all positions to zero, then sets non-zero values from the sparse representation. Returns a stack-allocated HybridBigInt.

const dense = sparse_vec.toDense();
// dense is a full HybridBigInt with all dimensions filled in

clone(self: *const SparseVector) !SparseVector

Creates an independent copy of the sparse vector.

var copy = try vec.clone();
defer copy.deinit();

Element Access

set(self: *SparseVector, index: u32, value: Trit) !void

Sets the value at the given index. Uses binary search to find the insertion point.

  • If the value is non-zero and the index is new, inserts in sorted order.
  • If the value is non-zero and the index exists, updates in place.
  • If the value is zero and the index exists, removes the element (maintains sparsity).
  • If the index is out of bounds (at or above dimension), the operation is silently ignored.
try vec.set(42, 1);   // Set position 42 to +1
try vec.set(100, -1); // Set position 100 to -1
try vec.set(42, 0);   // Remove position 42 (now zero)
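The set/get semantics above (binary search, sorted insertion, removal on zero, silent out-of-bounds) can be modeled in a few lines of Python. This is an illustrative sketch of the behavior, not the actual implementation in src/sparse.zig:

```python
import bisect

class SparseVec:
    """Toy model of SparseVector's sorted index/value storage."""

    def __init__(self, dimension):
        self.dimension = dimension
        self.indices = []  # sorted positions of non-zero trits
        self.values = []   # matching trit values (-1 or +1)

    def set(self, index, value):
        if index >= self.dimension:  # out of bounds: silently ignored
            return
        pos = bisect.bisect_left(self.indices, index)
        found = pos < len(self.indices) and self.indices[pos] == index
        if value != 0:
            if found:
                self.values[pos] = value         # update in place
            else:
                self.indices.insert(pos, index)  # insert in sorted order
                self.values.insert(pos, value)
        elif found:                              # setting zero removes the element
            del self.indices[pos]
            del self.values[pos]

    def get(self, index):
        pos = bisect.bisect_left(self.indices, index)
        if pos < len(self.indices) and self.indices[pos] == index:
            return self.values[pos]
        return 0  # absent or out of bounds

v = SparseVec(10000)
v.set(42, 1)
v.set(100, -1)
v.set(42, 0)
print(v.get(42), v.get(100))  # 0 -1
```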

get(self: *const SparseVector, index: u32) Trit

Returns the trit value at the given index. Uses binary search. Returns 0 for positions not in the sparse representation or for out-of-bounds indices.

const val = vec.get(42);  // Returns -1, 0, or +1

Properties

nnz(self: *const SparseVector) usize

Returns the number of non-zero elements.

const count = vec.nnz();
// count = 1000 (for a 10%-dense, 10000-dim vector)

sparsity(self: *const SparseVector) f64

Returns the sparsity ratio: 1.0 - nnz/dimension. A value of 0.0 means all elements are non-zero; 1.0 means all elements are zero.

const s = vec.sparsity();
// s = 0.90 (90% of elements are zero)

memoryBytes(self: *const SparseVector) usize

Returns current memory usage in bytes, including struct overhead and the index/value arrays.

memorySavings(self: *const SparseVector) f64

Returns the memory savings ratio compared to a dense representation: 1.0 - sparse_bytes/dense_bytes. A value of 0.8 means the sparse representation uses 80% less memory.

const savings = vec.memorySavings();
// savings = 0.80 (80% less memory than dense)

VSA Operations

All VSA operations allocate and return a new SparseVector. The caller owns the result and must call deinit when done.

bind(allocator: Allocator, a: *const SparseVector, b: *const SparseVector) !SparseVector

Element-wise ternary multiplication using a merge-join on sorted indices. The result contains elements only at positions where both inputs have non-zero values. The result is always at least as sparse as the sparser input.

var bound = try sparse.SparseVector.bind(allocator, &vec_a, &vec_b);
defer bound.deinit();
// bound.nnz() = ~100 (for two 10%-dense inputs: 0.1 * 0.1 * 10000)
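The merge-join that bind performs can be sketched as follows. This is an illustrative Python model of the algorithm over sorted (index, value) pairs, not the Zig API:

```python
def sparse_bind(a, b):
    """a, b: lists of (index, trit) pairs sorted by index."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        ia, ib = a[i][0], b[j][0]
        if ia == ib:                           # both non-zero: product survives
            out.append((ia, a[i][1] * b[j][1]))
            i += 1
            j += 1
        elif ia < ib:                          # b is zero here: product is zero, skip
            i += 1
        else:                                  # a is zero here: skip
            j += 1
    return out

print(sparse_bind([(1, 1), (4, -1), (7, 1)], [(4, -1), (7, -1), (9, 1)]))
# [(4, 1), (7, -1)]
```

Note that zero positions never enter the loop at all, which is where the O(nnz_a + nnz_b) cost comes from.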

unbind(allocator: Allocator, a: *const SparseVector, b: *const SparseVector) !SparseVector

Identical to bind for balanced ternary: every non-zero trit is its own multiplicative inverse ((+1)(+1) = +1 and (-1)(-1) = +1), so unbinding is simply binding again with the same vector.

bundle(allocator: Allocator, a: *const SparseVector, b: *const SparseVector) !SparseVector

Element-wise sum with ternary threshold. Unlike bind, the result may be denser than the inputs because positions where only one input is non-zero survive. At positions where both inputs are non-zero, the sum is thresholded: positive becomes +1, negative becomes -1, zero is omitted.

var bundled = try sparse.SparseVector.bundle(allocator, &vec_a, &vec_b);
defer bundled.deinit();
// bundled.nnz() = ~1900 (denser than either input)
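The union-style merge that bundle performs, including the threshold at shared positions, can be sketched like this. Again an illustrative Python model over sorted (index, value) pairs, not the Zig API:

```python
def sparse_bundle(a, b):
    """a, b: lists of (index, trit) pairs sorted by index."""
    out, i, j = [], 0, 0
    while i < len(a) or j < len(b):
        ia = a[i][0] if i < len(a) else None
        ib = b[j][0] if j < len(b) else None
        if ib is None or (ia is not None and ia < ib):
            out.append(a[i])  # only a is non-zero here
            i += 1
        elif ia is None or ib < ia:
            out.append(b[j])  # only b is non-zero here
            j += 1
        else:                 # both non-zero: threshold the sum
            s = a[i][1] + b[j][1]
            if s != 0:        # +1 and -1 cancel; the position is omitted
                out.append((ia, 1 if s > 0 else -1))
            i += 1
            j += 1
    return out

print(sparse_bundle([(1, 1), (4, -1)], [(4, 1), (9, -1)]))
# [(1, 1), (9, -1)]
```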

permute(allocator: Allocator, v: *const SparseVector, k: u32) !SparseVector

Cyclic shift of all indices by k positions: new_index = (old_index + k) % dimension. Values are unchanged. The result is re-sorted by index after shifting.

var shifted = try sparse.SparseVector.permute(allocator, &vec, 5);
defer shifted.deinit();
// shifted.nnz() == vec.nnz() (same density, different positions)
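The index shift and re-sort can be modeled in two lines. An illustrative Python sketch (the Zig code uses insertion sort on the nearly-sorted result, as noted under Internal Details):

```python
def sparse_permute(pairs, k, dimension):
    """Cyclically shift sorted (index, value) pairs by k positions."""
    shifted = [((idx + k) % dimension, val) for idx, val in pairs]
    return sorted(shifted)  # wrap-around breaks the order; restore the sorted invariant

print(sparse_permute([(0, 1), (8, -1)], 5, 10))
# [(3, -1), (5, 1)]
```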

Similarity

Similarity functions are static methods that do not allocate memory.

dot(a: *const SparseVector, b: *const SparseVector) i64

Sparse dot product using merge-join. Only positions present in both vectors contribute to the sum. Runs in O(nnz_a + nnz_b) time.

const d = sparse.SparseVector.dot(&vec_a, &vec_b);
// d = 12 (example: sum of element-wise products at shared positions)
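The merge-join for the dot product works like bind, except matched positions accumulate into a scalar instead of building a new vector. An illustrative Python model, not the Zig API:

```python
def sparse_dot(a, b):
    """a, b: lists of (index, trit) pairs sorted by index."""
    total, i, j = 0, 0, 0
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:          # shared non-zero position contributes
            total += a[i][1] * b[j][1]
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return total

print(sparse_dot([(1, 1), (4, -1), (7, 1)], [(1, 1), (4, -1), (9, 1)]))
# 2
```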

cosineSimilarity(a: *const SparseVector, b: *const SparseVector) f64

Cosine similarity: dot(a,b) / (||a|| * ||b||). For ternary vectors, ||v||^2 = nnz(v) since all non-zero values are +/-1. Returns 0.0 if either vector is the zero vector.

const sim = sparse.SparseVector.cosineSimilarity(&vec_a, &vec_b);
// sim = 0.012 (example: near-zero for random sparse vectors)
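Because every non-zero value is +/-1, each norm reduces to the square root of the non-zero count, which makes the formula cheap. An illustrative Python model (using a dict lookup for the dot product for brevity, where the Zig code uses a merge-join):

```python
import math

def sparse_cosine(a, b):
    """a, b: lists of (index, trit) pairs; returns 0.0 for a zero vector."""
    if not a or not b:
        return 0.0
    db = dict(b)
    d = sum(v * db.get(i, 0) for i, v in a)
    return d / math.sqrt(len(a) * len(b))  # ||v|| = sqrt(nnz) for ternary values

print(sparse_cosine([(1, 1), (4, -1)], [(4, -1), (9, 1)]))  # 0.5
```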

hammingDistance(a: *const SparseVector, b: *const SparseVector) usize

Counts positions where a[i] != b[i]. This includes:

  • Positions where both are non-zero but differ in value
  • Positions where one is non-zero and the other is zero (implicitly)

Uses merge-join to efficiently compare only the non-zero regions.

const dist = sparse.SparseVector.hammingDistance(&vec_a, &vec_b);
// dist = 1800 (example: most non-zero positions differ for random vectors)
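Both bullet cases above count toward the distance, which the following illustrative Python model makes explicit (dicts stand in for the merge-join the Zig code actually uses):

```python
def sparse_hamming(a, b):
    """Count indices where the two sparse ternary vectors disagree."""
    da, db = dict(a), dict(b)
    # Only indices that are non-zero in at least one vector can differ;
    # absent indices are implicitly zero on both sides.
    return sum(1 for idx in set(da) | set(db) if da.get(idx, 0) != db.get(idx, 0))

print(sparse_hamming([(1, 1), (4, -1)], [(4, 1), (9, -1)]))
# 3  (index 1: +1 vs 0, index 4: -1 vs +1, index 9: 0 vs -1)
```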

Memory Complexity

Representation         Memory        Bind Cost         Dot Product Cost
Dense (HybridBigInt)   O(dimension)  O(dimension)      O(dimension)
Sparse (SparseVector)  O(nnz)        O(nnz_a + nnz_b)  O(nnz_a + nnz_b)

For a 10,000-dimensional vector with 500 non-zero elements (95% sparse), the sparse representation uses approximately 20x less memory.

Complete Example

const std = @import("std");
const sparse = @import("sparse");
const vsa = @import("vsa");

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // Create sparse vectors with 10% density
    var vec_a = try sparse.SparseVector.random(allocator, 10000, 0.10, 111);
    defer vec_a.deinit();

    var vec_b = try sparse.SparseVector.random(allocator, 10000, 0.10, 222);
    defer vec_b.deinit();

    // Check properties
    std.debug.print("vec_a: nnz={d}, sparsity={d:.2}%, memory={d} bytes\n", .{
        vec_a.nnz(),
        vec_a.sparsity() * 100.0,
        vec_a.memoryBytes(),
    });
    // vec_a: nnz=1000, sparsity=90.00%, memory=5024 bytes

    std.debug.print("Memory savings vs dense: {d:.1}%\n", .{
        vec_a.memorySavings() * 100.0,
    });
    // Memory savings vs dense: 87.5%

    // Sparse bind (result is sparser than inputs)
    var bound = try sparse.SparseVector.bind(allocator, &vec_a, &vec_b);
    defer bound.deinit();
    std.debug.print("bind result: nnz={d} (expected ~{d})\n", .{
        bound.nnz(),
        @as(usize, @intFromFloat(0.10 * 0.10 * 10000.0)),
    });
    // bind result: nnz=98 (expected ~100)

    // Similarity
    const sim = sparse.SparseVector.cosineSimilarity(&vec_a, &vec_b);
    std.debug.print("cosine similarity: {d:.6}\n", .{sim});
    // cosine similarity: 0.012345

    // Convert to dense for JIT operations
    const dense_a = vec_a.toDense();
    _ = dense_a;

    // Convert dense back to sparse
    var dense_vec = vsa.randomVector(1000, 42);
    var sparse_from_dense = try sparse.SparseVector.fromDense(allocator, &dense_vec);
    defer sparse_from_dense.deinit();
    std.debug.print("Dense->sparse roundtrip: nnz={d}, sparsity={d:.2}%\n", .{
        sparse_from_dense.nnz(),
        sparse_from_dense.sparsity() * 100.0,
    });
    // Dense->sparse roundtrip: nnz=667, sparsity=33.30%
}

Internal Details
  • Sorted indices: The indices array is always maintained in ascending sorted order. This invariant enables binary search for get/set and merge-join for VSA operations.
  • Insertion sort after permute: After cyclic shifting, indices may become unsorted. The permute function uses insertion sort to restore order, which is efficient for nearly-sorted data.
  • No packed mode: Unlike HybridBigInt, sparse vectors do not use bit-packed representation. The overhead of packing/unpacking would negate the sparse access benefits.