Memory Efficiency

Trinity achieves up to 20x memory savings compared to float32 representations through a combination of packed ternary encoding, lazy conversion strategies, and sparse vector formats. This page explains each memory optimization technique and when to use it.

Ternary Information Density

Each ternary value (trit) can be one of three states: {-1, 0, +1}. A trit therefore carries log2(3) ≈ 1.585 bits of information, usually rounded to 1.58 bits. In contrast, a float32 value uses 32 bits, and even a single byte (int8) uses 8 bits. The theoretical minimum storage for a trit is about 1.58 bits, and Trinity's packed format approaches this limit.
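The figures above can be checked with a few lines of arithmetic. The following is only an illustrative calculation; the helper name density_ratio is hypothetical and not part of Trinity.

```python
import math

# Information content of one symbol drawn from three equally likely states.
BITS_PER_TRIT = math.log2(3)

def density_ratio(bits_per_value: float) -> float:
    """How many times more bits a container uses than the ideal trit needs."""
    return bits_per_value / BITS_PER_TRIT

print(f"bits per trit:     {BITS_PER_TRIT:.4f}")
print(f"float32 vs. ideal: {density_ratio(32):.1f}x")
print(f"int8 vs. ideal:    {density_ratio(8):.1f}x")
print(f"2-bit vs. ideal:   {density_ratio(2):.2f}x")
```

The float32 ratio comes out slightly above 20, which is where the "up to 20x" figure originates.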

HybridBigInt: Dual Representation

The HybridBigInt type (defined in the core library) provides a hybrid storage strategy with two internal representations:

  • Packed format: Trits are stored at approximately 1.58 bits per trit using a custom encoding scheme. This is the memory-efficient representation used for storage and transmission.
  • Unpacked format: Each trit occupies a full integer slot in a fixed-size array ([MAX_TRITS]Trit). This is the compute-friendly representation used during arithmetic operations.

Conversion between formats is lazy -- the system only unpacks when an operation requires element-level access, and only packs when storage efficiency is needed. This avoids redundant conversions in operation chains. The ensureUnpacked() method is called before JIT-compiled operations to guarantee direct memory access to the trit array.
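The lazy conversion strategy can be sketched as a container that caches both forms and converts only on demand. This is a minimal illustration, not the HybridBigInt implementation: the class LazyTrits, its method names, and the one-byte-per-trit placeholder codec are all assumptions made to keep the example short.

```python
# Placeholder codec (one byte per trit), used only to keep the sketch short.
def pack(trits):
    return bytes(t + 1 for t in trits)

def unpack(data, count):
    return [b - 1 for b in data[:count]]

class LazyTrits:
    """Hypothetical dual-representation container with lazy conversion."""

    def __init__(self, trits):
        self._length = len(trits)
        self._unpacked = list(trits)  # compute-friendly form
        self._packed = None           # storage form, built on demand
        self.conversions = 0          # counts pack/unpack calls, for illustration

    def ensure_unpacked(self):
        if self._unpacked is None:
            self._unpacked = unpack(self._packed, self._length)
            self.conversions += 1
        return self._unpacked

    def ensure_packed(self):
        if self._packed is None:
            self._packed = pack(self._unpacked)
            self.conversions += 1
        return self._packed

    def release_unpacked(self):
        """Keep only the storage form, e.g. before long-term storage."""
        self.ensure_packed()
        self._unpacked = None

    def negate_in_place(self):
        trits = self.ensure_unpacked()
        for i, t in enumerate(trits):
            trits[i] = -t
        self._packed = None  # the cached packed form is now stale

v = LazyTrits([1, 0, -1, 1])
v.negate_in_place()
v.negate_in_place()      # chained operations: already unpacked, no conversion
chained_conversions = v.conversions
v.release_unpacked()     # one pack
v.negate_in_place()      # one unpack
print(v.ensure_unpacked(), chained_conversions, v.conversions)
```

The two chained negations trigger no conversion at all; a pack and an unpack happen only once the vector is moved to storage and then used again.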

Packed Trit Encoding

At the lowest level, Trinity encodes trits using 2 bits per trit in packed byte arrays. The encoding maps:

| Trit Value | 2-bit Encoding |
| --- | --- |
| -1 | 0b10 |
| 0 | 0b00 |
| +1 | 0b01 |
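A minimal sketch of this 2-bit encoding follows. The code values come from the table above; the function names and the bit order within each byte (first trit in the lowest two bits) are assumptions, since the layout inside a byte is not specified here.

```python
# Code values taken from the encoding table; 0b11 is not assigned.
ENCODE = {-1: 0b10, 0: 0b00, 1: 0b01}
DECODE = {code: trit for trit, code in ENCODE.items()}

def pack_2bit(trits):
    """Pack trits four per byte (assumed order: first trit in the low bits)."""
    out = bytearray((len(trits) + 3) // 4)
    for i, t in enumerate(trits):
        out[i // 4] |= ENCODE[t] << (2 * (i % 4))
    return bytes(out)

def unpack_2bit(data, count):
    """Recover `count` trits; raises KeyError on the unassigned 0b11 code."""
    return [DECODE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(count)]

packed = pack_2bit([1, -1, 0, 1])
print(packed.hex())               # one byte holds all four trits
print(unpack_2bit(packed, 4))
```

The trit count has to be stored alongside the bytes, because padding bits in the final byte are indistinguishable from zero trits.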

Four trits fit in a single byte. For a 10,000-dimensional vector:

| Format | Size | Calculation |
| --- | --- | --- |
| float32 | 40,000 bytes (40 KB) | 10,000 x 4 bytes |
| int8 | 10,000 bytes (10 KB) | 10,000 x 1 byte |
| Packed 2-bit | 2,500 bytes (2.5 KB) | 10,000 x 2 bits / 8 |
| Theoretical (1.58-bit) | 1,981 bytes (~2 KB) | 10,000 x log2(3) bits / 8 |

The packed 2-bit format achieves a 16x reduction compared to float32. With the higher-density 1.58 bits/trit packing used by HybridBigInt, the compression approaches 20x.
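One well-known way to get below 2 bits per trit is to store five trits per byte, since 3^5 = 243 fits within the 256 values of a byte; this costs 1.6 bits per trit. The sketch below illustrates that technique only. The encoding HybridBigInt actually uses is not described on this page, so the functions here are hypothetical and should not be read as Trinity's format.

```python
TRITS_PER_BYTE = 5  # 3**5 = 243 distinct values fit in one byte (256)

def pack_base243(trits):
    """Store each group of five trits as one base-3 number in a byte."""
    out = bytearray()
    for start in range(0, len(trits), TRITS_PER_BYTE):
        chunk = trits[start:start + TRITS_PER_BYTE]
        value = 0
        for t in reversed(chunk):
            value = value * 3 + (t + 1)  # map {-1, 0, +1} to digits {0, 1, 2}
        out.append(value)                # first trit ends up as the lowest digit
    return bytes(out)

def unpack_base243(data, count):
    """Recover `count` trits, discarding padding digits in the last byte."""
    trits = []
    for byte in data:
        for _ in range(TRITS_PER_BYTE):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:count]

vector = [(i * 7 % 3) - 1 for i in range(10_000)]
packed = pack_base243(vector)
print(len(packed), "bytes,", 40_000 / len(packed), "x smaller than float32")
```

For a 10,000-dimensional vector this yields 2,000 bytes, a 20x reduction relative to float32, at the price of a division-based decode instead of simple bit shifts.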

Sparse Vector Representation

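For vectors in which the large majority of trits are zero, Trinity provides a SparseVector type that uses the Coordinate List (COO) format. Instead of storing every element, it stores only the indices and values of non-zero elements. Because each stored element also carries a 32-bit index, the format only saves memory relative to the packed dense representation at very high sparsity: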

SparseVector {
    indices: [u32]    -- sorted positions of non-zero trits
    values: [Trit]    -- trit values at those positions (-1 or +1)
    dimension: u32    -- total vector length
}
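The layout above can be illustrated with a small conversion sketch. The class name SparseTrits and the methods from_dense and to_dense are hypothetical; only the three fields and the sparsity() measure come from this page.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SparseTrits:
    """Hypothetical COO container mirroring the SparseVector fields."""
    indices: List[int]  # sorted positions of non-zero trits
    values: List[int]   # -1 or +1 at those positions
    dimension: int      # total vector length

    @classmethod
    def from_dense(cls, dense):
        # enumerate() visits positions in order, so indices come out sorted.
        pairs = [(i, t) for i, t in enumerate(dense) if t != 0]
        return cls([i for i, _ in pairs], [t for _, t in pairs], len(dense))

    def to_dense(self):
        dense = [0] * self.dimension
        for i, t in zip(self.indices, self.values):
            dense[i] = t
        return dense

    def sparsity(self):
        """Fraction of positions that are zero."""
        if self.dimension == 0:
            return 0.0
        return 1.0 - len(self.indices) / self.dimension

sv = SparseTrits.from_dense([0, 1, 0, 0, -1, 0, 0, 0])
print(sv.indices, sv.values, sv.sparsity())
```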

Memory usage scales with the number of non-zero elements (nnz) rather than the total dimension:

| Sparsity | 10,000-dim Dense (packed) | 10,000-dim Sparse (COO) | Savings |
| --- | --- | --- | --- |
| 50% zeros | 2,500 bytes | ~25,000 bytes | None (sparse is worse) |
| 90% zeros | 2,500 bytes | ~5,000 bytes | None (sparse is worse) |
| 99% zeros | 2,500 bytes | ~500 bytes | 5x |
| 99.9% zeros | 2,500 bytes | ~50 bytes | 50x |

The sparse format becomes advantageous at very high sparsity levels (above ~95% zeros), which occurs in certain VSA encoding patterns and after thresholding operations. The SparseVector provides a sparsity() method to measure the zero ratio and a memorySavings() method to compare against the equivalent dense representation.
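The table and the ~95% threshold can be reproduced with a simple cost model. The model assumes 5 bytes per non-zero element (a 4-byte u32 index plus 1 byte for the value), which is inferred from the table's figures rather than stated by the library, and it compares against the 2-bit dense format. The function names are hypothetical and are not the SparseVector methods themselves.

```python
INDEX_BYTES = 4  # u32 index per non-zero element
VALUE_BYTES = 1  # assumed: one byte per stored trit value

def dense_packed_bytes(dimension):
    """Size of the 2-bit packed dense format, rounded up to whole bytes."""
    return (dimension * 2 + 7) // 8

def sparse_coo_bytes(nnz):
    return nnz * (INDEX_BYTES + VALUE_BYTES)

def memory_savings(dimension, nnz):
    """Dense-to-sparse size ratio; values above 1.0 favor the sparse format."""
    if nnz == 0:
        return float("inf")
    return dense_packed_bytes(dimension) / sparse_coo_bytes(nnz)

def break_even_sparsity(dimension):
    """Zero fraction at which both formats occupy the same number of bytes."""
    max_nnz = dense_packed_bytes(dimension) / (INDEX_BYTES + VALUE_BYTES)
    return 1.0 - max_nnz / dimension

for zeros in (0.5, 0.9, 0.99, 0.999):
    nnz = round(10_000 * (1 - zeros))
    print(f"{zeros:.1%} zeros: {sparse_coo_bytes(nnz)} bytes, "
          f"ratio {memory_savings(10_000, nnz):.1f}")
print("break-even sparsity:", break_even_sparsity(10_000))
```

Under these assumptions the break-even point for a 10,000-dimensional vector is 500 non-zero elements, that is, 95% zeros.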

Choosing the Right Format

| Use Case | Recommended Format | Reason |
| --- | --- | --- |
| General VSA operations | HybridBigInt (packed) | Good balance of memory and speed |
| JIT-compiled hot paths | HybridBigInt (unpacked) | Direct memory access for native code |
| Storage and serialization | Packed trit arrays | Minimum size for dense vectors |
| Very sparse data (>95% zeros) | SparseVector (COO) | Memory proportional to non-zero count |
| BitNet model weights | Packed ternary | 16x (2-bit) to ~20x (1.58-bit) compression vs float32 |

Impact on Inference

For BitNet b1.58 language models, the memory savings from ternary weights are substantial. A 7B parameter model in float32 requires approximately 28 GB of memory for weights alone. With ternary packing at 1.58 bits per weight, the same model fits in roughly 1.4 GB -- small enough to run on a single consumer GPU or even in system RAM on a laptop.
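The estimate above is straightforward arithmetic, shown here for several encodings. The helper name weight_memory_gb is hypothetical, 1 GB is taken as 10^9 bytes, and the figures cover weights only.

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Memory for model weights alone, in GB (10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

PARAMS = 7e9  # 7B-parameter model

for label, bits in [("float32", 32), ("2-bit packed", 2), ("1.58-bit packed", 1.58)]:
    print(f"{label:>16}: {weight_memory_gb(PARAMS, bits):6.2f} GB")
```

With plain 2-bit packing the same model needs about 1.75 GB, so the denser encoding saves roughly a further 0.37 GB.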