Hypher Binary Tries
CrossPoint embeds the exact binary automata produced by Typst’s hypher.
File layout
Each .bin blob is a single self-contained automaton:
uint32_t root_addr_be; // big-endian offset of the root node
uint8_t levels[]; // shared "levels" tape (dist/score pairs)
uint8_t nodes[]; // node records packed back-to-back
The size of the levels tape is implicit. Individual nodes reference slices inside that tape via 12-bit offsets, so no additional pointers are required.
Node encoding
Every node starts with a single control byte:
- Bit 7 – set when the node stores scores (
levels). - Bits 5-6 – stride of the target deltas (1, 2, or 3 bytes, big-endian).
- Bits 0-4 – transition count (values ≥ 31 spill into an extra byte).
If the levels flag is set, two more bytes follow. Together they encode a 12-bit offset into the global levels tape and a 4-bit length. Each byte in the levels tape packs a distance/score pair as dist * 10 + score, where dist counts how many UTF-8 bytes we advanced since the previous digit.
After the optional levels header come the transition labels (one byte per edge) followed by the signed target deltas. Targets are stored as relative offsets from the current node address. Deltas up to ±128 fit in a single byte, larger distances grow to 2 or 3 bytes. The runtime walks the transitions with a simple linear scan and materializes the absolute address by adding the decoded delta to the current node’s base.
Embedding blobs into the firmware
The helper script scripts/generate_hyphenation_trie.py acts as a thin wrapper: it reads the hypher-generated .bin files, formats them as constexpr byte arrays, and emits headers under lib/Epub/Epub/hyphenation/generated/. Each header defines the raw data plus a SerializedHyphenationPatterns descriptor so the reader can keep the automaton in flash.
A convenient script update_hyphenation.sh is used to update all languages. To use it, run:
./scripts/update_hypenation.sh