feat: Add L2-L5 CPU kernels (WHT, FWHT/ACDC, Tropical, HRR) with dispatch integration#570

Open
peder1981 wants to merge 1 commit into microsoft:main from peder1981:pr/1-kernels

@peder1981

Add L2–L5 algebraic kernels for CPU-only 1.58-bit inference

This PR adds four new algebraic kernels for the CPU-only inference path:

| Level | Algebra | Kernel | Effect |
|---|---|---|---|
| L2 | Walsh–Hadamard (no multiplications) | ggml-bitnet-wht | Replaces 256 maddubs with adds/subs in vec_dot |
| L3 | ACDC (FWHT + diagonal) | ggml-bitnet-fwht | O(n log n) GEMV; requires W to be ACDC-diagonalizable |
| L4 | Tropical (max, +) | ggml-bitnet-tropical | O(n·d + K·d) attention via top-K softmax over keys |
| L5 | Holographic Reduced Representations (FFT) | ggml-bitnet-hrr | One d-dimensional vector stores N ≪ d "memories" |
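The L2 and L3 rows both rest on the fast Walsh–Hadamard transform (FWHT), which needs only additions and subtractions. The sketch below is illustrative only and assumes that "ACDC-diagonalizable" means W can be written as W = (1/n)·H·diag(d)·H, with H the n×n Sylvester Hadamard matrix; under that assumption a GEMV costs two FWHTs plus n scalings. The names fwht_inplace and acdc_gemv are hypothetical, and plain float is used instead of the PR's 1.58-bit format.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place fast Walsh–Hadamard transform (unnormalized, Sylvester/natural order).
// log2(n) passes of n/2 butterflies, using only additions and subtractions.
void fwht_inplace(std::vector<float>& x) {
    const std::size_t n = x.size();
    assert(n != 0 && (n & (n - 1)) == 0 && "length must be a power of two");
    for (std::size_t h = 1; h < n; h <<= 1) {
        for (std::size_t i = 0; i < n; i += 2 * h) {
            for (std::size_t j = i; j < i + h; ++j) {
                const float a = x[j];
                const float b = x[j + h];
                x[j] = a + b;      // butterfly: sum
                x[j + h] = a - b;  // butterfly: difference
            }
        }
    }
}

// Hypothetical ACDC-style GEMV, assuming W = (1/n) * H * diag(d) * H.
// Cost: two FWHTs plus n multiplies, i.e. O(n log n) instead of O(n^2).
std::vector<float> acdc_gemv(const std::vector<float>& d, std::vector<float> x) {
    assert(d.size() == x.size());
    fwht_inplace(x);                                          // x <- H x
    for (std::size_t i = 0; i < x.size(); ++i) x[i] *= d[i];  // x <- diag(d) H x
    fwht_inplace(x);                                          // x <- H diag(d) H x
    const float inv_n = 1.0f / static_cast<float>(x.size());
    for (float& v : x) v *= inv_n;                            // undo H*H = n*I scaling
    return x;
}
```

With d = (1, …, 1) the product collapses to the identity, because H·H = n·I; that makes a convenient sanity check for any FWHT-based GEMV.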
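The L4 row's cost splits into scoring all n keys (O(n·d)) and mixing only K values (O(K·d)). A minimal single-query sketch, assuming the "tropical" step is the max-based selection of the K highest-scoring keys followed by a softmax restricted to them; topk_attention and its row-major key/value layout are hypothetical, not the PR's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical top-K attention for one query.
// keys and values hold n rows of dimension d, stored row-major.
std::vector<float> topk_attention(const std::vector<float>& q,
                                  const std::vector<float>& keys,
                                  const std::vector<float>& values,
                                  std::size_t n, std::size_t d, std::size_t k) {
    assert(q.size() == d && keys.size() == n * d && values.size() == n * d);
    assert(k >= 1 && k <= n);

    // 1) Score every key with a scaled dot product: O(n*d).
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    std::vector<float> scores(n);
    for (std::size_t i = 0; i < n; ++i) {
        float s = 0.0f;
        for (std::size_t j = 0; j < d; ++j) s += q[j] * keys[i * d + j];
        scores[i] = s * scale;
    }

    // 2) Move the indices of the K largest scores to the front (average O(n)).
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::nth_element(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k - 1), idx.end(),
                     [&](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });
    idx.resize(k);

    // 3) Softmax over the K survivors only; subtracting their max keeps exp() stable.
    float m = scores[idx[0]];
    for (std::size_t i : idx) m = std::max(m, scores[i]);
    std::vector<float> w(k);
    float z = 0.0f;
    for (std::size_t t = 0; t < k; ++t) {
        w[t] = std::exp(scores[idx[t]] - m);
        z += w[t];
    }

    // 4) Weighted sum of the K selected value rows: O(K*d).
    std::vector<float> out(d, 0.0f);
    for (std::size_t t = 0; t < k; ++t)
        for (std::size_t j = 0; j < d; ++j) out[j] += (w[t] / z) * values[idx[t] * d + j];
    return out;
}
```

std::nth_element performs selection in average linear time, so picking the top K does not add an n log n term to the O(n·d + K·d) total.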
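The L5 row packs several key/value "memories" into one d-dimensional vector. A minimal sketch of textbook HRR, not the PR's kernel: bind a key to a value with circular convolution, superpose bindings by addition, and retrieve with circular correlation, which approximately inverts binding for random keys. For clarity the convolution is computed directly in O(d²); the table indicates the real kernel uses an FFT, which brings it to O(d log d). All function names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Vec = std::vector<float>;

// Binding: circular convolution, out[i] = sum_j a[j] * b[(i - j) mod d].
Vec hrr_bind(const Vec& a, const Vec& b) {
    const std::size_t d = a.size();
    assert(b.size() == d);
    Vec out(d, 0.0f);
    for (std::size_t i = 0; i < d; ++i)
        for (std::size_t j = 0; j < d; ++j) out[i] += a[j] * b[(i + d - j) % d];
    return out;
}

// Unbinding: circular correlation, out[i] = sum_j a[j] * m[(i + j) mod d].
// For random a, hrr_unbind(a, hrr_bind(a, b)) is a noisy copy of b.
Vec hrr_unbind(const Vec& a, const Vec& m) {
    const std::size_t d = a.size();
    assert(m.size() == d);
    Vec out(d, 0.0f);
    for (std::size_t i = 0; i < d; ++i)
        for (std::size_t j = 0; j < d; ++j) out[i] += a[j] * m[(i + j) % d];
    return out;
}

// Random HRR vector with i.i.d. N(0, 1/d) entries, so its norm is close to 1.
Vec hrr_random(std::size_t d, std::mt19937& rng) {
    std::normal_distribution<float> dist(0.0f, 1.0f / std::sqrt(static_cast<float>(d)));
    Vec v(d);
    for (float& x : v) x = dist(rng);
    return v;
}

// Cosine similarity, used to match a noisy retrieval against candidates.
float cosine_similarity(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    float xy = 0.0f, xx = 0.0f, yy = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        xy += x[i] * y[i];
        xx += x[i] * x[i];
        yy += y[i] * y[i];
    }
    return xy / std::sqrt(xx * yy);
}
```

Retrieval is approximate: interference from the other stored pairs grows with N, which is why the table states N ≪ d.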

Files

  • src/ggml-bitnet-{wht,fwht,tropical,hrr,common,dispatch,kv-cache,rag}.cpp — kernel implementations
  • include/ggml-bitnet-{wht,fwht,tropical,hrr,common,dispatch,kv-cache,rag}.h — headers
  • CMakeLists.txt, src/CMakeLists.txt, src/ggml-bitnet-mad.cpp — build integration
  • patches/llama.cpp/ (5 patches) + scripts/apply-dispatch-patches.sh — llama.cpp dispatch integration
  • .gitmodules: ignore = dirty on the submodule, for the local patch workflow

Design

  • All kernels are opt-in via environment variables; by default the existing I2_S GEMV path runs unchanged
  • No GPU, no telemetry, no cloud calls
  • Submodule pinned to 1f86f05, the same commit as upstream
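The PR does not list the environment variable names, so this sketch of opt-in gating uses hypothetical ones (BITNET_WHT, BITNET_FWHT, BITNET_TROPICAL, BITNET_HRR). It also assumes, from the table, that only the L2/L3 kernels replace the GEMV, so the I2_S path stays in use unless one of those two is switched on. Any value other than an explicit "1", "true" or "on" leaves a kernel disabled.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical per-kernel switches; all default to off.
struct BitnetKernelConfig {
    bool wht = false;       // L2
    bool fwht = false;      // L3
    bool tropical = false;  // L4
    bool hrr = false;       // L5
};

// Unset (nullptr) or unrecognized values keep the kernel disabled.
bool flag_enabled(const char* value) {
    if (value == nullptr) return false;
    return std::strcmp(value, "1") == 0 || std::strcmp(value, "true") == 0 ||
           std::strcmp(value, "on") == 0;
}

// Reads the hypothetical variables once, e.g. at model load time.
BitnetKernelConfig load_kernel_config() {
    BitnetKernelConfig cfg;
    cfg.wht = flag_enabled(std::getenv("BITNET_WHT"));
    cfg.fwht = flag_enabled(std::getenv("BITNET_FWHT"));
    cfg.tropical = flag_enabled(std::getenv("BITNET_TROPICAL"));
    cfg.hrr = flag_enabled(std::getenv("BITNET_HRR"));
    return cfg;
}

// Assumption: only the WHT/FWHT kernels take over GEMV.
bool use_default_i2s_gemv(const BitnetKernelConfig& cfg) {
    return !cfg.wht && !cfg.fwht;
}
```

Keeping the parsing in flag_enabled separate from std::getenv makes the gating testable without touching the process environment.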

This PR is one part of a split of the original PR #567 and contains the core code.
