microsoft · peder1981 · Jun 21, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -0,0 +1,154 @@
+# ─── BitNet CPU kernel CI ──────────────────────────────────────────────────────
+#
+# Builds the bitnet.cpp project with the L2-L6 kernel flags enabled and runs
+# the kernel unit test suite. No model is downloaded (full smoke/perplexity
+# runs happen locally or in a separate nightly workflow).
+#
+# Why this exists:
+#   - Clang ≥ 18 is required for SIMD kernels (per CLAUDE.md).
+#   - 3rdparty/llama.cpp is a fork (branch `merge-dev`); submodule init is
+#     critical for the build.
+#   - GCC 14 may not be installed in the runner image; we explicitly install
+#     libstdc++-14-dev so Clang 18 can find its system C++ headers.
+#
+# Trigger: every push to main, every PR targeting main, and manual workflow_dispatch.
+
+name: kernel-ci
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+  workflow_dispatch:
+
+jobs:
+  build-and-test:
+    name: build + test (Ubuntu, clang-18)
+    runs-on: ubuntu-24.04
+    timeout-minutes: 30
+
+    steps:
+      - name: Checkout (with submodules)
+        uses: actions/checkout@v4
+        with:
+          submodules: recursive
+          fetch-depth: 1
+
+      - name: Apply dispatch patch (combined 05)
+        run: |
+          echo "Applying combined patch 05 (L3 ACDC + L5 HRR + L4 K_i8 cache + FaseIII rect + LLaMA gate)..."
+          chmod +x ./scripts/apply-dispatch-patches.sh
+          ./scripts/apply-dispatch-patches.sh
+          echo "Verifying idempotence..."
+          ./scripts/apply-dispatch-patches.sh --check
+        shell: bash
+
+      - name: Install build dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y \
+            clang-18 \
+            cmake \
+            ninja-build \
+            libstdc++-14-dev \
+            python3 \
+            python3-pip \
+            python3-venv
+
+      - name: Create Python venv and install test dependencies
+        # Use an isolated venv to avoid PEP-668 conflicts between apt numpy/scipy
+        # and PyPI packages (safetensors has no numpy dep; still isolate for safety).
+        run: |
+          python3 -m venv .venv
+          .venv/bin/pip install --no-cache-dir numpy scipy safetensors
+
+      - name: Configure (Release, all kernels + ACDC_RECT)
+        # BITNET_ENABLE_ACDC_RECT defaults ON → 16 tests in CI.
+        # Python3_EXECUTABLE points to the venv so test_extract_acdc_diagonal
+        # finds the installed numpy/safetensors.
+        run: |
+          cmake -B build -G Ninja \
+            -DCMAKE_C_COMPILER=clang-18 \
+            -DCMAKE_CXX_COMPILER=clang++-18 \
+            -DCMAKE_BUILD_TYPE=Release \
+            -DBITNET_L2_WHT=ON \
+            -DBITNET_L3_ACDC=ON \
+            -DBITNET_L4_TROPICAL=ON \
+            -DBITNET_L5_HRR=ON \
+            -DBITNET_L6_RAG=ON \
+            -DBITNET_BUILD_TESTS=ON \
+            -DPython3_EXECUTABLE=$(pwd)/.venv/bin/python3
+
+      - name: Build (compiles L1 + L2-L6 + all test targets)
+        # Single build step — cmake discovers all targets from CMakeLists.txt.
+        # No hardcoded --target list: avoids breakage when targets are added/renamed.
+        run: cmake --build build --config Release -j$(nproc)
+
+      - name: ctest — 16/16 kernel unit tests
+        # BITNET_ENABLE_ACDC_RECT=ON (default) adds test_acdc_rect → 16 tests.
+        # -j$(nproc): parallel execution; --output-on-failure: full log on fail.
+        # The venv Python is used for test_extract_acdc_diagonal because
+        # Python3_EXECUTABLE was set at configure time (the add_test() COMMAND
+        # is cmake-resolved); this step sets no environment variable.
+        run: |
+          ctest --test-dir build \
+            --output-on-failure \
+            -j$(nproc) \
+            --timeout 120
+
+      - name: NO-06 — telemetry audit (zero hits required)
+        # Persona D4: the binary never sends data to external endpoints.
+        # Any match = CI failure.
+        run: |
+          HITS=$(grep -rn \
+            "telemetry\|upload_data\|send_metrics\|POST.*http" \
+            src/ utils/ run_inference*.py setup_env.py 2>/dev/null | \
+            grep -v "^Binary\|\.pyc" || true)
+          if [ -n "$HITS" ]; then
+            echo "::error::NO-06 FAIL — telemetry code found:"
+            echo "$HITS"
+            exit 1
+          fi
+          echo "NO-06 PASS — 0 telemetry hits"
+
+      - name: NO-07 — cloud URL audit (zero hits in production code)
+        # Ensures no hard-coded HTTP endpoints in C/C++ production sources.
+        # URLs in comments (// http) and docs are excluded.
+        run: |
+          HITS=$(grep -rn "http://\|https://" \
+            src/ include/ \
+            --include="*.cpp" --include="*.h" | \
+            grep -v "//.*http\|/\*.*http\| \* http" || true)
+          if [ -n "$HITS" ]; then
+            echo "::error::NO-07 FAIL — cloud URLs in production code:"
+            echo "$HITS"
+            exit 1
+          fi
+          echo "NO-07 PASS — 0 cloud URL hits"
+
+      - name: Cross-validation C ↔ Python (L3/L4/L5)
+        # Verifies that the Python reference implementations match the C kernels
+        # to rtol=1e-5, atol=1e-7. No model required.
+        # --build-dir points to the cmake output dir (build/tests/), not the
+        # local development build (build_tests/).
+        run: |
+          .venv/bin/python3 tests/cross_validation.py \
+            --all \
+            --build-dir build/tests
+          echo "Cross-validation: PASS"
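The tolerance check named in this step can be sketched in plain Python. This is a minimal illustration, assuming `tests/cross_validation.py` applies the same elementwise rule as `numpy.allclose`; the helper name `within_tolerance` is hypothetical and not taken from the repository.

```python
def within_tolerance(c_out, py_ref, rtol=1e-5, atol=1e-7):
    """Return True if every C kernel output matches its Python reference.

    Uses the asymmetric criterion |c - ref| <= atol + rtol * |ref|,
    which is the rule numpy.allclose applies (assumed here, not confirmed
    from the cross-validation script itself).
    """
    if len(c_out) != len(py_ref):
        return False
    return all(abs(c - r) <= atol + rtol * abs(r) for c, r in zip(c_out, py_ref))


if __name__ == "__main__":
    reference = [1.0, -2.5, 0.0]
    print(within_tolerance([1.000001, -2.5, 0.0], reference))  # small relative error
    print(within_tolerance([1.001, -2.5, 0.0], reference))     # error well above rtol
```

Note that `atol` dominates near zero: a reference value of 0.0 only tolerates an absolute error of 1e-7, which is why both tolerances are stated.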
+
+      - name: Air-gapped boot test (AC-11)
+        # Verifies that the built llama-cli binary runs without making any
+        # network syscalls.  This enforces persona D4 (no telemetry, no cloud)
+        # at the CI level.  The script is in tests/test_air_gapped_boot.sh;
+        # it auto-skips if no model file is provided (which is the case in CI).
+        # Result: SKIPPED is acceptable in CI; PASS requires a real model.
+        run: |
+          chmod +x tests/test_air_gapped_boot.sh
+          bash tests/test_air_gapped_boot.sh 2>&1 | tee /tmp/air_gapped.log
+          rc=${PIPESTATUS[0]}
+          if [ $rc -ne 0 ]; then
+            echo "::error::AC-11 air-gapped boot FAILED (rc=$rc)"
+            cat /tmp/air_gapped.log
+            exit $rc
+          fi
diff --git a/.gitignore b/.gitignore
@@ -44,4 +44,7 @@ __pycache__/
 poetry.toml
 
 build/
-logs/
+build_tests/
+logs/Testing/
+Testing/
+models/Llama-2-7B-GGUF/
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,21 @@
+# Mandatory Rule: protheus-rag mem0 Protocol (cross-agent)
+
+> **NON-NEGOTIABLE: apply on EVERY access by any agent (Claude, OpenCode, Windsurf, Devin)**
+
+**BEFORE any external tool** (`webfetch`, `websearch`, `gh api`, `git`, online docs):
+1. Run `mem0_search(query, user_id='default')` via the `protheus-rag` MCP
+2. If the result is useful → **USE IT. Do not repeat the external search.**
+3. If empty/low confidence → run the external tool, then `mem0_add(content, user_id='default')`
+
+**Local RAG first:** use `consultar_base_direta` / `buscar_reversa_direto` before any internet search.
+`user_id` is **always** `'default'`: a namespace shared across OpenCode ↔ Claude ↔ Windsurf ↔ Devin.
+**Full protocol:** `~/.claude/mem0-protocol.md`
+
+---
+
+# Project: BitNet
+
+> **Complete development guides**: `~/.claude/CLAUDE.md` contains:
+> - ADVPL/TLPP Development Guidelines
+> - SonarQube Compliance, API Symbol Validation, Completeness Verification
+> - MCP for looking up documentation (full table of MCPs by need)
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,165 @@
+# Mandatory Rule: protheus-rag mem0 Protocol (cross-agent)
+
+> **NON-NEGOTIABLE: apply on EVERY access by any agent (Claude, OpenCode, Windsurf, Devin)**
+
+**BEFORE any external tool** (`webfetch`, `websearch`, `gh api`, `git`, online docs):
+1. Run `mem0_search(query, user_id='default')` via the `protheus-rag` MCP
+2. If the result is useful → **USE IT. Do not repeat the external search.**
+3. If empty/low confidence → run the external tool, then `mem0_add(content, user_id='default')`
+
+**Local RAG first:** use `consultar_base_direta` / `buscar_reversa_direto` before any internet search.  
+`user_id` is **always** `'default'`: a namespace shared across OpenCode ↔ Claude ↔ Windsurf ↔ Devin.  
+**Full protocol:** `~/.claude/mem0-protocol.md`
+
+---
+
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+---
+
+## Project Purpose
+
+This is a fork of Microsoft's **bitnet.cpp** — a CPU-only inference framework for 1-bit LLMs (ternary weights {-1, 0, +1}, 1.58 bits/param). The GPU pipeline has been removed. The fork extends the project with a mathematical research roadmap aimed at universalizing LLMs on CPU through forgotten algebraic structures.
+
+**Primary constraint**: CPU only. Never GPU. All new kernels must remain CPU-bound.
+
+---
+
+## Build and Setup
+
+**Full setup** (download model + convert + codegen + compile):
+```bash
+conda activate bitnet-cpp
+python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
+# ARM64: use -q tl1 instead; x86_64: use -q tl2 for LUT kernels
+```
+
+**Manual cmake build** (after kernel headers are generated):
+```bash
+# Standard build (requires libstdc++-14-dev; or use the flags below)
+cmake -B build -DCMAKE_BUILD_TYPE=Release
+cmake --build build --config Release -j$(nproc)
+```
+
+**Compiler requirement**: Clang ≥ 18 is required for SIMD kernels. GCC is tolerated but requires `-fpermissive`. Never use MSVC.
+
+**Ubuntu 24.04 workaround** — Clang 18 defaults to GCC 14 headers; if only `libstdc++-13-dev` is installed (no `libstdc++-14-dev`), add these flags:
+```bash
+cmake -B build \
+  -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
+  -DCMAKE_CXX_FLAGS="-I/usr/include/c++/13 -I/usr/include/x86_64-linux-gnu/c++/13" \
+  -DCMAKE_EXE_LINKER_FLAGS="-L/usr/lib/gcc/x86_64-linux-gnu/13" \
+  -DCMAKE_SHARED_LINKER_FLAGS="-L/usr/lib/gcc/x86_64-linux-gnu/13" \
+  -DCMAKE_BUILD_TYPE=Release
+```
+
+**Submodule**: `3rdparty/llama.cpp` (fork, branch `merge-dev`) is the inference backend. Initialize with `git submodule update --init --recursive`.
+
+---
+
+## Running Inference and Benchmarks
+
+```bash
+# CPU inference (hardcoded -ngl 0, -b 1)
+python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
+  -p "Your prompt" -n 200 -t 4
+
+# Conversational mode
+python run_inference.py -m models/.../ggml-model-i2_s.gguf -p "System prompt" -cnv
+
+# End-to-end throughput benchmark
+python utils/e2e_benchmark.py -m models/.../ggml-model-i2_s.gguf -n 128 -p 512 -t 4
+
+# Perplexity evaluation
+python utils/test_perplexity.py -m models/.../ggml-model-i2_s.gguf
+```
+
+**Math kernel benchmarks** (Level 2/3/4 research, no model required):
+```bash
+python utils/wht_benchmark.py                    # Level 2: WHT zero-multiplication
+python utils/acdc_benchmark.py --n 512           # Level 3: FWHT+ACDC O(n log n)
+python utils/acdc_benchmark.py --n 512 --scaling # show operation count scaling table
+python utils/tropical_benchmark.py --n 256 --d 64 --k 16  # Level 4: tropical attention
+python utils/tropical_benchmark.py --scaling     # show speedup vs seq_len table
+```
+
+---
+
+## Kernel Architecture
+
+There are three CPU kernel families, selected at build time:
+
+| Format | Platform | Build flag | Generator |
+|--------|----------|-----------|-----------|
+| **I2_S** | x86_64 + ARM | default (no flag) | `src/ggml-bitnet-mad.cpp` |
+| **TL1** | ARM64 only | `-DBITNET_ARM_TL1=ON` | `utils/codegen_tl1.py` |
+| **TL2** | x86_64 only | `-DBITNET_X86_TL2=ON` | `utils/codegen_tl2.py` |
+
+**I2_S encoding**: weights {-1→0, 0→1, +1→2}, packed 4 per byte. QK block size = 128 (x86) / 64 (ARM). Main SIMD path uses `_mm256_maddubs_epi16` (AVX2).
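The 2-bit encoding described above can be illustrated with a small pack/unpack round trip. This is only a sketch: the code mapping {-1→0, 0→1, +1→2} comes from the text, but the bit order inside the byte (first weight in the high bits) and the helper names `pack4` / `unpack4` are assumptions; the real layout in `src/ggml-bitnet-mad.cpp` may interleave weights across the QK block differently.

```python
# Mapping stated in CLAUDE.md: ternary weight -> 2-bit code.
ENCODE = {-1: 0, 0: 1, 1: 2}
DECODE = {code: weight for weight, code in ENCODE.items()}


def pack4(weights):
    """Pack four ternary weights into one byte, 2 bits each.

    Assumed order: the first weight lands in the two highest bits.
    """
    if len(weights) != 4:
        raise ValueError("expected exactly four weights")
    byte = 0
    for w in weights:
        byte = (byte << 2) | ENCODE[w]
    return byte


def unpack4(byte):
    """Inverse of pack4: recover the four ternary weights from one byte."""
    return [DECODE[(byte >> shift) & 0b11] for shift in (6, 4, 2, 0)]


if __name__ == "__main__":
    packed = pack4([-1, 0, 1, 0])
    print(f"{packed:08b}", unpack4(packed))
```

The code 3 (`0b11`) is unused by this mapping, so `unpack4` raises `KeyError` on a byte that contains it; a production kernel would not need that check because it never decodes through a table like this.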
+
+**TL1/TL2** are lookup-table kernels. The `.h` files in `preset_kernels/<model>/` are pre-generated for known models. For new models, run `utils/codegen_tl1.py` or `codegen_tl2.py` to regenerate, then recompile.
+
+**Kernel performance tuning**: Edit `include/gemm-config.h` before building. Controls `ROW_BLOCK_SIZE`, `COL_BLOCK_SIZE`, `PARALLEL_SIZE`, and the `ACT_PARALLEL` mode (activation-parallel vs weight-parallel). Activation parallel (`ACT_PARALLEL` defined) is recommended for I2_S. Run `python utils/tune_gemm_config.py` to auto-tune for your hardware.
+
+---
+
+## Mathematical Research Extensions (this fork)
+
+The fork adds experimental kernels under a 5-level algebraic roadmap:
+
+| Level | Math | Files | Status |
+|-------|------|-------|--------|
+| 2 | WHT decomposition — zero multiplications | `src/ggml-bitnet-wht.cpp`, `include/ggml-bitnet-wht.h` | Done |
+| 3 | FWHT + ACDC layer — O(n log n) GEMV | `src/ggml-bitnet-fwht.cpp`, `include/ggml-bitnet-fwht.h` | Done |
+| 4 | Tropical attention — (max,+) semiring | `src/ggml-bitnet-tropical.cpp`, `include/ggml-bitnet-tropical.h` | Done |
+| 5 | Holographic Reduced Representations (HRR) | `src/ggml-bitnet-hrr.cpp`, `include/ggml-bitnet-hrr.h` | Done |
+
+Full mathematical theory: `docs/mathematical-foundations.md`.
+
+**Critical ACDC invariant**: ACDC is not a post-hoc compression method. For a random ternary W, the ACDC projection captures only ~1/n of the energy. ACDC achieves exact recovery only when the model is *trained* with the ACDC architecture (d is the learned diagonal, optimized during training, not fitted afterward).
+
+**Level 3 kernel**: `acdc_forward(x, d)` computes H·(d⊙(H·x)) with unnormalized transforms (no 1/n² factors). `acdc_project` computes the projection d* = diag(H·W·H) / n².
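The two Level 3 formulas can be checked against each other in a few lines of plain Python. This is a reference sketch, not the C kernel: the function names mirror the ones in the text, but the Python signatures (lists of floats, W as a list of rows) and the natural-order (Sylvester) Hadamard convention are assumptions. Because H·H = n·I for the unnormalized transform, a matrix built as W = H·diag(d)·H should project back to exactly d.

```python
def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform (natural order).

    len(v) must be a power of two; runs in O(n log n) additions.
    """
    out = list(v)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def acdc_forward(x, d):
    """y = H (d * (H x)) with no normalization factors."""
    hx = fwht(x)
    return fwht([di * hi for di, hi in zip(d, hx)])


def acdc_project(W):
    """d* = diag(H W H) / n^2 for an n x n matrix W given as a list of rows."""
    n = len(W)
    # H W: transform every column of W; hw_cols[c][r] == (H W)[r][c].
    hw_cols = [fwht([W[r][c] for r in range(n)]) for c in range(n)]
    d = []
    for r in range(n):
        row = [hw_cols[c][r] for c in range(n)]
        # (row H)[r] equals fwht(row)[r] because H is symmetric.
        d.append(fwht(row)[r] / (n * n))
    return d


if __name__ == "__main__":
    d = [1.0, -2.0, 0.5, 3.0]
    n = len(d)
    # Build W = H diag(d) H column by column from the forward kernel.
    cols = [acdc_forward([1.0 if i == j else 0.0 for i in range(n)], d)
            for j in range(n)]
    W = [[cols[c][r] for c in range(n)] for r in range(n)]
    print(acdc_project(W))  # recovers d when W has exact ACDC structure
```

For a W without this structure, `acdc_project` still returns a diagonal, but as the invariant above states, it only captures a small fraction of the energy.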
+
+**Level 4 kernel**: `tropical_attention()` scans all keys with ternary dot products (zero multiplications), selects top-K, applies softmax only over K tokens. Complexity O(n·d + K·d) vs O(n²·d) standard attention.
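The scan / top-K / sparse-softmax flow described above can be sketched for a single query as follows. Everything about the interface is an assumption: the name `tropical_attention_sketch`, the argument layout, treating the keys as the ternary operand, and the omission of any 1/√d scaling are illustrative choices, not the signature of the C `tropical_attention()` kernel.

```python
import math


def ternary_dot(q, k):
    """Dot product against a ternary key using only adds and subtracts."""
    acc = 0.0
    for qi, ki in zip(q, k):
        if ki == 1:
            acc += qi
        elif ki == -1:
            acc -= qi
    return acc


def tropical_attention_sketch(q, keys, values, top_k):
    """Single-query attention: score all keys, keep top_k, softmax over those.

    Returns (output vector, indices of the selected keys).
    """
    # 1. Scan every key with a multiplication-free score.
    scores = [ternary_dot(q, k) for k in keys]
    # 2. Select the top_k highest-scoring positions.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:top_k]
    # 3. Softmax restricted to the selected positions (max-shifted for stability).
    peak = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - peak) for i in top]
    total = sum(weights)
    # 4. Weighted sum of the selected value vectors only.
    out = [0.0] * len(values[0])
    for w, i in zip(weights, top):
        for j, vj in enumerate(values[i]):
            out[j] += (w / total) * vj
    return out, top


if __name__ == "__main__":
    q = [1.0, 2.0]
    keys = [[1, 1], [-1, 0], [0, 1]]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    print(tropical_attention_sketch(q, keys, values, top_k=2))
```

The full sort is used here for brevity; a kernel that cares about the stated cost would use a partial selection so that picking K winners does not add an O(n log n) term.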
+
+These Level 2–5 kernels are **wired into CMakeLists.txt** as a `bitnet_math` OBJECT library (linked into the `ggml` target) via `-DBITNET_L2_WHT=ON -DBITNET_L3_ACDC=ON -DBITNET_L4_TROPICAL=ON -DBITNET_L5_HRR=ON`. The build is verified (all four `.cpp` files compile with AVX2 flags on x86_64). They are not yet connected to the **llama.cpp tensor dispatch path** (that integration is the next step).
+
+**HRR operating regime** (critical): retrieval quality requires d ≥ 10·N (d = head_dim, N = context tokens). At d=64, N=32 the memory is past its capacity limit (10·N = 320 > d), so retrieval is noisy; this is mathematically expected (see `docs/theory/05-holographic-memory.md`). For practical attention replacement, use d ≥ 640 for N=64, or use phasor keys (exact inverse) instead of Gaussian random keys.
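The capacity behaviour can be explored with a toy holographic memory. This sketch assumes the standard HRR construction (binding by circular convolution, unbinding by convolving with the involution of the key, Gaussian keys with variance 1/d); the function names and the nearest-candidate scoring are hypothetical and are not taken from `src/ggml-bitnet-hrr.cpp`. The O(d²) convolution is for readability only.

```python
import math
import random


def circular_convolve(a, b):
    """Binding: c[j] = sum_i a[i] * b[(j - i) mod d]."""
    d = len(a)
    return [sum(a[i] * b[(j - i) % d] for i in range(d)) for j in range(d)]


def involution(a):
    """Approximate inverse of a Gaussian key: a_inv[i] = a[-i mod d]."""
    d = len(a)
    return [a[-i % d] for i in range(d)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def recall_accuracy(d, n_pairs, seed=0):
    """Store n_pairs key/value bindings in one trace; return the fraction
    of keys whose noisy retrieval is closest to the correct value."""
    rng = random.Random(seed)

    def gaussian_vector():
        return [rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)]

    keys = [gaussian_vector() for _ in range(n_pairs)]
    values = [gaussian_vector() for _ in range(n_pairs)]
    # Superpose all bindings into a single d-dimensional trace.
    trace = [0.0] * d
    for k, v in zip(keys, values):
        trace = [t + b for t, b in zip(trace, circular_convolve(k, v))]
    correct = 0
    for idx, k in enumerate(keys):
        noisy = circular_convolve(involution(k), trace)
        best = max(range(n_pairs), key=lambda j: cosine(noisy, values[j]))
        correct += best == idx
    return correct / n_pairs


if __name__ == "__main__":
    print(recall_accuracy(d=256, n_pairs=4))  # d is well above 10 * N here
```

Lowering `d` or raising `n_pairs` until d < 10·N is a quick way to watch the retrieval degrade in this toy setting; the exact crossover depends on the seed and on how the retrieved vector is cleaned up.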
+
+---
+
+## Model Conversion
+
+```bash
+# From HuggingFace GGUF (pre-quantized)
+huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf --local-dir models/BitNet-b1.58-2B-4T
+python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
+
+# From safetensors (bf16 checkpoint)
+huggingface-cli download microsoft/bitnet-b1.58-2B-4T-bf16 --local-dir ./models/bitnet-b1.58-2B-4T-bf16
+python utils/convert-helper-bitnet.py ./models/bitnet-b1.58-2B-4T-bf16
+
+# With embedding quantization (Q6_K format, recommended for speed+quality tradeoff)
+python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s --quant-embd
+```
+
+Conversion pipeline: safetensors → `convert-helper-bitnet.py` → `ggml-model-f32.gguf` → `llama-quantize` → `ggml-model-i2_s.gguf`.
+
+---
+
+## Repository Conventions
+
+- `_reversa_sdd/` — Reversa framework analysis artifacts. **Never modify these files.**
+- `.reversa/` — Reversa working directory. **Never modify these files.**
+- `preset_kernels/` — Pre-tuned kernel configs for known models. Only regenerate via codegen scripts.
+- The `3rdparty/llama.cpp` submodule is a fork (not upstream). Treat it as read-only unless deliberately patching the backend.
+- `run_inference.py` hardcodes `-ngl 0` (no GPU offload) and `-b 1` (decode batch size 1). This is intentional — CPU-only decode mode.
+
+---
+
+## Remotes
+
+- `origin` → `https://github.com/peder1981/BitNet.git` (this fork)
+- `upstream` → `https://github.com/microsoft/BitNet.git`
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md