Higgins: a neurosymbolic menubar buddy
Higgins is a local assistant that lives in the menubar. A Qwen 2.5 7B model runs on-device via MLX, a SQLite triple store remembers things across sessions, and a small collection of tools (AppleScript runners, EventKit bridges, gh CLI wrappers) lets him actually do work on the Mac. No cloud, no subscription, no data leaving the machine.
The name is a nod to Magnum P.I.’s reserved English major-domo. The vibe was aspirational; most of the engineering effort went into making sure he doesn’t cheerfully hallucinate your dentist appointment into 2023.
Why not just a chat wrapper
The starting pitch was neuro-symbolic: “LLMs are great at vibes and terrible at strict logic; symbolic systems are the opposite; put a logic engine inside a neural one.” Nice in a blog post, vague as a project.
What it actually means in code: the neural side (Qwen) handles language, intent, paraphrasing. The symbolic side (SQLite facts table, typed tool schemas, EventKit queries) handles everything where being wrong is worse than being silent — dates, names, calendar events, PR numbers. The LLM never gets to compute a date from priors; it forwards your words to a parser. The LLM never claims to remember your dog’s name; the fact comes from a SQL row, or it doesn’t answer.
That split is the whole project. Everything else is plumbing.
The Qwen brain
Qwen 2.5 7B Instruct, 4-bit quantized, ~4.5 GB on disk. Loaded via mlx-swift-lm, with progress reported through an NSKeyValueObservation on the download’s Progress object, since swift-huggingface fires its progress handler once and then mutates that Progress in place rather than calling back on every byte.
Picking Qwen over Gemma was almost incidental. Gemma 3 4B worked but wouldn’t accept a system role, forcing a hacky primer-pair trick. Gemma 4 26B-a4b is MoE, which mlx-swift-lm’s Gemma4Model doesn’t implement yet. Qwen accepts system prompts cleanly and, crucially, was trained to emit native tool calls — which matters a lot more than benchmark scores.
The old path was prompt-engineered: TOOL_CALL toolname {...} in the output, parsed with regex. Fragile. Qwen wants to emit something like:
<tool_call>
{"name": "reminder_add", "arguments": {"text": "…", "datetime": "…"}}
</tool_call>
mlx-swift-lm surfaces this as .toolCall(ToolCall) events in the generation stream, if you pass structured tool specs into UserInput(chat:, tools:). Once wired up, the model stops inventing its own format and starts using the one it was RL’d on.
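var output = ""       // streamed assistant text
var call: ToolCall?   // native tool call, if the model emitted one
for await gen in try MLXLMCommon.generate(input: input, parameters: params, context: context) {
    switch gen {
    case .chunk(let s): output += s     // plain text from the model
    case .toolCall(let c): call = c     // structured call surfaced by mlx-swift-lm
    case .info: break                   // end-of-generation info, ignored here
    }
}
The text parser stuck around as a fallback for edge cases.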
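For the curious, the fallback can be quite small. What follows is a minimal sketch of parsing the legacy TOOL_CALL toolname {...} format, not the actual Higgins parser: the names ParsedToolCall and parseLegacyToolCall are made up for illustration, and it assumes the JSON object runs to the last closing brace in the reply.

```swift
import Foundation

// Hypothetical result type for this sketch; not part of mlx-swift-lm.
struct ParsedToolCall {
    let name: String
    let arguments: [String: Any]
}

/// Looks for `TOOL_CALL <name> {json}` anywhere in the model output.
/// Returns nil when the marker is absent or the JSON does not decode to an object.
func parseLegacyToolCall(_ text: String) -> ParsedToolCall? {
    // Group 1: tool name. Group 2: everything from the first "{" to the last "}".
    let pattern = #"TOOL_CALL\s+([A-Za-z_][A-Za-z0-9_]*)\s+(\{.*\})"#
    guard let regex = try? NSRegularExpression(pattern: pattern,
                                               options: [.dotMatchesLineSeparators]) else {
        return nil
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: fullRange),
          let nameRange = Range(match.range(at: 1), in: text),
          let jsonRange = Range(match.range(at: 2), in: text),
          let data = String(text[jsonRange]).data(using: .utf8),
          let object = try? JSONSerialization.jsonObject(with: data),
          let arguments = object as? [String: Any]
    else {
        return nil
    }
    return ParsedToolCall(name: String(text[nameRange]), arguments: arguments)
}
```

Anything that fails to parse returns nil, so the caller can treat the reply as ordinary text instead of guessing at a half-formed call.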
The symbolic side
A SQLite file at ~/Library/Application Support/Symbolic/memory.db with two tables:
CREATE TABLE episodes(
    id INTEGER PRIMARY KEY,
    user_msg TEXT, assistant_reply TEXT,
    signal INTEGER, ts REAL
);
CREATE TABLE facts(
    id INTEGER PRIMARY KEY,
    subject TEXT, predicate TEXT, object TEXT,
    confidence REAL, last_seen REAL,
    UNIQUE(subject, predicate, object)
);
Episodes are the raw log: every chat turn lands here. Facts are the distilled, queryable form: subject/predicate/object triples with a confidence score. The interesting work happens in the pipe between the two.
Sleep
The consolidation metaphor is cribbed directly from human memory: episodic hippocampal traces don’t become semantic knowledge until you sleep. Higgins does the same. After 15 minutes of chat silence, or when you tap the 🌙 button, he runs a pass:
- NREM: the model reads the day’s recent episodes and emits one JSON object per line — extracted stable facts about you, your world, your preferences. Ephemeral conversational filler (greetings, acknowledgments, weather chat) is skipped.
- Synaptic downscaling: every fact’s confidence is multiplied by 0.95. Unused memories decay. Recalled ones get bumped back up; a bump that would push confidence above 1.0 is capped.
- Purge: facts below 0.05 confidence are dropped.
- Dream log: a human-readable markdown summary lands in ~/Library/Application Support/Symbolic/dreams/YYYY-MM-DD.md.
func nightCycle(episodeLimit: Int = 100) async throws -> Report {
    // NREM: distill recent episodes into candidate facts
    let episodes = try await store.recentEpisodes(limit: episodeLimit)
    let facts = await extractFacts(from: episodes)
    for f in facts {
        try await store.addFact(subject: f.subject, predicate: f.predicate, object: f.object)
    }
    // Synaptic downscaling, then purge whatever decayed too far
    try await store.decayFacts(multiplier: 0.95)
    let purged = try await store.purgeWeakFacts(threshold: 0.05)
    // ...
}
There’s no remember command to learn. Facts accumulate while you use the thing.
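To get a feel for how fast unused facts fade, here is a small in-memory sketch of the downscaling and purge arithmetic. It is an illustration under assumptions, not the store's implementation: Fact, decayAndPurge and cyclesUntilPurged are hypothetical names, and the real version works on SQLite rows rather than arrays.

```swift
import Foundation

// Hypothetical in-memory stand-in for a row of the facts table.
struct Fact {
    let subject: String
    let predicate: String
    let object: String
    var confidence: Double
}

/// One pass: multiply every confidence by `multiplier`,
/// then drop facts whose confidence fell below `threshold`.
func decayAndPurge(_ facts: [Fact],
                   multiplier: Double = 0.95,
                   threshold: Double = 0.05) -> [Fact] {
    facts
        .map { (fact: Fact) -> Fact in
            var decayed = fact
            decayed.confidence *= multiplier
            return decayed
        }
        .filter { $0.confidence >= threshold }
}

/// Number of consecutive passes, with no recall in between,
/// after which a fact starting at `confidence` drops below `threshold`.
func cyclesUntilPurged(from confidence: Double,
                       multiplier: Double = 0.95,
                       threshold: Double = 0.05) -> Int {
    precondition(multiplier > 0 && multiplier < 1, "multiplier must shrink the value")
    precondition(threshold > 0, "threshold must be positive")
    var current = confidence
    var cycles = 0
    while current >= threshold {
        current *= multiplier
        cycles += 1
    }
    return cycles
}
```

With the numbers above, cyclesUntilPurged(from: 1.0) comes out at 59: a fact at full confidence that is never recalled again survives 58 passes and is dropped on the 59th.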
Per-turn recall: closing the loop
Sleep would be useless if the facts just piled up. Before each generation, the user’s query is keyword-matched against the facts table (top 5 hits, LIKE on subject/predicate/object). Matching facts are injected into that turn’s system context as a bracketed prefix:
[Known facts from memory:
- dog / name / Rex (confidence 0.95)
- manz / employer / Woosmap (confidence 0.87)
]
what's my dog's name?
History stays clean: the decoration only exists for the single inference call. No tool call needed for the model to answer “Rex”. The symbolic substrate grounds the neural output on the way past.
At low scale (hundreds of facts) LIKE matching is good enough. Embedding-backed similarity search is the upgrade when it isn’t.
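The recall step is easy to sketch without a database. The version below is a rough approximation, assuming substring matching as a stand-in for SQL LIKE, a three-letter minimum keyword length, and ordering by confidence; none of those details are confirmed by the real code, and StoredFact, keywords, recall and decorate are names invented for the example.

```swift
import Foundation

// Hypothetical in-memory stand-in for a row of the facts table.
struct StoredFact {
    let subject: String
    let predicate: String
    let object: String
    let confidence: Double
}

/// Lowercased alphanumeric tokens of at least `minLength` characters.
func keywords(in query: String, minLength: Int = 3) -> [String] {
    query.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { $0.count >= minLength }
}

/// Facts whose subject, predicate or object contains any keyword,
/// highest confidence first, at most `limit` of them.
func recall(_ query: String, from facts: [StoredFact], limit: Int = 5) -> [StoredFact] {
    let words = keywords(in: query)
    let hits = facts.filter { fact in
        let haystack = "\(fact.subject) \(fact.predicate) \(fact.object)".lowercased()
        return words.contains { haystack.contains($0) }
    }
    return Array(hits.sorted { $0.confidence > $1.confidence }.prefix(limit))
}

/// Builds the bracketed prefix for a single inference call.
/// The stored conversation history keeps the undecorated query.
func decorate(_ query: String, with facts: [StoredFact]) -> String {
    guard !facts.isEmpty else { return query }
    let lines = facts.map { fact -> String in
        let score = String(format: "%.2f", fact.confidence)
        return "- \(fact.subject) / \(fact.predicate) / \(fact.object) (confidence \(score))"
    }
    return "[Known facts from memory:\n" + lines.joined(separator: "\n") + "\n]\n" + query
}
```

Asking about the dog with both example facts in the store matches only the dog row, because "what", "dog" and "name" are the surviving keywords and the employer row contains none of them.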
Tools
The tool layer is where the project actually earns its keep. Three surfaces:
AppleScript library. Curated scripts live in ~/Library/Application Support/Symbolic/scripts/, each with frontmatter metadata (-- name:, -- description:, -- args:). Seed scripts ship with the app; the user can edit or add more. The model calls them by name — calendar_today, note_quick, etc. — as if they were first-class tools.
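Reading that frontmatter takes a few lines. This is a sketch under assumptions rather than the shipped parser: it treats the leading run of -- comments as metadata, stops at the first non-comment line, and guesses that -- args: is a comma-separated list. ScriptMetadata and parseFrontmatter are hypothetical names, and the sample script and its arguments are made up.

```swift
import Foundation

// Hypothetical container for the three frontmatter keys.
struct ScriptMetadata: Equatable {
    var name = ""
    var description = ""
    var args: [String] = []
}

/// Parses leading `-- key: value` comment lines of an AppleScript source.
func parseFrontmatter(_ source: String) -> ScriptMetadata {
    var meta = ScriptMetadata()
    // split(whereSeparator:) drops blank lines and treats "\r\n" as one newline.
    for rawLine in source.split(whereSeparator: \.isNewline) {
        let line = rawLine.trimmingCharacters(in: .whitespaces)
        // Metadata ends at the first line that is not an AppleScript comment.
        guard line.hasPrefix("--") else { break }
        let body = line.dropFirst(2).trimmingCharacters(in: .whitespaces)
        guard let colon = body.firstIndex(of: ":") else { continue }
        let key = body[..<colon].trimmingCharacters(in: .whitespaces).lowercased()
        let value = body[body.index(after: colon)...].trimmingCharacters(in: .whitespaces)
        switch key {
        case "name":
            meta.name = value
        case "description":
            meta.description = value
        case "args":
            // Assumption: arguments are listed comma-separated.
            meta.args = value.split(separator: ",")
                .map { $0.trimmingCharacters(in: .whitespaces) }
                .filter { !$0.isEmpty }
        default:
            break
        }
    }
    return meta
}

// Made-up sample script for illustration.
let sampleScript = """
-- name: calendar_today
-- description: List today's events
-- args: calendar, limit
tell application "Calendar" to activate
"""
let sampleMeta = parseFrontmatter(sampleScript)
```

The parsed name and description are what would be handed to the model as the tool's spec, which is how a user-edited script can show up as a callable tool without any Swift changes.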
EventKit bridge. AppleScript against Calendar.app is infamously slow — iterating every event on every calendar through the bridge takes 30+ seconds for a busy calendar. EventKit queries the same data in milliseconds. The catch: it needs NSCalendarsFullAccessUsageDescription in a real Info.plist. The SPM executable embeds one via a linker trick:
linkerSettings: [
    .unsafeFlags([
        "-Xlinker", "-sectcreate",
        "-Xlinker", "__TEXT",
        "-Xlinker", "__info_plist",
        "-Xlinker", "Resources/Info.plist",
    ]),
]
-sectcreate __TEXT __info_plist injects the plist into a section TCC reads when deciding whether to prompt for permission. Works without wrapping the binary in a .app bundle.
gh CLI wrappers. gh_my_prs, gh_review_queue, gh_notifications, gh_recent_merged. Subprocess calls to the user’s existing gh session — no OAuth dance, no token storage. A ProcessRunner actor handles timeouts and respects Task.isCancelled so a stuck subprocess doesn’t freeze the UI when you hit the red stop button.
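The timeout half of that is worth a sketch. To be clear about what this is not: it is a blocking function built on Foundation's Process and a semaphore, not the ProcessRunner actor, and it does not look at Task.isCancelled. runWithTimeout and RunResult are hypothetical names. It also reads stdout only after the process exits, so a child that writes more than the pipe buffer holds would stall until the timeout.

```swift
import Foundation

// Hypothetical result type for this sketch.
struct RunResult {
    let status: Int32
    let stdout: String
    let timedOut: Bool
}

/// Runs an executable, waits at most `timeout` seconds, and sends SIGTERM if it overruns.
func runWithTimeout(_ executable: String,
                    _ arguments: [String],
                    timeout: TimeInterval) throws -> RunResult {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: executable)
    process.arguments = arguments
    let pipe = Pipe()
    process.standardOutput = pipe

    // Signalled once when the child exits, however it exits.
    let finished = DispatchSemaphore(value: 0)
    process.terminationHandler = { _ in finished.signal() }
    try process.run()

    var timedOut = false
    if finished.wait(timeout: .now() + timeout) == .timedOut {
        timedOut = true
        process.terminate()   // SIGTERM; a child that ignores it would need SIGKILL
        finished.wait()       // wait for the exit so terminationStatus is valid
    }

    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    return RunResult(status: process.terminationStatus,
                     stdout: String(decoding: data, as: UTF8.self),
                     timedOut: timedOut)
}
```

A gh wrapper would call something like this with the path to gh and its arguments, then hand stdout to the tool-result card; an actor-based version would swap the semaphore for a cancellation-aware wait.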
Tool results render as folded cards with a wrench icon, tool name, and a one-line summary. Click to expand. Errors auto-expand. Entries of the form #123 · PR title · owner/repo render as real links, thanks to MarkdownUI on the assistant side.
Dates, or: why the model doesn’t do arithmetic
7B models are bad at date math. “Remind me tomorrow at 8:15” reliably produced ISO strings from 2023, 2024, anywhere but actually tomorrow.
The fix is to not let the model do arithmetic at all. Any tool argument named datetime, when, due_date, etc. gets intercepted in Swift before dispatch. A tiny table handles English and French relative words (tomorrow, today, demain, hier) — NSDataDetector isn’t locale-parameterizable and picks up the system locale, which fails on English keywords under a French macOS. Anything the table doesn’t catch falls through to NSDataDetector, which anchors relative phrasing against Date().
Critically, the user’s raw message is the source of truth, not the model’s output. If you typed “tomorrow”, “tomorrow” is what gets parsed — even if the model hallucinates an ISO. The model’s job is intent extraction, not date reasoning.
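The table half of that interception can be sketched as follows. It assumes the lookup runs over the raw user message, covers only the four words named above, and resolves to the start of the target day; the 8:15 part and the NSDataDetector fallback are left out. relativeDayOffsets and resolveRelativeDay are hypothetical names.

```swift
import Foundation

// Day offsets for the relative words mentioned above (English and French).
let relativeDayOffsets: [String: Int] = [
    "today": 0,
    "tomorrow": 1,
    "demain": 1,
    "hier": -1,
]

/// Finds the first known relative word in the user's raw message and
/// resolves it against `now`. Returns nil when no word matches, which is
/// where a caller would fall through to NSDataDetector.
func resolveRelativeDay(in message: String, now: Date, calendar: Calendar) -> Date? {
    // Whole-word matching: split on anything that is not a letter.
    let words = message.lowercased()
        .components(separatedBy: CharacterSet.letters.inverted)
    for word in words {
        if let offset = relativeDayOffsets[word] {
            let startOfToday = calendar.startOfDay(for: now)
            return calendar.date(byAdding: .day, value: offset, to: startOfToday)
        }
    }
    return nil
}

// Usage with a pinned clock and time zone so the result is reproducible.
var utcCalendar = Calendar(identifier: .gregorian)
utcCalendar.timeZone = TimeZone(secondsFromGMT: 0)!
let pinnedNow = utcCalendar.date(from: DateComponents(year: 2025, month: 3, day: 10, hour: 12))!
let resolved = resolveRelativeDay(in: "Remind me tomorrow at 8:15",
                                  now: pinnedNow,
                                  calendar: utcCalendar)
```

Because the lookup keys are plain strings, it behaves the same under any system locale, which is the property the table exists to provide. Passing now and calendar in as parameters also keeps the function testable without touching the real clock.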
The AppleScript Run button
When the model writes code in an assistant reply, the code block renders with a language tag and a copy button. When the language is applescript, a Run button appears:
MarkdownUI.Markdown(turn.text)
    .markdownBlockStyle(\.codeBlock) { config in
        CodeBlock(
            code: config.content,
            language: config.language ?? "",
            onRun: runAction(for: config.language ?? "", code: config.content)
        )
    }
The action shells out to osascript -e <script>, captures stdout, appends the result as a tool turn. The user stays in charge: nothing runs without the click. Safer than giving the model unrestricted automation tools, more flexible than the curated script library.
What grew out of it
The original plan was to fine-tune a custom voice — a “genius caveman” Grug, all short fragments. Hours of LoRA experiments on Gemma 2 2B, Gemma 3 4B, Gemma 4 E4B later, the honest conclusion: voice tuning on small models produces caricature or collapse, and nobody cares about voice when the tools don’t work. The pivot was embracing stock Qwen and pouring the effort into the symbolic substrate instead.
What’s there now:
- On-device Qwen 2.5 7B with native tool calling, loading with proper progress feedback
- Conversation persistence as JSONL (machine) + Markdown sidecar (human), auto-saved per turn, restored on launch if recent
- Tool calls through Qwen’s native format, fallback text parser, per-tool timeouts, cancel button, pending spinner
- Sleep/nap consolidation turning episodes into facts, confidence decay, dream logs
- Per-turn recall grounding generation on stored facts without touching history
- EventKit for calendar/reminders in milliseconds, gh CLI for GitHub, NSDataDetector for natural-language dates
- Right-click menu: new conversation, sleep now, open dreams/conversations/memory in Finder
- Markdown-rendered assistant replies via MarkdownUI, plain monospaced tool output
- AppleScript Run button on any applescript code block in a reply
The fine-tune loop (REM-derived training candidates → periodic LoRA refresh of stable voice and high-confidence facts) is designed but deferred. Retrieval has a faster iteration loop; weights-baking earns its keep when retrieval stops being enough. Given Higgins has had exactly zero users for three days, that threshold is a while off.
Meanwhile he tells me what’s due tomorrow, doesn’t make up dates, remembers the cat’s name is Charlie, and opens PR review queues when I ask. Good enough for now.