Hackathon Build

VoiceAgents

Voice-native computing. Local. Private. Instant.
OpenClaw-grade ambition, built for accessibility: the browser and the DAW, commanded by voice, on your machine.


Chromium: "open reddit and scroll"
LMMS: "play from bar 8"
System: Local. Private. Fast.


The suite

Two agents. One thesis.

Chromium

Cursor for the web. Tabs, search, scroll, media, real sites - spoken into motion. The browser stops being an obstacle course.

LMMS

Cursor for the session. Transport, tracks, plugins - voice and text through a real AgentControl boundary. The DAW stops fighting you.

Line for the room: We took “AI is the new cursor” and put it where creators actually live - the open web and the open session. Same pattern. Two industries. One voice.

Product / Web

Chromium

Cursor for the web.

Intent in, action out - navigation, controls, the long tail of real pages.
Regex + optional local LLM - regex keeps demos fast and predictable; the LLM rescues commands when phrasing gets human.
Surface truth - you always know when the mic is in the loop.
Serious sites - YouTube, Reddit, Instagram - extensible to anything worth commanding.
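A minimal Python sketch of the regex-first, LLM-second split, assuming a small hypothetical grammar. `Intent`, `PATTERNS`, `parse_command`, and `fake_llm` are illustrative names, not the project's code; the local model is stubbed as any callable that returns an `Intent` or `None`, so no specific LLM runtime is implied.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Intent:
    action: str
    args: dict = field(default_factory=dict)
    source: str = "regex"  # which stage produced the intent: "regex" or "llm"

# Hypothetical fast-path grammar: each pattern must match the whole utterance.
PATTERNS = [
    (re.compile(r"^open (?P<site>[\w.-]+)$"), "open_site"),
    (re.compile(r"^scroll (?P<direction>up|down)$"), "scroll"),
    (re.compile(r"^play from bar (?P<bar>\d+)$"), "play_from_bar"),
]

# Any callable that maps raw text to an Intent (or None) can act as the fallback.
LlmFallback = Callable[[str], Optional[Intent]]

def parse_command(utterance: str, llm: Optional[LlmFallback] = None) -> Optional[Intent]:
    text = utterance.strip().lower()
    # Fast path: deterministic regex matching, no model involved.
    for pattern, action in PATTERNS:
        match = pattern.match(text)
        if match:
            return Intent(action, match.groupdict())
    # Semantic rescue: only consulted when no pattern matched.
    if llm is not None:
        intent = llm(utterance)
        if intent is not None:
            intent.source = "llm"
        return intent
    return None

def fake_llm(utterance: str) -> Optional[Intent]:
    # Hypothetical stand-in for a local model; recognizes one paraphrase.
    lowered = utterance.lower()
    if "reddit" in lowered and "bring up" in lowered:
        return Intent("open_site", {"site": "reddit"})
    return None

print(parse_command("Play from bar 8"))
# -> Intent(action='play_from_bar', args={'bar': '8'}, source='regex')
print(parse_command("could you bring up reddit?", llm=fake_llm))
# -> Intent(action='open_site', args={'site': 'reddit'}, source='llm')
```

Regex hits return without touching the model, so common commands never wait on inference; only unmatched phrasing pays the LLM's latency.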

Own the room: Speech becomes a first-class input to the browser - not dictation dumped into a search box.

Product / Audio

LMMS Agent

Cursor for music production.

AgentControl plugin - automation through a deliberate, inspectable seam.
Voice + text - same vocabulary; speak when the studio is quiet, type when it is not.
Local stack - takes stay in the session, not on a stranger’s GPU.

Own the room: Not “chat for musicians” - the moment the DAW takes direction like an instrument, not a fight.

Why it matters

More than a demo.

Accessibility is the architecture - voice as a peer to pointer and keyboard; recoverable flows.
Two domains, one pattern - consumption (web) plus creation (LMMS). Platform-shaped, not a Chrome trick.
Local-by-default - schools, studios, clinics - a trust story.

Closer: Agentic, but accountable.

Future

When everything is voice-native.

One mental model across apps - say intent, ground, confirm risk, execute.
Smarter grounding - accessibility tree plus gaze hints so “that control” is a ranked choice.
Workflows, not lone commands - chains with checkpoints and undo.
Voice as infrastructure - for RSI, motor and vision load, and hands-busy jobs.
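The "confirm risk, execute" half of that loop, extended to chains with checkpoints and undo, can be sketched as a compensating-action runner. This assumes three hypothetical risk tiers, that grounding has already resolved each step to a concrete action, and that every step ships its own undo; `Risk`, `Step`, `run_workflow`, and `confirm_at` are illustrative names, not the spec's schema.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List

class Risk(IntEnum):
    # Hypothetical tiers; the real policy engine's tiers may differ.
    LOW = 0      # e.g. scroll, play
    MEDIUM = 1   # e.g. open a new site
    HIGH = 2     # e.g. delete a track, submit a form

@dataclass
class Step:
    name: str
    risk: Risk
    do: Callable[[], None]    # the grounded action, already resolved to a target
    undo: Callable[[], None]  # compensating action used for rollback

def run_workflow(steps: List[Step], confirm: Callable[[Step], bool],
                 confirm_at: Risk = Risk.HIGH) -> bool:
    """Run steps in order; on refusal or failure, undo completed steps in reverse."""
    done: List[Step] = []
    for step in steps:
        # Policy gate: steps at or above the threshold need an explicit yes.
        if step.risk >= confirm_at and not confirm(step):
            break
        try:
            step.do()
        except Exception:
            break  # the failing step is assumed to have had no effect
        done.append(step)  # checkpoint: this step can now be undone
    else:
        return True  # every step ran
    for step in reversed(done):
        step.undo()
    return False

log: List[str] = []

def make_step(name: str, risk: Risk) -> Step:
    return Step(name, risk,
                do=lambda: log.append(f"do {name}"),
                undo=lambda: log.append(f"undo {name}"))

chain = [make_step("mute drums", Risk.LOW), make_step("delete track 3", Risk.HIGH)]
print(run_workflow(chain, confirm=lambda step: False), log)
# -> False ['do mute drums', 'undo mute drums']
```

Declining the risky step rolls the whole chain back, so a half-finished workflow never lingers in the session.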

Final beat: A world where your computer understands your job - without pretending it owns your life.

Spec

Architecture north star.

Full technical plan: local-only inference, gaze as a ranking signal, policy engine for risk tiers, accessibility-first UX - documented for implementers.

voice_agent_full_plan_v4.md

Command schemas, world model, fusion, safety, metrics - the blueprint behind the story.
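Gaze as a ranking signal could look roughly like this: score each accessibility-tree candidate by word overlap with the spoken target, then blend in a Gaussian falloff around the gaze point. The scoring formula, `sigma`, `gaze_weight`, and the `Node` / `rank_candidates` names are assumptions for illustration, not taken from voice_agent_full_plan_v4.md.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

@dataclass
class Node:
    role: str                    # accessibility role, e.g. "button" or "link"
    name: str                    # accessible name exposed by the tree
    center: Tuple[float, float]  # on-screen center in pixels

def rank_candidates(nodes: List[Node], target_words: Set[str],
                    gaze: Optional[Tuple[float, float]] = None,
                    sigma: float = 150.0, gaze_weight: float = 0.3) -> List[Node]:
    """Return nodes best-first by a blend of name overlap and gaze proximity."""
    scored = []
    for node in nodes:
        name_words = set(node.name.lower().split())
        # Fraction of spoken target words that appear in the accessible name.
        text_score = len(name_words & target_words) / len(target_words) if target_words else 0.0
        gaze_score = 0.0
        if gaze is not None:
            # Gaussian falloff: 1.0 at the gaze point, near 0 a few sigmas away.
            distance = math.dist(node.center, gaze)
            gaze_score = math.exp(-(distance ** 2) / (2 * sigma ** 2))
        score = (1 - gaze_weight) * text_score + gaze_weight * gaze_score
        scored.append((score, node))
    # Stable sort: equal scores keep accessibility-tree order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [node for _, node in scored]

nodes = [Node("button", "Play", (100, 100)),
         Node("button", "Play", (800, 100)),
         Node("link", "Home", (120, 110))]
# "press that play button" while looking near the right-hand player
print(rank_candidates(nodes, {"play"}, gaze=(790, 110))[0].center)  # -> (800, 100)
```

With `gaze_weight` at 0.3, a full text match always outranks a gaze-only hit, so gaze breaks ties between same-named controls rather than overriding what was said.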