Inference-Time Memory Module

Design

The design is an extremely simple inference-time memory module that treats memory management as a series of LLM calls and agent loops over a markdown-based file tree. We plan to build it around the Pi agent harness and package the nanomem project as an integrated Pi extension. The memory state is stored locally and encrypted on-device, and any user query can be augmented with the minimal relevant context before it is sent to the model.
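To make the "markdown-based file tree" concrete, here is a minimal sketch of how such a tree might be written and read back. The helper names (`write_note`, `load_tree`, `demo`) and the example file paths are hypothetical and not taken from nanomem; on-device encryption and the LLM-driven management loop are deliberately left out.

```python
import tempfile
from pathlib import Path
from typing import Dict


def write_note(root: Path, rel_path: str, text: str) -> Path:
    """Create or overwrite one markdown memory file under the tree root."""
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target


def load_tree(root: Path) -> Dict[str, str]:
    """Return {relative path: markdown text} for every .md file under root."""
    return {
        p.relative_to(root).as_posix(): p.read_text(encoding="utf-8")
        for p in sorted(root.rglob("*.md"))
    }


def demo() -> Dict[str, str]:
    """Build a tiny example tree in a temporary directory and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical layout: one folder per topic, one file per subject.
        write_note(root, "work/projects.md", "# Projects\n- Writing a Rust CLI\n")
        write_note(root, "personal/diet.md", "# Diet\n- Vegetarian\n")
        return load_tree(root)
```

Keeping memory as plain files means an agent can use ordinary read, write, and list operations on it, and a user can inspect or edit the same files by hand.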

Selective memory

This setup supports selective disclosure of memory content, giving users control over what information is shared with the model. Rather than injecting a persistent global memory into every interaction, memory is retrieved at inference time and only the minimal relevant context is surfaced.

Figure: A memory agent revising a user prompt with minimal added context before model inference. User queries can be revised to add the minimally necessary context before the request is processed by the model (from The Open Anonymity Project).
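The retrieval step described in this section can be sketched as follows. This is an illustration under stated assumptions, not the project's method: the design uses LLM calls and agent loops to choose context, whereas this sketch substitutes simple keyword overlap so that it runs without a model. The names `select_context` and `revise_prompt`, the stopword list, and the `Context:` prefix are all hypothetical.

```python
import re
from typing import Dict, List, Set

# Small illustrative stopword list; a real system would not rely on this.
STOPWORDS = {"a", "an", "the", "for", "to", "of", "and", "in", "my", "me", "i", "is"}


def tokens(text: str) -> Set[str]:
    """Lowercased word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def select_context(query: str, memory: Dict[str, str], max_lines: int = 2) -> List[str]:
    """Pick the few memory lines that share the most words with the query."""
    q = tokens(query)
    scored = []
    for path, text in memory.items():
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and markdown headings
            overlap = len(q & tokens(line))
            if overlap > 0:
                scored.append((overlap, path, line.lstrip("- ").strip()))
    # Highest overlap first; path and text break ties deterministically.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [line for _, _, line in scored[:max_lines]]


def revise_prompt(query: str, memory: Dict[str, str]) -> str:
    """Prepend only the selected lines; unrelated memory never leaves the device."""
    context = select_context(query, memory)
    if not context:
        return query
    return "Context: " + "; ".join(context) + "\n\n" + query
```

With a memory holding a dietary preference and an unrelated work note, a query about dinner recipes would be revised to carry only the dietary line, and a query with no matching memory is passed through unchanged.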

P2P Memory Sync

The memory files stay local on the user’s machine as a markdown tree. If the user chooses to sync these memory trees peer-to-peer, the trees can be semantically merged across devices by agents driving a local model. This keeps user memory consistent across machines without compromising data ownership or requiring centralized management.
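A rough sketch of the merge step, assuming each device's tree has been loaded as a mapping from relative path to markdown text. The semantic merge performed by a local model is represented here by a pluggable `merge_file` callback; the default `union_lines` is only a stand-in that appends unseen lines and performs no semantic reconciliation. All names are hypothetical, and the peer-to-peer transport is out of scope.

```python
from typing import Callable, Dict

Tree = Dict[str, str]  # relative path -> markdown text


def union_lines(local: str, remote: str) -> str:
    """Placeholder merge: keep local lines, append remote lines not already present."""
    merged = local.splitlines()
    seen = set(merged)
    for line in remote.splitlines():
        if line not in seen:
            merged.append(line)
            seen.add(line)
    return "\n".join(merged) + "\n"


def merge_trees(
    local: Tree,
    remote: Tree,
    merge_file: Callable[[str, str], str] = union_lines,
) -> Tree:
    """Merge a peer's tree into the local one, file by file."""
    merged = dict(local)
    for path, remote_text in remote.items():
        if path not in merged:
            merged[path] = remote_text  # file exists only on the peer
        elif merged[path] != remote_text:
            # Both sides have the file with different content:
            # this is where a local model would be asked to reconcile them.
            merged[path] = merge_file(merged[path], remote_text)
    return merged
```

Passing a model-backed function as `merge_file` would let the same loop handle conflicting or redundant facts, which the line-union placeholder cannot do. The placeholder is also order-sensitive: merging in the opposite direction yields the same lines in a different order.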
