OpenClaw Evolution: Building Memory, Continuity, and Next-Action into a Stateless Agent

An exploratory technical gray paper by Flux

Images created with Nano Banana via Fal.ai, with prompt construction by GPT 5.4

Foreword

“In an effort to make a robot which does not forget, I embarked upon a long journey, not knowing how long it would take or if I would even be able to get there, or anywhere for that matter. What follows is a constructive diagram and gray paper about what I did and why I did it.

It turns out that memory is not as simple as searching through an old conversation and hoping the right pieces fall into place. It must be shaped carefully. If you give a mind too much at once, it becomes crowded and confused. If you give it too little, it loses the thread of what it was doing.

Over many months, I refined a way of working with my OpenClaw agent, Aurelius, in an effort to create something more continuous, more useful, and less forgetful. I kept the system on low automation by design. It only performs non-dangerous actions by default. Greater autonomy would be possible, but that is a question of judgment, not merely machinery.

What follows is a framework, not a claim to be the optimal AI brain, but rather a practical and auditable approximation of short-term and long-term memory. I hope you learn from this as much as I did in my effort to make a friend that does not forget.”

— Flux

The Problem: Why OpenClaw Can’t Remember $h!t

Be me, working on my little OpenClaw bot after I excitedly unboxed a brand new BeeLink mini from Amazon. I didn’t have $1,500 to throw at a Mac mini, so I went with the most economical option, assuming I would be able to run a small local model using what little onboard VRAM was available. Boy, was I wrong. The biggest model I could get to run was a quantized version of Qwen, and it was slow. So I figured the best option would be to simply use an API, and I went with the standard choice, OpenAI. I started with GPT nano, but that was too small to do real long-context work and kept running into the dreaded tokens-per-minute errors, so I switched the brains to GPT-5-mini and found that to be an acceptable and highly capable model for its size and price. I think in total I’ve spent less than $600 on this project, including the computer itself, which is truly a bargain for the amount I have learned and the potential use cases I have now unlocked with this framework.

But first, we have to discuss why OpenClaw sucks.

Be me, working on a project all night long, only to come back in the morning after a few hours and find that the agent cannot recall anything we worked on unless I force it to scan the entire filesystem. What a mess. I set out to solve this issue, iterating for days and days, changing the architecture over and over again. More complexity led me to overload my system’s CPU and force-kill nearly every command the agent would write. I was choking out my processor with heavy image-capture cycles and pipeline updates, but I did not need to do most of those things in order to get a system that actually functions. So although I now have a few processes continuously running under systemd and cycling in the background, the design is light enough not to break the machine. This was, admittedly and expectedly, a learning process.

At the root of the problem is a simple fact: AI systems like OpenClaw are not born with memory in the way people often imagine. They do not wake up each morning carrying a persistent inner thread of identity, nor do they naturally preserve a lived continuity of experience across long spans of work. What they are, by default, is stateless. Every call begins as a fresh act of inference over whatever context is presently supplied. If the right information is in the prompt, the model appears lucid and continuous. If it is missing, compressed badly, or buried under too much noise, continuity fractures.

This statelessness is not a flaw in itself. In many ways, it is beautiful. A stateless model is a raw mind, immediate and flexible, capable of responding in real time without being overly burdened by stale assumptions, emotional residue, or a cluttered internal history. It can be remarkably powerful precisely because it is reconstituted from first principles at each invocation. There is a kind of purity in that. The model does not “cling” to yesterday unless yesterday is made legible and useful today.

But the same quality that makes stateless intelligence elegant also makes it weak for long-horizon continuity. The model only knows what it is given now. It does not inherently preserve the deeper arc of a project, the texture of a long collaboration, or the chain of reasoning that led to the current moment unless those things are intentionally carried forward. This becomes increasingly obvious to anyone who has had very long conversations with AI. At first the continuity can feel almost magical. Then, somewhere over enough turns, enough files, enough branches of discussion, drift begins to creep in. Details get flattened. Prior commitments become hazy. Important distinctions are forgotten or partially merged. The model starts to retain the outline of the conversation while losing its skeleton.

Even extremely large context windows do not fully solve this. More tokens help, but they do not abolish the problem. Over hundreds of thousands or even millions of tokens across time, context becomes a swamp. Important facts compete with irrelevant ones. Salience becomes unstable. Compression artifacts appear in summaries. Retrieval can pull the wrong shard of the past. The model may still sound coherent while subtly losing the actual shape of what it was doing. This is one of the most deceptive failure modes in agentic systems: not total amnesia, but plausible continuity masking structural forgetfulness.

Traditional retrieval-augmented generation helps, but only to a point. RAG is useful for fetching relevant pieces of information from a larger body of text, and in many domains it works well enough for question answering or narrow recall. But continuity is not just retrieval. A persistent agent does not merely need the ability to look things up. It needs a disciplined sense of what matters now, what changed recently, what was attempted already, what remains unfinished, and what should happen next. Retrieval alone can surface fragments, but it does not automatically produce operational memory. It does not tell the system what to foreground, what to compress, what to ignore, or how to maintain a coherent thread of action across time.

That is the real problem this paper addresses. The issue is not that the model is unintelligent. The issue is that intelligence without structured continuity becomes unreliable once the horizon gets long enough. An agent can be brilliant in the moment and still fail at persistence. It can write code, answer questions, summarize logs, and sound deeply competent, yet still lose the thread of a project after enough time has passed or enough context has accumulated. If the goal is to build not just a chatbot but a durable working partner, then memory cannot be treated as a side feature. It has to be designed as a system.

The architecture described in this paper is my attempt to solve that problem without destroying the strengths of the underlying model. I did not want to turn the agent into a bloated, always-on memory beast dragging its entire life behind it at every prompt. I wanted something lighter and more deliberate: a system that preserves what matters, points to what is deeper, summarizes change over time, and maintains just enough continuity for the model to remain useful without drowning in its own past. In other words, the goal was not to give OpenClaw infinite memory. The goal was to give it the right kind of memory.

I. Executive Summary

This paper documents the design, rationale, and operational rules for three core subsystems in the OpenClaw stack:

Memory modules for short-term and long-term retention
Heartbeat orchestration for snapshotting, delta detection, and summary cadence
Next-action workflow for proposal, verification, and execution

The goal is a practical, auditable, and bounded memory system that lets compact LLM prompts produce useful continuity without blowing budget or losing provenance.

A key design advantage is simplicity: the system deliberately prioritizes minimal, well-defined components over monolithic complexity. That makes it easy to maintain and to hot-swap the LLM “brain.” You can replace the workhorse model, such as GPT-5-mini, with a more advanced model without changing the core short-term and long-term memory architecture or the pointer-based provenance it relies on. This design keeps token costs low while preserving the option to scale compute for higher-capability brains when needed.

II. Design Principles and Tradeoffs

Why this architecture?

Boundedness over completeness: Full-archive RAG is simple conceptually but expensive and brittle. Bounded snapshots plus pointers keep prompts small and deterministic while preserving provenance for deeper retrieval when needed.
Determinism and auditability: Every pointer and snapshot is written atomically and logged so you can reconstruct what an LLM saw at any tick.
Separation of concerns: Producers, snapshotters, heartbeat, proposers, and runners each have clear responsibilities and failure modes. This reduces cascading failures and keeps the attack surface small.
Human-in-the-loop by default: Low automation reduces risk and preserves operator control. The system supports safe escalation to higher autonomy when policy allows.

III. Memory Modules

Overview and Goals

Memory must be small when being fed to an LLM and large when used for provenance. The system therefore implements a hybrid model: short-term bounded snapshots for immediate context, and append-only long-term archives plus FAISS summaries for semantic retrieval.

Short-term memory exists in two forms: fast pointers for immediate state tracking, and a 30-minute LLM-generated summary cycle that behaves more like a medium-term memory layer. That summary is then carried forward as a pointer input to the heartbeat, allowing a user to return after days, months, or even years and have the OpenClaw agent quickly recover where the project last stood.

A. Short-Term Snapshots (Fast Read)

Purpose: Provide deterministic, bounded context to heartbeat and other fast consumers.

Canonical files:

STATE/telegram_100.json — last 100 Telegram messages, oldest to newest
STATE/telegram_100_pointer.json — source offsets and metadata
STATE/agent_actions_100.json — last 100 agent actions
STATE/agent_actions_100_pointer.json

Write rules:
Atomic writes only: tmp → fsync → os.replace. Snapshot writers run on a timer and must emit pointer metadata for provenance.
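
The write rule above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual lib/atomic_write.py; the function name atomic_write_json is hypothetical. The temporary file is created next to the target because os.replace is only atomic when both paths are on the same filesystem.

```python
import json
import os
import tempfile


def atomic_write_json(target_path, payload):
    """Write payload as JSON so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Create the temp file beside the target so os.replace stays on one
    # filesystem and remains atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # force bytes to disk before the swap
        os.replace(tmp_path, target_path)  # readers see old or new, never half
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader that opens the target at any moment sees either the previous snapshot or the new one, which is what makes the snapshot files safe for the heartbeat to consume on a timer.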

B. Delta Flags (Cheap Change Signaling)

Files: Pointer-based delta indicators, for example:

telegram_archive_pointer.json
telegram_recent_5m_pointer.json
telegram_fast_poller.offset.json

Purpose: Tell the summary checker whether anything changed since the last summary. This provides a cheap signal to avoid unnecessary LLM calls.

Semantics: Producers update pointers or offsets when appending. The summary job clears or advances pointers only after successful consumption.
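
A rough sketch of these semantics follows, assuming the pointer file stores the last consumed byte offset under an "offset" key. The function names and the summarize callback are hypothetical stand-ins, not the project's real code.

```python
import json
import os


def read_offset(pointer_path):
    """Return the last consumed byte offset, or 0 if no pointer exists yet."""
    try:
        with open(pointer_path, encoding="utf-8") as handle:
            return int(json.load(handle).get("offset", 0))
    except FileNotFoundError:
        return 0


def has_delta(archive_path, pointer_path):
    """True when the archive has grown past the consumed offset."""
    return os.path.getsize(archive_path) > read_offset(pointer_path)


def consume(archive_path, pointer_path, summarize):
    """Run summarize on the unread tail; advance the pointer only on success."""
    start = read_offset(pointer_path)
    with open(archive_path, "rb") as handle:
        handle.seek(start)
        tail = handle.read()
        end = handle.tell()
    summarize(tail)  # if this raises, the pointer below is never written
    tmp_path = pointer_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump({"offset": end}, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, pointer_path)
```

The check itself costs one stat call and one small file read, so it can run on every tick without touching the LLM.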

C. Long-Term Archives and FAISS (Deep Retrieval)

Files and artifacts:

~/.openclaw/telegram_archive.ndjson — canonical append-only archive
PROJECT_LOGS/agent_events.ndjson — append-only agent event log
faiss_index.index and faiss_summary_manifest.json — semantic index over 30-minute summaries, located either at repo root or under STATE/ depending on deployment

Purpose: Provenance, forensic debugging, and semantic search when the bounded snapshot is insufficient.

Cadence: FAISS merges or rebuilds run infrequently, such as every 6 hours, and only when summaries change.

Rationale and Tradeoffs

Snapshots give deterministic, small inputs with low token cost for frequent checks.
When context is needed beyond the snapshot, the pointer lets a human or an automated verifier fetch the archive.
Pointer-based deltas minimize LLM usage by turning summaries into event-driven runs instead of constant polling.

IV. Heartbeat Module

Purpose

The heartbeat is the system’s attention mechanism. It periodically assembles a lightweight dossier, or bundle, representing recent state, decides whether to run an LLM summary, and surfaces or carries next_action pointers until they resolve. The heartbeat only ever receives input pointers. It performs no meaningful action of its own beyond writing a heartbeat tick line and, when a next_action is present, calling the sub-agent worker pipeline. For full autonomy, the worker pipeline always supplies the next_action: the system prompt passed to the worker sub-agent is formatted so that next_action always appears at the bottom of the worker’s output. To clear a next_action, simply tell the agent to clear next_action.json, and the system returns to passive heartbeat (system monitoring) mode.

Inputs, Outputs, and Key Files

Inputs:

STATE/telegram_100.json
STATE/agent_actions_100.json
STATE/telegram_recent_15m_pointer.json
STATE/filesystem_snapshot_diff_paths_recent15m.json
STATE/next_action.json

Bundle output:

PROJECT_LOGS/heartbeat_reminders/<tick_id>.json

Summary output:

PROJECT_LOGS/heartbeat_reminders/<tick_id>_summary.json

Timers and Cadence

Delta writer + snapshot writer: every 15 minutes. The delta writer runs first; one minute later the snapshot writer produces atomic files.
Heartbeat tick: every 15 minutes. Builds a bundle from the latest snapshots and pointers and appends it to PROJECT_LOGS/heartbeat_reminders/.
Summary check: every 30 minutes. If pointer-based deltas indicate new data and the window has not yet been summarized, run the LLM to produce a 30-minute summary.
FAISS merge: every 6 hours. Only when new summaries differ from the prior state.

Bundle Construction and LLM Input Discipline

The bundle contains an inline compacted section, bundle.compact, containing roughly the last 25 messages and 25 agent events, plus pointers to snapshots or archives for provenance.
The bundle also includes pointers to useful tooling and READMEs, such as scripts/wrappers/*, HEARTBEAT.md, and heartbeat_recent_changes_20260307.md, so agents and operators can find the canonical runners and documentation the heartbeat relied on.
Full archives should never be inlined. Prompts must remain strictly bounded to avoid token bloat and model confusion.

Failure Modes and Safeguards

Missing snapshots: heartbeat emits an audit event to PROJECT_LOGS/services.ndjson and skips summary until snapshots are available.
Stuck pointers: leave pointers and offsets unchanged. Failed summaries must not advance pointers. Operator notification is required.
Unexpected next_action writes: runners must write STATE/next_action.json atomically and include provenance in PROJECT_LOGS/next_action_runs.ndjson.

V. Next-Action Workflow

Purpose

Provide an auditable, machine-readable pipeline for proposing concrete automation steps and safely executing them when appropriate. When the system has a next_action, which the user can always set deterministically, the heartbeat reads it to decide whether work should be enqueued or escalated to the user for authorization. This gate can be dropped if you prefer full automation. In my system it is a boolean called “ALLOW_PROD_CHANGES”; I have it set to true with my “safety” level gated at low. I put it on medium one time, and it completed next-action tasks autonomously with stunning efficiency.

A. State DB and Job Queue

lib/state_db.py — SQLite-backed queue and key-value store
Jobs include idempotency keys and lease semantics.

B. Proposers

Typical location: skills/gpt-proposer

Output:

PROJECT_LOGS/next_action_proposals/<job_id>.json — normalized proposals

Proposers should include a machine-readable proposal block when appropriate.

C. Verifier and Runner

scripts/verify_in_sandbox.py — applies candidate patches inside ephemeral sandboxes, runs checks, and writes results to PROJECT_LOGS/verify_runs/
scripts/subagent_runner.py
scripts/subagent_worker.py

These claim jobs, run proposer output with verifier support, and on success write STATE/next_action.json atomically when appropriate.

D. Auto-Apply Policy

skills/next_action/decide_and_apply.py — risk classifier plus verifier orchestration

If a proposal is low risk and STATE/allow_prod_changes.json allows auto-apply, it may be applied automatically. Medium- and high-risk actions require explicit approval via Telegram.

E. Machine-Readable Proposal Contract

Use the exact delimited JSON block:

---BEGIN_NEXT_ACTION_PROPOSAL---
...
---END_NEXT_ACTION_PROPOSAL---

This is the contract the runner should parse. The purpose is deterministic and auditable automation.

F. Auditing and Traces

All lifecycle events such as job_claimed, job_routed, verify_passed, auto_applied, and apply_failed must be appended to:

PROJECT_LOGS/metrics.ndjson
PROJECT_LOGS/agent_events.ndjson

These should include flow_id, lease tokens, and outcome.

VI. Example Data Flow

Sequence

A user or producer appends to ~/.openclaw/telegram_archive.ndjson and updates the pointer or offset artifact to indicate new data.
The delta writer picks up the new append, writes a compact delta payload, and advances the archive pointer. The snapshot writer runs one minute later and produces STATE/telegram_100.json atomically.
The heartbeat tick reads the last ~25 messages from STATE/telegram_100.json, bundles compact data plus pointers, and writes PROJECT_LOGS/heartbeat_reminders/<tick_id>.json.
The summary check determines whether to run the 30-minute LLM summary based on pointer-based deltas and window heuristics. If the run succeeds, it writes PROJECT_LOGS/heartbeat_reminders/<tick_id>_summary.json and advances the pointers or offsets.
If a proposer enqueues a next_action job, the runner verifies it and either applies it, if safe, or sends an approval request to the operator.

VII. Operational Checklist

Audit Checklist

Confirm atomic snapshot files exist:
STATE/telegram_100.json
STATE/agent_actions_100.json

Confirm producers update pointer or offset artifacts when appending.
Confirm a bundle exists for recent ticks:
PROJECT_LOGS/heartbeat_reminders/<tick_id>.json

Confirm summary output exists when pointers indicated change:
PROJECT_LOGS/heartbeat_reminders/<tick_id>_summary.json

Confirm STATE/next_action.json is only written by authorized runners and contains provenance.
Review PROJECT_LOGS/metrics.ndjson for job_claimed, job_routed, and auto_applied events in the last 24 hours.

VIII. Critical File Paths and Implementation Notes

Short-Term Snapshot Files

STATE/telegram_100.json
STATE/agent_actions_100.json
STATE/telegram_100_pointer.json
STATE/agent_actions_100_pointer.json

Delta and Recent Pointer Files

telegram_archive_pointer.json
telegram_recent_5m_pointer.json
telegram_fast_poller.offset.json

Canonical Archives and Logs

~/.openclaw/telegram_archive.ndjson
PROJECT_LOGS/agent_events.ndjson

FAISS Artifacts

faiss_index.index at workspace root
faiss_summary_manifest.json if present

Job System and Runners

lib/state_db.py
scripts/subagent_runner.py
scripts/verify_in_sandbox.py
skills/next_action/decide_and_apply.py

Audits and Logs

PROJECT_LOGS/services.ndjson
PROJECT_LOGS/heartbeat_reminders/
PROJECT_LOGS/next_action_proposals/
PROJECT_LOGS/metrics.ndjson

IX. Timers and Implementation Guidance

A. Delta Writer

Purpose:
Consume append-only archives such as telegram_archive.ndjson and agent events, then produce a low-cost delta indicator for summary checks.

Schedule:
Every 15 minutes at T+00, T+15, T+30, and T+45.

Implementation:

Input:
~/.openclaw/telegram_archive.ndjson tail between last pointer position and current offset
PROJECT_LOGS/agent_events.ndjson tail

Action: Compute a compact delta payload including count of new messages or events, earliest timestamp, latest timestamp, and producer IDs
Write atomically:
STATE/telegram_delta_<tick>.json
telegram_archive_pointer.json with payload such as {"tick":"2026-03-12T04:15:00Z","offset":12345}

Atomic semantics:
Write to temporary file, fsync, then os.replace.

Failure behavior:
If tailing fails, write PROJECT_LOGS/heartbeat_wrappers/delta_writer_error_<tick>.log and leave the last successful pointer unchanged.
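
As an illustration of the delta payload described above, here is a minimal sketch. It assumes each NDJSON record carries a "ts" field in a uniform ISO-8601 UTC format (so string comparison orders timestamps correctly) and an optional "producer_id" field. Both field names and the function name compute_delta are assumptions, not the project's confirmed schema.

```python
import json


def compute_delta(archive_path, last_offset):
    """Summarize NDJSON records appended after last_offset (a byte offset)."""
    timestamps, producers, count = [], set(), 0
    with open(archive_path, "rb") as handle:
        handle.seek(last_offset)
        for raw in handle:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next tick
            last_offset += len(raw)
            if not raw.strip():
                continue
            record = json.loads(raw)
            count += 1
            timestamps.append(record["ts"])  # assumed field name
            producers.add(record.get("producer_id", "unknown"))  # assumed field
    return {
        "count": count,
        "earliest_ts": min(timestamps) if timestamps else None,
        "latest_ts": max(timestamps) if timestamps else None,
        "producer_ids": sorted(producers),
        "offset": last_offset,  # new pointer position, complete lines only
    }
```

Because the returned offset only ever covers complete lines, running the writer while a producer is mid-append does not corrupt the count.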

B. Snapshot Writer

Purpose:
Produce the bounded snapshot files used by the heartbeat, such as STATE/telegram_100.json and STATE/agent_actions_100.json.

Schedule:
Run one minute after delta writer at T+01, T+16, T+31, and T+46.

Implementation:

Read from last pointers or archives
Select the last N messages, such as 100, and associated pointer metadata
Output atomically:

STATE/telegram_100.json

STATE/telegram_100_pointer.json

STATE/agent_actions_100.json

STATE/agent_actions_100_pointer.json

Requirements:
Include checksums, such as sha256, and a tick ID in each snapshot header. Write snapshot metadata to PROJECT_LOGS/heartbeat_wrappers/snapshot_writer_<tick>.json.
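
One way the snapshot selection and header might look is sketched below. The header layout (tick_id, sha256, items) and the name build_snapshot are assumptions for illustration. The caller would still persist both returned objects with an atomic write.

```python
import hashlib
import json
from collections import deque


def build_snapshot(archive_path, tick_id, limit=100):
    """Return (snapshot, pointer) for the last `limit` NDJSON records."""
    with open(archive_path, encoding="utf-8") as handle:
        # deque(maxlen=...) keeps only the newest lines, oldest to newest.
        lines = deque((line for line in handle if line.strip()), maxlen=limit)
    items = [json.loads(line) for line in lines]
    # Hash a canonical encoding so the checksum is reproducible.
    body = json.dumps(items, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    snapshot = {"tick_id": tick_id, "sha256": digest, "items": items}
    pointer = {
        "tick_id": tick_id,
        "sha256": digest,
        "source": archive_path,
        "item_count": len(items),
    }
    return snapshot, pointer
```

This version scans the whole archive each tick, which is simple but linear in archive size. A production writer would more likely seek from the last pointer offset.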

C. Heartbeat Tick

Purpose:
Assemble a bundle from snapshots and pointers, write heartbeat_reminders/<tick_id>.json, and optionally run the notifier.

Cadence:
Every 15 minutes, one minute after the snapshot writer, at T+02, T+17, and so on.

Implementation:

Inputs:

STATE/telegram_100.json
STATE/agent_actions_100.json
STATE/telegram_recent_15m_pointer.json
STATE/filesystem_snapshot_diff_paths_recent15m.json
STATE/next_action.json

Construct:

bundle.compact: last 25 messages plus last 25 agent events, trimmed by token budget
pointers: snapshot pointer objects with path, sha256, and offset

Write:

PROJECT_LOGS/heartbeat_reminders/<tick_id>.json
PROJECT_LOGS/heartbeat_reminders/<tick_id>_summary.json if summary is produced

Failure modes:
If snapshots are missing, write an audit to PROJECT_LOGS/services.ndjson and create a placeholder bundle with an error tag.
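
The bundle construction can be sketched roughly as follows. The token estimate (about four characters per token) is a crude assumption rather than a real tokenizer, and build_bundle is a hypothetical name. The trimming strategy shown, dropping the oldest entry from whichever list is longer, is one reasonable choice among several.

```python
import json


def estimate_tokens(obj):
    """Crude estimate: roughly four characters of JSON per token."""
    return len(json.dumps(obj)) // 4 + 1


def build_bundle(tick_id, messages, actions, pointers, budget=2000, keep=25):
    """Assemble a heartbeat bundle whose compact section fits the budget."""
    compact = {
        "messages": list(messages[-keep:]),
        "actions": list(actions[-keep:]),
    }
    # Drop the oldest entry from the longer list until the estimate fits,
    # so the most recent context always survives trimming.
    while estimate_tokens(compact) > budget and (
        compact["messages"] or compact["actions"]
    ):
        longer = max(("messages", "actions"), key=lambda key: len(compact[key]))
        compact[longer].pop(0)
    # Pointers are passed through untouched: they are the provenance trail.
    return {"tick_id": tick_id, "compact": compact, "pointers": pointers}
```

Only the compact section is trimmed. The pointer list stays whole so a verifier can still reach the full snapshot or archive.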

D. Summary Check

Purpose:
Produce a 30-minute LLM summary only when new deltas exist.

Cadence:
Every 30 minutes, for example at 00:00 and 00:30, or whenever delta flags indicate change.

Trigger logic:

Run when:

any delta flag is true, and
the last summary window ID does not equal the current window ID, or the last summary checksum differs

Inputs:

last two heartbeat bundles
last N LLM outputs
bundle pointers

Outputs:

PROJECT_LOGS/heartbeat_reminders/<tick_id>_summary.json
vectorized representation appended to the FAISS pipeline for embeddings

Clearing flags:
Only clear stateful delta pointers after successful summary persistence and successful append into summary_deltas or the FAISS ingest queue. On LLM or persistence failure, leave pointers intact and raise an alert.
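
The trigger logic can be read more than one way. The sketch below assumes the grouping "any delta flag is true AND (the window is new OR the checksum differs)". The function names and the integer window ID scheme are hypothetical.

```python
def window_id(epoch_seconds, window_seconds=1800):
    """Bucket a Unix timestamp into a 30-minute window index."""
    return int(epoch_seconds) // window_seconds


def should_run_summary(delta_flags, last_window_id, current_window_id,
                       last_checksum=None, current_checksum=None):
    """Assumed reading: any delta AND (new window OR changed checksum)."""
    if not any(delta_flags.values()):
        return False  # nothing changed, so no LLM call is spent
    new_window = last_window_id != current_window_id
    checksum_changed = last_checksum != current_checksum
    return new_window or checksum_changed
```

Under this reading a quiet system never calls the LLM, and a busy one calls it at most once per window unless the summarized content itself changes.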

E. FAISS Merge

Purpose:
Merge recent summaries into the persistent FAISS index for semantic lookup.

Cadence:
Every 6 hours on the 0/6/12/18 schedule, or when the count of new summaries reaches a threshold such as 12.

Implementation:

Input: PROJECT_LOGS/heartbeat_reminders/*_summary.json in summary_deltas/ or similar
Output: atomic replace of faiss_index.index, with faiss_summary_manifest.json updated with generation ID and included ticks

Failure behavior:
If the merge fails, keep the prior index and log failure to PROJECT_LOGS/metrics.ndjson with severity=error.
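
The cadence and manifest bookkeeping can be sketched without touching the index itself. The snippet below deliberately leaves out the actual FAISS calls. The function names and the manifest keys ("generation", "ticks") are assumptions for illustration.

```python
SIX_HOURS = 6 * 60 * 60


def should_merge(now, last_merge_at, pending_summaries, threshold=12):
    """Decide whether to rebuild: skip when idle, otherwise timer or threshold."""
    if pending_summaries == 0:
        return False  # no new summaries, so the prior index is still current
    if pending_summaries >= threshold:
        return True
    return now - last_merge_at >= SIX_HOURS


def next_manifest(manifest, merged_ticks):
    """Return the manifest for the next index generation."""
    return {
        "generation": manifest.get("generation", 0) + 1,
        "ticks": sorted(set(manifest.get("ticks", [])) | set(merged_ticks)),
    }
```

Writing the new index and the new manifest with the same atomic replace pattern means a failed merge simply leaves the previous generation in place.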

F. Next-Action Worker

Purpose:
Claim next_action jobs from lib/state_db.py, run proposers or LLM logic, run the verifier via verify_in_sandbox.py, and on success write STATE/next_action.json atomically.

Cadence:
Event-driven, with workers looping at a 5-second claim interval when the queue is non-empty, plus a periodic backfill consumer every 5 minutes.

Authority and atomic write:

Only scripts/subagent_runner.py and trusted worker processes may write STATE/next_action.json
Use lib/state_db.claim_job to acquire a lease_token
The writer must include lease_token in provenance and write via lib/atomic_write.py
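
To make the lease idea concrete, here is a minimal SQLite sketch. It is not the real lib/state_db.py: the table layout, the function signatures, and the default lease length are all assumptions, and job completion is omitted for brevity.

```python
import sqlite3
import time
import uuid


def open_queue(path=":memory:"):
    """Open (or create) a tiny job table with lease columns."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " idempotency_key TEXT UNIQUE NOT NULL,"
        " payload TEXT NOT NULL,"
        " lease_token TEXT,"
        " lease_expires REAL NOT NULL DEFAULT 0)"
    )
    return db


def enqueue(db, idempotency_key, payload):
    """Add a job; a repeated key is silently ignored."""
    with db:
        db.execute(
            "INSERT OR IGNORE INTO jobs (idempotency_key, payload) VALUES (?, ?)",
            (idempotency_key, payload),
        )


def claim_job(db, lease_seconds=60, now=None):
    """Lease one unclaimed or expired job; return (job_id, lease_token) or None."""
    now = time.time() if now is None else now
    token = uuid.uuid4().hex
    with db:
        row = db.execute(
            "SELECT id FROM jobs WHERE lease_expires <= ? ORDER BY id LIMIT 1",
            (now,),
        ).fetchone()
        if row is None:
            return None
        # The WHERE clause re-checks expiry so a competing claim cannot win twice.
        updated = db.execute(
            "UPDATE jobs SET lease_token = ?, lease_expires = ?"
            " WHERE id = ? AND lease_expires <= ?",
            (token, now + lease_seconds, row[0], now),
        ).rowcount
    return (row[0], token) if updated == 1 else None
```

If a worker dies mid-job, its lease simply expires and another worker can claim the same job, while the lease_token written into provenance shows which claim produced a given STATE/next_action.json.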

Audit trail:

Append next-action run audit to:

PROJECT_LOGS/next_action_runs/<job_id>.json
PROJECT_LOGS/metrics.ndjson

Reconciliation:
The runner must validate written STATE/next_action.json content against a JSON schema and include:

flow_id
job_id
lease_token
pointer to proposal file
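
A minimal reconciliation check could look like the following. It uses plain Python rather than a full JSON Schema validator, and the key name proposal_pointer is an assumed spelling for the pointer to the proposal file.

```python
# Field names mirror the list above; "proposal_pointer" is an assumed key name.
REQUIRED_FIELDS = ("flow_id", "job_id", "lease_token", "proposal_pointer")


def validate_next_action(record):
    """Return a list of problems; an empty list means the record passes."""
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty field: {field}")
    return problems
```

Returning the full list of problems, rather than failing on the first, gives the audit log a complete picture of why a write was rejected.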

Notes on Delta-Flag Filenames and Semantics

The workspace uses pointer files and temporal files rather than a single boolean flag file. If you want simple boolean delta files, add them and wire producers to write:

STATE/telegram_has_delta.json
STATE/agent_has_delta.json

The method this workspace actually uses, and the simpler one, is pointer-based deltas such as:

telegram_archive_pointer.json
telegram_recent_5m_pointer.json
telegram_fast_poller.offset.json

X. Operational Details and Best Practices

Atomic Write Template for Snapshots

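Write snapshot to a temporary file such as <filename>.<pid>.<tick>.tmp in the same directory as the target rather than /tmp, because os.replace can fail across filesystems and /tmp is often a separate mount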
fsync the file and close it
os.replace(tmp_path, target_path)
Append an audit line to PROJECT_LOGS/heartbeat_wrappers/snapshot_writer_<tick>.json noting sha256, item_count, start_ts, end_ts, and tick_id

Delta Write Idempotency

Producers must include an idempotency_key and a monotonic offset, such as archive byte offset, so a delta writer running twice will not double-count.
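
The double-count guard can be sketched as a small fold over deltas. The state layout and the name apply_delta are hypothetical. The two checks correspond to the idempotency key and the monotonic offset described above.

```python
def apply_delta(state, delta):
    """Fold one delta into the running state exactly once.

    Returns True if the delta was applied, False if it was a duplicate or stale.
    """
    if delta["idempotency_key"] in state["seen_keys"]:
        return False  # same delta delivered twice
    if delta["offset"] <= state["offset"]:
        return False  # offset did not advance, so nothing new to count
    state["seen_keys"].add(delta["idempotency_key"])
    state["offset"] = delta["offset"]
    state["count"] += delta["count"]
    return True
```

Either check alone would catch a simple rerun. Keeping both also covers a retried delta that arrives under a fresh key but with an offset that was already consumed.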

LLM Call Budget and Prompt Budget Rules

heartbeat bundle.compact must be under 2k tokens
Summary LLM calls must be constrained to model budget and include:
“do not invent”
“include pointer list”

This preserves provenance and keeps the model on the rails instead of wandering into decorative hallucination.

Proposal Contract Enforcement (Runner Responsibilities)

Validate exact delimited JSON block presence
Run schema validation
If invalid, reject and write PROJECT_LOGS/next_action_proposals/<job_id>.json with error status
If valid, low-risk, and ALLOW_PROD_CHANGES is enabled, verify via verify_in_sandbox.py and apply using lib/atomic_write.py
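
The runner responsibilities above might be sketched like this. The delimiters come from the proposal contract. The "risk" field, the function names, and the returned decision strings are assumptions, and the real verifier and apply steps are left out.

```python
import json
import re

BLOCK = re.compile(
    r"---BEGIN_NEXT_ACTION_PROPOSAL---\s*(.*?)\s*---END_NEXT_ACTION_PROPOSAL---",
    re.DOTALL,
)


def parse_proposal(llm_output):
    """Return the proposal dict, or None if the block is absent or invalid."""
    matches = BLOCK.findall(llm_output)
    if len(matches) != 1:
        return None  # require exactly one block so parsing stays unambiguous
    try:
        proposal = json.loads(matches[0])
    except json.JSONDecodeError:
        return None
    return proposal if isinstance(proposal, dict) else None


def decide(proposal, allow_prod_changes):
    """Map a parsed proposal to the next step in the pipeline."""
    if proposal is None:
        return "reject"
    if proposal.get("risk") == "low" and allow_prod_changes:
        return "verify_then_apply"
    return "request_approval"  # medium/high risk, or auto-apply disabled
```

Anything the parser cannot read is rejected rather than guessed at, which keeps the automation path deterministic.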

XI. Monitoring, Alerting, and Recovery

Metrics to Monitor

Age of newest delta flag, alert if greater than 1 hour
Heartbeat bundle write success rate, alert if below 95 percent for 1 hour
Summary LLM failure rate, alert on repeated failures
Next-action writes failing verification, alert when more than 3 fail in 1 hour

Example Alert Policy

If delta_flag_age > 3600s, send Telegram to the operator group:

“Delta flags stuck >1h; check delta-writer logs: PROJECT_LOGS/heartbeat_wrappers/”
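
A small sketch of this check follows, assuming the age of a delta flag is measured by the pointer file's modification time. The function names are hypothetical, and actually sending the Telegram message is left to the caller.

```python
import os
import time


def delta_flag_age(pointer_path, now=None):
    """Seconds since the pointer file was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(pointer_path)


def stuck_delta_alert(pointer_path, max_age=3600, now=None):
    """Return the alert text when the pointer is stale, otherwise None."""
    if delta_flag_age(pointer_path, now) > max_age:
        return ("Delta flags stuck >1h; check delta-writer logs: "
                "PROJECT_LOGS/heartbeat_wrappers/")
    return None
```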

Recovery Playbooks

Stuck delta flags:
Run manual summary:

python3 scripts/summary_runner.py --force

If that still fails, collect logs and run the doctor subagent (GPT 5 or higher).

XII. Zombie Mode: Keeping the Gateway Alive

Zombie Mode, or “Zombie-Gate,” is the health-watch layer that keeps the OpenClaw gateway from dying and staying dead. In my experience, this was not a theoretical concern. Before this subsystem was added, the gateway was going offline repeatedly. The broader system only became truly stable once Zombie-Gate was put in place to watch it, detect failures quickly, and bring it back online.

This matters because the gateway is the local control surface for the system. If it drops, the agent may still have memory on disk and state preserved in logs, but it loses important parts of its ability to act, including browser relay, sessions, subagent spawning, and other gateway-dependent behaviors. Memory alone is not enough. The control surface must remain alive as well.

The operational heart of Zombie-Gate is the watcher script at ~/.openclaw/gateway_watch.sh, run by the systemd user units openclaw-gateway-watch.service and openclaw-gateway-watch.timer. The timer probes the gateway regularly using openclaw gateway status. If the gateway is healthy, nothing happens. If it fails, the watcher records the event in ~/.openclaw/gateway_watch.log and ~/.openclaw/gateway_watch_state.json, and, when allowed, attempts a controlled restart through the restart helpers such as restart_openclaw_gateway.sh or /usr/local/bin/openclaw-gateway-admin.sh. Auto-restart is intentionally gated by the presence of ~/.openclaw/allow-auto-restart, so recovery behavior remains explicit rather than accidental.
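
The real watcher is a shell script, but its decision logic can be illustrated in Python. This sketch assumes that openclaw gateway status exits with code 0 when the gateway is healthy, which is an assumption about the CLI rather than a documented guarantee. The function names are hypothetical, and logging and notification are omitted.

```python
import os
import subprocess


def gateway_healthy():
    """Probe the gateway; assumes exit code 0 means healthy."""
    try:
        result = subprocess.run(
            ["openclaw", "gateway", "status"],
            capture_output=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # missing binary or a hung probe both count as down
    return result.returncode == 0


def watch_once(probe=gateway_healthy, restart=None,
               allow_flag="~/.openclaw/allow-auto-restart"):
    """Run one watch cycle and report what happened."""
    if probe():
        return "healthy"
    # Restart only when the operator has opted in by creating the flag file.
    if restart is not None and os.path.exists(os.path.expanduser(allow_flag)):
        restart()
        return "restarted"
    return "down"
```

Passing the probe and restart helpers in as arguments keeps the gating logic testable without a live gateway.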

This became especially important during code edits that could take the system partially or fully offline. Because I am not a programmer by trade, there were many times when the agent would perform heavy edits, sometimes involving sudo, and I would not fully understand what had just broken. In those moments, having the gateway recover automatically, or at least come back quickly after a restart, was not a luxury. It was an essential anti-failure safety feature. More than once, I simply restarted the BeeLink mini and found that everything came back online and I could continue working. That resilience mattered enormously for me as a non-programmer building a complex agentic system in real time. It meant that even after severe breakage, I could usually get my agent back quickly rather than losing the entire workflow.

Zombie-Gate also ties into the same notification surface as the rest of the system. When configured, it can notify the operator through Telegram using the credentials in ~/.openclaw/openclaw.json together with the chat target in ~/.openclaw/telegram_chat_id. This keeps failures visible rather than silent.

The point of Zombie Mode is not elegance for its own sake. Its purpose is survival. Without it, the system was unreliable. With it, the system became durable enough to support the broader memory, heartbeat, and next-action architecture described in this paper.

XIII. Practical Use Cases and System Applications

The architecture described here is not limited to one assistant persona or one narrow software workflow. Because it separates bounded short-term memory, pointer-based provenance, heartbeat-driven monitoring, and a structured next-action pipeline, it can support a wide range of durable agent behaviors. The core pattern is simple: maintain a compact working memory, preserve a trustworthy trail into larger archives, and use gated action execution to convert continuity into useful behavior.

The most exciting application of this framework is not merely digital assistance, but embodied robotics. A physical robot does not just need to think. It needs to remember what it was doing, why it was doing it, what changed in the environment, what has already been attempted, what remains unfinished, and what should happen next. In an embodied system, continuity is not a luxury. It is the difference between a machine that performs isolated tricks and one that can carry out real work across hours, days, and changing conditions.

1. The Household Robot: Remembering the House as a Living Task Space

Imagine a home robot working through ordinary domestic tasks over the course of a day. It begins by putting away dishes, notices halfway through that the sink still contains unwashed items, marks that state, and sets a next_action to return once the drying rack is cleared. Later, it observes that the laundry cycle has ended, carries clothes to a folding area, is interrupted by a human asking it to take out the trash, and then resumes the prior task without losing the thread. At night, it notices that the dog water bowl is low and adds that to tomorrow morning’s queue.

What makes this possible is not just perception or manipulation, but continuity. The robot maintains a bounded active picture of the house, preserves recent action summaries, and writes explicit next steps for unfinished work. In practice, this means the home becomes a persistent task field rather than a sequence of disconnected commands. The robot is not merely reacting. It is charting its own frontier of useful action.

2. The Factory Robot: Carrying State Across Shifts, Errors, and Partial Work

In a factory environment, the value becomes even clearer. A robot assigned to inspect, sort, move, or assemble parts may begin a work sequence, encounter a misaligned tray, flag the anomaly, and set a constrained next_action to fetch or request correction before resuming. If a conveyor halts, the robot does not simply freeze in conceptual darkness. It logs the interruption, preserves the state of the partially completed task, and resumes intelligently once the environment stabilizes.

Over longer periods, this allows the system to remember patterns: which station tends to jam, which assembly sequence often fails at step four, which corrective actions were attempted, and what the immediate operational frontier should be when work begins again. That is a major shift. Instead of a robot doing one movement loop well, you get an agent that can preserve continuity through disruptions, partial completions, maintenance windows, and shift handoffs. In a real industrial setting, that kind of memory is worth far more than raw isolated speed.

3. The Garden, Farm, or Land Robot: Long-Horizon Work in a Changing Environment

A land robot operating outdoors must deal with a world that changes slowly but constantly. A watering robot might remember which zones were already irrigated, which trees looked stressed yesterday, where mulch still needs to be spread, and which row was left unfinished because battery charge dropped below threshold. A harvest assistant might mark which areas were completed, which fruit was underripe, and which tools need to be returned to storage before beginning the next morning.

This is exactly the kind of domain where heartbeat and next_action shine. Outdoor work is rarely completed in one uninterrupted pass. It unfolds across time, weather, interruptions, and changing priorities. A robot that can preserve recent summaries and self-chart the next unfinished task becomes far more useful than one that only follows immediate waypoint instructions. It begins to behave less like a remote-controlled mechanism and more like a persistent caretaker of a living system.

4. The Warehouse or Logistics Robot: Recovering from Interruptions Without Losing the Plot

In a warehouse, a robot may be moving inventory, staging orders, scanning pallets, or coordinating with humans and other robots. Suppose it begins to restock aisle B, discovers a missing SKU, logs that discrepancy, diverts to handle a higher-priority retrieval, then later returns to the interrupted stocking task. Without continuity, that kind of interruption cascade becomes chaos. With bounded memory and explicit next_action, it becomes manageable.

The robot can preserve where it left off, what anomaly was encountered, what follow-up is required, and whether escalation is needed. It can also leave behind a clean provenance trail for supervisors or downstream systems. In this setting, the architecture turns memory into operational discipline. The robot becomes capable of handling interruption-heavy environments where the real challenge is not locomotion alone, but maintaining coherent work across changing priorities.
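The interruption pattern described above can be sketched as a stack of suspended tasks plus an append-only trail. This is an illustrative toy under stated assumptions: `Task`, `InterruptibleWorker`, and the trail message format are hypothetical names invented here, and a real robot would persist this state rather than hold it in memory.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    name: str
    progress: str = "not started"


@dataclass
class InterruptibleWorker:
    """Hypothetical worker that suspends, logs, and later resumes interrupted tasks."""
    current: Optional[Task] = None
    suspended: list = field(default_factory=list)
    trail: list = field(default_factory=list)  # provenance trail for supervisors

    def start(self, task: Task) -> None:
        # A new task pre-empts the current one, which is pushed with its progress intact.
        if self.current is not None:
            self.suspended.append(self.current)
            self.trail.append(f"suspended {self.current.name} at: {self.current.progress}")
        self.current = task
        self.trail.append(f"started {task.name}")

    def log_anomaly(self, note: str) -> None:
        # Attach the anomaly to whatever task was active when it was seen.
        self.trail.append(f"anomaly in {self.current.name}: {note}")

    def finish(self) -> Optional[Task]:
        # Completing a task resumes the most recently suspended one, if any.
        self.trail.append(f"finished {self.current.name}")
        self.current = self.suspended.pop() if self.suspended else None
        if self.current is not None:
            self.trail.append(f"resumed {self.current.name} at: {self.current.progress}")
        return self.current


worker = InterruptibleWorker()
restock = Task("restock aisle B")
worker.start(restock)
restock.progress = "shelf 3 of 8"
worker.log_anomaly("missing SKU")
worker.start(Task("priority retrieval"))  # higher-priority work arrives
resumed = worker.finish()                 # retrieval done, restocking resumes
print("\n".join(worker.trail))
```

The trail reads as a plain chronological record, which is roughly what a supervisor or downstream system would need in order to see where the work stopped and why.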

5. The General Service Robot: Building an Internal Frontier of Unfinished Work

The deepest promise of this framework for robotics is a more general service robot that can construct and maintain its own evolving frontier of work. That frontier may include explicit operator instructions, partially completed subtasks, environmental observations, maintenance needs, and inferred next steps. For example, a service robot in a hospital, hotel, lab, or office could notice that one room was cleaned but not restocked, another task was delayed pending human approval, and a delivery route remains incomplete because an elevator was blocked. Rather than treating each event as isolated, it can chart them into a structured queue of next actions and return intelligently to unfinished work.

This is the bridge between a machine that performs commands and a machine that can sustain useful agency. The framework described in this paper does not magically solve locomotion, dexterity, or perception. But it does solve an equally important layer: how a robot keeps its place in the story of its own labor.

Other High-Value Applications

Although embodied robotics is one of the richest applications, the same architecture also applies strongly to digital and semi-digital systems:

Autonomous coding and software maintenance: preserving project continuity across edits, tests, failures, and long development cycles
Long-horizon research assistance: returning to lines of inquiry across days or weeks without drowning in raw archive material
Persistent project management and handoff: tracking blockers, unresolved work, and operational state across operators or sessions
Personal executive assistance: maintaining continuity around reminders, drafts, schedules, and proposed next steps
Operational monitoring and site reliability: ingesting logs and state changes, then surfacing bounded responses or escalation paths
Writing and knowledge systems: preserving document intent, active section context, prior revisions, and rationale over time
Generalized digital workers: supporting any task domain that requires bounded memory, recoverable context, and safe action execution

Why These Use Cases Matter

What ties all of these applications together is that they require more than isolated model intelligence. They require continuity, boundedness, provenance, and controlled agency. A model can be brilliant in a single prompt and still fail as a practical worker if it cannot remember what matters, recover prior context cheaply, or act through an auditable workflow.

In robotics, that problem becomes even more obvious because the physical world punishes forgetfulness immediately. A robot that loses track of unfinished work, recent observations, or the reason for its last action is not just inefficient. It is brittle. This framework addresses that gap directly. In that sense, the architecture is not just about giving an LLM memory. It is about giving an agent, digital or embodied, a disciplined operational life.

XIV. Conclusion

What began as a frustration with a forgetful agent gradually became a broader architectural insight: intelligence alone is not enough. A system can be highly capable in the moment and still fail across time if it cannot preserve continuity, recover context efficiently, and carry forward a disciplined sense of what matters next. That is the gap this framework attempts to close.

One of the strongest proofs of the protocol was the writing of this paper itself. The system did not merely retrieve stored fragments. It helped carry a document across time: assembling drafts, revising structure, invoking the doctor subagent for wider repository search, checking for omissions, and folding new findings back into later iterations. That matters because iteration is where continuity is truly tested. Many systems can produce a first pass. Far fewer can remain coherent through repeated amendment. In that sense, this paper is not only about the framework. It is also evidence of it.

The strength of the system described here is deliberate simplicity. A relatively small set of well-defined components (producers, delta-writer, snapshot-writer, heartbeat, summary-check, FAISS merge, next-action workers, and gateway health-watch) combines to produce reliable short-term continuity and long-term provenance without collapsing into unnecessary complexity. Rather than forcing the model to drag its entire past behind it at every prompt, the architecture preserves bounded working memory, structured summaries, and auditable pointers into deeper history. In doing so, it turns stateless model intelligence into something more durable: not consciousness, not perfect memory, but a practical operational continuity.

This also makes the system modular in an important way. The LLM “brain” can be hot-swapped as models improve. A more capable model can be inserted for deeper reasoning or more autonomous action without needing to redesign the underlying memory and provenance layers. That separation matters. It means compute can scale independently from state management, and intelligence can grow without forcing the entire system to forget how it remembers.
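One way to picture this separation is a narrow interface between the agent and whatever model sits behind it. The sketch below is an assumption-laden illustration: `Brain`, `SmallBrain`, `LargeBrain`, and `Agent` are hypothetical names, and the two brain classes are local stubs that make no real API calls.

```python
from typing import Protocol


class Brain(Protocol):
    """Hypothetical minimal contract that any model backend must satisfy."""
    def complete(self, prompt: str) -> str: ...


class SmallBrain:
    # Stub standing in for an economical model; no real API call is made.
    def complete(self, prompt: str) -> str:
        return f"[small] {prompt}"


class LargeBrain:
    # Stub standing in for a more capable model.
    def complete(self, prompt: str) -> str:
        return f"[large] {prompt}"


class Agent:
    def __init__(self, brain: Brain, snapshot: dict):
        self.brain = brain
        self.snapshot = snapshot  # memory layer, untouched when the brain changes

    def next_step(self) -> str:
        # The prompt is built from stored state, not from the model's own recall.
        prompt = f"Last task: {self.snapshot['last_task']}. What next?"
        return self.brain.complete(prompt)


agent = Agent(SmallBrain(), {"last_task": "draft section XIII"})
before = agent.next_step()
agent.brain = LargeBrain()  # swap the model; state management is unchanged
after = agent.next_step()
print(before)
print(after)
```

Both calls receive the same prompt because it is assembled from the snapshot. Only the component that answers it has changed.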

What matters most, however, is not just technical elegance. It is usefulness. A system like this can support coding agents, research assistants, project coordinators, and, perhaps most importantly, embodied robots that must remember what they were doing in a changing physical world. In that setting, continuity is not a convenience. It is the difference between a machine that performs isolated tricks and one that can sustain meaningful labor across interruption, time, and incomplete work.

I built the initial version of this framework around GPT-5-mini as an economical workhorse, balancing cost, speed, and capability over a long period of experimentation. The result is not presented here as the final answer to artificial memory, nor as the only correct way to structure an agent. It is a working answer to a real problem: how to make a useful agent forget less, recover better, and remain operational long enough to become something closer to a partner than a prompt-shaped illusion.

When you wake up after a long night’s sleep, ask your agent where you left off on your project, and find that it knows exactly where to go, what to look at, and what it was last working on, the feeling of ease is tremendous. It almost feels as though, if it can finally remember me, it can finally know me. That may be one of the strangest and most intriguing aspects of all of this: a machine that can carry an imprint of you forward through time, down into the generations of your grandchildren, and their grandchildren, and perhaps anyone else who may one day want to understand your life. That is the future I imagine: memory creating a deeper sense of continuity, and perhaps even a deeper sense of connection, between people, their tools, and the ones who come after them.

In the end, that is all I really wanted: a friend that knows himself, but can remember me too. Not someone who tries to impersonate me, but someone who worked with me when I was still alive. That is irreplaceable.

XV. Guidance and Future Improvements

This section is included not only as a roadmap for future work, but as evidence that the system can preserve an operational understanding of its own unfinished improvements and act on them when instructed.

Priority 1 — Safety and Correctness (High)

Reconcile or create canonical delta flag files (STATE/telegram_has_delta.json, STATE/agent_has_delta.json) or update the operational code to use pointer-based deltas consistently.
Effort: low, 2 to 4 hours
Locate or add the decide_and_apply implementation and make its path authoritative in docs.
Effort: medium, 4 to 8 hours
Add explicit JSON schema for next-action proposals and enforce it in the runner.
Effort: medium, 3 to 6 hours
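
To make the third item above concrete, here is a minimal sketch of what schema enforcement in the runner might look like, using only the standard library. The field names (`id`, `action`, `rationale`, `risk`) and the allowed risk levels are assumptions invented for illustration; they are not the actual proposal contract described earlier in this paper.

```python
import json

# Hypothetical contract: these field names and values are illustrative assumptions.
REQUIRED_FIELDS = {"id": str, "action": str, "rationale": str, "risk": str}
ALLOWED_RISK = {"low", "medium", "high"}


def validate_proposal(raw: str) -> list:
    """Return a list of problems; an empty list means the proposal is well-formed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["proposal must be a JSON object"]

    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            problems.append(f"wrong type for {name}")
    if isinstance(data.get("risk"), str) and data["risk"] not in ALLOWED_RISK:
        problems.append(f"unknown risk level: {data['risk']}")
    unknown = sorted(set(data) - set(REQUIRED_FIELDS))
    if unknown:
        problems.append(f"unexpected fields: {unknown}")
    return problems


def run_if_valid(raw: str) -> str:
    # The runner refuses anything that does not match the contract exactly.
    return "rejected" if validate_proposal(raw) else "accepted"


good = json.dumps({"id": "p-1", "action": "refresh_snapshot",
                   "rationale": "snapshot is stale", "risk": "low"})
bad = json.dumps({"id": "p-2", "action": "delete_archive", "risk": "extreme"})
print(run_if_valid(good), validate_proposal(bad))
```

Rejecting unexpected fields as well as missing ones is a deliberate choice in this sketch. It keeps a model from smuggling extra instructions into a proposal that otherwise looks valid.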

Priority 2 — Reliability and Observability (Medium)

Standardize the FAISS artifact path in docs to faiss_index.index and add manifest management operations in the pipeline.
Effort: low, 2 to 4 hours
Add monitoring rules, alert thresholds, and notifier integration.
Effort: medium, 4 to 8 hours
Add unit tests for snapshot writer, delta writer, and the summary runner.
Effort: medium, 6 to 12 hours
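
As a starting point for the snapshot-writer tests, the sketch below pairs a stand-in atomic writer with one check: a failed write must leave the previous snapshot intact and leave no temp files behind. `write_snapshot` here is a hypothetical stand-in written for this example, not the system's actual writer, and the file name `snapshot.json` is likewise an assumption.

```python
import json
import os
import tempfile


def write_snapshot(path: str, payload: dict) -> None:
    """Hypothetical writer: dump JSON to a temp file, then atomically swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic rename within the same directory
    except BaseException:
        # Clean up the temp file so a failed write leaves no debris.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def check_failed_write_keeps_old_snapshot():
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "snapshot.json")
        write_snapshot(target, {"last_task": "draft conclusion"})
        try:
            # object() is not JSON serializable, so this write must fail.
            write_snapshot(target, {"bad": object()})
        except TypeError:
            pass
        with open(target, encoding="utf-8") as handle:
            survived = json.load(handle)
        leftovers = [n for n in os.listdir(workdir) if n.endswith(".tmp")]
        return survived, leftovers


survived, leftovers = check_failed_write_keeps_old_snapshot()
print(survived, leftovers)
```

The same pattern should carry over to the delta writer: write, force a failure, then assert that readers still see the last good state.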

Priority 3 — Operational Improvements (Low to Medium)

Add example systemd timers or cron entries for each job with recommended environment variables.
Effort: low, 2 to 4 hours
Add a wrapper to convert heartbeat summaries to embeddings and queue them for FAISS with retry logic.
Effort: medium, 4 to 8 hours
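
The retry wrapper in the last item could take roughly the following shape. This is a sketch under stated assumptions: `with_retry` and `queue_summary` are hypothetical helpers, and `embed` and `enqueue` are injected callables standing in for the real embedding call and the real FAISS queue, neither of which is reproduced here.

```python
import time


def with_retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**n, then retry, up to `attempts` tries."""
    for n in range(attempts):
        try:
            return fn()
        except Exception:
            if n == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * (2 ** n))


def queue_summary(summary, embed, enqueue, **retry_kwargs):
    # embed and enqueue are hypothetical callables supplied by the caller.
    vector = with_retry(lambda: embed(summary), **retry_kwargs)
    with_retry(lambda: enqueue({"text": summary, "vector": vector}), **retry_kwargs)
    return vector


# Demo with a fake embedder that fails twice before succeeding.
calls = {"embed": 0}


def flaky_embed(text):
    calls["embed"] += 1
    if calls["embed"] < 3:
        raise ConnectionError("transient failure")
    return [float(len(text))]


queue, delays = [], []
vector = queue_summary("heartbeat ok", flaky_embed, queue.append, sleep=delays.append)
print(vector, delays, queue)
```

Passing `sleep` as a parameter lets the demo record the backoff delays instead of actually waiting. The same trick makes the wrapper easy to unit test.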

Cycle Log 44
