How to Design an OpenHarness-Style Agent Runtime with Tools, Memory, Permissions, Skills, and Multi-Agent Coordination

AI Agents 📅 2026-06-25 👁 44 views 🏷 OpenHarness, agent runtime, multi-agent coordination, tool use, agent memory, permissions, skills, context compaction, retry logic, cost tracking, agent lifecycle

Introduction

In this tutorial, we build OpenHarness from scratch to demystify how a practical agent harness operates. Rather than treating an agent framework as a black box, we reconstruct its core building blocks to gain full visibility into the control flow. By the end, you'll understand how the harness receives a user task, lets the model decide the next action, validates and executes tool calls, returns observations, and iterates until completion—all without relying on API keys or complex infrastructure.

What We'll Build

We'll recreate the essential components that make an agent system production-ready:

Tool Use & Typed Tool Schemas – Define tools with strict input/output types for safe execution.
Permissions & Lifecycle Hooks – Control tool access and hook into agent lifecycle events (before/after tool calls, task start/end).
Memory & Skills – Persistent memory across turns and reusable skill libraries.
Context Compaction – Keep conversation history within token limits without losing critical information.
Retry Logic & Cost Tracking – Handle failures gracefully and monitor API costs in real-time.
Multi-Agent Coordination – Orchestrate multiple agents that can delegate tasks and share state.

The OpenHarness Architecture

OpenHarness follows a loop-based architecture:

User Input – The harness receives a natural language task.
Model Reasoning – The LLM processes the task and decides on an action (e.g., call a tool, respond, or delegate).
Tool Validation & Execution – The harness validates the tool call against the schema, checks permissions, executes the tool, and returns the result.
Observation Handling – Tool outputs are fed back as observations.
Loop Continuation – Steps 2–4 repeat until the task is completed or a termination condition is met.

This design ensures full transparency—every decision and tool call is logged, auditable, and debuggable.

Step-by-Step Implementation

1. Core Harness Class

We start with a simple harness class that manages the conversation loop, tool registry, and memory.

class OpenHarness:
    def __init__(self, model, tools=None, memory=None, permissions=None):
        self.model = model
        self.tools = tools or {}
        self.memory = memory or []
        self.permissions = permissions or {}
        self.cost_tracker = CostTracker()
        self.context_compactor = ContextCompactor(max_tokens=4096)
        self.lifecycle_hooks = LifecycleHooks()

2. Tool Registry with Typed Schemas

Each tool is defined with a JSON schema for its parameters and return type. The harness validates calls against this schema before execution.

class Tool:
    def __init__(self, name, description, parameters_schema, return_schema, func):
        self.name = name
        self.description = description
        self.parameters_schema = parameters_schema
        self.return_schema = return_schema
        self.func = func

    def validate_call(self, **kwargs):
        # Validate kwargs against parameters_schema
        pass

    def execute(self, **kwargs):
        self.validate_call(**kwargs)
        return self.func(**kwargs)

3. Permissions & Lifecycle Hooks

Permissions control which agents can call which tools. Lifecycle hooks allow custom logic at key points (e.g., logging, alerting, pre/post processing).

class PermissionManager:
    def __init__(self):
        self.rules = {}  # agent_id -> [allowed_tool_names]

    def can_call(self, agent_id, tool_name):
        return tool_name in self.rules.get(agent_id, [])

4. Memory & Skills

Memory stores conversation history and tool outputs. Skills are reusable tool sets or prompt templates that agents can load on demand.

class Memory:
    def __init__(self, capacity=100):
        self.history = []
        self.capacity = capacity

    def add(self, entry):
        self.history.append(entry)
        if len(self.history) > self.capacity:
            self.history.pop(0)

5. Context Compaction & Retry Logic

Context compaction summarizes or prunes old messages to stay within token limits. Retry logic reattempts failed tool calls with exponential backoff.

class ContextCompactor:
    def __init__(self, max_tokens=4096):
        self.max_tokens = max_tokens

    def compact(self, history):
        # Summarize or drop oldest messages until under max_tokens
        pass

6. Multi-Agent Coordination

Agents can delegate subtasks to specialized sub-agents. The harness manages a registry of agents and routes tasks accordingly.

class AgentRegistry:
    def __init__(self):
        self.agents = {}

    def register(self, agent_id, agent):
        self.agents[agent_id] = agent

    def delegate(self, from_agent, task, target_agent_id):
        if target_agent_id in self.agents:
            return self.agents[target_agent_id].run(task)

Running the Harness

To run the harness, you'll need:

Python 3.10+
An LLM provider (e.g., OpenAI, Anthropic, or a local model via Ollama)
Basic dependencies (see the full notebook for requirements)

Clone the repository and open the tutorial notebook:

git clone https://github.com/MARKTECHPOST-AI-MEDIA-INC/AI-Agents-Projects-Tutorials.git
cd AI-Agents-Projects-Tutorials
jupyter notebook openharness_agent_runtime_from_scratch_Marktechpost.ipynb

Conclusion

Building OpenHarness from scratch reveals the inner workings of agent runtimes. By implementing tools, memory, permissions, and coordination ourselves, we gain the ability to customize, debug, and optimize agent systems for real-world applications. This foundational knowledge is essential as multi-agent architectures become central to AI engineering in 2026 and beyond.

For the full code with detailed explanations, refer to the companion notebook linked above.

via MarkTechPost