Introduction
In this tutorial, we build OpenHarness from scratch to demystify how a practical agent harness operates. Rather than treating an agent framework as a black box, we reconstruct its core building blocks to gain full visibility into the control flow. By the end, you'll understand how the harness receives a user task, lets the model decide the next action, validates and executes tool calls, returns observations, and iterates until completion—all without relying on API keys or complex infrastructure.
What We'll Build
We'll recreate the essential components that make an agent system production-ready:
- Tool Use & Typed Tool Schemas – Define tools with strict input/output types for safe execution.
- Permissions & Lifecycle Hooks – Control tool access and hook into agent lifecycle events (before/after tool calls, task start/end).
- Memory & Skills – Persistent memory across turns and reusable skill libraries.
- Context Compaction – Keep conversation history within token limits without losing critical information.
- Retry Logic & Cost Tracking – Handle failures gracefully and monitor API costs in real-time.
- Multi-Agent Coordination – Orchestrate multiple agents that can delegate tasks and share state.
The OpenHarness Architecture
OpenHarness follows a loop-based architecture:
- User Input – The harness receives a natural language task.
- Model Reasoning – The LLM processes the task and decides on an action (e.g., call a tool, respond, or delegate).
- Tool Validation & Execution – The harness validates the tool call against the schema, checks permissions, executes the tool, and returns the result.
- Observation Handling – Tool outputs are fed back as observations.
- Loop Continuation – Steps 2–4 repeat until the task is completed or a termination condition is met.
- Python 3.10+
- An LLM provider (e.g., OpenAI, Anthropic, or a local model via Ollama)
- Basic dependencies (see the full notebook for requirements)
This design ensures full transparency—every decision and tool call is logged, auditable, and debuggable.
Step-by-Step Implementation
1. Core Harness Class
We start with a simple harness class that manages the conversation loop, tool registry, and memory.
class OpenHarness:
def __init__(self, model, tools=None, memory=None, permissions=None):
self.model = model
self.tools = tools or {}
self.memory = memory or []
self.permissions = permissions or {}
self.cost_tracker = CostTracker()
self.context_compactor = ContextCompactor(max_tokens=4096)
self.lifecycle_hooks = LifecycleHooks()
2. Tool Registry with Typed Schemas
Each tool is defined with a JSON schema for its parameters and return type. The harness validates calls against this schema before execution.
class Tool:
def __init__(self, name, description, parameters_schema, return_schema, func):
self.name = name
self.description = description
self.parameters_schema = parameters_schema
self.return_schema = return_schema
self.func = func
def validate_call(self, **kwargs):
# Validate kwargs against parameters_schema
pass
def execute(self, **kwargs):
self.validate_call(**kwargs)
return self.func(**kwargs)
3. Permissions & Lifecycle Hooks
Permissions control which agents can call which tools. Lifecycle hooks allow custom logic at key points (e.g., logging, alerting, pre/post processing).
class PermissionManager:
def __init__(self):
self.rules = {} # agent_id -> [allowed_tool_names]
def can_call(self, agent_id, tool_name):
return tool_name in self.rules.get(agent_id, [])
4. Memory & Skills
Memory stores conversation history and tool outputs. Skills are reusable tool sets or prompt templates that agents can load on demand.
class Memory:
def __init__(self, capacity=100):
self.history = []
self.capacity = capacity
def add(self, entry):
self.history.append(entry)
if len(self.history) > self.capacity:
self.history.pop(0)
5. Context Compaction & Retry Logic
Context compaction summarizes or prunes old messages to stay within token limits. Retry logic reattempts failed tool calls with exponential backoff.
class ContextCompactor:
def __init__(self, max_tokens=4096):
self.max_tokens = max_tokens
def compact(self, history):
# Summarize or drop oldest messages until under max_tokens
pass
6. Multi-Agent Coordination
Agents can delegate subtasks to specialized sub-agents. The harness manages a registry of agents and routes tasks accordingly.
class AgentRegistry:
def __init__(self):
self.agents = {}
def register(self, agent_id, agent):
self.agents[agent_id] = agent
def delegate(self, from_agent, task, target_agent_id):
if target_agent_id in self.agents:
return self.agents[target_agent_id].run(task)
Running the Harness
To run the harness, you'll need:
Clone the repository and open the tutorial notebook:
git clone https://github.com/MARKTECHPOST-AI-MEDIA-INC/AI-Agents-Projects-Tutorials.git
cd AI-Agents-Projects-Tutorials
jupyter notebook openharness_agent_runtime_from_scratch_Marktechpost.ipynb
Conclusion
Building OpenHarness from scratch reveals the inner workings of agent runtimes. By implementing tools, memory, permissions, and coordination ourselves, we gain the ability to customize, debug, and optimize agent systems for real-world applications. This foundational knowledge is essential as multi-agent architectures become central to AI engineering in 2026 and beyond.
For the full code with detailed explanations, refer to the companion notebook linked above.
via MarkTechPost
