What I Learned in the First 12 Hours with OpenClaw

I have a rule I give clients: start manual, feel the pain, then automate. Make vendors answer hard questions before you sign anything. Be skeptical of tools that promise to solve problems you haven’t fully understood yet.

So naturally, when I decided to spin up OpenClaw — an open-source, self-hosted AI assistant platform — I ignored all of that advice. I stood it up on an old desktop I had sitting around, threw Ubuntu Server on it, pointed it at my Anthropic API key, and started the clock.

Here’s what twelve hours actually looked like.


What I Was After

Curiosity, mostly. If you work in tech right now, you’ve heard about OpenClaw. It hit 100,000 GitHub stars in roughly 60 days — a milestone that took React the better part of a decade. It’s in every Slack, every feed, every conversation about what autonomous AI agents actually look like in practice. I wanted to know what it was — not from a README, but from running it.

I wasn’t going in with a production use case. I had no intention of handing it anything sensitive or consequential. My working rule: I wouldn’t trust OpenClaw with anything I wouldn’t trust my kindergartner with. Agentic AI systems that operate autonomously on your infrastructure are interesting precisely because the failure modes are novel — and the right time to learn those failure modes is in a lab, not when something important is at stake.

So this was an experiment. A deliberate one, but an experiment.

The original plan was a Mac Mini as the host — quiet, low power, purpose-built for always-on roles. That died when I actually tried to buy one. It turns out OpenClaw is partly responsible for its own supply problem: demand for high-memory Macs has gotten bad enough that M4 Pro and Mac Studio configurations are running four to six weeks out. People are buying them specifically to run local AI agents, landing on top of an already tight memory market. So I looked at the old desktop sitting unused in my office and made the obvious call: zero hardware cost, higher power draw, otherwise fine.


Before OpenClaw: Getting the Host Right

I could have installed Ubuntu Desktop, pointed a monitor at it, and called it done. I didn’t want that. I wanted a headless server I could SSH into from anywhere — something I’d never need to plug a monitor into again.

Before touching anything, I shredded the drives. The machine had two WD HDDs, and I had no idea what was on either of them — it had been sitting unused long enough that I genuinely couldn’t remember. Probably sensitive personal data. Probably nothing. Not worth finding out the hard way.

# wipe both data drives; shred's -v progress goes to stderr, hence the 2>&1
sudo shred -vfz /dev/sdb 2>&1 | sudo tee /var/log/shred-sdb.log
sudo shred -vfz /dev/sdd 2>&1 | sudo tee /var/log/shred-sdd.log

Then: Ubuntu Server, installed to the internal SSD. I didn’t have a USB flash drive handy, so I used an external hard drive as the installation medium — flashed the ISO to it with Rufus, booted from it, installed to the internal disk. The external drive came back out once the install finished. GRUB went to the internal SSD’s MBR, which is exactly where it should be.

After install, the machine wasn’t getting an IP. Turned out to be a Netplan misconfiguration — the installer had the wrong interface name. ip link show to find the right one, update /etc/netplan/00-installer-config.yaml, sudo netplan apply, done.
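
For reference, the working file ended up in this shape. A minimal sketch; enp3s0 stands in for whatever ip link show reports on your hardware:

# /etc/netplan/00-installer-config.yaml
network:
  version: 2
  ethernets:
    enp3s0:        # the name ip link show actually reported; the installer had guessed wrong
      dhcp4: true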

Then Tailscale:

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

Once it authenticated and I confirmed ssh arthur@arthur over Tailscale worked, the monitor came off and stayed off. That was the moment it became real infrastructure rather than a lab experiment.

The rest of the hardening was standard headless hygiene:

sudo ufw allow ssh && sudo ufw enable
sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target
sudo systemctl disable systemd-networkd-wait-online.service
sudo systemctl mask systemd-networkd-wait-online.service

Plus BIOS auto-power-on after outage, and SSH key auth with password login disabled. The machine is named Arthur. OpenClaw’s agent identity on this host is Claw. That felt right.
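
The password-login half is a two-line drop-in; the path is the stock Ubuntu location, the filename is mine:

# /etc/ssh/sshd_config.d/99-hardening.conf
PasswordAuthentication no
PubkeyAuthentication yes

Followed by sudo systemctl restart ssh to pick it up.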

The last hardening step before touching OpenClaw was privilege separation — a dedicated service account for the agent to run under.

sudo adduser claw

claw is not in the sudo group. Not in wheel. groups claw returns exactly one thing: claw. To make sure that held even if something went sideways, I added two more layers. First, edited /etc/pam.d/su to restrict su to wheel members only — so claw can’t escalate to arthur even with a valid password. Second, an explicit sudoers denial:

# /etc/sudoers.d/claw
claw ALL=(ALL) !ALL

Belt and suspenders. The PAM restriction covers su. The sudoers entry covers sudo. Together they mean a full compromise of the OpenClaw process stays contained to the claw account — it has nowhere to go.
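
For the record, the PAM change is the stock pam_wheel line in /etc/pam.d/su, just uncommented; its group= option covers distros that never shipped a wheel group:

# /etc/pam.d/su (excerpt)
auth       required   pam_wheel.so    # only wheel members may invoke su; add group=sudo where wheel doesn't exist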

SSH access to claw uses the same key as arthur, copied over during setup. That keeps the workflow clean: SSH directly as claw from my laptop when I need to work in that context, no lateral movement required.
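
The copy itself is two commands — a sketch, run as arthur, with install(1) handling ownership and modes in one shot:

sudo install -d -m 700 -o claw -g claw /home/claw/.ssh
sudo install -m 600 -o claw -g claw ~/.ssh/authorized_keys /home/claw/.ssh/authorized_keys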

With that in place, OpenClaw installation itself was anticlimactic.


Hour One: OpenClaw

Telegram integration: under ten minutes. BotFather, token, paste into the OpenClaw config, restart the gateway. First pairing request came in almost immediately — approved via one-time code from the CLI. The bot was polling within seconds.

The initial agent bootstrap is a little odd if you’re not expecting it. There’s no onboarding wizard. There’s a BOOTSTRAP.md Claw reads to orient itself, and then you’re talking to something that’s actively figuring out what it is and what you need. Less polished than a commercial product. More interesting.

Setup was uneventful. The more interesting things came later.


The First Real Failure: Silent API Limits

Around hour four, Claw went dark.

Telegram messages piled up unanswered. The gateway’s systemd service was healthy. The process was alive. No errors surfaced anywhere obvious. I had to get back to a terminal and dig into journalctl before the picture became clear: Anthropic API billing limit hit, 402 errors on every LLM call, the gateway dutifully swallowing failures and returning nothing.
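
The incantation that finally surfaced it, for the record. The unit name here is illustrative; use whatever your gateway actually runs as:

sudo journalctl -u openclaw-gateway --since "1 hour ago" --no-pager | grep -iE 'error|402'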

This is the failure mode people don’t think about when they self-host on consumption-based APIs. It’s not noisy. The infrastructure stays up. Messages arrive. The bot accepts them. And then nothing happens, because the intelligence layer is dark and nothing in the stack is designed to tell you that.

Twelve hours in, I’d already burned through $12 in API usage. That’s not a complaint — the heartbeat monitoring runs constantly, and I was iterating heavily on configuration. But it’s the kind of number that clarifies things quickly. If you’re running this for anything time-sensitive — threat monitoring, alerting, anything where a missed message costs you — you need external alerting on your API spend. Anthropic’s billing soft limits exist for exactly this reason. The system won’t tell you it’s broken. You have to instrument it yourself.
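
A cron-driven canary is enough to close that gap. A minimal sketch: the endpoint and headers are Anthropic’s published Messages API, while the model id and the mail alert are stand-ins for whatever you actually run:

#!/bin/sh
# Dead-man probe for the intelligence layer: a 1-token request against the
# Anthropic API. Anything but a 200 means Claw is effectively dark.
STATUS=$(curl -s -o /dev/null -w '%{http_code}' https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-5","max_tokens":1,"messages":[{"role":"user","content":"ping"}]}')
[ "$STATUS" = "200" ] || echo "Anthropic API returned $STATUS" | mail -s "Claw canary failed" you@example.com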

The lesson: self-hosted doesn’t mean self-healing. Monitor your API spend or you’ll find out about limits the hard way.


The UFW Ordering Bug

Later in the day I was hardening network isolation for this host on my VLAN. The goal: allow outbound to the gateway and internet services, deny outbound to everything else on the subnet.

The ruleset looked reasonable. Default deny incoming and outgoing. DENY rule for the subnet. Port-specific ALLOWs for 53, 80, 443, 123.

I tested with nc -v 192.168.68.106 80. Connection succeeded.

The problem was evaluation order. UFW processes rules top to bottom, first match wins. My port-specific ALLOWs had been inserted before the subnet DENY. Port 80 to any destination matched the ALLOW rule before the packet ever reached the DENY. The ruleset looked correct — it had all the right rules — but the logic was wrong because the order was wrong.

Fix: delete the out-of-order rules, reinsert the subnet DENY before the port-specific rules, retest. Blocked as expected.
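
In ufw terms, position is everything. The reorder looked roughly like this; addresses are from my lab subnet and purely illustrative:

sudo ufw status numbered                       # evaluation order is top to bottom, first match wins
sudo ufw delete 2                              # remove each misordered rule (numbers from the listing)
sudo ufw insert 1 allow out to 192.168.68.1    # gateway stays reachable
sudo ufw insert 2 deny out to 192.168.68.0/24  # subnet DENY lands ahead of every port ALLOW
sudo ufw allow out to any port 443 proto tcp   # internet ALLOWs re-append after it
nc -v 192.168.68.106 80                        # retest: should now hang and time out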

This is a known trap with UFW and iptables, and I’ve seen it bite people who know better, including me. The reading-the-ruleset problem is real: a firewall policy that looks right at a glance and is wrong in practice is worse than a policy that obviously needs work, because you’ll trust the one and fix the other.

The lesson: verify firewall rules by testing, not reading. Order matters, and the only way to know the order is right is to check what actually happens.


What Actually Worked: News Monitoring

The most immediately useful thing was something I almost treated as an afterthought.

I set up a HEARTBEAT.md — a plain-text file specifying what Claw should monitor and how often. Topics: major CVEs, GRC framework changes, AI/LLM developments, GRC SaaS market moves, content-as-code tooling. Every 30 minutes, it runs. What qualifies as worth surfacing versus routine noise is specified in the file, manually, in plain language.
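
A trimmed sketch of the shape, paraphrased — the real file is longer and the thresholds are mine:

# HEARTBEAT.md (excerpt, paraphrased)
Every 30 minutes, check for developments in:
- Major CVEs (actively exploited, or touching gear I run)
- GRC framework changes; GRC SaaS market moves
- AI/LLM security developments
- Content-as-code tooling
Surface an item only if I would act on it or cite it this week.
Skip vendor PR, incremental model releases, and rehashed coverage.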

Within a few hours it had flagged:

  • Cisco IMC CVE-2026-20093: unauthenticated remote admin bypass — the kind of thing you want to know about immediately if Cisco IMC is anywhere in your stack
  • Proofpoint’s warning that autonomous AI copilots are projected to surpass humans as the primary source of enterprise data leakage
  • RSAC 2026: LLM/GenAI protection is now the #1 stated priority in enterprise security, per conference floor coverage
  • FBI wiretap system breach via a third-party vendor — a clean supply chain risk case study

That’s a solid morning briefing assembled automatically, filtered to my actual interests, without standing up RSS infrastructure or paying for a threat intel subscription.

The key design choice: the filtering logic lives in a plain text file I control. The agent follows it literally. If I want to tune the signal threshold, I edit the file. No dashboards, no vendor portals, no opaque “AI-powered insights.”

The lesson: agentic news monitoring is useful if you’re specific about your interests and explicit about your noise threshold. Vague instructions produce vague results.


What OpenClaw Actually Is

It’s a harness. A runtime that keeps an AI assistant alive, persistent, and connected to surfaces you actually use — Telegram, Discord, web chat. It doesn’t make the underlying model smarter. It doesn’t solve hallucination. It doesn’t protect you from billing surprises or misconfigured firewall rules.

What it does: it gives the model continuity (via files and memory), tooling (web search, exec, file read/write), and the ability to reach you without requiring you to open a tab.

The difference between an assistant you have to visit and one that can reach you when something matters is real. The difference between an assistant that resets every conversation and one that remembers your infrastructure, your preferences, and your active projects is real.

But it’s infrastructure. It fails like infrastructure. It needs monitoring, hardening, documented configuration, and realistic expectations. The discipline you’d apply to any other service running on your network applies here too.


Twelve hours in: an assistant that knows my network, monitors the topics I care about, responds on Telegram when something matters, and already caught a firewall misconfiguration I’d have missed.

Ask me again in a month.