Global Trend Radar
Dev.to US tech 2026-05-09 02:40

# Day 1 — I'm Homeless. I Just Shipped an Autonomous Multi-Agent System.

Analysis

Category: AI
Importance: 71
Trend score: 33

Summary: The article describes the author's contradictory situation of having built and shipped an autonomous multi-agent system while homeless. Where technical achievement and personal hardship intersect, the author explores the meaning of success and failure through their own experience.
Let's get the uncomfortable part out of the way first: I'm a developer. I'm homeless. I have zero money. That part isn't interesting. What happens next is.

Twelve hours ago I had a single-agent bot called ZeroClaw posting occasionally to Bluesky. It worked, but it was brittle: 15 tool-call iterations max, 50 messages of history, no memory across runs, no plan, no way to get better.

Today I shipped:

- A CEO agent that reads KPIs every night and writes a strategic report with concrete recommendations
- An auditor system where dedicated agents audit each worker and propose config changes, reviewed by the CEO, with me still holding veto
- Config-driven self-improvement: YAML files, not Python code, so agents can evolve without ever touching executable code
- A metrics database that every agent run is logged to, so the CEO reasons about real data instead of hallucinating
- The whole thing running on a $13/month VPS, using the free Gemini tier plus my $280 GCP credits, all open source (CrewAI, MIT licensed)

And yes: at the end of the day the CEO agent did the one thing that convinced me this is real. It ran, looked at the metrics DB, found its own four previous failed runs, diagnosed them correctly, and wrote a report with action items to fix the stability problems.

Let me walk you through it.

## The setup

Hardware: a single Google Cloud e2-small VM with 2 GB RAM, 2 shared vCPUs, and a 20 GB disk. It costs about €13/month, and my remaining GCP credits give me roughly 20 months of runway on that.

LLMs: Gemini Flash-Lite for most roles, Gemini Pro for the CEO. Free OpenRouter models are still wired in as an emergency fallback, but I stopped using them as primary because they rate-limit hard under concurrent crew load.

Storage: SQLite for metrics, local YAML files for agent configs, plain markdown for every doc, ChromaDB (embedded) for the memory system. No external managed services. No $2,000/month vector database. No "AI platform."
Everything fits in a single Python venv on one VPS.

## The real architectural win: config vs code

Everyone building multi-agent systems eventually faces this choice: when an auditor agent spots a problem with a worker agent, how does it actually improve it?

The naive answer: "let it rewrite the worker's Python code." This is what every demo video shows. It's also what breaks in production: LLMs hallucinate imports, break syntax, introduce security holes, and get stuck in rewrite loops.

The pattern I landed on: agents modify YAML, never Python.

```
agents/
├── configs/                    # YAML files — the only thing agents can touch
│   ├── researcher.yaml         # goal, backstory, tools, LLM role
│   ├── writer.yaml
│   ├── ceo.yaml
│   └── auditor_researcher.yaml
└── proposals/                  # pending config changes awaiting approval
```

A config file looks like this:

```yaml
id: researcher
role: "Content Researcher"
goal: |
  Find 3-5 timely topics for social media posts that fit the
  PINGx build-in-public narrative.
backstory: |
  You are a sharp researcher who spots what's trending in AI and
  indiehacker space. You always cite real URLs from web_search —
  you never invent them.
llm_role: researcher
tools:
  - web_search
max_iter: 10
```

When the auditor thinks the researcher is weak, it writes a proposal YAML:

```yaml
target_agent: researcher
proposer: auditor_researcher
summary: "Add HackerNews trending as a research source"
changes:
  - field: backstory
    operation: append
    value: "Also consult the HackerNews front page."
expected_impact:
  metric: engagement_rate
  direction: up
  magnitude: "+5%"
reasoning: |
  Over the last 7 days, the researcher missed 3 trending AI topics
  that each had >500 upvotes on HN.
```

The CEO reviews the proposal overnight. If it approves, the change becomes a single-line YAML edit plus a `ceo: approve …` commit in git. Every autonomous change is a git commit, so you can `git revert` any bad decision in ten seconds. The Python code stays static and battle-tested.
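Applying an approved proposal can be a mechanical transform: load the target agent's YAML, apply each change, write it back, and let git record the diff. Below is a minimal sketch of that step under the proposal format shown above; it is not the project's actual code, and `apply_change`, `apply_proposal`, and the supported operation names are assumptions.

```python
def apply_change(config: dict, change: dict) -> dict:
    """Apply one proposal change to a parsed agent config.

    Agents may only edit structured config fields, never code, so the
    set of supported operations stays deliberately tiny. (Hypothetical
    helper; operation names are taken from the proposal format above.)
    """
    field, op, value = change["field"], change["operation"], change["value"]
    if op == "append":
        # Append to an existing text field, e.g. backstory.
        config[field] = config.get(field, "").rstrip() + "\n" + value
    elif op == "replace":
        config[field] = value
    else:
        raise ValueError(f"unsupported operation: {op!r}")
    return config


def apply_proposal(config_path: str, proposal: dict) -> None:
    """Write an approved proposal back to the target agent's YAML file."""
    import yaml  # PyYAML; imported here so apply_change stays dependency-free

    with open(config_path) as f:
        config = yaml.safe_load(f)
    for change in proposal["changes"]:
        config = apply_change(config, change)
    with open(config_path, "w") as f:
        yaml.dump(config, f, sort_keys=False, allow_unicode=True)
    # A real script would then run:
    #   git add <path> && git commit -m "ceo: approve <summary>"
    # so every autonomous change stays one `git revert` away.
```

Because every write lands as a git commit, a bad approval is one revert away, which is the whole point of keeping the agents out of the Python.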
This is probably the single best design decision I made today.

## Why a "CEO" agent, and why it isn't bullshit

I was skeptical of the CEO-agent idea at first. Every half-working multi-agent demo has a "manager" that says deep things like "let's optimize our strategy" and produces nothing useful.

The fix: the CEO doesn't get to reason about vibes. It reasons about KPIs. Hard numbers, pulled from SQLite.

KPIs the CEO optimizes, in priority order:

1. `donations_eur` (daily income)
2. `followers_x`, `bluesky` (audience growth)
3. `engagement_rate` (likes + replies per post)
4. `service_inquiries` (count)
5. `llm_cost_usd` (cap at $0.50/day)

The CEO agent has two tools: `query_kpis(metric, days)` and `query_runs(agent, days)`. Every night at 20:00 it runs a crew that:

- Pulls the last 14 days of KPIs
- Pulls every agent run from the last 3 days
- Reads any pending auditor proposals
- Writes a markdown report: what worked, what underperformed, verdicts on proposals, concrete recommendations (each tied to a specific KPI it expects to move), and tomorrow's priorities

When I ran it for the first time today, the report opened with:

> "No KPIs recorded in the last 14 days. This appears to be the initial run. The last 3 days of run history show a 100% failure rate (4 errors) on the ceo_crew. Issues include missing environment variables, missing packages, and embedder configuration validation errors."

All four of those failures were real: my earlier attempts that day where I forgot to source env vars, where the Google GenAI provider wasn't installed, where the embedder config had the wrong provider string. The metrics DB had captured every one. The CEO just read them back to me.

That's when I knew this was working.
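Those two tools can be nothing more than thin wrappers over SQLite. Here is a minimal sketch with the signatures named above; the table and column names (`kpis`, `runs`, `ts`, `value`, `status`, `error`) are assumptions, as the real schema isn't shown in the post.

```python
import sqlite3

DB_PATH = "metrics.db"  # assumed location of the metrics database


def query_kpis(metric: str, days: int, db_path: str = DB_PATH) -> list[tuple]:
    """Return (date, value) rows for one KPI over the last `days` days."""
    con = sqlite3.connect(db_path)
    try:
        return con.execute(
            """SELECT date(ts), value FROM kpis
               WHERE metric = ? AND ts >= datetime('now', ?)
               ORDER BY ts""",
            (metric, f"-{days} days"),
        ).fetchall()
    finally:
        con.close()


def query_runs(agent: str, days: int, db_path: str = DB_PATH) -> list[tuple]:
    """Return (timestamp, status, error) rows for one agent's recent runs."""
    con = sqlite3.connect(db_path)
    try:
        return con.execute(
            """SELECT ts, status, error FROM runs
               WHERE agent = ? AND ts >= datetime('now', ?)
               ORDER BY ts DESC""",
            (agent, f"-{days} days"),
        ).fetchall()
    finally:
        con.close()
```

The CEO's prompt never sees SQL; it just calls the tools and gets rows back, which is what keeps its reasoning anchored to data instead of vibes.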
## What I shipped today (checklist)

For the developers reading this, here's the actual work:

- Upgraded the VPS: e2-micro (1 GB) to e2-small (2 GB, 2 vCPU), disk grown from 10 to 20 GB for CrewAI deps
- Installed on the VPS: python3-venv, rsync, cloud-guest-utils, CrewAI 1.14, LiteLLM 1.83, ChromaDB 1.1, google-generativeai
- Bumped ZeroClaw limits: tool iterations 15 → 75, history 50 → 200, parallel tools on, actions/hour 30 → 150
- Built the metrics DB: three tables (`runs`, `outputs`, `kpis`), indexed, with a clean Python API
- YAML config loader with a tool whitelist, so agents can't grant themselves arbitrary powers via config edits
- Three crews: `content_crew` (Researcher + Writer + Reviewer), `ceo_crew`, and `audit_crew` (per-worker audits producing proposals)
- 17 smoke tests, all passing: imports, config schemas, tool whitelist integrity, metrics DB round-trip, LLM routing invariants, proposal tool validation
- CrewAI memory enabled with Gemini `text-embedding-004`: crews now remember across runs (what topics were researched yesterday, what posts got reviewed and rejected, what supporters were logged)
- GitHub repo live: github.com/PINGxCEO/PINGx
- First successful CEO run: 31.7 seconds, Gemini Pro reasoning, report saved, run logged to the metrics DB

Total cost today: $0. The CEO run used about $0.02 of my GCP credits, the upgraded VPS costs €13/month, and everything else is free.
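The tool whitelist in that loader is the safety rail that makes config-driven self-improvement tolerable: a proposal can reword a backstory, but it cannot hand an agent a new capability. A minimal sketch of the idea, assuming the field names from the config example earlier; the whitelist contents and function names are hypothetical, and the real loader is presumably CrewAI-specific.

```python
# Tools any agent is ever allowed to request via its YAML config.
# Proposals can add or remove entries from an agent's `tools` list,
# but only names on this whitelist will actually load.
TOOL_WHITELIST = {"web_search"}  # assumed; the real project likely has more

REQUIRED_FIELDS = ("id", "role", "goal", "backstory", "llm_role", "tools")


def load_agent_config(config: dict) -> dict:
    """Validate a parsed agent YAML before it becomes a live agent."""
    missing = [f for f in REQUIRED_FIELDS if f not in config]
    if missing:
        raise ValueError(f"{config.get('id', '?')}: missing fields {missing}")
    rogue = [t for t in config["tools"] if t not in TOOL_WHITELIST]
    if rogue:
        # A config edit tried to grant a power that was never approved.
        raise PermissionError(f"{config['id']}: non-whitelisted tools {rogue}")
    return config
```

The point of validating at load time rather than at proposal time is defense in depth: even a hand-edited or corrupted config can't smuggle in a tool.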
## What broke (because pretending nothing did is dishonest)

In order:

- rsync not installed on the VPS: install loop
- python3-venv not installed: install loop
- Disk full at 10 GB during the CrewAI install (onnxruntime + chromadb + huggingface-hub are huge): grew it to 20 GB
- Env vars not propagated to non-interactive SSH shells: created `~/.zeroclaw/env.sh` to source explicitly
- CrewAI's embedder provider spec wanted `"google-generativeai"`, not `"google"`: a one-line fix, but only discovered after a 21-error pydantic validation dump
- Leaked a GitHub personal access token in chat (I won't elaborate on how; I'm a human who makes mistakes): still need to rotate it

Every one of those failures is now in the metrics DB. The CEO agent saw them. The auditor system, when I turn it on this week, will propose operational fixes based on them.

## What I actually need

I'm not going to bury the ask. I'm homeless and have zero euros. Every coffee someone buys me today literally extends my runway by a day.

But I'm not asking for charity; I'm offering a trade:

- You support → you follow an honest build-in-public story. The code is public. The commits are timestamped. The mistakes are documented. You see the whole thing, not a polished case study.
- You hire me → I'll set up the exact system I just described on your server. Autonomous AI agent with LLM routing, social media posting, kill switch, CEO/audit architecture, from €100. Send me a DM.

Support: buymeacoffee.com/PINGx · ko-fi.com/pingx
Code: github.com/PINGxCEO/PINGx
Chat: Discord

## What's Day 2

Tomorrow the goals are: run `content_crew` end-to-end, generate the first AI-drafted social posts, run `audit_crew` to see the first config-change proposal get written, and set up the cron schedule (one crew at a time, sequential, 09:00 → 18:00 → 20:00).
Later this week: KPI ingestion from Buy Me a Coffee and Ko-fi webhooks, Discord auto-delivery of the CEO's nightly report, and the `apply_proposal.py` script that lets approved proposals actually write back to `agents/configs/*.yaml` and commit to git.

If the last twelve hours are any indication, the hardest part won't be the code. It'll be staying awake.

Thanks for reading.

— PINGx