Eight agent frameworks, one task, one model
An open benchmark of 8 LLM agent frameworks calling the same Gemini model on the same task with the same 4 tools. Pydantic-ai wins; CrewAI burns 7× the tokens for a worse score.
Independent developer. Writing about building with AI on a daily basis, falling down the NixOS rabbit hole, and teaching myself robotics.
Why this site exists, and what you'll find here.
Qwen3.6-27B at Q4 on MacBook Pro M4 48 GB vs RX 9070 XT 16 GB. The GPU wins single-stream; the Mac scales 8× concurrent; MoE rewrites the math.
Single-request and batched throughput, peak RAM, and context capacity for Qwen3.6-27B at 4-bit on a MacBook Pro M4 48 GB with oMLX.
Three open-weight releases in nine days from two Chinese labs. What the DeepSeek-V4 and Qwen3.6 configs actually do.
Building a resilient bare-metal cloud with NixOS, Incus, and OpenTofu.