Electrical-Engineer-in-the-loop
Patrick Paul
April 21, 2025
AI Codegen

For all the coders out there, we are all at different points in our journey into the AI singularity (i.e. our own obsolescence as software devs). I've been preoccupied with machine learning and computer vision applications for much of the last couple of years, too busy to keep up with many of the developments in AI codegen. Though I'd certainly spent plenty of time with the hosted chatbot offerings à la ChatGPT.com, Claude Desktop, etc., it's only in the last few months that I've dug deep into AI pair programming and code generation.

In this article, I trace my trajectory so far in working with AI assistants, and I make the case for an electrical-engineer-in-the-loop ("EE-in-the-loop") workflow that can prompt us humans for the hardware product work AI can't do yet.

Codeium

I would be remiss not to mention Codeium, which was probably the truer "first step" I took into the world of AI codegen. Compared with the IntelliSense of yester-decade, Codeium added AI-powered code completion and a chatbot right inside VS Code. That said, it was very much a "black box" to me as far as how it worked, and so not really germane to this article about getting closer to the AI's inner workings myself.

Aider

Whereas chatbots ordinarily lived in a web browser, beyond reach of the source code on your filesystem, we now live in a world where LLMs like ChatGPT and Claude interact directly with your machine and files. In late 2024, I tried out a handful of LLM-first IDEs and found them lacking -- mostly because I'm a stubborn terminal CLI user. Aider worked beautifully from the terminal, however, and by using first ctags and now its repository map, aider is able to map your entire project git repository.

Whereas online chatbots in 2023 were largely limited to whatever snippet or individual file you pasted into the chat prompt, aider (and certainly many other tools in 2024) could introduce new features for your app and refactor code across all of the files in your repository. Further, it did this by intelligently referencing symbols across files to maximize understanding while minimizing LLM token usage (optimizing for cost, which is a nice-to-have, but more importantly fitting more context into the LLM call).

I remain a steadfast fan and user of aider, but it is bounded by the knowledge cut-off of whichever LLM it happens to be running (e.g. September 2024). Soon after, I found RA.Aid, which wraps aider -- still using aider for code editing tasks, but introducing an outer ReAct agent loop that can reason about a feature or a bug and delegate other tool calls, like web research or the eventual code edits, to other agents. For those interested, it uses LangChain under the hood.

What's the ReAct agent pattern? It's a simple but powerful idea: Reason + Act. The AI reasons step by step about a problem, acts by calling tools or APIs (like calculators or code runners), then loops back, refining its answer until it's done. RA.Aid is open source and helps with code refactoring, feature implementation, and web research -- much like Devin, Cursor, or Aider. (hat tip to LLM summarization, tysm)
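
To make that loop concrete, here's a minimal sketch in plain Python. None of this is RA.Aid's actual internals -- the prompt format, tool protocol, and function names are all illustrative -- but it is the basic shape that ReAct-style agents build on:

```python
from typing import Callable, Dict


def run_react_agent(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    goal: str,
    max_steps: int = 10,
) -> str:
    """Minimal Reason + Act loop; the prompt format and tool protocol are illustrative."""
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):
        # Reason: the model replies with either "ACTION: <tool> <input>" or "FINAL: <answer>".
        step = llm(transcript + "Respond with 'ACTION: <tool> <input>' or 'FINAL: <answer>'\n")
        transcript += step + "\n"
        if step.startswith("FINAL:"):
            return step.removeprefix("FINAL:").strip()
        if step.startswith("ACTION:"):
            # Act: run the requested tool (e.g. run_tests, web_search) ...
            tool_name, _, tool_input = step.removeprefix("ACTION:").strip().partition(" ")
            observation = tools.get(tool_name, lambda _: "unknown tool")(tool_input)
            # ... and loop: feed the observation back for the next round of reasoning.
            transcript += f"OBSERVATION: {observation}\n"
    return "No final answer within max_steps"
```

Real agents layer planning, memory, and error recovery on top of this, but the reason-act-observe cycle stays the same.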


REPL to Chatbot to ReAct Agent to Agent Orchestration

If we understand the goal to be a polished software project, the march of progress in the developer tools probably flows from the read-eval-print loop (REPL) shell to this zeitgeist's multi-agent orchestration (Crew.ai, LangGraph, Pydantic AI, to name a few).

| Level | Altitude | Control | You do... | It does... |
| --- | --- | --- | --- | --- |
| REPL | 0–100 ft | Manual | Type code | Evaluate instantly |
| AI Chatbot | 5K–10K ft | Interactive | Ask questions | Respond with helpful text/code |
| ReAct Agent | 20K–30K ft | Semi-auto | Give goal | Reason, act, explain each step |
| Agentic Orchestration | 60K+ ft (autopilot) | Delegated | Provide objective | Plan, schedule, call agents, retry |

As a necessary disclaimer, I've excluded a handful of powerful engineering practices like test-driven development and continuous integration/continuous deployment (CI/CD) from this picture -- the point is to highlight the relationship between the software developer and the code.

REPL. You are in the driver's seat, hopefully driving a hot hatch like a Volkswagen Golf GTI or better. There is no knowledgeable help except possibly the senior engineer looking over your shoulder and mentoring you. To use the airplane metaphor, you're still on terra firma -- except with the REPL you're at least slightly airborne, like the Wright brothers at Kitty Hawk, compared with just trying to compile text files and pray. Maybe I ought to have included tooling like gdb debugging here, at a somewhat higher altitude.

AI Chatbot. All (okay, most) of the knowledge in the world sits on the other side of the screen, ready to answer your questions -- at least if you're using one of the predominant languages and frameworks like Python/Django or JavaScript/React.js. Probably not so much if you're using a Rust crate with only 500 downloads and scant documentation. That said, these LLMs will give you exactly what you ask for -- so if you have strong foundational experience, they are very powerful and you get a ton of leverage. If you're just starting out, they can betray you. Real dangerous territory for unknown unknowns. Experienced developers have been burned by bugs and security flaws before and are familiar enough with these challenges to evaluate AI output, whereas beginners may trust AI output without understanding it.

ReAct Agent. Here is where I am in my AI codegen journey. You give goals, and the agent(s) will do their darndest to satisfy them. The clearer your goals, the more successfully they will be implemented. "I want to save user data for a book sharing web app" is very open-ended, whereas "I want a PostgreSQL database to save these data models: users; authors; authors write multiple books; users bookmark their favorite books; users highlight book passages" gives the agent something concrete to build. Moreover, it really shines if you approach prompting more like a product manager writing Jira tickets.
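
To make the contrast concrete, the more specific prompt pins the schema down to roughly the sketch below. This is my own guess at what the agent ought to produce -- the field names and the many-to-many choice for authors are assumptions, not actual agent output:

```python
from django.conf import settings
from django.db import models


class Author(models.Model):
    name = models.CharField(max_length=200)


class Book(models.Model):
    title = models.CharField(max_length=200)
    # "authors write multiple books" -- modeled as many-to-many here, an assumption
    authors = models.ManyToManyField(Author, related_name="books")


class Bookmark(models.Model):
    # "users bookmark their favorite books"
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    book = models.ForeignKey(Book, on_delete=models.CASCADE)


class Highlight(models.Model):
    # "users highlight book passages"
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    book = models.ForeignKey(Book, on_delete=models.CASCADE)
    passage = models.TextField()
```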

Multi-agent Orchestration. Now you're moving from a single agent to a coordinated team of specialists. Think of it like spinning up a digital software team on demand -- a frontend dev, a backend dev, a DevOps engineer, a QA tester -- each with a clear role, talking to each other, and collaborating toward a shared outcome. In this setup, agents can call tools, delegate subtasks to one another, reason about what's blocking progress, and even revise each other's work. Your job shifts even more toward product and systems thinking. You're less the coder and more the product manager. The effectiveness of the system depends not only on how well each agent performs, but on how clearly they communicate, and most importantly on how well you define the goals, objectives, and constraints. It's fragile at times and requires careful design -- but the ceiling is high.
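
Stripped of any particular framework (CrewAI, LangGraph, and friends each have their own APIs), the shape of an orchestration can be sketched in a few lines of plain Python. The role names and the simple round-robin hand-off below are purely illustrative:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Agent:
    role: str                      # e.g. "frontend dev", "QA tester"
    act: Callable[[str], str]      # wraps an LLM call plus that agent's tools
    notes: List[str] = field(default_factory=list)


def orchestrate(objective: str, agents: List[Agent], rounds: int = 3) -> List[str]:
    """Each agent sees the shared log and appends its contribution in turn."""
    shared_log = [f"Objective: {objective}"]
    for _ in range(rounds):
        for agent in agents:
            context = "\n".join(shared_log)
            output = agent.act(f"You are the {agent.role}.\n{context}")
            agent.notes.append(output)
            shared_log.append(f"[{agent.role}] {output}")
    return shared_log
```

The interesting engineering lives in everything this sketch leaves out: routing, retries, and deciding when one agent gets to revise another's work.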


Below I share a few prompts that have been working well for me as I've vibe-coded my own productivity app over the past couple of weeks using the RA.Aid assistant, which can autonomously write code in a ReAct loop. The essential steps are to maintain an instructions.md file that you continually improve whenever your AI agent missteps, and to follow a quasi-TDD approach where you insist on running tests -- or at minimum your build script or similar -- to give the ReAct agent some kind of feedback loop.

prompts/
├── instructions.md
├── frontend.md
├── backend.md
├── terraform.md
├── 01.md
├── 02.md
├── 03.md
├── 04.md
├── 05.md
├── 06.md
├── 07.md
├── 08.md
├── 09.md
├── 10.md
├── 11.md
├── 12.md
├── 13.md
└── 14.md

"Not Built Here". The AI codegen scene is replete with dev tools and workflows that accomplish much the same thing, but I rather like working as close to the metal as I can with LLMs like Claude and Gemini, just so I understand how everything works.

I maintain an instructions.md file that outlines the overall project structure. While you could include broader context like the app’s goals, I’ve found that unnecessary in practice. To keep the context window small—and more importantly, to avoid including information that’s extraneous to the task at hand—I break the project down into smaller, focused instruction files like backend.md, frontend.md, and terraform.md.

LLMs have a tendency to over-prioritize or misinterpret any instructions included in the prompt, even if they’re unrelated. So whenever possible, I exclude content that isn’t directly relevant. For example, if I’m working on frontend UX for a progress bar, the model doesn’t need to see backend or Terraform-related files.

## Instructions

PRIMARY TASK: The primary task has already been provided. This file only provides further instructions, guidance, and conventions.

IMPORTANT: Consult with the Expert whenever you think it could be beneficial before implementing changes to source code.

IMPORTANT: Do not change, add, or delete any files in the `reference` folder. These files are not material to the project and may only be referenced in individual tasks as reference implementations for use elsewhere.

IMPORTANT: Do NOT implement any code changes NOT EXPLICITLY requested in the "Primary Task" section. Do NOT create any data models unless requested explicitly and with details about their data columns. Do not create any view functions unless specifically requested. Don't try to generate your own model definitions. We have an expert on software application logic and database schema design, and you need to ask a human for help if you aren't sure.

# Frontend design UI UX

IMPORTANT: Whenever working on UI, UX, frontend code, React, vite, you must read the further instructions in `prompts/frontend.md` before any other research or work.

# Backend development

IMPORTANT: Whenever making changes to the backend API, Django, FastAPI, starlette, nginx, you must read the further instructions in `prompts/backend.md` before any other research or work.

# AWS production stack deployment infrastructure

IMPORTANT: Whenever making changes to how this project is deployed on Amazon Web Services (AWS), or any other changes to the terraform stack or terraform/ folder resources, you must read the further instructions in `prompts/terraform.md` before any other research or work.

# Folder structure

The following folder layout is used:

* backend: python/django source code
* frontend: vite/reactjs/tailwind/shadcn source code
* dockerfiles: Dockerfiles for any containers
* bin: any shell scripts for devops
* terraform: AWS stack definitions

# Package managers

* Use `uv` for Python.
* Use `yarn` for Node.js.

# Library and framework versions

* Node.js 22
* Python 3.12
* Django 5.2
* Next.js 15
* PostgreSQL 16
* Vite 6
* React.js 19
* Tailwind CSS v4
* Shadcn
* react-router-dom 7

## More conventions

* Be very concise and keep comments to a minimum. Explain any changes in chat but do not include these explanations in code comments.

< ... snip ... >

## Templates

### PRIMARY TASK TEMPLATE

When creating a new `prompts/XYZ.md` file, use the following as its contents:

```
FIRST, **always** read the contents of prompts/instructions.md BEFORE READING ANY OTHER FILES -- this file describes the application architecture and provides guidance. These instructions and coding conventions are CRITICAL and together with the following comprise the PRIMARY TASK. If instructions.md implies reading further instructions markdown files like `prompts/frontend.md` or `prompts/terraform.md` ALWAYS READ THEM BEFORE READING FURTHER FILES.

ALWAYS perform any "MANDATORY PRE-TASKS" and "MANDATORY POST-TASKS". These are together mandatory with the "PRIMARY TASK".

## STEP 1 (MANDATORY PRE-TASK): CREATE NEXT PROMPT FILE

Before reading or analyzing the primary task file (e.g., `prompts/N.md`), you MUST first create the *next* incremental prompt file.

*   **Rule:** If the current primary task file is `prompts/N.md`, you MUST create the file `prompts/N+1.md`. (Example: If working on `prompts/142.md`, create `prompts/143.md`).
*   **Content:** Use the "PRIMARY TASK TEMPLATE" (defined later in `prompts/instructions.md`) for the initial content of this new `prompts/N+1.md` file.
*   **Proceed:** Only after successfully creating `prompts/N+1.md` should you proceed to Step 2 (analyzing the primary task in `prompts/N.md`).

## STEP 2 (MANDATORY PRE-TASK): COMMIT AND TAG PROMPT FILE

Commit the primary task file (e.g., `prompts/N.md`) to git and tag it.

*   **Rule:** If the current primary task file is `prompts/N.md`, you MUST run `git add prompts/N.md`, then `git commit -m "(adds prompts/N.md) agent instructions: <agent prompt>"`, then `git tag -m "prompts/N.md" prompts/N`. (Example: If working on `prompts/142.md`, perform `git add prompts/142.md; git commit -m "prompts/142.md: ReAct agent prompt"; git tag -m "prompts/142.md" prompts/142`.) Replace "<agent prompt>" with my first prompt to you (before any markdown files are read).
*   **Proceed:** Only after successfully git adding and tagging `prompts/N.md` should you proceed to Step 3 (analyzing the primary task in `prompts/N.md`).

## FINAL STEPS (MANDATORY POST-TASKS): COMMIT COMPLETED TASK

Once all work for the primary task defined in `prompts/N.md` is complete:
1.  Stage all related code changes you made for task N: `git add <path/to/changed/files> ...` (Be specific or use `git add .` if all changes in the working directory are related to task N).
2.  Commit all staged changes with a message of the form "(prompts/N) <changes>" where you summarize the changes in the commit messages. Be concise.

## PRIMARY TASK

< ... Insert your desired task here ... >
```

That's a bit long-winded to read through, but the neat part is the workflow it provides:

  1. Pre-populates the next prompts/N+1.md file from the template
  2. Stages and commits the current prompts/N.md file and tags it prompts/N
  3. Completes the work and commits it with a descriptive commit message written by the LLM

In a perfect world, you could have reproducible prompt files that you could iterate through, building your entire app a 2nd or 3rd time with different LLMs (Claude 3.7, Gemini 2.5 Pro, o4-mini) -- or even just try a specifically challenging prompt against alternate models. In practice, however, I often have to un-commit changes and either polish a few things up myself or re-write the prompt with additional clarifying instructions.

These prompts are all run through the RA.Aid web UI with a simple message: implement prompts/65.md

A few sample Primary Tasks:

# prompts/01.md

1. Create a Django ASGI application with Channels
	1. `backend/common/`: Any common objects like User model
	2. `backend/timers/`: Create empty app folder
	3. `backend/mcp/` Create empty app folder
2. Create an empty docker-compose.yml
3. Add a PostgreSQL database to docker-compose.yml with hard-coded SQL user and pass for input into Django settings.py
4. Do not add Django to docker-compose.yml yet. We only want to run from the host shell using manage.py direct invocation.
# prompts/57.md

- Add a HealthCheckMiddleware to backend/common/middleware.py that intercepts any request to /alb-health-check
- Add the middleware to settings.py so that it runs before other middleware in request/response cycle
- Update terraform/common/applications/ to have ALB use this health-check instead of /
# prompts/60.md
My app's content width is way too small compared to the viewport width for larger resolutions like 1440p, 4K, etc.

Please make my app responsive for more viewports up to 4K.
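
Prompts at this grain are small enough to review at a glance. The middleware asked for in prompts/57.md, for instance, comes down to something like the sketch below -- the path comes from the prompt, but the details are assumed rather than the agent's actual output:

```python
# backend/common/middleware.py
from django.http import HttpResponse


class HealthCheckMiddleware:
    """Answer ALB health checks before any other middleware or URL routing runs."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.path == "/alb-health-check":
            return HttpResponse("ok")
        return self.get_response(request)
```

Per the prompt, it would also be listed first in MIDDLEWARE in settings.py so it runs ahead of everything else in the request/response cycle.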

One-shotting vs product managing

There are very cool examples online of "one-shotting" entire web applications from a single prompt, but I'm a dinosaur who prefers to work much more incrementally. That said, it is very interesting to now spend most of my time writing prompts in natural language and reviewing staged code versus actually writing the code myself.

And when it comes to user interface design -- which I was never very good at ideating and designing but could otherwise take someone else's designs and implement something pixel-perfect -- the UX that AI generates today using frameworks like Tailwind CSS and Shadcn is surprisingly polished.

I do predict that UX will become one of the most critical human hires for the time being since it is now so easy to riff off a minimum viable product in a weekend. What will really set any app apart isn't basic form fields but clever and engaging user experience which will require some very talented humans.

Not to fret, frontend developers -- everyone just becomes a product manager now for a team of AI agents that may be suitable proxies for junior engineers today and senior engineers or better tomorrow. Anyone working in frontend web development has that much more leverage today, although it also stands to reason that what was once a team of 10 frontend individuals can probably be reduced by a factor of ten. And it also follows that creatives in UI/UX design will be able to join in on this; anyone now versed in Adobe Creative Cloud will be able to lean into no-code or agentic AI workflows to build frontends. So people will re-tool and find new problems to solve. I for one don't bemoan the invention of dynamite and the furloughing of pick-axe quarry miners.


Electrical Engineer-in-the-loop

One agentic workflow I plan to pursue in the coming months is what I am calling EE-in-the-loop.

Despite all the advances in AI agents, humans remain essential, especially in hardware. AI isn't good at making nuanced choices. To use a metaphor, it might choose to hang art on your wall with a sledgehammer when a tack hammer is needed. Similar issues arise when choosing microcontrollers, RAM, interfaces, and other components.

Moreover, AI agents obviously can't hold a soldering iron to wire circuits together. Humans will need to be looped in.

All the necessary technology exists to facilitate AI hardware product development: retrieval-augmented generation for datasheets and pin-outs, and code-to-schematic tooling (Atopile.io, CadQuery, and many others). The work comprising hardware development can be expressed as code, which in turn can be shipped to a large language model (LLM) for reasoning and implementation. The job of the hardware product manager is making choices and constraining outputs.
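
As a small taste of "hardware as code", a few lines of CadQuery are enough to describe a mounting plate. The dimensions here are arbitrary; the point is that mechanical intent becomes reviewable, diffable text that an LLM can read and write:

```python
import cadquery as cq

# An 80 x 60 x 2 mm plate with four M3 clearance holes near the corners.
plate = (
    cq.Workplane("XY")
    .box(80, 60, 2)
    .faces(">Z")
    .workplane()
    .rect(70, 50, forConstruction=True)  # construction rectangle locates the holes
    .vertices()
    .hole(3.2)
)

cq.exporters.export(plate, "mounting_plate.step")
```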

The key to creating an agentic workflow will be to empower the AI agent with tool calling to test its generated code (something I have developed at ChipRelay), as well as adding a human to the ReAct loop and/or multi-agent orchestration. I imagine that a prompt like the following, given to an agent in a multi-agent framework, could produce satisfactory instructions back to a human homelab laborer:

# prompts/senior_electrical_engineer.md

After writing a new schematic, we always build the prototype board and flash with the latest firmware.

You're a senior electrical engineer mentoring a college intern who can only wire pins. When a new design is ready, use the ask_the_intern_to_wire_pins tool to wire together any necessary pins. Be very specific with what chips and what pin numbers need to be tied together.

Whenever this work is complete, invoke the test_hardware, flash_firmware, and test_firmware tools. If any of them fails, call the debug_firmware tool.
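
A first pass at the human-in-the-loop tooling behind that prompt could be as unglamorous as a blocking prompt in the terminal. The tool names mirror the prompt above; everything else -- the script path, the return values -- is an assumption about how I might wire it up:

```python
import subprocess


def ask_the_intern_to_wire_pins(instructions: str) -> str:
    """Human-in-the-loop tool: show the agent's wiring instructions and block until done."""
    print("WIRING REQUEST FROM AGENT:\n" + instructions)
    return input("Type 'done' when the pins are wired, or describe any problem: ")


def test_hardware(_: str = "") -> str:
    """Run the (hypothetical) test-jig script and hand its output back to the agent."""
    result = subprocess.run(
        ["./bin/test_hardware.sh"],  # placeholder path -- whatever drives your jig
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr
```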

I do expect that how we approach embedded firmware development will change somewhat, as we will need to embrace a test-driven approach that provides meaningful feedback to AI agents (e.g. developing test-jig software and target microcontroller firmware in tandem).
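
For example, the firmware tests might run on the host against the target over a serial link, so that a failing assertion gives the agent something concrete to react to. The port name and message protocol below are invented for illustration:

```python
import serial  # pyserial


def test_firmware_answers_ping():
    """Hypothetical jig check: the firmware should answer PING with PONG over UART."""
    with serial.Serial("/dev/ttyUSB0", 115200, timeout=2) as port:
        port.write(b"PING\n")
        reply = port.readline().strip()
    assert reply == b"PONG", f"unexpected reply from target: {reply!r}"
```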

My plan is to build as much of my next hardware product as possible through AI, working remotely, and looping myself in only for the human tasks -- like being the intern who wires pins together. So we'll see how this space evolves.


Final Thoughts

The AI space is moving so quickly this year that all of the developer tools we use now, in April 2025, will be made obsolete by year-end (or at least so greatly enhanced that they are no longer substantially the same).

For my part, I have joined the model context protocol (MCP) bandwagon. MCP servers dynamically list remote tools, prompts, and other resources that can be consumed by MCP clients. An MCP client could be an AI agent performing business activities like triaging an email inbox, researching sales prospects, or generating PDF reports, or it could be developer software like Cursor or ra-aid. A panoply of MCP servers can be found at websites like Smithery.ai, github.com...awesome-mcp-servers, and of course Google.
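
To give a sense of what an MCP server looks like, the official python-sdk's FastMCP helper exposes a tool in a handful of lines. The tool itself is a toy, and given how fast the spec is moving, the SDK surface may well have shifted by the time you read this:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")


@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a block of text."""
    return len(text.split())


if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, so a client like ra-aid or Cursor can attach
```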

I contributed custom tool calling to ra-aid in March, which allows the AI assistant to call third-party MCP tools or to write its own custom tools for use in a project. This is really only useful for enhancing the capabilities of the AI assistant, and therefore in practice it means MCP tools for understanding and writing code.

In the last couple of weeks I have also been writing a minimal library for MCP servers called django-mcp that wraps the official Anthropic python-sdk. The basic goal is to add idiomatic decorators and functions specific to Django that make it easier to serve any Django web application's data and services to 3rd-party MCP clients. The modelcontextprotocol python-sdk and the MCP specification itself are rapidly changing, so I also hope to provide an abstraction layer with a stable and consistent interface for Django developers, insulating them from changes in the SDK and specification while still allowing Django applications to serve MCP tools, prompts, and the rest. Contributions to the project are welcome, and the immediate roadmap includes adding support for authentication, Streamable HTTP transport, and helper functions to wrap database objects with per-user object permissions (i.e. authorization concerns).
