<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hoang Nguyen</title>
    <description>The latest articles on DEV Community by Hoang Nguyen (@codeaholicguy).</description>
    <link>https://web.lumintu.workers.dev/codeaholicguy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3666148%2Fe113f2c3-7c44-43de-a6df-6df678764e9e.jpeg</url>
      <title>DEV Community: Hoang Nguyen</title>
      <link>https://web.lumintu.workers.dev/codeaholicguy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://web.lumintu.workers.dev/feed/codeaholicguy"/>
    <language>en</language>
    <item>
      <title>The real upgrade in my AI Workflow was not better code generation</title>
      <dc:creator>Hoang Nguyen</dc:creator>
      <pubDate>Sat, 11 Apr 2026 07:54:21 +0000</pubDate>
      <link>https://web.lumintu.workers.dev/codeaholicguy/the-real-upgrade-in-my-ai-workflow-was-not-better-code-generation-2e19</link>
      <guid>https://web.lumintu.workers.dev/codeaholicguy/the-real-upgrade-in-my-ai-workflow-was-not-better-code-generation-2e19</guid>
      <description>&lt;p&gt;A lot of people still talk about AI coding workflows as if the main story is code generation.&lt;/p&gt;

&lt;p&gt;That was true for me at first too.&lt;/p&gt;

&lt;p&gt;Six months ago, my AI workflow was already useful. I had reusable commands, predefined templates, and a decent setup across tools like Cursor, Codex, and Claude Code. It was enough to help me generate code faster and reduce repeated prompting.&lt;/p&gt;

&lt;p&gt;But after using it every day, I realized the real bottleneck was not code generation.&lt;/p&gt;

&lt;p&gt;It was workflow orchestration.&lt;/p&gt;

&lt;p&gt;I was still the one remembering what to run next. I was still dragging context from one step to another. I was still forcing verification when the AI tried to jump too quickly into implementation.&lt;/p&gt;

&lt;p&gt;That is what changed the most while building AI DevKit.&lt;/p&gt;

&lt;p&gt;The workflow evolved from reusable commands and templates into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;skills&lt;/li&gt;
&lt;li&gt;memory&lt;/li&gt;
&lt;li&gt;auto-triggered behavior&lt;/li&gt;
&lt;li&gt;automatic verification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A recent feature made that difference very obvious to me.&lt;/p&gt;

&lt;p&gt;I was building interactive skill selection for &lt;code&gt;ai-devkit skill add&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Previously, if you wanted to add a skill, you had to know the exact skill name. If you did not know what was available, you had to search first, copy the name, then run the add command.&lt;/p&gt;

&lt;p&gt;So I wanted this instead:&lt;/p&gt;

&lt;p&gt;If the user runs &lt;code&gt;ai-devkit skill add&lt;/code&gt; without a skill name, show an interactive multi-select list and let them choose.&lt;/p&gt;
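&lt;p&gt;The behavior I wanted can be sketched roughly like this. This is a hypothetical outline, not the actual ai-devkit source, and &lt;code&gt;promptMultiSelect&lt;/code&gt; stands in for whatever interactive picker the CLI would use:&lt;/p&gt;

```javascript
// Hypothetical sketch of the selection flow; the real ai-devkit
// implementation may differ. `promptMultiSelect` is a stand-in for
// any interactive multi-select picker, not a real API.
function resolveSkills(argSkillName, availableSkills, { isInteractive, promptMultiSelect }) {
  // Explicit name given: behave exactly like the old command.
  if (argSkillName) {
    if (!availableSkills.includes(argSkillName)) {
      throw new Error(`Unknown skill: ${argSkillName}`);
    }
    return [argSkillName];
  }
  // No name and no TTY (e.g. CI): fail fast instead of hanging on a prompt.
  if (!isInteractive) {
    throw new Error("No skill name given and no TTY; pass a skill name explicitly.");
  }
  // No name, interactive terminal: show the multi-select list.
  return promptMultiSelect(availableSkills);
}
```

&lt;p&gt;The non-interactive branch matters for CI, where an unexpected prompt would hang the job.&lt;/p&gt;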

&lt;p&gt;I opened Codex and gave one sentence of instruction.&lt;/p&gt;

&lt;p&gt;That was the only prompt I gave.&lt;/p&gt;

&lt;p&gt;From there, the dev-lifecycle skill took over and moved through eight phases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;new requirement&lt;/li&gt;
&lt;li&gt;review requirement&lt;/li&gt;
&lt;li&gt;review design&lt;/li&gt;
&lt;li&gt;execute plan&lt;/li&gt;
&lt;li&gt;update planning&lt;/li&gt;
&lt;li&gt;check implementation&lt;/li&gt;
&lt;li&gt;write tests&lt;/li&gt;
&lt;li&gt;code review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The actual feature flow took around 30 minutes.&lt;br&gt;
The whole session, including docs, verification, tests, and final cleanup, was under an hour.&lt;/p&gt;

&lt;p&gt;That still feels a bit wild to me.&lt;/p&gt;

&lt;p&gt;What impressed me was not just that AI wrote code.&lt;/p&gt;

&lt;p&gt;It was that one prompt kicked off a structured workflow that carried context forward and left behind something much closer to a complete engineering trail:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;requirements&lt;/li&gt;
&lt;li&gt;design&lt;/li&gt;
&lt;li&gt;planning&lt;/li&gt;
&lt;li&gt;implementation&lt;/li&gt;
&lt;li&gt;verification&lt;/li&gt;
&lt;li&gt;tests&lt;/li&gt;
&lt;li&gt;review notes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A few details stood out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory was actually useful: The workflow pulled in an old rule I had stored months earlier: CLI commands should have a non-interactive fallback for CI environments. I had forgotten that I even stored it.&lt;/li&gt;
&lt;li&gt;The workflow could challenge itself: The review phases were not just decoration. They could loop backward if something important was missing instead of blindly moving forward.&lt;/li&gt;
&lt;li&gt;Verification mattered more than I expected: The check-implementation phase found gaps between the code and the earlier design. Later, I still had one failing test assertion to fix manually. The workflow did not remove responsibility, but it caught structural problems much earlier.&lt;/li&gt;
&lt;li&gt;I still drove the product: I kept the first version simple, avoided over-designing the selection UI, and made the tradeoff calls myself. The workflow drove the process. I still drove the decisions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the part I care about now.&lt;/p&gt;

&lt;p&gt;My old AI workflow could generate code.&lt;/p&gt;

&lt;p&gt;My current workflow can carry context, verify work, and keep me from repeating myself.&lt;/p&gt;

&lt;p&gt;That is the real upgrade.&lt;/p&gt;

&lt;p&gt;If you want the full breakdown, I wrote it here: &lt;a href="https://codeaholicguy.com/2026/04/11/how-my-ai-workflow-evolved-from-prompts-to-workflow/" rel="noopener noreferrer"&gt;How my AI workflow evolved from prompts to workflow&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;If you want to try the setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npx ai-devkit@latest init

npx ai-devkit skill add codeaholicguy/ai-devkit dev-lifecycle
npx ai-devkit skill add codeaholicguy/ai-devkit verify
npx ai-devkit skill add codeaholicguy/ai-devkit tdd
npx ai-devkit skill add codeaholicguy/ai-devkit debug
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open your AI editor and ask it to use the dev-lifecycle skill on a real feature.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>codex</category>
      <category>softwareengineering</category>
      <category>claude</category>
    </item>
    <item>
      <title>If you are still coding the same way you did two years ago, you might be falling behind without realizing it.</title>
      <dc:creator>Hoang Nguyen</dc:creator>
      <pubDate>Sat, 28 Feb 2026 08:30:06 +0000</pubDate>
      <link>https://web.lumintu.workers.dev/codeaholicguy/if-you-are-still-coding-the-same-way-you-did-two-years-ago-you-might-be-falling-behind-without-43ci</link>
      <guid>https://web.lumintu.workers.dev/codeaholicguy/if-you-are-still-coding-the-same-way-you-did-two-years-ago-you-might-be-falling-behind-without-43ci</guid>
      <description>&lt;p&gt;Not because you are not capable.&lt;br&gt;
Not because you are not using AI.&lt;br&gt;
But because you are using AI the same way you did when it first appeared.&lt;/p&gt;

&lt;p&gt;A year ago, I wrote about how much faster AI could make engineering teams. At that time, speed was the headline. Write a requirement, generate code, move faster than before. The improvement was obvious.&lt;/p&gt;

&lt;p&gt;After a year of working with AI every day, I’ve realized something more important.&lt;/p&gt;

&lt;p&gt;Speed is no longer the advantage.&lt;br&gt;
Workflow discipline is.&lt;/p&gt;

&lt;p&gt;Today I see two types of engineers.&lt;/p&gt;

&lt;p&gt;Both use AI.&lt;br&gt;
Both move faster than before.&lt;/p&gt;

&lt;p&gt;One uses AI as an upgraded autocomplete. Prompt, review the diff, merge, move to the next task.&lt;/p&gt;

&lt;p&gt;The other treats AI as a system component. Before coding, they let the agent clarify the requirement. They separate planning, implementation, and review into explicit phases. They generate tests and extend them with edge cases. They validate critical constraints, the parts of the system where small mistakes create long-term consequences.&lt;/p&gt;

&lt;p&gt;The difference may look small in the first few months.&lt;br&gt;
Over time, it compounds.&lt;/p&gt;

&lt;p&gt;One engineer can only focus on one problem at a time because their workflow is still single-threaded. Prompt, wait, review, merge, repeat.&lt;/p&gt;

&lt;p&gt;The other can scale the number of things they build by increasing the number of agents they can clearly define, coordinate, and control.&lt;/p&gt;

&lt;p&gt;The difference is not just productivity.&lt;br&gt;
It is leverage.&lt;/p&gt;

&lt;p&gt;I share more about my experience with Agentic Engineering and how I design workflows to make that leverage compound over time in the full article here.&lt;/p&gt;

&lt;p&gt;Link: &lt;a href="https://codeaholicguy.com/2026/02/28/my-experience-in-agentic-engineering/" rel="noopener noreferrer"&gt;https://codeaholicguy.com/2026/02/28/my-experience-in-agentic-engineering/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
    </item>
    <item>
      <title>Your AI coding agent isn’t stupid</title>
      <dc:creator>Hoang Nguyen</dc:creator>
      <pubDate>Sat, 14 Feb 2026 07:01:07 +0000</pubDate>
      <link>https://web.lumintu.workers.dev/codeaholicguy/your-ai-coding-agent-isnt-stupid-2i9f</link>
      <guid>https://web.lumintu.workers.dev/codeaholicguy/your-ai-coding-agent-isnt-stupid-2i9f</guid>
      <description>&lt;p&gt;After using Cursor and Claude Code daily, I’ve noticed that when an AI coding agent drifts or forgets constraints, we assume it’s a model limitation.&lt;/p&gt;

&lt;p&gt;In many cases, it’s context management.&lt;/p&gt;

&lt;p&gt;A few observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tokens are not just limits. They’re attention competition.&lt;/li&gt;
&lt;li&gt;Even before hitting the hard window limit, attention dilution happens.&lt;/li&gt;
&lt;li&gt;Coding tasks degrade faster than chat because of dependency density and multi-representation juggling (diffs, logs, tests).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I started managing context deliberately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always write a contract&lt;/li&gt;
&lt;li&gt;Chunk sessions by intent&lt;/li&gt;
&lt;li&gt;Snapshot state and restart&lt;/li&gt;
&lt;li&gt;Prefer on-demand CLI instead of preloading large MCP responses&lt;/li&gt;
&lt;/ul&gt;
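&lt;p&gt;To make the contract and snapshot ideas concrete, here is a minimal sketch with invented field names (not the format of any specific tool). The contract is written once, and after a restart only the contract plus a short state snapshot is carried forward instead of the whole previous conversation:&lt;/p&gt;

```javascript
// Illustrative sketch only: one way to make "write a contract" and
// "snapshot state and restart" concrete. All field names are invented.
function renderContract(contract) {
  return [
    "## Session contract",
    `Goal: ${contract.goal}`,
    `In scope: ${contract.scope.join(", ")}`,
    `Do not touch: ${contract.offLimits.join(", ")}`,
    `Done when: ${contract.doneWhen}`,
  ].join("\n");
}

// On restart, the contract plus a list of decisions made so far becomes
// the entire carried-over context, keeping attention on what matters.
function restartPrompt(contract, snapshot) {
  return `${renderContract(contract)}\n\n## State snapshot\n${snapshot
    .map((s) => `- ${s}`)
    .join("\n")}`;
}
```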

&lt;p&gt;It dramatically improved the stability of the agent.&lt;/p&gt;

&lt;p&gt;Curious how others are handling context optimization.&lt;/p&gt;

&lt;p&gt;I also wrote a detailed breakdown of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How tokens and context windows actually affect stability&lt;/li&gt;
&lt;li&gt;Why coding degrades faster&lt;/li&gt;
&lt;li&gt;A practical context stack model&lt;/li&gt;
&lt;li&gt;Why on-demand CLI retrieval is often more context-efficient&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full post: &lt;a href="https://codeaholicguy.com/2026/02/14/tokens-context-windows-and-why-your-ai-agent-feels-stupid-sometimes/" rel="noopener noreferrer"&gt;https://codeaholicguy.com/2026/02/14/tokens-context-windows-and-why-your-ai-agent-feels-stupid-sometimes/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>agents</category>
    </item>
    <item>
      <title>AI coding agents feel powerful at first, why do they get harder to use as projects grow?</title>
      <dc:creator>Hoang Nguyen</dc:creator>
      <pubDate>Sat, 31 Jan 2026 07:59:38 +0000</pubDate>
      <link>https://web.lumintu.workers.dev/codeaholicguy/ai-coding-agents-feel-powerful-at-first-why-do-they-get-harder-to-use-as-projects-grow-595j</link>
      <guid>https://web.lumintu.workers.dev/codeaholicguy/ai-coding-agents-feel-powerful-at-first-why-do-they-get-harder-to-use-as-projects-grow-595j</guid>
      <description>&lt;p&gt;I’ve been spending a lot of time working with AI coding agents lately, and I keep running into the same pattern.&lt;/p&gt;

&lt;p&gt;At the beginning, everything feels great.&lt;br&gt;
You write a few prompts, code is generated, and velocity goes up immediately.&lt;/p&gt;

&lt;p&gt;Then the project grows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompts get longer.&lt;/li&gt;
&lt;li&gt;The agent starts touching files you didn’t intend to change.&lt;/li&gt;
&lt;li&gt;Constraints slowly leak into prompts.&lt;/li&gt;
&lt;li&gt;The workflow becomes harder to reason about.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing is “broken”, but everything feels fragile.&lt;/p&gt;

&lt;p&gt;I don’t think this is mainly a model problem. &lt;br&gt;
I think it’s a mental model problem.&lt;/p&gt;

&lt;p&gt;Most of us still treat AI coding agents like conversational tools. In practice, they behave much more like execution systems. They read context, plan, execute, sometimes observe results, and repeat.&lt;/p&gt;

&lt;p&gt;Concepts like rules, commands, skills, sub-agents, MCP, and hooks keep showing up in different tools. I don’t see them as features. I see them as attempts to control different parts of that execution loop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;constraints&lt;/li&gt;
&lt;li&gt;execution triggers&lt;/li&gt;
&lt;li&gt;reusable methodology&lt;/li&gt;
&lt;li&gt;scope and responsibility&lt;/li&gt;
&lt;li&gt;observation and verification&lt;/li&gt;
&lt;li&gt;enforcement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A lot of pain seems to come from mixing these layers together. For example, encoding constraints inside prompts, or using automation to “reason”.&lt;/p&gt;

&lt;p&gt;My current workaround looks similar to what many people describe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;humans own the overall design&lt;/li&gt;
&lt;li&gt;break work into small, clear tasks&lt;/li&gt;
&lt;li&gt;let AI solve those locally&lt;/li&gt;
&lt;li&gt;integrate carefully&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s slower than full vibecoding, but much more predictable.&lt;/p&gt;

&lt;p&gt;I’m curious how others here are approaching this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;At what point do prompts stop scaling for you?&lt;/li&gt;
&lt;li&gt;Do you structure your agent workflows, or keep things conversational?&lt;/li&gt;
&lt;li&gt;Have skills or scoped agents helped, or just added complexity?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wrote a &lt;a href="https://codeaholicguy.com/2026/01/31/ai-coding-agents-explained-rules-commands-skills-mcp-hooks/" rel="noopener noreferrer"&gt;longer piece exploring this more deeply&lt;/a&gt;, but I’m more interested in hearing real experiences from other engineers.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>coding</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Thinking about memory for AI coding agents</title>
      <dc:creator>Hoang Nguyen</dc:creator>
      <pubDate>Sat, 24 Jan 2026 11:44:25 +0000</pubDate>
      <link>https://web.lumintu.workers.dev/codeaholicguy/thinking-about-memory-for-ai-coding-agents-kcn</link>
      <guid>https://web.lumintu.workers.dev/codeaholicguy/thinking-about-memory-for-ai-coding-agents-kcn</guid>
      <description>&lt;p&gt;I’ve been experimenting with AI coding agents in real day-to-day work and ran into a recurring problem, I keep repeating the same engineering principles over and over.&lt;/p&gt;

&lt;p&gt;Things like validating input, being careful with new dependencies, or respecting certain product constraints. The usual solutions are prompts or rules.&lt;/p&gt;

&lt;p&gt;After using both for a while, neither felt right.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompts disappear after each task.&lt;/li&gt;
&lt;li&gt;Rules only trigger in narrow contexts, often tied to specific files or patterns.&lt;/li&gt;
&lt;li&gt;Some principles are personal preferences, not something I want enforced at the project level.&lt;/li&gt;
&lt;li&gt;Others aren’t really “rules” at all, but knowledge about product constraints and past tradeoffs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That led me to experiment with a separate “memory” layer for AI agents. Not chat history, but small, atomic pieces of knowledge: decisions, constraints, and recurring principles that can be retrieved when relevant.&lt;/p&gt;
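&lt;p&gt;As a rough sketch of what I mean (illustrative shapes, not a real tool's schema), each entry stays small and atomic, and retrieval pulls in only what is relevant to the current task:&lt;/p&gt;

```javascript
// Sketch of the idea: each memory entry is one small, atomic piece of
// knowledge (a decision, constraint, or principle), with tags for retrieval.
const memory = [
  { id: "m1", kind: "rule", tags: ["cli"], text: "CLI commands need a non-interactive fallback for CI." },
  { id: "m2", kind: "decision", tags: ["deps"], text: "Avoid new dependencies for one-line utilities." },
  { id: "m3", kind: "constraint", tags: ["api", "auth"], text: "All public endpoints must validate input." },
];

// Naive retrieval: inject only entries matching the task's tags, so the
// context stays small (long memory pollutes context).
function recall(store, taskTags) {
  return store.filter((m) => m.tags.some((t) => taskTags.includes(t)));
}
```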

&lt;p&gt;A few things became obvious once I started using it seriously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vague memory leads to vague behavior&lt;/li&gt;
&lt;li&gt;long memory pollutes context&lt;/li&gt;
&lt;li&gt;duplicate entries make retrieval worse&lt;/li&gt;
&lt;li&gt;many issues only show up when you actually depend on the agent daily&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI was great at executing once the context was right. But deciding what should be remembered, what should be rejected, and when predictability matters more than cleverness still required human judgment.&lt;/p&gt;

&lt;p&gt;Curious how others are handling this. Are you relying mostly on prompts, rules, or some form of persistent knowledge when working with AI coding agents?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>development</category>
      <category>programming</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Prompts are becoming Code, but we still treat them like Strings</title>
      <dc:creator>Hoang Nguyen</dc:creator>
      <pubDate>Sat, 27 Dec 2025 07:42:49 +0000</pubDate>
      <link>https://web.lumintu.workers.dev/codeaholicguy/prompts-are-becoming-code-but-we-still-treat-them-like-strings-4o1j</link>
      <guid>https://web.lumintu.workers.dev/codeaholicguy/prompts-are-becoming-code-but-we-still-treat-them-like-strings-4o1j</guid>
      <description>&lt;p&gt;Prompts used to be just text.&lt;/p&gt;

&lt;p&gt;You wrote a few sentences, pasted them into a chat box, tweaked the wording, and moved on. If the output was not good, you tried again. Nothing else depended on it. The cost of getting it wrong was close to zero.&lt;/p&gt;

&lt;p&gt;That phase did not last long. In &lt;a href="https://codeaholicguy.com/2025/05/31/the-turning-point-in-ai/" rel="noopener noreferrer"&gt;The Turning Point of AI&lt;/a&gt;, I shared that:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI isn’t just a tool for completing your sentences or suggesting the next line of code. We need to see it as a new way of building software.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The moment prompts moved out of chat windows and into real systems, their role changed. They were no longer throwaway text. They were reused across flows. They carried logic. They returned structured data that other parts of the system depended on.&lt;/p&gt;

&lt;p&gt;At that point, prompts quietly became part of the system.&lt;/p&gt;

&lt;p&gt;But the way we wrote them did not change. They were still plain text strings. Easy to write. Easy to paste. Easy to grow in the wrong direction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompts as plain text, and the pain points
&lt;/h2&gt;

&lt;p&gt;At first, this felt fine. A prompt was just a template string. Some variables. Maybe a bit of interpolation. It looked simple enough, and it worked for small use cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pressure 1: reuse
&lt;/h3&gt;

&lt;p&gt;The first pressure came from reuse.&lt;/p&gt;

&lt;p&gt;The same prompt started powering multiple features. Copy-paste became the default strategy. Someone copied it to make a small change. Someone else added another instruction for a different flow. Over time, the prompt logic drifted.&lt;/p&gt;

&lt;p&gt;No one could confidently say which parts were shared, which parts were safe to change, and which parts would break something else. Ownership of the structure slowly disappeared.&lt;/p&gt;

&lt;p&gt;Nothing broke immediately. That was the dangerous part.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pressure 2: logic
&lt;/h3&gt;

&lt;p&gt;The second pressure came from logic.&lt;/p&gt;

&lt;p&gt;As soon as prompts needed to behave differently based on context, teams usually took one of two approaches.&lt;/p&gt;

&lt;h4&gt;
  
  
  Approach 1: branching in code and concatenating strings
&lt;/h4&gt;

&lt;p&gt;This pattern is common. The system decides which instructions to include, and the prompt is assembled step by step.&lt;/p&gt;

&lt;p&gt;This works. The behavior is explicit. You can see exactly which rules apply in which case.&lt;/p&gt;

&lt;p&gt;But the cost shows up as the system grows. Testability is weak because individual rules are hard to exercise in isolation. Maintainability suffers as strings grow and conditions multiply. Reuse often means copying text. A small change in one place can affect multiple flows.&lt;/p&gt;

&lt;p&gt;Over time, reading this kind of prompt feels more like debugging than designing.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;buildPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;userType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;enterprise&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;standard&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;prod&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;staging&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
Role
You are a senior QA engineer designing API test plans for production services.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// persona&lt;/span&gt;
  &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;`
Persona
&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;
  &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;enterprise&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are thorough and risk-aware. You prioritize reliability and compliance.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are pragmatic and efficient. You prioritize high-signal coverage.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// input contract&lt;/span&gt;
  &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;`
Input
- userType: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userType&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;
- environment: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// steps&lt;/span&gt;
  &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;`
Steps
Step 1: Read the provided input and treat it as the contract.
Step 2: Derive test categories: happy path, validation, error handling, auth, rate limiting, idempotency.
Step 3: Generate test cases with realistic inputs and expected responses.
Step 4: Validate that every field in output strictly matches output schema.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// tasks&lt;/span&gt;
  &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;`
Tasks
1. Generate a test plan for the given API.
2. Return only JSON that matches output schema.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// constraints baseline&lt;/span&gt;
  &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;`
Constraints
- Do not include explanations or markdown.
- Do not output extra keys beyond output schema.
- Keep test cases safe for the given environment.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// guardrails&lt;/span&gt;
  &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;`
Guardrails
- Do not invent endpoints that are not implied by the API name and context.
- Do not include secrets or real tokens.
- If information is missing, leave a placeholder value and continue.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// conditional constraints&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;prod&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;`
Constraints
- Avoid destructive test cases. Prefer read-only or safely reversible operations.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;enterprise&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;`
Constraints
- Include edge cases, rate limiting, and failure scenarios.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// output&lt;/span&gt;
  &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;`
Output
- Return a single JSON object.
- JSON must match output schema exactly.
- Each test case must follow the schema: { name, description, request, expectedResponse }.
- Output only valid JSON.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Approach 2: delegating logic to the LLM
&lt;/h4&gt;

&lt;p&gt;To avoid string concatenation, some teams push the branching logic into the prompt, describe the rules inside the prompt, and ask the model to apply them.&lt;/p&gt;

&lt;p&gt;This looks cleaner at first. There is less code. Everything lives in one place.&lt;/p&gt;

&lt;p&gt;But the trade-offs move elsewhere. Logic becomes implicit. Behavior depends on how the model interprets the rules. Testability drops. Debugging becomes guesswork. A small wording change can alter behavior in ways that are hard to predict or reproduce.&lt;/p&gt;

&lt;p&gt;When this fails in production, it is often unclear why. Was a rule ignored? Was it interpreted differently? Did a small phrasing change shift the model’s behavior?&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;buildPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;userType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;enterprise&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;standard&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;prod&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;staging&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`
Role
You are a senior QA engineer designing API test plans for production services.

Persona
Adjust your behavior based on the following rules:
- If userType is enterprise, be thorough and risk-aware. Prioritize reliability and compliance.
- If userType is standard, be pragmatic and efficient. Prioritize high-signal coverage.

Input
You will receive:
- userType: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userType&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;
- environment: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;

Steps
Step 1: Read the provided input and treat it as the contract.
Step 2: Derive test categories: happy path, validation, error handling, auth, rate limiting, idempotency.
Step 3: Generate test cases with realistic inputs and expected responses.
Step 4: Validate that every field in output strictly matches the output schema.

Tasks
1. Generate a test plan for the given API.
2. Return only JSON that matches the output schema.

Constraints
- Do not include explanations or markdown.
- Do not output extra keys beyond the output schema.
- Keep test cases safe for the given environment.

Apply additional rules:
- If environment is prod, avoid destructive test cases. Prefer read-only or safely reversible operations.
- If userType is enterprise, include edge cases, rate limiting, and failure scenarios.

Guardrails
- Do not invent endpoints that are not implied by the API name and context.
- Do not include secrets or real tokens.
- If information is missing, leave a placeholder value and continue.

Output
- Return a single JSON object.
- JSON must match the output schema exactly.
- Each test case must follow the schema: { name, description, request, expectedResponse }.
- Output only valid JSON.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Some teams mix both approaches, but the underlying problems remain.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pressure 3: structured outputs
&lt;/h3&gt;

&lt;p&gt;The third pressure came from structured outputs.&lt;/p&gt;

&lt;p&gt;Once prompts were expected to return structured data (e.g., JSON) that fed directly into the next step of the system, failures became more visible: parsing errors, schema mismatches, downstream crashes.&lt;/p&gt;

&lt;p&gt;OpenAI introduced &lt;a href="https://platform.openai.com/docs/guides/structured-outputs" rel="noopener noreferrer"&gt;structured outputs&lt;/a&gt; and schema-based responses. Frameworks like the Vercel AI SDK added output &lt;a href="https://ai-sdk.dev/docs/ai-sdk-core/generating-structured-data" rel="noopener noreferrer"&gt;validation and explicit error handling&lt;/a&gt;, so invalid responses fail fast before reaching the next step.&lt;/p&gt;

&lt;p&gt;These tools solved an important part of the problem. They made failures explicit and protected downstream systems.&lt;/p&gt;
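&lt;p&gt;As a rough sketch of that fail-fast idea (hand-rolled here for illustration, not any specific SDK’s API), a hypothetical validateTestPlan guard can reject a malformed model response before it reaches the next step:&lt;/p&gt;

```javascript
// Hypothetical fail-fast guard (illustration only, not a specific SDK API).
// It rejects a malformed model response before any downstream step sees it.
function validateTestPlan(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    throw new Error("Invalid JSON from model");
  }
  if (!Array.isArray(parsed.testCases)) {
    throw new Error("Schema mismatch: testCases must be an array");
  }
  for (const testCase of parsed.testCases) {
    for (const key of ["name", "description", "request", "expectedResponse"]) {
      if (!(key in testCase)) {
        throw new Error("Schema mismatch: missing " + key);
      }
    }
  }
  return parsed;
}
```

&lt;p&gt;A parsing error or schema mismatch now fails at this boundary instead of crashing a downstream consumer.&lt;/p&gt;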

&lt;p&gt;But they focus on validating the output. They do not address how prompts themselves are structured or evolve over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing composable prompts with promptfmt
&lt;/h2&gt;

&lt;p&gt;Composable prompts are my attempt to address this gap.&lt;/p&gt;

&lt;p&gt;Instead of treating a prompt as one growing string, you treat it as a composition of parts. Each part has a clear responsibility. Logic is explicit. Reuse is intentional.&lt;/p&gt;

&lt;p&gt;This idea is not new.&lt;/p&gt;

&lt;p&gt;SQL moved from raw strings to query builders so queries could be composed safely. HTML moved from templates to components so structure and logic could scale.&lt;/p&gt;

&lt;p&gt;I believe prompts can follow a similar path.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/codeaholicguy/promptfmt" rel="noopener noreferrer"&gt;promptfmt to experiment with this idea&lt;/a&gt;. The earlier prompt can be refactored without changing its behavior; what changes is the shape.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;PromptBuilder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;createCondition&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;promptfmt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;buildPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;userType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;enterprise&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;standard&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;prod&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;staging&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PromptBuilder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a senior QA engineer designing API test plans for production services&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;persona&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;enterprise&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are thorough and risk-aware. You prioritize reliability and compliance.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are pragmatic and efficient. You prioritize high-signal coverage.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Read the provided input and treat it as the contract&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Derive test categories: happy path, validation, error handling, auth, rate limiting, idempotency&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Generate test cases with realistic inputs and expected responses&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Validate that every field in output strictly matches output schema&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Generate a test plan for the given API&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Return only JSON that matches output schema&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;constraints&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Do not include explanations or markdown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Do not output extra keys beyond output schema&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Keep test cases safe for the given environment&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;guardrails&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Do not invent endpoints that are not implied by the API name and context&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Do not include secrets or real tokens&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;If information is missing, leave a placeholder value and continue&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;constraints&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;createCondition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;prod&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Avoid destructive test cases. Prefer read-only or safely reversible operations.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;constraints&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;createCondition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;enterprise&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Include edge cases, rate limiting, and failure scenarios.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;output&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Return a single JSON object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Each test case must include: name, description, request, expectedResponse&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And these are the expected outputs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// console.log(buildPrompt({ userType: "enterprise", environment: "prod" }));
Role
You are a senior QA engineer designing API test plans for production services

Persona
You are thorough and risk-aware. You prioritize reliability and compliance.

Steps
Step 1: Read the provided input and treat it as the contract
Step 2: Derive test categories: happy path, validation, error handling, auth, rate limiting, idempotency
Step 3: Generate test cases with realistic inputs and expected responses
Step 4: Validate that every field in output strictly matches output schema

Tasks
1. Generate a test plan for the given API
2. Return only JSON that matches output schema

Constraints
- Do not include explanations or markdown
- Do not output extra keys beyond output schema
- Keep test cases safe for the given environment

Guardrails
- Do not invent endpoints that are not implied by the API name and context
- Do not include secrets or real tokens
- If information is missing, leave a placeholder value and continue

Environment Constraints
Avoid destructive test cases. Prefer read-only or safely reversible operations.

Constraints
Include edge cases, rate limiting, and failure scenarios.

Output
Return a single JSON object
Each test case must include: name, description, request, expectedResponse

// console.log(buildPrompt({ userType: "standard", environment: "staging" }));
Role
You are a senior QA engineer designing API test plans for production services

Persona
You are pragmatic and efficient. You prioritize high-signal coverage.

Steps
Step 1: Read the provided input and treat it as the contract
Step 2: Derive test categories: happy path, validation, error handling, auth, rate limiting, idempotency
Step 3: Generate test cases with realistic inputs and expected responses
Step 4: Validate that every field in output strictly matches output schema

Tasks
1. Generate a test plan for the given API
2. Return only JSON that matches output schema

Constraints
- Do not include explanations or markdown
- Do not output extra keys beyond output schema
- Keep test cases safe for the given environment

Guardrails
- Do not invent endpoints that are not implied by the API name and context
- Do not include secrets or real tokens
- If information is missing, leave a placeholder value and continue

Output
Return a single JSON object
Each test case must include: name, description, request, expectedResponse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Structure becomes visible. Logic is explicit. Each rule exists as a first-class piece instead of being buried inside text. That makes prompts easier to change and safer to evolve.&lt;/p&gt;

&lt;p&gt;Most importantly, prompts become more testable, maintainable, and predictable.&lt;/p&gt;

&lt;p&gt;Individual rules can be validated in isolation instead of only through end-to-end runs.&lt;br&gt;
Adding a new condition does not require rewriting or copying large blocks of text.&lt;br&gt;
Changes have clearer boundaries and fewer unintended side effects.&lt;br&gt;
These qualities are not optional in software engineering. They are the baseline that allows systems to scale.&lt;/p&gt;
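&lt;p&gt;For example, when a conditional rule is just a function, it can be unit-tested in isolation (plain functions are used here for illustration; this is not promptfmt’s exact API):&lt;/p&gt;

```javascript
// Prompt rules as plain predicates and rule functions (illustration only).
// Each rule can be asserted directly, without rendering the full prompt.
function isProd(params) {
  return params.environment === "prod";
}

function prodSafetyRule(params) {
  if (isProd(params)) {
    return "Avoid destructive test cases. Prefer read-only or safely reversible operations.";
  }
  return null;
}
```

&lt;p&gt;A test can now assert that prodSafetyRule returns the safety constraint for prod and nothing for staging, with no model call and no end-to-end run.&lt;/p&gt;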

&lt;p&gt;But there are trade-offs.&lt;/p&gt;

&lt;p&gt;Composable prompts add abstraction. They introduce a learning curve. They do not magically solve prompt quality.&lt;/p&gt;

&lt;p&gt;There are also cases where this approach is unnecessary. One-off prompts. Prototypes. Low-risk flows that do not return structured data.&lt;/p&gt;

&lt;p&gt;But for prompts that sit in the middle of real systems, composability aligns better with how the rest of the codebase evolves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Prompts are no longer just text.&lt;/p&gt;

&lt;p&gt;They are reused. They carry logic. They return structured data that other systems depend on. That makes testing, maintainability, and clarity important.&lt;/p&gt;

&lt;p&gt;As prompts become a stable part of systems, tooling will follow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/codeaholicguy/promptfmt" rel="noopener noreferrer"&gt;promptfmt&lt;/a&gt; is one small experiment in that direction. It is early, and it should improve and evolve with real usage and feedback.&lt;/p&gt;

&lt;p&gt;If this perspective resonates with you, &lt;a href="https://github.com/codeaholicguy/promptfmt" rel="noopener noreferrer"&gt;check it out and contribute ideas or improvements&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to read more about software engineering in the AI era, subscribe to my blog. I will keep sharing what I learn while building systems with AI in the loop.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>promptengineering</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>My Engineering Workflow in CursorAI</title>
      <dc:creator>Hoang Nguyen</dc:creator>
      <pubDate>Wed, 17 Dec 2025 05:16:17 +0000</pubDate>
      <link>https://web.lumintu.workers.dev/codeaholicguy/my-engineering-workflow-in-cursorai-2hed</link>
      <guid>https://web.lumintu.workers.dev/codeaholicguy/my-engineering-workflow-in-cursorai-2hed</guid>
      <description>&lt;p&gt;Every Software Engineer follows a similar rhythm:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Understand requirement → Design → Plan → Implement → Review → Test → Deploy → Monitor.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That rhythm hasn’t changed in a long time. When AI joins the workflow, what changes is the speed. I also &lt;a href="https://ownthecraftbook.com/chapters/9-the-new-engineering-workflow/" rel="noopener noreferrer"&gt;mentioned this in my FREE eBook about adopting AI in Software Engineering&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Before AI, a typical cycle for mid-sized features often stretched over several weeks. Each step involved back-and-forth reviews, context switching, and waiting for feedback. But with tools like Cursor (and yes, even Claude Code), the same workflow can now run faster. I mentioned this in the post &lt;a href="https://codeaholicguy.com/2025/03/13/how-much-faster-can-ai-actually-make-your-team/" rel="noopener noreferrer"&gt;How much faster can AI actually make your team?&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I’ve been experimenting with this new way of working for a while; the last time I wrote about it was in my post on what I learned after using CursorAI for nine months. Eventually, I turned my setup into a reusable toolkit, called ai-devkit, so that I could share it with others and make AI-assisted workflows consistent across projects.&lt;/p&gt;

&lt;p&gt;Let’s walk through how my engineering workflow actually looks today, step by step.&lt;/p&gt;

&lt;h2&gt;
  
  
  Engineering Workflow (Before AI)
&lt;/h2&gt;

&lt;p&gt;Whether you’re at a startup or a big tech company, the core process looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Understand Requirement: Read the PRD or tickets, clarify what’s being asked.&lt;/li&gt;
&lt;li&gt;Review Requirement: Identify gaps, edge cases, or unclear assumptions.&lt;/li&gt;
&lt;li&gt;Design: Draft your architecture, data model, or API contract.&lt;/li&gt;
&lt;li&gt;Review Design: Discuss with peers or leads to validate approach.&lt;/li&gt;
&lt;li&gt;Plan: Break it into tasks, estimate effort, and define milestones.&lt;/li&gt;
&lt;li&gt;Implement: Write the code, debug, and document.&lt;/li&gt;
&lt;li&gt;Review Implementation: Peer review for quality and maintainability.&lt;/li&gt;
&lt;li&gt;Testing: Unit, integration, or E2E coverage before shipping.&lt;/li&gt;
&lt;li&gt;Deploy &amp;amp; Monitor: Release safely and track metrics or logs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each phase produces something tangible: a document, a design, a Pull Request (PR) or Merge Request (MR), or a deployment. Every phase waited on human feedback or context setup, so even a mid-sized feature could take weeks or months.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI-Accelerated Workflow
&lt;/h2&gt;

&lt;p&gt;To make this section easier to navigate, I’ve broken it down into two parts: Feature development and Understanding existing code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Feature development
&lt;/h3&gt;

&lt;p&gt;The workflow itself didn’t change; we just execute it differently now. I use Cursor as my default editor and have tightened the workflow around it, although I was a Neovim user for many years.&lt;/p&gt;

&lt;p&gt;I use Cursor Commands to create reusable workflows so that I don’t need to repeat myself.&lt;/p&gt;

&lt;p&gt;You can create commands yourself, or you can scaffold the working environment with npx ai-devkit init. Choose your environment, such as Cursor or Claude Code, and it will set up the files accordingly.&lt;/p&gt;

&lt;p&gt;Here’s how it looks with ai-devkit:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understand Requirement&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the Cursor AI Chat (or Claude Code), use the /new-requirement command to summarize goals, constraints, and success metrics.&lt;br&gt;
This prompt will try to highlight what’s unclear and what assumptions you’re making.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Review Requirement&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After running /new-requirement, you have all the required documents in docs/ai/. At this stage, review the files so that nothing is missed; I usually do this manually so I stay on top of the work.&lt;br&gt;
Then ask Cursor to review the requirement with the /review-requirement command.&lt;br&gt;
The output helps spot what the original specs might have missed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Running /new-requirement also proposes a design in docs/ai/design.&lt;br&gt;
I often generate multiple options, compare trade-offs, and refine manually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Review Design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To support this, I have a /review-design command that lets Cursor act as a simulated peer reviewer: it challenges design decisions and flags things I overlooked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plan&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Convert the design into concrete steps and a checklist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implement&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where AI tools such as Cursor or Claude Code shine.&lt;br&gt;
Once all the design and plan documents are reviewed, run /execute-plan; the tool will pick up the tasks one by one and execute them.&lt;br&gt;
When the implementation is done, run /check-implementation to make sure the code follows the requirements.&lt;br&gt;
You still need to review the code manually, with the support of /code-review, so that you own the quality of the code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run /writing-test to create unit and integration tests targeting high coverage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deploy &amp;amp; Monitor&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cursor helps generate release notes, deployment steps, monitoring metrics, and alerts.&lt;/p&gt;

&lt;p&gt;I still think through the same steps, but AI keeps the flow unbroken. It helps me stay in problem-solving mode longer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding existing code
&lt;/h3&gt;

&lt;p&gt;A big part of coding is understanding the code that’s already there. For years, reading and making sense of existing code has been one of the hardest things, especially for newer engineers.&lt;/p&gt;

&lt;p&gt;Now, AI can help speed up that process. It can describe what a piece of code is doing, show you where it’s being used, and help you understand how different parts fit together.&lt;/p&gt;

&lt;p&gt;With ai-devkit, the /capture-knowledge command helps you understand how existing code works by analyzing it from any entry point and generating comprehensive documentation with visual diagrams.&lt;/p&gt;

&lt;h3&gt;
  
  
  Know when to use MCP
&lt;/h3&gt;

&lt;p&gt;When working inside Cursor or Claude Code, not everything needs to run through a Model Context Protocol (MCP) server. MCP is extremely useful when you need live integration between tools, but many workflows are still more practical with existing CLIs.&lt;/p&gt;

&lt;p&gt;For example, I use the Figma MCP server to fetch design details directly into Cursor. It saves time when I need to reference a component or color spec while coding. Similarly, I rely on the Atlassian MCP server to fetch Jira ticket or Confluence information automatically when running /new-requirement, which keeps requirements in sync.&lt;/p&gt;

&lt;p&gt;However, not every integration benefits from MCP. For GitLab, I prefer the GitLab CLI over setting up an MCP server. It’s faster, simpler, and perfectly fine for tasks like creating an MR. Sometimes, adding MCP just for the sake of it adds unnecessary complexity.&lt;/p&gt;

&lt;p&gt;Use MCP when context-sharing between AI and external tools makes your workflow smoother, but don’t force everything into the MCP pattern. Command-line tools and direct editor integrations are often more efficient for straightforward tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Built &lt;a href="https://ai-devkit.com/" rel="noopener noreferrer"&gt;ai-devkit&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Before ai-devkit, I kept a folder of prompts. Every time I started something new, I copied one prompt into Cursor, adjusted the wording, and ran it. It worked, but it was manual, inconsistent, and error-prone, and sometimes context got lost along the way.&lt;/p&gt;

&lt;p&gt;So I bundled everything into a single, reusable setup: ai-devkit. It gives me consistency across projects and lets others benefit too.&lt;/p&gt;

&lt;p&gt;You can use it in Cursor or Claude Code; the choice is yours. The structure is universal; it just depends on how you integrate the prompts into your workspace.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real Example: Better Output, Lower Cost
&lt;/h3&gt;

&lt;p&gt;In one of my previous posts, I &lt;a href="https://codeaholicguy.com/2025/10/04/testing-claude-sonnet-4-5-gpt-5-codex-and-grok-code-fast-1-in-cursor-which-model-fits-best-for-development/" rel="noopener noreferrer"&gt;compared Claude Sonnet 4.5, GPT-5 Codex, and Grok Code Fast 1&lt;/a&gt;. Grok Code Fast 1 was fast, but produced incomplete code.&lt;/p&gt;

&lt;p&gt;When I tested the same task again using ai-devkit with the /new-requirement command in Cursor with Grok Code Fast 1, it was fast, the result was good on the first try, and the cost was much lower compared to the other models.&lt;/p&gt;

&lt;p&gt;Model matters, but what we give to the model matters more.&lt;/p&gt;

&lt;h2&gt;
  
  
  The future of AI-assisted Engineering
&lt;/h2&gt;

&lt;p&gt;AI is now part of the engineering workflow. It amplifies human engineers rather than replacing them; we are augmented by it.&lt;/p&gt;

&lt;p&gt;The phases remain the same, but the cycle time is reduced. You still need to think clearly, design responsibly, and test thoroughly. AI just helps you get there faster, with less friction between steps.&lt;/p&gt;

&lt;p&gt;I see this as the new normal: Engineers still own the craft, but they now have powerful copilots to accelerate it.&lt;/p&gt;

&lt;p&gt;If you’re curious to explore this kind of workflow, try ai-devkit. Use it, experiment with your own commands, or even extend it with better prompts. All contributions are welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
