How are you structuring AI-assisted product development workflows?
Posted by Wonderful_Trainer412@reddit | ExperiencedDevs | View on Reddit | 32 comments
Hey guys,
I know there’s quite a bit of skepticism here around using AI in day-to-day development, but it does seem like AI has strong potential as an assistant.
I’m trying to understand what effective development pipelines look like when actively using AI tools to create a whole project from scratch.
Traditional approach (pre-AI) for me was something like:
- Gather and analyze business requirements
- Create UI/UX mockups (e.g. in Figma)
- Implement screen by screen
- Write tests to support refactoring
Now with AI tools (like Figma Make, Google Stitch, Claude Design), it feels like the process is shifting.
How I see it now:
- Start from an idea (or clone an existing product)
- Discuss and refine requirements with AI (and store them somewhere structured)
- Generate design/mockups with AI
- Implement screen-by-screen with AI assistance (but also manually review code)
- Generate tests with AI
Main question: How do you structure this process efficiently in practice?
How do you iterate this process with AI?
Any tools, patterns, or gotchas that made a big difference for you?
Would love to hear how others are actually doing this in real projects, not just in theory or on 'hello world' projects.
Finorix079@reddit
The trap I'd flag in your flow: generating tests last, after the code is written. AI-generated tests that come after AI-generated code mostly just confirm the code does what the code does. They miss the actual edge cases.
Flip it. Write the acceptance criteria as plain-English test cases before you generate the implementation, then have the AI implement against those. The tests stop being a rubber stamp and start being a contract.
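A minimal sketch of what I mean, assuming pytest and a hypothetical export_report() that doesn't exist yet:

```python
# Acceptance criteria written as tests *before* any implementation exists.
# export_report() is hypothetical; the AI implements against this contract.
import pytest

from reports import export_report  # module to be generated


def test_rejects_inverted_date_range():
    # Criterion: an inverted range is a user error, not a silent empty export.
    with pytest.raises(ValueError):
        export_report(start="2024-02-01", end="2024-01-01")


def test_no_duplicate_rows():
    # Criterion: rows must not be duplicated, even across page boundaries.
    rows = export_report(start="2024-01-01", end="2024-01-31")
    assert len(rows) == len({row.id for row in rows})
```

If the model wants to change one of these, that's a conversation, not a silent edit.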
viktorianer4life@reddit
Agree on writing the criteria first. The pattern I keep landing on after running hundreds of agent sessions is: tests are necessary but not sufficient, because the model can quietly weaken a test to make it pass. Renaming a helper that still returns true. Wrapping the failing line in a rescue. Skipping the example "for now."
What plugs that hole is a tiny layer below the tests, outside anything the model controls. A pre-commit grep pass that fails on a short list of forbidden shapes: skipped tests, empty rescues, tautological assertions, banned method names. Each rule is one line of regex and one sentence of why. Tests above, deterministic gate below. The model can argue with the test runner, it cannot argue with grep.
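A rough sketch of that layer, assuming it runs as a pre-commit hook over the staged diff (the patterns here are illustrative, Ruby/RSpec-flavored, not my actual list):

```python
#!/usr/bin/env python3
# Deterministic gate below the test suite: grep the staged diff for
# forbidden shapes. Each rule is one regex and one sentence of why.
import re
import subprocess
import sys

FORBIDDEN = [
    (r"\bxit\b", "Skipped example; the runner will happily report green."),
    (r"rescue\s*\n\s*end", "Empty rescue swallows the failure the test should surface."),
    (r"expect\(true\)\.to", "Tautological assertion; it can never fail."),
]

# Only look at what is about to be committed.
diff = subprocess.run(
    ["git", "diff", "--cached", "--unified=0"],
    capture_output=True, text=True, check=True,
).stdout

violations = [(p, why) for p, why in FORBIDDEN if re.search(p, diff)]
for pattern, why in violations:
    print(f"blocked by {pattern!r}: {why}")

sys.exit(1 if violations else 0)
```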
programmerman9000@reddit
I start by using the AI to build a knowledge base that the AI can effectively leverage. That basically just means extracting the relevant context from your existing structures (code, database schemas, data in tables, product spec and requirements, etc), and slimming it down so that the AI’s context remains manageable. This usually just looks like a set of documents with very clear descriptions of all relevant information written in plain English. I screen out irrelevant details and try to keep things high level at this point.
Usually I get some insights from this high level knowledge base. I use this to make decisions: investigation of things that are unclear, pushing back on requirements that don’t make sense, building a basic design/solution, etc. These new “findings” go back into the knowledge base as I iterate and explore.
Then I refine this knowledge-base, usually by pointing the AI only at this knowledge base and building out “drafts”. These drafts are essentially design documents. Unlike the high level documents, these documents contain design decisions, which often reveal “bad ideas” (mine or AI’s) when it comes to implementation. I iterate (ie create new versions) of these drafts until I’m happy with the proposal.
Then I ask the AI to make a technical design document out of this proposal — this usually includes a moderate amount of code snippets but not too detailed. And from that document, I instruct the AI to build a document listing all the PRs that will be needed to build the feature, along with basic details of what each PR will accomplish.
Finally, I execute the implementation by asking the AI to generate detailed implementation plans based on the design document and PR overview document. I usually do this one by one, fully completing the implementation of one PR before asking the AI to write out a plan for the next one. Sometimes I have to make minor changes in direction at this phase.
I deeply review each implementation plan, and usually do several iterations on each one. Each decision is clearly detailed in these documents, including all the tests. I often dive deeply into the code to do this.
The last step is to tell the AI to execute the implementation plan. I rarely have to touch the code that the AI generates at this point, but sometimes I do some refactoring by hand because somehow AI never gets the refactoring right.
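Roughly, the whole chain ends up as a small tree of documents, something like this (names purely illustrative):

```
docs/
  knowledge-base/       # plain-English context: schemas, product spec, constraints
  drafts/               # design proposals, iterated until the approach settles
  technical-design.md   # the chosen design, with a moderate amount of code snippets
  pr-overview.md        # every PR needed for the feature and what each accomplishes
  plans/
    pr-01-plan.md       # detailed implementation plan, reviewed before execution
    pr-02-plan.md
```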
CodelinesNL@reddit
Seems you ask AI to write your Reddit comments too...
programmerman9000@reddit
We are increasingly reaching a point where that will become indiscernible. Not sure what you’re trying to add to the conversation
Wonderful_Trainer412@reddit (OP)
What do you do when business requirements change in the middle of your project?
NewRooster1123@reddit
I would not update the project in place once requirements move. Freeze the current design and plan, write a short delta note for what changed and why, then regenerate only the affected sections, tasks, and tests from that delta. Otherwise the knowledge base starts looking current while it quietly mixes incompatible assumptions.
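The delta note itself can stay tiny; as a template, something like:

```
## Delta: <date> - <one-line summary of what changed>
- What changed: <old requirement> -> <new requirement>
- Why: <business reason>
- Affected: <design sections, tasks, and tests to regenerate>
- Explicitly unaffected: <what this delta must not touch>
```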
programmerman9000@reddit
I’ve noticed that with Claude at least, it tends to do something like this. I try to help it by telling it to explicitly note the changes in a separate section.
I feel the whole challenge still comes down to keeping the knowledge base small, clear, and organized. If the knowledge base is garbage, the results will also be garbage
NewRooster1123@reddit
I don't get why I got downvoted, but you're right. I also feel that however good the AI is, the knowledge base still has to be maintainable enough to verify, otherwise it's going to be abandoned shortly.
programmerman9000@reddit
Ask the AI to do it, then review. If the change is big, create a new design document or draft. The basic knowledge base that you created shouldn’t change much so you should still have a good base to fall back on.
ChronoT52@reddit
That would be treated as a new feature. This is exactly why I treat docs specifically related to building a feature as evolutions and point-in-time reflections of the system so we don't have to deal with maintaining them after. The charter, system design, and component docs are what maintain the bigger picture updates.
Cute_Activity7527@reddit
With AI prototyping I don't care about the "how", I care about the "what" and the "why".
I have a goal and hammer the agent till I get what I need. It's still waaaaaaaaaaaay faster and waaaaaaaaay cheaper than doing it myself or asking someone else to do it.
After you have a working prototype, that's the moment for QA. You pretty much "need" a human to do it; AI works as an assistant where possible.
rocketbunny77@reddit
These questions read like you've instructed your openclaw instance to figure this out.
Wonderful_Trainer412@reddit (OP)
No, these are my own thoughts. I only asked him to translate and check grammar ;)
Vector-Zero@reddit
I hope she's okay
rocketbunny77@reddit
IFHFISCSJ5IESQ27JVAUOSKDL5JVIUSJJZDV6VCSJFDUORKSL5JEKRSVKNAUYXZRIZAUKRSCGYYTON2CGQ3DOMSEIVCTAN2GHFCDGQKGIM3DENJYHBBUGRBSGYZTCRKEINDDEMSFHBBUGQZRIZBDGNKCGUYDCQZZIM4DM===
throwaway_0x90@reddit
I had to try.
rocketbunny77@reddit
base32 🫢
OwnsAYard@reddit
LLMs would have recommended they use the Figma API. This just looks like it was formatted via an LLM.
ChronoT52@reddit
I've adopted a brainstorm -> plan -> implement approach with my team. The entire goal of the approach is to put people first in the process of using these tools, to use the tools to promote team collaboration and understanding, and to prevent slop from entering the codebase. My thesis is that communication is the real problem we can improve on with the benefit of these tools, not just raw code output, which seems to be the myopic focus of a lot of engineers using LLMs.
Brainstorming involves creating a markdown spec that defines the user stories, experience mapping, business case, acceptance criteria, and a mockup. Once those are ready the team reviews it together to tweak the UX and decide if we are building the right thing. This is the one place I allow vibe coding, and the branch is clearly named "prototype/" so the vibe coded mockup is never merged in. This phase is built around centering the team on the "why" of the feature.
Planning takes the feature doc built in the previous step and moves it into a new branch where the actual technical design happens. I landed on spec-driven design with the team, so the expectation here is to create markdown specs that show how we are going to build it, including Mermaid sequence and class diagrams, data structures and data flow, and requirements around accessibility, auditability, scalability, maintainability, security/threat modeling, test strategy, etc. Additionally, plans are developed that detail exactly what will be done by the agent in the codebase to implement the plan incrementally. Once these artifacts are built, a technical review is done collaboratively with some subset of the team to get consensus on the technical approach and determine whether the work is appropriately phased to facilitate good review and incremental feedback.
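As a rough skeleton (headings are just how we happen to slice it), a planning spec looks something like:

```
# <feature> - technical design
## Context and goals (carried over from the brainstorm doc)
## Sequence and class diagrams (Mermaid)
## Data structures and data flow
## Requirements
- accessibility, auditability, scalability, maintainability
- security / threat model
- test strategy
## Incremental implementation plan
- phase 1: ...
- phase 2: ...
```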
Implementation is the execution of the plans from the last step where we actually build things that will enter the codebase. It's highly iterative, having the LLM move one step at a time and validating and tweaking the output as we go. Often things don't go as planned exactly though, which is where the engineering expertise in particular comes in to make sure the LLM isn't sailing us way off course or creating slop (things that LLMs are very apt to do, even with all this descriptive context). Once a plan is complete, if an engineer built it alone they submit a PR for review. If the engineer pair programmed with another engineer while executing the plan, I consider the review already done and the PR can be approved immediately between the two engineers. One way or another, collaboration and review are mandatory.
As far as all the markdown generated is concerned, I built Claude skills, agents, and hooks for the team to help structure this process and create the docs. Docs around features are considered evolutionary and while they are stored in source control with the project they are otherwise ignored after the feature is delivered. At the end of implementation we do reflect and add to more stable "charter", "system design", and "component" docs that can be brought into context for future feature work. They serve as higher level descriptions of the product and architecture. Most importantly, the docs are all written with LLM assist, but they're designed to be read by people to promote communication and understanding. It's a happy coincidence they also can be used for LLM context, and I treat that as a secondary benefit.
Regardless of your approach, put people first in it. LLMs are tools and nothing more, so if we're going to use them, let's use them to build people up and build team understanding of why and how we are building things. No matter which new tools come our way, the underlying concepts of engineering and team dynamics haven't changed.
pepejovi@reddit
This sounds like a terrible idea
ChronoT52@reddit
Can you elaborate?
PRs serve two major functions, one being a way to review the quality and fit of contributions to a project, and the other being to establish trust. The second consideration is more important for open source projects where you don't know the person contributing. On a software team, we know each other and can trust each other, so the second part isn't as important. Review of quality and fit ARE still important though, and for me having a pair of engineers working together to validate the quality and architectural fit of what is being built actively as it's built fundamentally serves the same goals as having a single engineer build it in isolation then send it off for review by a different engineer. If anything, having a second engineer actively participating in the process of implementation will give that engineer context they never would have gotten in an async review.
pepejovi@reddit
Basically what the other person already replied. Trust isn't the issue here, but blind trust is. If trust was that big of a deal, we wouldn't even have PRs.
I'm concerned with the selective blindness that comes from writing code, I don't believe that blindness is reduced enough by pair-programming to skip the review process. Maybe it works but to me it just sounds a little off.
ChronoT52@reddit
Tbh historically I've required a separate reviewer and didn't have teams pair program. I decided to give pairing instead of separate review a chance mostly because I found Allen Holub's take on PRs pretty compelling. I started following him when I heard of his "no estimates" movement a decade ago, and often find his stances on "small-a agile" to be pretty spot on.
Found a LinkedIn post where he talks about it, if you're curious.
pepejovi@reddit
Never heard of this guy before. I think I'm maybe not getting some of the terminology used here, or I don't have the industry experience from ye olde ages, but I can't really agree with most of it, and the main comparison is confusing.
What is a continuous code review and a PR-driven code review?
Assuming "Continous code review" is what google results say it is - e.g. "Growing scrum masters" defines an example:
This is what every team and project I've been part of has done for my entire career. The only time I've heard of people pushing their reviews for specific days is when an extremely in-demand developer is swamped with other work so they block out a day to go through their review queue. But even then, PRs generally were not blocked by waiting for them to emerge for their "Review Day", because they had other people tagged on the PR to review it as well.
So when Holub talks about PR-driven reviews I'm confused, because isn't this description exactly what a PR-driven review is? Is a non-PR-driven review something that involves pushing straight to the main branch and hoping people look at the new code with a critical eye as they stumble upon it? That also totally fucks over any kind of requirement for ABI/API stability, and you'd end up with a billion smart-ass devs pushing code to "fix" something they lack the required context to understand, let alone fix - something he explicitly uses as an example to go against the myth in the last paragraph.
I agree that PR Reviews slow down development, sometimes dramatically, but I've never experienced it get so bad that it outweighs the benefits. I think if you're having big issues with long, multi-week code reviews, you should look at why that is rather than trying to skip them entirely. Why aren't your devs looking at PRs (too many tasks)? Why are they leaving so many comments (badly designed tasks)? Is your PR-creating process too cumbersome?
The less I think about mob programming the better. I think I'd rather walk naked into a cloud of mosquitoes. Mobs are good for solving tricky technical issues, but if I had a mob of backseating devs staring at my screen, I think I would snap and end up in prison.
It occurs to me that maybe the "PR-driven review" as a term is referring to having a meeting to go through the code together? That idea should go through a wood chipper. That's the ultimate boss of "this meeting should've been an e-mail".
throwaway_0x90@reddit
I think what they mean is that the code reviewer should be someone who wasn't involved in writing the code. If 2 devs worked together on a component, it's probably best to find a third person to review.
lolimouto_enjoyer@reddit
Why don't you ask the AI?
gfivksiausuwjtjtnv@reddit
How I want to do it: human-written feature specs (user stories only, no implementation details), then some sort of AI-assisted statically verifiable layer like Lean or TLA+… then somehow acquire the ridiculous token budget to run gascity and treat the code itself as an ephemeral implementation of the overlying spec layers.
How we’re really doing it: throw away Jira, move fast without solid requirements, vibe code almost everything
It's going to be very, very efficient… up to a certain point lol. But we've smashed out nearly a full rewrite of a bloated enterprisey thing in like two months, so 10x achieved.
CodelinesNL@reddit
It's nonsensical to think that the basic struggles around communicating with humans magically don't exist now that you use AI to write code. In fact, they become a larger bottleneck.
You can prototype faster, but not "getting requirements" from actual users means you're going to build the same crap everyone else is building.
OwnsAYard@reddit
Agentic development would probably have each step in your workflow looked at from multiple prompted perspectives. Force the LLMs to take perspectives, and look, build, and critique from those different perspectives.
Prompt a bunch of different specialists at each of your steps. Even have one monitoring and improving your prompts. Find something shitty? Incorporate it back into your prompts and rerun the flow. For a while, don't fix the output; keep fixing the inputs.
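A hypothetical sketch of the specialist part (call_llm() stands in for whatever model client you actually use):

```python
# Run the same artifact past several prompted "specialists" and collect
# their critiques. call_llm() is hypothetical; wire it to your own client.
from typing import Callable

PERSPECTIVES = {
    "security reviewer": "Flag auth, injection, and data-exposure problems.",
    "accessibility reviewer": "Flag keyboard, contrast, and screen-reader problems.",
    "maintainer": "Flag naming, coupling, and testability problems.",
    "prompt critic": "Suggest how the original prompt itself should change.",
}


def critique(artifact: str, call_llm: Callable[[str], str]) -> dict[str, str]:
    return {
        role: call_llm(f"You are a {role}. {task}\n\n{artifact}")
        for role, task in PERSPECTIVES.items()
    }
```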
Wonderful_Trainer412@reddit (OP)
Maybe there's some big tutorial on YouTube that builds a project from scratch?
Hot-Profession4091@reddit
Maybe, but remember that we’re all simultaneously figuring out what is and isn’t effective right now. Some patterns are beginning to emerge, but be wary of any guru saying they’ve got it all figured out.