How I Built a Shared AI Engineering Workflow with Multiple Tools


After months of wrestling with token limits, context windows, and model switching mid-project, I finally built a workflow that lets me use Claude, Codex, and local LLMs together without losing engineering continuity. Here’s how it works on a large Laravel application I’m calling MomentumX.

The Problem with Single-AI Workflows

The initial setup was simple: use Claude for everything. That worked until it didn’t.

Token limits meant I couldn’t feed the entire codebase context in one session. Long conversations degraded in quality as Claude forgot earlier decisions. Switching to a different model for cost reasons meant starting explanations from scratch. And when I needed offline access or privacy-sensitive analysis, I had no fallback.

The real pain wasn’t any single AI’s limitations — it was losing context when switching between them. Every new session felt like onboarding a junior developer who hadn’t read the previous sprint notes.

Graphify as the Structural Memory Layer

Graphify solves the code understanding problem. It scans your repository and generates a graph.json file containing:

Module dependencies and relationships
Architecture patterns detected
Semantic mapping of services, controllers, models
Call graphs and data flow paths

graphify analyze ./src --output ./ai-context/graph.json

[Figure: Graphify knowledge graph visualising module dependencies and architecture relationships in a Laravel application]

This file becomes the shared structural memory. When I start a Claude session, I feed it the relevant portions of graph.json. When Codex needs to understand how a service connects to the rest of the system, same file. When my local Gemma model needs context for a refactoring task, same source.

The key insight: different AI tools can continue from the same code understanding if they share a canonical representation of the codebase.
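To make "feeding the relevant portions of graph.json" concrete, here is a minimal Python sketch that pulls out one module and its direct neighbours. The schema is an assumption on my part: it pretends graph.json has a top-level `nodes` list (each entry with an `id`) and an `edges` list (each entry with `source` and `target`). Graphify's real output may use different keys, so treat the field names as placeholders.

```python
import json
from pathlib import Path


def load_graph(path: str = "ai-context/graph.json") -> dict:
    """Read the graph file from disk (path taken from the article)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def neighbourhood(graph: dict, node_id: str) -> dict:
    """Return the node plus everything directly connected to it.

    Assumes a hypothetical {"nodes": [{"id": ...}], "edges": [{"source": ..., "target": ...}]} layout.
    """
    edges = [
        edge for edge in graph.get("edges", [])
        if node_id in (edge["source"], edge["target"])
    ]
    keep = {node_id}
    for edge in edges:
        keep.update((edge["source"], edge["target"]))
    nodes = [node for node in graph.get("nodes", []) if node["id"] in keep]
    return {"nodes": nodes, "edges": edges}
```

The resulting slice can be serialised with `json.dumps` and pasted into any tool's prompt, which keeps the context small enough to fit alongside the actual task.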

I regenerate graph.json after significant architectural changes — roughly once per sprint or after major refactors. Stale structural context is worse than no context at all.
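One crude way to notice a stale graph is to compare file modification times. The sketch below is an illustration, not part of Graphify: the function names are hypothetical, and the `*.php` pattern is an assumption based on this being a Laravel project. It flags the graph whenever any source file is newer, which is stricter than a once-per-sprint cadence, so it is best read as a reminder rather than a hard gate.

```python
from pathlib import Path


def newest_mtime(root: Path, pattern: str = "*.php") -> float:
    """Most recent modification time among matching files, 0.0 if none exist."""
    return max((p.stat().st_mtime for p in root.rglob(pattern)), default=0.0)


def graph_is_stale(graph_path: Path, source_root: Path) -> bool:
    """True when graph.json is missing or older than the newest source file."""
    if not graph_path.exists():
        return True
    return graph_path.stat().st_mtime < newest_mtime(source_root)
```

For example, `graph_is_stale(Path("ai-context/graph.json"), Path("src"))` could run in a pre-session script and print a warning before any context is handed to a model.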

ClickUp as the Project Memory Layer

While Graphify handles code structure, ClickUp handles project state. My ClickUp workspace contains:

Migration plans with phase breakdowns
Implementation tasks with acceptance criteria
Blockers and dependencies between tasks
Rollout notes and deployment checklists
Pending validations and QA items
Engineering decisions with rationale

I use Claude’s MCP integration to generate and manage tasks directly. If you haven’t explored MCP servers yet, I wrote about building custom MCP servers for local development workflows — the same approach works for ClickUp integration.

For example, a context snapshot pulled from ClickUp might look like this:

## Current Sprint Context
- Migrating payment service from v2 to v3 API
- Blocker: webhook signature validation failing in staging
- Decision: keeping backward compatibility for 2 release cycles
- Next: load testing after webhook fix

When I switch from Claude to Codex mid-task, I can pull the relevant ClickUp context and the AI picks up where the previous one left off.
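The sprint snapshot above can also be rendered from structured data, so every tool receives the same handoff text. This is a minimal sketch: the `SprintContext` class and its field names are hypothetical, and it deliberately makes no ClickUp API calls. How the values are fetched (MCP, export, copy and paste) is left open.

```python
from dataclasses import dataclass, field


@dataclass
class SprintContext:
    """Hypothetical container for the project state handed to each AI tool."""

    focus: str
    blockers: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the same bullet layout as the sprint snapshot in this article."""
        lines = ["## Current Sprint Context", f"- {self.focus}"]
        lines += [f"- Blocker: {item}" for item in self.blockers]
        lines += [f"- Decision: {item}" for item in self.decisions]
        lines += [f"- Next: {item}" for item in self.next_steps]
        return "\n".join(lines)
```

Keeping the rendering in one place means a change to the handoff format reaches Claude, Codex, and the local models at the same time.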

The /ai-context Folder Structure

The missing piece was engineering memory — the unwritten knowledge that lives in developers’ heads. I created an /ai-context folder in the repository root:

/ai-context
├── architecture.md
├── migration-status.md
├── current-problems.md
├── coding-rules.md
├── deployment-flow.md
├── known-issues.md
└── graph.json

Each file serves a specific purpose:

architecture.md — High-level system design, service boundaries, database schema overview, external integrations. Updated quarterly or after major changes.

migration-status.md — Current state of any ongoing migrations. Which endpoints are on v2 vs v3, what’s tested, what’s pending.

current-problems.md — Active bugs, performance issues, technical debt being addressed this sprint. Cleared after resolution.

coding-rules.md — Project-specific conventions, naming patterns, forbidden patterns, linting exceptions. Similar to what I described in setting up Cursor rules for consistent AI coding.

deployment-flow.md — CI/CD pipeline details, environment-specific configs, rollback procedures.

known-issues.md — Edge cases, browser quirks, third-party API limitations that affect implementation decisions.
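Because every tool depends on these files being present, a small check can catch a missing one before a session starts. The sketch below uses the file names from the folder listing above; the function name is hypothetical, and checking only for existence (not freshness or content) is a simplifying assumption.

```python
from pathlib import Path

# File names taken from the /ai-context listing in this article.
EXPECTED_FILES = (
    "architecture.md",
    "migration-status.md",
    "current-problems.md",
    "coding-rules.md",
    "deployment-flow.md",
    "known-issues.md",
    "graph.json",
)


def missing_context_files(root: Path) -> list[str]:
    """Return the expected context files that do not exist under root."""
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_context_files(Path("ai-context"))` from a CI step or a pre-commit hook would be one way to keep the folder from silently losing a file.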

How Different AI Tools Fit the Workflow

The division of labor evolved naturally:

Claude: Reasoning, planning, architecture discussions, complex migrations. When I need to think through a database schema change or design a new service interface, Claude handles the back-and-forth exploration. The Claude API also makes repetitive analysis tasks easy to script.

Codex: Implementation and autonomous edits. Once Claude and I have agreed on an approach, Codex executes: it writes the migration, updates the tests, and refactors the affected files. In my experience, Codex is better at following established patterns across multiple files.

Local Gemma/Qwen via Ollama: Lightweight offline work, grep/log analysis, privacy-sensitive code. I run these models locally for quick questions when I don’t need the full reasoning power of Claude. Useful for parsing error logs, generating boilerplate, or working on client code that can’t leave my machine.
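The division of labour described here can be written down as a tiny routing function. This is only a sketch of the decision rule: the task category names are hypothetical labels I made up for illustration, and defaulting unknown work to Claude is an assumption rather than a fixed rule.

```python
from enum import Enum


class Tool(Enum):
    CLAUDE = "claude"
    CODEX = "codex"
    LOCAL = "local model via Ollama"


# Hypothetical task categories, grouped by the strengths described in the article.
REASONING = {"planning", "architecture", "migration-design"}
IMPLEMENTATION = {"implementation", "refactor", "tests"}
LIGHTWEIGHT = {"log-analysis", "boilerplate"}


def pick_tool(kind: str, *, private: bool = False, offline: bool = False) -> Tool:
    """Choose a tool for a task; privacy and offline needs override everything else."""
    if private or offline:
        return Tool.LOCAL
    if kind in IMPLEMENTATION:
        return Tool.CODEX
    if kind in LIGHTWEIGHT:
        return Tool.LOCAL
    # Reasoning tasks, and anything unrecognised, go to Claude (an assumption).
    return Tool.CLAUDE
```

Putting the privacy check first reflects the one hard constraint in the workflow: client code that cannot leave the machine never reaches a hosted model, whatever the task type.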

The Four-Layer Model

After several iterations, the system settled into four distinct layers:

Layer	Tool	Purpose
Code understanding	Graphify	Structural relationships, dependencies, architecture mapping
Project understanding	ClickUp	Tasks, decisions, blockers, rollout state
Source of truth	Git	Actual code, history, canonical state
Engineering memory	/ai-context docs	Unwritten knowledge, conventions, current problems

Each layer answers different questions:

“How does this service connect to the rest of the system?” → Graphify
“What are we building this sprint?” → ClickUp
“What does the code actually do?” → Git
“Why did we build it this way?” → /ai-context

Practical Lessons Learned

Several specialised AI tools work better than expecting one perfect AI. Claude is great at reasoning but expensive for bulk edits. Codex is great at implementation but weaker at architectural exploration. Local models are great for offline access but struggle with complex multi-step problems.

Shared memory matters more than model quality alone. In my experience, a mediocre model with good context outperforms a great model with no context. Invest in your memory layers.

Prompt standardisation across tools improves consistency. I use similar prompt templates regardless of which AI I’m talking to. The preamble always includes the current task from ClickUp, the relevant architecture notes from /ai-context, and the specific files identified through the Graphify mapping.
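A standard preamble of this kind can be assembled by one small function so that every tool receives the same three sections. This is a minimal sketch: the function name and the section headings are my own placeholders, not a fixed template.

```python
def build_preamble(task: str, architecture: str, files: list[str]) -> str:
    """Combine task, architecture notes, and file list into one prompt preamble."""
    file_list = "\n".join(f"- {path}" for path in files) or "- (none identified)"
    sections = [
        ("Current task", task.strip()),                  # from ClickUp
        ("Relevant architecture", architecture.strip()),  # from /ai-context
        ("Files in scope", file_list),                    # from the Graphify mapping
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)
```

A call might look like `build_preamble("Fix webhook signature validation", notes, ["app/Services/PaymentService.php"])`, where the file path is a made-up example. The returned string is then prepended to the actual request.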

Stale context becomes dangerous without a canonical memory system. I’ve had Claude confidently suggest changes based on architecture that no longer existed. The /ai-context files need maintenance like any other documentation.

Local models are useful but still weaker for deep reasoning. I keep trying to use Gemma for more complex tasks, and it keeps disappointing me. Good for autocomplete, grep-like searches, and simple refactors. Not ready for architecture discussions.

What’s Next

The workflow keeps evolving. I’m exploring:

Automated /ai-context updates triggered by significant commits
Better ClickUp → Claude context injection via MCP
Fine-tuning a small local model on project-specific patterns
Version-controlled prompt templates per project type

The goal isn’t a perfect system — it’s a system that degrades gracefully when any single component fails or hits its limits.
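One way to picture graceful degradation is a fallback chain: try the preferred tool, and move to the next one if it fails. The sketch below is illustrative only. Each backend is any function that takes a prompt and returns text; the real API calls are deliberately left out, and the names are hypothetical.

```python
from typing import Callable, Sequence

# A backend is any callable that takes a prompt and returns the model's reply.
Backend = Callable[[str], str]


def ask_with_fallback(
    prompt: str, backends: Sequence[tuple[str, Backend]]
) -> tuple[str, str]:
    """Try each backend in order; return (backend name, reply) from the first success."""
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

Because all backends share the same context files, falling back from a hosted model to a local one loses reasoning depth but not the project state.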

Key Takeaways

Use Graphify or similar tools to generate shareable code structure context
Keep project state in a proper project management tool with API access
Maintain an /ai-context folder for engineering knowledge that doesn’t fit elsewhere
Match AI tools to their strengths: Claude for reasoning, Codex for implementation, local models for offline/private work
Update your memory layers regularly — stale context causes more problems than missing context