How I Built a Shared AI Engineering Workflow with Multiple Tools
After months of wrestling with token limits, context windows, and model switching mid-project, I finally built a workflow that lets me use Claude, Codex, and local LLMs together without losing engineering continuity. Here’s how it works on a large Laravel application I’m calling MomentumX.
The Problem with Single-AI Workflows
The initial setup was simple: use Claude for everything. That worked until it didn’t.
Token limits meant I couldn’t feed the entire codebase context in one session. Long conversations degraded in quality as Claude forgot earlier decisions. Switching to a different model for cost reasons meant starting explanations from scratch. And when I needed offline access or privacy-sensitive analysis, I had no fallback.
The real pain wasn’t any single AI’s limitations — it was losing context when switching between them. Every new session felt like onboarding a junior developer who hadn’t read the previous sprint notes.
Graphify as the Structural Memory Layer
Graphify solves the code understanding problem. It scans your repository and generates a graph.json file containing:
- Module dependencies and relationships
- Architecture patterns detected
- Semantic mapping of services, controllers, models
- Call graphs and data flow paths
graphify analyze ./src --output ./ai-context/graph.json

This file becomes the shared structural memory. When I start a Claude session, I feed it the relevant portions of graph.json. When Codex needs to understand how a service connects to the rest of the system, same file. When my local Gemma model needs context for a refactoring task, same source.
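Feeding only "the relevant portions" can be as simple as slicing the graph around one module. The sketch below is one way to do that. The layout it expects (a `nodes` list with `id` fields and an `edges` list with `source` and `target` fields) is an assumption made for illustration, not Graphify's documented schema, and `neighbourhood` is a hypothetical helper.

```python
from collections import deque


def neighbourhood(graph: dict, start: str, depth: int = 1) -> dict:
    """Return the part of the graph within `depth` hops of `start`.

    Assumes a hypothetical layout:
    {"nodes": [{"id": ...}], "edges": [{"source": ..., "target": ...}]}
    Adjust the keys to whatever the real graph.json contains.
    """
    # Undirected adjacency map, so both callers and callees are kept.
    adjacent = {}
    for edge in graph["edges"]:
        adjacent.setdefault(edge["source"], set()).add(edge["target"])
        adjacent.setdefault(edge["target"], set()).add(edge["source"])

    # Breadth-first search that stops expanding at `depth` hops.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue
        for nxt in adjacent.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))

    return {
        "nodes": [n for n in graph["nodes"] if n["id"] in seen],
        "edges": [
            e for e in graph["edges"]
            if e["source"] in seen and e["target"] in seen
        ],
    }
```

Serialising the returned dictionary with `json.dumps` gives a slice small enough to paste into a session, and raising `depth` widens the slice when the model needs more surrounding structure.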
The key insight: different AI tools can continue from the same code understanding if they share a canonical representation of the codebase.
I regenerate graph.json after significant architectural changes — roughly once per sprint or after major refactors. Stale structural context is worse than no context at all.
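A cheap guard against stale structure is to compare the graph file's modification time with the newest source file. This is a minimal sketch, not part of Graphify: `graph_is_stale` is a hypothetical helper, and the default `*.php` pattern is an assumption based on the project being a Laravel application.

```python
from pathlib import Path


def graph_is_stale(graph_path: Path, source_dir: Path, pattern: str = "*.php") -> bool:
    """True if the graph is missing or older than any matching source file."""
    if not graph_path.exists():
        return True
    graph_mtime = graph_path.stat().st_mtime
    # rglob walks the source tree recursively; any newer file means stale.
    return any(
        src.stat().st_mtime > graph_mtime
        for src in source_dir.rglob(pattern)
    )
```

Called as `graph_is_stale(Path("ai-context/graph.json"), Path("src"))` before a session, it flags when a regeneration is due. Modification times are only a rough signal, since a fresh checkout can reset them.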
ClickUp as the Project Memory Layer
While Graphify handles code structure, ClickUp handles project state. My ClickUp workspace contains:
- Migration plans with phase breakdowns
- Implementation tasks with acceptance criteria
- Blockers and dependencies between tasks
- Rollout notes and deployment checklists
- Pending validations and QA items
- Engineering decisions with rationale
I use Claude’s MCP integration to generate and manage tasks directly. If you haven’t explored MCP servers yet, I wrote about building custom MCP servers for local development workflows — the same approach works for ClickUp integration.
For example, the context for the current sprint might read:
## Current Sprint Context
- Migrating payment service from v2 to v3 API
- Blocker: webhook signature validation failing in staging
- Decision: keeping backward compatibility for 2 release cycles
- Next: load testing after webhook fix
When I switch from Claude to Codex mid-task, I can pull the relevant ClickUp context and the AI picks up where the previous one left off.
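The hand-off can be sketched as rendering project state into the same markdown summary format used for the sprint context earlier in this section. The `ContextItem` shape and its `kind` labels are hypothetical and are not ClickUp's API response format. Fetching the items from ClickUp is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    kind: str  # hypothetical labels: "task", "blocker", "decision", "next"
    text: str


# Plain tasks get no prefix; other kinds are labelled as in the summary.
PREFIXES = {"blocker": "Blocker: ", "decision": "Decision: ", "next": "Next: "}


def render_sprint_context(items: list[ContextItem]) -> str:
    """Render items as a markdown block any of the AI tools can read."""
    lines = ["## Current Sprint Context"]
    for item in items:
        lines.append(f"- {PREFIXES.get(item.kind, '')}{item.text}")
    return "\n".join(lines)
```

Because the output is plain markdown, the same block can be pasted into Claude, Codex, or a local model without tool-specific formatting.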
The /ai-context Folder Structure
The missing piece was engineering memory — the unwritten knowledge that lives in developers’ heads. I created an /ai-context folder in the repository root:
/ai-context
├── architecture.md
├── migration-status.md
├── current-problems.md
├── coding-rules.md
├── deployment-flow.md
├── known-issues.md
└── graph.json
Each file serves a specific purpose:
architecture.md — High-level system design, service boundaries, database schema overview, external integrations. Updated quarterly or after major changes.
migration-status.md — Current state of any ongoing migrations. Which endpoints are on v2 vs v3, what’s tested, what’s pending.
current-problems.md — Active bugs, performance issues, technical debt being addressed this sprint. Cleared after resolution.
coding-rules.md — Project-specific conventions, naming patterns, forbidden patterns, linting exceptions. Similar to what I described in setting up Cursor rules for consistent AI coding.
deployment-flow.md — CI/CD pipeline details, environment-specific configs, rollback procedures.
known-issues.md — Edge cases, browser quirks, third-party API limitations that affect implementation decisions.
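Since every tool depends on these files, it helps to check that none has gone missing or been left empty. The sketch below is one possible check. `missing_context_files` is a hypothetical helper, and the file list simply mirrors the folder listing above.

```python
from pathlib import Path

EXPECTED_FILES = (
    "architecture.md",
    "migration-status.md",
    "current-problems.md",
    "coding-rules.md",
    "deployment-flow.md",
    "known-issues.md",
    "graph.json",
)


def missing_context_files(context_dir: Path) -> list[str]:
    """Return expected files that are absent or empty, in listing order."""
    missing = []
    for name in EXPECTED_FILES:
        path = context_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Running this in CI or a pre-commit hook is one option. Treating an empty file as missing is a design choice here, on the assumption that an empty document gives a model nothing to work with.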
How Different AI Tools Fit the Workflow
The division of labor evolved naturally:
Claude: Reasoning, planning, architecture discussions, complex migrations. When I need to think through a database schema change or design a new service interface, Claude handles the back-and-forth exploration. The Claude API integration makes this easy to script for repetitive analysis tasks.
Codex: Implementation and autonomous edits. Once Claude and I have agreed on an approach, Codex executes: write the migration, update the tests, refactor the affected files. In my experience, Codex is the better of the two at following established patterns across multiple files.
Local Gemma/Qwen via Ollama: Lightweight offline work, grep/log analysis, privacy-sensitive code. I run these models locally for quick questions when I don’t need the full reasoning power of Claude. They are useful for parsing error logs, generating boilerplate, or working on client code that can’t leave my machine.
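The division of labour above can be written down as a small lookup table. This is an illustrative sketch only: the task categories, the `pick_tool` helper, and the fallback to Claude for unknown task kinds are all assumptions, not part of any of these tools.

```python
# Hypothetical task categories mapped to the tool described above.
ROUTES = {
    "planning": "claude",
    "architecture": "claude",
    "migration-design": "claude",
    "implementation": "codex",
    "refactor": "codex",
    "log-analysis": "local",
    "boilerplate": "local",
}


def pick_tool(task_kind: str, private: bool = False, offline: bool = False) -> str:
    """Choose a tool for a task; privacy and offline needs override the table."""
    if private or offline:
        # Client code that cannot leave the machine always stays local.
        return "local"
    # Assumed default: unknown task kinds go to the strongest reasoner.
    return ROUTES.get(task_kind, "claude")
```

Checking the privacy and offline flags first means a sensitive task can never be routed to a hosted model by a table entry.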
The Four-Layer Model
After several iterations, the system settled into four distinct layers:
| Layer | Tool | Purpose |
|---|---|---|
| Code understanding | Graphify | Structural relationships, dependencies, architecture mapping |
| Project understanding | ClickUp | Tasks, decisions, blockers, rollout state |
| Source of truth | Git | Actual code, history, canonical state |
| Engineering memory | /ai-context docs | Unwritten knowledge, conventions, current problems |
Each layer answers different questions:
- “How does this service work?” → Graphify
- “What are we building this sprint?” → ClickUp
- “What does the code actually do?” → Git
- “Why did we build it this way?” → /ai-context
Practical Lessons Learned
Several specialised AI tools work better than one AI that is expected to do everything. Claude is great at reasoning but expensive for bulk edits. Codex is great at implementation but weaker at architectural exploration. Local models are great for offline access but struggle with complex multi-step problems.
Shared memory matters more than model quality alone. In my experience, a mediocre model with good context outperforms a great model with none. Invest in your memory layers.
Prompt standardisation across tools improves consistency. I use similar prompt templates regardless of which AI I’m talking to. The preamble always includes: current task from ClickUp, relevant architecture from /ai-context, specific files from Graphify mapping.
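A shared preamble of that kind can be sketched as a single function. The section headings and the `build_preamble` name are assumptions for illustration. The three inputs correspond to the ClickUp task, the /ai-context architecture notes, and the files picked out from the Graphify mapping.

```python
def build_preamble(task: str, architecture: str, files: list[str]) -> str:
    """Assemble the same three-part preamble for any of the AI tools."""
    # One bullet per file, with a placeholder when no files are in scope.
    file_lines = "\n".join(f"- {path}" for path in files) or "- (none listed)"
    return (
        "## Current task\n"
        f"{task.strip()}\n\n"
        "## Relevant architecture\n"
        f"{architecture.strip()}\n\n"
        "## Files in scope\n"
        f"{file_lines}\n"
    )
```

Keeping the order fixed means each tool sees the task before the supporting detail, which makes outputs from different models easier to compare.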
Stale context becomes dangerous without a canonical memory system. I’ve had Claude confidently suggest changes based on architecture that no longer existed. The /ai-context files need maintenance like any other documentation.
Local models are useful but still weaker for deep reasoning. I keep trying to use Gemma for more complex tasks, and it keeps disappointing me. Good for autocomplete, grep-like searches, and simple refactors. Not ready for architecture discussions.
What’s Next
The workflow keeps evolving. I’m exploring:
- Automated /ai-context updates triggered by significant commits
- Better ClickUp → Claude context injection via MCP
- Fine-tuning a small local model on project-specific patterns
- Version-controlled prompt templates per project type
The goal isn’t a perfect system — it’s a system that degrades gracefully when any single component fails or hits its limits.
Key Takeaways
- Use Graphify or similar tools to generate shareable code structure context
- Keep project state in a proper project management tool with API access
- Maintain an /ai-context folder for engineering knowledge that doesn’t fit elsewhere
- Match AI tools to their strengths: Claude for reasoning, Codex for implementation, local models for offline/private work
- Update your memory layers regularly — stale context causes more problems than missing context