AI field note: AI is great at writing code, but that's just one part of building software. At PwC we accelerate the whole lifecycle, end to end. Let's dive in. Enterprise software development is more complex than ever. As systems expand and intertwine, complexity compounds: documentation lags, business requirements drift from implementation, and technical debt piles up. While code-focused AI assistants have emerged, they address only a fraction of the challenge, missing the full context and scale that enterprise applications demand. Enter PwC Code Intelligence, a capability that redefines how enterprises understand, maintain, and evolve their software. Code Intelligence sees the big picture: by treating source code as the single source of truth and combining compiler techniques with generative AI, it builds a deep, contextual understanding of your entire software system. This understanding powers a suite of specialized AI agents working in concert to tackle engineering challenges at scale: 🧩 The Context Service forms the foundation, maintaining total recall of every line of your enterprise codebase and its interconnections. 📖 DocGen automatically keeps documentation accurate and up to date as your code evolves. ✅ ReqGen ensures business requirements remain aligned with implementation throughout development. 🧪 TestGen builds comprehensive test suites that validate both technical and business requirements. ⚙️ CodeGen implements features and modernizes code with a deep understanding of your enterprise patterns. Let's connect the dots. Consider modifying a mission-critical payment system: Context Service provides every agent with a complete understanding of database dependencies, compliance requirements, and business logic. DocGen updates documentation instantly, ReqGen verifies requirements alignment, TestGen ensures full test coverage, and CodeGen implements changes while maintaining enterprise standards.
What once took weeks of careful coordination now happens automatically—with enterprise-grade quality assured. Early adopters of Code Intelligence are seeing remarkable results: ✈️ A major U.S. airline achieved 50% efficiency gains in modernizing a critical legacy application. Code Intelligence delivered clarity on business rules, regulatory compliance, and code relationships—accelerating development while reducing costs. 📞 A leading telecom provider used Code Intelligence to migrate mission-critical data management applications from mainframes. AI-driven insights mapped complex dependencies, generated documentation, and automated migration scripts—cutting months of manual effort while improving quality. 💡 PwC's own Commercial Technology & Innovation team processed over 15 million lines of code, achieving documentation and test coverage levels beyond traditional capabilities. We couldn't be more excited by the opportunity and impact from Code Intelligence so far. We're ready to do more. Drop me a line if interested.
Using Code Generators for Reliable Software Development
Explore top LinkedIn content from expert professionals.
Summary
Using code generators for reliable software development means automating parts of the coding process with tools or AI that produce code, aiming to make software creation faster, more consistent, and less error-prone. While these tools can rapidly generate code and tests, ensuring quality and reliability still requires thoughtful workflows, proper context management, and thorough validation.
- Test generated code: Always run real execution tests, not just manual reviews, to confirm AI-generated code actually works in production scenarios.
- Manage context wisely: Keep coding agents focused by providing the right information at the right time, avoiding overload or irrelevant details.
- Review and refine: Regularly check generated code for issues like duplication and security vulnerabilities, and make adjustments before deploying.
-
Most AI coders (Cursor, Claude Code, etc.) still skip the simplest path to reliable software: make the model fail first. Test-driven development turns an LLM into a self-correcting coder. Here’s the cycle I use with Claude (works for Gemini or o3 too): (1) Write failing tests – “generate unit tests for foo.py covering logged-out users; don’t touch implementation.” (2) Confirm the red bar – run the suite, watch it fail, commit the tests. (3) Iterate to green – instruct the coding model to “update foo.py until all tests pass. Tests stay frozen!” The AI agent then writes, runs, tweaks, and repeats. (4) Verify + commit – once the suite is green, push the code and open a PR with context-rich commit messages. Why this works: -> Tests act as a concrete target, slashing hallucinations -> Iterative feedback lets the coding agent self-correct instead of over-fitting a one-shot response -> You finish with executable specs, cleaner diffs, and auditable history I’ve cut debugging time in half since adopting this loop. If you’re agentic-coding without TDD, you’re leaving reliability and velocity on the table. This and a dozen more tips for developers building with AI in my latest AI Tidbits post https://lnkd.in/gTydCV9b
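The red-to-green loop above can be sketched in miniature. This is a hedged illustration, not the author's exact setup: `render_page` and the session shape are hypothetical, and in a real repo the tests would live in their own `test_foo.py`, committed while still failing and frozen while the agent edits the implementation:

```python
# Self-contained sketch of the TDD loop for an AI coding agent.
# In practice the tests are committed at the "red bar" stage and the
# agent may run them but never edit them; only foo.py changes.

def render_page(session):
    # The "green" implementation the agent converges on after iterating.
    if session is None or session.get("user") is None:
        return "redirect:/login"
    return "dashboard"

# Frozen executable spec: a concrete target that slashes hallucinations.
def test_logged_out_user_is_redirected():
    assert render_page(None) == "redirect:/login"
    assert render_page({"user": None}) == "redirect:/login"

def test_logged_in_user_sees_dashboard():
    assert render_page({"user": "alice"}) == "dashboard"

if __name__ == "__main__":
    test_logged_out_user_is_redirected()
    test_logged_in_user_sees_dashboard()
    print("all tests green")
```

The key design choice is the asymmetry: the human owns the tests, the agent owns the implementation, and "all tests pass" is the only exit condition.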
-
We built a workflow to actually test AI-generated code and assess how good each model is at writing Weaviate code in Python. The workflow: - Give an LLM a coding task - Take the generated code and run it in a sandboxed Docker container, against a cloud-based Weaviate instance - Binary result: either it works or it doesn't. No human judgment calls. No "this looks about right." Just execution. This is scalable, too. We set this up some time ago, and we were able to simply update the repo to test the recent Claude 4, Gemini 2.5 Pro & GPT-5 models. Running this on 14 Weaviate Python tasks (successful executions out of 14), the best performers are: • Gemini 2.5 Pro: 14/14 ✅ • Claude Sonnet 4: 13/14 • Claude 3.5 Haiku: 11/14 • Claude Opus 4: 11/14 • GPT-5: 10/14 • Gemini 2.5 Flash: 8/14 Why this matters: most AI code evaluation is subjective - does it look right? Does it follow patterns? But for production systems, the only question that matters is: does it actually run? This approach scales. You can test hundreds of code samples automatically instead of manually reviewing each one. The Docker sandbox keeps everything safe while giving you real execution results. This also helps us write better contextual examples, which can be copy/pasted or used as part of your context to generate amazing Weaviate code in seconds! See our learnings: https://lnkd.in/eEbvuqqm Execution testing reveals gaps that code review misses. The workflow is simple but changes how you think about AI code reliability. Instead of trusting generated code based on appearance, you get actual proof that it works. Resources: Repo: https://lnkd.in/e6XHww66 Are you testing AI-generated code execution, or just reviewing it manually? #AI #CodeGeneration #Testing #DevOps
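A minimal sketch of this kind of binary execution harness follows. The image name, timeout, and file layout are assumptions for illustration, not the authors' exact repo setup:

```python
import os
import subprocess
import tempfile

def sandbox_cmd(workdir: str, image: str) -> list:
    """Build the docker invocation: run /work/task.py in a throwaway container."""
    return ["docker", "run", "--rm", "-v", f"{workdir}:/work", image,
            "python", "/work/task.py"]

def run_in_sandbox(code: str, image: str = "python:3.11-slim",
                   timeout: int = 60) -> bool:
    """Binary verdict: True iff the generated snippet exits 0 in the sandbox."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "task.py"), "w") as f:
            f.write(code)
        try:
            result = subprocess.run(sandbox_cmd(tmp, image),
                                    capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False  # hung code counts as a failure
        return result.returncode == 0

# Scoring a model is then just counting successes, e.g.:
# passed = sum(run_in_sandbox(generate(task)) for task in tasks)
# print(f"{passed}/{len(tasks)} tasks passed")
```

The `--rm` flag and the isolated temp directory keep each run disposable, so hundreds of samples can be scored without manual cleanup or review.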
-
After 2,000+ hours using Claude Code across real production codebases, I can tell you that the thing separating reliable from unreliable isn't the model, the prompt, or even the task complexity. It's context management. About 80% of the coding-agent failures I see trace back to poor context: either too much noise, the wrong information loaded at the wrong time, or context that has drifted from the actual state of the codebase. Even with a 1M-token window, Chroma's research shows that performance degrades as context grows. More tokens are not always better. I built the WISC framework (inspired by Anthropic's research) to handle this systematically. Four strategy areas: W - Write (externalize your agent's memory) - Git log as long-term memory with standardized commit messages - Plan in one session, implement in a fresh one - Progress files and handoffs for cross-session state I - Isolate (keep your main context clean) - Subagents for research (90.2% improvement per Anthropic's data) - Scout pattern to preview docs before committing them to main context S - Select (just in time, not just in case) - Global rules (always loaded) - On-demand context for specific code areas - Skills with progressive disclosure - Prime commands for live codebase exploration C - Compress (only when you have to) - Handoffs for custom session summaries - /compact with targeted summarization instructions These work on any codebase, not just greenfield side projects! I've applied this on enterprise codebases spanning multiple repositories, and the reliability improvement is consistent. I also just published a YouTube video going over the WISC framework in much more detail. It's packed with value! Check it out here: https://lnkd.in/ggxxepik
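The "W" (Write) strategy is the easiest to mechanize. A minimal sketch of a cross-session handoff file, with an invented format and field names for illustration, might be:

```python
# Externalizing agent memory: persist session state to a small handoff file
# so a fresh session can resume without replaying stale context.
# The schema (done / next_steps / open_questions) is an illustrative assumption.
import datetime
import json

def write_handoff(path, done, next_steps, open_questions):
    """Persist cross-session state: what happened, what's next, what's unclear."""
    handoff = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "done": done,                      # what this session accomplished
        "next_steps": next_steps,          # what the next session should do first
        "open_questions": open_questions,  # anything unresolved or ambiguous
    }
    with open(path, "w") as f:
        json.dump(handoff, f, indent=2)
    return handoff

def read_handoff(path):
    """Load the previous session's state to prime a fresh, clean context."""
    with open(path) as f:
        return json.load(f)
```

Loading only this summary into a new session keeps the main context clean while preserving continuity, which is the whole point of planning in one session and implementing in a fresh one.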
-
Several comprehensive studies including O’Reilly’s Playbook for Large Language Model Security, the 2025 State of Software Delivery report, and GitClear’s 2025 AI Copilot Code Quality report conclude that companies have started using AI for #coding too soon. A general conclusion: “LLMs are not #software engineers; they are like interns with goldfish memory. They’re great for quick tasks but terrible at keeping track of the big picture.” “As reliance on #AI increases, that big picture is being sidelined. Ironically, by certain accounts, the total developer workload is increasing—the majority of developers spend more time debugging AI-generated code and resolving security vulnerabilities.” “AI output is usually pretty good, but it’s still not quite reliable enough,” says another. “It needs to be a lot more accurate and consistent. Developers still always need to review, debug, and adjust it.” One problem: “AI tools tend to duplicate code, missing opportunities for code reuse and increasing the volume of code that must be maintained.” GitClear’s report “analyzed 211 million lines of code changes and found that in 2024, the frequency of duplicated code blocks increased eightfold.” “In addition to piling on unnecessary technical debt, cloned code blocks are linked to more defects—anywhere from 15% to 50% more.” While larger context windows will help, “they’re still insufficient to grasp full software architectures or suggest proper refactoring.” One CEO says: “AI tools often waste more time than they save for areas like generating entire programs or where broader context is required. The quality of the code generated drops significantly when they’re asked to write longer-form routines.” “Hallucinations still remain a concern. AI doesn’t just make mistakes—it makes them confidently. It will invent open-source packages that don’t exist, introduce subtle security vulnerabilities, and do it all with a straight face.” “Security vulnerabilities are another issue. 
AI-generated code may contain exploitable flaws.” Furthermore, AI agents often “fail to find root cause, resulting in partial or flawed solutions:” “Agents pinpoint the source of an issue remarkably quickly, using keyword searches across the whole repository to quickly locate the relevant file and functions—often far faster than a human would. However, they often exhibit a limited understanding of how the issue spans multiple components or files, and fail to address the root cause, leading to solutions that are incorrect or insufficiently comprehensive.” Solutions include better training data, more testing to validate AI outputs, progressive rollouts, and greater use of finely tuned models. The bottom line for some: “AI-generated code isn’t great—yet. But if you’re ignoring it, you’re already behind. The next 12 months are going to be a wild ride.” #technology #innovation #artificialintelligence #hype
-
Are you tired of the manual "click-marathon" every time you need to create a new set of tables, domains, or data elements? Whether you are setting up a demo, a tutorial, or a large-scale project, manual DDIC creation is a time-sink. In my latest article, I dive deep into how to leverage the XCO Library to build a reusable generator that handles the heavy lifting for you. What’s inside: 🛠️ The Foundation: Using ABAP OO, Factories, and Injectors to build a testable generation framework. 🛠️ Smart Configuration: How to derive packages and transport requests automatically from the calling environment. 🛠️ Object Generation: Step-by-step implementation for: Domains & Data Elements: Automated type mapping and label generation. 🛠️ CDS Abstract Entities: Handling complex field lists and semantics (like Currency/Quantity annotations). 💡 By moving from manual creation to a Configuration-as-Code approach, you not only save time but also ensure consistency across your development landscape. Plus, it’s fully ABAP Cloud compatible. Read the full technical breakdown and get the source in the comments 👇 #ABAP #SAP #ABAPCloud #Automation #SoftwareEngineering
-
Achieving 3x-25x Performance Gains for High-Quality, AI-Powered Data Analysis Asking complex data questions in plain English and getting precise answers feels like magic, but it's technically challenging. One of my jobs is analyzing the health of numerous programs. To make that easier, we are building an AI app with Sapient Slingshot that answers natural language queries by generating and executing code on project/program health data. The challenge is that this process needs to be both fast and reliable. We started with gemini-2.5-pro, but 50+ second response times and inconsistent results made it unsuitable for interactive use. Our goal: reduce latency without sacrificing accuracy. The New Bottleneck: Tuning "Think Time" Traditional optimization targets code execution, but in AI apps the real bottleneck is LLM "think time", i.e., the delay in generating correct code on the fly. Here are some techniques we used to cut think time while maintaining output quality: ① Context-Rich Prompts Accuracy starts with context. We dynamically create prompts for each query: ➜ Pre-Processing Logic: We pre-generate any code that doesn't need "intelligence" so that the LLM doesn't have to ➜ Dynamic Data-Awareness: Prompts include the full schema, sample data, and value stats to give the model a complete view. ➜ Domain Templates: We tailor prompts for a specific ontology, like "Client satisfaction", "Cycle Time", or "Quality". This reduces errors and latency, improving codegen quality on the first try. ② Structured Code Generation Even with great context, LLMs can output messy code. We guide query structure explicitly: ➜ Simple queries: Direct the LLM to generate a single-line chained pandas expression. ➜ Complex queries: Direct the LLM to generate two lines, one for processing and one for the final result. Clear patterns ensure clean, reliable output.
③ Two-Tiered Caching for Speed Once accuracy was reliable, we tackled speed with intelligent caching: ➜ Tier 1: Helper Cache – 3x Faster ⊙ Find a semantically similar past query ⊙ Use a faster model (e.g. gemini-2.5-flash) ⊙ Include the past query and code as a one-shot prompt This cut response times from 50+s to <15s while maintaining accuracy. ➜ Tier 2: Lightning Cache – 25x Faster ⊙ Detect duplicates for exact or near matches ⊙ Reuse validated code ⊙ Execute instantly, skipping the LLM This brought response times to ~2 seconds for repeated queries. ④ Advanced Memory Architecture ➜ Graph Memory (Neo4j via Graphiti): Stores query history, code, and relationships for fast, structured retrieval. ➜ High-Quality Embeddings: We use BAAI/bge-large-en-v1.5 to match queries by true meaning. ➜ Conversational Context: Full session history is stored, so prompts reflect recent interactions, enabling seamless follow-ups. By combining rich context, structured code, caching, and smart memory, we can build AI systems that deliver natural language querying with the speed and reliability that we, as users, expect of it.
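The two cache tiers described above can be sketched roughly as follows. The embedding function, threshold, and normalization are illustrative assumptions, not the team's actual implementation:

```python
# Two-tiered cache sketch: check the exact-match "lightning" tier first,
# then fall back to a semantic "helper" tier that supplies a one-shot
# example for a faster model. Threshold and key scheme are assumptions.
import hashlib

def _cos(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

class CodegenCache:
    def __init__(self, embed, threshold=0.9):
        self.embed = embed       # query -> vector (e.g. a bge-large-en-v1.5 wrapper)
        self.threshold = threshold
        self.exact = {}          # normalized-query hash -> validated code
        self.semantic = []       # (vector, query, code) triples

    @staticmethod
    def _key(query):
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def lookup(self, query):
        # Tier 2 "lightning" path: exact/near duplicate -> reuse validated
        # code and skip the LLM entirely (~2s responses for repeats).
        code = self.exact.get(self._key(query))
        if code is not None:
            return ("lightning", code)
        # Tier 1 "helper" path: the most similar past (query, code) pair
        # becomes a one-shot prompt for a faster model like gemini-2.5-flash.
        qv = self.embed(query)
        best = max(self.semantic, key=lambda t: _cos(qv, t[0]), default=None)
        if best is not None and _cos(qv, best[0]) >= self.threshold:
            return ("helper", (best[1], best[2]))
        return ("miss", None)

    def store(self, query, code):
        # Only code that executed successfully should be cached as "validated".
        self.exact[self._key(query)] = code
        self.semantic.append((self.embed(query), query, code))
```

The ordering matters: the cheap hash check runs before any embedding call, so repeated queries never pay for semantic search, let alone an LLM round trip.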
-
In this research, funded by a grant that I received and published in a Q1 journal (Impact Factor 6.5, CiteScore 13.6, h5-index: 148, h5-median: 211), my SJSU College of Information, Data and Society student, Neha Bais Thakur, and I systematically evaluate and interpret 15 open-source models (Code LLaMa, Granite Code, DeepSeek-Coder-V2, Yi-Coder, etc.) on code translation and generation, uncovering insights into reliability and prompt sensitivity. The study provides interpretability insights into the LLM-driven code translation process, the first of its kind for coding LLMs on translation tasks. We demonstrate that LLMs can not only translate code but also comprehend, explain, and correct their own errors when provided with compiler feedback. This led to an average improvement of 10% in the CCR metric and 8% in the Pass@1 metric. Key contributions: + Comprehensive performance benchmarking with CodeBLEU, chrF, METEOR, and Pass@k. + Interpretability analysis using Feature Ablation and Shapley Value Sampling, uncovering prompt sensitivities. + Insights into reliability, error correction, and practical deployment of open-source LLMs for software engineering. We also showed that line-by-line interpretability heatmaps can be a valuable tool for guiding LLM output toward a desired outcome. We hope this study helps researchers and practitioners better understand the capabilities and limitations of code-focused LLMs. https://lnkd.in/gw3YW9eJ #AI #LLMs #CodeGeneration #XAI #SoftwareEngineering
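For reference, Pass@k is conventionally computed with the unbiased estimator from the Codex evaluation literature (Chen et al., 2021); whether this study used exactly that formulation is an assumption. A minimal sketch:

```python
# pass@k = 1 - C(n - c, k) / C(n, k): the probability that at least one
# of k sampled completions is correct, given n samples of which c passed.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n generated samples with c correct."""
    if n - c < k:
        return 1.0  # too few incorrect samples: every size-k draw has a pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```

So a reported 8% Pass@1 improvement means the per-sample probability of a correct completion rose by that margin.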
-
The Evolution of Code: From Manual Craft to AI Collaboration Software development is changing rapidly! We're seeing three main approaches emerge: Manual Coding: The traditional way. Developers write code line-by-line, requiring deep understanding and control. Great for complex logic & robust systems, but can be slow. Debugging is efficient because you know the code inside out. Vibe Coding: Using AI (LLMs) to generate code from natural language prompts. Fast for quick prototypes and accessible to non-coders. However, understanding the generated code is limited, making debugging a significant challenge. Hybrid Coding: The best of both worlds. Developers use AI tools for assistance (code completion, generation, suggestions) but remain actively involved in reviewing, understanding, and refining the code. Why Hybrid is Gaining Traction: Hybrid coding boosts productivity by automating repetitive tasks while keeping the human developer firmly in the driver's seat. Crucially, developer understanding of the logic is maintained. This is KEY for effective debugging. When issues arise, a developer who understands the (even AI-assisted) code can quickly identify root causes and implement robust fixes, unlike the "black box" problem of pure vibe coding. Hybrid coding isn't just about speed; it's about smarter, more maintainable, and more debuggable code. It feels like the most practical path forward, blending AI power with essential human expertise. Comments? #coding #softwaredevelopment #AI #LLMs #hybridcoding #vibe #manualcoding #programming #debugging #futureoftech #Aryaka
-
With Google announcing that 25% of their shipped code last quarter was AI-generated, it's clear: AI isn't just coming to software development — it's already here. At LaunchWare, we've been thoughtfully integrating AI tools like Claude, ChatGPT, and GitHub Copilot into our development workflow. We're not using AI to replace developers or churn out complete applications. It's simply not there (yet?), and all generated code requires a good deal of informed, human intervention. Instead, we're leveraging these tools in focused, strategic ways: - 🤔 Architectural Rubber Duck: AI makes an excellent sounding board for system design discussions. It helps surface edge cases we might have missed and challenges our assumptions. This is my favorite application of the technology so far. - 🔍 Syntax Refinement: When working with complex regular expressions or tricky SQL queries, AI helps us polish our approach — though we always verify the output. - ✅ Unit Test Foundations: We're exploring AI for generating baseline test cases, which our developers then enhance and verify. The real question for technology leaders isn't whether to adopt AI — it's how to do it responsibly. Here are three key considerations we've identified: - Define clear usage guidelines (What types of code can be AI-generated? What requires human review?) - Establish verification protocols (How do we validate AI-generated code?) - Create documentation requirements (How do we track what's AI-assisted vs. human-written?) Fellow technology leaders: How are you approaching AI in your development workflow? What policies have you put in place? This is all so new - curious how other teams are approaching AI assisted coding. #SoftwareDevelopment #AI #TechnologyLeadership #SDLC #CodeQuality