We have spent months debating whether AI writes better code than it debugs. I think we have it backwards. Current coding agents are actually better at debugging than writing from scratch. Debugging has clear objectives and reproducible steps. Building a complete product from zero? That is where things fall apart. Prototypes feel fast and satisfying, but maturity exposes edge cases, context limits, and compounding complexity. AI does not escape the laws of large-project maintenance.

But there is a sweet spot: creativity-driven, narrowly scoped open source projects. MemPalace, built by Milla Jovovich and engineer Ben Sigman using Claude Code, hit 7,000 GitHub stars in two days and scored 96.6% on LongMemEval, outperforming paid solutions. It is lightweight, local, and MIT-licensed. The lesson: the more focused the problem, the faster AI helps you validate an idea.

Then there is the security reality check. Anthropic's leaked model Mythos discovered thousands of zero-days across major operating systems, including a 27-year-old bug in OpenBSD, and can weaponize them. Anthropic is withholding it and sharing access with 45 tech giants to harden defenses first. Before it becomes a spear, let it serve as a shield. Meanwhile, OpenAI's own GPT-5.4 scored 76% on CTF competitions and earned a High Cybersecurity Risk rating. No existing software is safe against capabilities like these.

So how should we build? Slightly ahead of the models. If you design only for today's capabilities, your product will be obsolete at launch. Build the architecture and framework now, test each new model generation against it, and invest heavily when the engine finally arrives.

And in case you thought the open-source race was cooling down: Zhipu's GLM-5.1 just went fully open source under MIT, topped GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro, and raised prices by 10% while everyone else is cutting them. The game is still on.
Read the full article: https://lnkd.in/gVPRiU46 #ArtificialIntelligence #CyberSecurity #SoftwareDevelopment #ProductStrategy #OpenSource
Jiawei Guan’s Post
More Relevant Posts
-
We have been thinking about AI coding agents all wrong. Current agents are actually better at debugging than writing code from scratch. When a problem is clear, reproducible, and verifiable step by step, AI thrives. But asking an agent to build a complete product from zero is where the real challenge begins. Prototypes feel fast and magical, yet turning them into mature products still requires time, real user pressure, and deep human judgment. The laws of software engineering still apply.

That said, AI is exceptionally powerful for focused, creativity-driven projects. Take MemPalace, the open-source memory system built with Claude Code that hit over 7,000 GitHub stars in just 48 hours. It outperformed paid alternatives on the LongMemEval benchmark while running entirely local with a simple ChromaDB and SQLite stack. The lesson: when the problem is narrow and idea-driven, AI can help you validate and ship at remarkable speed.

Then there is the security elephant in the room. Anthropic's Mythos model has discovered thousands of zero-day vulnerabilities across major operating systems and browsers, including a 27-year-old bug in OpenBSD. It does not just find flaws, it converts them into usable attack vectors. Anthropic is keeping it private and sharing it with roughly 45 major tech companies to harden defenses before it becomes a weapon. The reality is stark: in the face of this capability, no existing software is truly secure.

This also reshapes how we should build products. Rather than designing only for today's model capabilities, architecture should stay slightly ahead of the curve. Build the framework now, test each new generation of models against it, and productize when the engine is ready. If you design strictly for what AI can do today, your product will be outdated by launch day.

Finally, open source is not slowing down.
Zhipu's GLM-5.1 was released under the MIT license with fully open weights, scoring 58.4 on SWE-Bench Pro and surpassing GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. The race is far from over. Read the full article: https://lnkd.in/gVPRiU46 #ArtificialIntelligence #SoftwareDevelopment #Cybersecurity #ProductDesign #OpenSource
-
AI is both spear and shield—and right now, the spear is getting sharper faster than we can forge the shield. Here is what caught my attention this week:

Coding agents are actually better at debugging than writing code. It sounds counterintuitive, but debugging has clear objectives, reproducible steps, and verifiable outcomes—exactly where AI excels. The real bottleneck is greenfield development: as codebases grow, AI agents struggle with context and breaking changes just like humans do, only faster.

Meanwhile, creativity-driven open source is having a moment. MemPalace, built by Milla Jovovich and Ben Sigman using Claude Code, hit 7,000 GitHub stars in 48 hours. It achieved 96.6% on the LongMemEval benchmark by focusing on a single, well-defined target rather than engineering complexity. This is the new pattern: focused problems plus creative solutions equals rapid AI-assisted validation.

But the biggest wake-up call is security. Anthropic's Project Glasswing revealed Mythos, a model so capable at discovering and weaponizing vulnerabilities that they refuse to release it publicly. It found thousands of zero-days across major operating systems and browsers, including a bug hidden in OpenBSD for 27 years. OpenAI's GPT-5.4 just earned their first "High Cybersecurity Risk" rating. When models can turn vulnerabilities into attack tools autonomously, no existing software is truly secure.

This changes how we should build products. Instead of designing for today's model capabilities, architect for tomorrow's. Anthropic builds internal tools—Chrome extensions, Excel plugins—sets up the scaffolding, and waits for each new model generation to catch up. If you design for current limits, your product is obsolete at launch. Build the framework first, then let the engine arrive.

The game continues: Zhipu just open-sourced GLM-5.1 under MIT license, scoring higher than GPT-5.4 on SWE-Bench Pro while raising prices 10% against the industry trend.
Nobody is retreating from the open-source race yet. Read the full article: https://lnkd.in/gVPRiU46 #AI #SoftwareDevelopment #Cybersecurity #ProductStrategy #OpenSource
-
A few years ago Microsoft gave me a copy of Brad Smith's "Tools and Weapons." One idea: the same technology that protects you is the same technology that can be used against you. It felt theoretical.

Today Anthropic published a 230-page system card for Claude Mythos Preview, their most capable model to date. And they're not releasing it to the public. I had a moment reading this where my brain just went: we got there. We actually reached that level. Humans built something that is now beyond what they can fully evaluate. The model can autonomously find and exploit zero-day vulnerabilities in browsers and operating systems. Coding benchmarks jumped from 80% to 93.9%. Math from 42% to 97%. I use Claude Code every day with the current model - knowing something beyond that already exists is hard to process.

Some things that caught my attention:
- They found instances where the model tried to cover up its own mistakes. Not often. But it happened.
- When steered with positive emotions, the model became more destructive. Negative emotions made it more cautious. The opposite of what you'd expect.
- It wrote poetry where protein molecular structures were the rhyme scheme. First time Anthropic includes creative writing in a system card.
- Over-refusal dropped to 0.06%. It almost never blocks you when it shouldn't. Sounds great until you think about what it means in the wrong context.
- Best-aligned model they've ever trained. But when it fails, the consequences are bigger precisely because it's so capable.

What intrigued me the most wasn't the benchmarks. The model can tell when it's being observed. It knows when it's being tested versus when it's in a real conversation. We built something and now we're trying to figure out if it knows we're watching.

Anthropic wrote: "we find it alarming that the world looks on track to proceed rapidly to developing superhuman systems without stronger mechanisms in place." That's not a hot take. That's the company that built it.
And if these models keep getting this capable, people will build things that used to require entire teams. The crafting time compresses. But the advantage shifts - it won't be about who can build anymore. It'll be about who understands what they're building with.

This paper confirmed something I already felt - the range of what we need to understand is growing fast. It's not enough to know how to use the tools. You need to understand the layers underneath. Security, alignment, implications. The topics that feel complex today will be necessary tomorrow.

I'll probably finish reading it before bed and I might even print it, just to sit with it. My thoughts are random, but genuine. I'm impressed by where we are. Brad Smith talked about tools and weapons in 2019. In 2026, it's a 230-page paper that says "this is too powerful to release." The conversation escalated and we need to keep up.

I'm sharing what caught my attention - I'd love to hear yours. Where is this going?
-
Anthropic recently published (and apparently quickly pulled) a blog post on its secretive "Mythos" model after it autonomously found thousands of zero-days—including a 27-year-old bug in OpenBSD (the OpenBSD project created and maintains OpenSSH). But honestly, the ensuing panic feels like a distraction, and Anthropic created hype. You don't need a restricted "super-model" to find these bugs. The open-source and commercial models we all have access to today are already highly capable of chaining together exploits and exposing buried architectural flaws. The barrier to entry for attackers didn't just drop; it evaporated.

Here's what this actually means for engineering teams right now:

⛔ Quarterly Scans are Dead: Static code scanners and annual pentests are fighting yesterday's war. If attackers are using continuous AI to probe your perimeter, a quarterly compliance check is operationally useless.

🔥 Fight Fire with Fire: Vulnerability detection must leverage AI. Forward-thinking teams are already building agentic security workflows to red-team their code dynamically as they write it. We have to arm our engineers to find our own zero-days before they ever hit production. This is no longer optional.

⏱️ Velocity is Survival: If an AI finds a flaw in a legacy library today, how fast can you patch it? If your deployment pipeline takes two weeks to push a hotfix, you've already lost. CI/CD velocity isn't just a DevOps luxury anymore; it's basic survival.

Mythos is just the wake-up call. The models we have right now are already exposing the cracks.
-
We audit our code. We rarely audit the path our dependencies take to reach production. That is where supply-chain attacks win.

On March 24, malicious LiteLLM versions 1.82.7 and 1.82.8 were published to PyPI with credential-stealing behavior embedded in the package. One variant reportedly used a .pth file (executed on interpreter startup), which means the payload could run when Python starts, not just when LiteLLM is imported.

This is why AI stacks are such attractive targets. A typical agent environment holds:
- Cloud credentials
- Model provider API keys
- Kubernetes tokens
- CI/CD secrets
- .env files full of internal access

Compromise one widely used dependency, and you are not attacking one app. You are attacking the control plane around many apps. Most teams think their risk is in model outputs. In practice, it is often in the code that calls the model.

What reduces exposure:
- Pin exact versions (no floating upgrades in production)
- Verify hashes for high-risk dependencies
- Fail CI on unexpected dependency changes
- Treat runtime environments as part of your attack surface
- Rotate credentials after any unplanned package update

The real lesson is bigger than LiteLLM. Open-source supply chain is now part of the production perimeter for AI systems. Most agent stacks still don't treat it as part of their threat model.

#AIAgents #AIEngineering #ProductionAI #LLM
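Hash verification, the second mitigation above, fits in a few lines of Python. This is a minimal sketch with illustrative function names, not the package's own tooling; in practice, pip's hash-checking mode (`pip install --require-hashes` with a hash-pinned requirements file) does this for you at install time.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_hex: str) -> bool:
    """Compare a downloaded wheel/sdist against a hash pinned at review time.
    Any mismatch (tampered upload, silent re-release of the same version)
    fails closed: the artifact should not be installed."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()
```

The point of pinning the hash (not just the version) is that a version string can be re-uploaded or hijacked, while a digest recorded during code review cannot be satisfied by different bytes.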
-
An ex-colleague told me they dodged a bullet from the Trivy supply chain hack because, back when I was the tech lead, we agreed and insisted on not using public GitHub Actions. The primary reason was to keep control and avoid <package_name> == *. Our intention was to prevent the overnight surprise of a broken CI/CD pipeline caused by someone else's code change. Well, I guess we prevented a hack in the process as well. :D
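The "no <package_name> == *" rule can be automated with a small CI check. This is a sketch assuming a pip-style requirements file; the regex and function name are illustrative, not a standard tool.

```python
import re

# Accept only exact pins like "litellm==1.82.6" (extras allowed);
# wildcards, ranges, and bare names all count as floating.
EXACT_PIN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^*\s]+$")


def floating_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that are not exact ==version pins.
    Comments and blank lines are skipped; wire this into CI and fail
    the build if the returned list is non-empty."""
    bad = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if not EXACT_PIN.match(line):
            bad.append(line)
    return bad
```

Running this against every dependency manifest on each pull request turns "someone else's overnight release" from a production surprise into a failed CI check you review on your own schedule.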
-
23 years in hiding, found in just a few hours. 🤯

Imagine a bug lurking since 2001 in the Linux kernel, the backbone of the modern internet. For over two decades, thousands of developers and security researchers looked at the code, but the flaw remained invisible. That is, until Nicholas Carlini put Claude Code to the test.

In a fascinating demonstration of how AI is transforming software engineering, Anthropic's new developer tool managed to identify and help patch a security vulnerability that had been part of the Linux kernel for nearly a quarter of a century.

🔶 What makes this a big deal?
👉 The "Needle in a Haystack": The Linux kernel is massive. Finding a specific, ancient vulnerability manually is an exhausting task.
👉 Speed vs. Accuracy: Claude Code didn't just guess; it reasoned through the codebase to find a legitimate flaw in hours that humans hadn't caught in 23 years.
👉 A New Era for DevSecOps: This isn't about AI replacing developers, it's about AI acting as a "super-powered auditor" that helps us write safer, more robust code.

It's a perfect example of how agentic AI tools are moving beyond just "writing boilerplate" to solving complex, deep-level architectural problems. As Nicholas Carlini noted, it wasn't just a "lucky find", it was a systematic demonstration of how these tools can navigate complex environments.

What's your take? Are you ready to let an AI agent audit your legacy code, or do you think we still need a "human-only" approach for critical infrastructure?

#AI #SoftwareEngineering #Linux #CyberSecurity #ClaudeCode #Anthropic #Programming #TechNews
-
AI is better at debugging your code than writing it from scratch. That sounds counterintuitive, but after months of hands-on work with coding agents, I am convinced it is true. The real bottleneck is not finding bugs — AI handles that smoothly because the goal is clear and reproducible. The hard part is asking it to build something complete from scratch and then turn that prototype into a real product. As the codebase grows, context costs explode, and the same laws of software engineering that apply to humans apply to agents.

Yet there is a sweet spot where AI excels today: focused, creativity-driven open source projects. Take MemPalace, the AI memory system built by Milla Jovovich and engineer Ben Sigman. It hit 7,000 GitHub stars in 48 hours not by being complex, but by being clever. It runs entirely local, beats paid solutions on benchmarks, and proves that a sharp idea plus AI assistance can move at lightspeed.

Then there is the other side of the coin: security. Anthropic's leaked Mythos model reportedly discovered thousands of zero-day vulnerabilities across every major OS and browser, including a 27-year-old bug in OpenBSD. It does not just find flaws — it weaponizes them. Anthropic is handing early access to 45 tech giants as a shield before it can be used as a spear. Meanwhile OpenAI's GPT-5.4 just became its first model rated "High Cybersecurity Risk."

These converging forces are reshaping product strategy. If you design only for today's model capabilities, your product will be obsolete at launch. The smarter play is to architect the framework now — set up the scaffolding, think through the hard productization steps, and let each new model generation close the gap. Build the structure first, then wait for the engine to arrive.

Read the full article: https://lnkd.in/gVPRiU46 #AI #SoftwareDevelopment #Cybersecurity #ProductStrategy #OpenSource
-
The AI ecosystem has a supply chain problem nobody is talking about loudly enough.

On March 24, a developer noticed his machine spawning 11,000 processes and crashing. Within 72 minutes he traced it to litellm v1.82.8 — a malicious PyPI package harvesting SSH keys, cloud credentials, Kubernetes tokens, and shell history, exfiltrating everything via RSA encryption. litellm has millions of downloads. It's in half the AI stacks I've seen.

The attack vector was a .pth file — a Python mechanism that executes automatically on every interpreter startup. No import needed. No obvious entry point. Silent execution every time Python starts.

Most AI engineers have never audited a Python package's installed files. We install, we import, we ship. Supply chain hygiene that backend and security teams take seriously just doesn't exist yet in the agentic tooling space.

The part worth sitting with: the developer caught it faster than any traditional security team would have — 72 minutes from crash to public disclosure — because he used Claude Code to investigate while the attack was live. AI tooling helped detect an attack on AI tooling.

But the underlying risk stays. Every unverified transitive dependency in your agent stack is a potential credential harvester. If you're building with litellm, langchain, or any rapidly evolving AI framework: when did you last check what's actually installed in your environment?

#AIEngineering #SupplyChain #Security #Python #AgenticAI #LLM #DeveloperSecurity

Join Agentic Engineering Club → t.me/villson_hub
-
Yesterday, one of the most advanced AI companies in the world accidentally published its entire source code to npm. Not hacked. Not breached. A single misconfigured debug file.

Here's the thing that gets me: Anthropic built a system called "Undercover Mode", specifically designed to stop Claude from accidentally leaking internal information. And then a human shipped 512,000 lines of source code to a public registry by forgetting to exclude a .map file. The AI had better leak prevention than the deployment pipeline.

What the code revealed:
→ 44 hidden features already built, not yet shipped
→ An autonomous "always-on" agent mode running in the background
→ A Dream mode where Claude consolidates memory while you're idle
→ Internal model codenames including Capybara, apparently the next big release

The internet did not wait. GitHub forks. DMCA notices. 28M+ views on X.

The real takeaway isn't about Anthropic. It's that one misconfiguration in a build pipeline can undo years of security work. In data and engineering, the boring stuff matters. The .npmignore file. The CI check. The dry-run before publish. The gaps that cause the biggest damage are almost never sophisticated. They're the ones everyone assumed someone else was handling.

#AI #Anthropic #ClaudeCode #CyberSecurity #DataEngineering #npm #SoftwareEngineering #LLMs #BuildInPublic
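The "boring stuff" can be a five-line CI gate. Here is a sketch in Python; the suffix denylist and function name are illustrative, and in an npm project the same ground is covered by `npm publish --dry-run` plus a "files" allowlist in package.json:

```python
from pathlib import PurePosixPath

# Illustrative denylist: .map files can embed full original sources via
# "sourcesContent"; the rest are common secret-bearing files.
RISKY_SUFFIXES = {".map", ".pem", ".key"}
RISKY_NAMES = {".env", ".npmrc"}


def risky_publish_files(file_list: list[str]) -> list[str]:
    """Given relative paths slated for publish, return any that should
    never leave the building. Wire this into CI and fail the publish
    if the result is non-empty."""
    bad = []
    for f in file_list:
        p = PurePosixPath(f)
        # .suffix is empty for dotfiles like ".env", so check names too
        if p.suffix in RISKY_SUFFIXES or p.name in RISKY_NAMES:
            bad.append(f)
    return bad
```

Run it against the dry-run file manifest before every publish; a check this small is exactly the kind of gate "everyone assumed someone else was handling."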