Amazon’s New AI Agent Just Booked My Entire Flight
Imagine you spent hours on a browser automation, and then a developer renamed a single CSS class and broke the whole thing. That’s the “Selector Tax” in action.
We spend roughly 80% of our time acting as “CSS janitors,” babysitting scripts that break the moment a developer moves a button by 10 pixels. We’ve all been there: the tests are green locally, but the moment they hit the “real world” of dynamic IDs and shadow DOMs, they fall apart.
It’s frustrating, it’s expensive, and honestly? It’s boring. We should be building features, not arguing with XPaths.
The Problem: Why Traditional Automation Breaks
Traditional tools like Selenium or Playwright are “blind.” They don’t “see” the website; they just hunt for a specific string in the code. If a site uses React or Vue and generates dynamic class names, your automation script is basically guessing.
This leads to “flaky” tests, the kind that pass 8 out of 10 times for no reason, eroding trust in the whole system until your team eventually just ignores the failures.
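To make the “hunting for a string” problem concrete, here is a minimal sketch using only Python’s standard library. The two page builds and their class names (btn_x7f3a, btn_q91zc) are made up for illustration, and the label lookup is only a crude stand-in for matching on what a user reads. It is not how a vision model works.

```python
from html.parser import HTMLParser


class ButtonFinder(HTMLParser):
    """Collects (class attribute, visible text) for every <button>."""

    def __init__(self):
        super().__init__()
        self.buttons = []        # list of (class string, visible text)
        self._in_button = False
        self._class = ""
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self._in_button = True
            self._class = dict(attrs).get("class") or ""
            self._text = []

    def handle_data(self, data):
        if self._in_button:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "button" and self._in_button:
            self.buttons.append((self._class, "".join(self._text).strip()))
            self._in_button = False


def find_buttons(html):
    parser = ButtonFinder()
    parser.feed(html)
    parser.close()
    return parser.buttons


def found_by_class(html, class_name):
    """Selector-style lookup: exact match on a class token in the markup."""
    return any(class_name in cls.split() for cls, _ in find_buttons(html))


def found_by_label(html, label):
    """Label-style lookup: match on the text a user sees on the button."""
    return any(text.lower() == label.lower() for _, text in find_buttons(html))


# Two hypothetical builds of the same page; only the generated class differs.
build_a = '<form><button class="btn_x7f3a">Log in</button></form>'
build_b = '<form><button class="btn_q91zc">Log in</button></form>'

print(found_by_class(build_a, "btn_x7f3a"))  # True
print(found_by_class(build_b, "btn_x7f3a"))  # False: same button, new class name
print(found_by_label(build_b, "log in"))     # True: the label did not change
```

Nothing about the button changed for the user between the two builds, yet the class-based lookup silently stops finding it. That gap between what the markup says and what the user sees is where the flakiness comes from.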
Enter Nova Act: Automation with “Eyes”
Amazon Nova Act is a shift from telling a machine where to click to telling it what to do. It’s powered by a “computer use” model that actually looks at the screen.
Instead of writing: await page.click('.submit-button-01-final-v2')
You just say: nova.act("Click the login button")
Because the AI understands the intent and the visual layout, it doesn’t care if the button’s ID changed or if it moved to a different corner of the screen.
Why It’s Actually Reliable (The 90% Factor)
You’ll see a “90% reliability” stat thrown around. Here’s why it isn’t just a marketing number: Nova Act wasn’t just trained on the internet; it was trained in “web gyms.”
These are simulated enterprise environments (fake CRMs, travel sites, and HR portals) where the AI plays “web chess” against itself millions of times. It learned how to handle pop-ups, cookie banners, and loading spinners by failing in a simulator until it got it right.
Real World Examples: No More Demos
This isn’t just for toy projects. Real teams are using it to solve actual headaches:
The Human in the Loop
We shouldn’t blindly trust an AI with a corporate credit card. Nova Act has a built-in “Human-in-the-Loop” (HITL) system.
If the agent hits a CAPTCHA or a weird MFA screen, it doesn’t just crash. It can pause, send you a screenshot, and let you take over the mouse and keyboard to solve the blocker. Once you’re past it, you hand control back to the agent to finish the job.
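The pause, hand over, and resume flow described above can be sketched as a small control loop. Everything here is hypothetical: StepResult, agent_step, and ask_human are illustrative names for this sketch, not Nova Act’s actual API, and a real setup would replace the scripted agent with calls into the SDK.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    """Outcome of one agent step (hypothetical shape, not Nova Act's API)."""
    status: str               # "ok", "blocked", or "done"
    screenshot: bytes = b""   # what the agent saw when it got blocked


def run_with_human_fallback(
    agent_step: Callable[[], StepResult],
    ask_human: Callable[[bytes], None],
    max_steps: int = 20,
) -> List[str]:
    """Drive the agent until it is done, pausing for a human on blockers."""
    log: List[str] = []
    for _ in range(max_steps):
        result = agent_step()
        log.append(result.status)
        if result.status == "done":
            return log
        if result.status == "blocked":
            # Pause: show the human the screenshot and return only once
            # they have cleared the blocker (CAPTCHA, MFA prompt, ...).
            ask_human(result.screenshot)
            log.append("human_resolved")
    raise RuntimeError("agent did not finish within max_steps")


# Scripted stand-in for an agent that hits one CAPTCHA mid-task.
script = iter([
    StepResult("ok"),
    StepResult("blocked", b"captcha.png"),
    StepResult("ok"),
    StepResult("done"),
])
screens_shown = []  # stands in for the human looking at each screenshot

log = run_with_human_fallback(lambda: next(script), screens_shown.append)
print(log)  # ['ok', 'blocked', 'human_resolved', 'ok', 'done']
```

The point of the design is that a blocker is just another step outcome, not an exception: the run keeps its state, the human clears the obstacle, and the loop picks up where it left off.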
How to Get Started (In 3 Simple Steps)
You don’t need a massive infrastructure to test this out.
Web automation has been a “maintenance tax” on developers for twenty years. We’ve accepted that tests will break and that 3:00 AM calls are just part of the job.
Nova Act is the first time we’ve been able to treat a browser like a teammate rather than a brittle series of code paths. It’s about spending less time fixing selectors and more time actually shipping code.
Shoutout to Amazon for sponsoring this post.