A few weeks ago, we shared that we were working on bot detection and browser fingerprinting for the CrowdSec WAF/Security Engine. We've been cooking, and we now have something functional (though it needs some polish before shipping) that will enable bot detection and browser fingerprinting for all of our web bouncers that support AppSec/WAF (nginx, OpenResty, HAProxy, Traefik, Envoy, and hopefully others by then!).

We've been collaborating with Antoine Vastel, PhD, to integrate his fpscanner library into CrowdSec's WAF/Security Engine. We're getting ready to ship collections and rules (like we do with scenarios, WAF rules, etc.) that will let users block common bot frameworks out of the box, while leaving room for more advanced users to craft their own detection rules based on all the signals fpscanner can collect.

I'm really excited about this feature: it's something we've been thinking about for a while, and it will strengthen not only the open-source software itself but also the network, allowing us to flag residential proxies, shady shared infrastructure, and bot frameworks at scale. Let's go!
Thibault Koechlin’s Post
More Relevant Posts
-
You need to show what your app actually does over the wire. Support ticket? They want a reproduction. Performance analysis? You need to see the requests. Debugging with a colleague? "Here's what I'm seeing" is vague. "Here's the HAR file" is concrete.

HarGeneratorPlugin dumps intercepted traffic into the standard HTTP Archive format:

```json
{
  "plugins": [{
    "name": "HarGeneratorPlugin",
    "enabled": true,
    "pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
    "configSection": "harGeneratorPlugin"
  }],
  "harGeneratorPlugin": {
    "includeSensitiveInformation": false,
    "includeResponse": true
  }
}
```

Stop recording and you get a `devproxy-{timestamp}.har` file. Full request details: URLs, methods, headers, query parameters, bodies. Full response details: status codes, headers, bodies, timing.

HAR is universal. Browser DevTools import it. Charles, Fiddler, and countless other tools read it natively. Hand over the file and the recipient sees exactly what happened.

The `includeSensitiveInformation` flag takes care of security. Set it to false (the default) and authorization headers, cookies, API keys, tokens, and session IDs all get scrubbed out. Safe to share without leaking secrets.

Flip `includeResponse` to true to capture response bodies. Essential for debugging those "the API returned something weird" moments.

The "show, don't tell" of API debugging. Instead of describing the problem, you hand over the raw evidence.

What format do you typically use when sharing API request logs?
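Part of why HAR is so handy to hand around is that it's just JSON with a well-known shape. A minimal sketch (hand-made sample data, plain Python stdlib, not part of Dev Proxy itself) that summarizes the entries in a HAR document:

```python
import json

def summarize_har(har: dict) -> list[str]:
    """Return one 'METHOD URL -> status (time ms)' line per recorded entry."""
    lines = []
    for entry in har["log"]["entries"]:
        req, resp = entry["request"], entry["response"]
        lines.append(f"{req['method']} {req['url']} -> {resp['status']} ({entry['time']} ms)")
    return lines

# A tiny hand-made HAR document with the standard log/entries shape.
sample = {
    "log": {
        "version": "1.2",
        "entries": [
            {
                "request": {"method": "GET", "url": "https://api.example.com/items"},
                "response": {"status": 200},
                "time": 42,
            }
        ],
    }
}

for line in summarize_har(sample):
    print(line)
```

The same few lines work on a file exported from DevTools, Charles, or Fiddler, which is exactly what makes the format a good exchange currency.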
-
Built another CLI last night. This one manages AdGuard Home from the terminal.

AdGuard Home is a network-wide DNS filter. It blocks ads, trackers, and specific services at the DNS level for every device on your network. I run it on a Raspberry Pi at home, and while the web UI works, a CLI lets me script client management, automate DNS rewrites, pipe query stats into monitoring, and run health checks from cron.

It also plugs directly into AI coding agents. I use Claude Code with custom skills, and a CLI with structured JSON output is exactly what an AI agent needs to manage DNS infrastructure. No MCP server required, no credentials leaking, just a binary the agent calls.

adguard-cli covers 90%+ of the 81 API operations: clients, blocked services, DNS rewrites, query logs, filters, DHCP, TLS, safe search. Table, JSON, or YAML output. Credentials in the system keyring. Same patterns from two other Go CLIs I maintain: structured errors with fix hints, stderr/stdout separation, cross-platform static binaries.

https://lnkd.in/e2X3bnHh

brew tap jjuanrivvera/adguard-cli && brew install adguard-cli

#golang #cli #adguard #adguardhome #dns #homelab #opensource #devtools #networking #selfhosted #claudecode
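The "health checks from cron" use case is a good illustration of why structured JSON output matters. A sketch of what a cron-driven check could look like, consuming output of that shape (the field names and the stats subcommand here are assumptions for illustration, not adguard-cli's actual schema):

```python
import json

# Hypothetical JSON stats output, as might be captured from something like
# `adguard-cli stats --output json`. Field names are made up for this sketch.
sample_output = """
{"queries_total": 125000, "blocked_total": 31250, "upstream_avg_ms": 18.4}
"""

def check_health(raw: str, max_block_ratio: float = 0.5,
                 max_latency_ms: float = 100.0) -> bool:
    """Return True if DNS filtering looks healthy.

    A cron job would exit nonzero (and alert) when this returns False,
    e.g. when nearly everything is being blocked or upstreams are slow."""
    stats = json.loads(raw)
    block_ratio = stats["blocked_total"] / stats["queries_total"]
    return block_ratio <= max_block_ratio and stats["upstream_avg_ms"] <= max_latency_ms

print("healthy" if check_health(sample_output) else "unhealthy")
```

The same JSON is what makes the tool legible to an AI agent: no scraping of table output, just parse and decide.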
-
Discover how stealth browsers overcome anti-bot systems for web scraping. In this new article, you'll learn how they work and pick up insights for effective data extraction. Read more here: https://lnkd.in/dMzaSgTi #spidra #webscraping
-
Cloudflare Turnstile Uses Browser Fingerprinting to Automate Abuse Mitigation

📌 Cloudflare's Turnstile now uses browser fingerprinting to automatically detect and block bots without annoying users, analyzing WebGL, hardware specs, and user-agent data to create a unique device profile. This marks a shift from CAPTCHA puzzles to smarter, invisible behavioral analysis that outsmarts even advanced automation tools.

🔗 Read more: https://lnkd.in/dsC7Bhqs

#Cloudflare #Turnstile #Webgl #Hardwareconcurrency
-
A sandbox that can call any domain on the internet is not much of a sandbox.

We just added domain allowlists for SmolVM. Now you can start a VM with internet access and restrict which external domains it can reach.

Why this matters:
➡️ Running code in an isolated VM is step one
➡️ Controlling where that VM can talk is step two
➡️ That is how you reduce the blast radius for AI agents running untrusted code

This is especially useful when your agent only needs access to a small set of domains: your API, your MCP server, Stripe, GitHub, or a few internal services. Not the whole internet.

Under the hood, SmolVM resolves the allowed domains at VM creation time and enforces egress rules on the Firecracker backend.

This is the direction we believe sandboxes should go: not just isolated compute, but programmable security boundaries for agents.

If you're building AI agents that execute code, browse, or call external tools, what network restrictions do you want by default? And if you like where SmolVM is going, give the repo a star.
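A sketch of the resolve-then-enforce idea (hypothetical function and rule syntax, not SmolVM's actual API): each allowed domain is resolved once at VM creation, and the firewall then only ever sees IPs, so DNS tricks inside the VM can't widen access:

```python
def build_egress_rules(allowed: dict[str, list[str]]) -> list[str]:
    """Turn a domain -> resolved-IPs map into firewall-style egress rules.

    Resolution happens once, at VM creation time; at runtime only the
    fixed IP list is enforced, with a default-deny rule at the end."""
    rules = []
    for domain, ips in sorted(allowed.items()):
        for ip in ips:
            rules.append(f"allow tcp to {ip} # {domain}")
    rules.append("drop all")  # everything not explicitly allowed is blocked
    return rules

# In real use the IPs would come from a resolver such as socket.getaddrinfo;
# they are hard-coded here to keep the example self-contained and offline.
rules = build_egress_rules({
    "api.stripe.com": ["203.0.113.10"],
    "github.com": ["203.0.113.20", "203.0.113.21"],
})
for r in rules:
    print(r)
```

The default-deny tail is the whole point: the allowlist defines the blast radius, and anything the agent tries beyond it simply never leaves the VM.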
-
Secure your smart contracts with BrewHash AI Auditor. Built for Web3 developers who care about real security.

1. Drop your contract link (Etherscan or GitHub)
2. Add source code (optional)
3. Choose mode:
→ Audit = Full professional security report
→ Hunter = Critical attack vectors & exploits only
4. Run the scan
5. Download a clean Markdown report

Using it to ship safer code, protect users, and strengthen Web3. Fast, actionable, and made for serious builders.
-
An open-source alternative to Claude's browser plugin.

WebBrain is a browser extension that turns any LLM into a full browser agent — right from a side panel. Works with OpenAI, Gemini, Anthropic, or local models via OpenRouter.

Two modes:
🔍 Ask — read-only page analysis, summarization, data extraction
🤖 Act — full browser automation: click, type, scroll, navigate

Chrome + Firefox. MIT licensed.

🧠 webbrain.one
📦 https://lnkd.in/dfV32Yfp
-
This is an A/B tool I created for feature flags without touching the origin server... which I named Injector 🤠

As part of my personal goal to build in public as much as possible (scary), this is one of my latest creations. If you use Cloudflare for your DNS (which you should!), then the entire world (Region: Earth) opens up by placing a Worker in front of your website 😉

This allows for extremely interesting things like injecting feature flags, A/B/C testing components dynamically for cohorts of visitors, tracking interaction events, watching visits in real time using Durable Objects websockets, caching the uncacheable™ and more! All without touching your origin server, no inline data-ab-xxxxx hard-coding, nothing... zilch! 🥷

Dynamically generated, streamed-in components for any system (this happens to be React + Vite), generated by the Kimi 2.5 AI model from the edge 🤯 How cool is that?

Let me know if you'd like to know more about injecting feature flags to hide/show/create sections on your website, happy to talk shop! As always, this post was hand-written by yours truly 🤘

#regionearth #cloudflare #durableobjects #agentsdk #agents #injector #featureflags
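The heart of cohort-based A/B/C testing at the edge is a deterministic bucket function: the same visitor always lands in the same variant, with no server-side state. A minimal Python sketch of the idea (the real Worker would be JavaScript, and the visitor ID and variant names here are made up):

```python
import hashlib

def assign_cohort(visitor_id: str, variants: list[str]) -> str:
    """Deterministically bucket a visitor.

    Hashing the ID (rather than picking randomly) means the same visitor
    gets the same variant on every request, while the buckets spread
    roughly evenly across the visitor population."""
    digest = hashlib.sha256(visitor_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]

variants = ["control", "feature-a", "feature-b"]
print(assign_cohort("visitor-12345", variants))
```

At the edge, the visitor ID would typically come from a cookie the Worker sets on first visit, and the chosen variant decides which components get streamed into the response.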
-
Most browser-based scraping projects don't break all at once. They slowly get harder to run. A script that worked fine last week starts behaving inconsistently, sessions clash, results vary, and things fail in ways that are difficult to reproduce. The bigger you scale, the worse it gets.

That's exactly the kind of problem we've been working through with rayobrowse. Our latest update (v0.1.33) focuses on making large-scale browser workloads more stable and easier to operate in production:

🔵 Support for multiple Chromium versions (including v146), so you're not locked to a single browser build or caught out by updates
🔵 Improvements to canvas + ClientRect handling, making browser output more consistent and realistic
🔵 Hardened Docker setup (non-root user, sandboxing, improved security config) for safer production deployments

Plus a number of smaller updates around performance, debugging, and reliability.

We're already running this at scale internally, and each release is about making that experience more predictable and easier for others to replicate. If you've ever wrestled with browser automation, you'll know why this matters.

Read more here: https://bit.ly/3PNnOtL
-
Same IP. Same proxy. Same request headers. One site scrapes fine. Another returns a 403 before any HTML comes back.

The reason isn't your IP address. It's the TLS handshake.

When you connect to an HTTPS site, your client performs a TLS handshake before sending a single HTTP request. This handshake reveals something called the ClientHello: a list of the cipher suites, TLS extensions, and protocol versions your client supports.

In 2017, Salesforce researchers created JA3: a way to hash this handshake into a 32-character fingerprint. Cloudflare, Akamai, DataDome, and Imperva all maintain databases of these fingerprints. Python's requests library has a known JA3 hash. Scrapy has a known JA3 hash. Real Chrome on Windows 11 has a different one.

The block happens in the first milliseconds of the connection. Your carefully crafted User-Agent header never even gets read. This is why changing your user agent doesn't fix the 403. It's why rotating IPs doesn't fix it either. The TLS fingerprint travels with your HTTP client library, not your IP.

Chrome 2023+ further complicated this by randomizing TLS extension order, so JA3 is increasingly unreliable. That's why JA4 was developed in 2023, adding HTTP/2 frame ordering and QUIC fingerprinting to the mix.

What actually works:
→ Use a real browser engine under the hood (Playwright/Puppeteer, not requests)
→ Use a managed service that routes through residential IPs with real browser TLS stacks
→ If you're building the stack yourself, curl_cffi (Python) mimics Chrome's TLS fingerprint

The web scraping problem people think is an IP/proxy problem is increasingly a TLS fingerprinting problem. Understanding the mechanism tells you exactly which layer to fix.

Deep dive into the full fingerprinting stack (canvas, WebGL, navigator properties, TLS, HTTP/2 frame ordering), what each one catches and what bypasses it: https://lnkd.in/gKWtDqUw
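The JA3 scheme itself is almost disappointingly simple: join the ClientHello fields in a fixed order and MD5 the result. A sketch with made-up field values (a real ClientHello from Chrome, or from Python's requests library, would produce its own characteristic hash):

```python
import hashlib

def ja3_fingerprint(tls_version: int, ciphers: list[int], extensions: list[int],
                    curves: list[int], point_formats: list[int]) -> str:
    """Build a JA3-style string (comma-separated fields, dash-separated
    decimal values) and hash it into the 32-character MD5 fingerprint."""
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return hashlib.md5(",".join(fields).encode()).hexdigest()

# Illustrative values only: TLS 1.2 version number, a few cipher suite,
# extension, curve, and point-format IDs as they appear in a ClientHello.
fp = ja3_fingerprint(771, [4865, 4866, 4867], [0, 11, 10], [29, 23, 24], [0])
print(fp)
```

This also explains why Chrome's extension-order randomization broke JA3: the extensions field feeds the hash in wire order, so shuffling it yields a different fingerprint on every connection.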
Enix France • 957 followers
3w · Nice! Can't wait to try this :-)