If you want to learn how databases really work behind the scenes, check out this free YouTube series by Ben Dicken. He covers the entire "Database Internals" book in video form, where you will learn:
• How databases store data using B-Trees
• How storage engines work
• LSM Trees and write optimization
• Distributed systems and leader election
• and more
Over 10 hours of free content. Highly recommended for anyone into backend development.
Check it out: https://lnkd.in/gNNSbeWZ
#Databases #BackendDevelopment #SoftwareEngineering
Distributed systems can break when a database write succeeds but the event publish fails. In this tutorial, Alex teaches you how to implement the Outbox Pattern in Go and PostgreSQL so data writes and event records happen atomically. You'll also learn how to build a relay service, work with an outbox table, and handle at-least-once delivery in event-driven systems. https://lnkd.in/gP8vUAse
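The core of the pattern is that the business write and the outbox row commit in one transaction, and a separate relay publishes from the table. A minimal Python sketch of that idea (not the tutorial's Go code; SQLite stands in for PostgreSQL, and the table and column names are illustrative):

```python
import json
import sqlite3

# In-memory SQLite stands in for PostgreSQL here.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        topic TEXT, payload TEXT, published INTEGER DEFAULT 0
    );
""")

def place_order(order_id, total):
    # The business write and the event record commit in ONE
    # transaction, so neither can succeed without the other.
    with db:
        db.execute("INSERT INTO orders (id, total) VALUES (?, ?)",
                   (order_id, total))
        db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                   ("order.created", json.dumps({"id": order_id})))

def relay_once(publish):
    # Relay service: read unpublished rows, push them to the broker,
    # then mark them published. A crash between publish() and the
    # UPDATE re-sends the event, which is why this is at-least-once.
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?",
                       (row_id,))

place_order(1, 99.5)
sent = []
relay_once(lambda topic, payload: sent.append((topic, payload)))
```

Because delivery is at-least-once, consumers still need to be idempotent (e.g. deduplicate on the outbox row id).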
Your favorite CRUD application is already eventually consistent. Yes, even with PostgreSQL. Even if you think you’re “doing it right.” Between replication lag, caching layers, and transaction isolation, your users are seeing stale data every single day. You just don’t notice it because it usually “fixes itself” fast enough. Most production systems are eventually consistent in practice (whether we acknowledge it or not ;-)).
The difference? Event Sourcing doesn’t pretend otherwise. It makes consistency trade-offs explicit, instead of hiding them behind abstractions. And once you see it, you start designing differently.
👉 Do you design your systems assuming perfect consistency—or embracing its limits?
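To make "explicit trade-offs" concrete: in an event-sourced system the append-only log is the source of truth, and any read model is a projection replayed up to some position, so staleness is a visible, queryable number rather than a hidden replication artifact. A toy sketch (illustrative names only, no particular framework):

```python
# Append-only event log: the source of truth.
events = []  # (seq, type, amount)

def append(event_type, amount):
    events.append((len(events) + 1, event_type, amount))

def project(up_to_seq=None):
    # A read model is a replay of the log up to some sequence number.
    # Replaying to an older position makes "how stale is this view?"
    # a first-class question instead of a hidden replication artifact.
    balance = 0
    for seq, event_type, amount in events:
        if up_to_seq is not None and seq > up_to_seq:
            break
        balance += amount if event_type == "deposit" else -amount
    return balance

append("deposit", 100)
append("withdraw", 30)
current = project()            # fully caught-up view
stale = project(up_to_seq=1)   # a lagging read model, explicitly
```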
Last week I ran into a pretty frustrating issue. An API I was working on kept failing with “SQL Server has gone away.” At first, it felt like a server or load problem… but it wasn’t. After digging deeper, I realized the real issue was in the code. Some queries were being executed inside loops, which meant the same database calls were running again and again. It didn’t look like a big problem initially, but under real data, it completely broke the API. So instead of increasing server resources, I focused on simplifying things:
• Moved queries outside the loop
• Reduced unnecessary database hits
• Cleaned up the data flow
• Applied a few small optimizations
And honestly, the result was surprising… The API that was failing before is now responding in under a second, even with larger datasets. This reminded me of something simple:
👉 Not every performance issue needs scaling
👉 Sometimes, it’s just about writing cleaner and smarter code
Have you ever faced something similar where a small fix made a huge difference? #Backend #API #Performance #Optimization #CleanCode #WebDevelopment
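"Queries inside loops" is the classic N+1 problem, and "move the query outside the loop" usually means batching N lookups into one. A minimal sketch of the before/after (the schema and data here are hypothetical, not from the post):

```python
import sqlite3

# Hypothetical schema for illustration.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
""")
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5)])

def totals_slow(user_ids):
    # Anti-pattern: one round trip per loop iteration (N+1 queries).
    return {uid: db.execute(
        "SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = ?",
        (uid,)).fetchone()[0] for uid in user_ids}

def totals_fast(user_ids):
    # Fix: a single batched query, moved outside the loop.
    placeholders = ",".join("?" * len(user_ids))
    rows = db.execute(
        f"SELECT user_id, SUM(total) FROM orders "
        f"WHERE user_id IN ({placeholders}) GROUP BY user_id",
        user_ids).fetchall()
    return dict(rows)
```

Same result, but the fast version does one database round trip regardless of how many users are in the loop, which is exactly why the fix scales where the original did not.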
This Thursday, I’ll be speaking at ClickHouse and Open Source Builders Night at Taipei 101 with 李緒成. I’ve been working with Hao Jiang and 李緒成 on a new DuckDB extension: DuckDB Query Condition Cache. It’s designed for a very common pattern:
→ the same predicate repeated across queries
In practice, the same filter keeps showing up, so instead of recomputing it every time, we compute it once and reuse it. This leads to significantly faster repeated queries in real workloads, with zero changes to your original SQL. We’ll be sharing this work at the talk. If you’re into data, databases, or open source, feel free to come by and hang out! Huge thanks to Zoe Steinkamp for the invitation!
📍 Taipei 101
🕒 This Thursday night
Register here: https://lnkd.in/gbPWEwC7
Check out the code: https://lnkd.in/gTA_jbxW
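To illustrate the general idea of caching a repeated predicate (this is a toy sketch of the concept, NOT the DuckDB extension's implementation): the first evaluation records which rows matched, and later queries with the same predicate reuse that row set instead of rescanning.

```python
# Toy table: row id -> row. Data and predicate are invented.
rows = {i: {"country": "TW" if i % 3 == 0 else "US", "amount": i}
        for i in range(9)}
condition_cache = {}  # predicate key -> set of matching row ids
scans = 0             # counts full scans, to show the saving

def matching_ids(key, predicate):
    global scans
    if key not in condition_cache:
        scans += 1  # the full scan happens only once per predicate
        condition_cache[key] = {i for i, r in rows.items() if predicate(r)}
    return condition_cache[key]

def query_sum(key, predicate, column):
    # Later queries reuse the cached row set; the SQL (here, the
    # predicate key) is unchanged from the caller's point of view.
    return sum(rows[i][column] for i in matching_ids(key, predicate))

a = query_sum("country=TW", lambda r: r["country"] == "TW", "amount")
b = query_sum("country=TW", lambda r: r["country"] == "TW", "amount")
```

A real implementation also has to invalidate cached conditions when the underlying data changes, which is where most of the engineering lives.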
Your launch gets picked up. Traffic spikes. Database refuses connections at user 101. This doesn't happen because your database is weak. It happens because each request opened its own connection. Development traffic is too low to expose this. Production traffic is not.
The fix is connection reuse, not a bigger database. Set the pool size against your database's connection limits. Handle exhaustion before it crashes.
Serverless makes this harder: each function instance opens its own pool, so at scale you need an external connection pooler.
Kostra structures database access patterns for production connection management. Built to Forward. Swipe to see how one missing pattern takes down your launch day. #BackendEngineering #DatabaseDesign #ConnectionPooling #Serverless #SoftwareArchitecture #Kostra
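The two ideas above, reuse and bounded exhaustion handling, fit in a few lines. A minimal pool sketch (illustrative only, not a production pooler; real apps would use their driver's built-in pool):

```python
import queue
import sqlite3

class Pool:
    def __init__(self, size, timeout=1.0):
        # A fixed number of connections, created once and reused,
        # instead of every request opening its own. `size` should be
        # chosen against the database's connection limits.
        self.timeout = timeout
        self.conns = queue.Queue()
        for _ in range(size):
            self.conns.put(sqlite3.connect(":memory:",
                                           check_same_thread=False))

    def acquire(self):
        try:
            # Bounded wait: surface exhaustion instead of hanging
            # or letting the database refuse connection 101.
            return self.conns.get(timeout=self.timeout)
        except queue.Empty:
            raise RuntimeError("pool exhausted; shed load or queue")

    def release(self, conn):
        self.conns.put(conn)

pool = Pool(size=2, timeout=0.01)
c1, c2 = pool.acquire(), pool.acquire()
try:
    pool.acquire()  # third concurrent request: pool is exhausted
    exhausted = False
except RuntimeError:
    exhausted = True
pool.release(c1)
c3 = pool.acquire()  # a released connection is reused, not reopened
```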
There’s a limit to how much a single database can handle. Beyond a point, scaling vertically isn’t enough. That’s where sharding comes in. Instead of one large database, data is split across multiple smaller databases (shards), and each shard handles a portion of the load.
⚡ Better scalability
⚡ Improved performance
⚡ Distributed load handling
But it comes with challenges:
⚠️ Complex queries across shards
⚠️ Data rebalancing
⚠️ Increased system complexity
Sharding doesn’t simplify systems — it enables them to scale beyond limits. The trade-off is clear: more scale, more complexity. #BackendEngineering #SystemDesign #Databases #DistributedSystems #Scalability #SoftwareEngineering
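Both sides of the trade-off show up in a few lines of hash-based routing (an illustrative sketch; key and shard names are invented): single-key reads hit exactly one shard, while a query without the shard key must fan out to all of them.

```python
import hashlib

NUM_SHARDS = 4
shards = {i: {} for i in range(NUM_SHARDS)}  # stand-ins for databases

def shard_for(key):
    # Stable hash so every node routes the same key to the same shard.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    # Fast path: a key lookup touches exactly one shard.
    return shards[shard_for(key)].get(key)

def scan(predicate):
    # The complexity cost: a query without the shard key must
    # fan out to every shard and merge the results.
    return [v for s in shards.values() for v in s.values()
            if predicate(v)]

put("user:1001", {"name": "Ada"})
put("user:2002", {"name": "Lin"})
```

Modulo hashing also makes rebalancing painful: changing NUM_SHARDS remaps most keys, which is why consistent hashing exists.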
Most systems don’t break because of code. They break because the data outgrows the design. I wrote a deep dive connecting partitioning, RAC, sharding, resharding, desharding, and consistent hashing, showing how real systems evolve from a single database to distributed scale, and the trade-offs between relational and NoSQL worlds. If you’ve ever wondered why scaling databases is never just “add more servers”, this breaks it down in a connected way. Would love to hear how you’ve handled scaling challenges in your systems.
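Of the concepts listed, consistent hashing is the one that most rewards seeing in code: it is what makes resharding survivable. A minimal ring sketch (illustrative, not from the linked article), showing that adding a node moves only the keys adjacent to it on the ring, where modulo hashing would remap most keys:

```python
import bisect
import hashlib

def h(s):
    # Stable 32-bit hash position on the ring.
    return int(hashlib.sha256(s.encode()).hexdigest(), 16) % (2 ** 32)

class Ring:
    def __init__(self, nodes, vnodes=50):
        # Virtual nodes smooth out load imbalance between nodes.
        self.ring = sorted((h(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def node_for(self, key):
        # A key belongs to the next point clockwise on the ring.
        idx = bisect.bisect_right(self.points, h(key)) % len(self.ring)
        return self.ring[idx][1]

before = Ring(["db1", "db2", "db3"])
after = Ring(["db1", "db2", "db3", "db4"])  # reshard: add one node
keys = [f"user:{i}" for i in range(1000)]
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
# Only the keys claimed by db4 move (roughly 1/4 of them); with
# modulo hashing, around 3/4 of all keys would be remapped.
```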
Your `db-migrate up` command is a time bomb on a multi-terabyte table. Standard migration tools like Flyway or Alembic work by running DDL statements that often acquire long-running locks. On a small table, this is a non-issue. On a critical table with millions of rows, that `ALTER TABLE` can lock writes for minutes or even hours, causing a production outage.

This is where online schema change tools become essential. Tools like Percona's `pt-online-schema-change` or GitHub's `gh-ost` operate on a fundamentally different principle. They don't lock your primary table. Instead, they create an empty 'ghost' table with the new schema. They then begin a slow, throttled process of copying data from the original table to the new one in small chunks. Ongoing writes (inserts, updates, deletes) to the original table are captured — via triggers in pt-online-schema-change, or by tailing the binlog in gh-ost — and applied to the ghost table, keeping it in sync.

Once the copy is complete and the tables are synchronized, the final step is an atomic `RENAME TABLE` operation—a metadata change that is nearly instantaneous. The old table is swapped out for the new one with no extended lock contention. It's a more complex process, but it's the only safe way to evolve the schema of a massive, high-traffic database without scheduling downtime. #Database #SystemDesign #DevOps
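The four phases — ghost table, chunked copy, delta capture, swap — can be simulated in a few lines. This is a simplified illustration in SQLite (the real tools use triggers or the binlog for delta capture, and the table names mimic gh-ost's conventions):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users (name) VALUES ('Ada'), ('Lin'), ('Mo');
    -- Phase 1: empty ghost table with the NEW schema (added column).
    CREATE TABLE _users_gho (id INTEGER PRIMARY KEY, name TEXT,
                             email TEXT DEFAULT '');
""")

# Phase 2: throttled chunked copy -- small batches by primary key,
# never one giant locking copy.
CHUNK = 2
last_id = 0
while True:
    batch = db.execute(
        "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, CHUNK)).fetchall()
    if not batch:
        break
    db.executemany("INSERT INTO _users_gho (id, name) VALUES (?, ?)", batch)
    last_id = batch[-1][0]

# Phase 3: a write lands mid-migration; triggers (or the binlog)
# would replay it onto the ghost table to keep it in sync.
db.execute("INSERT INTO users (name) VALUES ('Zed')")
db.execute("INSERT INTO _users_gho (id, name) "
           "SELECT id, name FROM users WHERE name = 'Zed'")

# Phase 4: the near-instant metadata swap.
db.executescript("""
    ALTER TABLE users RENAME TO _users_old;
    ALTER TABLE _users_gho RENAME TO users;
""")
count = db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```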
Imagine a world where your filesystem is as robust and versatile as your database. That's exactly what TigerFS, a filesystem backed by PostgreSQL, is aiming to achieve, courtesy of the innovative minds at Timescale. Integrating a database like PostgreSQL as a filesystem opens up new doors for data management, offering features like transactional consistency, data integrity, and complex query capabilities right at the storage level. This could fundamentally change how data is stored and retrieved, making it more seamless and efficient. But here's the million-dollar question: Do we really need a database-backed filesystem, or is this a solution in search of a problem? On one hand, we gain powerful querying and indexing capabilities, but on the other, there's a potential overhead and complexity that comes with managing such a system. Balancing these factors will be key to TigerFS's adoption. What are your thoughts on the practicality of a database-backed filesystem in your current projects or infrastructure? Would the benefits outweigh the potential complexities? I’m curious to hear your insights!
You might assume CDC filtering happens after the replication stream leaves Postgres. Grab full logical stream → filter tables in app code → drop noise. Wrong. How it really works: Postgres lets you dynamically specify tables in the replication stream itself. Filtering happens upstream, inside the database. No unnecessary events touch the network. Even better: WHERE clauses (with restrictions) for row-level filtering. Column subsets too. App threads process exactly what they need. Building our Postgres read-caching layer forced us deep into replication protocols. James put it like this in one of our recent conversations: Logical Replication is "more flexible but also partly incomplete ... it doesn't communicate DDL changes." That tradeoff unlocks powerful filtering but demands schema sync discipline. Table/row/column filters make CDC pipelines dramatically simpler. No custom logic. Postgres owns subscriptions. 👉 Aurora/RDS teams scaling CDC, are you filtering downstream or using replication slots with table/row filters? #PostgreSQL #Replication #CDC
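Concretely, since PostgreSQL 15 a publication can carry a table list, a row filter, and a column list, so the filtering happens before anything reaches the wire. A sketch with the DDL plus a local mirror of the row filter for reasoning about what the subscriber receives (table and column names are invented):

```python
# Publication DDL as it would run on the Postgres primary (PG 15+).
publication_ddl = """
CREATE PUBLICATION orders_cdc
    FOR TABLE orders (id, status, total)  -- column subset
    WHERE (status = 'paid');              -- row-level filter
"""

def replicated(row):
    # Local mirror of the publication's row filter: the subscriber
    # only ever sees rows for which this is true, and only the
    # listed columns -- no downstream filtering code needed.
    return row["status"] == "paid"

changes = [
    {"id": 1, "status": "paid", "total": 40},
    {"id": 2, "status": "draft", "total": 0},
]
delivered = [c for c in changes if replicated(c)]
```

The caveat from the post still applies: logical replication does not carry DDL, so schema changes to `orders` have to be coordinated with subscribers out of band.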