# Designing Data-Intensive Applications

## Metadata
- Author: [[Martin Kleppmann]]
- Full Title: Designing Data-Intensive Applications
- Category: #books
## Highlights
- A software project mired in complexity is sometimes described as a big ball of mud [30]. ([Location 683](https://readwise.io/to_kindle?action=open&asin=B06XPJML5D&location=683))
- For example, high-level programming languages are abstractions that hide machine code, CPU registers, and syscalls. SQL is an abstraction that hides complex on-disk and in-memory data structures, concurrent requests from other clients, and inconsistencies after crashes. Of course, when programming in a high-level language, we are still using machine code; we are just not using it directly, because the programming language abstraction saves us from having to think about it. ([Location 703](https://readwise.io/to_kindle?action=open&asin=B06XPJML5D&location=703))
- The limits of my language mean the limits of my world. Ludwig Wittgenstein, Tractatus Logico-Philosophicus (1922) ([Location 864](https://readwise.io/to_kindle?action=open&asin=B06XPJML5D&location=864))
- A number of interesting database systems are now associated with the #NoSQL hashtag, and it has been retroactively reinterpreted as Not Only SQL ([Location 925](https://readwise.io/to_kindle?action=open&asin=B06XPJML5D&location=925))
- Some developers feel that the JSON model reduces the impedance mismatch between the application code and the storage layer. ([Location 1018](https://readwise.io/to_kindle?action=open&asin=B06XPJML5D&location=1018))
## New highlights added June 25, 2023 at 9:14 PM
- It seems that relational and document databases are becoming more similar over time, and that is a good thing: the data models complement each other. If a database is able to handle document-like data and also perform relational queries on it, applications can use the combination of features that best fits their needs. ([Location 1313](https://readwise.io/to_kindle?action=open&asin=B06XPJML5D&location=1313))
- Any kind of index usually slows down writes, because the index also needs to be updated every time data is written. This is an important trade-off in storage systems: well-chosen indexes speed up read queries, but every index slows down writes. For this reason, databases don’t usually index everything by default, but require you—the application developer or database administrator—to choose indexes manually, using your knowledge of the application’s typical query patterns. ([Location 2404](https://readwise.io/to_kindle?action=open&asin=B06XPJML5D&location=2404))
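
The index trade-off in the highlight above can be made concrete with a toy example. This is a minimal sketch, not how any real database implements indexes: `TinyStore`, `indexed_fields`, and `index_updates` are hypothetical names invented for illustration. It shows that each insert does one extra piece of work per index, while an indexed lookup avoids scanning every row.

```python
from collections import defaultdict


class TinyStore:
    """Toy in-memory table with optional hash indexes (illustrative only)."""

    def __init__(self, indexed_fields=()):
        self.rows = []
        # One hash index per chosen field: value -> list of row positions.
        self.indexes = {field: defaultdict(list) for field in indexed_fields}
        # Counts the extra work that writes do to keep indexes current.
        self.index_updates = 0

    def insert(self, row):
        position = len(self.rows)
        self.rows.append(row)
        # Every index must be updated on every write.
        for field, index in self.indexes.items():
            index[row.get(field)].append(position)
            self.index_updates += 1

    def find(self, field, value):
        if field in self.indexes:
            # Indexed read: jump straight to the matching positions.
            return [self.rows[p] for p in self.indexes[field].get(value, [])]
        # Unindexed read: scan every row.
        return [row for row in self.rows if row.get(field) == value]


indexed = TinyStore(indexed_fields=("city",))
plain = TinyStore()
for store in (indexed, plain):
    store.insert({"name": "Ada", "city": "London"})
    store.insert({"name": "Bob", "city": "Paris"})
    store.insert({"name": "Cy", "city": "London"})
```

Both stores return the same rows for `find("city", "London")`, but only the indexed one paid for index maintenance on each insert, which is why the choice of indexes is left to whoever knows the query patterns.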
## New highlights added June 26, 2023 at 9:14 PM
- By contrast, SOAP is an XML-based protocol for making network API requests. Although it is most commonly used over HTTP, it aims to be independent from HTTP and avoids using most HTTP features. Instead, it comes with a sprawling and complex multitude of related standards (the web service framework, known as WS-*) that add various features ([Location 4152](https://readwise.io/to_kindle?action=open&asin=B06XPJML5D&location=4152))
## New highlights added June 28, 2023 at 12:15 PM
- These issues—node failures; unreliable networks; and trade-offs around replica consistency, durability, availability, and latency—are in fact fundamental problems in distributed systems. In Chapter 8 and Chapter 9 we will discuss them in greater depth. ([Location 4889](https://readwise.io/to_kindle?action=open&asin=B06XPJML5D&location=4889))
- If you are using a system with multi-leader replication, it is worth being aware of these issues, carefully reading the documentation, and thoroughly testing your database to ensure that it really does provide the guarantees you believe it to have. ([Location 5414](https://readwise.io/to_kindle?action=open&asin=B06XPJML5D&location=5414))
- We discussed three main approaches to replication: ([Location 5858](https://readwise.io/to_kindle?action=open&asin=B06XPJML5D&location=5858))
    - Single-leader replication: Clients send all writes to a single node (the leader), which sends a stream of data change events to the other replicas (followers). Reads can be performed on any replica, but reads from followers might be stale.
    - Multi-leader replication: Clients send each write to one of several leader nodes, any of which can accept writes. The leaders send streams of data change events to each other and to any follower nodes.
    - Leaderless replication: Clients send each write to several nodes, and read from several nodes in parallel in order to detect and correct nodes with stale data.
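
The leaderless approach in the highlight above (write to several nodes, read from several nodes, correct stale ones) can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `Replica`, `quorum_write`, and `quorum_read` are hypothetical names, version numbers are supplied by the caller rather than generated by the system, and replicas are contacted one after another instead of in parallel.

```python
class Replica:
    """One storage node holding key -> (version, value)."""

    def __init__(self):
        self.data = {}
        self.up = True  # set to False to simulate an unavailable node


def quorum_write(replicas, key, value, version, w):
    """Send the write to every reachable replica; succeed if at least w accept."""
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = (version, value)
            acks += 1
    return acks >= w


def quorum_read(replicas, key, r):
    """Read from reachable replicas, return the newest value, repair stale ones."""
    responses = [
        (replica, replica.data.get(key, (0, None)))
        for replica in replicas
        if replica.up
    ]
    if len(responses) < r:
        raise RuntimeError("not enough replicas responded")
    newest = max((resp for _, resp in responses), key=lambda vv: vv[0])
    # Read repair: overwrite any replica that returned an older version.
    for replica, (version, _) in responses:
        if version < newest[0]:
            replica.data[key] = newest
    return newest[1]


nodes = [Replica(), Replica(), Replica()]
quorum_write(nodes, "k", "old", version=1, w=2)
nodes[2].up = False                      # node 2 misses the next write
ok = quorum_write(nodes, "k", "new", version=2, w=2)
nodes[2].up = True                       # node 2 returns with stale data
result = quorum_read(nodes, "k", r=2)
```

After the read, the node that missed the second write has been brought up to date, which is the "detect and correct" step from the highlight.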
## New highlights added June 29, 2023 at 5:15 AM
- Serializable snapshot isolation (SSI): A fairly new algorithm that avoids most of the downsides of the previous approaches. It uses an optimistic approach, allowing transactions to proceed without blocking. When a transaction wants to commit, it is checked, and it is aborted if the execution was not serializable. ([Location 8121](https://readwise.io/to_kindle?action=open&asin=B06XPJML5D&location=8121))
- With some digging, it turns out that a wide range of problems are actually reducible to consensus ([Location 11474](https://readwise.io/to_kindle?action=open&asin=B06XPJML5D&location=11474))
- Tools like ZooKeeper play an important role in providing an “outsourced” consensus, failure detection, and membership service that applications can use. ([Location 11506](https://readwise.io/to_kindle?action=open&asin=B06XPJML5D&location=11506))
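
The optimistic idea in the serializable snapshot isolation highlight above (proceed without blocking, check at commit, abort on conflict) can be illustrated with a much simpler scheme. This sketch is not SSI itself: it is plain commit-time validation of read versions, it reads the latest committed value rather than a consistent snapshot, and `Store`, `Txn`, and `Aborted` are hypothetical names.

```python
class Aborted(Exception):
    """Raised when commit-time validation detects a conflict."""


class Store:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version seen at first read
        self.writes = {}         # buffered writes, applied only on commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        version, value = self.store.data.get(key, (0, None))
        self.read_versions.setdefault(key, version)
        return value

    def put(self, key, value):
        # No locks are taken; the write is only buffered.
        self.writes[key] = value

    def commit(self):
        # Validation: abort if anything this transaction read has since changed.
        for key, seen in self.read_versions.items():
            current, _ = self.store.data.get(key, (0, None))
            if current != seen:
                raise Aborted(f"{key!r} changed since it was read")
        for key, value in self.writes.items():
            version, _ = self.store.data.get(key, (0, None))
            self.store.data[key] = (version + 1, value)


store = Store()
setup = store.begin()
setup.put("x", 1)
setup.commit()

t1, t2 = store.begin(), store.begin()
t1.put("x", t1.get("x") + 1)  # both transactions read x == 1
t2.put("x", t2.get("x") + 1)
t1.commit()                   # first committer wins
try:
    t2.commit()
    t2_aborted = False
except Aborted:
    t2_aborted = True
```

Neither transaction blocked the other while running; the conflict only surfaced when the second one tried to commit, at which point it was aborted and could be retried.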