“Designing Data-Intensive Applications” is a book written by Martin Kleppmann. It explores the fundamental concepts and principles behind the design and implementation of data-intensive systems. The book covers various topics related to data storage, processing, and scalability.
“The Unicorn Project” is a book by Gene Kim, a companion novel to “The Phoenix Project,” that explores the challenges faced by a different protagonist, Maxine, as she works on a transformational project within the same company, Parts Unlimited. The book delves into the importance of developer productivity, innovation, and the role of technology in […]
Two common data processing models: Batch vs Stream Processing. What are the differences? The diagram below shows a typical scenario with user clicks: 🔹 Batch Processing: We aggregate user click activities at the end of the day. 🔹 Stream Processing: We detect potential fraud from the user click stream in real time. Both processing models are used […]
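To make the contrast concrete, here is a minimal sketch of both models over the same hypothetical click events (the event data, window size, and threshold are made up for illustration): the batch job counts all clicks in one end-of-day pass, while the stream job inspects each event as it arrives and flags suspiciously rapid clicking.

```python
from collections import Counter, defaultdict

# Hypothetical click events: (user_id, timestamp) pairs.
clicks = [("alice", 1), ("bob", 2), ("alice", 3), ("alice", 4), ("alice", 5)]

def batch_daily_counts(events):
    """Batch: aggregate the whole day's clicks in a single pass."""
    return Counter(user for user, _ in events)

def stream_fraud_alerts(events, window=3, threshold=3):
    """Stream: process events one at a time; alert when a user
    clicks `threshold` or more times within `window` time units."""
    recent = defaultdict(list)
    alerts = []
    for user, ts in events:
        recent[user] = [t for t in recent[user] if ts - t < window] + [ts]
        if len(recent[user]) >= threshold:
            alerts.append((user, ts))
    return alerts
```

The same events feed both paths; only the latency requirement differs — the batch job can run hours later, while the stream job must decide per event.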
Collaborative filtering is a type of recommendation system that analyzes user interactions and behavior to make personalized recommendations for content, products, or services. It uses machine learning algorithms to identify patterns and relationships in user data, helping to automate the process of finding relevant recommendations. Collaborative filtering can improve user experience, increase engagement, and enhance customer retention and loyalty. It has been successfully applied in various industries, including e-commerce, entertainment, and social networking. To ensure accuracy and effectiveness, collaborative filtering systems are evaluated with metrics such as precision, recall, and F1 score. As recommendation systems continue to evolve, collaborative filtering is expected to play a vital role in the future of personalized recommendations.
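As a sketch of the core idea, here is a tiny user-based collaborative filter: score a user's unrated items by the similarity-weighted ratings of other users. The rating matrix, user names, and items are all invented for illustration, and cosine similarity is just one common choice of similarity measure.

```python
import math

# Hypothetical user-item rating matrix; 0 means "not rated".
ratings = {
    "alice": {"book": 5, "movie": 3, "game": 0},
    "bob":   {"book": 4, "movie": 0, "game": 2},
    "carol": {"book": 1, "movie": 5, "game": 4},
}

def cosine(u, v):
    """Cosine similarity between two rating vectors (dicts, same keys)."""
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(target, ratings):
    """Rank the target's unrated items by similarity-weighted ratings."""
    others = {u: r for u, r in ratings.items() if u != target}
    sims = {u: cosine(ratings[target], r) for u, r in others.items()}
    scores = {}
    for item, r in ratings[target].items():
        if r == 0:  # only consider items the target has not rated yet
            num = sum(sims[u] * others[u][item] for u in others)
            den = sum(abs(s) for s in sims.values()) or 1.0
            scores[item] = num / den
    return sorted(scores, key=scores.get, reverse=True)
```

Precision, recall, and F1 are then computed by comparing the ranked output against held-out interactions.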
I think Deep Learning finds its strength in its ability to efficiently model different types of data at once. It is trivial to build models from multimodal datasets nowadays. It is not a new concept, though, nor was it impossible prior to the advent of DL, but the level of complexity […]
In this video, we briefly talk about: 🔹Skiplist: a common in-memory index type. Used in Redis 🔹Hash index: a very common implementation of the “Map” data structure (or “Collection”) 🔹SSTable: immutable on-disk “Map” implementation 🔹LSM tree: Skiplist + SSTable. High write throughput 🔹B-tree: disk-based solution. Consistent read/write performance 🔹Inverted index: used for document indexing. Used […]
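To illustrate the last item in the list, here is a minimal inverted index: a map from each token to the set of documents containing it, with an AND-search over query tokens. The documents and tokenization (whitespace split) are simplified assumptions; real engines add stemming, ranking, and compressed posting lists.

```python
from collections import defaultdict

# Hypothetical documents: doc id -> text.
docs = {
    1: "lsm trees favor write throughput",
    2: "b trees give consistent read performance",
    3: "inverted indexes power document search",
}

# Build the inverted index: token -> set of doc ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.split():
        index[token].add(doc_id)

def search(query):
    """AND-search: return ids of docs containing every query token."""
    sets = [index.get(t, set()) for t in query.split()]
    return set.intersection(*sets) if sets else set()
```

This is the structure that makes "find all documents containing these words" a set intersection instead of a full scan.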
A real-time big data analytics architecture is a system designed to process, analyze, and act on large volumes of data in real time. What's missing in the infographic below? This type of architecture is typically used to support applications that require fast and accurate data processing, such as fraud detection, real-time recommendation engines, and social […]
There are hundreds or even thousands of databases available today, such as Oracle, MySQL, MariaDB, SQLite, PostgreSQL, Redis, ClickHouse, MongoDB, S3, Ceph, etc. How do you select the architecture for your system? My short summary is as follows: 🔹Relational database. Almost anything can be solved by them. 🔹In-memory store. Their speed and limited data size […]
The median annual salary for a Data Scientist is $98,230, according to the Bureau of Labor Statistics. Data science job postings grew by 31% over the past few years, while data science job searches only rose by 14% over the same period. This high demand has led to a shortage of over 150,000 data science […]
Database isolation allows a transaction to execute as if there are no other concurrently running transactions. The diagram below illustrates four isolation levels. 🔹Serializable: This is the highest isolation level. Concurrent transactions are guaranteed to produce the same result as if they ran one after another. 🔹Repeatable Read: Data read during the transaction stays the same as it was when the transaction started. 🔹Read Committed: […]
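One property shared by Read Committed and every stricter level is "no dirty reads": a transaction never sees another transaction's uncommitted writes. Here is a small sketch of that using SQLite from Python's standard library (SQLite transactions are in fact serializable, the strictest level, so they exhibit this behavior too); the `accounts` table and balances are made up for illustration.

```python
import os
import sqlite3
import tempfile

# Two connections to the same on-disk database act as two
# concurrent transactions (":memory:" would give each its own db).
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, isolation_level=None)  # autocommit mode
reader = sqlite3.connect(path, isolation_level=None)

writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INT)")
writer.execute("INSERT INTO accounts VALUES (1, 100)")

# Start an explicit transaction and update, but do NOT commit yet.
writer.execute("BEGIN")
writer.execute("UPDATE accounts SET balance = 50 WHERE id = 1")

# The reader still sees the last committed value, not the dirty write.
before = reader.execute(
    "SELECT balance FROM accounts WHERE id = 1").fetchall()[0][0]

writer.execute("COMMIT")

# Only after commit does the new value become visible.
after = reader.execute(
    "SELECT balance FROM accounts WHERE id = 1").fetchall()[0][0]
```

Engines like PostgreSQL or MySQL let you choose the level per transaction (e.g. `SET TRANSACTION ISOLATION LEVEL REPEATABLE READ`); SQLite does not expose that knob because it is always serializable.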