Categories
Data Science Software Architecture

[Book Summary] Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

“Designing Data-Intensive Applications” is a book written by Martin Kleppmann. It explores the fundamental concepts and principles behind the design and implementation of data-intensive systems. The book covers various topics related to data storage, processing, and scalability.

Categories
Agile Development Book Post Data Science Innovation

The Unicorn Project Book Summary: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data

“The Unicorn Project” is a book by Gene Kim, a companion novel to “The Phoenix Project,” that explores the challenges faced by a different protagonist, Maxine, as she works on a transformational project within the same company, Parts Unlimited. The book delves into the importance of developer productivity, innovation, and the role of technology in […]

Categories
Data Science Software Architecture

Batch vs Stream Processing

Batch and stream processing are two common data processing models. What are the differences? The diagram below shows a typical scenario with user clicks: 🔹 Batch Processing: We aggregate user click activities at the end of the day. 🔹 Stream Processing: We detect potential fraud in the user click streams in real time. Both processing models are used […]
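As a rough illustration of the click scenario in the excerpt, here is a minimal Python sketch. The event data, the `StreamFraudDetector` class, and the thresholds (a 10-second window, more than 3 clicks) are all hypothetical, chosen only to make the contrast visible; a production system would likely use a dedicated batch or streaming engine instead.

```python
from collections import Counter, deque

# Hypothetical click events: (timestamp_in_seconds, user_id)
clicks = [(0, "alice"), (1, "bob"), (2, "bob"), (3, "bob"), (4, "bob"), (50, "alice")]


def batch_clicks_per_user(events):
    """Batch: aggregate the full set of events once they have all arrived."""
    return Counter(user for _, user in events)


class StreamFraudDetector:
    """Stream: inspect each event on arrival, keeping a sliding window per user."""

    def __init__(self, window_seconds=10, max_clicks=3):
        # Both thresholds are illustrative assumptions, not recommended values.
        self.window_seconds = window_seconds
        self.max_clicks = max_clicks
        self.recent = {}  # user_id -> deque of recent click timestamps

    def on_click(self, ts, user):
        window = self.recent.setdefault(user, deque())
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and ts - window[0] >= self.window_seconds:
            window.popleft()
        # Flag the click if the user exceeded the allowed rate.
        return len(window) > self.max_clicks


# Batch result is available only after the whole day's data is collected.
daily_totals = batch_clicks_per_user(clicks)

# Stream result is produced event by event, while the data is still arriving.
detector = StreamFraudDetector()
alerts = [(ts, user) for ts, user in clicks if detector.on_click(ts, user)]
```

The batch function needs the complete input before it can answer, while the detector keeps only a small amount of per-user state and can raise an alert as soon as a suspicious click arrives.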

Categories
Artificial Intelligence Data Science Innovation

From Data to Insights: How Collaborative Filtering Can Improve Your Recommendations

Collaborative filtering is a recommendation technique that analyzes user interactions and behavior to make personalized recommendations for content, products, or services. It uses machine learning algorithms to identify patterns and relationships in user data, which automates the process of finding relevant recommendations. Collaborative filtering can improve user experience, increase engagement, and enhance customer retention and loyalty. It has been successfully applied in various industries, including e-commerce, entertainment, and social networking. Metrics such as precision, recall, and F1 score are used to evaluate the accuracy and effectiveness of collaborative filtering systems. As recommendation systems continue to evolve, collaborative filtering is expected to play a vital role in the future of personalized recommendations.
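To make the idea concrete, here is a minimal sketch of one common variant, user-based collaborative filtering with cosine similarity, followed by the precision, recall, and F1 metrics mentioned above. The `ratings` data and the function names are hypothetical, and real systems typically use matrix factorization or a dedicated library rather than this brute-force loop.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "ben": {"A": 5, "B": 5, "D": 4},
    "cat": {"C": 5, "D": 2, "E": 4},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, k=2):
    """Score unseen items by the similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]


def precision_recall_f1(recommended, relevant):
    """Evaluate a recommendation list against the items the user actually liked."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


top_items = recommend("ann")
# Assume, for illustration, that "D" is the only item ann truly likes.
precision, recall, f1 = precision_recall_f1(top_items, ["D"])
```

Here "ann" is most similar to "ben", so the item only "ben" and "cat" have rated ("D") is ranked first. Precision penalizes the extra, irrelevant recommendation, while recall is satisfied because the one relevant item was found.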

Categories
Artificial Intelligence Data Science

Embeddings: The superpower of Deep Learning

I think Deep Learning finds its strength in its ability to efficiently model different types of data at once. It is trivial to build models from multimodal datasets nowadays. This is not a new concept, nor was it impossible prior to the advent of DL, but the level of complexity […]

Categories
Data Science Programming Software Architecture

8 Key Data Structures That Power Modern Databases

In this video, we briefly talk about: 🔹Skiplist: a common in-memory index type. Used in Redis 🔹Hash index: a very common implementation of the “Map” data structure (or “Collection”) 🔹SSTable: immutable on-disk “Map” implementation 🔹LSM tree: Skiplist + SSTable. High write throughput 🔹B-tree: disk-based solution. Consistent read/write performance 🔹Inverted index: used for document indexing. Used […]
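The "LSM tree: Skiplist + SSTable" entry can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `ToyLSM` class is hypothetical, a plain dict stands in for the in-memory skiplist, the "on-disk" SSTables are just immutable sorted tuples held in memory, and compaction, deletes, and write-ahead logging are omitted.

```python
from bisect import bisect_left


class ToyLSM:
    """Toy LSM tree: a mutable memtable plus a list of immutable sorted tables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}  # stand-in for the in-memory skiplist
        self.sstables = []  # oldest first; each entry is (sorted_keys, values)
        self.memtable_limit = memtable_limit  # illustrative flush threshold

    def put(self, key, value):
        # Writes only touch memory, which is why write throughput is high.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into an immutable, sorted table (the "SSTable").
        items = sorted(self.memtable.items())
        keys = tuple(k for k, _ in items)
        values = tuple(v for _, v in items)
        self.sstables.append((keys, values))
        self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable first, then tables newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.sstables):
            i = bisect_left(keys, key)  # binary search over the sorted keys
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


db = ToyLSM()
db.put("a", 1)
db.put("b", 2)  # memtable reaches the limit and is flushed to the first table
db.put("a", 3)  # newer value for "a" lives in the memtable
```

Reads may need to consult several tables, which is the usual trade-off for the cheap, append-style writes.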

Categories
Artificial Intelligence Data Science

Real-time big data analytics architecture

A real-time big data analytics architecture is a system designed to process, analyze, and act on large volumes of data in real time. What’s missing in the infographic below? This type of architecture is typically used to support applications that require fast and accurate data processing, such as fraud detection, real-time recommendation engines, and social […]

Categories
Data Science Software Architecture

How do you decide which type of database to use?

There are hundreds or even thousands of databases available today, such as Oracle, MySQL, MariaDB, SQLite, PostgreSQL, Redis, ClickHouse, MongoDB, S3, Ceph, etc. How do you select the architecture for your system? My short summary is as follows: 🔹Relational database. They can solve almost any problem. 🔹In-memory store. Their speed and limited data size […]

Categories
Data Science Software Architecture

Data Science Interview: Prep for SQL, Pandas, Python, R Language, Machine Learning, DBMS and RDBMS – And More – The Full Data Scientist Interview Handbook

The median annual salary for a Data Scientist is $98,230, according to the Bureau of Labor Statistics. Data science job postings grew by 31% over the past few years, while data science job searches only rose by 14% over the same period. This high demand has led to a shortage of over 150,000 data science […]

Categories
Data Science Innovation Software Architecture

What are database isolation levels? What are they used for?

Database isolation allows a transaction to execute as if there were no other concurrently running transactions. The diagram below illustrates four isolation levels. 🔹Serializable: This is the highest isolation level. Concurrent transactions are guaranteed to behave as if they were executed in sequence. 🔹Repeatable Read: Data read during the transaction stays the same as it was when the transaction started. 🔹Read Committed: […]