Big Data: Principles and best practices of scalable realtime data systems 1st Edition
Summary
Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the Book
Web-scale applications like social networks, real-time analytics, or e-commerce sites deal with a lot of data, whose volume and velocity exceed the limits of traditional database systems. These applications require architectures built around clusters of machines to store and process data of any size or speed. Fortunately, scale and simplicity are not mutually exclusive.
Big Data teaches you to build big data systems using an architecture designed specifically to capture and analyze web-scale data. This book presents the Lambda Architecture, a scalable, easy-to-understand approach that can be built and run by a small team. You'll explore the theory of big data systems and how to implement them in practice. In addition to discovering a general framework for processing big data, you'll learn specific technologies like Hadoop, Storm, and NoSQL databases.
This book requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful.
What's Inside
- Introduction to big data systems
- Real-time processing of web-scale data
- Tools like Hadoop, Cassandra, and Storm
- Extensions to traditional database skills
About the Authors
Nathan Marz is the creator of Apache Storm and the originator of the Lambda Architecture for big data systems. James Warren is an analytics architect with a background in machine learning and scientific computing.
Table of Contents
- A new paradigm for Big Data
PART 1 BATCH LAYER
- Data model for Big Data
- Data model for Big Data: Illustration
- Data storage on the batch layer
- Data storage on the batch layer: Illustration
- Batch layer
- Batch layer: Illustration
- An example batch layer: Architecture and algorithms
- An example batch layer: Implementation
PART 2 SERVING LAYER
- Serving layer
- Serving layer: Illustration
PART 3 SPEED LAYER
- Realtime views
- Realtime views: Illustration
- Queuing and stream processing
- Queuing and stream processing: Illustration
- Micro-batch stream processing
- Micro-batch stream processing: Illustration
- Lambda Architecture in depth
From the Publisher
About This Book
Services like social networks, web analytics, and intelligent e-commerce often need to manage data at a scale too big for a traditional database. Complexity increases with scale and demand, and handling Big Data is not as simple as just doubling down on your RDBMS or rolling out some trendy new technology. Fortunately, scalability and simplicity are not mutually exclusive—you just need to take a different approach. Big Data systems use many machines working in parallel to store and process data, which introduces fundamental challenges unfamiliar to most developers.
Big Data teaches you to build these systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to Big Data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of Big Data systems and how to implement them in practice.
Big Data requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful, though not required. The goal of the book is to teach you how to think about data systems and how to break down difficult problems into simple solutions. We start from first principles and from those deduce the necessary properties for each component of an architecture.
Editorial Reviews
About the Author
Nathan Marz is currently working on a new startup. Previously, he was the lead engineer at BackType before it was acquired by Twitter in 2011. At Twitter, he started the streaming compute team, which provides and develops shared infrastructure to support many critical realtime applications throughout the company. Nathan is the creator of Cascalog and Storm, open-source projects relied upon by over 50 companies around the world, including Yahoo!, Twitter, Groupon, The Weather Channel, Taobao, and many more.
James Warren is an analytics architect at Storm8 with a background in big data processing, machine learning and scientific computing.
Product details
- Publisher : Manning Publications; 1st edition (May 19, 2015)
- Language : English
- Paperback : 328 pages
- ISBN-10 : 1617290343
- ISBN-13 : 978-1617290343
- Item Weight : 1.21 pounds
- Dimensions : 7.38 x 0.6 x 9.25 inches
- Best Sellers Rank: #822,210 in Books
- #313 in Data Mining (Books)
- #1,022 in Software Development (Books)
- #1,248 in Internet & Telecommunications
Customer reviews
Top reviews from the United States
The book is very well organized. The introduction in chapter 1 serves as the road map for the whole book. Starting from a simple web application backed by an RDBMS, the author shows how the approach to scaling it becomes undesirable. After enumerating a list of desired properties, he proposes the Lambda Architecture, an approach that contrasts with a fully incremental architecture (built on an RDBMS).
The Lambda Architecture is partitioned into three layers:
1. a batch layer, which computes different views on big data;
2. a serving layer, which answers user queries using views from the batch layer and the speed layer;
3. a speed layer, which compensates with an approximate answer during the period when the batch layer is still computing the complete answers.
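The three layers can be illustrated with a toy, single-process sketch. It assumes a pageview-counting use case, and every name in it (`Pageview`, `batch_view`, `RealtimeView`, `query`) is hypothetical, not code from the book: the batch layer recomputes its view from the entire master dataset, the speed layer updates a realtime view incrementally as events arrive, and a query merges the two.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Pageview:
    """Hypothetical immutable event in the master dataset."""
    url: str
    timestamp: int  # seconds since epoch (assumed unit)


def batch_view(master_dataset):
    """Batch layer: recompute the whole view from scratch over all data."""
    return Counter(pv.url for pv in master_dataset)


class RealtimeView:
    """Speed layer: incrementally count events the batch view has not absorbed yet."""

    def __init__(self):
        self.counts = Counter()

    def update(self, pv):
        self.counts[pv.url] += 1


def query(url, batch, realtime):
    """Answer a query by merging the batch view with the realtime view."""
    # Counter returns 0 for missing keys, so unknown URLs are safe.
    return batch[url] + realtime.counts[url]


if __name__ == "__main__":
    master = [Pageview("/a", 1), Pageview("/b", 2), Pageview("/a", 3)]
    batch = batch_view(master)
    rt = RealtimeView()
    rt.update(Pageview("/a", 10))  # arrived after the last batch run
    print(query("/a", batch, rt))  # 2 from batch + 1 realtime
```

In this sketch the batch view is a pure function of all the data, so a bug in it could be fixed by simply rerunning it; the realtime view only has to cover the events the latest batch run has not yet seen.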
In the remaining chapters, the author dives deep into the rationale and requirements of all the different pieces of the Lambda Architecture.
To understand the context of the Lambda Architecture, also refer to Wikipedia for criticism of it.
The only downside to the book is that the architecture and its ecosystem are so new that there aren't many practical solutions yet. For example, the theory describes a query layer that merges the results of batch and realtime processing for client applications. However, in real life there are no ready-made solutions for doing this, so you'd have to write your own.
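To show what writing such a merge yourself might involve, here is a hypothetical sketch. It assumes each batch view is published together with a cutoff timestamp (every event before the cutoff is included in the batch view) and that the speed layer keeps raw recent events in memory; `MergingQuery`, `record`, `publish_batch`, and `count` are illustrative names, not part of any library. Without the cutoff, events counted by both layers would be double-counted.

```python
from collections import Counter


class MergingQuery:
    """Hypothetical query layer merging a batch view with speed-layer state."""

    def __init__(self):
        self.batch_counts = Counter()
        self.batch_cutoff = 0  # batch view covers events with timestamp < cutoff
        self.recent = []       # (url, timestamp) events seen by the speed layer

    def record(self, url, ts):
        """Speed layer: remember an incoming event."""
        self.recent.append((url, ts))

    def publish_batch(self, counts, cutoff):
        """Swap in a new batch view and expire speed-layer state it now covers."""
        self.batch_counts = Counter(counts)
        self.batch_cutoff = cutoff
        self.recent = [(u, t) for (u, t) in self.recent if t >= cutoff]

    def count(self, url):
        """Merge: batch result plus only the realtime events after the cutoff."""
        realtime = sum(
            1 for u, t in self.recent if u == url and t >= self.batch_cutoff
        )
        return self.batch_counts[url] + realtime


if __name__ == "__main__":
    q = MergingQuery()
    q.record("/a", 5)
    q.record("/a", 12)
    q.record("/b", 15)
    q.publish_batch({"/a": 1}, cutoff=10)  # batch run covered t < 10
    print(q.count("/a"))  # 1 from batch + 1 realtime event at t=12
```

Keeping raw events is the simplest way to make the cutoff exact; a real speed layer would more likely keep pre-aggregated, time-bucketed counts and drop whole buckets when a batch view catches up.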
It'll be interesting to see how the Lambda Architecture matures and to see future editions of this book. Hopefully, future editions will be just as well written and have a more mature ecosystem to draw on for the practical chapters.
My girlfriend and I enjoyed every chapter in this book. I guarantee you that you won't regret buying it. I am looking forward to another book from you guys on the topic, because it's the first time I couldn't wait to pick up a book and get to the end of it.
But it brought Paul's letter to the Romans to mind!
Clear, just enough detail, well-ordered.
I work at a large corporation, on a real-time data system. If we had followed the author's recommendations, I wouldn't have the problem I've been dealing with for the last several weeks.
Another feature of this book that was very useful to me is that it is the first book where I could find a concise explanation of the Storm Trident framework, even though the book is not about Storm.
I did scan through the rest of the book, though. First, the so-called Lambda Architecture might sound like a new term, but many high-concurrency websites already work that way. For a high-concurrency website, the first layer would be memcached-based, giving low-latency O(1) lookups for all queries. The second layer would be a clustered app-server layer. The third layer could be a high-concurrency, extremely low-latency store such as a NoSQL cluster. The far backend could be Hadoop- or Spark-based for batch jobs. This is a well-known production architecture for high-traffic websites that need to support millions of concurrent users.
Secondly, the bulk of the book is actually about Hadoop in the so-called batch layer. Hadoop once generated some excitement, but it has lost steam to the new kid on the block, Spark, which can do whatever Hadoop can do, but 10x to 20x faster at a fraction of the cost.
Reviewed in the United States on December 22, 2021
Top reviews from other countries
I found this book clear and concise, and it explains things in such a way that anyone with little or no background in IT can understand them.
It gives very good insight into Big Data and is also helpful for understanding which tools are best for achieving good results with Hadoop and other technologies.
I found it very interesting, well written, and pleasant to read as well. This book helped me a lot, and I'm sure it can help a lot of beginners with this subject.
A didactic book, but one that requires some concentration because of the technological complexity it describes.
Aimed at an experienced audience of developers (many illustrations with Java code samples).