Big Data Analytics with Spark: A Practitioner’s Guide to Using Spark for Large Scale Data Analysis

Big Data Analytics with Spark is a step-by-step guide for learning Spark, a fast, general-purpose, open-source cluster computing framework for large-scale data analysis. You will learn how to use Spark for different types of big data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. In addition, this book will help you become a much sought-after Spark expert.

Spark is one of the hottest Big Data technologies. The amount of data generated today by devices, applications, and users is exploding. Therefore, there is a critical need for tools that can analyze large-scale data and unlock value from it. Spark is a powerful technology that meets that need. You can, for example, use Spark to perform low-latency computations through efficient caching and iterative algorithms; leverage its shell for easy, interactive data analysis; employ its fast batch processing and low-latency features to process your real-time data streams; and so on. As a result, adoption of Spark is growing rapidly, and Spark is replacing Hadoop MapReduce as the technology of choice for big data analytics. This book provides an introduction to Spark and related big-data technologies[...]

Next Generation Databases: NoSQL and Big Data

"It’s not easy to find such a generous book on big data and databases. Fortunately, this book is the one." Feng Yu. Computing Reviews. June 28, 2016.

This is a book for enterprise architects, database administrators, and developers who need to understand the latest developments in database technologies. It is the book to help you choose the correct database technology at a time when concepts such as Big Data, NoSQL and NewSQL are making what used to be an easy choice into a complex decision with significant implications.

The relational database (RDBMS) model completely dominated database technology for over 20 years. Today this "one size fits all" stability has been disrupted by a relatively recent explosion of new database technologies. These paradigm-busting technologies are powering the "Big Data" and "NoSQL" revolutions, as well as forcing fundamental changes in databases across the board.

Deciding to use a relational database was once truly a no-brainer, and the various commercial relational databases competed on price, performance, reliability, and ease of use rather than on fundamental architectures. Today we are faced with choices between radically different database technologies. Choosing the right database today is a complex undertaking, with serious economic and technological consequences.

Next Generation Databases demystifies[...]

Real-World Hadoop

If you’re a business team leader, CIO, business analyst, or developer interested in how Apache Hadoop and Apache HBase-related technologies can address problems involving large-scale data in cost-effective ways, this book is for you. Using real-world stories and situations, authors Ted Dunning and Ellen Friedman show Hadoop newcomers and seasoned users alike how NoSQL databases and Hadoop can solve a variety of business and research issues.

You’ll learn about early decisions and pre-planning that can make the process easier and more productive. If you’re already using these technologies, you’ll discover ways to gain the full range of benefits possible with Hadoop. While you don’t need a deep technical background to get started, this book does provide expert guidance to help managers, architects, and practitioners succeed with their Hadoop projects.

Examine a day in the life of big data: India’s ambitious Aadhaar project
Review tools in the Hadoop ecosystem such as Apache’s Spark, Storm, and Drill to learn how they can help you
Pick up a collection of technical and strategic tips that have helped others succeed with Hadoop
Learn from several prototypical Hadoop use cases, based on how organizations have actually applied the technology
Explore real-world stories that reveal how MapR customers combine use cases when[...]

Hadoop in Practice: Includes 104 Techniques

Summary

Hadoop in Practice, Second Edition provides over 100 tested, instantly useful techniques that will help you conquer big data, using Hadoop. This revised new edition covers changes and new features in the Hadoop core architecture, including MapReduce 2. Brand new chapters cover YARN and integrating Kafka, Impala, and Spark SQL with Hadoop. You'll also get new and updated techniques for Flume, Sqoop, and Mahout, all of which have seen major new versions recently. In short, this is the most practical, up-to-date coverage of Hadoop available anywhere.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Book

It's always a good time to upgrade your Hadoop skills! Hadoop in Practice, Second Edition provides a collection of 104 tested, instantly useful techniques for analyzing real-time streams, moving data securely, machine learning, managing large-scale clusters, and taming big data using Hadoop. This completely revised edition covers changes and new features in Hadoop core, including MapReduce 2 and YARN. You'll pick up hands-on best practices for integrating Spark, Kafka, and Impala with Hadoop, and get new and updated techniques for the latest versions of Flume, Sqoop, and Mahout. In short, this is the most practical, up-to-date[...]

Head First Data Analysis: A learner’s guide to big numbers, statistics, and good decisions

Today, interpreting data is a critical decision-making factor for businesses and organizations. If your job requires you to manage and analyze all kinds of data, turn to Head First Data Analysis, where you'll quickly learn how to collect and organize data, sort the distractions from the truth, find meaningful patterns, draw conclusions, predict the future, and present your findings to others.

Whether you're a product developer researching the market viability of a new product or service, a marketing manager gauging or predicting the effectiveness of a campaign, a salesperson who needs data to support product presentations, or a lone entrepreneur responsible for all of these data-intensive functions and more, the unique approach in Head First Data Analysis is by far the most efficient way to learn what you need to know to convert raw data into a vital business tool.

You'll learn how to:
Determine which data sources to use for collecting information
Assess data quality and distinguish signal from noise
Build basic data models to illuminate patterns, and assimilate new information into the models
Cope with ambiguous information
Design experiments to test hypotheses and draw conclusions
Use segmentation to organize your data within discrete market groups
Visualize data distributions to reveal new relationships and persuade others
Predict the future[...]

Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More

How can you tap into the wealth of social web data to discover who’s making connections with whom, what they’re talking about, and where they’re located? With this expanded and thoroughly revised edition, you’ll learn how to acquire, analyze, and summarize data from all corners of the social web, including Facebook, Twitter, LinkedIn, Google+, GitHub, email, websites, and blogs.

Employ the Natural Language Toolkit, NetworkX, and other scientific computing tools to mine popular social web sites
Apply advanced text-mining techniques, such as clustering and TF-IDF, to extract meaning from human language data
Bootstrap interest graphs from GitHub by discovering affinities among people, programming languages, and coding projects
Build interactive visualizations with D3.js, an extraordinarily flexible HTML5 and JavaScript toolkit
Take advantage of more than two-dozen Twitter recipes, presented in O’Reilly’s popular "problem/solution/discussion" cookbook format

The example code for this unique data science book is maintained in a public GitHub repository. It’s designed to be easily accessible through a turnkey virtual machine that facilitates interactive learning with an easy-to-use collection of IPython Notebooks[...]

Beginning Oracle Database 11g Administration: From Novice to Professional (Expert’s Voice in Oracle)

This book, written by veteran Oracle database administrator Iggy Fernandez, a regular on the Oracle conference circuit and the editor of NoCOUG Journal, is a manageable introduction to key Oracle database administration topics including planning, installation, monitoring, troubleshooting, maintenance, and backups, to name just a few. As is clear from the table of contents, this book is not simply a recitation of Oracle Database features such as what you find in the reference guides available for free download on the Oracle web site. For example, the chapter on database monitoring explains how to monitor database availability, database changes, database security, database backups, database growth, database workload, database performance, and database capacity. The chapters of this book are logically organized into four parts that closely track the way your database administration career will naturally evolve. Part 1 gives you necessary background in relational database theory and Oracle Database concepts, Part 2 teaches you how to implement an Oracle database correctly, Part 3 exposes you to the daily routine of a database administrator, and Part 4 introduces you to the fine art of performance tuning. Each chapter has exercises designed to help you apply the lessons of the chapter. Each chapter also[...]

Oracle Database 11g DBA Handbook (Oracle Press)

Publisher's Note: Products purchased from Third Party sellers are not guaranteed by the publisher for quality, authenticity, or access to any online entitlements included with the product.

The Essential Resource for Oracle DBAs--Fully Updated and Expanded

Manage a flexible, highly available Oracle database with help from the expert information contained in this exclusive Oracle Press guide. Fully revised to cover every new feature and utility, Oracle Database 11g DBA Handbook shows how to perform a new installation, upgrade from previous versions, configure hardware and software for maximum efficiency, and employ bulletproof security. You will learn to automate the backup and recovery process, provide transparent failover capability, audit and tune performance, and distribute your enterprise databases with Oracle Net.

Plan and deploy permanent, temporary, and bigfile tablespaces
Optimize disk allocation, CPU usage, I/O throughput, and SQL queries
Develop powerful database management applications
Guard against human errors using Oracle Flashback and Oracle Automatic Undo Management
Diagnose and tune system performance using Oracle Automatic Workload Repository and SQL Tuning Sets
Implement robust security using authentication, authorization, fine-grained auditing, and fine-grained access control
Maintain high availability using Oracle Real Application Clusters and Oracle Active Data Guard
Respond more efficiently to failure scenarios by leveraging the Oracle Automatic Diagnostic Repository and the Oracle Repair Advisor
Back[...]

