Apache Hive Essentials

Immerse yourself in a fantastic journey to discover the attributes of big data by using Hive. About This Book: Discover how Hive can coexist and work with other tools in the Hadoop ecosystem to create big data solutions. Grasp the skills needed, learn the best practices, and avoid the pitfalls in writing efficient Hive queries to analyze big data. Create an environment to analyze big data using practical, example-oriented scenarios. Who This Book Is For: If you are a data analyst, developer, or simply someone who wants to use Hive to explore and analyze data in Hadoop, this is the book for you. Whether you are new to big data or an expert, with this book you will be able to master both the basic and the advanced features of Hive. Since Hive is an SQL-like language, some previous experience with SQL and databases is useful for a better understanding of this book. What You Will Learn: Create and set up the Hive environment. Discover how to use Hive's definition language to describe data. Discover interesting data by joining and filtering datasets in Hive. Transform data by using Hive sorting, ordering, and functions. Aggregate and sample data in different ways. Boost Hive[...]

Cracking the Coding Interview: 189 Programming Questions and Solutions

I am not a recruiter. I am a software engineer. And as such, I know what it's like to be asked to whip up brilliant algorithms on the spot and then write flawless code on a whiteboard. I've been through this as a candidate and as an interviewer. Cracking the Coding Interview, 6th Edition is here to help you through this process, teaching you what you need to know and enabling you to perform at your very best. I've coached and interviewed hundreds of software engineers. The result is this book. Learn how to uncover the hints and hidden details in a question, discover how to break down a problem into manageable chunks, develop techniques to unstick yourself when stuck, learn (or re-learn) core computer science concepts, and practice on 189 interview questions and solutions. These interview questions are real; they are not pulled out of computer science textbooks. They reflect what's truly being asked at the top companies, so that you can be as prepared as possible. WHAT'S INSIDE? 189 programming interview questions, ranging from the basics to the trickiest algorithm problems. A walk-through of how to derive each solution, so that you can learn how to[...]

Mastering Data Analysis with R

Gain sharp insights into your data and solve real-world data science problems with R, from data munging to modeling and visualization. About This Book: Handle your data with precision and care for optimal business intelligence. Restructure and transform your data to inform decision-making. Packed with practical advice and tips to help you get to grips with data mining. Who This Book Is For: If you are a data scientist or R developer who wants to explore and optimize your use of R’s advanced features and tools, this is the book for you. A basic knowledge of R is required, along with an understanding of database logic. What You Will Learn: Connect to and load data from R’s range of powerful databases. Successfully fetch and parse structured and unstructured data. Transform and restructure your data with efficient R packages. Define and build complex statistical models with glm. Develop and train machine learning algorithms. Visualize social networks and graph data. Deploy supervised and unsupervised classification algorithms. Discover how to visualize spatial data with R. In Detail: R is an essential language for sharp and successful data analysis. Its numerous features and ease of use make it a powerful way of mining, managing, and interpreting large sets of data. In a world where understanding big data has become key, by mastering[...]

Fast Data Processing with Spark – Second Edition

Perform real-time analytics using Spark in a fast, distributed, and scalable way. About This Book: Develop a machine learning system with Spark's MLlib and scalable algorithms. Deploy Spark jobs to various clusters such as Mesos, EC2, Chef, YARN, EMR, and so on. This is a step-by-step tutorial that unleashes the power of Spark and its latest features. Who This Book Is For: Fast Data Processing with Spark - Second Edition is for software developers who want to learn how to write distributed programs with Spark. It will help developers who have had problems that were too big to be dealt with on a single computer. No previous experience with distributed programming is necessary. This book assumes knowledge of Java, Scala, or Python. What You Will Learn: Install and set up Spark on your cluster. Prototype distributed applications with Spark's interactive shell. Learn different ways to interact with Spark's distributed representation of data (RDDs). Query Spark with a SQL-like query syntax. Effectively test your distributed software. Recognize how Spark works with big data. Implement machine learning systems with highly scalable algorithms. In Detail: Spark is a framework used for writing fast, distributed programs. Spark solves problems similar to those Hadoop MapReduce solves, but with a fast in-memory approach[...]

The Truth Machine: The Blockchain and the Future of Everything

"Views differ on bitcoin, but few doubt the transformative potential of Blockchain technology. The Truth Machine is the best book so far on what has happened and what may come along. It demands the attention of anyone concerned with our economic future." (Lawrence H. Summers, Charles W. Eliot University Professor and President Emeritus at Harvard, Former Treasury Secretary) From Michael J. Casey and Paul Vigna, the authors of The Age of Cryptocurrency, comes the definitive work on the Internet’s Next Big Thing: The Blockchain. Big banks have grown bigger and more entrenched. Privacy exists only until the next hack. Credit card fraud is a fact of life. Many of the “legacy systems” once designed to make our lives easier and our economy more efficient are no longer up to the task. Yet there is a way past all this: a new kind of operating system with the potential to revolutionize vast swaths of our economy, the blockchain. In The Truth Machine, Michael J. Casey and Paul Vigna demystify the blockchain and explain why it can restore personal control over our data, assets, and identities; grant billions of excluded people access to the global economy; and shift the balance of power to revive[...]

Data Manipulation with R – Second Edition

Efficiently perform data manipulation using the split-apply-combine strategy in R. About This Book: Perform data manipulation with add-on packages such as plyr, reshape, stringr, lubridate, and sqldf. Learn about factor manipulation, string processing, and text manipulation techniques using the stringr and dplyr libraries. Enhance your analytical skills in an intuitive way through step-by-step working examples. Who This Book Is For: This book is for all those who wish to learn about data manipulation from scratch and excel at aggregating data effectively. It is expected that you have basic knowledge of R and have previously done some basic administration work with R. What You Will Learn: Learn about R data types and their basic operations. Work efficiently with string, factor, and date variables using stringr. Understand group-wise data manipulation. Work with different layouts of R datasets and interchange between layouts for varied purposes. Manage bigger datasets using plyr and dplyr. Perform data manipulation with add-on packages such as plyr, reshape, stringr, lubridate, and sqldf. Manipulate datasets using SQL statements with the sqldf package. Clean and structure raw data for data mining using text manipulation. In Detail: This book starts with the installation of R and how to go about using R and its libraries. We then discuss the[...]

Life After Google: The Fall of Big Data and the Rise of the Blockchain Economy

A FINANCIAL TIMES BOOK OF THE MONTH. FROM THE WALL STREET JOURNAL: "Nothing Mr. Gilder says or writes is ever delivered at anything less than the fullest philosophical decibel... Mr. Gilder sounds less like a tech guru than a poet, and his words tumble out in a romantic cascade." “Google’s algorithms assume the world’s future is nothing more than the next moment in a random process. George Gilder shows how deep this assumption goes, what motivates people to make it, and why it’s wrong: the future depends on human action.” (Peter Thiel, founder of PayPal and Palantir Technologies and author of Zero to One: Notes on Startups, or How to Build the Future) The Age of Google, built on big data and machine intelligence, has been an awesome era. But it’s coming to an end. In Life after Google, George Gilder, the peerless visionary of technology and culture, explains why Silicon Valley is suffering a nervous breakdown and what to expect as the post-Google age dawns. Google’s astonishing ability to “search and sort” attracts the entire world to its search engine and countless other goodies: videos, maps, email, calendars. And everything it offers is free, or so it seems. Instead of paying directly[...]

R: Recipes for Analysis, Visualization and Machine Learning

Get savvy with the R language and actualize projects aimed at analysis, visualization, and machine learning. About This Book: Proficiently analyze data and apply machine learning techniques. Generate visualizations, develop interactive visualizations and applications to understand various data exploratory functions in R. Construct a predictive model by using a variety of machine learning packages. Who This Book Is For: This Learning Path is ideal for those who have been exposed to R, but have not used it extensively yet. It covers the basics of using R and is written for new and intermediate R users interested in learning. This Learning Path also provides in-depth insights into professional techniques for analysis, visualization, and machine learning with R; it will help you increase your R expertise, regardless of your level of experience. What You Will Learn: Get data into your R environment and prepare it for analysis. Perform exploratory data analyses and generate meaningful visualizations of the data. Generate various plots in R using the basic R plotting techniques. Create presentations and learn the basics of creating apps in R for your audience. Create and inspect the transaction dataset, performing association analysis with the Apriori algorithm. Visualize associations in various graph formats and find frequent itemsets using the ECLAT algorithm. Build, tune, and evaluate predictive models[...]

