
Free Data Science eBooks - January 2018

Now that Christmas and the New Year are behind us, the nights are becoming a little shorter with each passing day. Nevertheless, there are still loads of cold winter nights left to endure (unless you're in the Southern Hemisphere, in which case - throw me a shrimp on the barbie!).

It's time to dust off your New Year resolutions from last year (remember those?) and get ready to learn some new data skills.

Here are three free eBooks to help you on that journey and make those long nights just that bit shorter.

I hope these books prove to be a valuable resource to you and that you will visit regularly (and share with your friends on social media too).

If you haven't subscribed to our newsletter yet, why not subscribe using the form on the right? You'll be the very first to know when new resources are published.

 

3 Free Data Science Books for January


 

This month we highlight 3 books:

  • Data-Intensive Text Processing with MapReduce
  • Programming Pig
  • Test-Driven Development With Python

They're all FREE, so help yourselves...

Enjoy!

 


 

Data-Intensive Text Processing with MapReduce

by Jimmy Lin and Chris Dyer

Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever.

MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance.

This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains.

This book is intended not only to help the reader "think in MapReduce", but also to discuss the limitations of the programming model.
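To give a flavour of the programming model described above, here is a minimal single-machine sketch of the classic word count in plain Python. It is not taken from the book and uses no Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and the `shuffle` step only imitates the grouping that a real execution framework would perform across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Mapper: emit a (word, 1) pair for every token in one document."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in real MapReduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reducer: sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a dict of {doc_id: text}."""
    intermediate = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in documents.items()
    )
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = {"d1": "the quick brown fox", "d2": "the lazy dog and the fox"}
    print(word_count(docs))
```

The point of the split is that each mapper call and each reducer call depends only on its own input, which is what would let a framework spread the work over many machines.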

 


 

Programming Pig

by Alan Gates

This guide is an ideal learning tool and reference for Apache Pig, the open source engine for executing parallel data flows on Hadoop. With Pig, you can batch-process data without having to create a full-fledged application—making it easy for you to experiment with new datasets.

Programming Pig introduces new users to Pig, and provides experienced users with comprehensive coverage of key features such as the Pig Latin scripting language, the Grunt shell, and User Defined Functions (UDFs) for extending Pig. If you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig.

  • Delve into Pig’s data model, including scalar and complex data types
  • Write Pig Latin scripts to sort, group, join, project, and filter your data
  • Use Grunt to work with the Hadoop Distributed File System (HDFS)
  • Build complex data processing pipelines with Pig’s macros and modularity features
  • Embed Pig Latin in Python for iterative processing and other advanced tasks
  • Create your own load and store functions to handle data formats and storage mechanisms
  • Get performance tips for running scripts on Hadoop clusters in less time

 

 

Test-Driven Development With Python

by Harry Percival

By taking you through the development of a real web application from beginning to end, the second edition of this hands-on guide demonstrates the practical advantages of test-driven development (TDD) with Python.

You'll learn how to write and run tests before building each part of your app, and then develop the minimum amount of code required to pass those tests. The result? Clean code that works. In the process, you'll learn the basics of Django, Selenium, Git, jQuery, and Mock, along with current web development techniques.

If you're ready to take your Python skills to the next level, this book - updated for Python 3.6 - clearly demonstrates how TDD encourages simple designs and inspires confidence.

  • Dive into the TDD workflow, including the unit test/code cycle and refactoring
  • Use unit tests for classes and functions, and functional tests for user interactions within the browser
  • Learn when and how to use mock objects, and the pros and cons of isolated vs. integrated tests
  • Test and automate your deployments with a staging server
  • Apply tests to the third-party plugins you integrate into your site
  • Run tests automatically by using a Continuous Integration environment
  • Use TDD to build a REST API with a front-end Ajax interface
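As a rough illustration of the test-first cycle described above, here is a small sketch using Python's standard `unittest` module. It is not an excerpt from the book: `normalize_item` is a hypothetical helper invented for this example. In practice you would write the test class first, watch it fail, and only then add the few lines of code needed to make it pass.

```python
import unittest


# Written second: the minimum code needed to make the tests below pass.
def normalize_item(text):
    """Trim surrounding whitespace and reject empty to-do items."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("item text must not be empty")
    return cleaned


# Written first: the tests describe the behaviour we want.
class NormalizeItemTest(unittest.TestCase):
    def test_strips_surrounding_whitespace(self):
        self.assertEqual(normalize_item("  buy milk \n"), "buy milk")

    def test_rejects_blank_items(self):
        with self.assertRaises(ValueError):
            normalize_item("   ")


if __name__ == "__main__":
    unittest.main()
```

Saved to a file and run with `python`, this executes both tests. Once they pass, you can refactor `normalize_item` and rely on the tests to catch any regressions.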

 


 

Learn More

 

If you're interested in learning more about the content in this blog post, we've sought out the best blogs, books, video courses and other stuff from around the internet for you. Some may be free while others may not, and to help you decide we use the following ratings:

- FREE content
- costs less than 10 £/$/Euro
- costs less than 50 £/$/Euro
- costs less than 100 £/$/Euro
- costs more than 100 £/$/Euro

 

Disclosure: some of these resources may be affiliate links, and we may earn an affiliate commission for purchases you make when using these links

You can find further details in our T&Cs

 


Videos & Video Courses

Statistics for Data Science and Business Analysis

4-hour Udemy Video Course delivered with animated videos. Perfect for beginners, it will help get you started with basic statistical concepts.

Statistics for Business Analytics A-Z

7-hour Udemy Video Course. Great for those needing a more business-oriented introduction to stats. Better still, the course even comes with homework. Yay!

Applied Statistical Modeling for Data Analysis in R

9-hour Udemy Video Course. This is one of the top stats courses at Udemy and is a must-see for those who need to learn stats in R.

 

Software

CorrelViz - visualise all the correlations in your data in minutes

CorrelViz is completely automated and gives you the Story of Your Data in minutes, with one click - saving you months of manual analysis and shed-loads of cash!
Analyse all your data, discover all the correlations you seek - and some you never even dreamed of...

 
