Free Data Science eBooks - July 2017

Every month we scour the internet seeking out free eBooks to help you on your educational journey, and this month has been no different.

I hope these books prove to be a valuable resource to you and that you will visit regularly (and invite your friends too).

If you haven't subscribed to our newsletter yet, why not subscribe using the form on the right? You'll be the very first to know when new resources are published.

 

3 Free Data Science Books for July

 

This month, we have Information Theory, Inference, and Learning Algorithms; Data Science in the Cloud with Microsoft Azure Machine Learning and Python; and Data-Intensive Text Processing with MapReduce. They're all FREE, so help yourselves.

By the way, the first one is written by David MacKay. You might not have heard of him, but when I was doing my PhD, his 1992 PhD thesis Bayesian Methods for Adaptive Models was my Bible. He doesn't know it, but David is my God! We are not worthy...

 

Enjoy!

 


 

Information Theory, Inference, and Learning Algorithms

by David MacKay

Information theory and inference, often taught separately, are here united in one entertaining textbook.

These topics lie at the heart of many exciting areas of contemporary science and engineering - communication, signal processing, data mining, machine learning, pattern recognition, computational neuroscience, bioinformatics, and cryptography.

This textbook introduces theory in tandem with applications. Information theory is taught alongside practical communication systems, such as arithmetic coding for data compression and sparse-graph codes for error-correction.

A toolbox of inference techniques, including message-passing algorithms, Monte Carlo methods, and variational approximations, is developed alongside applications of these tools to clustering, convolutional codes, independent component analysis, and neural networks.

 

 

Data Science in the Cloud with Microsoft Azure Machine Learning and Python

by Stephen Elston

Take time to explore Microsoft’s Azure machine learning platform, Azure ML - a production environment that simplifies the development and deployment of machine learning models.

In this O’Reilly report, Stephen Elston from Quantia Analytics uses a complete data science example (forecasting hourly demand for a bicycle rental system) to show you how to manipulate data, construct models, and evaluate models with Azure ML.

The report walks you through key steps in the data science process from problem definition, data understanding, and feature engineering, through construction of a regression model and presentation of results. You’ll also learn how to extend Azure ML with Python.

Elston uses downloadable Python code and data to demonstrate how to perform data munging, data visualization, and in-depth evaluation of model performance. At the end, you’ll learn how to publish your trained models as web services in the Azure cloud.

 

 

Data-Intensive Text Processing with MapReduce

by Jimmy Lin and Chris Dyer

Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications.

Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever.

MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance.

This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains.

This book not only aims to help the reader "think in MapReduce", but also discusses the limitations of the programming model.
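
If the map, group and reduce idea in the description above is new to you, here is a minimal sketch in plain Python, assuming the classic word-count task. It is only an illustration and not code from the book: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are made up for this example, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Mapper: emit a (word, 1) pair for every word in one document.

    doc_id is the input key; word count does not need it, so it is ignored.
    """
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (a stand-in for the framework's shuffle step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum all the counts collected for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle and reduce over a dict of {doc_id: text}."""
    pairs = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in documents.items()
    )
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


# Hypothetical sample input, made up for this sketch.
docs = {1: "the cat sat", 2: "the dog sat down"}
counts = word_count(docs)
print(counts)
```

In a real MapReduce framework the grouping step in the middle, along with scheduling and fault tolerance, is handled for you across many machines; here it is just a dictionary, which keeps the sketch short.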

 


 

Learn More

 

If you're interested in learning more about the content in this blog post, we've sought out the best blogs, books, video courses and other resources from around the internet for you. Some may be free while others may not, and to help you decide we use the following ratings:

- FREE content
- costs less than 10 £/$/Euro
- costs less than 50 £/$/Euro
- costs less than 100 £/$/Euro
- costs more than 100 £/$/Euro

 

Disclosure: some of the links to these resources may be affiliate links, and we may earn an affiliate commission for purchases you make when using them.

You can find further details in our T&Cs.

 


Books

Practical Data Cleaning - 19 Essential Tips to Scrub Your Dirty Data

It's always difficult knowing where to start, but especially so when it comes to Data Science. No need to fret, though - we've selected our top 21 books that all aspiring data scientists should read.
These will get you going in no time...


Correlation and Causation - The Trouble With Story Telling

How many times have you heard that ‘correlation does not imply causation’? Lots, but I bet you didn't know that there are five reasons why you should not trust your intuition. This book gives you the tools to discover the five traps that even experienced investigators fall into.


 

Videos & Video Courses

Statistics for Data Science and Business Analysis

A 4-hour Udemy video course delivered with animated videos. Perfect for beginners, it will help get you started with basic statistical concepts.

Statistics for Business Analytics A-Z

A 7-hour Udemy video course. Great for those needing a more business-oriented introduction to stats. Better still, the course even comes with homework. Yay!

Applied Statistical Modeling for Data Analysis in R

A 9-hour Udemy video course. This is one of the top stats courses at Udemy and is a must-see for those who need to learn stats in R.

 

Software

CorrelViz - visualise all the correlations in your data in minutes

CorrelViz is completely automated and gives you the Story of Your Data in minutes, with one click - saving you months of manual analysis and shed-loads of cash!
Analyse all your data, discover all the correlations you seek - and some you never even dreamed of...

 

