Free Data Science eBooks - July 2017
Every month we scour the internet seeking out free eBooks to help you on your educational journey, and this month has been no different.
I hope these books prove to be a valuable resource to you and that you will visit regularly (and invite your friends too).
If you haven't subscribed to our newsletter yet, why not subscribe using the form on the right - you'll be the very first to know when new resources are published.
This month, we have Information Theory, Inference, and Learning Algorithms; Data Science in the Cloud with Microsoft Azure Machine Learning and Python; and Data-Intensive Text Processing with MapReduce. They're all FREE, so help yourselves.
By the way, the first one is written by David MacKay. You might not have heard of him, but when I was doing my PhD, his 1992 PhD thesis Bayesian Methods for Adaptive Models was my Bible. He doesn't know it, but David is my God! We are not worthy...
Information Theory, Inference, and Learning Algorithms
by David MacKay
Information theory and inference, often taught separately, are here united in one entertaining textbook.
These topics lie at the heart of many exciting areas of contemporary science and engineering - communication, signal processing, data mining, machine learning, pattern recognition, computational neuroscience, bioinformatics, and cryptography.
This textbook introduces theory in tandem with applications. Information theory is taught alongside practical communication systems, such as arithmetic coding for data compression and sparse-graph codes for error-correction.
A toolbox of inference techniques, including message-passing algorithms, Monte Carlo methods, and variational approximations, is developed alongside applications of these tools to clustering, convolutional codes, independent component analysis, and neural networks.
Data Science in the Cloud with Microsoft Azure Machine Learning and Python
by Stephen Elston
Take time to explore Microsoft’s Azure machine learning platform, Azure ML - a production environment that simplifies the development and deployment of machine learning models.
In this O’Reilly report, Stephen Elston from Quantia Analytics uses a complete data science example (forecasting hourly demand for a bicycle rental system) to show you how to manipulate data, construct models, and evaluate models with Azure ML.
The report walks you through key steps in the data science process from problem definition, data understanding, and feature engineering, through construction of a regression model and presentation of results. You’ll also learn how to extend Azure ML with Python.
Elston uses downloadable Python code and data to demonstrate how to perform data munging, data visualization, and in-depth evaluation of model performance. At the end, you’ll learn how to publish your trained models as web services in the Azure cloud.
Data-Intensive Text Processing with MapReduce
by Jimmy Lin and Chris Dyer
Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications.
Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever.
MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance.
This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains.
This book aims to help the reader "think in MapReduce", and it also discusses the limitations of the programming model.
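To give a flavour of what "thinking in MapReduce" means, here is a minimal single-machine sketch of the classic word-count example in plain Python. It is not taken from the book and does not use Hadoop or any real MapReduce framework; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. The shuffle step only imitates the grouping-by-key that a real execution framework would handle for you across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in a real system)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reducer: sum all the counts collected for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of strings."""
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(docs))
```

The point of the pattern is that the mapper and reducer only ever see one document or one key at a time, which is what lets a real framework spread the work over many machines.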
If you're interested in learning more about the content in this blog post, we've sought out the best blogs, books, video courses and other resources from around the internet for you. Some are free and others are not, so to help you decide we use the following ratings:
- FREE content
- costs less than 10 £/$/Euro
- costs less than 50 £/$/Euro
- costs less than 100 £/$/Euro
- costs more than 100 £/$/Euro
Disclosure: some of the links to these resources may be affiliate links, and we may earn an affiliate commission on purchases you make through them.
You can find further details in our TCs
Videos & Video Courses
4-hour Udemy video course delivered with animated videos. Perfect for beginners, it will help you get started with basic statistical concepts.
7-hour Udemy video course. Great for those needing a more business-oriented introduction to stats. Better still, the course even comes with homework. Yay!
9-hour Udemy video course. This is one of the top stats courses at Udemy and is a must-see for those who need to learn stats in R.
CorrelViz - visualise all the correlations in your data in minutes
CorrelViz is completely automated and gives you the Story of Your Data in minutes, with one click - saving you months of manual analysis and shed-loads of cash!
Analyse all your data, discover all the correlations you seek - and some you never even dreamed of...