Free Data Science eBooks - November 2017
The clocks went back an hour last week, Halloween is behind us and there are only a few days until Bonfire Night - most of the autumn has already gone and the nights are getting colder and longer.
What better time to get comfy under a nice warm blanket with a hot mug of cocoa and a good book?
It's getting late in our Back To School series, but here are three free eBooks to help you on your educational journey and make those long nights just that bit shorter.
I hope these books prove to be a valuable resource to you and that you will visit regularly (and share with your friends on social media too).
If you haven't subscribed to our newsletter yet, why not subscribe using the form on the right? You'll be the very first to know when new resources are published.
This month we highlight 3 books:
- R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
- Advanced Linear Models for Data Science
- A Probabilistic Theory of Pattern Recognition
They're all FREE, so help yourselves...
R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
by Hadley Wickham and Garrett Grolemund
Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You’ll get a complete, big-picture understanding of the data science cycle, along with basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you’ve learned along the way.
You’ll learn how to:
- Wrangle – transform your datasets into a form convenient for analysis
- Program – learn powerful R tools for solving data problems with greater clarity and ease
- Explore – examine your data, generate hypotheses, and quickly test them
- Model – provide a low-dimensional summary that captures true “signals” in your dataset
- Communicate – learn R Markdown for integrating prose, code, and results
Advanced Linear Models for Data Science
by Brian Caffo
Linear models are the cornerstone of statistical methodology. Perhaps more than any other tool, advanced students of statistics, biostatistics, machine learning, data science, econometrics, et cetera should spend time learning the finer-grained details of this subject.
In this book, we give a brief but rigorous treatment of advanced linear models. It is advanced in the sense that it is at the level that an introductory PhD student in statistics or biostatistics would see. The material in this book is standard knowledge for any PhD in statistics or biostatistics.
Students will need a fair number of mathematical prerequisites before undertaking this class. The first are multivariate calculus and linear algebra, especially linear algebra, since much of the early part of linear models is a direct application of linear algebra results in a statistical context. Some basic proof-based mathematics is also necessary to follow the proofs, as is some background in regression models and mathematical statistics.
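To give a flavour of what "linear algebra applied in a statistical context" means, here is a minimal sketch (ours, not taken from the book) that fits a straight line by solving the normal equations (XᵀX)β = Xᵀy. The function name `fit_simple_ols` and the toy data are our own illustrative assumptions, and the 2x2 system is solved by hand so that no libraries are needed.

```python
def fit_simple_ols(xs, ys):
    """Fit y = b0 + b1 * x by least squares.

    Illustrative helper (not from the book): it solves the normal
    equations (X^T X) beta = X^T y, where X has a column of ones
    and a column holding the x values.
    """
    n = len(xs)
    sx = sum(xs)                               # sum of x
    sxx = sum(x * x for x in xs)               # sum of x squared
    sy = sum(ys)                               # sum of y
    sxy = sum(x * y for x, y in zip(xs, ys))   # sum of x * y

    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x values must not all be identical")

    # Apply the inverse of the 2x2 matrix X^T X to X^T y.
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Toy data lying exactly on a straight line (made up for illustration).
intercept, slope = fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

With more than one predictor the same normal equations apply, but you would normally hand the matrix algebra to a numerical library rather than invert anything by hand.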
A Probabilistic Theory of Pattern Recognition
by Luc Devroye, Laszlo Györfi and Gabor Lugosi
A self-contained and coherent account of probabilistic techniques, covering: distance measures, kernel rules, nearest neighbour rules, Vapnik-Chervonenkis theory, parametric classification, and feature extraction. Each chapter concludes with problems and exercises to further the reader's understanding. Both research workers and graduate students will benefit from this wide-ranging and up-to-date account of a fast-moving field.
Pattern recognition presents one of the most significant challenges for scientists and engineers, and many different approaches have been proposed.
The aim of this book is to provide a self-contained account of probabilistic analysis of these approaches. The book includes a discussion of distance measures, nonparametric methods based on kernels or nearest neighbors, Vapnik-Chervonenkis theory, epsilon entropy, parametric classification, error estimation, tree classifiers, and neural networks. Wherever possible, distribution-free properties and inequalities are derived. A substantial portion of the results or the analysis is new. Over 430 problems and exercises complement the material.
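If nearest neighbour rules are new to you, the idea fits in a few lines. The sketch below is our own illustration, not code from the book: it labels a new point by majority vote among its k closest training points. The name `knn_classify`, the choice of Euclidean distance and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(train, query, k=1):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs, where each point is a
    tuple of numbers. Euclidean distance is an assumption made for this
    sketch; other distance measures could be swapped in.
    """
    # Sort the training pairs by distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels among those neighbours and return the most common one.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Made-up training data: two small clusters labelled "a" and "b".
train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(knn_classify(train, (0.2, 0.1), k=1))
print(knn_classify(train, (5.5, 5), k=3))
```

The book's interest is in what can be proved about rules like this one, for example how their error behaves as the amount of training data grows.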
If you're interested in learning more about the content in this blog post we've sought out the best blogs, books, video courses and other stuff from around the internet for you. Some may be free while others may not, and to help you decide we use the following ratings:
- FREE content
- costs less than 10 £/$/Euro
- costs less than 50 £/$/Euro
- costs less than 100 £/$/Euro
- costs more than 100 £/$/Euro
Disclosure: some of these resources may be affiliate links, and we may earn an affiliate commission for purchases you make when using these links.
You can find further details in our T&Cs.
Practical Data Cleaning - 19 Essential Tips to Scrub Your Dirty Data
It's always difficult knowing where to start, but especially so when it comes to Data Science. No need to fret, though - we've selected our top 21 books that all aspiring data scientists should read.
These will get you going in no time...
Correlation and Causation - The Trouble With Story Telling
How many times have you heard that ‘correlation does not imply causation’? Lots, but I bet you didn't know that there are five reasons why you should not trust your intuition. This book gives you the tools to discover the five traps that even experienced investigators fall into.
Videos & Video Courses
4-hour Udemy Video Course delivered with animated videos. Perfect for beginners, it will help get you started with basic statistical concepts.
7-hour Udemy Video Course. Great for those needing a more business-oriented introduction to stats. Better still, the course even comes with homework. Yay!
9-hour Udemy Video Course. This is one of the top stats courses at Udemy and is a must-see for those who need to learn stats in R.
CorrelViz - visualise all the correlations in your data in minutes
CorrelViz is completely automated and gives you the Story of Your Data in minutes, with one click - saving you months of manual analysis and shed-loads of cash!
Analyse all your data, discover all the correlations you seek - and some you never even dreamed of...