3 Must-Read Statistics Books for Aspiring Data Scientists
If you're making the switch to Data Science, you might have come from a programming route or from science. It would be all too easy to learn a few new skills in data handling and machine learning and neglect statistics.
After all, many of the new breed of Data Scientists are declaring stats to be dead.
They are wrong, and it would be a big mistake to ignore statistics.
Traditionally, stats was used mainly for hypothesis testing, but in these days of Data Science, Big Data and the Internet of Things it's being used just as much for making discoveries and formulating new hypotheses.
If you don't know where to start your educational journey with stats, the 3 books in this blog post will help you make your first steps.
Disclosure: the three books in this post link to their listings at your local Amazon store. We may earn an affiliate commission for purchases you make when using the links to books on this page.
You can find further details in our T&Cs.
In this post, the 3rd in a series of 8 in which we bring you 21 Inspirational Books for All Aspiring Data Scientists, we highlight 3 books to introduce you to the subject of statistics in Data Science:
- Naked Statistics: Stripping the Dread from the Data
- Practical Statistics for Data Scientists: 50 Essential Concepts
- Statistics Done Wrong: The Woefully Complete Guide
All three are written for beginners, are very entertaining and give you a great idea of how to do stats right - and how to spot when they're done wrong!
Naked Statistics: Stripping the Dread from the Data
by Charles Wheelan
Once considered tedious, the field of statistics is rapidly evolving into a discipline Hal Varian, chief economist at Google, has actually called “sexy”. From batting averages and political polls to game shows and medical research, the real-world application of statistics continues to grow by leaps and bounds. How can we catch schools that cheat on standardized tests? How does Netflix know which movies you’ll like? What is causing the rising incidence of autism? As best-selling author Charles Wheelan shows us in Naked Statistics, the right data and a few well-chosen statistical tools can help us answer these questions and more.
For those who slept through Stats 101, this book is a lifesaver. Wheelan strips away the arcane and technical details and focuses on the underlying intuition that drives statistical analysis. He clarifies key concepts such as inference, correlation, and regression analysis, reveals how biased or careless parties can manipulate or misrepresent data, and shows us how brilliant and creative researchers are exploiting the valuable data from natural experiments to tackle thorny questions.
And in Wheelan’s trademark style, there’s not a dull page in sight. You’ll encounter clever Schlitz Beer marketers leveraging basic probability, an International Sausage Festival illuminating the tenets of the central limit theorem, and a head-scratching choice from the famous game show Let’s Make a Deal – and you’ll come away with insights each time. With the wit, accessibility, and sheer fun that turned Naked Economics into a bestseller, Wheelan defies the odds yet again by bringing another essential, formerly unglamorous discipline to life.
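The game-show choice mentioned above is usually known as the Monty Hall problem: you pick one of three doors, the host opens one of the other two to reveal a goat, and you decide whether to switch. If the answer feels wrong, a quick simulation helps. The sketch below is ours, not taken from the book; the function names `play` and `win_rate` are made up for illustration, and it assumes the host always opens a goat door you did not pick.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Assumption: the host always opens a door that is neither
    # the contestant's pick nor the door hiding the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Fraction of simulated rounds won when always staying or always switching."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    print(f"stay:   {win_rate(False):.3f}")
    print(f"switch: {win_rate(True):.3f}")
```

Under these assumptions, staying should win about one time in three and switching about two times in three.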
Practical Statistics for Data Scientists: 50 Essential Concepts
by Peter Bruce and Andrew Bruce
Statistical methods are a key part of data science, yet very few data scientists have any formal statistics training. Courses and books on basic statistics rarely cover the topic from a data science perspective. This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not.
Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R programming language, and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.
With this book, you’ll learn:
- Why exploratory data analysis is a key preliminary step in data science
- How random sampling can reduce bias and yield a higher-quality dataset, even with big data
- How the principles of experimental design yield definitive answers to questions
- How to use regression to estimate outcomes and detect anomalies
- Key classification techniques for predicting which categories a record belongs to
- Statistical machine learning methods that “learn” from data
- Unsupervised learning methods for extracting meaning from unlabeled data
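To get a feel for the random sampling point in the list above, here is a minimal sketch of our own (not from the book, which works in R). Everything in it is hypothetical: a made-up population, a made-up collection rule that only records values above 45, and the function name `compare_samples`. It compares a large but biased sample with a small simple random sample.

```python
import random
from statistics import fmean

def compare_samples(seed=42):
    """Return (biased_error, random_error, biased_size) for a made-up population."""
    rng = random.Random(seed)
    # Hypothetical population: 200,000 values with mean 50 and spread 10.
    population = [rng.gauss(50, 10) for _ in range(200_000)]
    true_mean = fmean(population)

    # A big but biased sample: only values above 45 get recorded.
    biased = [x for x in population if x > 45]
    # A small simple random sample of 1,000 values.
    simple_random = rng.sample(population, 1_000)

    biased_error = abs(fmean(biased) - true_mean)
    random_error = abs(fmean(simple_random) - true_mean)
    return biased_error, random_error, len(biased)

if __name__ == "__main__":
    biased_error, random_error, biased_size = compare_samples()
    print(f"biased sample ({biased_size} rows): error {biased_error:.2f}")
    print(f"random sample (1000 rows): error {random_error:.2f}")
```

In this toy setup the biased sample is far larger, yet its mean lands much further from the truth than the small random sample's does. More data does not fix a skewed collection process.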
Statistics Done Wrong: The Woefully Complete Guide
by Alex Reinhart
Scientific progress depends on good research, and good research needs good statistics. But statistical analysis is tricky to get right, even for the best and brightest of us. You’d be surprised how many scientists are doing it wrong.
Statistics Done Wrong is a pithy, essential guide to statistical blunders in modern science that will show you how to keep your research blunder-free. You’ll examine embarrassing errors and omissions in recent research, learn about the misconceptions and scientific politics that allow these mistakes to happen, and begin your quest to reform the way you and your peers do statistics.
You’ll find advice on:
- Asking the right question, designing the right experiment, choosing the right statistical analysis, and sticking to the plan
- How to think about p values, significance, insignificance, confidence intervals, and regression
- Choosing the right sample size and avoiding false positives
- Reporting your analysis and publishing your data and source code
- Procedures to follow, precautions to take, and analytical software that can help
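One of the ideas in the list above, false positives, is easy to see for yourself. The sketch below is ours rather than the book's, and the names `p_value_no_effect` and `false_positive_rate` are made up for illustration. It runs thousands of pretend experiments in which there is no real effect at all, tests each one at the usual 5% significance level, and counts how many come out "significant" anyway. To keep it short it assumes both groups have a known standard deviation of 1, so a simple z-test applies.

```python
import random
from statistics import NormalDist, fmean

def p_value_no_effect(rng, n=30):
    """Two-sided z-test p-value for two groups drawn from the SAME distribution."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: the standard deviation is known to be 1 in both groups,
    # so the standard error of the difference in means is sqrt(1/n + 1/n).
    z = (fmean(a) - fmean(b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(experiments=10_000, alpha=0.05, seed=1):
    """Share of no-effect experiments that still come out 'significant'."""
    rng = random.Random(seed)
    hits = sum(p_value_no_effect(rng) < alpha for _ in range(experiments))
    return hits / experiments

if __name__ == "__main__":
    print(f"false positive rate: {false_positive_rate():.3f}")
```

The rate should come out close to the 5% threshold itself: run enough tests on pure noise and roughly one in twenty will look like a discovery.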
All 8 posts in the series:
- 21 Inspirational Books for All Aspiring Data Scientists:
- 3 Great Data Science Books for Aspiring Data Scientists
- 3 Must-Read Statistics Books for Aspiring Data Scientists
- 3 Essential Python Books for Aspiring Data Scientists
- 3 Books on R That all Aspiring Data Scientists Should Read
- 3 Inspirational Machine Learning Books for Aspiring Data Scientists
- 3 Essential Visualisation Books for Aspiring Data Scientists
- 3 Must-Read Books on Data Ethics for Aspiring Data Scientists
If you're interested in learning more about the content in this blog post we've sought out the best blogs, books, video courses and other stuff from around the internet for you. Some may be free while others may not, and to help you decide we use the following ratings:
- FREE content
- costs less than 10 £/$/Euro
- costs less than 50 £/$/Euro
- costs less than 100 £/$/Euro
- costs more than 100 £/$/Euro
Disclosure: some of these resources may be affiliate links, and we may earn an affiliate commission for purchases you make when using these links.
You can find further details in our T&Cs.
Videos & Video Courses
4-hour Udemy Video Course delivered with animated videos. Perfect for beginners, and it will help get you started with basic statistical concepts.
7-hour Udemy Video Course. Great for those needing a more business-oriented introduction to stats. Better still, the course even comes with homework. Yay!
9-hour Udemy Video Course. This is one of the top stats courses at Udemy and is a must-see for those who need to learn stats in R.
CorrelViz - visualise all the correlations in your data in minutes
CorrelViz is completely automated and gives you the Story of Your Data in minutes, with one click - saving you months of manual analysis and shed-loads of cash!
Analyse all your data, discover all the correlations you seek - and some you never even dreamed of...