
Book Review – Machine Learning for Hackers

Sunday, May 13th, 2012

Machine Learning for Hackers provides an introduction to machine learning and to R, the increasingly popular statistics-oriented language.

The book covers the basic concepts and some useful tools, including:

  • An introduction to R;
  • Basic stats and probability;
  • Supervised and unsupervised learning;
  • Linear regression and categorization;
  • Non-linear data and regularization;
  • Principal Component Analysis (PCA) and input correlation;
  • Multidimensional scaling (MDS) for clustering;
  • k-nearest neighbour (kNN) for social network analysis; and
  • SVMs for non-linear classification.

The general structure of each section is to first introduce a new concept, then demonstrate it by applying the concept to a trivial data set. Next, the technique is applied to a real data set. This structure is a great way to understand a technique.
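To give a flavour of the "trivial data set first" step, here is a minimal sketch of my own. It is not taken from the book, and it is written in Python rather than the book's R. The function name `fit_line` and the data are made up for illustration. It fits an ordinary least-squares line to five points that lie exactly on y = 2x + 1, so the recovered slope and intercept are easy to check by eye.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Hypothetical helper for illustration only; not from the book.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Trivial, made-up data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=2.00, intercept=1.00
```

With a toy example like this working, the same function can be pointed at a messier real data set, which is where the interesting problems start.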

The book covers the complete process: first massaging the data, then deciding which technique to apply. Occasionally the authors take a wrong turn and the analysis fails. Demonstrating that failure, why it occurs and what to do about it, is a great feature of the book.

The book contains almost none of the mathematics or inner workings of the algorithms it uses, which may be considered a good or a bad thing. At times it felt more like a tutorial on R’s various machine learning packages than a book about machine learning itself.

If you aren’t familiar with R or machine learning, this book presents a significant learning curve. Unfortunately, R’s syntax can be quite opaque, even to experienced programmers. Indeed, given how much of the book is devoted to R, a better title might have been “Machine Learning with R”.

I’m not sure you can “hack” machine learning without properly understanding the underlying concepts, but with this book you can undoubtedly try.

The book presents a relatively quick, somewhat cursory overview of machine learning, and it provides a good starting point for further study.

Note: This book was provided by O’Reilly Media as part of their blogger review program.
