Archive for the ‘data analysis’ Category

Book Review – Graph Databases

Monday, August 26th, 2013


Graph Databases provides a concise introduction to this particular alternative to the relational database.

Having lots of experience with relational databases and very little experience with graph databases, I found this book to be an interesting read. The book effectively describes the weaknesses of relational databases and explains how graph databases address these weaknesses.

After introducing the idea of a graph database, the book surveys the domains to which graph databases are well suited: those where a network is a natural representation of the data. That said, the authors tend to suggest that graph databases are almost always more suitable than a relational database!
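To make the contrast concrete, here is a minimal sketch, not taken from the book, of a "friends of friends" query in plain Python over a hypothetical toy social network (edges are treated as one-directional for simplicity). The relational-style function rescans a table of (person, friend) rows for each hop, much as a self-join would, while the graph-style function simply follows each node's own set of neighbours.

```python
from collections import defaultdict

# Hypothetical toy data stored as relational-style rows:
# each tuple is one (person, friend) record in a "friendships" table.
FRIENDSHIPS = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("carol", "erin"),
    ("dave", "frank"),
]


def friends_of_friends_relational(rows, person):
    """Emulate a self-join: scan the whole table once per hop."""
    direct = {friend for p, friend in rows if p == person}
    # Second hop: every row is scanned again and matched against the first hop.
    second = {friend for p, friend in rows if p in direct}
    return second - direct - {person}


def build_adjacency(rows):
    """Adjacency-list sketch: each node keeps its own set of neighbours."""
    graph = defaultdict(set)
    for p, friend in rows:
        graph[p].add(friend)
    return graph


def friends_of_friends_graph(graph, person):
    """Traverse two hops by following neighbour sets directly from the start node."""
    direct = graph.get(person, set())
    second = set()
    for friend in direct:
        second |= graph.get(friend, set())
    return second - direct - {person}


graph = build_adjacency(FRIENDSHIPS)
print(sorted(friends_of_friends_relational(FRIENDSHIPS, "alice")))
print(sorted(friends_of_friends_graph(graph, "alice")))
```

Both functions return the same answer; the difference is that the graph-style traversal only touches the neighbours it actually needs, which is the intuition behind the claim that graph databases handle highly connected data well.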

Next comes a demonstration of a specific implementation: the Neo4j database and its Cypher query language. Examples show how to create a Neo4j database and query it with Cypher. The explanations are a little terse, but the interested reader can easily investigate further.

Finally, the book includes an interesting comparison of graph databases with some of the other NoSQL options available.

My only reservation is that the book felt a little unbalanced, given its unwavering promotion of graph databases and its limited discussion of alternatives to Neo4j and Cypher. Overall, though, this book provided a good overview of the technology and opened my eyes to the possibilities of graph databases.

Note: This book was provided by O’Reilly Media as part of their blogger review program.


Book Review – Python for Data Analysis

Sunday, January 20th, 2013


Python for Data Analysis is primarily a reference for pandas, a data analysis library for Python.

Other components of Python’s data analysis ecosystem are also covered, in less depth. There are chapters on IPython and NumPy, and a chapter on plotting and visualization provides a great rundown of matplotlib, along with mention of alternatives such as Chaco and Mayavi.

The book then presents pandas in significant depth, with sections on data storage, data transformation, data aggregation and time series analysis. This forms the bulk of the book.
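For readers who have not used pandas, here is a small sketch of the kind of aggregation and time series operations the book covers. The example is not taken from the book, and the sales data and column names are hypothetical; it uses only the standard `groupby` and `resample` methods.

```python
import pandas as pd

# Hypothetical sales records; the column names are illustrative only.
sales = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2013-01-01", "2013-01-01", "2013-01-02", "2013-01-03"]
        ),
        "store": ["north", "south", "north", "south"],
        "amount": [10.0, 5.0, 7.5, 2.5],
    }
)

# Data aggregation: total amount per store (split-apply-combine).
per_store = sales.groupby("store")["amount"].sum()

# Time series analysis: index by date, then resample to daily totals.
daily = sales.set_index("date")["amount"].resample("D").sum()

print(per_store)
print(daily)
```

Here `groupby` splits the rows by store before summing, while `resample("D")` requires a datetime index and buckets the values into calendar days.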

This is a well-written book that provides a good summary of Python’s data analysis capabilities. However, it will not teach you how to do data analysis; it will show you how to use the pandas library.

Note: This book was provided by O’Reilly Media as part of their blogger review program.
