9. So How Can Hive Help?

The most dimensionally rich reports were generally scheduled reports, since it didn't make sense to present such voluminous data in a web UI.

Our initial solution is therefore bifurcated: We still cube a limited set of dimensions for UI presentation purposes, while sending full log files to Hadoop for scheduled reports. A batched report doesn't need to have a sub-second response time.

This means we can actually expand the dimensions available in our scheduled reports in future, since Hadoop gets the raw log files.

It also means we can perform free-form analytics on our log data in future, using Hive as an exploratory tool. Maybe we can ask even better questions of our data in future, which could in turn suggest better scheduled reports to present to our customers.

If our traffic volume increases to the point where even our cubed data with limited dimensions are taking too long to load, we can go to a model where we update the UI-facing relational tables from our Hive database.