Thursday, April 13, 2017

Big Data Analytics using TDA as First Step

I recently got involved in learning Topological Data Analysis (TDA) as a way to understand the "shape" of big data sets. In a classical Machine Learning approach, we usually start by exploring the given data set before we form a hypothesis and then select the whole Machine Learning processing pipeline. After some initial poking at the data set's format, and perhaps after collecting some domain-specific knowledge about it, we typically apply "dimension reduction" techniques such as PCA or SVD to identify the main dominating dimensions. However, when the number of dimensions is very large, such as the tens of thousands found in DNA-related analysis, these dimension reduction algorithms become much less effective at giving us a "visual" shape of the data set. This happens, for example, when clustering algorithms are applied to the Twitter challenge data set (? reference) or the Netflix challenge data set (?reference). Prof. Dr. Gunnar Carlsson and his students have published papers in this area; you can also check out his talk at the University of Chicago about TDA - Data Shape.
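For readers who haven't looked inside PCA, here is a minimal sketch, in plain Python with no libraries, of how it finds the "dominating dimension": center the data, build the covariance matrix, and take its top eigenvector (here via simple power iteration). The function names are my own hypothetical helpers, not any library's API; in practice you would use NumPy or scikit-learn. Note that the covariance matrix is d x d, so with tens of thousands of dimensions it alone has hundreds of millions of entries.

```python
import math


def mean_center(rows):
    """Subtract each column's mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix (d x d) of already mean-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=500):
    """Largest eigenvalue and its unit eigenvector, via power iteration.

    Assumes the starting vector [1, ..., 1] is not orthogonal to the top
    eigenvector (fine for a sketch; libraries use more robust solvers).
    """
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return lam, v


def explained_variance_ratio(cov, lam):
    """Fraction of total variance (trace of C) captured by one component."""
    return lam / sum(cov[i][i] for i in range(len(cov)))


if __name__ == "__main__":
    # Hypothetical toy data: spread along column 0, small alternating
    # noise in column 1, and a constant column 2.
    rows = [[float(t), 0.1 * (1 if t % 2 else -1), 0.0] for t in range(10)]
    cov = covariance(mean_center(rows))
    lam, v = top_component(cov)
    print("top component:", [round(x, 3) for x in v])
    print("explained variance ratio:", round(explained_variance_ratio(cov, lam), 3))
```

On this toy data almost all of the variance lies along the first column, so the top component points (up to sign) along that axis. PCA answers "which directions carry the variance?", which is a different question from "what shape does the data have?".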

After studying many TDA journal papers published in recent years and watching many TDA-related YouTube videos, I am forming my initial opinion: TDA can be a very effective technique for understanding, or viewing, the "shape" of big data, and TDA and classical Machine Learning are complementary to each other.
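To make "shape" a bit more concrete, here is a minimal sketch of one standard TDA computation: the 0-dimensional persistence barcode of a Vietoris-Rips filtration, which tracks how connected components merge as the distance scale grows. This is my own illustration, not the specific method from Prof. Carlsson's papers (which also cover Mapper and higher-dimensional features such as loops); the function names are hypothetical, and real work would use a library such as GUDHI.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


class UnionFind:
    """Disjoint sets used to track connected components."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        """Merge the sets of i and j; return True if they were separate."""
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return False
        self.parent[rj] = ri
        return True


def h0_barcode(points):
    """0-dimensional persistence barcode of a Vietoris-Rips filtration.

    Edges enter the filtration in order of length. Every point is born at
    scale 0, and each time an edge joins two different components, one of
    them dies at that edge's length. The last component never dies.
    """
    n = len(points)
    edges = sorted(
        (euclidean(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    uf = UnionFind(n)
    bars = []
    for length, i, j in edges:
        if uf.union(i, j):
            bars.append((0.0, length))
    bars.append((0.0, math.inf))
    return bars


if __name__ == "__main__":
    # Hypothetical data: two well-separated clusters of three points each.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for birth, death in h0_barcode(pts):
        print(f"[{birth}, {death:.3f})")
```

For these points, four bars die at scale 1 (points merging inside each cluster), one survives until the two clusters join at about 13.45, and one never dies. Reading the barcode, the two long bars say "two persistent clusters", a statement about shape that holds across a whole range of scales rather than at one chosen threshold.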

Container-based Computing Platforms Anywhere!

Wanna Container-based Computing Platforms Anywhere? Imagine that all you need is just a bare-bones OS (Linux, Mac, or Windows) with only...