I recently started learning Topological Data Analysis (TDA) to understand the "shape" of big data sets. In the classical Machine Learning approach, we usually begin by exploring the given data set before we even form a hypothesis and select the full Machine Learning processing pipeline. In practice, after an initial look at the data set's format, and perhaps after collecting some domain-specific knowledge about it, we typically apply "dimension reduction" techniques such as PCA or SVD to identify the dominating dimensions. However, when the dimensionality of the data set is very large, such as the tens of thousands of features in DNA-related analysis, these dimension-reduction algorithms stop being effective or helpful at providing a visual "shape" of the data set. There are similar studies using various approaches to the reduction of high-dimensional data (see the dimension-reduction literature on Google Scholar). TDA, by contrast, uses topology to keep the computation simple while preserving the important intra-dimensional and inter-dimensional feature relations. For example, clustering algorithms have been applied in this way to the Twitter challenge data set (? reference) and the Netflix challenge data set (? reference). Prof. Dr. Gunnar Carlsson and his students have published papers in this area; you can also check out his talk at the University of Chicago about TDA and data shape.
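To make the classical dimension-reduction step concrete, here is a minimal PCA-via-SVD sketch in plain NumPy. The data set is synthetic and purely illustrative (not one of the data sets mentioned above): a high-dimensional cloud whose variance actually lives in a 2-dimensional subspace, which is exactly the situation where PCA works well.

```python
import numpy as np

# Toy stand-in for a high-dimensional data set: 100 samples, 5,000 features,
# with most of the variance hidden in a 2-dimensional subspace.
rng = np.random.default_rng(42)
latent = rng.normal(size=(100, 2))     # the "true" low-dimensional structure
mixing = rng.normal(size=(2, 5000))    # random embedding into 5,000 dimensions
X = latent @ mixing + 0.01 * rng.normal(size=(100, 5000))

# PCA via SVD: center the data, decompose, project onto the top-2 components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T                     # 2-D coordinates for visualization

# Fraction of total variance captured by the first two components.
explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(X2.shape)        # (100, 2)
```

When the intrinsic structure is genuinely low-dimensional like this, `explained` is close to 1. The point of the paragraph above is that for real data sets with tens of thousands of meaningful dimensions, no such dominant two-component projection exists, and this summary loses the shape of the data.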
Based on my initial understanding from studying many TDA journal papers published in recent years, along with many TDA-related YouTube videos, I am forming the opinion that TDA can be a very effective technique for understanding, or viewing, the "shape" of big data, and I also believe that TDA and classical Machine Learning are complementary to each other.
- Updated 2020/07/26 #QED