Sunday, March 25, 2018

Container-based Computing Platforms Anywhere!

Want to run container-based (Big Data/ML) computing GUIs and platforms anywhere, accessible from your tablets or smartphones?

(updated 2020-07-26)

Over the past few years I have worked on many real deployments that leverage and integrate open-source projects from GitHub and Docker Hub, including many of my own 320+ container-based tools and projects. These cover Java and Python programming for AI/ML analytics applications and interactive ML/DL notebooks on Ubuntu/CentOS, and even HPC container computing. They let containers run as back-end servers, as desktop applications (e.g., KNIME, Protege, Eclipse, PyCharm), and as VNC/noVNC HTML5-based container applications (e.g., the many VNC/noVNC-based containers in my GitHub and Docker Hub repositories). Along the way I have personally seen growing adoption by engineers, researchers, and management. As an AI/ML/DL researcher and practitioner of container technologies, I have converted many "doubters" who asked, "Do containers really work as promised?" We can't predict the future of new technologies, but we can confirm that "changing of available technologies is certainly for those who will adapt to thrive and rise above!" I will continue to publish and evolve my open-source projects, adopting new technologies and adapting to new needs so that they stay practically useful.

One new trend is that downloads of the VNC/noVNC HTML5-based containers have been growing recently. Among my 320+ projects, the VNC/noVNC-based containers are becoming the preferred choice, most likely because they are accessible from anywhere, on any device with an HTML5 web browser. I have seen this trend pick up recently in the download counts on my Docker Hub site. For example, downloads of openkbs/knime-vnc-docker (the web browser version using VNC/noVNC HTML5) have grown rapidly from hundreds to 2.5K image downloads; its desktop counterpart is openkbs/knime-docker (using X11). You might want to explore both.


(updated 2019-01-08)

Imagine that all you need is a bare-bones OS (Linux, Mac, or Windows) with only a small installation of Docker, and within a few minutes you can have an array of your favorite tools: IDEs (Eclipse, ScalaIDE, IntelliJ, PyCharm, etc.), programming language environments (Java 8/9, Python 2/3, Maven, etc.), Big Data / Machine Learning / Analytics tools (R, Weka, KNIME, RapidMiner, OpenRefine, etc.), Machine / Deep Learning environments (Jupyter, Zeppelin, SparkNotebook, etc.) with Spark and/or Hadoop clusters, NLP tools, Logic Programming (Berkeley BLOG), RDF/OWL (Stanford's Protege, OntoText, Blazegraph), HPC (High-Performance Computing) using Singularity containers, or any other commonly used tools, all serving as portable, agile environments for software development, prototyping, or testing.

Your laptop, desktop, or server requires no local installation of any library or dependency that could clutter your host machine's OS files, so there are no conflicting versions of tools and libraries. Most importantly, you get the agility of lightweight Docker-based tools, IDEs, and clusters, and you can even deploy your favorite containers to enterprise container platforms such as Kubernetes, DC/OS, or OpenShift for very large-scale production environments.

My interest and goal is to enable users (developers or anyone else) to do the above by rapidly standing up a full-fledged computing platform on a simple laptop, desktop, server, cluster, or cloud infrastructure with the needed containers, either by building your own from source (e.g., from GitHub) or by using ready-to-run Docker images (e.g., from Docker Hub).

Accessing Big Data Analytics Platform GUI Tools (KNIME, ...) or IDEs (IntelliJ, Eclipse, NetBeans) with your Tablets or Smartphones?

  • VNC / noVNC-based docker containers (Newly launched! 2019)
    • I recently launched a few VNC/noVNC-based containers, including KNIME and Eclipse, with more to come. You can now use those desktop-based tools and IDEs from all kinds of internet-enabled devices or PCs, including an iPad/iPad Pro, a Pi, or even a large-screen smartphone, for example to access the KNIME big data platform tools.
    • openkbs/knime-vnc-docker 
    • openkbs/eclipse-photon-vnc-docker
    • (more VNC-based data analytics / ML / AI containers to come).
  • With the newly deployed VNC-based containers in the openkbs Docker repository, you can expand your use of IDEs and GUI tools for Big Data Analytics or Machine Learning to any device, including an iPad, any web-enabled tablet, or a smartphone. However, because most of those big data studio tools need a bigger screen, a larger-screen device such as an iPad Pro or Microsoft Surface Pro is recommended.
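
As a rough illustration of how one of these browser-accessible containers might be started, here is a minimal Python sketch that only assembles a `docker run` command line. The image name comes from the list above; the port number 6080, the container name, and the helper functions `build_run_command` and `browser_url` are assumptions made for this example, so check the image's own README for the port its noVNC server actually uses.

```python
import shlex


def build_run_command(image, host_port=6080, container_port=6080, name=None):
    """Return the argv list for starting a noVNC container in the background.

    The default port numbers are assumptions for illustration only; the
    real port depends on how the image configures its noVNC server.
    """
    cmd = ["docker", "run", "-d", "--rm"]
    if name:
        cmd += ["--name", name]
    # Publish the container's web port so any HTML5 browser can reach it.
    cmd += ["-p", f"{host_port}:{container_port}", image]
    return cmd


def browser_url(host, host_port=6080):
    """URL to open from a tablet or phone that can reach `host` (assumed path)."""
    return f"http://{host}:{host_port}/"


if __name__ == "__main__":
    cmd = build_run_command("openkbs/knime-vnc-docker", name="knime-vnc")
    # Print the command; pass `cmd` to subprocess.run(...) to actually launch it.
    print(shlex.join(cmd))
    print(browser_url("localhost"))
```

Building the command as a list rather than a single string keeps it safe to hand to `subprocess.run` without shell quoting problems.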


You can try them out, and they are open source!

Overview of the above Open Source Docker Projects

In the GitHub projects, about 30% are unique creations and 70% are forks of other Git projects:
  • Simple Docker Github project templates
    • The template files, such as docker.env (for variables) and Dockerfile, give you a working Docker project. The Bash scripts are written so that you don't need to change anything unless you want to customize the defaults; you can just leave them as they are.
    • To build, just run in a shell: "./"
    • To run, just run in a shell: "./"
    • You can try it out by git clone this "Docker Template GIT ("
  • Basic Dockers
    • Java 8/9 (JDK) + Python (2 or 3) + Maven (3.5) containers
      • These serve as base container images on which users can layer extensions or domain-specific add-on processing.
      • In the GitHub home, just search for "java", "jre", "jdk" and you will see multiple choices.
  • X11 base docker container
    • Serves as the base for X11 desktop applications (e.g., Eclipse, IntelliJ), displaying their GUI on your host computer's screen.
    • In the github home, just search for "x11".
  • IDE docker containers (Eclipse, ScalaIDE, IntelliJ, PyCharm, etc.)
    • In the github home, just search for "eclipse", "IntelliJ", "pycharm", "scala".
  • Spark / Hadoop Cluster / NoSQL etc.
  • RDF/OWL/RDFS/OWLS Database and Tools
  • Big Data Platforms
  • HPC (High-Performance Computing - Super Computers) Docker for Singularity
    • Note that the HPC Docker project for Singularity is still undergoing frequent revisions.
    • In the github home, just search for "hpc", "singularity"
  • Or, you can browse all the 170+ container-based Docker projects


Currently, all of the above Docker-based tools, IDEs, and projects mainly target Linux-based systems and Mac OS. For Windows, the automation scripts, "" and "", do not yet have equivalent Windows PowerShell versions. On Windows, you can still use Docker to launch any of the above Docker containers, and you are welcome to fork the above Git projects and add PowerShell scripts that provide the same automation.

Thursday, April 13, 2017

Big Data Analytics using TDA as First Step

I recently started learning Topological Data Analysis (TDA) to understand the "shape" of Big Data data sets. In the classical Machine Learning approach, we often start by exploring the given data set before we even try to form a hypothesis and select the full Machine Learning processing pipeline. After an initial look at the data set's format, and perhaps after collecting some domain-specific knowledge about it, we typically apply "dimension reduction" techniques such as PCA or SVD to identify the dominant dimensions. However, when the number of dimensions is very large, such as the tens of thousands found in DNA-related analysis, those dimension-reduction algorithms become less effective at providing a visual "shape" of the data set.

There are related studies using various approaches to "Reduction of High-Dimensional Data" (see Google Scholar for dimension reduction research). TDA, built on topology, offers simple computation while preserving the important intra-dimensional and inter-dimensional relations among features, for example when applying clustering algorithms to the Twitter challenge data set (? reference) or the Netflix challenge data set (? reference). Prof. Dr. Gunnar Carlsson and his students have published papers in this area. You can also check out his talk at the University of Chicago about TDA - Data Shape.
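
To make the classical dimension-reduction step concrete, here is a small pure-Python sketch of PCA that finds only the first principal component by power iteration on the covariance matrix. The function names and the toy data are made up for this illustration; for real data sets a numerical library would normally be used instead.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(rows, iterations=200, seed=0):
    """Approximate the top eigenvector of the covariance by power iteration."""
    cov = covariance(center(rows))
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # positive starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(rows, direction):
    """Reduce each centered row to a single coordinate along `direction`."""
    return [sum(x * u for x, u in zip(r, direction)) for r in center(rows)]


if __name__ == "__main__":
    # Toy data lying on the line y = 2x: one direction explains everything.
    data = [[float(t), 2.0 * t] for t in range(-3, 4)]
    pc1 = first_principal_component(data)
    print("first principal component:", pc1)
    print("1-D coordinates:", project(data, pc1))
```

Each step here touches every pair of dimensions, which hints at why this style of reduction gets harder to use and to visualize as the number of dimensions grows into the tens of thousands.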

After studying many TDA journal papers published in recent years and watching many TDA-related YouTube videos, I am forming the opinion that TDA can be a very effective technique for understanding or viewing the "shape" of big data, and I also believe that TDA and classical Machine Learning are complementary to each other.
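
As a small taste of what looking at the "shape" can mean, one of the simplest things TDA tracks is how connected components of a point cloud merge as a distance threshold grows (0-dimensional persistence). The sketch below is a minimal illustration with a union-find structure, not a full TDA pipeline; the function name and the sample points are assumptions made for this example.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Scales at which connected components merge as the threshold grows.

    Every point starts as its own component at scale 0. Edges are added in
    order of increasing length; each edge that joins two different
    components ends one of them at that length.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths  # the last surviving component never merges and is not listed


if __name__ == "__main__":
    cloud = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    print(zero_dim_persistence(cloud))
```

A large jump between consecutive merge scales suggests well-separated groups, which is the kind of structural summary that could then feed a classical clustering or Machine Learning step.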

  - Updated 2020/07/26 #QED
