Intel is introducing Nauta, an open source platform for distributed deep learning (DL) on Kubernetes.
Nauta provides a multi-user, distributed computing environment for running DL model training experiments on Intel Xeon Scalable processor-based systems. Results can be viewed and monitored through a command line interface, a web UI, or TensorBoard. Developers can use existing data sets, proprietary data, or data downloaded from online sources, and can create public or private folders to make collaboration among teams easier. For scalability and ease of management, Nauta builds on components of the industry-leading Kubernetes orchestration system, leveraging Kubeflow and Docker for containerized machine learning at scale. Customizable DL model templates are available on the platform, removing the complexity of creating and running single- and multi-node deep learning training experiments. For model testing, Nauta also supports both batch and streaming inference, all in a single platform.
Intel says it created Nauta with the workflow of developers and data scientists in mind. Nauta is an enterprise-grade stack for teams that need to run DL workloads to train models destined for production. With Nauta, users can define and schedule containerized deep learning experiments using Kubernetes on single or multiple worker nodes, check the status and results of those experiments, and then adjust and run additional experiments or prepare the trained model for deployment.
Nauta gives users the ability to leverage shared best practices from seasoned machine learning developers and operators without sacrificing flexibility. At every level of abstraction, developers still have the opportunity to fall back to Kubernetes and use its primitives directly. Nauta gives newcomers to Kubernetes the ability to experiment while maintaining guard rails.
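Falling back to Kubernetes primitives means writing ordinary manifests by hand. As a rough sketch (the image name and training script below are placeholders, not part of Nauta), a single-node training run expressed as a plain Kubernetes Job might look like:

```yaml
# Hypothetical single-node training run as a standard Kubernetes Job.
# The image and command are illustrative placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: mnist-train
spec:
  backoffLimit: 2            # retry a failed training pod up to twice
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: example.com/dl/trainer:latest   # placeholder image
        command: ["python", "train.py", "--epochs", "10"]
        resources:
          limits:
            cpu: "8"
            memory: 16Gi
```

Nauta's templates generate this kind of plumbing for you, but nothing prevents an experienced operator from applying such a manifest directly with `kubectl`.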
Nauta also facilitates collaboration among team members, as it was designed from the start for multiple users. Job inputs and outputs can be shared between team members and used to help debug issues by launching TensorBoard against others' job checkpoints.
Nauta can run in public cloud and enterprise data center environments. In its current form, Nauta can be tested on Google Cloud Platform.
Technical information, including installation guides, user documentation, and information on how to get involved with the project, is available on Intel's GitHub repo.
Kubernetes is becoming the de facto standard for running modern, distributed workloads. Kubeflow, an open source project initiated by Google, aims to bring the best of machine learning and container orchestration to model management and experimentation. Intel Nauta embraces and extends Kubeflow to support additional scenarios.
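Kubeflow expresses distributed training jobs as Kubernetes custom resources. As a hedged illustration (the image name and script are placeholders, not drawn from Nauta's documentation), a multi-worker TensorFlow run under Kubeflow's TFJob operator looks roughly like this:

```yaml
# Hypothetical TFJob manifest; image and command are placeholders.
# The TFJob operator requires the container to be named "tensorflow".
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-distributed
spec:
  tfReplicaSpecs:
    PS:                      # one parameter server
      replicas: 1
      template:
        spec:
          containers:
          - name: tensorflow
            image: example.com/dl/trainer:latest
            command: ["python", "train.py"]
    Worker:                  # four training workers
      replicas: 4
      template:
        spec:
          containers:
          - name: tensorflow
            image: example.com/dl/trainer:latest
            command: ["python", "train.py"]
```

This is the layer Nauta builds on: its experiment templates and CLI generate and submit resources like these so users do not have to author them by hand.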
Public cloud vendors including Amazon, Google, IBM, and Microsoft are investing in next-generation PaaS offerings that aim to simplify machine learning model management and experimentation. Open source projects such as Intel's Nauta attempt to bring these ML PaaS capabilities to enterprises. Customers deploying ML platforms in their own data centers would be able to give data scientists and developers the same experience as a managed service hosted in the public cloud.