Deploying a Distributed AI Stack to Kubernetes on CentOS
Install and manage a Kubernetes cluster (version 1.13.4) with Helm, either on a single CentOS 7 VM or in multi-host mode across 3 CentOS 7 VMs. Once the cluster is running, you can deploy a distributed, scalable Python stack that delivers a resilient REST service with JWT for authentication and Swagger for development. The service uses a decoupled REST API with two distinct worker backends: one handles simple database read and write tasks, and the other handles long-running tasks that can use a Redis cache and do not need a persistent database connection. This design suits simple CRUD applications as well as a secure multi-tenant environment where multiple users manage long-running tasks, such as training deep neural networks that can make near-realtime predictions.
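The split between the two worker backends can be pictured with a small routing sketch. This is only an illustration under stated assumptions, not the AntiNex implementation: the queue names, the task kinds, and the `Task`, `pick_queue`, and `route` helpers are all hypothetical, and a real deployment would hand the tasks to Celery workers instead of plain lists.

```python
# Hypothetical sketch: every name below is illustrative and not taken from the AntiNex code.
from dataclasses import dataclass, field

# Assumed queue names for the two worker backends.
DB_QUEUE = "db_tasks"      # workers that keep a persistent database connection
CORE_QUEUE = "core_tasks"  # workers for long-running jobs that can use a Redis cache

# Assumed set of task kinds that only need a quick database read or write.
SIMPLE_DB_TASKS = {"create_user", "get_job", "get_result"}


@dataclass
class Task:
    kind: str
    payload: dict = field(default_factory=dict)


def pick_queue(task: Task) -> str:
    """Send simple CRUD tasks to the database workers and everything else to the core workers."""
    if task.kind in SIMPLE_DB_TASKS:
        return DB_QUEUE
    return CORE_QUEUE


def route(tasks):
    """Group tasks by destination queue, keeping submission order within each queue."""
    queues = {DB_QUEUE: [], CORE_QUEUE: []}
    for task in tasks:
        queues[pick_queue(task)].append(task)
    return queues


if __name__ == "__main__":
    submitted = [Task("train_dnn"), Task("create_user"), Task("get_job")]
    for name, items in route(submitted).items():
        print(name, [t.kind for t in items])
```

Keeping the routing decision in one small function means the REST API never needs to know which backend does the work, which is the point of the decoupling described above.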
This guide was built for deploying the AntiNex stack of Docker containers and the Stock Analysis Engine on a single-host or multi-host Kubernetes cluster.
- Managing a Multi-Host Kubernetes Cluster with an External DNS Server
- Cert Manager with Let’s Encrypt SSL support
- A Native Ceph Cluster for Persistent Volume Management with KVM
- A Third-party Rook Ceph Cluster for Persistent Volumes
- Minio S3 Object Store
- Redis
- Postgres
- Django REST API with JWT and Swagger
- Django REST API Celery Workers
- Jupyter
- Core Celery Workers
- pgAdmin4
- (Optional) Splunk with TCP and HEC Service Endpoints
- Deploying a Distributed AI Stack to Kubernetes on CentOS
- Getting Started
- Validate
- Deploy Redis and Postgres and the Nginx Ingress
- Start Applications
- Run a Database Migration
- Add Ingress Locations to /etc/hosts
- Using the Minio S3 Object Store
- Using the Rook Ceph Cluster
- Create a User
- Deployed Web Applications
- View Django REST Framework
- View Swagger
- View Jupyter
- View pgAdmin
- View Minio S3 Object Storage
- View Ceph
- View Splunk
- Training AI with the Django REST API
- Train a Deep Neural Network on Kubernetes
- Get the AI Job Record
- Get the AI Training Job Results
- Standalone Deployments
- Deploy Redis
- Deploy Postgres
- Deploy pgAdmin
- Deploy Django REST API
- Deploy Django Celery Workers
- Deploy AntiNex Core
- Deploy Jupyter
- Deploy Splunk
- Searching in Splunk
- Search using Spylunking
- Find Django REST API Logs in Splunk
- Find Django Celery Worker Logs in Splunk
- Find Core Logs in Splunk
- Find Jupyter Logs in Splunk
- Deploy Nginx Ingress
- View Ingress Nginx Config
- View a Specific Ingress Configuration
- Deploy Splunk
- Deploy Splunk-Ready Applications
- Create your own self-signed x509 TLS Keys, Certs and Certificate Authority with Ansible
- Deploying Your Own x509 TLS Encryption files as Kubernetes Secrets
- Deploy Cert Manager with Let’s Encrypt
- Stop the Cert Manager
- Troubleshooting
- Customize Minio and How to Troubleshoot
- Ceph Troubleshooting
- Validate Ceph System Pods are Running
- Validate Ceph Pods are Running
- Validate Persistent Volumes are Bound
- Validate Persistent Volume Claims are Bound
- Create a Persistent Volume Claim
- Verify the Persistent Volume is Bound
- Verify the Persistent Volume Claim is Bound
- Describe Persistent Volumes
- Show Ceph Cluster Status
- Show Ceph OSD Status
- Show Ceph Free Space
- Show Ceph RADOS Free Space
- Out of IP Addresses
- AntiNex Stack Status
- Reset Cluster
- Development
- Testing
- License
- Running a Distributed Ceph Cluster on a Kubernetes Cluster
- Add the Ceph Mon Cluster Service FQDN to /etc/hosts
- Build KVM HDD Images
- Attach KVM Images to VMs
- Format Disks in VM
- Install Ceph on All Kubernetes Nodes
- Deploy Ceph Cluster
- Watch all Ceph Logs with Kubetail
- Show Pods
- Check Cluster Status
- Validate a Pod can Mount a Persistent Volume on the Ceph Cluster in Kubernetes
- Kubernetes Ceph Cluster Debugging Guide
- OSD Issues
- Cluster Status Tools
- Uninstall
- Managing a Multi-Host Kubernetes Cluster with an External DNS Server
- Start All Kubernetes Cluster VMs
- Deploy a Distributed AI Stack to a Multi-Host Kubernetes Cluster
- Set up an External DNS Server for a Multi-Host Kubernetes Cluster
- Start using the Stack
- Deployed Web Applications
- View Django REST Framework
- View Swagger
- View Jupyter
- View pgAdmin
- View Minio S3 Object Storage
- View Ceph
- View Splunk
- Train AI with Django REST API
- Next Steps