- Kubeflow 1.5 milestones and timeline
- Switching to Emissary executor enables Kubeflow Pipelines deployment on Kubernetes >= v1.20, which runs on containerd runtime instead of Docker runtime
- Improve model accuracy and reduce overfitting, especially with hyperparameter tuning
- Simplify operations and optimize utilization (including spot instance use cases for distributed training)
- More consistent user experience - UI appearance, features and naming
- Improved documentation, tutorials and examples
- Stretch - Support for K8s 1.22 and associated dependencies (cert-manager, Istio)
- Switching to Emissary executor enables Kubeflow Pipelines deployment on Kubernetes >= v1.20, which runs on containerd runtime instead of Docker runtime.
- Katib controller leader election for HA operations and faster recovery
- Validation for Early Stopping algorithm settings helps users to reduce model overfitting
- Improve SDK, AWS CI, parameter settings across frameworks (goptuna, optuna, hyperopt)
- Update namespace label for Metrics Collector injection (note - breaking change)
- Elastic Training for PyTorch, restarts pod rather than whole job, supports spot instances
- MPI addition to Unified training operator
- Python SDK supported on PyPI
- Exposing notebook idleness parameters, reduces infra used on idle notebooks
- UI consistency between web apps, along with frontend pagination, removal of cards, and use of responsive tables
- Support for dark mode
- More options to define PVC specification
- KFServing is rebranded to KServe
- ModelMesh joins KServe
- Triton model serving runtime defaults to v21.09
- (Alpha feature) Raw Kubernetes deployment support, Istio/Knative dependency is now optional
- Maturation of Version 2 (V2) Protocols in KFServing and Kubeflow Pipelines
- Increased use of metadata for pipeline orchestration and model performance monitoring
- Kubeflow Pipelines - introduce Emissary executor (Alpha), which no longer depends on the Docker container runtime.
- KFServing User Interface with model details, logs, yaml
- Reduce redundant code in training operators and Kubeflow’s web apps
- Faster, better builds with more CI/CD - Katib, Training Operators, Notebooks
- Audit relevant docs pages and update for 1.4 features
- Ongoing improvement to user documentation on Kubeflow.org
The Kubeflow 1.3 roadmap includes many User Interface (UI) improvements and core Kubeflow component upgrades to improve installation, management, and authentication. It also includes support for the latest Istio versions.
The 1.3 release plan includes the following features:
User Interface (UI) & Working Group enhancements to improve user experience and simplify workflows & operations
- Completely new UIs for KFServing, Katib, Tensorboard & Volumes Manager
- Notebooks
- Important backend updates to Notebooks (i.e. to improve interop with Tensorboard)
- Addition of R-Studio and Code-Server (VS-Code) support
- Kubeflow Pipelines (KFP)
- UI reorganization for better User Experience
- Simplified view of dependency graphs
- Multi-user feature enhancements in Kubeflow Pipelines
- KFServing v0.5
- Multi-model Serving
- Ability to specify container fields on ML Framework spec such as env variable, liveness/readiness probes etc.
- Ability to specify pod template fields on component spec such as NodeAffinity etc.
- gRPC support for TensorFlow Serving
- Triton Inference server V2 inference REST/gRPC protocol support
- TorchServe predict integration
- PyTorch Captum explain integration
- SKLearn/XGBoost V2 inference REST/gRPC protocol support with MLServer
- PMMLServer support
- LightGBM support
- Allow specifying timeouts on component spec
- Simplified canary rollout, traffic split at knative revisions level instead of services level
- Transformer to predictor call is now made async
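The canary rollout item above describes splitting traffic between revisions by percentage. The following is a minimal sketch of percentage-based routing, for illustration only: `pick_revision`, `simulate`, and the revision names `stable` and `canary` are hypothetical, and this is not how Knative or KFServing implement routing internally.

```python
import random


def pick_revision(canary_percent: int, rng: random.Random) -> str:
    """Return "canary" for roughly canary_percent% of calls, else "stable"."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # rng.random() is uniform in [0, 1), so this branch is taken with
    # probability canary_percent / 100.
    return "canary" if rng.random() * 100 < canary_percent else "stable"


def simulate(canary_percent: int, requests: int, seed: int = 0) -> dict:
    """Count how many simulated requests land on each revision."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    counts = {"stable": 0, "canary": 0}
    for _ in range(requests):
        counts[pick_revision(canary_percent, rng)] += 1
    return counts
```

Calling `simulate(10, 10000)` should send roughly one request in ten to the canary revision; a percentage of 0 or 100 sends all traffic to one side.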
Core improvements to Kubeflow Installation, Management, Authentication, and Istio
- Support for latest Istio versions across Kubeflow applications:
- KFP, Profile-Controller and KFAM will support the new AuthorizationPolicy API
- Manifests refactor:
- Easy installation of Kubeflow applications and common services
- Easy creation of Kubeflow distributions
- Moving manifest development to upstream application repositories
- This allows separation of responsibilities between Application Owners and Distribution Owners.
- These will be sync'ed on a regular basis.
- This will result in a reduction of tech debt from old or duplicate manifests.
Kubeflow 1.2 provides valuable enhancements to HyperParameter Tuning, Pipelines, KFServing, Notebooks and the Training Operators, which improve Kubeflow operations and data scientist productivity.
1.2 includes the following features:
- Katib 0.10 with the new v1beta1 API
- Katib support for early stopping
- Katib support for custom CRD in the new Trial template
- Katib support to resume experiments
- Katib support for multiple ways to extract metrics
- KFServing support to add batcher module as sidecar
- KFServing upgrade of the Alibi explainer to 0.4.0
- KFServing updates for the Triton inference server rename and integrations
- Pipelines support for a Tekton backend option.
- Kubeflow Pipelines 1.0.4, Changelog includes ~20 fixes and ~5 minor features.
- Notebooks support for Affinity/Toleration configs
- Update mxnet-operator manifest to v1
- Correct XGBoostJob CRD group name and add singular name
- Fix XGBoost Operator manifest issue
- Move PyTorch operator e2e tests to AWS Prow
- Support BytePS in MXNet Operator
- Fix error when conditions is empty in tf-operator
- Fix success Policy logic in MXNet Operator
For more details please see this post: https://blog.kubeflow.org/release/official/2020/11/18/kubeflow-1.2-blog-post.html
Kubeflow 1.1 will continue to enhance enterprise grade functionality for secure operations and upgrades. 1.1 will also simplify ML workflows to improve data scientist productivity.
The following features were delivered in Kubeflow 1.1:
- Additional security use cases for GCP users (including support for private GKE & Anthos Service Mesh), design doc; #1705
- A CVE scanning report and mitigation process, #4590
- Improved workflow automation tools (fairing and kale) to simplify and mature the Core and EcoSystem supported CUJs
- Establishment of Kubeflow Policy / Guidelines on how to implement authorization in web applications. Propose SubjectAccessReview in order to use K8s RBAC as the source of truth for Authz. #4899
- Guidelines on how cluster admins can interact with Kubeflow's authorization. There are already some difficulties with the self-serve model, the process of adding contributors to a namespace, and the way KFAM is using magic annotations on RoleBindings (#4574 #4889 #4936 #4924 #4938). Document current workarounds. #4960
- Decide when the CentralDashboard should show a namespace. Right now, this is done with KFAM in an error-prone way (magic annotations on RoleBindings). Design doc exploring different options (change KFAM to use SubjectAccessReview, use a model like GCP Console checking read permission on namespace, etc.). This is also related to the item above. #4960
- Ability to turn off the self-serve mode, as in many environments there are mechanisms other than the Kubeflow Dashboard that provision/share an environment for/with the user. (#4942)
- Multi-User Authorization: Add support for K8s RBAC via SubjectAccessReview #3513
The 1.1 features are tracked in this Kanban board
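Several of the 1.1 items above propose SubjectAccessReview as the way to make Kubernetes RBAC the source of truth for authorization. As a rough sketch of what such a check involves, the snippet below builds a SubjectAccessReview request body and reads the `status.allowed` field of a response. The helper names and the example user, namespace, and resource values are assumptions for illustration; the snippet does not call a cluster and is not Kubeflow's implementation.

```python
def build_subject_access_review(user, namespace, verb, group, resource):
    """Build a SubjectAccessReview body asking: may `user` do `verb` on `resource`?"""
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {
            "user": user,
            "resourceAttributes": {
                "namespace": namespace,
                "verb": verb,
                "group": group,
                "resource": resource,
            },
        },
    }


def is_allowed(review_response: dict) -> bool:
    """Read the API server's decision; treat a missing status as a denial."""
    return bool(review_response.get("status", {}).get("allowed", False))


# Example values only: a hypothetical user listing notebooks in a team namespace.
review = build_subject_access_review(
    user="alice@example.com",
    namespace="team-a",
    verb="list",
    group="kubeflow.org",
    resource="notebooks",
)
```

A web application would submit `review` to the Kubernetes API server and pass the returned object to `is_allowed` before serving the request.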
Kubeflow 1.0 was released on March 2, 2020. The 1.0 release consists of the following key pieces:
- A core set of applications targeting the critical user journey of build-train-deploy
- Scaffolding to securely deploy and manage multi-user Kubeflow environments on-prem and in the cloud.
- A process to graduate Kubeflow components to a stable version based upon an Application Requirements definition that has been defined and validated by the Community’s testing process.
The following applications graduated to stable versions in Kubeflow 1.0.
- kfctl for deployment and upgrades
- TFJob and PyTorch for distributed training
- Jupyter notebook controller and web app
- Profile controller and UI for multiuser management
The following applications are considered in a beta version in Kubeflow 1.0.
- Katib for hyper-parameter tuning
- fairing SDK to facilitate use of notebooks for build-train-deploy
- Kale, which extends Jupyter notebooks to create, run, and explore KF pipelines
- Metadata SDK, UI, and backend
- KFServing for model deployment and inference
- Pipelines
Here is a preliminary list of limitations and requirements that will be part of our 1.0 release
- ISTIO will be required as a service mesh and for AuthN and AuthZ support
- We will only support a single shared Kubeflow deployment per Kubernetes cluster
- Users can consume Kubeflow in their own, isolated namespace
- Upgrades will require downtime
Following Kubeflow's quarterly release schedule, Kubeflow 0.7 will be released in October 2019.
Notebooks
- Notebook infrastructure to beta quality in 0.7
- 1.0 in the following release
- The notebook infrastructure has 2 primary pieces; the jupyter notebook controller and web application for managing notebooks
- Kanban board
Deployment
- kfctl to beta quality in 0.7
- 1.0 in the following release
- v1beta1 for KfDef
- Create a more consistent and clean API for describing Kubeflow deployments
- Handle plugins and platforms consistently
- Clean up kfctl flags and semantics
- Upgradability between minor and major releases (issue)
- Kanban board
Monitoring
- Kubernetes' applications provide a unified view of the deployed Kubeflow applications and their state
Metadata
- Generic logger to auto log K8s resources to metadata store
- Kanban board
Pipelines
- Local execution of pipelines for easy development
- Enhanced metadata integration
Enterprise support
- Restrict kubeflow access to subset of namespaces (kubeflow/kubeflow#3623)
- Profile controller to beta
- Kubeflow identity management API and UI to beta
On-premises support
- E2E CI for setup and multi-user support in on-premises environments
- Kanban board
Data management
- Persistent volume management via a central dashboard UI
Engprod
- Remove ksonnet from E2E testing
- CI for applications in scope for 1.0 (notebooks, job operators, central dashboard, etc.)
- Kanban board
Support for hardware accelerated training and inference
- Automatic injection of device configuration needed to use hardware accelerators for training and inference
0.6 was released in mid-July; blog post
Deployment
- kustomize replaced ksonnet for application configuration
- kfctl can now take a single YAML file specifying a KFDef describing Kubeflow (example)
- ISTIO used as the service mesh
Metadata
- Introduced support for artifact tracking via a UI and python SDK
- Predefined schemas for models, datasets, and evaluation metrics
Multi-user support
- ISTIO and RBAC for AuthZ
- Dex provides a reference implementation for AuthN
- Integration with GCP and AWS AuthN services
- Profile resource to manage per-team namespaces
- Kubeflow identity management API to manage multiple users/teams
Training Operators
- TFJob and PyTorch to 1.0
Pipelines
- UI improvements:
- 10x Perf improvement (e.g. load list of runs)
- API performance optimization
- streamlined run creation
- stackdriver integration for logs (better debugging)
- better visualization of metadata outputs
- Tensorboard CRD
- versioning for data volumes
The roadmap provides a high level overview of key areas that will likely span multiple releases.
The roadmap provides links to critical user journeys (CUJs) that we want to deliver. A CUJ shows how a user would accomplish some critical task (for example build, train, and deploy a model).
Kubeflow does a major release at the end of every quarter. Minor releases occur as needed to fix important bugs.
For detailed information about what will be in a release look for the issues tagged "area/X.Y.Z".
If you are a member of the Kubeflow org you can use these search queries
- Issues for 0.4.0
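As a sketch of how such a search query could be assembled, the helper below builds a GitHub issue search URL for a given release. It assumes the "area/X.Y.Z" tag is a GitHub label with exactly that name and that the issues live under the `kubeflow` organization; the function name `release_issue_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def release_issue_search_url(version: str, org: str = "kubeflow") -> str:
    """Return a GitHub search URL for issues labeled area/<version> in an org."""
    # GitHub search qualifiers: org:, is:issue, and label:"..." for exact labels.
    query = f'org:{org} is:issue label:"area/{version}"'
    return "https://github.com/search?" + urlencode({"q": query, "type": "issues"})


url = release_issue_search_url("0.4.0")
```

Opening `url` in a browser lists the matching issues, provided the label exists in the searched repositories.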