The Hybrid Cloud Operator for PostgreSQL is a Kubernetes operator designed for hybrid-cloud, multi-team Kubernetes platforms. It allows teams to deploy and manage their own databases through Kubernetes without cloud-provider-specific provisioning.
In classical cloud environments, resources like databases are typically managed by a central platform team via infrastructure automation such as Terraform. When several teams are active on such a platform, this creates a bottleneck, because the central platform team must handle every database request. With this operator, teams can manage databases on their own. And because the operator integrates into the Kubernetes API, teams use the same unified interface for all their deployments: Kubernetes YAML manifests.
Additionally, the operator provides a consistent interface regardless of the environment (cloud provider, on-premise) the Kubernetes cluster runs in. Teams that have to deploy to clusters running in different environments get the same interface on all clusters and do not have to concern themselves with any differences.
Main features:
- Provides Kubernetes Custom resources for deploying and managing PostgreSQL servers and databases
- Abstracted, unified API regardless of target environment (cloud, on-premise)
- Currently supported backends:
  - Azure Database for PostgreSQL single server
  - Azure Database for PostgreSQL flexible server
  - AWS RDS PostgreSQL
  - AWS RDS Aurora
  - bitnami helm chart (prototype)
  - Yugabyte helm chart deployment (prototype, due to limitations in the chart only one cluster per namespace is possible)
To test the operator you do not need Azure or AWS. All you need is a Kubernetes cluster (for example a local one created with k3d) and cluster-admin rights on it.
- Run `helm repo add maibornwolff https://maibornwolff.github.io/hybrid-cloud-postgresql-operator/` to prepare the helm repository.
- Run `helm install hybrid-cloud-postgresql-operator-crds maibornwolff/hybrid-cloud-postgresql-operator-crds` and `helm install hybrid-cloud-postgresql-operator maibornwolff/hybrid-cloud-postgresql-operator` to install the operator.
- Check if the pod of the operator is running and healthy: `kubectl get pods -l app.kubernetes.io/name=hybrid-cloud-postgresql-operator`.
- Create your first server: `kubectl apply -f examples/simple.yaml`.
- Check if the postgresql instance is deployed: `kubectl get pods -l app.kubernetes.io/instance=demoteam-postgresql`.
- Retrieve the credentials for the database: `kubectl get secret demoservice-postgres-credentials -o jsonpath="{.data.password}" | base64 -d`
- After you are finished, delete the server: `kubectl delete -f examples/simple.yaml`
Note: You have to manually clean up any remaining PVCs in kubernetes as these are not automatically deleted to avoid accidental data loss.
Important
There is currently a bug in the kopf library used by this operator that leads to crashes on AKS clusters with Kubernetes versions >= 1.30 due to expiring tokens.
Add `tokenExpirationWorkaround: true` to the values to activate a workaround.
To achieve its hybrid-cloud feature the operator abstracts between the generic API (custom resources `PostgreSQLServer`, `PostgreSQLDatabase`) and the concrete implementation for a specific cloud service. The concrete implementations are called backends. You can configure which backends should be active in the configuration. If several backends are active, the user can also select one (for example, choose between Azure Database for PostgreSQL single server and flexible server).
The operator can be configured using a yaml-based config file. This is the complete configuration file with all options. Please refer to the comments in each line for explanations:
handler_on_resume: false # If set to true the operator will reconcile every available resource on restart even if there were no changes
backend: helmbitnami # Default backend to use, required
allowed_backends: [] # List of backends the users can select from. If list is empty the default backend is always used regardless of if the user selects a backend
backends: # Configuration for the different backends. Required fields are only required if the backend is used
azure: # General azure configuration. Every option from here can be repeated in the specific azure backends. The operator first tries to find the option in the specific backend config and falls back to the general config if not found
subscription_id: 1-2-3-4-5 # Azure Subscription id to provision database in, required
location: westeurope # Location to provision database in, required
name_pattern: "foobar-{namespace}-{name}" # Pattern to use for naming databases in azure. Variables {namespace} and {name} can be used and will be replaced by metadata.namespace and metadata.name of the custom object
resource_group: foobar-rg # Resource group to provision database in, required
virtual_network: null # Name of the virtual network to connect the database to, optional, only needed if public_access is disabled
subnet: null # Name of the subnet in the virtual_network to connect the database to, optional, only needed if public_access is disabled
cpu_limit: 64 # Upper limit for the number of CPUs the user can request, optional
storage_limit_gb: 512 # Upper limit for storage the user can request, optional
server_delete_fake: false # If enabled on delete the server will not actually be deleted but only be tagged, optional
database_delete_fake: false # If enabled on delete the database will not actually be deleted, optional
lock_from_deletion: false # If enabled an azure lock will be set on the server object, requires owner permissions for the operator, optional
admin_username: postgres # Username to use as admin user, optional
tags: {} # Extra tags to add to the server object in azure, {namespace} and {name} can be used as variables, optional
network:
public_access: true # If enabled database server will be reachable from outside the virtual_network, optional
allow_azure_services: true # If enabled a firewall rule will be added so that azure services can access the database server, optional
parameters: # Defaults to use when the user does not provide values of their own
geo_redundant_backup: false # If enabled geo redundant backups will be enabled, optional
backup_retention_days: 7 # Number of days the backups should be retained, optional
network:
firewall_rules: # List of firewall rules to add to each server, optional
- name: foobar # Name of the rule, required
startIp: 1.2.3.4 # Start IP address, required
endIp: 1.2.3.4 # End IP address, required
azurepostgres:
classes: # List of instance classes the user can select from, optional
dev: # Name of the class
name: GP_Gen5_2 # Name of the SKU in Azure, required
tier: GeneralPurpose # Tier of the SKU in Azure, required
family: Gen5 # Family of the SKU in Azure, required
capacity: 2 # Capacity (CPU Cores) of the SKU in Azure, required
default_class: dev # Name of the class to use as default if the user-provided one is invalid or not available, required if classes should be usable
network:
create_private_endpoint: false # If enabled a private link + private endpoint will be created for the server, virtual_network and subnet must be supplied in this case, optional
vnets: # List of vnets the database should allow access from, optional
- vnet: foobar-vnet # Name of the virtual network, required
subnet: default # Name of the subnet, required
parameters: # Defaults to use when the user does not provide values of their own
infrastructure_encryption: Disabled # Should infrastructure encryption be enabled for the database, optional
azurepostgresflexible:
classes: # List of instance classes the user can select from, optional
dev: # Name of the class, required
name: Standard_B1ms # Name of the SKU in Azure, required
tier: Burstable # Tier of the SKU in Azure, required
small:
name: Standard_D2ds_v4
tier: GeneralPurpose
default_class: dev # Name of the class to use as default if the user-provided one is invalid or not available, required if classes should be usable
availability_zone: "1" # Availability zone to use for the database, required
standby_availability_zone: "2" # Standby availability zone to use for the database if the user enables high availability, optional
dns_zone: # Settings for the private dns zone to use for vnet integration. If the private dns zone is in the same resource group as the server, the fields "name" and resource_group can be omitted and the name can be placed here, optional
name: privatelink.postgres.database.azure.com # Name of the private dns zone, optional
resource_group: foobar-rg # Resource group the private dns zone is part of, if omitted it defaults to the resource group of the server, optional
aws: # This is a virtual backend that can be used to configure both awsrds and awsaurora. Fields defined here can also be defined directly in the other backends
region: eu-central-1 # AWS region to use, required
vpc_security_group_ids: [] # List of VPC security group IDs to assign to DB cluster instances, required
subnet_group: # The name of a DB subnet group to place instances in, required
deletion_protection: false # Configure deletion protection for instances, will prevent instances being deleted by the operator, optional
network:
public_access: false # Allow public access from outside the VPC for the instance (security groups still need to be configured), optional
admin_username: postgres # Username to use as admin user, optional
name_pattern: "{namespace}-{name}" # Pattern to use for naming instances in AWS. Variables {namespace} and {name} can be used and will be replaced by metadata.namespace and metadata.name of the custom object
tags: {} # Extra tags to add to the server object in AWS, {namespace} and {name} can be used as variables, optional
awsrds:
availability_zone: eu-central-1a # Availability zone to place DB instances in, required
default_class: small # Name of the class to use as default if the user-provided one is invalid or not available, required
classes: # List of instance classes the user can select from, required
small: # Name of the class
instance_type: db.m5.large # EC2 Instance type to use, required
storage_type: gp2 # Storage type for the DB instance, currently gp2, gp3 or io1, optional
iops: 0 # Only needed when storage_type == gp3 or io1, number of IOPS to provision for the storage, optional
awsaurora:
availability_zones: [] # List of availability zones to place DB instances in, optional
default_class: small # Name of the class to use as default if the user-provided one is invalid or not available, required
classes: # List of instance classes the user can select from, required
small: # Name of the class
instance_type: db.serverless # EC2 Instance type to use, use db.serverless for an Aurora v2 serverless cluster, required
scaling_configuration: # Needs to be configured for serverless cluster only, optional
min_capacity: 0.5 # Minimal number of capacity units, required
max_capacity: 1 # Maximum number of capacity units, required
storage_type: aurora # Storage type for the DB instance, currently aurora and aurora-iopt1 are allowed, optional
iops: 0 # Only needed when storage_type == aurora-iopt1, number of IOPS to provision for the storage, optional
helmbitnami:
default_class: small # Name of the class to use as default if the user-provided one is invalid or not available, required if classes should be usable
classes: # List of instance classes the user can select from, optional
small: # Name of the class
cpu: "1000m" # CPU requests/limits for the pod, required
memory: "256Mi" # Memory requests/limits for the pod, required
admin_username: postgres # Username to use as admin user, optional
storage_class: "" # Storage class to use for the pods, optional
pvc_cleanup: false # If set to true the operator will when deleting a server also delete the persistent volumes, optional
helmyugabyte:
default_class: small # Name of the class to use as default if the user-provided one is invalid or not available, required if classes should be usable
classes: # List of instance classes the user can select from, optional
small: # Name of the class
master:
cpu: "1000m" # CPU requests/limits for the master pods, required
memory: "256Mi" # Memory requests/limits for the master pod, required
tserver:
cpu: "1000m" # CPU requests/limits for the tserver pods, required
memory: "256Mi" # Memory requests/limits for the tserver pod, required
replicas_master: 1 # Number of replicas for the master nodes, set to 3 to get a HA cluster, optional
replicas_tserver: 1 # Number of replicas for the tserver nodes, set to 3 to get a HA cluster, optional
partitions_master: 1 # Number of partitions on the master nodes, optional
partitions_tserver: 1 # Number of partitions on the tserver nodes, optional
storage_class: "" # Storage class to use for the pods, optional
pvc_cleanup: false # If set to true the operator will when deleting a server also delete the persistent volumes, optional
security: # Security-related settings independent of any backends, optional
password_length: 16 # Number of characters to use for passwords that are generated for servers and databases, optional
special_characters: true # Allows to enable/disable the usage of special characters (+-_.:<>?) in the passwords. Defaults to true, optional
Single configuration options can also be provided via environment variables. The complete path of the option is concatenated using underscores, written in uppercase and prefixed with `HYBRIDCLOUD_`. As an example: `backends.azure.subscription_id` becomes `HYBRIDCLOUD_BACKENDS_AZURE_SUBSCRIPTION_ID`.
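The naming rule for environment variables can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the operator's actual code; the function name `env_var_name` is made up for this example.

```python
def env_var_name(path: str, prefix: str = "HYBRIDCLOUD_") -> str:
    """Map a dotted config path to its environment variable name.

    Hypothetical helper: dots become underscores, the result is
    uppercased and the HYBRIDCLOUD_ prefix is prepended.
    """
    return prefix + path.replace(".", "_").upper()


print(env_var_name("backends.azure.subscription_id"))
# prints: HYBRIDCLOUD_BACKENDS_AZURE_SUBSCRIPTION_ID
```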
The `azure` backend is a virtual backend that allows you to specify options that are the same for both `azurepostgres` and `azurepostgresflexible`. As such each option under `backends.azure` in the above configuration can be repeated in the `backends.azurepostgres` and `backends.azurepostgresflexible` sections. Note that currently the operator cannot handle using different subscriptions for the backends.
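The lookup order for virtual backends (specific backend section first, then the general section) can be pictured with the following minimal sketch. It assumes the config has been loaded into nested dictionaries; the names `VIRTUAL_PARENT`, `_dig` and `backend_option` are hypothetical and do not come from the operator's code base.

```python
from typing import Any

_MISSING = object()

# Assumption for this sketch: which virtual backend each concrete backend falls back to.
VIRTUAL_PARENT = {
    "azurepostgres": "azure",
    "azurepostgresflexible": "azure",
    "awsrds": "aws",
    "awsaurora": "aws",
}


def _dig(tree: Any, path: str) -> Any:
    """Follow a dotted path through nested dicts, returning _MISSING if any key is absent."""
    node = tree
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def backend_option(config: dict, backend: str, path: str, default: Any = None) -> Any:
    """Look up an option in the specific backend section, then in its virtual parent."""
    backends = config.get("backends", {})
    for section in (backend, VIRTUAL_PARENT.get(backend)):
        if section is None:
            continue
        value = _dig(backends.get(section), path)
        if value is not _MISSING:
            return value
    return default


config = {
    "backends": {
        "azure": {"location": "westeurope", "network": {"public_access": True}},
        "azurepostgresflexible": {"network": {"public_access": False}},
    }
}
# The flexible backend overrides public_access but inherits location from "azure".
print(backend_option(config, "azurepostgresflexible", "network.public_access"))  # False
print(backend_option(config, "azurepostgresflexible", "location"))  # westeurope
```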
To make it easier for the users to specify database sizes you can prepare a list of recommendations, called classes, the users can choose from. The fields of the classes are backend-dependent. Using this mechanism you can give the users classes like `small`, `production`, `production-ha` and size them appropriately for each backend. If the user specifies size using CPU and memory the backend will pick an appropriate match.
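As an illustration of how class selection with a `default_class` fallback might behave, here is a small sketch. It is an assumption based on the configuration comments ("use as default if the user-provided one is invalid or not available"), not the operator's real implementation; `resolve_class` is a hypothetical name.

```python
from typing import Optional


def resolve_class(requested: Optional[str], classes: dict, default_class: Optional[str]) -> Optional[dict]:
    """Return the settings of the requested class, else of the default class.

    Returns None if neither exists, in which case a backend would have to
    size the server from the cpu/memoryMB values instead.
    """
    if requested in classes:
        return classes[requested]
    if default_class in classes:
        return classes[default_class]
    return None


# Class definitions as they could appear under backends.helmbitnami.classes.
classes = {
    "small": {"cpu": "1000m", "memory": "256Mi"},
    "production": {"cpu": "4000m", "memory": "8Gi"},
}

print(resolve_class("production", classes, "small"))   # the production class
print(resolve_class("does-not-exist", classes, "small"))  # falls back to small
```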
To protect database servers against accidental deletion you can enable `lock_from_deletion` in the azure backends. When enabled the operator will create a delete lock on the server resource in Azure. Note that the operator will not remove that lock when the server object in kubernetes is deleted. You have to remove it yourself via either the Azure CLI or the Azure Portal so that the operator can delete the server. If that is not done the kubernetes object cannot be deleted and any calls like `kubectl delete` will hang until the lock is manually removed.
The azure backends also support a feature called fake deletion (via options `server_delete_fake` and `database_delete_fake`) where the database or server is not actually deleted when the kubernetes custom object is deleted. This can be used in situations where the operator is freshly introduced in an environment where the users have little experience with this type of declarative management and you want to reduce the risk of accidental data loss.
The azure backends support deploying the server so that it is only reachable from inside an Azure virtual network. For the single server this is done using a private endpoint, for the flexible server via vnet integration. To enable the feature set `network.public_access` to false for the backend in the config. For `azurepostgres` you also need to enable `network.create_private_endpoint`. For `azurepostgresflexible` you can't change the option after a server is created (see Azure Docs). Additionally you need to prepare your Azure resource group:
For the single server:
- You need an existing virtual network with a subnet
- Create a private dns zone with the name `postgres.database.azure.com` in your resource group
- Link the dns zone to your virtual network
For the flexible server (also see the Azure Docs):
- You need an existing virtual network with a subnet
- For the subnet enable the delegation to `Microsoft.DBforPostgreSQL/flexibleServers`
- You need a private dns zone with a name that ends in `.postgres.database.azure.com` (e.g. `mydatabases.postgres.database.azure.com`), which can also be part of another resource group
- Link the dns zone to the virtual network the server is part of
- In the operator config for the backend fill out the fields `virtual_network`, `subnet` and `dns_zone`
For the operator to interact with Azure it needs credentials. For local testing it can pick up the token from the Azure CLI, but for real deployments it needs a dedicated service principal. Supply the credentials for the service principal using the environment variables `AZURE_SUBSCRIPTION_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_ID` and `AZURE_CLIENT_SECRET` (if you deploy via the helm chart use the `envSecret` value). Depending on the backend the operator requires the following Azure permissions within the scope of the resource group it deploys to:
- `Microsoft.DBforPostgreSQL/*`
- `Microsoft.Network/*`
- `Microsoft.Authorization/locks/*`, optional, if you want the operator to set delete locks
Unfortunately there is no built-in Azure role for the Database for PostgreSQL service. If you do not want to create a custom role you can also assign the operator the Contributor or Owner (if lock handling is required) role, but beware that this is a potential attack surface, as someone compromising the operator can access your entire Azure account.
The `awsrds` backend supports single-instance RDS PostgreSQL deployments. The `awsaurora` backend supports single-instance Aurora clusters (the operator currently only creates a primary writer instance and no read replicas). The `aws` backend is a virtual backend that allows you to specify options that are the same for both `awsrds` and `awsaurora`.
Both AWS backends have some prerequisites:
- An existing VPC
- An existing VPC Security group (to be applied to the DB cluster) to control access to the RDS instances (the firewall options currently have no effect)
- An existing DB subnet group
- Some defined size classes (in the operator configuration) as specifying a size using CPU and memory is currently not implemented for AWS
For the operator to interact with AWS it needs credentials. For local testing it can pick up the credentials from a `~/.aws/credentials` file. For real deployments you need an IAM user. The IAM user needs full RDS permissions (the easiest way is to attach the `AmazonRDSFullAccess` policy to the user). Supply the credentials for the user using the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` (if you deploy via the helm chart use the `envSecret` value). The operator can also pick up credentials using IAM instance roles if they are configured.
The AWS backends currently have some limitations:
- No support for managing firewalls / IP whitelists (must be done via preprovided VPC security groups)
- No support for HA or Multi-AZ clusters
- No support for custom parameter or option groups
- No support for storage autoscaling or configuring storage throughput (for gp3)
- No support for Extended monitoring / performance insights
- No support for Aurora serverless v1
To get started with AWS you can use the following minimal operator config:
handler_on_resume: false
backend: awsrds
allowed_backends:
  - awsrds
backends:
  awsrds:
    name_pattern: "{namespace}-{name}"
    region: eu-central-1
    availability_zone: "eu-central-1c"
    subnet_group: "<db-subnet-group>" # You must create it
    vpc_security_group_ids: ["<security-group-id>"] # You must create it
    network:
      public_access: true
    classes:
      small:
        instance_type: db.m5.large
    default_class: small
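To show what the `name_pattern` option does, here is a minimal sketch of how the `{namespace}` and `{name}` variables could be expanded into a resource name and into tags. It assumes plain placeholder substitution and ignores any cloud-specific naming restrictions; the helper names are hypothetical and not taken from the operator.

```python
def resource_name(pattern: str, namespace: str, name: str) -> str:
    """Expand {namespace} and {name} with the metadata of the custom object."""
    return pattern.format(namespace=namespace, name=name)


def render_tags(tags: dict, namespace: str, name: str) -> dict:
    """Apply the same substitution to every tag value."""
    return {key: value.format(namespace=namespace, name=name) for key, value in tags.items()}


print(resource_name("{namespace}-{name}", "default", "teamfoo"))
# prints: default-teamfoo
print(render_tags({"owner": "{namespace}"}, "default", "teamfoo"))
# prints: {'owner': 'default'}
```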
The operator can be deployed via helm chart:
- Run `helm repo add maibornwolff https://maibornwolff.github.io/hybrid-cloud-postgresql-operator/` to prepare the helm repository.
- Run `helm install hybrid-cloud-postgresql-operator-crds maibornwolff/hybrid-cloud-postgresql-operator-crds` to install the CRDs for the operator.
- Run `helm install hybrid-cloud-postgresql-operator maibornwolff/hybrid-cloud-postgresql-operator -f values.yaml` to install the operator.
Configuration of the operator is done via helm values. For a full list of the available values see the values.yaml in the chart. These are the important ones:
- `operatorConfig`: overwrite this with your specific operator config
- `envSecret`: Name of a secret with sensitive credentials (e.g. Azure service principal credentials)
- `serviceAccount.create`: Either set this to true or create the serviceaccount with appropriate permissions yourself and set `serviceAccount.name` to its name
The operator is completely controlled via Kubernetes custom resources (`PostgreSQLServer` and `PostgreSQLDatabase`). Once a server object is created the operator will utilize one of its backends to provision an actual database server. For each server one or more databases can be created by creating `PostgreSQLDatabase` objects that reference that server.
The `PostgreSQLServer` resource has the following options:
apiVersion: hybridcloud.maibornwolff.de/v1alpha1
kind: PostgreSQLServer
metadata:
  name: teamfoo # Name of the database server, based on this a name in the backend will be generated
  namespace: default # Kubernetes namespace
spec:
  backend: azurepostgres # Name of the backend to use, optional, should be left empty unless provided by the admin
  version: latest # Version to use, can be a number like 11, 12, 13. If empty or `latest` the newest available version for that backend is used. If specified version is not available in backend default is used, optional
  size:
    class: dev # Resource class to use, available classes are specified by the operator admin. If this is specified cpu and memoryMB are ignored. Use only if told to by admin.
    cpu: 1 # Number of CPU cores to use, optional
    memoryMB: 512 # Memory to use in MB, optional
    storageGB: 32 # Size of the storage for the database in GB, required
    storageAutoGrow: false # If the backend supports it automatic growing of the storage can be enabled, optional
  backup: # If the backend supports automatic backup it can be configured here, optional
    retentionDays: 7 # Number of days backups should be retained. Min and max are dependent on the backend (for azure 7-35 days, for AWS 0 disables backups), optional
    geoRedundant: false # If the backend supports it the backups can be stored geo-redundant in more than one region, optional
  extensions: [] # List of postgres extensions to install in the database. List is dependent on the backend (e.g. azure supports timescaledb). Currently only supported with azure backends. optional.
  network: # Network related features, optional
    firewallRules: # If the backend supports it a list of firewall rules to configure access from outside the cluster
      - name: foobar # Name of the rule
        startIp: 1.2.3.4 # Start IP
        endIp: 1.2.3.4 # End IP
  serverParameters: {} # Map of server parameters, optional
  maintenance:
    window: # If the backend supports configuring a maintenance window it can be done here, optional
      weekday: Wed # Weekday of the maintenance window. Must be provided as 3-letter english weekday name (Mon, Tue, Wed, Thu, Fri, Sat, Sun), required
      starttime: 03:00 # Start time as hour:minute, required
  highavailability:
    enabled: false # If the backend supports it high availability (via several instances) can be enabled here, optional
  credentialsSecret: teamfoo-postgres-credentials # Name of a secret where the credentials for the database server should be stored
For each server one or more databases can be created with the `PostgreSQLDatabase` resource which has the following options:
apiVersion: hybridcloud.maibornwolff.de/v1alpha1
kind: PostgreSQLDatabase
metadata:
  name: fooservice # Name of the database, will be used as name of the database in postgres (with dashes replaced with underscores)
  namespace: default # Kubernetes namespace, must be in the same namespace as the server object
spec:
  serverRef:
    name: teamfoo # Name of the server object of type `PostgreSQLServer`. Must be in the same namespace. Required
  database:
    charset: UTF8 # charset to use for the database, default depends on the backend, optional
    collation: "de-DE" # Collation to use for the database, default depends on the backend, optional
    extensions: [] # List of extensions to activate in the database (via CREATE EXTENSION), only extensions provisioned for the server (via spec.extensions) can be activated here
  credentialsSecret: fooservice-postgres-credentials # Name of a secret where the credentials for the database should be stored
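If manifests are generated by tooling rather than written by hand, a server and a matching database object could be assembled as in the sketch below. It only sets fields that appear in the listings above and assumes `storageGB` belongs under `spec.size`; the helper names are hypothetical. The output is a JSON `List` object, which `kubectl apply -f -` accepts in the same way as YAML.

```python
import json

API_VERSION = "hybridcloud.maibornwolff.de/v1alpha1"


def server_manifest(name: str, namespace: str, storage_gb: int, credentials_secret: str) -> dict:
    """Build a minimal PostgreSQLServer object (assumes storageGB sits under spec.size)."""
    return {
        "apiVersion": API_VERSION,
        "kind": "PostgreSQLServer",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "size": {"storageGB": storage_gb},
            "credentialsSecret": credentials_secret,
        },
    }


def database_manifest(name: str, namespace: str, server_name: str, credentials_secret: str) -> dict:
    """Build a minimal PostgreSQLDatabase object that references a server in the same namespace."""
    return {
        "apiVersion": API_VERSION,
        "kind": "PostgreSQLDatabase",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "serverRef": {"name": server_name},
            "credentialsSecret": credentials_secret,
        },
    }


if __name__ == "__main__":
    items = [
        server_manifest("teamfoo", "default", 32, "teamfoo-postgres-credentials"),
        database_manifest("fooservice", "default", "teamfoo", "fooservice-postgres-credentials"),
    ]
    # Wrap both objects in a v1 List so they can be applied in one go.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Assuming the script is saved as `manifests.py` (a made-up file name), the objects could be applied with `python manifests.py | kubectl apply -f -`.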
It is recommended not to use the system database (`postgres`) for anything but instead create a separate database for each service/application.
A service/application that wants to access the database should depend on the credentials secret and use its values for the connection. That way it is independent of the actual backend. The secret provides the keys `hostname`, `port`, `dbname`, `username`, `password` and `sslmode`, which should be directly usable with any postgresql-compatible client library.
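The following sketch shows one way a client could turn the secret into a connection URL. It assumes the secret was fetched as JSON (for example with `kubectl get secret <name> -o json`), so the values under `data` are base64-encoded as usual for Kubernetes secrets. The function names and the sample values are made up for this example.

```python
import base64
from urllib.parse import quote


def decode_secret_data(data: dict) -> dict:
    """Decode the base64-encoded values of a Kubernetes secret's data field."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


def connection_url(creds: dict) -> str:
    """Build a postgresql:// URL from the keys provided in the credentials secret.

    Username and password are percent-encoded because they may contain
    characters such as '@', ':' or '?' that are special in URLs.
    """
    user = quote(creds["username"], safe="")
    password = quote(creds["password"], safe="")
    dbname = quote(creds["dbname"], safe="")
    return (
        f"postgresql://{user}:{password}@{creds['hostname']}:{creds['port']}"
        f"/{dbname}?sslmode={creds['sslmode']}"
    )


# Sample data standing in for the "data" field of a real secret.
sample = {
    key: base64.b64encode(value.encode("utf-8")).decode("ascii")
    for key, value in {
        "hostname": "db.example.internal",
        "port": "5432",
        "dbname": "fooservice",
        "username": "fooservice",
        "password": "p@ss:w?rd",
        "sslmode": "require",
    }.items()
}

print(connection_url(decode_secret_data(sample)))
# prints: postgresql://fooservice:p%40ss%3Aw%3Frd@db.example.internal:5432/fooservice?sslmode=require
```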
The operator supports resetting the password of a server or database (for example if the password has been compromised or your organization requires regular password changes). To initiate a reset just add a label `operator/action: reset-password` to the custom resource (for example with `kubectl label postgresqldatabase mydatabase operator/action=reset-password`). The operator will pick it up, generate a new password, set it for the server/database and then update the credentials secret. It will then remove the label to signal completion. Note that you are responsible for restarting any affected services that use the password.
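How a password that honors the `security.password_length` and `security.special_characters` options could be generated is sketched below. This is an assumption about the general approach, not the operator's actual generator; only the option names and the set of special characters (`+-_.:<>?`) are taken from the configuration reference.

```python
import secrets
import string

# Special characters listed in the configuration reference.
SPECIAL_CHARACTERS = "+-_.:<>?"


def generate_password(length: int = 16, special_characters: bool = True) -> str:
    """Generate a random password from letters, digits and optionally special characters."""
    alphabet = string.ascii_letters + string.digits
    if special_characters:
        alphabet += SPECIAL_CHARACTERS
    # secrets.choice draws from a cryptographically secure random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(len(generate_password()))  # 16
print(generate_password(24, special_characters=False).isalnum())  # True
```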
The operator is implemented in Python using the Kopf framework.
To run it locally follow these steps:
- Create and activate a local python virtualenv
- Install dependencies: `pip install -r requirements.txt`
- Setup a local kubernetes cluster, e.g. with k3d: `k3d cluster create`
- Apply the CRDs in your local cluster: `kubectl apply -f helm/hybrid-cloud-postgresql-operator-crds/templates/`
- If you want to deploy to the cloud:
  - For Azure: Either have the azure cli installed and configured with an active login or export the following environment variables: `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`
  - For AWS: Either have a local `~/.aws/credentials` or export the following environment variables: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
- Adapt the `config.yaml` to suit your needs
- Run `kopf run main.py -A`
- In another window apply some objects to the cluster to trigger the operator (see the `examples` folder)
The code is structured in the following packages:
- `handlers`: Implements the operator interface for the provided custom resources, reacts to create/update/delete events in handler functions
- `backends`: Backends for the different environments (currently Azure + on-premise with helm)
- `util`: Helper and utility functions
To locally test the helm backends the operator needs a way to communicate with pods running in the cluster. You can use sshuttle and kuttle for that. Run:
kubectl run kuttle --image=python:3.10-alpine --restart=Never -- sh -c 'exec tail -f /dev/null'
sshuttle --dns -r kuttle -e kuttle <internal-ip-range-of-your-cluster>
- Kopf marks every object it manages with a finalizer, which means that if the operator is down or doesn't work a `kubectl delete` will hang. To work around that edit the object in question (`kubectl edit <type> <name>`) and remove the finalizer from the metadata. After that you can delete the object normally. Note that in this case the operator will not take care of cleaning up any azure resources.
- If the operator encounters an exception while processing an event in a handler, the handler will be retried after a short back-off time. During development you can then stop the operator, make changes to the code and start the operator again. Kopf will pick up again and rerun the failed handler.
- When a handler was successful, but you still want to rerun it you need to fake a change in the object being handled. The easiest is adding or changing a label.