K8sGateway Version
1.17.3
Kubernetes Version
1.30.6
Describe the bug
The user is trying out loadBalancingWeight for failover hosts. Currently, they are using loadBalancingWeight only for primary hosts.
While trying that out, they noticed that when localityWeightedLbConfig: {} and maglev: {} are both set under spec.loadBalancerConfig in the Upstream, we consistently get 503 errors when we invoke an API endpoint, even though all of the hosts, primary and failover, are correct and healthy. The respCodeDetails shows no_healthy_upstream. The error goes away if we deliberately make the primary host invalid, OR keep only one of localityWeightedLbConfig or maglev (a minimal sketch of the Upstream is shown below).
It seems that the failover plugin is affecting the health checks.
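For reference, here is a minimal sketch of the kind of Upstream being described, assuming Gloo's static + failover API. The hostnames come from this report; the resource name, namespace, ports, weights, and the exact field names under spec.failover are illustrative and may not match the user's actual resource:

apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
  name: service-blue            # illustrative name
  namespace: gloo-system        # illustrative namespace
spec:
  static:
    hosts:
    # primary host, ends up at priority 0
    - addr: service-blue.default.svc.cluster.local
      port: 8080                # placeholder port
      loadBalancingWeight: 1    # weight currently used only on primary hosts
  failover:
    prioritizedLocalities:
    - localityEndpoints:        # these hosts end up at priority 1
      - loadBalancingWeight: 1  # the weight being tried for failover hosts
        lbEndpoints:
        - address: service-green.default.svc.cluster.local
          port: 8080            # placeholder port
        - address: service-yellow.default.svc.cluster.local
          port: 8080            # placeholder port
  loadBalancerConfig:
    localityWeightedLbConfig: {}   # the combination that triggers
    maglev: {}                     # 503 / no_healthy_upstream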
@andy-fong and I were troubleshooting this on Slack, and here is the latest we found:
service-blue.default.svc.cluster.local doesn't have a priority set, so it defaults to 0. The other two, service-green.default.svc.cluster.local and service-yellow.default.svc.cluster.local, are set to priority 1, and Andy sees this:
[2024-12-13 23:57:35.341][1][trace][upstream] [external/envoy/source/common/upstream/load_balancer_impl.cc:247] recalculated priority state: priority level 0, healthy weight 0, total weight 1, overprovision factor 140, healthy result 0, degraded result 0
[2024-12-13 23:57:35.341][1][trace][upstream] [external/envoy/source/common/upstream/load_balancer_impl.cc:247] recalculated priority state: priority level 1, healthy weight 1, total weight 3, overprovision factor 140, healthy result 46, degraded result 0
Priority 0 has no healthy weight, but then it should fall back to level 1, which seems to have a healthy weight of 1. What's strange is that I am seeing quite a few of these even when the health check gets a 200 response; it still marks the host as failed:
A good health check looks exactly the same:
Expected Behavior
When using the Maglev consistent hashing method, the failover plugin should not alter the health checks of healthy upstream hosts.
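For context, the active health check referred to above would be defined on the Upstream roughly as follows; this is a sketch assuming Gloo's Envoy-style healthChecks API, with a hypothetical path, interval, and thresholds rather than the user's real settings:

spec:
  healthChecks:
  - httpHealthCheck:
      path: /healthz          # hypothetical health endpoint
    interval: 5s
    timeout: 1s
    healthyThreshold: 1
    unhealthyThreshold: 3     # in this report, hosts are marked failed even after 200 responses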
Steps to reproduce the bug
Healthy
Unhealthy
503 with maglev:
healthy:
gateway-proxy-maglev-trace.log
config_dump.json
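For orientation, the relevant fragment of the attached config_dump.json for a setup like this would presumably look something like the Envoy cluster below (rendered as YAML; the cluster name, ports, and weights are illustrative, not copied from the actual dump):

name: default-service-blue-8080      # illustrative cluster name
lb_policy: MAGLEV                    # from maglev: {}
common_lb_config:
  locality_weighted_lb_config: {}    # from localityWeightedLbConfig: {}
  # the default overprovisioning factor is 140, matching the trace lines above;
  # Envoy assigns each priority a load of min(100, 140 * healthy_weight / total_weight),
  # so priority 1 with healthy weight 1 out of total weight 3 yields ~46 ("healthy result 46")
load_assignment:
  endpoints:
  - priority: 0                      # primary host
    load_balancing_weight: 1         # locality weight used by locality-weighted LB
    lb_endpoints:
    - endpoint:
        address:
          socket_address:
            address: service-blue.default.svc.cluster.local
            port_value: 8080         # placeholder
  - priority: 1                      # failover hosts
    load_balancing_weight: 1
    lb_endpoints:
    - endpoint:
        address:
          socket_address:
            address: service-green.default.svc.cluster.local
            port_value: 8080         # placeholder
    - endpoint:
        address:
          socket_address:
            address: service-yellow.default.svc.cluster.local
            port_value: 8080         # placeholder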
Additional Environment Detail
No response
Additional Context
No response