cluster health monitoring - uses for system internal/read-only topics #433

Closed
glycerine opened this issue Feb 9, 2017 · 2 comments

glycerine commented Feb 9, 2017

To consolidate the discussion on Slack about using a system-internal topic to have servers provide health and cluster group membership information:

As a NATS system operator, when gnatsd cluster membership changes, I want:

(a) nats-top to be monitoring the correct set of all servers; (see also nats-io/nats-top#31)

(b) to receive an event published on NATS that gets translated into a PagerDuty call, so I know which box to go reboot/reclone;

(c) to allow the remaining set of servers’ clients to decide upon a new master to be the one writing.

Moreover, as a NATS operator, when I run nats-top, for efficiency and ease of configuration:

(d) I don’t want to have to set up TLS. I want to avoid TLS between nats-top and my local gnatsd, and I want to avoid TLS between nats-top and any other remote gnatsd server.

Slack participants pointed out prior art in #230, which is similar, but here we're talking about monitoring gnatsd-to-gnatsd (intra-cluster) connections rather than client connections.

With help from the Slack discussions, I'm trying to work up a pull request that will address these. My plan is to start with #230 and adjust as needed.

@glycerine
Author

Useful answers from the Slack discussion:

larry:
How would you tell the difference between a planned or unplanned server outage?

jason:
a planned outage is one where I tell PagerDuty to stop bothering me about it :)

ivan: [when we were discussing if we could approximate with a regular client instead of a "system-internal" client]
... as long as your health monitor knows one address in the cluster, it can use that to connect. On success, it can then regularly invoke Servers() to get the list of servers in the cluster and create individual connections to those servers to check their health. Also, note that Servers() returns the list of URLs that clients can connect to. If each server uses 0.0.0.0 as its listen address, then the URLs that you get in your client are the resolved external IPs, so you may get more than one address if a server’s host has several interfaces. If you want to restrict this to one, start your servers with -a <specific IP>; clients will then receive only those addresses. ... otherwise, you may create several health connections to the same server. Nothing bad per se, but you have to be aware of that.
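
A rough sketch of the polling approach ivan describes, in Go. It is not taken from any pull request: `hostKey`, `diffMembers`, and `probe` are hypothetical helper names, the URL snapshots in `main` stand in for what successive `Servers()` calls on a client connection would return, and the probe is a plain TCP dial rather than a NATS-level health check. The diff step is what would feed a membership-change event such as the one wanted in (b).

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"sort"
	"time"
)

// hostKey reduces a server URL such as "nats://10.0.0.1:4222" to "host:port".
func hostKey(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Host == "" {
		return "", fmt.Errorf("no host in %q", raw)
	}
	return u.Host, nil
}

// toSet collects the distinct host:port keys from a list of server URLs,
// silently skipping entries that do not parse.
func toSet(urls []string) map[string]bool {
	set := make(map[string]bool, len(urls))
	for _, raw := range urls {
		if key, err := hostKey(raw); err == nil {
			set[key] = true
		}
	}
	return set
}

// diffMembers compares two snapshots of server URLs and reports which
// host:port entries appeared and which disappeared, each sorted.
func diffMembers(prev, cur []string) (joined, left []string) {
	p, c := toSet(prev), toSet(cur)
	for key := range c {
		if !p[key] {
			joined = append(joined, key)
		}
	}
	for key := range p {
		if !c[key] {
			left = append(left, key)
		}
	}
	sort.Strings(joined)
	sort.Strings(left)
	return joined, left
}

// probe reports whether a TCP connection to host:port succeeds within the
// timeout. This only shows the port is reachable, not that the server is healthy.
func probe(hostport string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", hostport, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	// Stand-ins for two successive Servers() results.
	prev := []string{"nats://10.0.0.1:4222", "nats://10.0.0.2:4222"}
	cur := []string{"nats://10.0.0.2:4222", "nats://10.0.0.3:4222"}

	joined, left := diffMembers(prev, cur)
	fmt.Println("joined:", joined, "left:", left)
}
```

Note that keying on host:port does not collapse the several interface addresses of one multi-homed server; as ivan says, starting each server with `-a <specific IP>` is what avoids duplicate health connections.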

glycerine added a commit to glycerine/hnatsd that referenced this issue Feb 11, 2017
Leader election and priority
assignment are made available
with -health and -rank.

InternalClient interface offers
plugin capability for running
internal clients.

Fixes nats-io#433
glycerine added a commit to glycerine/hnatsd that referenced this issue Feb 12, 2017
Leader election and priority
assignment are made available
with -health and -rank.

InternalClient interface offers
plugin capability for running
internal clients.

Fixes nats-io#433
glycerine added a commit to glycerine/hnatsd that referenced this issue Feb 12, 2017
 options to control health monitoring.

Leader election and priority
assignment are made available.

InternalClient interface offers
plugin capability for running
internal clients.

Fixes nats-io#433
glycerine added a commit to glycerine/hnatsd that referenced this issue Feb 12, 2017
 options to control health monitoring.

Leader election among gnatsd
instances, cluster health update
topics, and priority rank
assignment from the command
line are made available.

InternalClient interface offers
a general plugin interface for running
internal clients.

Fixes nats-io#433
glycerine added a commit to glycerine/hnatsd that referenced this issue Feb 12, 2017
options to control health monitoring.

The InternalClient interface offers
a general plugin interface
for running internal clients
within a gnatsd process.

The -health flag to gnatsd starts
an internal client that
runs a leader election
among the available gnatsd
instances and publishes cluster
membership changes to a set
of cluster health topics.
The -beat and -lease flags
control how frequently health
checks are run, and how long
leader leases persist.

The health agent can also be
run standalone as healthcmd.
See the main method in
gnatsd/server/health/healthcmd.

The -rank flag to gnatsd
adds priority rank assignment
from the command line. The
lowest ranking gnatsd instance wins
the lease on the current
election. The election
algorithm is described in
gnatsd/health/ALGORITHM.md
and is implemented in
gnatsd/health/health.go.

Fixes nats-io#433
glycerine added a commit to glycerine/hnatsd that referenced this issue Feb 12, 2017
options to control health monitoring.

The InternalClient interface offers
a general plugin interface
for running internal clients
within a gnatsd process.

The -health flag to gnatsd starts
an internal client that
runs a leader election
among the available gnatsd
instances and publishes cluster
membership changes to a set
of cluster health topics.

The -beat and -lease flags
control how frequently health
checks are run, and how long
leader leases persist.

The health agent can also be
run standalone as healthcmd.
See the main method in
gnatsd/health/healthcmd.

The -rank flag to gnatsd
adds priority rank assignment
from the command line. The
lowest ranking gnatsd instance wins
the lease on the current
election. The election
algorithm is described in
gnatsd/health/ALGORITHM.md
and is implemented in
gnatsd/health/health.go.

Fixes nats-io#433
glycerine added a commit to glycerine/hnatsd that referenced this issue Feb 14, 2017
options to control health monitoring.

The InternalClient interface offers
a general plugin interface
for running internal clients
within a gnatsd process.

The -health flag to gnatsd starts
an internal client that
runs a leader election
among the available gnatsd
instances and publishes cluster
membership changes to a set
of cluster health topics.

The -beat and -lease flags
control how frequently health
checks are run, and how long
leader leases persist.

The health agent can also be
run standalone as healthcmd.
See the main method in
gnatsd/health/healthcmd.

The -rank flag to gnatsd
adds priority rank assignment
from the command line. The
lowest ranking gnatsd instance wins
the lease on the current
election. The election
algorithm is described in
gnatsd/health/ALGORITHM.md
and is implemented in
gnatsd/health/health.go.

Fixes nats-io#433
@ColinSullivan1
Member

There is quite a bit of good work here, and it is really appreciated. At this time, this is a feature we do not plan on implementing in the NATS server.

Parts of this would make a valuable monitoring tool outside of the server. If you are interested and implement something along the lines of using an external client to monitor the health of a cluster and surface health-related events, we'd certainly reference your work in our NATS.io Connectors and Utilities.

We hope you do so!
