
[RLlib] Offpolicy add metrics to multi-agent replay buffers. #49959

Open · wants to merge 18 commits into base: master
Conversation

@simonsays1980 (Collaborator) commented Jan 19, 2025

Why are these changes needed?

Multi-agent (episode) replay buffers in the new API stack were lacking metrics. This PR proposes adding metrics to the multi-agent (episode) replay buffers by:

  • Changing the metrics in the base class 'EpisodeReplayBuffer' to include per-agent and per-module metrics. This helps with class inheritance, such that no '_update_add_metrics' or '_update_sample_metrics' method needs to be overridden.
  • Filling in default values in case some metrics do not exist (only 'sampled_n_step', 'agent_to_sampled_n_step', and 'module_to_sampled_n_step' can be None and are then not added).
  • Adding metrics for the environment, agents, and modules to all (episode) replay buffers.
  • Adding unique hashes per sample to keep track of how often step duplicates exist in the batch.

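The duplicate-tracking idea from the last bullet can be sketched roughly as follows. This is a minimal illustration, not RLlib's actual implementation; the function and variable names here are hypothetical:

```python
from collections import Counter

def count_duplicates(sampled_steps):
    """Count how many samples in a batch share an (episode, env step).

    `sampled_steps` is a list of (episode_id, env_t) tuples; each sample
    is tagged with a hash of that pair so duplicates can be counted.
    """
    hashes = [hash((eps_id, env_t)) for eps_id, env_t in sampled_steps]
    counts = Counter(hashes)
    # Number of samples whose (episode, env step) occurs more than once.
    return sum(c for c in counts.values() if c > 1)

batch = [("eps0", 3), ("eps0", 3), ("eps1", 7)]
print(count_duplicates(batch))  # -> 2
```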
Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…odeReplayBuffer'.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…chanism in 'DQN'.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…ermore, added a further keyword argument to the initialization of the buffer to get the number of iterations for smoothing.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…ics to the 'PrioritizedEpisodeReplayBuffer'. Added also docstrings.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
… logic such that method overriding works for MA-buffers.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…ethod overriding works for MA-buffers.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…'PrioritizedEpisodeReplayBuffer' such that method overriding works for MA-buffers.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…ppened because no episodes were evicted yet.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…playBuffer' for 'independent' sampling and fixed a small nit in the 'EpisodeReplayBuffer._update_sample_metrics'.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…PrioritizedReplayBuffer'. Furthermore, I modified the base class' '_update_sample_metrics' to already include the 'resample' counters, such that no subclass needs to override it. Previously this created a conflict when 'MultiAgentPrioritizedReplayBuffer' inherited from both 'PrioritizedReplayBuffer' and 'MultiAgentReplayBuffer'.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
@simonsays1980 simonsays1980 marked this pull request as ready for review January 19, 2025 15:23
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
@sven1977 sven1977 changed the title [RLlib] - Offpolicy add metrics to multi-agent replay buffers [RLlib] Offpolicy add metrics to multi-agent replay buffers. Jan 20, 2025
@@ -128,6 +124,7 @@ def __init__(
batch_length_T: int = 1,
alpha: float = 1.0,
metrics_num_episodes_for_smoothing: int = 100,
metrics_num_episodes_for_smoothing: int = 100,
Contributor:

Arg defined twice?

Collaborator (Author):

Great catch! Removed in the next commit.

@@ -468,6 +497,7 @@ def sample(
# Skip, if we are too far to the end and `episode_ts` + n_step would go
# beyond the episode's end.
if episode_ts + actual_n_step > len(episode):
num_resamples += 1
Contributor:

duplicate line?

Collaborator (Author):

Let me check this. Should be called once only.

Collaborator (Author):

Yup! Great catch again!
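The intended behavior, incrementing the resample counter exactly once per rejected draw, looks roughly like this. This is a simplified sketch of the n-step skip logic; the function and its signature are assumptions, not the PR's actual code:

```python
import random

def sample_start_ts(episode_len, n_step, rng=random):
    """Draw a start timestep; re-draw (counting each rejection once) if
    the n-step window would run past the episode's end."""
    num_resamples = 0
    while True:
        episode_ts = rng.randrange(episode_len)
        # Skip if `episode_ts` + n_step would go beyond the episode's end.
        if episode_ts + n_step > episode_len:
            num_resamples += 1  # increment exactly once per rejected draw
            continue
        return episode_ts, num_resamples
```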

# Increment counter.
B += 1

# Keep track of sampled indices for updating priorities later.
self._last_sampled_indices.append(idx)

# Add to the sampled timesteps counter of the buffer.
Contributor:

Duplicate line?

Collaborator (Author):

Again. I will remove this.

@@ -108,6 +129,7 @@ def __init__(
batch_size_B: int = 16,
batch_length_T: int = 64,
metrics_num_episodes_for_smoothing: int = 100,
metrics_num_episodes_for_smoothing: int = 100,
Contributor:

duplicate line?

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
module_to_num_steps_added[mid] += e_eps.agent_steps()

# Update the adding metrics.
self._update_add_metrics(
@sven1977 (Contributor) commented Jan 20, 2025:

Just for safety: Can we make all these utility methods with forced keywords?

def _update_add_metrics(self, *, ...):

Collaborator (Author):

Sure. Great idea.
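Forcing keyword-only arguments, as suggested above, looks like this. The metric parameter names below are placeholders, not the buffer's real signature:

```python
class EpisodeReplayBuffer:
    def _update_add_metrics(self, *, num_episodes_added=0, num_steps_added=0):
        # The bare `*` makes every following parameter keyword-only, so
        # callers can't silently pass them positionally in the wrong order.
        self._num_episodes = getattr(self, "_num_episodes", 0) + num_episodes_added
        self._num_steps = getattr(self, "_num_steps", 0) + num_steps_added

buf = EpisodeReplayBuffer()
buf._update_add_metrics(num_episodes_added=1, num_steps_added=100)
# buf._update_add_metrics(1, 100)  # would raise a TypeError
```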

module_to_sampled_n_step = {
mid: sum(l) / len(l) for mid, l in module_to_sampled_n_steps.items()
}
self._update_sample_metrics(
Contributor:

same here, let's make these forced keyword args:

def _update_sample_metrics(self, *, ...):

# Increase index to the new length of `self._indices`.
j = len(self._indices)

# Update the adding metrics.
self._update_add_metrics(
Contributor:

same: def _update_add_metrics(self, *, ...)

agent_to_sampled_episode_idxs[sa_episode.agent_id].add(sa_episode.id_)
module_to_sampled_episode_idxs[module_id].add(sa_episode.id_)
# Get the corresponding index in the `env_to_agent_t` mapping.
ma_episode_ts = ma_episode.env_t_to_agent_t[agent_id].data.index(
Contributor:

Could this be super expensive, if we have long episodes?

Contributor:

Are we doing this only for the actual_n_steps metrics? How important is it to have these? It's mostly relevant for short episodes, correct? Where we sometimes sample too close to the end and can't fulfil the given n-step length?

@simonsays1980 (Collaborator, Author) commented Jan 20, 2025:

No, this is done for every step. The idea is to check how much variation this sample batch has: how many steps are duplicates, and here specifically, how many agent steps come from the same env step. The hash is built over the multi-agent episode's env step.
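On the cost concern raised above: `list.index` scans linearly, so the lookup is O(episode length) for every sampled step. A reverse-lookup dict built once per episode would avoid the repeated scans. This is a hypothetical alternative for illustration, not what the PR implements:

```python
# Example `env_t_to_agent_t` data for one agent: the agent only acted at
# these env timesteps.
env_t_to_agent_t = [0, 2, 5, 7, 9]

# `.index()` is an O(n) scan per call:
ma_episode_ts = env_t_to_agent_t.index(5)

# Hypothetical alternative: build the reverse map once (O(n)) ...
pos_of_env_t = {env_t: i for i, env_t in enumerate(env_t_to_agent_t)}

# ... then every per-sample lookup is O(1):
assert pos_of_env_t[5] == ma_episode_ts
print(pos_of_env_t[5])  # -> 2
```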

module_to_num_episodes_evicted[DEFAULT_MODULE_ID] += 1
agent_to_num_steps_evicted[
DEFAULT_AGENT_ID
] += evicted_eps.agent_steps()
Contributor:

nit: use evicted_eps_len ?

] += evicted_eps.agent_steps()
module_to_num_steps_evicted[
DEFAULT_MODULE_ID
] += evicted_eps.agent_steps()
Contributor:

nit: use evicted_eps_len ?

Collaborator (Author):

Yes, we could. I wanted to make explicit that this metric considers the agent steps - which here is of course the same.
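For completeness, the reviewer's nit, caching the step count once instead of calling 'agent_steps()' per metric, would look roughly like this (names and the stub episode class are illustrative only):

```python
from collections import defaultdict

DEFAULT_AGENT_ID = "default_agent"
DEFAULT_MODULE_ID = "default_policy"

class _StubEpisode:
    """Minimal stand-in for an evicted episode in this sketch."""
    def agent_steps(self):
        return 42

agent_to_num_steps_evicted = defaultdict(int)
module_to_num_steps_evicted = defaultdict(int)

evicted_eps = _StubEpisode()
# Cache the count once and reuse it for both metrics:
evicted_eps_len = evicted_eps.agent_steps()
agent_to_num_steps_evicted[DEFAULT_AGENT_ID] += evicted_eps_len
module_to_num_steps_evicted[DEFAULT_MODULE_ID] += evicted_eps_len
```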
