Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance DataCollector to validate model_reporters functions #2605

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

peter-kinger
Copy link

Summary

This PR enhances the DataCollector class by adding a comprehensive validation system for model reporters. The main goal is to provide clearer feedback when reporters are misconfigured, while maintaining backward compatibility.

Motive

The original DataCollector implementation had several limitations:

  1. No validation for different types of model reporters
  2. Silent failures or unclear error messages when reporters were misconfigured
  3. Inconsistent handling of method calls and function parameters
  4. No early warning system for potential issues

These issues made debugging difficult and led to confusion when setting up model reporters.

Implementation

  1. New Validation Method:
def _validate_model_reporter(self, name, reporter, model):
    """Validates four types of model reporters:
    1. Lambda functions
    2. Method references (as strings)
    3. Attribute strings
    4. Function lists with parameters
    """
  1. Modified Collection Logic:
def collect(self, model):
    if self.model_reporters:
        for var, reporter in self.model_reporters.items():
            # Add validation before collection
            self._validate_model_reporter(var, reporter, model)
            # Existing collection logic continues...
  1. Warning System:
  • Replaced hard errors with warning messages
  • Added specific warnings for each reporter type
  • Included examples in warning messages

Usage Examples

  1. Lambda Functions:
# Valid usage
model_reporters = {
    "Agent Count": lambda m: len(m.agents)
}

# Invalid usage (will show warning)
model_reporters = {
    "Bad Lambda": lambda m: m.nonexistent_attr
}
  1. Method References:
# Valid usage
model_reporters = {
    "Grid Size": "get_grid_size"  # As string
}

# Invalid usage (will show warning)
model_reporters = {
    "Grid Size": self.get_grid_size  # Direct reference
}
  1. Attribute Strings:
# Valid usage
model_reporters = {
    "Total Wealth": "total_wealth"
}

# Invalid usage (will show warning)
model_reporters = {
    "Status": "nonexistent_attribute"
}
  1. Function Lists:
# Valid usage
model_reporters = {
    "Custom": [calculate_metric, [model, param]]
}

# Invalid usage (will show warning)
model_reporters = {
    "Bad Function": ["not_callable", [1, 2]]
}

Additional Notes

  1. Test Coverage:
  • Added comprehensive test suite in test_model_reporters.py
  • Tests cover both valid and invalid configurations
  • Includes edge cases and error conditions
  1. Backward Compatibility:
  • All existing code continues to work
  • Warnings can be suppressed if desired
  • No breaking changes to the API
  1. Documentation:
  • Updated docstrings with clear examples
  • Added warning messages with helpful suggestions
  • Included migration guide for existing code
  1. Files Modified:
  • datacollection.py: the only one py code changed in mesa offical repository.
  1. Dependencies:
  • No new dependencies added
  • Uses standard Python warnings module

datacollection_new.zip

@EwoutH
Copy link
Member

EwoutH commented Jan 10, 2025

Thanks for the PR, sounds interesting.

What's the performance overhead?

Comment on lines +152 to +156
f"Warning: Lambda reporter '{name}' failed: {e!s}\n"
f"Example of valid lambda: lambda m: len(m.agents)",
UserWarning,
stacklevel=2,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why issue a warning instead of an exception?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why issue a warning instead of an exception?

Thanks for the detailed suggestions for my revised codes! During this time, I try to understand them.

About the two questions:

  1. Yes, the mechanism of the expectation can avoid serious problems when using the function of the datacollector. As it's my first time adding the validate function, I chose the soft way to validate: just a warning not to influence the mesa code of the original version.

I had tried to write a new version of codes, which are all replaced with the expectation instead of the warning.

  # Type 1: Lambda function
        if isinstance(reporter, types.LambdaType):
            try:
                reporter(model)
            except Exception as e:
                raise RuntimeError(
                    f"Lambda reporter '{name}' failed validation: {str(e)}\n"
                    f"Example: lambda m: len(m.agents)"
                ) from e
  1. You're right that validating every data collection could impact performance. I propose two potential solutions:
  • (a) make the "_validate_model_reporter" function into the optional function, like:
def __init__(

        self,

        model_reporters=None,

        agent_reporters=None,

        agenttype_reporters=None,

        tables=None,

        validate_reporters=False,  # Add new parameter

    ):
  • (b) Move validation to initialization time:
    Put the "_validate_model_reporter" function in the Class DataCollector __init__() and the "_validate_model_reporter" function can be only run once, boosting the running time and lowing the need of performance.
Class DataCollector:
         def __init__():
              xxxxxxx...
         
        if model_reporters is not None:
            for name, reporter in model_reporters.items():
                self._validate_reporter_format(name, reporter) # my new way
                self._new_model_reporter(name, reporter)


I wonder what your opinions are. Thanks for your attention to this PR.

Comment on lines +341 to +343
# Add validation
self._validate_model_reporter(var, reporter, model)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this imply that the validation is done every single time you try to collect the data? That seems inefficient and overkill.

@peter-kinger
Copy link
Author

Thanks for the PR, sounds interesting.

What's the performance overhead?

Thanks for your attention to the PR. I apologize for the delayed response as I was unfamiliar with Mesa's testing logistics.

Regarding the overhead:

  1. Are you referring to the execution time measured by global_benchmark.py?

  2. As "quaquel" mentioned, the current version of _validate_model_reporter performs validation during data collection. Our performance benchmarks show some noticeable overhead.

Future considerations:

  1. I've been working on a new version of the validation logic using _validate_reporter_format within the DataCollector.__init__ function. However, this approach presents challenges, particularly with Type 4 reporters (function with parameters), as we can't fully validate them without the model parameter.

  2. I would greatly appreciate your suggestions on how to proceed, as this has been a challenging issue to resolve.

@quaquel
Copy link
Member

quaquel commented Jan 19, 2025

I suggest running the validation logic only on the first-ever call to DataCollector.collect. This solves both the overhead problem and your issue with type 4 reporters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants