Enhance DataCollector to validate model_reporters functions #2605

peter-kinger · 2025-01-10T07:47:25Z

Summary

This PR enhances the DataCollector class by adding a comprehensive validation system for model reporters. The main goal is to provide clearer feedback when reporters are misconfigured, while maintaining backward compatibility.

Motive

The original DataCollector implementation had several limitations:

No validation for different types of model reporters
Silent failures or unclear error messages when reporters were misconfigured
Inconsistent handling of method calls and function parameters
No early warning system for potential issues

These issues made debugging difficult and led to confusion when setting up model reporters.

Implementation

New Validation Method:

def _validate_model_reporter(self, name, reporter, model):
    """Validates four types of model reporters:
    1. Lambda functions
    2. Method references (as strings)
    3. Attribute strings
    4. Function lists with parameters
    """

Modified Collection Logic:

def collect(self, model):
    if self.model_reporters:
        for var, reporter in self.model_reporters.items():
            # Add validation before collection
            self._validate_model_reporter(var, reporter, model)
            # Existing collection logic continues...
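
The validation described above could look roughly like the following. This is only a minimal stand-alone sketch, not the code in this PR: `validate_model_reporter` and `DemoModel` are hypothetical names, the messages are illustrative, and the checks are assumptions based on the four reporter types listed in the docstring. Note that `types.LambdaType` is the same object as `types.FunctionType`, so plain functions take the first branch as well.

```python
import types
import warnings


def validate_model_reporter(name, reporter, model):
    """Warn when a model reporter looks misconfigured (hypothetical helper)."""
    if isinstance(reporter, types.LambdaType):
        # Type 1: lambdas (and plain functions) are tried once against the model.
        try:
            reporter(model)
        except Exception as e:
            warnings.warn(
                f"Lambda reporter '{name}' failed: {e!s}", UserWarning, stacklevel=2
            )
    elif isinstance(reporter, str):
        # Types 2 and 3: the string must name a method or attribute of the model.
        if not hasattr(model, reporter):
            warnings.warn(
                f"Reporter '{name}': model has no attribute or method '{reporter}'",
                UserWarning,
                stacklevel=2,
            )
    elif isinstance(reporter, list):
        # Type 4: expected shape is [function, [param_1, param_2, ...]].
        if len(reporter) != 2 or not callable(reporter[0]):
            warnings.warn(
                f"Reporter '{name}': expected [function, [params]]",
                UserWarning,
                stacklevel=2,
            )
    else:
        warnings.warn(
            f"Reporter '{name}': unsupported type {type(reporter).__name__}",
            UserWarning,
            stacklevel=2,
        )


class DemoModel:
    """Hypothetical stand-in for a Mesa model."""

    def __init__(self):
        self.agents = ["a", "b"]
        self.total_wealth = 10


model = DemoModel()
reporters = {
    "Agent Count": lambda m: len(m.agents),      # valid
    "Total Wealth": "total_wealth",              # valid
    "Bad Lambda": lambda m: m.nonexistent_attr,  # triggers a warning
    "Bad Function": ["not_callable", [1, 2]],    # triggers a warning
}

# Record the warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for name, reporter in reporters.items():
        validate_model_reporter(name, reporter, model)
```

After the loop, `caught` holds one entry per misconfigured reporter, and the two valid reporters pass silently.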

Warning System:

Replaced hard errors with warning messages
Added specific warnings for each reporter type
Included examples in warning messages

Usage Examples

Lambda Functions:

# Valid usage
model_reporters = {
    "Agent Count": lambda m: len(m.agents)
}

# Invalid usage (will show warning)
model_reporters = {
    "Bad Lambda": lambda m: m.nonexistent_attr
}

Method References:

# Valid usage
model_reporters = {
    "Grid Size": "get_grid_size"  # As string
}

# Invalid usage (will show warning)
model_reporters = {
    "Grid Size": self.get_grid_size  # Direct reference
}

Attribute Strings:

# Valid usage
model_reporters = {
    "Total Wealth": "total_wealth"
}

# Invalid usage (will show warning)
model_reporters = {
    "Status": "nonexistent_attribute"
}

Function Lists:

# Valid usage
model_reporters = {
    "Custom": [calculate_metric, [model, param]]
}

# Invalid usage (will show warning)
model_reporters = {
    "Bad Function": ["not_callable", [1, 2]]
}

Additional Notes

Test Coverage:

Added comprehensive test suite in test_model_reporters.py
Tests cover both valid and invalid configurations
Includes edge cases and error conditions

Backward Compatibility:

All existing code continues to work
Warnings can be suppressed if desired
No breaking changes to the API
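
Since the warnings go through the standard `warnings` module, suppressing them (or promoting them to errors while debugging) should work with the usual filters. A small sketch, where `noisy_collect` is a hypothetical stand-in for a `collect()` call that emits a reporter warning:

```python
import warnings


def noisy_collect():
    """Hypothetical stand-in for a collect() call that emits a reporter warning."""
    warnings.warn("Lambda reporter 'Bad Lambda' failed", UserWarning, stacklevel=2)
    return "collected"


# Silence UserWarning only inside this block; the filters are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    result = noisy_collect()

# Alternatively, turn the warning into an exception while debugging.
with warnings.catch_warnings():
    warnings.simplefilter("error", category=UserWarning)
    try:
        noisy_collect()
        raised = False
    except UserWarning:
        raised = True
```

Using `catch_warnings()` as a context manager keeps the filter change local, so other warnings in the program are unaffected.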

Documentation:

Updated docstrings with clear examples
Added warning messages with helpful suggestions
Included migration guide for existing code

Files Modified:

datacollection.py: the only Python file changed in the official Mesa repository.

Dependencies:

No new dependencies added
Uses standard Python warnings module

datacollection_new.zip

EwoutH · 2025-01-10T10:05:22Z

Thanks for the PR, sounds interesting.

What's the performance overhead?

quaquel · 2025-01-12T18:47:45Z

mesa/datacollection.py

+                    f"Warning: Lambda reporter '{name}' failed: {e!s}\n"
+                    f"Example of valid lambda: lambda m: len(m.agents)",
+                    UserWarning,
+                    stacklevel=2,
+                )


why issue a warning instead of an exception?

peter-kinger · Jan 19, 2025 (reply)

Quoting quaquel: "why issue a warning instead of an exception?"

Thanks for the detailed suggestions on my revised code! I have spent this time trying to understand them.

About the two questions:

Yes, raising an exception can prevent serious problems when using the DataCollector. As this was my first time adding a validation function, I chose the softer approach: a warning only, so as not to affect the behavior of the original Mesa code.

I have also drafted a new version of the code in which every warning is replaced with an exception:

# Type 1: Lambda function
if isinstance(reporter, types.LambdaType):
    try:
        reporter(model)
    except Exception as e:
        raise RuntimeError(
            f"Lambda reporter '{name}' failed validation: {str(e)}\n"
            f"Example: lambda m: len(m.agents)"
        ) from e

You're right that validating every data collection could impact performance. I propose two potential solutions:

(a) Make the "_validate_model_reporter" check optional through a new constructor parameter, for example:

def __init__(
    self,
    model_reporters=None,
    agent_reporters=None,
    agenttype_reporters=None,
    tables=None,
    validate_reporters=False,  # Add new parameter
):

(b) Move validation to initialization time:
Call the validation from DataCollector.__init__() so that it runs only once, which removes the per-collection overhead.

class DataCollector:
    def __init__(self, model_reporters=None, agent_reporters=None, agenttype_reporters=None, tables=None):
        # ... existing initialization ...
        if model_reporters is not None:
            for name, reporter in model_reporters.items():
                self._validate_reporter_format(name, reporter)  # my new way
                self._new_model_reporter(name, reporter)
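
To make option (b) concrete: without a model instance, only the shape of each reporter can be checked. The sketch below is an assumption about what such a format check might do, not the PR's actual `_validate_reporter_format`. The function name and messages are hypothetical, and it accepts any callable, which is looser than the four-type list above.

```python
def validate_reporter_format(name, reporter):
    """Check only the shape of a reporter; no model instance is needed.

    Hypothetical helper: strings cannot be resolved against the model here,
    so they are accepted as-is and can only be checked later.
    """
    if callable(reporter) or isinstance(reporter, str):
        return
    if isinstance(reporter, list):
        # Expected shape: [function, [param_1, param_2, ...]]
        if (
            len(reporter) == 2
            and callable(reporter[0])
            and isinstance(reporter[1], (list, tuple))
        ):
            return
        raise TypeError(f"Reporter '{name}': expected [function, [params]]")
    raise TypeError(
        f"Reporter '{name}': unsupported type {type(reporter).__name__}"
    )


# Usage: valid shapes pass silently, malformed ones raise immediately.
validate_reporter_format("Agent Count", lambda m: len(m.agents))
validate_reporter_format("Total Wealth", "total_wealth")

try:
    validate_reporter_format("Bad Function", ["not_callable", [1, 2]])
    format_error = None
except TypeError as e:
    format_error = str(e)
```

This catches structural mistakes at construction time, but it cannot tell whether a string names a real attribute or whether a function works with its parameters; those checks still need the model.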

I wonder what your opinions are. Thanks for your attention to this PR.

quaquel · 2025-01-12T18:48:53Z

mesa/datacollection.py

+                # Add validation
+                self._validate_model_reporter(var, reporter, model)
+


Does this imply that the validation is done every single time you try to collect the data? That seems inefficient and overkill.

peter-kinger · 2025-01-19T07:36:33Z

Quoting EwoutH: "Thanks for the PR, sounds interesting. What's the performance overhead?"

Thanks for your attention to the PR. I apologize for the delayed response as I was unfamiliar with Mesa's testing logistics.

Regarding the overhead:

Are you referring to the execution time measured by global_benchmark.py?
As "quaquel" mentioned, the current version of _validate_model_reporter performs validation during data collection. Our performance benchmarks show some noticeable overhead.
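
One rough way to put a number on that overhead, independent of global_benchmark.py, is a `timeit` micro-benchmark comparing collection with and without a validation call on every step. Everything below (`DemoModel`, `collect_plain`, `collect_validated`) is a hypothetical stand-in rather than Mesa code, and the timings depend on the machine and on how expensive the reporters are.

```python
import timeit


class DemoModel:
    """Hypothetical stand-in for a Mesa model."""

    def __init__(self):
        self.agents = list(range(100))


def collect_plain(model, reporters):
    """Collect each reporter's value without any validation."""
    return {name: fn(model) for name, fn in reporters.items()}


def collect_validated(model, reporters):
    """Validate every reporter on every call, then collect."""
    for name, fn in reporters.items():
        try:
            fn(model)  # validation call: the reporter runs an extra time
        except Exception as e:
            raise RuntimeError(f"Reporter '{name}' failed validation: {e!s}") from e
    return collect_plain(model, reporters)


model = DemoModel()
reporters = {"Agent Count": lambda m: len(m.agents)}

plain = timeit.timeit(lambda: collect_plain(model, reporters), number=10_000)
validated = timeit.timeit(lambda: collect_validated(model, reporters), number=10_000)
print(f"plain: {plain:.4f}s  validated: {validated:.4f}s")
```

Because validation by execution calls each reporter a second time, the per-step cost in this sketch grows with the cost of the reporters themselves.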

Future considerations:

I've been working on a new version of the validation logic using _validate_reporter_format within the DataCollector.__init__ function. However, this approach presents challenges, particularly with Type 4 reporters (function with parameters), as we can't fully validate them without the model parameter.
I would greatly appreciate your suggestions on how to proceed, as this has been a challenging issue to resolve.

quaquel · 2025-01-19T18:48:45Z

I suggest running the validation logic only on the first-ever call to DataCollector.collect. This solves both the overhead problem and your issue with type 4 reporters.
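
A minimal sketch of that suggestion, assuming a simple boolean flag on the collector. `FirstCallValidatingCollector` and `DemoModel` are hypothetical stand-ins, not Mesa's `DataCollector`, and only callable reporters are handled to keep the example short.

```python
class FirstCallValidatingCollector:
    """Hypothetical minimal collector that validates reporters only once."""

    def __init__(self, model_reporters=None):
        self.model_reporters = dict(model_reporters or {})
        self.model_vars = {name: [] for name in self.model_reporters}
        self._validated = False
        self.validation_runs = 0  # counter kept only for demonstration

    def _validate_model_reporter(self, name, reporter, model):
        self.validation_runs += 1
        try:
            reporter(model)
        except Exception as e:
            raise RuntimeError(f"Reporter '{name}' failed validation: {e!s}") from e

    def collect(self, model):
        # Validate on the first call only; the model is available at this point,
        # so reporters that need it can be fully checked.
        if not self._validated:
            for name, reporter in self.model_reporters.items():
                self._validate_model_reporter(name, reporter, model)
            self._validated = True
        for name, reporter in self.model_reporters.items():
            self.model_vars[name].append(reporter(model))


class DemoModel:
    """Hypothetical stand-in for a Mesa model."""

    def __init__(self):
        self.agents = ["a", "b"]


collector = FirstCallValidatingCollector({"Agent Count": lambda m: len(m.agents)})
model = DemoModel()
for _ in range(3):
    collector.collect(model)
```

After three `collect` calls the validator has run once per reporter. The flag is set only after every reporter passes, so a failed validation is repeated on the next call.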

Enhance DataCollector to validate model_reporters functions

426c62a

peter-kinger mentioned this pull request Jan 10, 2025

Enhance DataCollector to Validate model_reporters Functions #2606

Open
