WIP: "Thick" Pareto frontier #400

MilesCranmer · 2025-01-04T23:10:20Z

This puts in some of the groundwork for MilesCranmer/PySR#791. cc @folivetti. Basically if you pass

pareto_element_options = ParetoTopKOptions(k=5)

to SRRegressor, it will allow the top-5 individuals to be stored at a given complexity level, rather than only the top-1.

The interface aims to give researchers an easy way to try other ideas. For example, here is most of the implementation of ParetoTopK:

# Options for this type of pareto front:
struct ParetoTopKOptions <: AbstractParetoOptions
    k::Int
end

# The actual storage:
struct ParetoTopK{T,L,N,P<:PopMember{T,L,N}} <: AbstractParetoElement{P}
    members::Vector{P}
    k::Int
end

# This is what happens when we offer a new individual (member) with a
# given loss (called `score`... I know, bad name, will eventually change it to `cost`)
function Base.push!(el::ParetoTopK, (score, member)::Pair{<:LOSS_TYPE,<:PopMember})
    if isempty(el.members)
        push!(el.members, copy(member))
        return el
    elseif length(el.members) >= el.k && el.members[end].score <= score
        # Already full, and the candidate is no better than the worst stored member
        return el
    elseif el.members[1].score > score
        pushfirst!(el.members, copy(member))
    else
        # Find the first member with worse score (or the end of the list if there is none)
        i = something(findfirst(m -> m.score > score, el.members), length(el.members) + 1)
        # member assumes that position, and pushes the array forward
        insert!(el.members, i, copy(member))
    end
    if length(el.members) > el.k
        pop!(el.members)
    end
    return el
end

So basically this stores the best K individuals seen for a given complexity.
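To make the insertion logic easy to play with outside the package, here is a minimal self-contained sketch of the same top-K idea. `ToyMember`, `ToyTopK`, and `offer!` are hypothetical stand-ins invented for this example, not names from the PR, and the sketch assumes a candidate is always accepted while fewer than `k` members are stored.

```julia
# Hypothetical stand-in for PopMember: just a cost and a label.
struct ToyMember
    score::Float64
    label::String
end

# Hypothetical stand-in for ParetoTopK; members are kept sorted best-first.
struct ToyTopK
    members::Vector{ToyMember}
    k::Int
end
ToyTopK(k::Int) = ToyTopK(ToyMember[], k)

function offer!(el::ToyTopK, member::ToyMember)
    # Index of the first stored member that is strictly worse than the candidate;
    # if there is none, the candidate belongs at the end of the list.
    i = something(
        findfirst(m -> m.score > member.score, el.members),
        length(el.members) + 1,
    )
    insert!(el.members, i, member)
    # Drop the worst member if we exceeded the capacity.
    length(el.members) > el.k && pop!(el.members)
    return el
end

el = ToyTopK(3)
for (s, name) in [(0.9, "a"), (0.5, "b"), (0.7, "c"), (0.95, "d"), (0.1, "e")]
    offer!(el, ToyMember(s, name))
end
labels = [m.label for m in el.members]  # ["e", "b", "c"]
scores = [m.score for m in el.members]  # [0.1, 0.5, 0.7]
```

After five offers with `k = 3`, only the three lowest-cost members remain, sorted best-first.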

You now update the hall of fame like this:

push!(hall_of_fame, complexity => pop_member)

This gets routed through to the specific element of hall_of_fame, which would then (internally) call

hall_of_fame.elements[complexity] =
    push!(hall_of_fame.elements[complexity], score => pop_member)

And the given AbstractParetoElement type can consider this new individual however it likes.
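The routing described above can be sketched with toy types. Everything here (`ToyMember`, `ToyElement`, `ToySingle`, `ToyHallOfFame`) is hypothetical and only illustrates the dispatch pattern. The element's `push!` returns an element, which the hall of fame stores back, so an element type is free to be immutable.

```julia
# Hypothetical stand-in for PopMember.
struct ToyMember
    score::Float64
    label::String
end

abstract type ToyElement end

# Immutable element that keeps only the single best member seen so far.
struct ToySingle <: ToyElement
    best::Union{Nothing,ToyMember}
end

# Element-level push!: returns a (possibly new) element instead of mutating.
function Base.push!(el::ToySingle, (score, member)::Pair{Float64,ToyMember})
    if el.best === nothing || score < el.best.score
        return ToySingle(member)
    end
    return el
end

struct ToyHallOfFame{E<:ToyElement}
    elements::Vector{E}
end

# Hall-of-fame-level push!: route to the element for this complexity
# and store whatever element comes back.
function Base.push!(hof::ToyHallOfFame, (complexity, member)::Pair{Int,ToyMember})
    hof.elements[complexity] = push!(hof.elements[complexity], member.score => member)
    return hof
end

hof = ToyHallOfFame([ToySingle(nothing) for _ in 1:3])
push!(hof, 2 => ToyMember(0.8, "x + 1"))
push!(hof, 2 => ToyMember(0.3, "2x"))
push!(hof, 2 => ToyMember(0.6, "x * x"))
best_label = hof.elements[2].best.label      # "2x"
untouched = hof.elements[1].best === nothing  # true
```

Swapping `ToySingle` for a top-K style element would change what is retained without touching the hall-of-fame code.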

I guess what is missing for @folivetti's idea is some way to measure diversity. Right now the pareto element type can only see the genotype. Would we want it to have access to the phenotype (evaluation result) as well? This seems super tricky to me because a user can also implement a custom loss function. How can we do that in a generic way?
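One possible direction, sketched here purely as an assumption and not something this PR implements: define the phenotype as the vector of predictions on a shared set of inputs. That is independent of whatever loss function the user supplies. The helpers `phenotype` and `phenotype_distance` are hypothetical, and plain Julia functions stand in for expressions.

```julia
# Phenotype: the predictions of `f` on a shared set of inputs.
phenotype(f, X::AbstractVector{<:Real}) = [f(x) for x in X]

# Root-mean-square difference between the phenotypes of `f` and `g`.
function phenotype_distance(f, g, X::AbstractVector{<:Real})
    pf, pg = phenotype(f, X), phenotype(g, X)
    return sqrt(sum(abs2, pf .- pg) / length(X))
end

X = collect(range(-1.0, 1.0; length=5))
f1(x) = 2x + 1
f2(x) = x + x + 1  # different genotype, same phenotype as f1
f3(x) = x^2

d_same = phenotype_distance(f1, f2, X)  # zero: identical predictions
d_diff = phenotype_distance(f1, f3, X)  # clearly positive
```

Two expressions with different trees but identical predictions get distance zero, which is the case a genotype-only view cannot detect.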

folivetti · 2025-01-05T10:54:07Z

I think this is enough to implement that idea! The user can filter the list themselves if needed. For diversity filtering, I imagine something like:

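# X_out is the data points X sampled outside the training boundaries.
# `distance`, `threshold`, and `max_complexity` are supplied by the user.
for complexity in 1:max_complexity
    # Assumes the element exposes its stored individuals as `.members` (as ParetoTopK does)
    exprs = hall_of_fame.elements[complexity].members
    similars = Set{Int}()
    for i in eachindex(exprs)
        for j in (i + 1):length(exprs)
            if distance(exprs[i], exprs[j]) < threshold
                push!(similars, j)
            end
        end
    end
    # deleteat! needs sorted, unique indices
    deleteat!(exprs, sort!(collect(similars)))
end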

It could be helpful to have a function that implements the loop above, with the user providing the distance function.
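A minimal sketch of what such a helper might look like. The name `filter_similar!` and its signature are hypothetical. It also differs from the loop above in one respect: a member that has already been removed is not used to remove others, so the filtering is greedy in best-first order.

```julia
# Remove, in place, every member closer than `threshold` to an earlier
# member that is being kept. `distance` is any user-supplied function
# taking two members and returning a non-negative number.
function filter_similar!(members::Vector, distance, threshold::Real)
    keep = trues(length(members))
    for i in eachindex(members)
        keep[i] || continue  # skip members already marked for removal
        for j in (i + 1):length(members)
            if keep[j] && distance(members[i], members[j]) < threshold
                keep[j] = false
            end
        end
    end
    # Indices where keep is false, already sorted as deleteat! requires.
    deleteat!(members, findall(!, keep))
    return members
end

# Toy usage: numbers stand in for expressions, absolute difference for distance.
scores = [1.0, 1.05, 2.0, 2.02, 3.5]
filter_similar!(scores, (a, b) -> abs(a - b), 0.1)
# scores is now [1.0, 2.0, 3.5]
```

Because the vector is assumed to be sorted best-first, the member that survives each cluster of near-duplicates is the best one in that cluster.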

MilesCranmer added 16 commits January 4, 2025 17:12

feat: abstract types for Pareto front elements

84f14f1

feat: working abstract pareto element

c3c0550

fix: missing copy in merge

91b485d

docs: improve docstring

5911382

fix: printing with ParetoSingle

b740e3a

refactor: clean up some API calls

d79bcfc

fix: type instability in search state

3ecc0be

fix: type instability from depwarn

0e3e66f

fix: other misuses for new element type

1804cf3

fix: generalize other instances of .members to pareto elements

0e4fb7c

fix: force inline of ParetoSingle getproperty

8f024d0

feat: implement utility functions for ParetoNeighborhood

aedcc5f

refactor: rename ParetoNeighborhood to ParetoTopK

a1036e6

feat: more sizehint! for pareto elements

f191d38

fix: aliasing issue from shallow copy

78261a7

feat: migrate entire hof rather than only dominating

dcc0bae
