WIP: "Thick" Pareto frontier #400

Open: wants to merge 16 commits into master

MilesCranmer (Owner):

This lays some of the groundwork for MilesCranmer/PySR#791 (cc @folivetti). Basically, if you pass

```julia
pareto_element_options = ParetoTopKOptions(k=5)
```

to `SRRegressor`, it will store the top-5 individuals at each complexity level rather than only the top-1.

The interface aims to give researchers an easy way to try other ideas. For example, here is most of the implementation of `ParetoTopK`:

```julia
# Options for this type of Pareto front
# (@kwdef so that `ParetoTopKOptions(k=5)` works):
Base.@kwdef struct ParetoTopKOptions <: AbstractParetoOptions
    k::Int
end

# The actual storage (members are kept sorted by ascending score):
struct ParetoTopK{T,L,N,P<:PopMember{T,L,N}} <: AbstractParetoElement{P}
    members::Vector{P}
    k::Int
end

# This is what happens when we offer a new individual (member) with a
# given loss (called `score`... I know, bad name, will eventually change it to `cost`)
function Base.push!(el::ParetoTopK, (score, member)::Pair{<:LOSS_TYPE,<:PopMember})
    if isempty(el.members)
        push!(el.members, copy(member))
        return el
    elseif el.members[end].score <= score
        # Not better than any stored member: keep it only if there is still room
        if length(el.members) < el.k
            push!(el.members, copy(member))
        end
        return el
    elseif el.members[1].score > score
        pushfirst!(el.members, copy(member))
    else
        # Find the first member with a worse score
        i = findfirst(m -> m.score > score, el.members)::Int
        # The new member takes that position, shifting the rest back by one
        insert!(el.members, i, copy(member))
    end
    # Drop the worst member if we now hold more than k
    if length(el.members) > el.k
        pop!(el.members)
    end
    return el
end
```

So basically this stores the best K individuals seen for a given complexity.
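To see the top-k behaviour in isolation, here is a minimal standalone sketch that does not depend on SymbolicRegression.jl. `MockMember` and `TopK` are hypothetical stand-ins for `PopMember` and `ParetoTopK`: they keep only an expression string and a score, so the insertion logic can be run on its own.

```julia
# Hypothetical stand-in for PopMember: only the fields this sketch needs.
struct MockMember
    expr::String
    score::Float64
end

# Hypothetical stand-in for ParetoTopK: at most `k` members, sorted by
# ascending score (lower is better).
struct TopK
    members::Vector{MockMember}
    k::Int
end
TopK(k::Int) = TopK(MockMember[], k)

function Base.push!(el::TopK, member::MockMember)
    # Index of the first stored member with a strictly worse score;
    # on ties the existing member stays ahead of the new one.
    i = findfirst(m -> m.score > member.score, el.members)
    if i === nothing
        # Not better than anything stored: append only if there is room.
        length(el.members) < el.k && push!(el.members, member)
    else
        insert!(el.members, i, member)
        # Drop the worst member if we now hold more than k.
        length(el.members) > el.k && pop!(el.members)
    end
    return el
end

el = TopK(3)
for (expr, score) in [("x", 4.0), ("x + 1", 2.0), ("sin(x)", 3.0), ("x * x", 1.0), ("cos(x)", 5.0)]
    push!(el, MockMember(expr, score))
end
println([m.expr for m in el.members])  # ["x * x", "x + 1", "sin(x)"]
```

With `k = 3`, the three best scores survive: `x` is evicted when `x * x` arrives, and `cos(x)` is rejected because the element is already full and it is worse than every stored member.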

You now update the hall of fame like this:

```julia
push!(hall_of_fame, complexity => pop_member)
```

This is routed to the element of `hall_of_fame` at that complexity, which then (internally) calls

```julia
hall_of_fame.elements[complexity] =
    push!(hall_of_fame.elements[complexity], score => pop_member)
```

The given `AbstractParetoElement` subtype can then handle this new individual however it likes.
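The routing can be sketched with a few hypothetical types (`Member`, `BestOne`, `HallOfFame`) that only mimic the shape of the real hall of fame: the container holds one element per complexity, and `push!(hof, complexity => member)` forwards `member.score => member` to the element at that index. `BestOne` is an assumed element type that keeps only the single best member, i.e. the usual "thin" frontier.

```julia
# Hypothetical member type: an expression string and its loss.
struct Member
    expr::String
    score::Float64
end

abstract type AbstractElement end

# Hypothetical element that keeps only the best member seen so far.
mutable struct BestOne <: AbstractElement
    best::Union{Nothing,Member}
end
BestOne() = BestOne(nothing)

function Base.push!(el::BestOne, (score, member)::Pair{Float64,Member})
    if el.best === nothing || score < el.best.score
        el.best = member
    end
    return el
end

# Hypothetical hall of fame: one element per complexity level.
struct HallOfFame{E<:AbstractElement}
    elements::Vector{E}
end
HallOfFame(make_element, maxsize::Int) = HallOfFame([make_element() for _ in 1:maxsize])

# Forward `complexity => member` to the element stored at that complexity.
# Reassigning the result also supports element types that return a new value.
function Base.push!(hof::HallOfFame, (complexity, member)::Pair{Int,Member})
    hof.elements[complexity] = push!(hof.elements[complexity], member.score => member)
    return hof
end

hof = HallOfFame(BestOne, 5)
push!(hof, 3 => Member("x + 1", 2.0))
push!(hof, 3 => Member("x - 1", 0.5))
push!(hof, 1 => Member("x", 4.0))
println(hof.elements[3].best.expr)  # x - 1
```

Replacing `BestOne` with a top-k element changes the storage policy without touching `HallOfFame` itself.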

I guess what is missing for @folivetti's idea is some way to measure diversity. Right now the Pareto element type can only see the genotype (the expression itself). Would we want it to have access to the phenotype (the evaluation result) as well? This seems very tricky to me, because a user can also implement a custom loss function. How could we do that in a generic way?
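One possible generic option, sketched here as an assumption rather than anything in this PR, is to compare phenotypes directly: evaluate two candidate models on a shared set of probe points and measure how far apart their outputs are. Because it only uses predictions, it does not depend on which loss function the user chose. `phenotype_distance` and `X_probe` are hypothetical names; models are plain Julia functions of a single scalar input, and non-finite outputs are not handled.

```julia
using Statistics: mean

# Hypothetical phenotypic distance: root-mean-square difference between the
# outputs of two models on the same probe points. Independent of the loss.
function phenotype_distance(f, g, X::AbstractVector{<:Real})
    yf = f.(X)
    yg = g.(X)
    return sqrt(mean(abs2, yf .- yg))
end

X_probe = collect(range(-2.0, 2.0; length=41))

# Different genotypes, identical phenotype: distance is exactly zero.
d_same = phenotype_distance(x -> x^2, x -> x * x, X_probe)
# Different phenotypes: distance is positive.
d_diff = phenotype_distance(x -> x^2, x -> abs(x), X_probe)
```

A distance like this could be plugged into whatever filtering step decides which members count as near-duplicates.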

folivetti:

I think this is enough to implement that idea! The user can filter the list themselves if needed. For diversity filtering, I imagine something like:

```julia
# X_out: data points sampled outside the training boundaries,
# which `distance` is assumed to use when comparing two members
for complexity in 1:max_complexity
    members = hall_of_fame.elements[complexity].members
    similars = Set{Int}()
    for i in eachindex(members)
        i in similars && continue  # already marked for removal
        for j in (i+1):lastindex(members)
            if distance(members[i], members[j], X_out) < threshold
                push!(similars, j)
            end
        end
    end
    deleteat!(members, sort!(collect(similars)))
end
```

What could be helpful is a function that implements this loop, with the user providing the distance function.
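A helper along those lines might look like the following. `filter_similar!` is a hypothetical name, not an existing function. It assumes `members` is sorted best-first (as the top-k element keeps it), so when two members are closer than `threshold`, the better one survives. Each candidate is compared only against members already kept, so a member that was dropped cannot cause further members to be dropped.

```julia
"""
    filter_similar!(members, distance, threshold)

Hypothetical helper: greedily drop every member that lies within `threshold`
of an earlier kept member, according to the user-supplied `distance(a, b)`.
"""
function filter_similar!(members::Vector, distance, threshold::Real)
    kept = empty(members)
    for m in members
        # Keep `m` only if it is at least `threshold` away from every kept member
        if all(k -> distance(k, m) >= threshold, kept)
            push!(kept, m)
        end
    end
    # Overwrite `members` in place with the survivors
    resize!(members, length(kept))
    copyto!(members, kept)
    return members
end

members = [1.0, 1.05, 2.0, 2.02, 3.5]
filter_similar!(members, (a, b) -> abs(a - b), 0.1)
println(members)  # [1.0, 2.0, 3.5]
```

For a top-k element, this would be called once per complexity on that element's `members` vector, with the user's distance function in place of `abs(a - b)`.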
