Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improvement]: Introduce an index for package meta information #43322

Open
Dilhasha opened this issue Aug 26, 2024 · 10 comments · May be fixed by #43717
Open

[Improvement]: Introduce an index for package meta information #43322

Dilhasha opened this issue Aug 26, 2024 · 10 comments · May be fixed by #43717

Comments

@Dilhasha
Copy link
Contributor

Description

Introduce a local index for package meta information which will be used for the resolution decisions done on the client side.

Describe your problem(s)

No response

Describe your solution(s)

No response

Related area

-> Other Area

Related issue(s) (optional)

No response

Suggested label(s) (optional)

No response

Suggested assignee(s) (optional)

No response

@Dilhasha Dilhasha moved this to In Progress in Ballerina Team Main Board Aug 26, 2024
@ballerina-bot ballerina-bot added the needTriage The issue has to be inspected and labeled manually label Aug 26, 2024
@Dilhasha Dilhasha self-assigned this Aug 26, 2024
@Dilhasha Dilhasha removed the needTriage The issue has to be inspected and labeled manually label Aug 26, 2024
@Dilhasha Dilhasha moved this from In Progress to On Hold in Ballerina Team Main Board Sep 9, 2024
@azinneera azinneera assigned azinneera and unassigned Dilhasha Sep 9, 2024
@gayaldassanayake
Copy link
Contributor

This feature intends to offload the package resolution logic from the Ballerina Central to the language resolution engine. Currently supporting custom package registries is hard due to the complexity of the package registry server implementation.

However, once this is completed, the client (i.e. the local Ballerina instance) will do the dependency resolution, so the server doesn't need to be intelligent. That way any custom server implementations will not have to worry about dependency resolution.

We are getting inspiration from the Rust implementation of their package registry indexing.

The document below is the design document for the feature.
https://docs.google.com/document/d/1YQn931CMBKME090pUXcvjjA2GuI7Nze-NvRdsMH6-vQ/edit?usp=sharing

@gayaldassanayake
Copy link
Contributor

The current progress on the design that we have agreed upon contains,

  1. Using an package index and maintaining a cache of it locally.
  2. Having a new repository type that acts as a proxy and a cache that sits between the Ballerina Central and the client.
  3. Dependency resolution performed in the client side.

However, we are in the process of evaluating what's the best possible approach to implementing the index. The main approaches are,

  1. git based approach with dense indexing
  2. HTTP based with sparse indexing

@sameerajayasoma
Copy link
Contributor

We had an offline chat about this and decided to proceed with a Git-based approach. I believe This approach should work fine for Ballerina for the next few years. We could move to a sparse indexing solution if and when it becomes a bottleneck.

@gayaldassanayake
Copy link
Contributor

gayaldassanayake commented Oct 18, 2024

A POC implementation for the indexing is now carried out. The approach used is git-based.

As a part of the POC, we need to generate a index for the current existing packages. For this, a new admin endpoint needs to be added to the central package API. For this, https://github.com/wso2-enterprise/ballerina-registry/pull/2835 has been sent to the ballerina-registry repository.

Furthermore, I'm parallelly working on introducing the package resolution logic that uses the index implementation from the project API side.

@gayaldassanayake
Copy link
Contributor

gayaldassanayake commented Oct 30, 2024

After last week meeting on indexing, we decided to look into an index-optimized dependency resolution.
The current dependency resolution has the following steps.

  1. Extract the direct dependencies of a package from the source code
  2. Create a static dependency graph by getting the versions of the dependencies and their dependency graphs using a Ballerina Central call.
  3. Iterate through the nodes and update the dependency versions (again a central call is used).

However, with the introduction of the index, we don't have a dependency graph with us. Instead, we have only the direct dependencies of a package. Furthermore, we have all the package index information available locally. Therefore, a newer design that doesn't require the dependency graph to be sent from the Central is appropriate.

The current design that is being worked on is iterating the dependencies as we find them using a DFS of a BFS and resolving the dependency conflicts as we create and traverse the graph.

An alternate approach is using an SAT[1], similar to what's specified in [2] and [3]. This is an algorithm used to solve a Boolean satisfiability problem. Once we convert the dependencies and the other constraints to boolean expressions, the SAT solver will do the rest. This has been done in other programming languages successfully.

[1] https://en.wikipedia.org/wiki/SAT_solver
[2] https://borretti.me/article/dependency-resolution-made-simple
[3] https://ochagavia.nl/blog/the-magic-of-dependency-resolution/

@gayaldassanayake
Copy link
Contributor

The Central PR for bulk index creation has been merged. However, the process couldn't be deployed in the Central dev environment due to the lack of file creation permissions. I am currently working on using temp directories to avoid this.

@gayaldassanayake
Copy link
Contributor

An implementation of the package resolution is implemented here - https://github.com/gayaldassanayake/package-resolver/. The implementation was done in Ballerina. I am currently working on moving it to Java and integrating it with the compilation process.

This implementation contains the simplest scenario where the locking mode is SOFT. The following are the constraints that need to be considered/ implemented next.

  1. MEDIUM, HARD locking modes
  2. dependency scopes
  3. Module level resolution - currently only the packages are picked up.
  4. Repository types - Distribution/Central/(FileSystem)/Local/Custom
  5. Pre-release versions
  6. Cyclic dependency detection

The newly proposed locking modes can be found in here.

@gayaldassanayake
Copy link
Contributor

An initial draft implementation of the resolution has been implemented. Following have been so far implemented.

  • SOFT locking mode
  • Index class structure
  • Package locking modes
  • Dependency resolution est framework updates

This branch will be updated with new commits as the next set of features get add on to the implementation.

https://github.com/gayaldassanayake/ballerina-lang/tree/index-based-resolution

@gayaldassanayake gayaldassanayake linked a pull request Dec 19, 2024 that will close this issue
13 tasks
@gayaldassanayake
Copy link
Contributor

gayaldassanayake commented Dec 19, 2024

With eb01b48, the following has been implemented.

  1. Update policy vs Locking mode matrix. This includes consideration of following constraints.
  • the different Update policies (SOFT, MEDIUM, HARD, LOCKED)
  • last build time-based constraints
  • import changes
  • Distribution changes
  1. Local repository support
  2. Dependency resolution test framework updates to match update policy

The code up to this point reside in the PR #43717

The next steps are,

  • module level import handling
  • Update the rest of the resolution test framework to work with the new implementation
  • Cyclic dependency detection
  • Custom repository support
  • Logic for fetching the index from remote git repo (already implemented in a separate branch. Need to migrate to this)
  • Finalizing the interfaces used

@gayaldassanayake
Copy link
Contributor

The module-level import handling has been added to the draft PR.

As the next step, I am trying to standardize the set of Java APIs. This will

  • Help with the custom repository implementation as it relies on the APIs provided by the package resolution
  • Clean the code and allow the future implementations to be easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In Progress
Development

Successfully merging a pull request may close this issue.

5 participants