
add pidlock to public functions #45

Open
willow-ahrens wants to merge 4 commits into main

Conversation

@willow-ahrens commented Jan 6, 2025

This PR locks the basic functions of pyjuliapkg. I used FileLock instead of pidlock because I needed a reentrant lock, and I added a thread lock as well. I also added a test which verifies that this fix avoids redundant resolves. Fixes #19.
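
A minimal sketch of the locking pattern described here; `_thread_lock`, `_file_lock`, `resolve`, and the lock path are illustrative names, not the PR's actual code:

```python
# Sketch only: a reentrant thread lock serializes threads within this
# process, while filelock.FileLock serializes separate processes on the
# same machine. Both are reentrant, so nested calls to locked functions
# do not deadlock.
import threading
from filelock import FileLock

_thread_lock = threading.RLock()
_file_lock = FileLock("juliapkg.lock")  # illustrative path

def resolve():
    with _thread_lock, _file_lock:
        ...  # perform the actual resolve while holding both locks
```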

@amilsted (Contributor)

Do you know if this will work on a cluster if the shared filesystem does not support unix file locking? The "pidfile" approach, which stores the hostname and process ID of the acquirer in the lock file, does not depend on filesystem support for locking. I don't think filelock uses this approach, so I'm a bit worried about robustness.
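
For reference, a rough illustration of the pidfile idea described above (this is not pidlock's actual implementation; the path and file format are hypothetical):

```python
# Sketch of a pidfile-style acquire: creation is atomic via O_EXCL, and
# the file's contents let other hosts see who holds the lock, without
# relying on OS-level file locking that a shared filesystem may lack.
import os
import socket

def try_acquire(path="juliapkg.pid"):  # hypothetical path
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another process (possibly on another host) holds it
    with os.fdopen(fd, "w") as f:
        f.write(f"{socket.gethostname()}:{os.getpid()}\n")
    return True
```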

@willow-ahrens (Author)

Filelock advertises itself as platform-independent, and has Windows- and Unix-specific implementations. I haven't been able to test this on Windows, however.

@willow-ahrens (Author)

One question I have about this PR is where to write the lock. Currently it is written to the source directory of pyjuliapkg, but I suspect it would be better to use a location specified by the STATE dict; I just don't know which one makes sense here.

@amilsted (Contributor)

> Filelock advertises itself as platform-independent, and has Windows- and Unix-specific implementations.

It can be platform independent and still not work well across multiple hosts on a shared filesystem.

@willow-ahrens (Author)

I see what you're asking. The filelock library advises using the SoftFileLock type when multiple hosts are involved, but that may occasionally require deleting the lock file by hand if someone interrupts the setup. The Julia developers seem to have had similar concerns in the design of Pkg, and I'm not sure to what extent they solved it. The solution in this PR should fix bugs on non-shared filesystems, and I'm open to input on how to also fix bugs on shared ones. I chose the file-locking package on PyPI with the most support that offered a simple reentrant lock, but I'm open to suggestions for alternatives.
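
For illustration, here is how SoftFileLock could be used (a sketch; the path, timeout, and error handling are assumptions, not this PR's code):

```python
# Sketch: SoftFileLock only creates and deletes the lock file, so it works
# on filesystems without OS-level locking support. The cost: a holder
# killed mid-setup leaves a stale lock file that must be removed by hand.
from filelock import SoftFileLock, Timeout

lock = SoftFileLock("juliapkg.lock")  # illustrative path
try:
    with lock.acquire(timeout=60):
        ...  # run the resolve while holding the lock
except Timeout:
    raise RuntimeError(
        "could not acquire juliapkg.lock; if no other process is running, "
        "delete the stale lock file manually"
    )
```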

@amilsted (Contributor) commented Jan 16, 2025

Yeah, I understand this is a nontrivial choice. I shopped around for a Python package that ticked all the boxes (basically replicating the Julia solution) but didn't find one that worked well out of the... well, box. Any locking is definitely better than none here and I appreciate your taking this on!

Edit: And I don't mean to sound like I'm a maintainer here. I'm not! Just a user quite invested in making it easier to have python packages that use julia (including for use on clusters).

@willow-ahrens (Author) commented Jan 16, 2025

Looking at package managers in Python:

@amilsted (Contributor)

In case it's useful, I used a somewhat modified version of pidlock to externally lock juliapkg operations. This worked okay on a cluster, but it again probably doesn't do everything right. For one thing, it's missing a mechanism to delete stale lock files.

Also, the GitHub version has a defect: it checks for file existence and then, if the file exists, opens the file without exception handling. The window between the existence check and the open allows other processes or hosts to delete the file and cause a crash. This can be fixed by handling the exception on open instead of checking for existence separately.
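
A minimal sketch of the suggested fix (the function name is hypothetical): open the file directly and treat a missing file as "unlocked", rather than checking existence first:

```python
# Race-free read: between os.path.exists(path) and open(path), another
# process or host can delete the file; catching FileNotFoundError on the
# open itself closes that window.
def read_lock_owner(path):
    try:
        with open(path) as f:
            return f.read().strip()  # e.g. "hostname:pid"
    except FileNotFoundError:
        return None  # lock file gone: nobody holds the lock
```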

Successfully merging this pull request may close these issues:
Possible concurrency issues with resolve on a cluster