-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: modernisation of proselint #1371
base: main
Are you sure you want to change the base?
Conversation
Co-Authored-By: Tyler J Russell <xtylerjrx@gmail.com>
Co-Authored-By: Tyler J Russell <xtylerjrx@gmail.com>
This commit is expected to land at a time close to or after 2024-10-01, following which Python 3.8 will be end-of-life. Several tools already require a Python version >=3.9, which has caused dependency resolution issues for some time now. Python 3.9 changed the workings of the type system quite substantially. This will allow for a higher code quality going forwards. BREAKING CHANGE: Python 3.8 is no longer supported as of this commit. To migrate, update your system's Python version.
This looks fantastic @Nytelife26! I see the last commit here is from July, is there much more that needs to be done? |
I am in the process of porting proselint in its entirety, which became a necessity after an unfortunate conflict arose with the original author of the refactor. I see this as the best possible path forward for proselint - a fresh start, which will come with performance benefits, long-overdue housekeeping, and a good chance to implement some features people have been asking for for a long time. This effort has not been easy, and while I'm back at university, things have been a hard balance. However, I have many of the internals done already (configuration, core parts of the command line, specification structures), and I feel good progress is being made. I will be making this effort more public once I have a solid foundation in place. For now, the latest commits to this pull request mark the last Python version of the project, unless some major breakthrough in communication happens. Let me know if you have any furher questions. As always, I am incredibly grateful and happy to see that people are still interested in proselint. Things were rough, with stagnant development after communications ceased some years ago, but I am excited to finally have the chance to revive this project. |
Small victories are showing - the command line, configuration parser, and check primitives are operational. As an additional bonus, all of the check specifications will be evaluated and stored at compile-time, entirely eliminating runtime discovery costs from the Python version. Shown here is the first test with an actual regex specification from the original code. Some things are not yet possible for reasons beyond my control, like consistent case-insensitive matching without using hacky mode modifiers (blocked by fancy-regex#132). I also have yet to implement parallelization, but that will not be a priority until much of the other major work is complete; although, it should be as trivial as adding Rayon and adjusting the dispatch iterators. |
I committed a preview version today so any onlookers can see how things are coming along. Be advised that at present, things are messy, quite inefficient, and there are still traces of Python-esque design patterns lying around. However, results speak for themselves. With 51 of ~180 checks implemented, an uncached serial run in release mode is at least 10 times faster than the previous implementation of an uncached parallel run mode from my measurements. Assuming this performance will scale in a linear fashion with some pessimistic padding, I would expect no worse than 2 times faster.
Updated results with a parallel iterator via rayon:
All check specifications are registered at compile-time. Things that remain to be done include message templating, output formats, deciding whether I'd like |
This affects all check types except for ExistenceSimple, which is currently the only check type requiring backtracking or lookarounds.
Applications of the unicode flag for checks were previously consistent, and having it default to true makes no difference for any check within proselint. Further, this behaviour is consistent with the default for the Rust regex crate. The only check that previously had dotall enabled did not use dot matching, so it was redundant. Removing these unnecessary features results in a cleaner codebase.
This data is held statically in check specifications immutably. Therefore, there is no reason to convert it to a heap buffer when it is passed around. Equally, this removes the cost of cloning the heap buffers when the conversion from a CheckResult to a LintResult is made.
The previous emulation was inefficient and inconsistent. Now, the signature of all check functions is the same, and a layer of indirection is removed. In the future, this may allow for combining several specifications into one check.
this is a follow on from #1361. credit to @orgua for the initial work here.
following a request that no work from the initial refactoring effort should be used, preserved below, the oxidation of proselint begins.
it can be observed here that the time it now takes to launch proselint, parse CLI options, find config paths for both JSON and TOML, and deserialize them, is faster than it previously took just to load the CLI options. it is worth noting that the previous measurements were taken even after a highly optimised refactor that involved replacing click with a simplified parse function.
it may take some time to get this all up and running. i would like to thank any onlookers for bearing with me.
keeping this here so i don't forget: