Reconcile commonmark spec examples with unified #28
value: <div>
myst: |2
<div>

<div>
html: |2-
html: |-2
This change is not correct: it is an artifact of the regex replacement I used as a temporary solution. I'm not sure how to output this format using js-yaml.
I can remove the regex replacement, but then there will obviously be more "spurious" diffs of | vs |-.
- type: thematicBreak
- type: thematicBreak
- type: yaml
value: ''
myst: |
---
---
This is essentially the only "known" divergence from the CommonMark spec when using the "MyST Plugins": a thematic break on the first line is treated as front matter.
- type: link
url: /f%C3%B6%C3%B6
title: föö
- type: linkReference
children:
- type: text
value: foo
label: foo
identifier: foo
referenceType: shortcut
- type: definition
identifier: foo
label: foo
title: föö
url: /föö
This highlights two key differences from the current spec:
- URLs are not escaped: this makes sense, since escaping only needs to be done when generating HTML.
- References are not yet resolved to definitions.
The second point is a more conceptual question: should this "snapshot" of the AST represent it before or after reference resolution?
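As a rough illustration of what "after reference resolution" could mean, here is a minimal TypeScript sketch that swaps each linkReference for a link built from its matching definition. The AstNode shape and the helper names (collectDefinitions, resolveReferences, toHtmlHref) are assumptions made for this example and are not part of unified or myst-spec. The encoding step relies on the built-in encodeURI and is deliberately naive.

```typescript
// Loose node shape for illustration only; real mdast types are stricter.
interface AstNode {
  type: string;
  children?: AstNode[];
  [key: string]: unknown;
}

// Hypothetical helper: index definitions by identifier.
// The first definition wins, as in CommonMark.
function collectDefinitions(
  node: AstNode,
  defs: Map<string, AstNode> = new Map()
): Map<string, AstNode> {
  if (node.type === "definition" && typeof node.identifier === "string") {
    if (!defs.has(node.identifier)) defs.set(node.identifier, node);
  }
  for (const child of node.children ?? []) collectDefinitions(child, defs);
  return defs;
}

// Hypothetical helper: return a copy of the tree in which every
// linkReference with a known definition becomes a link node.
// Definition nodes themselves are left in place.
function resolveReferences(node: AstNode, defs: Map<string, AstNode>): AstNode {
  const children = node.children?.map((c) => resolveReferences(c, defs));
  if (node.type === "linkReference" && typeof node.identifier === "string") {
    const def = defs.get(node.identifier);
    if (def) {
      return {
        type: "link",
        url: def.url, // kept unescaped in the AST
        title: def.title ?? null,
        children: children ?? [],
      };
    }
  }
  return children ? { ...node, children } : { ...node };
}

// Encoding is deferred to HTML generation. Naive: input that is already
// percent-encoded would be encoded a second time.
function toHtmlHref(url: string): string {
  return encodeURI(url);
}
```

Running resolution as a separate pass like this would let the snapshot store the unresolved tree while an HTML renderer still works from resolved links.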
There are certainly use cases (such as LSPs) where one would like to retain information regarding the position of definitions and references.
I am sure there are other things to look for, but from my first scroll through:
Should we be adding the nulls for checked/start/meta/lang/title?
I don't think that markdown-it is good at keeping the spread value.
Should the mdast for list items have paragraphs put in? I suppose this is the default in unified?
children:
- type: text
value: one
- type: paragraph
children:
- type: text
value: one
This doesn't look right from the HTML output.
The inclusion of paragraph tags is based on spread. I've already implemented this in: https://github.com/chrisjsewell/myst-spec/blob/c7828391501447616e530491f4bae0d16f20ea8b/src/myst_spec_py/mdast_to_html.py#L96-L102
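To make the spread rule concrete, here is a simplified TypeScript sketch (not the linked Python implementation). It assumes list items contain only paragraphs of plain text and looks only at the item's own spread value; renderListItem and escapeHtml are names invented for this example.

```typescript
interface TextNode {
  type: "text";
  value: string;
}

interface ParagraphNode {
  type: "paragraph";
  children: TextNode[];
}

interface ListItemNode {
  type: "listItem";
  spread?: boolean | null;
  children: ParagraphNode[];
}

// Minimal escaping for text content.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// The AST always holds paragraph nodes; the <p> wrapper is only
// emitted when the item is "spread" (loose).
function renderListItem(item: ListItemNode): string {
  const tight = !item.spread;
  const parts = item.children.map((paragraph) => {
    const inner = paragraph.children.map((t) => escapeHtml(t.value)).join("");
    return tight ? inner : `<p>${inner}</p>`;
  });
  return tight
    ? `<li>${parts.join("\n")}</li>`
    : `<li>\n${parts.join("\n")}\n</li>`;
}
```

Under these assumptions the same paragraph child renders as `<li>one</li>` when spread is false and as a `<p>`-wrapped item when spread is true, so the paragraph node in the AST does not by itself imply a paragraph tag in the HTML.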
spread: false
children:
- type: listItem
spread: true
spread: false
checked: null
Should we include checked in commonmark? Probably not?
It is part of GFM, and null is equivalent to undefined/not there:
https://github.com/syntax-tree/mdast#listitem-gfm
I agree that it's probably not ideal; however, it is baked into the reference implementation of mdast: https://github.com/syntax-tree/mdast-util-from-markdown/blob/0e70e0a937d89f5a0164128b7d2c53b77875d12c/dev/lib/index.js#L1069
This makes creating the reference implementation a bit of a pain: either we have to get them to remove it from there, or I have to add an extra "special case" step in unified-myst to remove them.
This is obviously similar for start/meta/lang/title, and even if they are not included in the examples, the schema should at least accept null values, e.g. here: https://github.com/executablebooks/myst-spec/blob/a8a57d9d88c69f09c3fdc2c7ce7a1d971a223e29/schema/commonmark.schema.json#L187-L194
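For reference, the extra "special case" step of removing nulls could look roughly like the following TypeScript sketch. stripNulls is a hypothetical helper, not an existing unified-myst function; it assumes the tree is plain JSON-like data and drops every object property whose value is null.

```typescript
// JSON-like value; enough to describe a serialized AST.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively copy a value, omitting object properties that are null.
// Null entries inside arrays are kept, since removing them would shift indices.
function stripNulls(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((entry) => stripNulls(entry));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      if (child !== null) out[key] = stripNulls(child);
    }
    return out;
  }
  return value;
}
```

The alternative discussed here, letting the schema accept null, avoids this pass entirely at the cost of noisier example output.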
Looks like we also need to update the JSON schema, as that test doesn't pass.
I think the key question here is: should we be modifying what is essentially the reference implementation of CommonMark MDAST? I feel it should not really be based on the "third-party" markdown-it implementation (although obviously we do want that to comply as well).
This is possible to achieve, since I've done it 😉 : https://github.com/chrisjsewell/myst-spec/blob/c7828391501447616e530491f4bae0d16f20ea8b/src/myst_spec_py/mdit_to_mdast.py#L108-L114
@rowanc1 @fwkoch, just to check, you do know that https://www.npmjs.com/package/@types/mdast is a thing 😬
At least on the subject of nullable fields, there seem to be discrepancies between their types and the ones we distribute here.
Nice, I like that definitions are in there now. I agree references should not yet be resolved - this is the case for crossReferences, so adding imageReferences and linkReferences seems ok. Also nice to have some clarity on spread vs. paragraphs on list items. Possibly we should update the schema so list item children must be flow content, rather than flow or phrasing...
For the nulls, checked, and other things that are coming from unified mdast, I don't feel too strongly - probably not worth the workaround?
Biggest thing is to just update the JSON schema to reflect the changes here.
@@ -10489,7 +11191,8 @@ cases:
- type: paragraph
children:
- type: link
url: "/url\u00A0\"title\""
A few picky things in here like non-space whitespace. I think the problem here is that the non-space whitespace isn't actually present in the markup...
From @types/mdast:
export interface ListItem extends Parent {
  type: 'listItem';
  checked?: boolean | null | undefined;
  spread?: boolean | null | undefined;
  children: Array<BlockContent | DefinitionContent>;
}
This was generated from executablebooks/unified-myst#2, with: