Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconcile commonmark spec examples with unified #28

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

chrisjsewell
Copy link
Contributor

This was generated from executablebooks/unified-myst#2, with:

// Note, use with `node --experimental-json-modules`
import * as data from 'myst-spec/dist/myst.tests.json' assert { type: 'json' }
import { removePosition } from 'unist-util-remove-position'
import { Processor } from '../packages/core-parse/src/index.js'
import { dump } from 'js-yaml'
import { writeFileSync } from 'fs'

/** @type {{title: string, myst: string, mdast: any}[]} */
const cases = data.default
const parser = new Processor()

const generated = cases
    .filter((value) => {
        return value.title.startsWith('cmark_spec')
    })
    .map((data) => {
        const result = parser.toAst(data.myst)
        data.mdast = removePosition(result.ast, true)
        data.title = data.title.split(':')[1].trimStart()
        return data
    })

writeFileSync(
    'cmark_spec.yml',
    dump({ cases: generated }).replace(/html: \|/g, 'html: |-'),
    'utf8'
)

value: <div>
myst: |2
<div>

<div>
html: |2-
html: |-2
Copy link
Contributor Author

@chrisjsewell chrisjsewell Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not correct, due to the regex replacement I used, as a temporary solution, but not sure how to output this format using js-yaml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove the regex replacement, but then obviously there will be more "spurious" diffs of | vs |-

Comment on lines -1743 to 1825
- type: thematicBreak
- type: thematicBreak
- type: yaml
value: ''
myst: |
---
---
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is essentially the only "known" divergence from the commonmark spec, using the "MyST Plugins", i.e. a thematic break on the first line is treated as the front-matter

Comment on lines -520 to +566
- type: link
url: /f%C3%B6%C3%B6
title: föö
- type: linkReference
children:
- type: text
value: foo
label: foo
identifier: foo
referenceType: shortcut
- type: definition
identifier: foo
label: foo
title: föö
url: /föö
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This highlights two key differences to the current spec:

  1. URLs are not HTML escaped: this makes sense, since it only needs to be done when generating HTML
  2. References are not yet resolved to definitions

(2) Is a more conceptual question: should this "snapshot" of the AST, represent it before or after reference resolution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are certainly use cases (such as LSPs) where one would like to retain information regarding the position of definitions and references

@chrisjsewell chrisjsewell requested review from fwkoch and rowanc1 April 5, 2022 08:09
@chrisjsewell chrisjsewell changed the title Reeconcile commonmark spec examples with unified Reconcile commonmark spec examples with unified Apr 5, 2022
Copy link
Member

@rowanc1 rowanc1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure there aother things to look for, but my first scroll through:

Should we be adding the nulls for checked/start/meta/lang/title?

I don't think that markdownit is good at keeping the spread value.

Should the mdast for list items have paragraphs put in? I suppose the is the default of unified?

Comment on lines 5390 to +5394
children:
- type: text
value: one
- type: paragraph
children:
- type: text
value: one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right from the html ouput.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inclusion of paragraph tags is based on spread. I've already implemented this in: https://github.com/chrisjsewell/myst-spec/blob/c7828391501447616e530491f4bae0d16f20ea8b/src/myst_spec_py/mdast_to_html.py#L96-L102

spread: false
children:
- type: listItem
spread: true
spread: false
checked: null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include checked in commonmark? Probably not?

It is part of GFM, and null is equivalent to undefined/not there:
https://github.com/syntax-tree/mdast#listitem-gfm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that its probably not ideal, however, it is something baked into the reference implementation of mdast: https://github.com/syntax-tree/mdast-util-from-markdown/blob/0e70e0a937d89f5a0164128b7d2c53b77875d12c/dev/lib/index.js#L1069
This makes it a bit of a pain for creating the reference implementation, either:
we have to get them to remove it from there, or I have to add an extra "special case" step in unified-myst to remove them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is obviously similar for start/meta/lang/title
and even if they are not included in the examples, the schema should at least accept null values, e.g. here: https://github.com/executablebooks/myst-spec/blob/a8a57d9d88c69f09c3fdc2c7ce7a1d971a223e29/schema/commonmark.schema.json#L187-L194

@rowanc1
Copy link
Member

rowanc1 commented Apr 5, 2022

Looks like we also need to update the json schema, as that test doesn't pass.

@chrisjsewell
Copy link
Contributor Author

chrisjsewell commented Apr 5, 2022

I don't think that markdownit is good at keeping the spread value.
Should the mdast for list items have paragraphs put in? I suppose the is the default of unified?

I think the key question here is: should we be modifying what is essentially the reference implementation of CommonMark MDAST?
i.e. are we essentially saying that https://github.com/syntax-tree/mdast-util-from-markdown, does not produce "correct" https://github.com/syntax-tree/mdast?

What I feel it should not really be, is based off of the "third-party" markdown-it implementation (although obviously we do want that to also comply)

@chrisjsewell
Copy link
Contributor Author

I don't think that markdownit is good at keeping the spread value.

This is possible to achieve, since I've done it 😉 : https://github.com/chrisjsewell/myst-spec/blob/c7828391501447616e530491f4bae0d16f20ea8b/src/myst_spec_py/mdit_to_mdast.py#L108-L114

@chrisjsewell
Copy link
Contributor Author

@rowanc1 @fwkoch, just to check, you do know that https://www.npmjs.com/package/@types/mdast is a thing 😬

@chrisjsewell
Copy link
Contributor Author

chrisjsewell commented Apr 5, 2022

just to check, you do know that npmjs.com/package/@types/mdast is a thing 😬

At least on the subject of nullable fields, it seems that there are discrepancies, between their types, and the ones we distribute here

Copy link
Contributor

@fwkoch fwkoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like that definitions are in there now. I agree references should not yet be resolved - this is the case for crossReferences, adding imageReferences and linkReferences seems ok. Also nice to have some clarity on spread vs. paragraphs on list items. Possibly we should update the schema so list item children must be flow content, rather than flow or phrasaing...

For the nulls, checked, other things that are coming from unified mdast, I don't feel too strongly - probably not worth the workaround?

Biggest thing is to just update the JSON schema to reflect the changes here.

@@ -10489,7 +11191,8 @@ cases:
- type: paragraph
children:
- type: link
url: "/url\u00A0\"title\""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few picky things in here like non-space whitespace. I think the problem here is that the non-space whitespace isn't actually present in the markup...

@chrisjsewell
Copy link
Contributor Author

possibly we should update the schema so list item children must be flow content, rather than flow or phrasaing...

from @types/mdast

export interface ListItem extends Parent {
    type: 'listItem';
    checked?: boolean | null | undefined;
    spread?: boolean | null | undefined;
    children: Array<BlockContent | DefinitionContent>;
}

@chrisjsewell

This comment was marked as resolved.

@jupyter-book jupyter-book deleted a comment from welcome bot Apr 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants