Skip to content

url: file URL path normalization#35477

Closed
watilde wants to merge 1 commit into
nodejs:masterfrom
watilde:url/file-path-normalization
Closed

url: file URL path normalization#35477
watilde wants to merge 1 commit into
nodejs:masterfrom
watilde:url/file-path-normalization

Conversation

@watilde

@watilde watilde commented Oct 3, 2020

Copy link
Copy Markdown
Member

Refs: whatwg/url#544
Refs: web-platform-tests/wpt#25716
Fixes: #35429

Checklist

@watilde watilde added semver-major PRs that contain breaking changes and should be released in the next major version. whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Oct 3, 2020
@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Oct 3, 2020
@watilde watilde added the wip Issues and PRs that are still a work in progress. label Oct 3, 2020
@watilde watilde force-pushed the url/file-path-normalization branch 2 times, most recently from 1de4ef4 to b6e0aa2 Compare October 4, 2020 14:33
@CYBAI

CYBAI commented Oct 4, 2020

Copy link
Copy Markdown

This might fix #35429 👀?

@watilde watilde force-pushed the url/file-path-normalization branch from fea5da9 to a843655 Compare October 5, 2020 10:23
@watilde watilde added inspector Issues and PRs related to the V8 inspector protocol policy Issues and PRs related to the policy subsystem. author ready PRs that have at least one approval, no pending requests for changes, and a CI started. and removed wip Issues and PRs that are still a work in progress. labels Oct 5, 2020
@watilde

watilde commented Oct 5, 2020

Copy link
Copy Markdown
Member Author

/cc @nodejs/tsc because semver-major.

@guybedford guybedford left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One pattern for a cross platform URL construction is:

new URL('//' + path.resolve('path'), 'file:')

where that works in both Windows and Posix because the // in the Posix case is deduped.

Under this PR it would resolve to file:////path/to/path and the four /'s will lead to some module system issues.

Ideally this should be corrected specifically for file: URLs in the WhatWG spec, but otherwise we should be renormalizing these in the module system otherwise at least now to avoid these cases I think.

We should also have tests for fileURLToPath(new URL('file:////path')) and pathToFileURL('//path') to verify these new cases too.

"hostname": "spider",
"port": "",
"pathname": "/",
"pathname": "///",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little confusing to me. Why are the / in paths no longer being deduped? I am concerned this will be breaking for modules users who make this mistake.

@watilde watilde added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 5, 2020
@github-actions github-actions Bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 5, 2020
@nodejs-github-bot

Copy link
Copy Markdown
Collaborator

@Trott

Trott commented Oct 5, 2020

Copy link
Copy Markdown
Member

@guybedford You've approved this PR but left a lengthy comment expressing some concerns. Are the concerns things that need to be addressed by the modules team before this makes it into a release? Or is resolution of those issues a prerequisite before landing this change?

@guybedford

Copy link
Copy Markdown
Contributor

@Trott unfortunately I think this has some problems for the modules system as there is the potential of file:////path and file:///path being separate entries in the registry with the knock-on effects of that. So I do think we need to come up with a solution to these cases before landing yes.

One simple approach could be to have the module resolver throw for file://// URLs being returned to avoid these identity issues. That is sort of breaking though although we might get away with it given the experimental status. Verifying some common modules resolution cases in the tests also seems important.

@guybedford guybedford left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a concern that file://// URLs in the module registry lead to duplicate loads. We should likely throw in the resolver or dig into the spec differences here.

@guybedford guybedford added the modules-agenda Issues and PRs to discuss during the meetings of the Modules team. label Oct 5, 2020
@guybedford

Copy link
Copy Markdown
Contributor

@watilde do you have any further background on why new URL('file:///////') is no longer deduping the leading slashes?

@jasnell

jasnell commented Oct 7, 2020

Copy link
Copy Markdown
Member

I think we should absolutely follow the standard and make this change. Any possible impact to the modules subsystem should be dealt with there. That said, there's nothing saying that we cannot inject a flag that forces a revert back to original behavior if it is absolutely needed.

e.g.

// Implements the new behavior

$ node -pe "new URL('////path/to/resolve', 'file:')"

// Implements the old behavior

$ node --original-url-path-normalization "new URL('////path/to/resolve', 'file:')" 

Yes, this would allow some inconsistency in behavior (similar to --preserve-symlinks) but it gives a usable escape hatch if the new behavior definitely breaks someone.

@bmeck

bmeck commented Oct 7, 2020

Copy link
Copy Markdown
Member

@jasnell I agree we should update to w/e the standard is but there is some lack of clarity on what to do w/ comparison of these URLs with multiple adjacent / in the pathname (we actually have some divergence already due to normalizing joiners of // in the loader due to #35507 ). I think we just need to iron that out before landing this actual PR so that we know expectations. My concern in particular is that we need to document what these overlaps mean so that across the project we have a stable set of interpretations; the URLs are different, but are the resources/files they point to compared or the URLs etc. To my knowledge this is the first known example of handling these cases, which on most FS do treat multiple / as a single / but that is not true on flat filesystems. If we want to tie this PR to the modules impl giving it a check, that seems fine. We have a meeting today with this on the agenda.

@guybedford guybedford left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We disussed this in the meeting but weren't able to come to any conclusions there unfortunately.

Discussing further with @bmeck it appears the major module resolution cases already do handle normalization fine, and it is only when using --preserve-symlinks that these resolutions can creep into the module registry. A follow-up patch to apply a path.resolve normalization in the --preserve-symlinks case can fix any of these issues up, but is not on the critical path to landing on this PR as it is a much wider edge case.

In addition I have posted a proposal at whatwg/url#552 for handling deduplication.

This PR likely will still cause breaks for URL consumers in the ecosystem, but it seems moving forward with landing would be best at this point.

@watilde

watilde commented Oct 8, 2020

Copy link
Copy Markdown
Member Author

@guybedford Thank you for your effort to carefully clarify the impact with the team and sharing back the result! That is very helpful feedback and I agreed on that we should follow up for --preserve-symlinks case.

@bmeck Thank you for your participation as well. I'd appreciate if you can update the review status based on the discussion.

Let me cc @nodejs/tsc again for one more review and approval before landing, because this is semver-major as same as #33770. It would be great if we can ship this to v15.

@watilde watilde removed needs-citgm PRs that need a CITGM CI run. modules-agenda Issues and PRs to discuss during the meetings of the Modules team. labels Oct 9, 2020
@bmeck

bmeck commented Oct 9, 2020

Copy link
Copy Markdown
Member

I think if we deduplicate // in --preserve-symlinks it would be fine. That would mean that URLs for file://host//a would never be sent through policies so it would be a non-issue. That plan seems fine as a follow on. If we ever support covering these URLs through the loading pipeline we will have to re-discuss.

@watilde

watilde commented Oct 12, 2020

Copy link
Copy Markdown
Member Author

Landed in bb62f4a

@watilde watilde closed this Oct 12, 2020
@Flarna

Flarna commented Oct 12, 2020

Copy link
Copy Markdown
Member

On master some tests fail which look related to this PR. see e.g. https://ci.nodejs.org/job/node-test-binary-windows-native-suites/5470/#showFailuresLink

@Flarna

Flarna commented Oct 12, 2020

Copy link
Copy Markdown
Member

test/cctest/test_url.cc has a #ifdef _WIN32 section which was not adapted. CI above #35477 (comment) shows the same fails as on master now.

@Flarna Flarna mentioned this pull request Oct 12, 2020
@watilde

watilde commented Oct 12, 2020

Copy link
Copy Markdown
Member Author

@Flarna Thank you for catching that. I'll open PR for tests shortly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. inspector Issues and PRs related to the V8 inspector protocol policy Issues and PRs related to the policy subsystem. semver-major PRs that contain breaking changes and should be released in the next major version. whatwg-url Issues and PRs related to the WHATWG URL implementation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

URL: changes to file URL path normalization