Skip to content

Updates to support arbitrary channel platforms.#934

Closed
afranchuk wants to merge 6 commits into
mamba-org:masterfrom
afranchuk:master
Closed

Updates to support arbitrary channel platforms.#934
afranchuk wants to merge 6 commits into
mamba-org:masterfrom
afranchuk:master

Conversation

@afranchuk

Copy link
Copy Markdown
Contributor

This closes #794. It removes the list of special/supported platforms, and instead queries the repository with HEAD requests to find the channel root url to determine if a platform is present in the channel url or not. This is cached so it only occurs once per channel root url, assuming that urls won't change from being a channel root to being a platform-specific directory (incredibly unlikely).

@wolfv

wolfv commented May 12, 2021

Copy link
Copy Markdown
Member

Hey, thanks for this PR! Sounds awesome. Do you think it'll be hard to fix the failing tests?

@afranchuk

afranchuk commented May 12, 2021

Copy link
Copy Markdown
Contributor Author

Well, you might notice I had to remove a couple of tests that involved testing with non-existent channel locations. The "breaking" change here is that you do always need to use channels that actually exist, since it doesn't rely on the predefined set of platform names to use as separators when parsing channels. I had the cmake tests working, I guess I missed some of the other test suites :) I can take a look.

@afranchuk

Copy link
Copy Markdown
Contributor Author

I think maybe a few of the micromamba tests are failing because there's no network connection? This brings up a good point, albeit not much of an issue: if you run with --offline, it'll only successfully resolve cached packages if their url roots are cached, however they necessarily will be cached if a package was retrieved before, so --offline should work as expected assuming those files were obtained from a remote place at some point.

Comment thread environment-dev.yml
# - reproc-static
# - yaml-cpp-static
# - xz-static
# - zstd-static

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were all the packages required to build a static micromamba. There were also a few other tweaks I had to do to get it to build. What is your process of building a static micromamba?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, these could probably be removed or moved to a different environment file.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afranchuk

afranchuk commented May 19, 2021

Copy link
Copy Markdown
Contributor Author

@wolfv Do the umamba tests not run with network access?

If you look at the new code where

WARNING could not find channel root for https://conda.anaconda.org/conda-forge

is emitted, that should only happen if https://conda.anaconda.org/conda-forge/channeldata.json can't be fetched.

@wolfv

wolfv commented May 20, 2021

Copy link
Copy Markdown
Member

@afranchuk the test that is currently failing is a rather basic test for installing from a yaml file ... there should be internet connection.

Could it be an issue with loading the files from cache?

I can re-run the tests to make sure that it's not related to some anaconda.org outage.

@afranchuk

Copy link
Copy Markdown
Contributor Author

It shouldn't be a caching issue, the cache certainly isn't required. Odd, something else must be going on...

@afranchuk

afranchuk commented May 20, 2021

Copy link
Copy Markdown
Contributor Author

It looks like the error is that the SSL certificate is failing verification (curl error 60).

@afranchuk

afranchuk commented May 20, 2021

Copy link
Copy Markdown
Contributor Author

All right, so one thing is that I added a static method to DownloadTarget that needed to call init_curl_ssl (because presumably the DownloadTarget constructor wasn't called prior; maybe init_curl_ssl should be a static method called from elsewhere?). That's fixed, but repo.mamba.pm seems to reply with a 404 for any HEAD requests (which is really odd behavior). I requested in the browser, and simply edited the request to change from GET to HEAD and it gives a 404...

@afranchuk

afranchuk commented May 20, 2021

Copy link
Copy Markdown
Contributor Author

I guess I can change it to a GET, but that is kind of wasteful...

@afranchuk

Copy link
Copy Markdown
Contributor Author

I changed the code to try a GET if the HEAD fails. This should be as fast as we can go, and it will be cached so realistically it will be a one-time cost per (misbehaving) url.

@wolfv

wolfv commented May 21, 2021

Copy link
Copy Markdown
Member

Hi @afranchuk, I had some time to dig deeper into this PR and the functionality. In fact, I'd like to refactor the entire DownloadTarget thing a bit in the future anyways.

I have some thoughts:

I think with this setup here, we still don't have the ability to "layer" platforms, right? One could argue that e.g. "linux-64" and "noarch" are two layered platforms (similar to how channels would be different layers).
I am wondering if we should support this in a general fashion. For example, it might be interesting to say

- c conda-forge[linux-64-static, linux-64, noarch]

and that would fetch those three platforms. On the contrary we might want to explicitly disable noarch. We could either require it with the syntax above (so that -c conda-forge[linux-64] would only fetch the linux-64 platform, or we could use a syntax such as -c conda-forge[linux-64, ~noarch] to explicitly disable the noarch platform).
Syntactically that would also correspond well to the 'compile-time feature' support I've added to boa where you can add / remove compile time features (if configured) with this syntax.

I also wanted to make you aware that with micromamba we already support a platform argument in the .mambarc file, so that you can set a platform-per-env. E.g. you could do micromamba create -p ~/myenv and then echo "platform: osx-64" > ~/myenv/.mambarc to select osx-64 for that particular environment. This is especially useful on the new Apple M1 computers that fully support emulation of osx-64 using Rosetta 2 but obviously there might be some other good use-cases.

We could modify the platform arg here to use a list of platforms using the same syntax as indicated above.

I also want to note that channeldata.json is not supported on all anaconda.org channels, specifically I think it's not supported on those not behind a CDN. E.g. check this one out:

Works: https://conda.anaconda.org/robostack/osx-arm64/ros-noetic-laser-geometry-1.6.7-py38h0184f97_0.tar.bz2
Returns fine for some http: https://conda.anaconda.org/robostack/channeldata.json
But also returns something for https://conda.anaconda.org/robostack/abc.json

Thanks for the heads-up on the failing HEAD requests on repo.mamba.pm. I think the problem is that for these files we redirect with a 301 to an authenticated S3 file location. We can look into that.

However, given that channeldata.json doesn't even fully work on anaconda.org maybe we should require explicitly specifying platforms?

Would you be interested in discussing this PR over video? It might be faster to discuss this by voice?

Sorry for taking so much time to write up these thoughts.

@afranchuk

Copy link
Copy Markdown
Contributor Author

Thanks for the feedback. As far as layering goes, I was just imagining that as -c conda-forge/linux-64 -c conda-forge/linux-64-static (specifying multiple channel+platform combos, just like specifying multiple channels). But I completely agree, I'd like noarch to not be implicit. I think the special case is if you specify a channel with no platforms, in which case noarch and the platform architecture should be implicit. But if you specify a platform, noarch shouldn't be added.

I'm okay with a different syntax for providing platforms, I was just trying to keep it roughly backward-compatible with prior usage. I think one (small) issue with the syntax you suggested is that (at least as you literally suggested) it is not shell-friendly (without quotes). But naturally excluding the spaces makes it a bit nicer to work with (e.g. -c conda-forge[linux-64,noarch] versus -c "conda-forge[linux-64, noarch]"). I don't think a ~noarch should be necessary, if that's the only thing that is implicitly added that feels very one-off. I think if someone is specifying explicit platforms, they probably are capable of adding the noarch when they want it (though I admit that generally most people would want it).

The lack of consistency around channeldata.json is concerning, though. So yeah, though I already plugged ahead here (choosing this approach to maintain command-line compatibility/consistency), maybe discovery is not the best solution (unless there are some more consistent behaviors; I did originally have it trying to find repodata.json, which might be a little better behaved).

Unrelated, I didn't realize that a libcurl GET by default writes to stdout (though in retrospect that makes sense), which is what caused the tests to fail again. Looks like it did correctly identify the repositories this time around though, the extra output just confused the final test checks. I pushed a fix to silence that.

@afranchuk

Copy link
Copy Markdown
Contributor Author

Yeah, discussing directly may be a bit faster. But given the inconsistency of server behavior, it does seem like explicitly providing the platforms with additional syntax ("breaking" command-line API for the old channel/platform specification) may be the best option.

@afranchuk

Copy link
Copy Markdown
Contributor Author

@wolfv looks like it's working now, but to work better we have two options:

  1. Look for repodata.json instead of channeldata.json. repodata.json must exist, though I have found that sometimes it doesn't parse as valid JSON! But we don't need to parse it (I just wanted to for deriving the subdir more accurately), or
  2. Scrap all of this and change the command-line parsing to account for platforms like you described. This may be preferable in that it doesn't need a network connection, however to do it right I think a handful of things could do with a refactor along the way (which is good, but will be a little more work).

@wolfv

wolfv commented Jun 2, 2021

Copy link
Copy Markdown
Member

Hi @afranchuk to be honest, I lean towards the explicit option. Would you be OK with that change? I think it would be a very awesome feature!

@afranchuk

Copy link
Copy Markdown
Contributor Author

Yeah, I'm working on it right now.

@wolfv

wolfv commented Jun 2, 2021

Copy link
Copy Markdown
Member

awesome!

@afranchuk

Copy link
Copy Markdown
Contributor Author

@wolfv Sorry, for the last few weeks I've had to put this on the backburner, even though I had it basically done 3 weeks ago. I've made a different branch and so there's a different PR to better capture discussion at #1033.

@afranchuk afranchuk closed this Jun 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Custom platform support

3 participants