Pkg + BinaryProvider #841
Okay, let's get started on the first bullet point of this list; defining a …
I guess we can create an …
Is the idea to download a …
How about just calling it …
Ok, and these types of nodes will be mostly indistinguishable until we hit what is currently …
Sounds reasonable to me; I'd be happy to discuss this further and nail down more of an implementation plan during the Pkg call tomorrow?
Version constraints are against the version of the library, not the version of the thing that builds the library. But you want to be able to lock down a specific build of a library. But a specific build is completely platform-specific. There are some layers of versioning:
Is this correct and complete? The artifact identity should be completely determined by some "system properties" tuple that captures all the things that determine which artifact generated by a build script one needs. The end user mostly only needs to care about the library version, which is what determines its API and therefore usage. There might, however, be situations where one needs compatibility constraints on both the library version and the build script version: e.g. an older build was configured in some way that makes the resulting artifact unusable in certain ways.
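To make that tuple concrete, here is a minimal sketch with hypothetical names; nothing below is a real Pkg API:

```julia
# Hypothetical sketch of the "system properties" tuple that pins down an
# artifact's identity; all names here are illustrative, not a real Pkg API.
struct ArtifactKey
    library_version::VersionNumber  # determines the API; what end users constrain on
    build_version::VersionNumber    # version of the build script that produced the artifact
    platform::String                # e.g. "arm-linux-gnueabihf"; makes a build platform-specific
end

key = ArtifactKey(v"2.0.1", v"1.0.0", "arm-linux-gnueabihf")
```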
Does a given version of a build script always produce just a single version of a given library?
How would this work with packages that use BinaryProvider but fall back to compiling from source if a working binary is not available (typically for less-popular Linux distros)? e.g. ZMQ or Blosc IIRC. You need some kind of optional-dependency support, it seems, or support for a source “platform”.
For building from source, we will support it manually by allowing users to … I do not think we should ever build from source automatically. Looking at ZMQ, it looks like you have full platform coverage; under what circumstances are you compiling?
Another example to add to Steven's list is SpecialFunctions, which falls back to BinDeps when a binary isn't available from BinaryProvider. Once upon a time that was used on FreeBSD, before we had FreeBSD support in BinaryProvider, but now I don't know when it's used aside from on demand on CI.
We needed it on CentOS, for example (JuliaInterop/ZMQ.jl#176), because of JuliaPackaging/BinaryBuilder.jl#230. There are an awful lot of Unix flavors out there, and it's nice to have a compilation fallback.
Regardless of the many UNIX variations, the only things you really need are the right executable format and the right libc, which we can pretty much cover at this point.
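As a rough illustration of that matching rule (illustrative types only, not BinaryProvider's actual platform machinery):

```julia
# Illustrative-only types; BinaryProvider has its own richer platform machinery.
struct PlatformABI
    os::Symbol    # :linux, :windows, :macos, :freebsd
    arch::Symbol  # :x86_64, :i686, :armv7l, :aarch64, ...
    libc::Symbol  # :glibc, :musl, ...
end

# An artifact is usable when the executable format (os/arch) and libc both agree.
matches(host::PlatformABI, art::PlatformABI) =
    host.os == art.os && host.arch == art.arch && host.libc == art.libc

matches(PlatformABI(:linux, :armv7l, :glibc), PlatformABI(:linux, :armv7l, :glibc))  # true
```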
And the right … (This was why I had to enable source builds for ZMQ and Blosc. Are we confident that this is fixed, or are we happy to go back to breaking installs for any package that calls a C++ library?)
I think our …
Should JuliaPackaging/BinaryBuilder.jl#230 be closed then?
Yes, I think so.
I'm very supportive of managing the binary artifacts with Pkg. I'd just like to point out that the implementation of library loading should be flexible enough to include some strategy for AOT compilation and deployment (to a different computer). The app deployed to a different computer will have to load libraries from different locations, and the hardcoding of paths in deps.jl makes this pretty difficult, see JuliaPackaging/BinaryProvider.jl#140. The best way would be either not to have …
Yes, that's the plan: you declare what you need, referring to it by platform-independent identity instead of generating it explicitly and then hardcoding its location, letting Pkg figure out the best way to get you what you need and tell you where it is.
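A hypothetical shape for that flow; `ensure_artifact` is an invented stand-in for whatever API Pkg ends up exposing:

```julia
using Libdl

# `ensure_artifact` is a hypothetical stand-in: resolve a platform-independent
# name to a locally installed directory, installing on demand. The location
# scheme below is a placeholder, not a real Pkg layout.
function ensure_artifact(name::String)
    dir = joinpath(DEPOT_PATH[1], "artifacts", name)
    isdir(dir) || error("artifact $name not installed; Pkg would fetch it here")
    return dir
end

# The package only names what it needs; Pkg decides where it lives.
libfoo_path = joinpath(ensure_artifact("libfoo"), "lib", "libfoo." * Libdl.dlext)
```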
Progress! There is some code behind this post, while other things remain vaporware; the aspiration is to strike up some discussion on whether these are the aesthetics we want. An example `Artifact.toml`:
```toml
name = "JpegTurbo_jll"
uuid = "7e164b9a-ae9a-5a84-973f-661589e6cf70"
version = "2.0.1"
[artifacts.arm-linux-gnueabihf]
hash = "45674d19e63e562be8a794249825566f004ea194de337de615cb5cab059e9737"
url = "https://github.com/JuliaPackaging/Yggdrasil/releases/download/JpegTurbo-v2.0.1/JpegTurbo.v2.0.1.arm-linux-gnueabihf.tar.gz"
[artifacts.arm-linux-gnueabihf.products]
djpeg = "bin/djpeg"
libjpeg = "lib/libjpeg.so"
libturbojpeg = "lib/libturbojpeg.so"
jpegtran = "bin/jpegtran"
cjpeg = "bin/cjpeg"
[artifacts.i686-w64-mingw32]
hash = "c2911c98f9cadf3afe84224dfc509b9e483a61fd4095ace529f3ae18d2e68858"
url = "https://github.com/JuliaPackaging/Yggdrasil/releases/download/JpegTurbo-v2.0.1/JpegTurbo.v2.0.1.i686-w64-mingw32.tar.gz"
[artifacts.i686-w64-mingw32.products]
djpeg = "bin/djpeg.exe"
libjpeg = "bin/libjpeg-62.dll"
libturbojpeg = "bin/libturbojpeg.dll"
jpegtran = "bin/jpegtran.exe"
cjpeg = "bin/cjpeg.exe"
...
```
And the corresponding autogenerated wrapper module:
```julia
# LibFoo_jll/src/LibFoo_jll.jl
# Autogenerated code, do not modify
module LibFoo_jll
using Libdl
# Chain other dependent jll packages here, as necessary
using LibBar_jll
# This is just the `artifacts` -> platform_key() -> `products` mappings embedded in `Artifact.toml` above
const libfoo = abspath(joinpath(@__DIR__, "..", "deps", "usr", "lib", "libfoo.so"))
const fooifier = abspath(joinpath(@__DIR__, "..", "deps", "usr", "bin", "fooifier"))
# This is critical, as it allows a dependency that `libfoo.so` has on `libbar.so` to be satisfied.
# It does mean that we pretty much never dlclose() things though.
handles = []
function __init__()
# Explicitly link in library products so that we can construct a necessary dependency tree
for lib_product in (libfoo,)
push!(handles, Libdl.dlopen(lib_product))
end
end
end
```
Example Julia package client code:
```julia
# LibFoo.jl/src/LibFoo.jl
import LibFoo_jll
function fooify(a, b)
return ccall((:fooify, LibFoo_jll.libfoo), Cint, (Cint, Cint), a, b)
end
...
```
I like it in general. I'll have to think for a bit about the structure of the artifacts file. There's a consistent compression scheme used by … Do you think …? I think we'll eventually want to teach …
I am actively shying away from teaching Pkg/Base too much about dynamic libraries; it's a deep rabbit hole. In this proposal I'm not even baking in platform-specific library-searching awareness (e.g. "look for libraries in …"). On the other hand, I would like it if … It would be nice if we could do things like search for packages that contain …
I'm not entirely sure what you mean by this, but I will await your instruction. I have no strong opinions over the …
This automatic wrapper generation with a const assigning the absolute path is exactly the thing that prevents AOT with deployment to a different computer. So during AOT, PackageCompiler will need to modify every single artifact_wrapper_jlpackage to get rid of the baked-in absolute path. If the code is auto-generated, why can't this functionality be part of some function or macro call that would open the handles and generate the const paths on the fly? In that case PackageCompiler could just pre-collect all the artifacts into a "deployment depot" and let the … And is the constantness of the lib path really necessary for efficient …?
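For what it's worth, `ccall` can also go through a function pointer resolved in `__init__`, which avoids baking an absolute path into a `const` at all; a minimal sketch using only standard `Libdl` calls (file layout and names are illustrative):

```julia
module LibFoo_jll

using Libdl

# Resolved at load time on whatever machine we're actually running on,
# so nothing machine-specific is baked in at precompile time.
const fooify_ptr = Ref{Ptr{Cvoid}}(C_NULL)

function __init__()
    libfoo_path = joinpath(dirname(@__DIR__), "deps", "usr", "lib", "libfoo." * Libdl.dlext)
    handle = Libdl.dlopen(libfoo_path)
    fooify_ptr[] = Libdl.dlsym(handle, :fooify)
end

# ccall through a pointer: no const library path required.
fooify(a, b) = ccall(fooify_ptr[], Cint, (Cint, Cint), a, b)

end # module
```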
That's what I'm saying is wrong; you're saying "if I delete …"
Yeah, there's no good reason to support that. I'm also having trouble coming up with realistic scenarios where you need to clean out packages but not artifacts or vice versa. But the operation proceeds in two fairly separate phases:
You can do one or the other independently and not break things, or one then the other, which should be the default and cleans up the most space.
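A rough sketch of those two phases; the depot layout and the reachability/boundness predicates are stand-ins for Pkg's real bookkeeping:

```julia
# Two-phase cleanup sketch. The predicates are passed in to keep the sketch
# self-contained; in reality Pkg would consult its own records.
function gc_depot(depot::String; package_reachable, artifact_bound)
    # Phase 1: delete package versions not reachable from any known project.
    for dir in readdir(joinpath(depot, "packages"); join=true)
        package_reachable(dir) || rm(dir; recursive=true)
    end
    # Phase 2: delete artifacts no longer bound by any remaining package.
    for dir in readdir(joinpath(depot, "artifacts"); join=true)
        artifact_bound(dir) || rm(dir; recursive=true)
    end
end
```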
One thing that I really like about this new approach that occurred to me is that by not having artifacts inside of packages, it allows artifacts to live in different depots.
Right, ok, I had the picture wrong in my head.
That is nice. The DataDeps way of doing the same is a bit scary and unsafe, and kind of encourages being unsafe (it will probably have to change eventually; I am now super sold on this whole naming-things-using-their-SHA idea); DataDeps just uses the name. Ok, cool, things are much clearer now.
They don't need UUIDs or versions because they're content-addressed. You don't really care if one …
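A quick illustration of what content-addressing means here, using the `SHA` stdlib (the directory layout is illustrative):

```julia
using SHA

# The artifact's identity *is* the hash of its content; no UUID or version
# needs to be assigned.
artifact_id(path::String) = bytes2hex(open(sha256, path))

# Any two depots containing this id necessarily contain identical bytes,
# so it doesn't matter which copy gets used.
install_dir(depot::String, id::String) = joinpath(depot, "artifacts", id)
```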
Oops, I meant "depots" not "repos".
This comment was about keeping metadata about artifacts around after they're installed so that you know what the SHA etc. was. I'm not really sure about how to structure the thing that goes at …
I'm removing the "speculative" label because this is getting pretty concrete at this point. Some updates from Slack discussion:
The advantage of the dict approach is that it is more extensible should additional keys be required in future. |
I’m very willing to use a dict-based approach. There’s no inherent advantage to the string format other than compactness (and the ability to fit within a filename), but living within the `Artifact.toml`, if we have access to richer data structures we should just use them.
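Side by side, the two encodings under discussion might look like this (values reused from the `JpegTurbo_jll` example above, purely for illustration):

```toml
# String form: compact, usable as a filename, but fixed in what it can express.
[artifacts.arm-linux-gnueabihf]
hash = "45674d19e63e562be8a794249825566f004ea194de337de615cb5cab059e9737"

# Dict form: the same information as explicit keys, extensible if more
# fields (libc, call_abi, ...) are needed later.
[[artifacts.libfoo]]
os = "linux"
arch = "armv7l"
call_abi = "eabihf"
hash = "45674d19e63e562be8a794249825566f004ea194de337de615cb5cab059e9737"
```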
Great work on the design. I want to bring up a point about build variants that I was thinking about; curious about your thoughts. If I understand correctly, the …
So, the latest sketch of the way this could look:
```toml
[dataset-A]
git-tree-sha1 = "e445efb1f3e2bffc06e349651f13729e6f7aeaaf"
basename = "dataset-A.csv"
[dataset-A.download]
sha256 = "b2ebe09298004f91b988e35d633668226d71995a84fbd12fea2b08c1201d427f"
url = [ # multiple URLs to try
"https://server1.com/path/to/dataset.csv",
"https://server2.com/path/to/dataset.csv",
]
[nlp-model-1]
git-tree-sha1 = "dccae443aeddea507583c348d8f082d5ed5c5e55"
basename = "nlp-model-1.onnx"
[[nlp-model-1.download]] # multiple ways to download
sha256 = "5dc925ffbda11f7e87f866351bf859ee7cbe8c0c7698c4201999c40085b4b980"
url = "https://server1.com/nlp-model-1.onnx.gz"
extract = "gzip" # decompress file
[[nlp-model-1.download]]
sha256 = "9f45411f32dcc332331ff244504ca12ee0b402e00795ab719612a46b7fb24216"
url = "https://server2.com/nlp-model-1.onnx"
[[libfoo]]
git-tree-sha1 = "05d42b0044984825ae286ebb9e1fc38ed2cce80a"
os = "Linux"
arch = "armv7l"
[libfoo.download]
sha256 = "19e7370ab1819d45c6126d5017ba0889bd64869e1593f826c6075899fb1c0a38"
url = "https://server.com/libfoo/Linux-armv7l/libfoo-1.2.3.tar.gz"
extract = ["gzip", "tar"] # outermost first or last?
[[libfoo]]
git-tree-sha1 = "c2dc12a509eec2236e806569120e72058579ba19"
os = "Windows"
arch = "i686"
[libfoo.download]
sha256 = "95683bb088e35743966d1ea8b242c2694b57155c8084a406b29aecd81b4b6c92"
url = "https://server.com/libfoo/Windows-i686/libfoo-1.2.3.zip"
extract = "zip"
[[libfoo]]
git-tree-sha1 = "d633f5f44b06d810d75651a347cae945c3b7f23d"
os = "macOS"
arch = "x86_64"
[libfoo.download]
sha256 = "b65f08c0e4d454e2ff9298c5529e512b1081d0eebf46ad6e3364574e0ca7a783"
url = "https://server.com/libfoo/macOS-x86_64/libfoo-1.2.3.xz"
extract = ["xz", "tar"]
```
Some features of this sketch: …
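For concreteness, resolving an entry from a file shaped like this sketch might look as follows; `find_artifact` is a hypothetical helper, and the single-table vs. array-of-tables distinction mirrors `[dataset-A]` vs. `[[libfoo]]` above:

```julia
import Pkg
const TOML = Pkg.TOML  # TOML parser vendored with Pkg at the time

# Hypothetical lookup: single tables are platform-independent; arrays of
# tables carry `os`/`arch` keys and are matched against the host.
function find_artifact(toml_path::String, name::String; os=nothing, arch=nothing)
    entry = TOML.parsefile(toml_path)[name]
    entry isa Vector || return entry
    for e in entry
        e["os"] == os && e["arch"] == arch && return e
    end
    return nothing
end

info = find_artifact("Artifacts.toml", "libfoo"; os="Linux", arch="armv7l")
```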
I'm not so sure if the …
Part of the download seems right. Edit: Oh, but we might want to allow …
Maybe call it …
```toml
[dataset-A]
git-tree-sha1 = "e445efb1f3e2bffc06e349651f13729e6f7aeaaf"
[dataset-A.download]
basename = "dataset-A.csv"
sha256 = "b2ebe09298004f91b988e35d633668226d71995a84fbd12fea2b08c1201d427f"
url = [ # multiple URLs to try
"https://server1.com/path/to/dataset.csv",
"https://server2.com/path/to/dataset.csv",
]
[nlp-model-1]
git-tree-sha1 = "dccae443aeddea507583c348d8f082d5ed5c5e55"
[[nlp-model-1.download]] # multiple ways to download
basename = "nlp-model-1.onnx"
sha256 = "5dc925ffbda11f7e87f866351bf859ee7cbe8c0c7698c4201999c40085b4b980"
url = "https://server1.com/nlp-model-1.onnx.gz"
extract = "gzip" # decompress file
[[nlp-model-1.download]]
basename = "nlp-model-1.onnx"
sha256 = "9f45411f32dcc332331ff244504ca12ee0b402e00795ab719612a46b7fb24216"
url = "https://server2.com/nlp-model-1.onnx"
[[libfoo]]
git-tree-sha1 = "05d42b0044984825ae286ebb9e1fc38ed2cce80a"
os = "Linux"
arch = "armv7l"
[libfoo.download]
sha256 = "19e7370ab1819d45c6126d5017ba0889bd64869e1593f826c6075899fb1c0a38"
url = "https://server.com/libfoo/Linux-armv7l/libfoo-1.2.3.tar.gz"
extract = ["gzip", "tar"] # outermost first or last?
[[libfoo]]
git-tree-sha1 = "c2dc12a509eec2236e806569120e72058579ba19"
os = "Windows"
arch = "i686"
[libfoo.download]
sha256 = "95683bb088e35743966d1ea8b242c2694b57155c8084a406b29aecd81b4b6c92"
url = "https://server.com/libfoo/Windows-i686/libfoo-1.2.3.zip"
extract = "zip"
[[libfoo]]
git-tree-sha1 = "d633f5f44b06d810d75651a347cae945c3b7f23d"
os = "macOS"
arch = "x86_64"
[libfoo.download]
sha256 = "b65f08c0e4d454e2ff9298c5529e512b1081d0eebf46ad6e3364574e0ca7a783"
url = "https://server.com/libfoo/macOS-x86_64/libfoo-1.2.3.xz"
extract = ["xz", "tar"]
```
I think we need more thought. What is so … It should only matter for things that are not tarballs or zips. Or at least I am not sure what it will do in those cases. Understanding more how it interacts with …
Are we thinking that tarballs extract to become one folder and we then rename that folder?
Idea:
```toml
[dataset-A.download]
sha256 = "b2ebe09298004f91b988e35d633668226d71995a84fbd12fea2b08c1201d427f"
url = [ # multiple URLs to try
"https://server1.com/path/to/dataset.csv",
"https://server2.com/path/to/dataset.csv",
]
extract = { rename = "dataset.csv" }
```
That's not quite right though, since I don't think you can put a dict in an array.
That is what I was saying. |
Only took me four days for the same thing to occur to me 😁 |
With the stern rule that rename always occurs after extract, and that omitting either results in an identity/no-op.
Having thought about this for a bit, I am uncomfortable with the coupling between … I think I would rather have extraction only be an option in the well-defined case, where we have a container (like a …). For more complex use cases, I think I would rather push this off onto a more advanced Pkg concept, which I have helpfully written up a big "thing" about over here: #1234 (whooo, I got a staircase issue number! Lucky day!). Even if that's not something we want in Pkg, I still think restricting the flexibility here is going to help us keep a sane, simple design.
Making … When it comes to extraction, we should be very strict about how extraction is allowed: it should only ever produce files under the target location. I know some archive formats allow other destinations, which we should make sure to prevent.
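The classic hazard is archive entries whose names contain `..` or are absolute, escaping the target; a small sketch of the kind of check that prevents it (hypothetical helper name, POSIX paths assumed):

```julia
# Reject archive entries that would resolve outside the extraction target.
function safe_destination(target::String, entry_name::String)
    dest = normpath(joinpath(target, entry_name))
    rel  = relpath(dest, target)
    if rel == ".." || startswith(rel, "../") || isabspath(rel)
        error("archive entry escapes target directory: $entry_name")
    end
    return dest
end

safe_destination("/tmp/out", "lib/libfoo.so")  # => "/tmp/out/lib/libfoo.so"
# safe_destination("/tmp/out", "../evil.sh")   # throws
```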
Yeah, I like …
I want to make sure that extraction can work everywhere; right now with …
1277: Add Artifacts to Pkg r=StefanKarpinski a=staticfloat

This adds the artifacts subsystem to Pkg, [read this WIP blog post](https://github.com/JuliaLang/www.julialang.org/pull/417/files?short_path=514f74c#diff-514f74c34d50677638b76f65d910ad17) for more details. Closes #841 and #1234. This PR still needs:

- [x] A `pkg> gc` hook that looks at the list of projects that we know about, examines which artifacts are bound, and marks all that are unbound. Unbound artifacts that have been continuously unbound for a certain time period (e.g. one month, or something like that) will be automatically reaped.
- [x] Greater test coverage (even without seeing the codecov report, I am certain of this), especially as related to the installation of platform-specific binaries.
- [x] `Overrides.toml` support for global overrides of artifact locations

Co-authored-by: Elliot Saba <[email protected]>
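A sketch of the reaping rule from the first checklist item, assuming Pkg records when each artifact was last seen bound; the helper name and the shape of the usage log are invented:

```julia
# `usage` maps an artifact's tree hash to the last time (seconds since the
# epoch) it was observed bound by some project; entirely illustrative.
const REAP_AFTER_SECONDS = 30 * 24 * 60 * 60  # "e.g. one month"

function reap_unbound!(usage::Dict{String,Float64}, artifacts_dir::String)
    now_t = time()
    for hash in collect(keys(usage))  # collect first: don't mutate while iterating
        if now_t - usage[hash] > REAP_AFTER_SECONDS
            rm(joinpath(artifacts_dir, hash); recursive=true, force=true)
            delete!(usage, hash)
        end
    end
end
```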
Let's talk about the possible merging of BinaryProvider and Pkg, to integrate the binary installation story to unheard-of levels. Whereas:
I suggest that we do away with the weird indirection we currently have with packages using `build.jl` files to download tarballs, and instead integrate these downloads into Pkg completely. This implies that we:

1. Create a new concept within Pkg, that of a Binary Artifact. The main difference between a Binary Artifact and a Package is that Packages are platform-independent; Binary Artifacts are necessarily not so. We would need to port over the same kind of platform-matching code as is in BP right now, e.g. dynamically choosing the most specific matching tarball based on the currently running Julia. (See `choose_download()` within BP for more.)
2. Modify BinaryBuilder output to generate Binary Artifacts that are then directly imported into the General Registry. The Binary Artifacts contain within them a small amount of Julia code: things like setting environment variables, mappings from `LibraryProduct` to the actual `.so` file, functions to run an `ExecutableProduct`, etc. This is all auto-generated by BinaryBuilder.
3. Change client packages to simply declare a dependency upon these Binary Artifacts when they require a library. E.g. `FLAC.jl` would declare a dependency upon `FLAC_jll`, which itself declares a dependency upon `Ogg_jll`, and so on and so forth. (A hypothetical `Project.toml` fragment for this is sketched after this post.)
4. Eliminate the `Pkg.build()` step for these packages, as the build will be completed by the end of the download step. (We can actually just bake the `deps.jl` file into the Binary Artifact, as we are using relative paths anyway.)

Please discuss.
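Under point 3, a client package would presumably just gain an ordinary dependency entry in its `Project.toml`; a hypothetical fragment (both UUIDs below are placeholders, not the real ones):

```toml
name = "FLAC"
uuid = "00000000-0000-0000-0000-000000000000"  # placeholder, not FLAC.jl's real UUID

[deps]
FLAC_jll = "11111111-1111-1111-1111-111111111111"  # placeholder, not FLAC_jll's real UUID
```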