-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
User namespace #3
base: rata/sidecar-ordering-annotations-1.17
Are you sure you want to change the base?
Conversation
…n, User Initial work by vikaschoudhary16 <[email protected]> on PR 64005. Rebased on Kubernetes 1.17 by Alban Crequy <[email protected]>. Conflicts resolution: pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto was moved to staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto
Initial work by vikaschoudhary16 <[email protected]> on PR 64005 from two commits: - Add Node-Level UserNamespace support - Add HostUserNamespace in the pod spec Using annotations instead of changing the PodSpec: alpha.kinvolk.io/userns = enabled | disabled So, no feature gate is necessary
Updated with the command: hack/update-generated-runtime.sh
When the CRI implementation does not support the new gRPC method GetRuntimeConfigInfo, consider that user namespaces is not enabled. Just print a log but don't consider it a fatal error. Otherwise, the Kubelet would crash at startup with the following: > Kubelet failed to change kubelet pod dir ownership to remapped user: > error while determining usernamespace configuration at runtime: failed > to get container runtime info: container runtime info get failed: rpc > error: code = Unimplemented desc = unknown method GetRuntimeConfigInfo > for service runtime.v1alpha2.RuntimeService
Examples of kubectl commands with the associated logs in containerd/cri: ``` kubectl apply -f userns-tests/pod-simple.yaml Namespace options for sandbox "4a03d62fa399fa6134f020d325716c659e9b55c539ca6a9c76f747e6d08d2258": &NamespaceOption{Network:POD,Pid:CONTAINER,Ipc:POD,User:NODE,} kubectl apply -f userns-tests/pod-userns.yaml Namespace options for sandbox "a723293f5498dcfa30e8e6d619131172502923ff5dd3cf9b745f84d29d9dc199": &NamespaceOption{Network:POD,Pid:CONTAINER,Ipc:POD,User:NODE_WIDE_REMAPPED,} ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comments:
- I think uint32 should we uses in all places to handle uid/gids. We could return
4294967295
("no user ID") when there are errors. The mix between int and uint32 could cause some overflow problems. - If the annotation is "enabled" and the feature is not available in the runtime, it's safer to fail than ignore the problem.
I still have to do a more detailed run...
uidMapping := &runtimeapi.LinuxIDMapping{ContainerId: uint32(0)} | ||
gidMapping := &runtimeapi.LinuxIDMapping{ContainerId: uint32(0)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really needed to do uint32(0)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess not. Probably just to be explicit.
userns, _ := pod.Annotations["alpha.kinvolk.io/userns"] | ||
klog.V(4).Infof("pod userns setting: %v", userns) | ||
if userns == "enabled" { | ||
if err := kl.chownDirForRemappedIDs(kl.getPodVolumesDir(pod.UID)); err != nil { | ||
kl.recorder.Eventf(pod, v1.EventTypeWarning, events.FailedMountVolume, "Unable to set ownership on mount volumes for pod %q: %v", format.Pod(pod), err) | ||
klog.Errorf("Unable to chown volumes for pod %q: %v; skipping pod", format.Pod(pod), err) | ||
return err | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this operation has to be done before L1754, otherwise the container could be started before settings the right permissions on the volumes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds right. On the other hand, are the directories (and sub-directory) created before SyncPod()
?
Could you comment on the upstream PR 64005? Even though this code is different with labels, the structure is the same in the upstream PR.
containerUID := 0 | ||
containerGID := 0 | ||
uid, err := kl.getHostUID(containerUID) | ||
if err != nil { | ||
return fmt.Errorf("Failed to get remapped host UID corresponding to UID 0 in container namespace: %v", err) | ||
} | ||
gid, err := kl.getHostGID(containerGID) | ||
if err != nil { | ||
return fmt.Errorf("Failed to get remapped host GID corresponding to GID 0 in container namespace: %v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this logic. Why to calculate it each time if container is always 0 and the result should be the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
containerUID
is zero (root in the userns) but uid
is the mapping on the host (e.g. 100000). So we correctly do a os.Chown(file, 100000, 100000)
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is that the calculation is inside the loop but IIUC it's the same for all the files, so we could do it once.
if err := kl.chownDirForRemappedIDs(kl.getPodsDir()); err != nil { | ||
klog.Fatalf("Kubelet failed to change kubelet pod dir ownership to remapped user: %v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really needed? I did some tests with these lines commented out and it seems to work fine...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought you had a problem with:
Bad owner of /var/lib/kubelet/pods
I think this code aims to fix that, but with a bug. See comment.
} | ||
if fileGID != 0 { | ||
klog.V(5).Infof("GID, %v, for path %v is not equal to 0. Skipping chowing assuming it to be FsGroup GID ", fileGID, file) | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this continue
is not what we want: If /var/lib/kubelet/pods
is owned by 200000
(because of your previous test with docker and /etc/subuid
configured with 200000
), then this code skips the chown
when you give containerd/cri
a try with the 100000
configuration, even though the owner is not correct.
uidMapping := &runtimeapi.LinuxIDMapping{ContainerId: uint32(0)} | ||
gidMapping := &runtimeapi.LinuxIDMapping{ContainerId: uint32(0)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess not. Probably just to be explicit.
@@ -169,6 +169,15 @@ func (m *kubeGenericRuntimeManager) generatePodSandboxLinuxConfig(pod *v1.Pod) ( | |||
} | |||
lc.SecurityContext.NamespaceOptions = namespacesForPod(pod) | |||
|
|||
if m.runtimeConfig != nil && m.runtimeConfig.IsUserNamespaceSupported() && !userNamespaceDefinedForPod(pod) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should raise an error:
https://github.com/vikaschoudhary16/community/blob/1cfe64b48107575c98cef4cd321e435482c60293/contributors/design-proposals/node/node-usernamespace-remapping.md#pod-admission-at-kubelet
A pod admit handler will be introduced which will fail the pod admission at Kubelet [...]
If pod spec has hostuserNamespace: false AND usernamespace remapping is NOT enabled at container runtime
But I don't know if it is the right place for this check. Shouldn't it be done before?
containerd/cri now implements user namespaces with either runtimeapi.NamespaceMode_POD or runtimeapi.NamespaceMode_NODE.
It can be used together with kinvolk/containerd-cri#1