Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User namespace #3

Draft
wants to merge 7 commits into
base: rata/sidecar-ordering-annotations-1.17
Choose a base branch
from

Conversation

alban
Copy link
Member

@alban alban commented Jun 18, 2020

It can be used together with kinvolk/containerd-cri#1

vikaschoudhary16 and others added 5 commits June 18, 2020 19:02
…n, User

Initial work by vikaschoudhary16 <[email protected]> on PR 64005.

Rebased on Kubernetes 1.17 by Alban Crequy <[email protected]>.
Conflicts resolution: pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto
was moved to
staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto
Initial work by vikaschoudhary16 <[email protected]> on PR 64005 from
two commits:
- Add Node-Level UserNamespace support
- Add HostUserNamespace in the pod spec

Using annotations instead of changing the PodSpec:
  alpha.kinvolk.io/userns = enabled | disabled
So, no feature gate is necessary
Updated with the command:
hack/update-generated-runtime.sh
When the CRI implementation does not support the new gRPC method
GetRuntimeConfigInfo, consider that user namespaces is not enabled.

Just print a log but don't consider it a fatal error. Otherwise, the
Kubelet would crash at startup with the following:

> Kubelet failed to change kubelet pod dir ownership to remapped user:
> error while determining usernamespace configuration at runtime: failed
> to get container runtime info: container runtime info get failed: rpc
> error: code = Unimplemented desc = unknown method GetRuntimeConfigInfo
> for service runtime.v1alpha2.RuntimeService
Examples of kubectl commands with the associated logs in containerd/cri:

```
kubectl apply -f userns-tests/pod-simple.yaml
Namespace options for sandbox "4a03d62fa399fa6134f020d325716c659e9b55c539ca6a9c76f747e6d08d2258": &NamespaceOption{Network:POD,Pid:CONTAINER,Ipc:POD,User:NODE,}

kubectl apply -f userns-tests/pod-userns.yaml
Namespace options for sandbox "a723293f5498dcfa30e8e6d619131172502923ff5dd3cf9b745f84d29d9dc199": &NamespaceOption{Network:POD,Pid:CONTAINER,Ipc:POD,User:NODE_WIDE_REMAPPED,}
```
Copy link
Member

@mauriciovasquezbernal mauriciovasquezbernal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comments:

  • I think uint32 should we uses in all places to handle uid/gids. We could return 4294967295 ("no user ID") when there are errors. The mix between int and uint32 could cause some overflow problems.
  • If the annotation is "enabled" and the feature is not available in the runtime, it's safer to fail than ignore the problem.

I still have to do a more detailed run...

Comment on lines +336 to +337
uidMapping := &runtimeapi.LinuxIDMapping{ContainerId: uint32(0)}
gidMapping := &runtimeapi.LinuxIDMapping{ContainerId: uint32(0)}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really needed to do uint32(0)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess not. Probably just to be explicit.

pkg/kubelet/dockershim/docker_service.go Outdated Show resolved Hide resolved
pkg/kubelet/dockershim/docker_service.go Outdated Show resolved Hide resolved
pkg/kubelet/dockershim/docker_service.go Outdated Show resolved Hide resolved
pkg/kubelet/dockershim/docker_service.go Show resolved Hide resolved
Comment on lines +1757 to +1765
userns, _ := pod.Annotations["alpha.kinvolk.io/userns"]
klog.V(4).Infof("pod userns setting: %v", userns)
if userns == "enabled" {
if err := kl.chownDirForRemappedIDs(kl.getPodVolumesDir(pod.UID)); err != nil {
kl.recorder.Eventf(pod, v1.EventTypeWarning, events.FailedMountVolume, "Unable to set ownership on mount volumes for pod %q: %v", format.Pod(pod), err)
klog.Errorf("Unable to chown volumes for pod %q: %v; skipping pod", format.Pod(pod), err)
return err
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this operation has to be done before L1754, otherwise the container could be started before settings the right permissions on the volumes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds right. On the other hand, are the directories (and sub-directory) created before SyncPod()?

Could you comment on the upstream PR 64005? Even though this code is different with labels, the structure is the same in the upstream PR.

Comment on lines +1311 to +1320
containerUID := 0
containerGID := 0
uid, err := kl.getHostUID(containerUID)
if err != nil {
return fmt.Errorf("Failed to get remapped host UID corresponding to UID 0 in container namespace: %v", err)
}
gid, err := kl.getHostGID(containerGID)
if err != nil {
return fmt.Errorf("Failed to get remapped host GID corresponding to GID 0 in container namespace: %v", err)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this logic. Why to calculate it each time if container is always 0 and the result should be the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

containerUID is zero (root in the userns) but uid is the mapping on the host (e.g. 100000). So we correctly do a os.Chown(file, 100000, 100000) below.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is that the calculation is inside the loop but IIUC it's the same for all the files, so we could do it once.

Comment on lines +1423 to +1425
if err := kl.chownDirForRemappedIDs(kl.getPodsDir()); err != nil {
klog.Fatalf("Kubelet failed to change kubelet pod dir ownership to remapped user: %v", err)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed? I did some tests with these lines commented out and it seems to work fine...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you had a problem with:

Bad owner of /var/lib/kubelet/pods

I think this code aims to fix that, but with a bug. See comment.

}
if fileGID != 0 {
klog.V(5).Infof("GID, %v, for path %v is not equal to 0. Skipping chowing assuming it to be FsGroup GID ", fileGID, file)
continue
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this continue is not what we want: If /var/lib/kubelet/pods is owned by 200000 (because of your previous test with docker and /etc/subuid configured with 200000), then this code skips the chown when you give containerd/cri a try with the 100000 configuration, even though the owner is not correct.

Comment on lines +336 to +337
uidMapping := &runtimeapi.LinuxIDMapping{ContainerId: uint32(0)}
gidMapping := &runtimeapi.LinuxIDMapping{ContainerId: uint32(0)}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess not. Probably just to be explicit.

pkg/kubelet/dockershim/docker_service.go Outdated Show resolved Hide resolved
pkg/kubelet/dockershim/docker_service.go Outdated Show resolved Hide resolved
pkg/kubelet/dockershim/docker_service.go Outdated Show resolved Hide resolved
pkg/kubelet/dockershim/docker_service.go Show resolved Hide resolved
pkg/kubelet/kubelet_usernamespace_others.go Show resolved Hide resolved
@@ -169,6 +169,15 @@ func (m *kubeGenericRuntimeManager) generatePodSandboxLinuxConfig(pod *v1.Pod) (
}
lc.SecurityContext.NamespaceOptions = namespacesForPod(pod)

if m.runtimeConfig != nil && m.runtimeConfig.IsUserNamespaceSupported() && !userNamespaceDefinedForPod(pod) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should raise an error:
https://github.com/vikaschoudhary16/community/blob/1cfe64b48107575c98cef4cd321e435482c60293/contributors/design-proposals/node/node-usernamespace-remapping.md#pod-admission-at-kubelet

A pod admit handler will be introduced which will fail the pod admission at Kubelet [...]
If pod spec has hostuserNamespace: false AND usernamespace remapping is NOT enabled at container runtime

But I don't know if it is the right place for this check. Shouldn't it be done before?

containerd/cri now implements user namespaces with either
runtimeapi.NamespaceMode_POD or runtimeapi.NamespaceMode_NODE.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants