ip-reconciler: Use ContainerID instead of PodRef #180
base: master
Conversation
Force-pushed from 5f03ce5 to 4b04987
@dougbtv @maiqueb Can you please review? One of the failing test cases did not make sense, but I fixed it regardless by using the container ID. It is a negative test case that checks whether the ip-reconciler fails to delete a reservation because the podRef name does not match. But that could never actually happen, because the orphaned IP reservation object is built from the IPPool itself. I instead changed the test case so the podRef names are now the same but the container IDs differ, because my fix changes the MatchingFunc to use the containerID instead of the pod name.
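For illustration, a self-contained sketch of the failure mode this MatchingFunc change addresses. The IPAllocation struct below is trimmed to the two fields that matter here, and the pod and container ID values are made up:

package main

import "fmt"

// IPAllocation mirrors just the fields of whereabouts' v1alpha1.IPAllocation
// that are relevant to the matching change; trimmed for illustration.
type IPAllocation struct {
	PodRef      string
	ContainerID string
}

func main() {
	// A live allocation and a stale one left behind by an earlier sandbox
	// of the same pod: identical PodRef, different ContainerID.
	live := IPAllocation{PodRef: "default/mypod", ContainerID: "aaa111"}
	orphan := IPAllocation{PodRef: "default/mypod", ContainerID: "bbb222"}

	// Matching by PodRef (the bug): the live allocation also matches, so it
	// could be reclaimed along with the stale one.
	fmt.Println("PodRef match:", live.PodRef == orphan.PodRef) // true

	// Matching by ContainerID (the fix): only the stale sandbox matches.
	fmt.Println("ContainerID match:", live.ContainerID == orphan.ContainerID) // false
}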
Force-pushed from a48b24c to 8d22508
Looks good
I've added this PR to my to-do list, I'll get back to you next week.
What about the unit test currently failing on this PR? I think you need to add the containerID to this remainingAllocation allocation map:
whereabouts/cmd/reconciler/ip_test.go, line 159 in 55be906:
	remainingAllocation := map[string]v1alpha1.IPAllocation{
EDIT: couldn't help noticing there's also something wrong with the vendoring ... You'll need to sort that out as well.
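A minimal sketch of what that suggestion might look like in the test; the map key and the ContainerID value below are made up and would have to match whatever the corresponding pool allocation under test uses:

	remainingAllocation := map[string]v1alpha1.IPAllocation{
		"2": {
			PodRef:      fmt.Sprintf("%s/%s", namespace, podName),
			ContainerID: "abc123", // hypothetical: must equal the surviving allocation's container ID
		},
	}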
cmd/reconciler/ip_test.go (Outdated)
@@ -289,7 +291,8 @@ func generateIPPoolSpec(ipRange string, namespace string, poolName string, podNa
 	allocations := map[string]v1alpha1.IPAllocation{}
 	for i, podName := range podNames {
 		allocations[fmt.Sprintf("%d", i+1)] = v1alpha1.IPAllocation{
-			PodRef: fmt.Sprintf("%s/%s", namespace, podName),
+			PodRef:      fmt.Sprintf("%s/%s", namespace, podName),
+			ContainerID: strconv.Itoa(rand.Intn(1000)),
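Assuming the file doesn't import them already, the strconv.Itoa and rand.Intn calls in this hunk also need the corresponding packages in the test file's import block:

	import (
		"math/rand"
		"strconv"
	)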
please run gofmt before pushing; there's something wrong with the formatting on this file.
+1
I think the code looks good; I honestly do not understand right now why I hadn't matched using the containerID instead.
Let's wait for the follow-up work :)
Good catch.
pkg/reconciler/iploop.go (Outdated)
-func findOutPodRefsToDeallocateIPsFrom(orphanedIP OrphanedIPReservations) []string {
-	var podRefsToDeallocate []string
+func findOutContainerIDsToDeallocateIPsFrom(orphanedIP OrphanedIPReservations) []string {
+	var cidsToDeallocate []string
please rename this variable. All the time I've seen cid, my brain translated it to CIDR and I couldn't understand what was happening. I suggest using containerID, which makes it crystal clear.
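With that rename applied, the helper from the hunk above might read roughly as follows; the Allocations field on OrphanedIPReservations is a guess for illustration, not necessarily the actual type layout:

func findOutContainerIDsToDeallocateIPsFrom(orphanedIP OrphanedIPReservations) []string {
	var containerIDsToDeallocate []string
	// Collect the container ID of every orphaned allocation so the caller can
	// drop exactly those reservations from the pool.
	for _, allocation := range orphanedIP.Allocations {
		containerIDsToDeallocate = append(containerIDsToDeallocate, allocation.ContainerID)
	}
	return containerIDsToDeallocate
}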
cmd/reconciler/ip_test.go (Outdated): same hunk as above.
+1
@TothFerenc would you review this PR as well, since you introduced the pod ref attribute to the model? I'm also thinking of replacing the whole reconciler pod / cron with a controller that would listen to pod deletions and ensure the allocations for the deleted pods are indeed gone. If / once this route holds, we can drop the reconciler and, for instance, merge your bash-based alternative for it.
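Purely as an aside on that controller idea, a rough sketch with a plain client-go informer; nothing here is part of this PR, and cleanUpAllocationsFor is a placeholder rather than a real whereabouts API:

package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			pod, ok := obj.(*corev1.Pod)
			if !ok {
				// Tombstone (DeletedFinalStateUnknown); skipped in this sketch.
				return
			}
			// Ensure any allocations still referencing this pod are released.
			cleanUpAllocationsFor(fmt.Sprintf("%s/%s", pod.Namespace, pod.Name))
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop
}

// cleanUpAllocationsFor is a stand-in for walking the IPPools and removing
// reservations whose PodRef matches the deleted pod.
func cleanUpAllocationsFor(podRef string) {
	fmt.Println("would deallocate IPs for", podRef)
}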
Force-pushed from 8d22508 to f7d884c
Force-pushed from f7d884c to f5a64e6
One thing that worries me about this is upgrades: let's say you have a running cluster with an existing IPPool. We then upgrade whereabouts to store the allocations (and garbage collect them) based on the container ID. Wouldn't we garbage collect the entire pool?
My point is... don't we need some sort of out-of-band upgrade tool to re-generate the pool, porting it to the "new" model?
BTW: you should rebase this with latest master. It will fix the unit test issue you're currently facing and give us results.
@maiqueb The storage model doesn't change, nor does the logic that determines what is a stale reservation in the first place (that still uses podRef, since getting the pause container ID from a Pod resource is impossible). This is just the matching function that looks for reservations to delete AFTER, on the already-computed orphaned reservations list. The bug was that it tried to match an already-computed orphaned reservation to an IP reservation in the IPPool and used the podRef in the matching function. FYI, we've already upgraded to and used this fix in our custom whereabouts release.
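To make the two phases concrete, a hypothetical outline reusing the trimmed IPAllocation struct from the sketch earlier in this thread; none of these names are the actual whereabouts API:

// reconcile illustrates the two-phase logic described above.
func reconcile(pool []IPAllocation, livePodRefs map[string]bool) []IPAllocation {
	// Phase 1 (unchanged by this PR): a reservation is stale when no live pod
	// matches its PodRef; a Pod resource cannot tell us its pause container ID.
	var orphaned []IPAllocation
	for _, alloc := range pool {
		if !livePodRefs[alloc.PodRef] {
			orphaned = append(orphaned, alloc)
		}
	}
	// Phase 2 (this PR's fix): remove the orphans from the pool by matching
	// the exact reservation on ContainerID rather than on PodRef.
	staleIDs := map[string]bool{}
	for _, orphan := range orphaned {
		staleIDs[orphan.ContainerID] = true
	}
	var remaining []IPAllocation
	for _, alloc := range pool {
		if !staleIDs[alloc.ContainerID] {
			remaining = append(remaining, alloc)
		}
	}
	return remaining
}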
@xagent003 do you know if the reconciliation control loop covers this one appropriately? It might. If it doesn't, mind rebasing it and we can take another look? Thanks!
Fixes issue: #176