get_interfaces_by_mac_on_linux: RuntimeError: duplicate mac found (driver: mlx5_core) #5794
Comments
Can you provide more information on how to do this? Is there a way I can specify an SR-IOV bonded device at launch time? Is it inherent to a certain instance shape? If you use specific CLI launch args or options in the web interface, that would be helpful.
We currently work around these types of devices in cloud-init on other platforms. We would need to adapt similar code for Oracle's platform.
It'd be very helpful to get access to |
In terms of reproducing on your side, this is running on a PCA (Private Cloud Appliance) for development, so it is not yet broadly available. PCA is basically a mini OCI in a rack that customers can purchase to run on-premises with an OCI-compatible API.
So, looking at the serial console, we have this: (init-local)
and later this:
Then I disconnected the SR-IOV interfaces, managed to reboot the instance properly, and ran
Thank you 🙂
Also, it seems like adding |
A patch like this might resolve this issue.
jeremy@jeremy-lx:~/dev/cloud-init$ git diff e10b09be321b81f82f1a2cb3b3724deedfefe9ff
diff --git a/cloudinit/net/__init__.py b/cloudinit/net/__init__.py
index 78b15a47b..dfd02f087 100644
--- a/cloudinit/net/__init__.py
+++ b/cloudinit/net/__init__.py
@@ -971,7 +971,7 @@ def get_interfaces_by_mac_on_linux() -> dict:
# cloud-init happens to enumerate network interfaces before drivers
# have fully initialized the leader/subordinate relationships for
# those devices or switches.
- if driver in ("fsl_enetc", "mscc_felix", "qmi_wwan"):
+ if driver in ("fsl_enetc", "mscc_felix", "qmi_wwan", "mlx5_core"):
LOG.debug(
"Ignoring duplicate macs from '%s' and '%s' due to "
"driver '%s'.",
diff --git a/tests/unittests/test_net.py b/tests/unittests/test_net.py
index 590061e03..9924a296e 100644
--- a/tests/unittests/test_net.py
+++ b/tests/unittests/test_net.py
@@ -5249,7 +5249,8 @@ class TestGetInterfacesByMac:
assert expected == result
-@pytest.mark.parametrize("driver", ("mscc_felix", "fsl_enetc", "qmi_wwan"))
+@pytest.mark.parametrize("driver", ("mscc_felix", "fsl_enetc", "qmi_wwan",
+ "mlx5_core"))
@mock.patch("cloudinit.net.get_sys_class_path")
@mock.patch("cloudinit.util.system_info", return_value={"variant": "ubuntu"})
class TestDuplicateMac:
I couldn't push my branch to origin; it seems I am not allowed 🙂
Yes, your patch essentially ignores one of the duplicates but configures the other, which, as you mention, is not ideal. We dealt with a similar issue on Azure, where 'mlx5_core' duplicates were ignored in a similar way, but it eventually evolved into this: #2153. That solution doesn't work for you as-is because it targets a different hypervisor, but I'd think yours could look similar, using the driver name as surfaced in your cloud.
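To make the behavior discussed here concrete, the following is a minimal sketch of a driver allowlist for duplicate MACs. It is a simplified illustration, not cloud-init's actual `get_interfaces_by_mac_on_linux`: the helper name `interfaces_by_mac`, its input tuples, the error text, and the example MAC address are all hypothetical, and which duplicate "wins" (here, the later one) is an assumption of the sketch.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Drivers whose duplicate MACs are tolerated. The first three names appear in
# the diff above; "mlx5_core" is the addition proposed by the patch.
TOLERATED_DRIVERS: FrozenSet[str] = frozenset(
    {"fsl_enetc", "mscc_felix", "qmi_wwan", "mlx5_core"}
)


def interfaces_by_mac(
    interfaces: Iterable[Tuple[str, str, str]],
    tolerated_drivers: FrozenSet[str] = TOLERATED_DRIVERS,
) -> Dict[str, str]:
    """Map MAC -> interface name (hypothetical helper, not cloud-init's API).

    `interfaces` yields (name, mac, driver) tuples.
    """
    by_mac: Dict[str, str] = {}
    for name, mac, driver in interfaces:
        if mac in by_mac and driver not in tolerated_drivers:
            raise RuntimeError(
                f"duplicate mac found: '{name}' and '{by_mac[mac]}' "
                f"share '{mac}' (driver: {driver})"
            )
        # Assumption of this sketch: the later interface replaces the earlier
        # one, so only one of the duplicates ends up in the result.
        by_mac[mac] = name
    return by_mac


# Two virtual functions sharing a made-up MAC address.
vfs = [
    ("ens5", "02:00:17:00:aa:01", "mlx5_core"),
    ("ens6", "02:00:17:00:aa:01", "mlx5_core"),
]
result = interfaces_by_mac(vfs)
```

With the allowlist in place, only one of the two interfaces survives in the mapping, which is why this approach silences the exception without actually handling the bond.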
Correct. If you're looking to submit a PR, you need to fork the repo, push a branch to your remote, and then create a PR against the Canonical main branch.
Thank you, I will have a look at #2153. Also, is this bug still incomplete? I still see the incomplete label. I couldn't find a way to remove it, as it is my understanding that it is now not missing any information 🙂
Sorry, removed the incomplete label.
Bug report
We are creating an instance with multiple network interfaces with the same MAC address on purpose because they are part of the same SR-IOV bond, but cloud-init code throws an exception.
Steps to reproduce the problem
Create an instance connected with 2 or more Mellanox CX5 or CX6 SR-IOV virtual functions with the same MAC address. The driver is mlx5_core.
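One way to confirm the condition on a running instance is to scan sysfs for interfaces that share a MAC address. The sketch below is a standalone helper, assuming the standard Linux layout where `/sys/class/net/<iface>/address` holds the MAC and `/sys/class/net/<iface>/device/driver` is a symlink to the driver directory. The function name `find_duplicate_macs` is hypothetical and not part of cloud-init.

```python
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def find_duplicate_macs(
    sys_class_net: str = "/sys/class/net",
) -> Dict[str, List[Tuple[str, str]]]:
    """Return {mac: [(interface, driver), ...]} for MACs seen more than once."""
    seen: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for name in sorted(os.listdir(sys_class_net)):
        try:
            with open(os.path.join(sys_class_net, name, "address")) as fh:
                mac = fh.read().strip().lower()
        except OSError:
            # Entries without a readable address file are skipped.
            continue
        if not mac:
            continue
        driver_link = os.path.join(sys_class_net, name, "device", "driver")
        # Virtual devices may have no device/driver link; report "" for them.
        driver = ""
        if os.path.exists(driver_link):
            driver = os.path.basename(os.path.realpath(driver_link))
        seen[mac].append((name, driver))
    return {mac: devs for mac, devs in seen.items() if len(devs) > 1}
```

Calling `find_duplicate_macs()` with no arguments inspects the live system; passing another directory makes it possible to check a copied or mocked sysfs tree.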
Environment details
cloud-init logs
We want those ens5 and ens6 Mellanox SR-IOV / virtual function interfaces to be ignored, as a custom script will configure bonding. What would be the best solution for this within the cloud-init framework?
The workaround today is to attach those SR-IOV interfaces only after the first boot; the problem occurs if they are attached at first boot.
Also, I couldn't run collect-logs because I couldn't log in to the instance since the cloud-init process was stopped by this problem.
Thank you,
Jeremy