-
@emoek I am poor at reading graphs by eye :D Let's run a program to see if the correlation is strong:

    import datetime

    import numpy as np
    import requests

    # Prometheus instance URL and the metrics to retrieve
    PROMETHEUS_URL = 'http://localhost:9090'
    GAUGES = ['kepler_container_package_joules_total', 'kepler_container_cpu_instructions_total']
    # Container name used to filter the metrics
    CONTAINER_NAME = 'kepler-exporter'
    # PromQL template: per-second rate over 1m windows, summed over all matching series
    QUERY = 'sum(rate({gauge}{{container_name=~"{container_name}"}}[1m]))'

    # Query the last hour. Use UTC explicitly: a local-time string with a 'Z' suffix
    # would be read as UTC and shift the window on machines in other time zones.
    end_time = datetime.datetime.now(datetime.timezone.utc)
    start_time = end_time - datetime.timedelta(hours=1)

    # Query Prometheus for each metric
    gauges = {}
    for gauge in GAUGES:
        params = {
            'query': QUERY.format(gauge=gauge, container_name=CONTAINER_NAME),
            'start': start_time.timestamp(),  # Unix timestamps are accepted for start/end
            'end': end_time.timestamp(),
            'step': '15s',
        }
        # Passing params lets requests URL-encode the PromQL expression
        response = requests.get(f'{PROMETHEUS_URL}/api/v1/query_range', params=params)
        response.raise_for_status()
        # Each sample is a [timestamp, "value"] pair; keep the value as a float
        values = [float(sample[1]) for sample in response.json()['data']['result'][0]['values']]
        gauges[gauge] = values

    # Calculate the correlation between the two series
    joules = gauges['kepler_container_package_joules_total']
    instructions = gauges['kepler_container_cpu_instructions_total']
    correlation = np.corrcoef(joules, instructions)[0, 1]
    print(f'Correlation between kepler_container_package_joules_total and kepler_container_cpu_instructions_total: {correlation}')

My output is:

    Correlation between kepler_container_package_joules_total and kepler_container_cpu_instructions_total: 0.8485091316695434
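One caveat with a script like this: `np.corrcoef` needs both series to have the same length, and the two queries may not return samples for exactly the same timestamps (for example, if one metric has gaps). A minimal sketch of aligning the series first, assuming the `[timestamp, "value"]` sample shape that `query_range` returns; `align_by_timestamp` and the sample data are hypothetical, not part of the script above:

```python
def align_by_timestamp(series_a, series_b):
    """Keep only the samples whose timestamps appear in both series."""
    values_a = {ts: float(v) for ts, v in series_a}
    values_b = {ts: float(v) for ts, v in series_b}
    shared = sorted(values_a.keys() & values_b.keys())
    return [values_a[t] for t in shared], [values_b[t] for t in shared]


# Made-up samples: each series has one timestamp the other lacks.
joules = [[0, "1.0"], [15, "2.0"], [30, "3.0"], [45, "4.0"]]
instructions = [[15, "20"], [30, "30"], [45, "40"], [60, "50"]]

x, y = align_by_timestamp(joules, instructions)
print(x)
print(y)
```

The two returned lists have equal length and can be passed to `np.corrcoef(x, y)` directly.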
-
What an interesting discussion here! Thanks @emoek for bringing it up. @rootfs Is Kepler really assuming that CPU instructions and energy are necessarily linearly, and also necessarily proportionally, related? Given the following example:
Would it not be expected that there is no proportionality or correlation between these two, since some instructions are simply more costly than others? To my understanding, instructions were chosen primarily because they are the causal factor of activity that allows for splitting by process, not because they are a proxy for energy. The correlation to energy can only be derived if we know the type of instruction (load, mul, add, SIMD, etc.). So even a statistical argument proves nothing one way or the other if the base assumption is that there is another, currently hidden variable that links the two. So in summary: whether instructions are a good or bad metric cannot be decided just by comparing instructions to energy, and that is also not the design intention. Is that understanding correct?
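To make the hidden-variable argument concrete, here is a small simulation sketch. It assumes a toy model with only two instruction classes, and the per-instruction costs are hypothetical arbitrary units, not measurements of any CPU. With a fixed instruction mix, energy is exactly proportional to the instruction count; when the mix varies from interval to interval, the same counts correlate much more weakly with energy.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical energy cost per instruction, in arbitrary units.
LIGHT_COST = 1.0
HEAVY_COST = 10.0


def simulate(heavy_fraction, samples=500, seed=0):
    """Return (instruction counts, energy) over simulated intervals.

    heavy_fraction(rng) gives the share of heavy instructions in one interval.
    """
    rng = random.Random(seed)
    counts, energy = [], []
    for _ in range(samples):
        n = rng.uniform(1e6, 2e6)  # instructions executed in this interval
        f = heavy_fraction(rng)    # the hidden variable: instruction mix
        counts.append(n)
        energy.append(n * (f * HEAVY_COST + (1 - f) * LIGHT_COST))
    return counts, energy


fixed_mix = pearson(*simulate(lambda rng: 0.2))
varying_mix = pearson(*simulate(lambda rng: rng.random()))
print(f"fixed mix:   r = {fixed_mix:.3f}")
print(f"varying mix: r = {varying_mix:.3f}")
```

In this toy model a high or low correlation says more about how stable the workload's instruction mix was than about whether instructions are a suitable splitting parameter.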
-
@emoek, thank you for the discussion. The dynamic power consumption is directly related to resource utilization. However, how well a metric captures CPU utilization can vary with the type of instructions and cache operations. In practice, CPU cycles typically exhibit a better correlation than CPU instructions because they also account for cache operations. Nonetheless, we expect a high correlation between the CPU package power and the container instructions or cycles, as we saw in our experiments and in many other research papers. That is, both instructions and cycles should show a good correlation with the CPU power consumption. Regarding your experiment:
-
Looking at the documentation of Kepler, I understand that it builds upon the premise that resource utilization and power consumption are linearly proportional. On this premise, Kepler attributes power to the containers, by default using instructions as the splitting parameter. However, looking at the data in Grafana, as shown for example in the attached figure, I cannot see any relation between the two metrics. While the CPU instructions seem to follow a specific pattern, the power consumption seems to be more or less random.
Now I am asking myself whether I have misunderstood Kepler or whether I am missing something. I would appreciate it if someone could provide any insights. I have tried it both for disruptive scenarios (as visible in the image) and for more controlled load scenarios.
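For reference, the ratio premise described above can be sketched as follows. This is a simplified model of proportional splitting, not Kepler's actual implementation; `attribute_power`, the even split for the all-idle case, and the numbers are all hypothetical.

```python
def attribute_power(total_power_watts, instructions_by_container):
    """Split a node-level power reading by each container's share of instructions."""
    if not instructions_by_container:
        return {}
    total = sum(instructions_by_container.values())
    if total == 0:
        # No activity observed: fall back to an even split (an assumption of this sketch).
        share = total_power_watts / len(instructions_by_container)
        return {name: share for name in instructions_by_container}
    return {
        name: total_power_watts * count / total
        for name, count in instructions_by_container.items()
    }


# Made-up interval: 30 W of dynamic power and three containers.
shares = attribute_power(30.0, {"a": 6e9, "b": 3e9, "c": 1e9})
print(shares)
```

Under such a model a container's attributed power follows its share of all instructions on the node, so it can change even while that container's own instruction count stays steady.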
For the image: