
Merge max-next-v2016.06-1 into next #917

Merged: 26 commits merged into snabbco:next on May 19, 2016

Conversation

@eugeneia (Member)

Includes: #869 #870 #906 #898 #888 #909

petebristow and others added 24 commits April 11, 2016 18:38

Remove misleading return statement (the branch never returns a value).
system("mkdir .images 2>/dev/null || true")
system("ditaa " diagram " .images/" diagram ".png > /dev/null");
system("rm " diagram)
} }' < $input > $output
@eugeneia (Member Author)

Ugly but sensible imho.

@lukego (Member) commented May 18, 2016

Sorry about the slow feedback, both on this PR and on changes further downstream.

I have two requests that are both negotiable:

Change function definitions to use consistent syntax, i.e. a space between the function name and the parameter list (function foo () rather than function foo()). This is for consistency with the existing surrounding code, and to follow the style from Programming in Lua.

Change ingress_packet_drops to use a simpler mechanism. How about if we just had a counter called engine.ingress_packet_drops that could be incremented by any app? Then the engine simply polls that counter for changes and reacts. Would also be visible to external tools like snabb top. Then we are reusing one existing abstraction (a counter) rather than inventing two new ones (ingress_packet_drops method and ingress_drop_monitor object).
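
For illustration, such a mechanism might look roughly like this in Lua. This is only a sketch; the names (engine.ingress_packet_drops, report_ingress_drops, poll_ingress_drops) and the counter representation are assumptions, not the actual Snabb API:

local ffi = require("ffi")

-- Hypothetical engine-level counter that any app may increment.
local engine = {
   ingress_packet_drops = ffi.new("uint64_t[1]"),
   last_seen_drops = 0
}

-- An app (e.g. a NIC driver) reports drops it observed on ingress.
local function report_ingress_drops (n)
   engine.ingress_packet_drops[0] = engine.ingress_packet_drops[0] + n
end

-- The engine polls the counter and reacts when it has grown; external
-- tools could read the same counter.
function engine.poll_ingress_drops ()
   local current = tonumber(engine.ingress_packet_drops[0])
   if current > engine.last_seen_drops then
      -- react here: log, jit.flush(), etc.
      engine.last_seen_drops = current
   end
end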

@eugeneia (Member Author)

> Change ingress_packet_drops to use a simpler mechanism. How about if we just had a counter called engine.ingress_packet_drops that could be incremented by any app? Then the engine simply polls that counter for changes and reacts. Would also be visible to external tools like snabb top. Then we are reusing one existing abstraction (a counter) rather than inventing two new ones (ingress_packet_drops method and ingress_drop_monitor object).

👍

@dpino Would you be OK with it if I made the changes as described above?

@wingo (Contributor) commented May 19, 2016

@lukego that sounds fine to me fwiw, although it means more machinery in intel10g and other apps to actually accumulate that counter value, and I don't know how often to do it. Should the app add to that counter at every pull?

@lukego (Member) commented May 19, 2016

@wingo Good point. Relatedly, #886 is introducing a counter called in-discards (defined in RFC 7223) that every I/O app should provide. So now we are talking about three different ways to monitor packets dropped at ingress: app callback method, global engine counter, per-app counter. Which way(s) is best and how often should the counter be sampled?

One possibility would be to adopt the #886 style where every app provides a counter saying how many ingress packets it has discarded (updated every 1ms) and the engine would monitor those counters at some reasonable interval (perhaps also 1ms).

If we found that we were doing a lot of things at 1ms intervals then we might want to add a tick() method to the engine and to each app that is automatically called at this interval and is the default place to do house-keeping that is not quite cheap enough to do every breath (e.g. read a dozen device registers over PCIe).
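
To illustrate, a minimal sketch of such a tick() hook follows. This is hypothetical: tick(), TICK_INTERVAL and maybe_tick are assumed names, not part of the current engine API:

-- House-keeping interval discussed above (1ms).
local TICK_INTERVAL = 0.001
local last_tick = 0

-- Called from the engine's breath loop with the current monotonic time;
-- invokes tick() on every app that defines one.
local function maybe_tick (apps, now)
   if now - last_tick >= TICK_INTERVAL then
      last_tick = now
      for _, app in ipairs(apps) do
         if app.tick then
            app:tick()   -- e.g. read device registers, update drop counters
         end
      end
   end
end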

However, now I am really floating away with the fairies, so maybe better to merge this implementation and consider iterating on it e.g. when landing #886?

@wingo (Contributor) commented May 19, 2016

@lukego All of your thoughts sound right to me. I think that in the future we'll still have the ingress_drop_monitor object because we need a place to hold the state like "did I flush the JIT recently" and "what's my current threshold for when I should flush JIT". MHO is that we should merge this code and iterate on it in #886 like you suggest to make the ingress_drop_monitor check counters instead of calling the method, if that makes sense.
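
For illustration only, an ingress_drop_monitor along those lines might hold its state and sample counters roughly like this (a sketch; the field names, threshold default and sample signature are assumptions, not the implementation in this PR):

local IngressDropMonitor = {}
IngressDropMonitor.__index = IngressDropMonitor

function IngressDropMonitor.new (args)
   return setmetatable({
      threshold  = args.threshold or 100000, -- drops per interval before reacting
      last_value = 0,                        -- counter total at the last sample
      last_flush = 0                         -- time of the last jit.flush()
   }, IngressDropMonitor)
end

-- 'total' is assumed to be the sum of every app's ingress-discard counter,
-- read periodically by the engine.
function IngressDropMonitor:sample (now, total)
   if total - self.last_value > self.threshold then
      jit.flush()
      self.last_flush = now
   end
   self.last_value = total
end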

Apologies for the parens thing; somehow I got started with that coding convention in Lua and I seem to have infected the team. We need to do a global search and replace on the lwaftr code, and until then be more careful when porting code to core.

@dpino (Contributor) commented May 19, 2016

FWIW, I agree with the proposed solution (merge ingress_drop_monitor now and iterate on it later).

@eugeneia (Member Author)

Agree as well. 👍 Good point about waiting for the incoming statistics changes.

@lukego merged commit 6c3c677 into snabbco:next on May 19, 2016
@lukego (Member) commented May 19, 2016

Merged, thanks! Thanks @wingo and @dpino for the quick input too!

self.last_flush = now()
self.last_value[0] = self.current_value[0]
jit.flush()
print("jit.flush")
Member

I am seeing my Snabb processes print jit.flush when running on next, and as a user this is a little confusing. I think we should make this message more verbose so that it explains to the user what is going on, and/or only print the message when debugging is enabled.

Example seeing the message during snabbnfv startup:

snabbnfv traffic starting (benchmark mode)
Loading program/snabbnfv/test_fixtures/nfvconfig/test_functions/snabbnfv-bench1.port
engine: start app B_NIC
engine: start app B_Virtio
jit.flush
Get features 0x28000
 VIRTIO_NET_F_CTRL_VQ VIRTIO_NET_F_MRG_RXBUF
Get features 0x28000
 VIRTIO_NET_F_CTRL_VQ VIRTIO_NET_F_MRG_RXBUF
load: time: 1.00s  fps: 0         fpGbps: 0.000 fpb: 0   bpp: -    sleep: 100 us
load: time: 1.00s  fps: 0         fpGbps: 0.000 fpb: 0   bpp: -    sleep: 100 us

and then later while processing traffic (that is always arriving faster than it can be processed, in this benchmark):

load: time: 1.00s  fps: 4,606,149 fpGbps: 2.690 fpb: 170 bpp: 64   sleep: 4   us
load: time: 1.00s  fps: 4,626,202 fpGbps: 2.702 fpb: 170 bpp: 64   sleep: 4   us
jit.flush
load: time: 1.00s  fps: 4,508,809 fpGbps: 2.633 fpb: 169 bpp: 64   sleep: 4   us
load: time: 1.00s  fps: 4,519,463 fpGbps: 2.639 fpb: 168 bpp: 64   sleep: 5   us
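
A minimal sketch of the suggestion above, gating a more descriptive message behind a debug flag (the flag and function names are hypothetical, not existing configuration):

local debug_ingress_drop_monitor = false  -- hypothetical configuration flag

local function react_to_ingress_drops (drops)
   if debug_ingress_drop_monitor then
      io.write(string.format(
         "ingress drop monitor: %d packets dropped on ingress, flushing JIT traces\n",
         drops))
   end
   jit.flush()
end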

@wingo (Contributor) commented May 27, 2016

I think we should disable the jit.flush code by default. I apologize for arguing for it -- it works well for the lwaftr, but I don't feel comfortable rolling that code out to general production right now. There are two major problems: one is the bad error message. The second is that the monitor also counts packets dropped immediately after a jit.flush. Say you have a legitimate case where jit.flush could help: you detect the dropped packets and you flush -- cool. However, the counter effectively resets right at the flush, which is exactly when we expect to see more dropped packets because of the flush. Instead we should stop monitoring for a period of time after a flush, rather than simply delaying the next flush, if any. We will submit a fix for this, but this sort of heuristic immaturity is what makes me uncomfortable rolling this code out to production.

We should change the default for the ingress drop monitor to false.
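
A sketch of the grace-period behaviour described above (GRACE_PERIOD, its value and the monitor fields are assumed names; this illustrates the proposed fix, not the code in this PR):

local GRACE_PERIOD = 1.0   -- seconds to ignore drops after a flush (assumed value)

local function check_ingress_drops (monitor, now, current_drops)
   if now - monitor.last_flush < GRACE_PERIOD then
      -- Drops right after a flush are expected: re-baseline and skip.
      monitor.last_value = current_drops
      return
   end
   if current_drops - monitor.last_value > monitor.threshold then
      jit.flush()
      monitor.last_flush = now
   end
   monitor.last_value = current_drops
end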

dpino pushed a commit to dpino/snabb that referenced this pull request Aug 24, 2017