
Merge max-next-v2016.06-1 into next #917

Merged: 26 commits merged into snabbco:next on May 19, 2016

Conversation

@eugeneia (Member)

Includes: #869 #870 #906 #898 #888 #909

petebristow and others added 24 commits April 11, 2016 18:38

Remove misleading return statement (the branch never returns a value).
system("mkdir .images 2>/dev/null || true")
system("ditaa " diagram " .images/" diagram ".png > /dev/null");
system("rm " diagram)
} }' < $input > $output
@eugeneia (Member Author)

Ugly but sensible imho.

@lukego (Member) commented May 18, 2016

Sorry about the slow feedback, both on this PR and on changes further downstream.

I have two requests that are both negotiable:

Change function definitions to use consistent syntax, i.e. a space between the function name and the parameter list (function foo () rather than function foo()). This is for consistency with the existing surrounding code, and to follow the style from Programming in Lua.

Change ingress_packet_drops to use a simpler mechanism. How about if we just had a counter called engine.ingress_packet_drops that could be incremented by any app? Then the engine simply polls that counter for changes and reacts. Would also be visible to external tools like snabb top. Then we are reusing one existing abstraction (a counter) rather than inventing two new ones (ingress_packet_drops method and ingress_drop_monitor object).
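
For illustration, such a mechanism might look roughly like this in Lua. This is only a sketch; the names (engine.ingress_packet_drops, report_ingress_drops, poll_ingress_drops) and the counter representation are assumptions, not the actual Snabb API:

local ffi = require("ffi")

-- Hypothetical engine-level counter that any app may increment.
local engine = {
   ingress_packet_drops = ffi.new("uint64_t[1]"),
   last_seen_drops = 0
}

-- An app (e.g. a NIC driver) reports drops it observed on ingress.
local function report_ingress_drops (n)
   engine.ingress_packet_drops[0] = engine.ingress_packet_drops[0] + n
end

-- The engine polls the counter and reacts when it has grown; external
-- tools could read the same counter.
function engine.poll_ingress_drops ()
   local current = tonumber(engine.ingress_packet_drops[0])
   if current > engine.last_seen_drops then
      -- react here: log, jit.flush(), etc.
      engine.last_seen_drops = current
   end
end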

@eugeneia (Member Author)

> Change ingress_packet_drops to use a simpler mechanism. How about if we just had a counter called engine.ingress_packet_drops that could be incremented by any app? Then the engine simply polls that counter for changes and reacts. Would also be visible to external tools like snabb top. Then we are reusing one existing abstraction (a counter) rather than inventing two new ones (ingress_packet_drops method and ingress_drop_monitor object).

👍

@dpino Would you be OK with it if I made the changes as described above?

@wingo (Contributor) commented May 19, 2016

@lukego that sounds fine to me fwiw, although it means more machinery in intel10g and other apps to actually accumulate that counter value, and I don't know how often to do it. Should the app add to that counter at every pull?

@lukego (Member) commented May 19, 2016

@wingo Good point. Relatedly, #886 is introducing a counter called in-discards (defined in RFC 7223) that every I/O app should provide. So now we are talking about three different ways to monitor packets dropped at ingress: app callback method, global engine counter, per-app counter. Which way(s) is best and how often should the counter be sampled?

One possibility would be to adopt the #886 style where every app provides a counter saying how many ingress packets it has discarded (updated every 1ms) and the engine would monitor those counters at some reasonable interval (perhaps also 1ms).

If we found that we were doing a lot of things at 1ms intervals then we might want to add a tick() method to the engine and to each app that is automatically called at this interval and is the default place to do house-keeping that is not quite cheap enough to do every breath (e.g. read a dozen device registers over PCIe).
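
To illustrate, a minimal sketch of such a tick() hook follows. This is hypothetical: tick(), TICK_INTERVAL and maybe_tick are assumed names, not part of the current engine API:

-- House-keeping interval discussed above (1ms).
local TICK_INTERVAL = 0.001
local last_tick = 0

-- Called from the engine's breath loop with the current monotonic time;
-- invokes tick() on every app that defines one.
local function maybe_tick (apps, now)
   if now - last_tick >= TICK_INTERVAL then
      last_tick = now
      for _, app in ipairs(apps) do
         if app.tick then
            app:tick()   -- e.g. read device registers, update drop counters
         end
      end
   end
end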

However, now I am really floating away with the fairies, so maybe better to merge this implementation and consider iterating on it e.g. when landing #886?

@wingo (Contributor) commented May 19, 2016

@lukego All of your thoughts sound right to me. I think that in the future we'll still have the ingress_drop_monitor object because we need a place to hold the state like "did I flush the JIT recently" and "what's my current threshold for when I should flush JIT". MHO is that we should merge this code and iterate on it in #886 like you suggest to make the ingress_drop_monitor check counters instead of calling the method, if that makes sense.
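
For illustration only, an ingress_drop_monitor along those lines might hold its state and sample counters roughly like this (a sketch; the field names, threshold default and sample signature are assumptions, not the implementation in this PR):

local IngressDropMonitor = {}
IngressDropMonitor.__index = IngressDropMonitor

function IngressDropMonitor.new (args)
   return setmetatable({
      threshold  = args.threshold or 100000, -- drops per interval before reacting
      last_value = 0,                        -- counter total at the last sample
      last_flush = 0                         -- time of the last jit.flush()
   }, IngressDropMonitor)
end

-- 'total' is assumed to be the sum of every app's ingress-discard counter,
-- read periodically by the engine.
function IngressDropMonitor:sample (now, total)
   if total - self.last_value > self.threshold then
      jit.flush()
      self.last_flush = now
   end
   self.last_value = total
end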

Apologies for the parens thing; somehow I got started with that coding convention in Lua and I seem to have infected the team. We need to do a global search and replace on the lwaftr code, and until then be more careful when porting code to core.

@dpino (Contributor) commented May 19, 2016

FWIW, I agree with the proposed solution (merge ingress_drop_monitor now and iterate on it later).

@eugeneia (Member Author)

Agree as well. 👍 Good point about waiting for the incoming statistics changes.

@lukego merged commit 6c3c677 into snabbco:next on May 19, 2016
@lukego (Member) commented May 19, 2016

Merged, thanks! Thanks @wingo and @dpino for the quick input too!

self.last_flush = now()
self.last_value[0] = self.current_value[0]
jit.flush()
print("jit.flush")
Member

I am seeing my Snabb processes print jit.flush when running on next, and as a user this is a little confusing. I think we should make this message more verbose so that it explains to the user what is going on, and/or only print the message when debugging is enabled.

Example seeing the message during snabbnfv startup:

snabbnfv traffic starting (benchmark mode)
Loading program/snabbnfv/test_fixtures/nfvconfig/test_functions/snabbnfv-bench1.port
engine: start app B_NIC
engine: start app B_Virtio
jit.flush
Get features 0x28000
 VIRTIO_NET_F_CTRL_VQ VIRTIO_NET_F_MRG_RXBUF
Get features 0x28000
 VIRTIO_NET_F_CTRL_VQ VIRTIO_NET_F_MRG_RXBUF
load: time: 1.00s  fps: 0         fpGbps: 0.000 fpb: 0   bpp: -    sleep: 100 us
load: time: 1.00s  fps: 0         fpGbps: 0.000 fpb: 0   bpp: -    sleep: 100 us

and then later while processing traffic (that is always arriving faster than it can be processed, in this benchmark):

load: time: 1.00s  fps: 4,606,149 fpGbps: 2.690 fpb: 170 bpp: 64   sleep: 4   us
load: time: 1.00s  fps: 4,626,202 fpGbps: 2.702 fpb: 170 bpp: 64   sleep: 4   us
jit.flush
load: time: 1.00s  fps: 4,508,809 fpGbps: 2.633 fpb: 169 bpp: 64   sleep: 4   us
load: time: 1.00s  fps: 4,519,463 fpGbps: 2.639 fpb: 168 bpp: 64   sleep: 5   us
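
A minimal sketch of the suggestion above, gating a more descriptive message behind a debug flag (the flag and function names are hypothetical, not existing configuration):

local debug_ingress_drop_monitor = false  -- hypothetical configuration flag

local function react_to_ingress_drops (drops)
   if debug_ingress_drop_monitor then
      io.write(string.format(
         "ingress drop monitor: %d packets dropped on ingress, flushing JIT traces\n",
         drops))
   end
   jit.flush()
end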

@wingo (Contributor) commented May 27, 2016

I think we should disable the jit.flush code by default. I apologize for arguing for it -- it works well for the lwaftr, but I don't feel comfortable rolling that code out to general production right now. There are two major problems: one is the bad error message. The second is that the monitor also counts packets dropped immediately after a jit.flush. Say you have a legitimate case where jit.flush could help: you detect the dropped packets and you flush -- cool. However, the counter effectively resets right at the flush, which is exactly when we expect to see more dropped packets because of the flush. Instead we should stop monitoring for a period of time after a flush, rather than simply delaying the next flush, if any. We will submit a fix for this, but this sort of heuristic immaturity is what makes me uncomfortable rolling this code out to production.

We should change the default for the ingress drop monitor to false.
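
A sketch of the grace-period behaviour described above (GRACE_PERIOD, its value and the monitor fields are assumed names; this illustrates the proposed fix, not the code in this PR):

local GRACE_PERIOD = 1.0   -- seconds to ignore drops after a flush (assumed value)

local function check_ingress_drops (monitor, now, current_drops)
   if now - monitor.last_flush < GRACE_PERIOD then
      -- Drops right after a flush are expected: re-baseline and skip.
      monitor.last_value = current_drops
      return
   end
   if current_drops - monitor.last_value > monitor.threshold then
      jit.flush()
      monitor.last_flush = now
   end
   monitor.last_value = current_drops
end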

dpino pushed a commit to dpino/snabb that referenced this pull request Aug 24, 2017