Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify how Logs SDK implements Enabled #4207

Open
pellared opened this issue Sep 10, 2024 · 27 comments
Open

Specify how Logs SDK implements Enabled #4207

pellared opened this issue Sep 10, 2024 · 27 comments
Labels

Comments

@pellared
Copy link
Member

pellared commented Sep 10, 2024

Context: #4203 (comment)

Blocked by:

@pellared
Copy link
Member Author

pellared commented Sep 16, 2024

Proposal A - Extend Processor

Below is a proposal mostly based on how Go Logs SDK currently implements Logger.Enabled.
Spike: open-telemetry/opentelemetry-go#5816

The idea is to add an OnEnabled operation to the LogRecordProcessor.

The parameters of OnEnabled would be defined in the same way as Logger.Enabled. However, it is important to notice that in future the parameters may independently change (for instance, the LogRecordProcessor.OnEnabled may start accepting an instrumentation scope to allow dynamic filtering of loggers by name). Still, I propose to start with the minimal set of parameters.

The return value of OnEnabled would be defined in the same way as Logger.Enabled.

Implementation of OnEnabled operation is optional so that the specification and existing processors implementations remain backwards compatible. If a LogRecordProcessor does not implement OnEnabled then it defaults to returning true.

When the user calls Logger.Enabled, the SDK would call OnEnabled in the same order as processors were registered until any of the processors' returns true. Logger.Enabled returns false if none of registered processors returns true. Pseudocode:

func (l *logger) Enabled(ctx context.Context, param log.EnabledParameters) bool {
	newParam := l.newEnabledParameters(ctx, param)
	for _, p := range l.provider.processors {
		if enabled := p.Enabled(ctx, newParam); enabled {
			return true
		}
	}
	return false
}

In order the implement filtering, the processor would need to wrap (decorate) an existing processor. Example pseudocode of a processor filtering log records by setting a minimum severity level:

// LogProcessor is an [log.Processor] implementation that wraps another
// [log.Processor]. It will pass-through calls to OnEmit and Enabled for
// records with severity greater than or equal to a minimum. All other method
// calls are passed to the wrapped [log.Processor].
type LogProcessor struct {
	log.Processor
	Minimum api.Severity
}

// OnEmit passes ctx and r to the [log.Processor] that p wraps if the severity
// of record is greater than or equal to p.Minimum. Otherwise, record is
// dropped.
func (p *LogProcessor) OnEmit(ctx context.Context, record *log.Record) error {
	if record.Severity() >= p.Minimum {
		return p.Processor.OnEmit(ctx, record)
	}
	return nil
}

// Enabled returns if the [log.Processor] that p wraps is enabled if the
// severity of record is greater than or equal to p.Minimum. Otherwise false is
// returned.
func (p *LogProcessor) OnEnabled(ctx context.Context, param log.EnabledParameter) bool {
	lvl, ok := param.Severity()
	if !ok {
		return true
	}
	return lvl >= p.Minimum && p.Processor.OnEnabled(ctx, param)
}

I also want to call out that the idea of composing/wrapping/decorating the processors is already used by the isolating processor. Composing of processors is necessary to achieve complex processing pipelines in the SDK.

Personal remarks:
I am going to work on another proposal(s) which be more inspired on tracing sampling. I want to coin a simpler filtering mechanism - a new abstraction called Filterer that would be used by SDK's Logger in both Emit and Enabled implementations.

@pellared
Copy link
Member Author

pellared commented Sep 17, 2024

Proposal B - Add filtering via Filterer

Below is a proposal inspired by sampling design in tracing SDK.
Spike: open-telemetry/opentelemetry-go#5825

The idea is to add a new abstraction named Filterer with a Filter method.

The parameters of Filter would be defined in the same way as Logger.Enabled. However, it is important to notice that in future the parameters may independently change (for instance, the Filterer.Filter may start accepting an instrumentation scope to allow dynamic filtering of loggers by name). Still, I propose to start with the minimal set of parameters.

The return value of Filter would be defined in the same way as Logger.Enabled.

Pseudocode:

type Filterer interface {
	Filter(ctx context.Context, param FilterParameters) bool
}

When the user calls Logger.Enabled, the SDK would call Filter in the same order as filterers were registered until any processor returns false (then it breaks and returns false). Logger.Enabled returns true if none of registered filterers returned false. Pseudocode:

// Enabled returns false if at least one Filterer held by the LoggerProvider
// that created the logger will return false for the provided context and param.
func (l *logger) Enabled(ctx context.Context, param log.EnabledParameters) bool {
	newParam := l.newEnabledParameters(param)
	for _, flt := range l.provider.filterers {
		if !flt.Filter(ctx, newParam) {
			return false
		}
	}
	return true
}

When the user calls Logger.Emit, the SDK would first check if the record should be filtered out. Pseudocode:

func (l *logger) Emit(ctx context.Context, r log.Record) {
	param:= l.toEnabledParameters(r)
	if !l.Enabled(ctx, param) {
		return
	}

	newRecord := l.newRecord(ctx, r)
	for _, p := range l.provider.processors {
		if err := p.OnEmit(ctx, &newRecord); err != nil {
			otel.Handle(err)
		}
	}
}

Pseudocode of a custom minimum severity level filterer :

type MinSevFilterer struct {
	Minimum api.Severity
}

func (f *MinSevFilterer) Filter(ctx context.Context, param log.FilterParameters) bool {
	lvl, ok := param.Severity()
	if !ok {
		return true
	}
	return lvl >= p.Minimum
}

Filtering is coupled to emitting log records. Therefore, a custom processor filtering implementation does not need to implement filtering on both OnEmit and OnEnabled as in Proposal A. It makes also makes defining filters easier for users.

This design should be easier to be added to existing SDKs as it adds a new abstraction instead of adding more responsibilities to Processor.

IMPORTANT:
This proposal assumes that each logger provider is used to define a separate processing pipeline (e.g. open-telemetry/opentelemetry-go#5830). It is NOT inline with #4010 (and #4062) which added the possibility to have multiple processing pipelines to the SDK's processing. It may decrease the performance as it will introduce more overhead and other unwanted side-effects (e.g. more debug logging). The benefit is that the solution is not coupled to the SDK (people may also use this approach e.g. to use few implementation of API). Basically it assumes that for complex pipelines the user is composing providers (and not processors).

@MSNev
Copy link
Contributor

MSNev commented Sep 17, 2024

I'm a little concerned that by "defining" the way that "enabled" should be implemented (especially adding layers of filtering (as part of the required implementation)) would (potentially) worse from a perf perspective than just constructing and emitting the Log that eventually gets dropped. ie. the whole point of the "Is enabled" is for avoid perf impact...

@MSNev
Copy link
Contributor

MSNev commented Sep 17, 2024

I think it's fine to define "what" options should be used (ie the parameters) to "check" whether something is enabled or not and this should be limited.
If an end-user application wants to do something more complicated then the "better" option would be to have "them" (not OTel) implement their own "Filtered" logger thingy.

@pellared
Copy link
Member Author

pellared commented Sep 17, 2024

I'm a little concerned that by "defining" the way that "enabled" should be implemented (especially adding layers of filtering (as part of the required implementation)) would (potentially) worse from a perf perspective than just constructing and emitting the Log that eventually gets dropped. ie. the whole point of the "Is enabled" is for avoid perf impact...

I am not sure if I follow.

Logger.Enabled gives the possibility to for the user to check if it is even necessary to construct and emit a big and expensive log record.

Which part of the implementation proposal (A or B or both) are you concerned about?

@pellared
Copy link
Member Author

I think it's fine to define "what" options should be used (ie the parameters) to "check" whether something is enabled or not and this should be limited.
If an end-user application wants to do something more complicated then the "better" option would be to have "them" (not OTel) implement their own "Filtered" logger thingy.

Agree. PTAL #4203 where I try to propose a minimal set of parameters.

@MSNev
Copy link
Contributor

MSNev commented Sep 19, 2024

I am not sure if I follow.

Logger.Enabled gives the possibility to for the user to check if it is even necessary to construct and emit a big and expensive log record.

Which part of the implementation proposal (A or B or both) are you concerned about?

Prescribing that it must be done via a specific method (A or B), rather than letting the language / environment determine the best way to provide the capability.

@MrAlias
Copy link
Contributor

MrAlias commented Sep 19, 2024

@MSNev the issue with not specifying behavior is that it can be specified later in a non-compatible way. Meaning, if an implementation implements A and the spec later specifies B that language SIG is non-compliant. Not having some language about implementation both stops implementations from adopting this feature for that reason and causes specification maintainer to not approve any further changes1.

This issue it trying to break out of this circular logic.

If you think things should be left undefined, please work with @jack-berg to reach a consensus as his statements ask for the opposite.

Footnotes

  1. https://github.com/open-telemetry/opentelemetry-specification/pull/4203#pullrequestreview-2292967830

@jack-berg
Copy link
Member

jack-berg commented Sep 19, 2024

Proposal C - Extend Logger config

What about adding a severity level as part of the logger config?

It seems to match the common log workflow of changing severity threshold for specific loggers.

I'm suspicious of the idea of introducing the concept of an expandable set of EnabledParameters, since requiring users to create a new struct / object to check if code should proceed to emitting a log seems non-optimal from a performance and ergonomics standpoint. I understand the desire to want to retain flexibility to evolve the API, but I think getting the arguments right in the initial stable version and not trying to retain the ability to evolve is likely to produce the most ergonomic API.

@pellared
Copy link
Member Author

pellared commented Sep 20, 2024

What about adding a severity level as part of the logger config?

I think that it is reasonable to improve the logging configuration than introducing a new concept.
It also does not mean that a feature like "filterers" cannot be added in future to allow more flexibility/customization if there would be a bigger community demand.

It seems to match the common log workflow of changing severity threshold for specific loggers.

I agree. I assume that setting a global severity threshold is also a common workflow.

I'm suspicious of the idea of introducing the concept of an expandable set of EnabledParameters, since requiring users to create a new struct / object to check if code should proceed to emitting a log seems non-optimal from a performance and ergonomics standpoint. I understand the desire to want to retain flexibility to evolve the API, but I think getting the arguments right in the initial stable version and not trying to retain the ability to evolve is likely to produce the most ergonomic API.

PTAL at #4203 (comment)

@XSAM
Copy link
Member

XSAM commented Sep 20, 2024

Per discussion with @pellared, if the logger can have an Enabled method, we probably don't need an Enabled method for the processor, as the logger can solve the memory allocation optimization at the beginning of the logging workflow.

@MrAlias
Copy link
Contributor

MrAlias commented Sep 26, 2024

What about adding a severity level as part of the logger config?

It seems to match the common log workflow of changing severity threshold for specific loggers.

I'm suspicious of the idea of introducing the concept of an expandable set of EnabledParameters, since requiring users to create a new struct / object to check if code should proceed to emitting a log seems non-optimal from a performance and ergonomics standpoint. I understand the desire to want to retain flexibility to evolve the API, but I think getting the arguments right in the initial stable version and not trying to retain the ability to evolve is likely to produce the most ergonomic API.

If we unify on the logger config, we would not be able to have different severity level support for different processors. If you have one processor that wants to send to short-term storage and hold all debug logs and a long-term storage that only store error logs and above.

@pellared
Copy link
Member Author

pellared commented Sep 26, 2024

If we unify on the logger config, we would not be able to have different severity level support for different processors. [...]

This issue is also present in Proposal B - Filterer.

The possible workaround would be to compose logger providers as noted in proposal B. However, I do not feel like such workaround is good/acceptable.

@pellared
Copy link
Member Author

pellared commented Oct 2, 2024

Wouldn't it be acceptable if the SDK supports to multiple ways to implement Enabled? E.g. Proposal A and Proposal C.

I think that an ability to configure a logger to make it disabled or specify a minimum severity level would fulfill most use cases and may be well performing. While composing processors may be necessary to allow setting up complex processing pipelines.

@pellared
Copy link
Member Author

pellared commented Oct 3, 2024

Given we already have a few proposals and a prototype in Go. Shouldn't this issue be no longer a blocker for #4208?

CC @open-telemetry/technical-committee

@lmolkova
Copy link
Contributor

lmolkova commented Oct 3, 2024

The possible workaround would be to compose logger providers as noted in proposal B. However, I do not feel like such workaround is good/acceptable.

FWIW that's what .NET ILogger does - if any of the destinations have the logger + severity enabled, the logging is enabled.

Since every provider owns it's own configuration via LoggerConfigurator, the composite provider could be responsible for merging them and creating the final LoggerConfig.

It also means that filters would be an (optional) feature based on the LoggerConfig. Otherwise filtering processors and config may start to contradict each other and it'd be best if we could avoid it.

More practical approach could be to let processors own/participate in LoggerConfig and this would merge proposals A and C together.

@pellared
Copy link
Member Author

@open-telemetry/technical-committee, can I propose a PR based on Proposal A and Go implementation?

@MrAlias
Copy link
Contributor

MrAlias commented Oct 31, 2024

@open-telemetry/technical-committee FYI this is blocking Go stability of the Logs signal.

@tigrannajaryan
Copy link
Member

[Meta] I am commenting to help with the process, not to express an opinion about the proposed approaches.

@open-telemetry/technical-committee, can I propose a PR based on Proposal A and Go implementation?

@pellared We definitely need a PR with a proposal and then if it gets approved we need to merge that PR to the spec in Development status first. To emphasize again, I don't have an opinion about which of the proposed approaches we should go ahead with (I think I can see 3 different proposals in this thread).

@tigrannajaryan
Copy link
Member

FYI this is blocking Go stability of the Logs signal.

@MrAlias I think based on the agreed process we have to do this:

  1. Create a spec PR with a proposal about how this capability is supposed to work. (what @pellared wants to do). Merge it in Development status.
  2. Implement the capability in 3 languages in Development status. We have it in Go, which is good, we need 2 more languages (if we don't have them already).
  3. Publish implementations, collect community feedback (for a few weeks at least?).
  4. If we are happy with implementations mark the capability Stable in the spec.
  5. Mark the capability in language implementations Stable. We can also iterate Steps 4 and 5 with in-between maturity levels if we don't want to jump from Development to Stable in one leap.

I think you are saying this issue is blocking Step 5, which of course is true, I just want to make it clear there are a few additional steps before we get there. If you are expecting something else to happen then I may be misunderstanding what you are asking for.

P.S. The requirement for 3 language implementations was discussed but I still don't see it anywhere in the spec. We need to add it.

@MrAlias
Copy link
Contributor

MrAlias commented Oct 31, 2024

Please also see C++ and Rust

@MrAlias
Copy link
Contributor

MrAlias commented Oct 31, 2024

Yeah, not expecting a direct jump. Just looking for continued progress here.

@MrAlias
Copy link
Contributor

MrAlias commented Oct 31, 2024

There have been TC members taking an active role in blocking progress here:

I would appreciate a more active role from the TC in progressing this initiative. Is it possible to add @pellared to the TC?

@pellared
Copy link
Member Author

pellared commented Oct 31, 2024

@tigrannajaryan, I am not working on PR as the label is triage:deciding and not triage:accepted.
This the reason I asked

@open-telemetry/technical-committee, can I propose a PR based on Proposal A and Go implementation?

I do not want to waste the time of the reviewers and contributors until it is clear agreement that we can work on it.

PS. I am not blaming anyone. There still was (and still is) many other issues that was waiting for my time 😉

There have been TC members taking an active role in blocking progress here:

@MrAlias, the mentioned issue and PR was related to metrics. I do not see anyone actively blocking this issue.

@jack-berg
Copy link
Member

There have been TC members taking an active role in blocking progress here:

I've been blocking progress on the grounds of an incomplete proposal. I'll happily concede (though I'll still disagree) if the consensus is that we can / should evolve the API features independently of how those will be used in the SDK.

I would appreciate a more active role from the TC in progressing this initiative.

Here's my take:

While I understand why you might want the Logger, Meter, Tracer enabled methods to accept Context, I think including it as a parameter is not a good design.

Let's take stock of some of the API / SDK facilities we have which are related to this problem:

  • Tracing SDK has Sampler, which accepts context as an argument, and which is invoked when starting a span.
  • Tracing API has Tracer#enabled, which accepts no arguments, and which is used by a caller to avoid even starting a span.
  • Tracing SDK has TracerConfigurator, which computes the TracerConfig for a Tracer. TracerConfig has a boolean disabled parameter, which is used to determine the response to Tracer#enabled.
  • Metrics API instruments have Instrument#enabled, which accepts no arguments, and which is used by a caller to avoid recording measurements.
  • Metrics SDK has MeterConfigurator, which computes the MeterConfig for a Meter. MeterConfig has a boolean disabled parameter, which is used to determine the response to Instrument#enabled for all instruments registered under the meter.
  • Logs API has Logger#enabled, which accepts no arguments, and which is used by a caller to avoid emitting a log.
  • Logs SDK has LoggerConfigurator, which computes the LoggerConfig for a Logger. LoggerConfig has a boolean disabled parameter, which is used to determine the response to Logger#enabled.
  • Log SDK notably does not have a Sampler concept, although the idea has been floated.

If we extend Logger#enabled, Instrument#enabled to accept Context, then it follows that we should do the same for Tracer#enabled for the sake of symmetry.

Adding a Context argument to these #enabled methods suggests TracerConfig, MeterConfig, LoggerConfig would be evolved to have some sort of feature where users can provide a function which accepts Context (and maybe other parameters) and return a boolean indicating whether the Logger, Instrument, Tracer is enabled. If the SDK corollary to the API isn't TracerConfig, MeterConfig, LoggerConfig, we'll have to make up some new concept, which I think we should try to avoid.

Adding a programmable extension point to TracerConfig, MeterConfig, LoggerConfig is a sloppy design. Users now have to choose between writing code which turns things on and off in response to context in either a sampler, or in the {Signal}Config. When is it appropriate to use one or another?

A cleaner design is to:

  • Keep TracerConfig, MeterConfig, LoggerConfig static as a design principle.
  • Limit user-provided implementations of this kind of logic to samplers.
  • Provide a new log SDK sampler analagous to trace SDK sampler.

The benefit of keeping TracerConfig, MeterConfig, LoggerConfig static is that static config lends itself to declarative configuration, where as breaking this principle and accepting user-provided functions encourages bespoke, programmatic configuration. While more flexible, writing custom functions is a worse user experience. If we can carve out a portion of configuration which is static and rule-based, the ecosystem will have better user ergonomics and language interoperability in the long run.

As for what a missing log sampler might look like, we need it to have some symmetry to trace SDK sampler, but also reflect the differences between logging and tracing. In particular, spans have a start and end stage, and trace Sampler can only receive the bits of the span available at span start. In contrast, all bits of a log record are known at a time when it is omitted. Reflecting these differences, log sampler's might consist of:

  • A shouldSample operation accepting Context, Severity, Body, and Attributes and returning a boolean. Omit reosurce, scope, timestamp, observed timestamp for consistency with tracing. Omit severityText because it mostly duplicates severity. Return a boolean instead of SamplingResult returned by tracing because log sampling is less complicated than trace sampling.
  • Call shouldSample for each log emitted, and skip calling LogRecordProcessor#onEmit if sampler returns false.
  • Have build in log samplers for common scenarios:
    • always_on: always samples all log records
    • always_off: always drops all log record
    • trace_based: if there is a span in the context, sample log records where the span is sampled

One thing that's missing is the equivalent programmable extension point for metrics. I would solve this by introducing something like what's been called a measurement processor. An extension point that allows users to intercept all measurements before they're aggregated and do things like: transform value (i.e. unit conversions), drop, enrich with attributes from context, baggage. If this type of capability existed, a measurement processor could make decisions on the basis of Context, which would surely be an argument provided as part of the measurement.

Taking a step back, the proposal is to add a Context argument to the Logger#enabled, Instrument#enabled, Tracer#enabled methods. Since we make no guarantee that these methods are called by instrumentation, the only purpose they can possibly serve is to improve performance. By adding Context as an argument, we're implying some corresponding programmable SDK capability to respond based on the contents of Context. But its antithetical to call a user-provided function on an operation that is supposed to be blazing fast and eliminate overhead.

Let's do the sensible thing and establish a design principle where Tracer#enabled, Instrument#enabled, and Logger#enabled are always statically resolved.

@pellared
Copy link
Member Author

pellared commented Oct 31, 2024

I am not sure what "clean'" design means here but I think that the config design misses the fundamental functionality of supporting different log record processing pipelines. E.g. #4207 (comment) or different processing for events and bridged log records.

If we can carve out a portion of configuration which is static and rule-based, the ecosystem will have better user ergonomics

I find this statement as opinionated. Static configuration is less flexible and will be limiting the users.

But its antithetical to call a user-provided function on an operation that is supposed to be blazing fast and eliminate overhead.

I find this statement as not correct. Does it mean that we do not allow any customization for something that is supposed to be fast? Does it mean that the design of https://pkg.go.dev/log/slog or https://docs.rs/log/latest/log/ is antithetical?

I never proposed a dynamically configured loggerconfig. My preferred proposal is to extend LogRecordProcessor with OnEnabled. We are able to model a lot of different log pipelines using this design that are not possible with other designs. We have this already implemented since a few months. So far we received none negative feedback regarding this aspect of OTel Go Logs API and SDK.

for the sake of symmetry

I think symmetry is good as a guideline. However, it should never be a requirement especially that each signal has its own unique properties and use cases.

@jack-berg
Copy link
Member

I find this statement as opinionated. Static configuration is less flexible and will be limiting the users.

Yes. I'm taking an opinionated stance on what I think the design should be. Taking away flexibility (or put another way, adding guardrails) can be of great benefit to the user experience. After all, the removal of flexibility is what gives programming languages increased productivity and safety.

I never proposed a dynamically configured loggerconfig. My preferred proposal is to extend LogRecordProcessor with OnEnabled.

Still results in the ability for user code to run on the hot path. And the function has to run twice, right? Once when Logger#enabled is called, and a second time when the SDK is deciding whether to pass the log to a processor. The second time being necessary because instrumentation isn't required to call Logger#enabled. So I believe that's two calls for each registered LogRecordProcessor which provides a filter function, and also the need to iterate through a list of processors. I prefer a design where Logger#enabled is resolved by reading a boolean in memory - that's it.

To be honest, the design to prioritize multiple independent logging pipelines, each with their own filter, doesn't resonate with me. Its too different from the other signals, and adds complexity for what I consider to be a niche use case. I think those types of use cases are better served by the collector, instead of adding the burden to all the language implementations.

I find this statement as not correct. Does it mean that we do not allow any customization for something that is supposed to be fast? Does it mean that the design of https://pkg.go.dev/log/slog or https://docs.rs/log/latest/log/ is antithetical?

Presumably there's parts of those systems which encourage the pattern of calling user code in a similar scenario as our Logger#enabled operation? If so, I'm not familiar with those systems and can't comment.

This is all my personal opinion. If there's a consensus among community members to go in a different direction, the community should take that direction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Blocked
Development

No branches or pull requests

9 participants