
[🐛 BUG]: execTTL timeout elapsed #1992

Open
cv65kr opened this issue Aug 21, 2024 · 4 comments

@cv65kr (Member) commented Aug 21, 2024

No duplicates 🥲.

  • I have searched for a similar issue in our bug tracker and didn't find any solutions.

What happened?

[Two screenshots: metrics dashboard showing request latency and queued requests]

On the metrics screen you can see that the average request time is 5 ms. Randomly, requests start going into the queue, and the logs show messages like:

{"level":"error","ts":1724229813047911732,"logger":"grpc","msg":"method call was finished with error","error":"rpc error: code = Internal desc = worker_exec_with_timeout: ExecTTL: context deadline exceeded","method":"/api.test","start":1724229804042745317,"elapsed":9005}

My config:

  pool:
    num_workers: 4
    destroy_timeout: 30s
    supervisor:
      exec_ttl: 9s # Client timeout
      max_worker_memory: 70

It's pretty interesting because the elapsed time for the request is 9 seconds. I checked the endpoint logic and it's a simple query that doesn't show up in any database slow log.
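
For illustration only, here is a minimal Go sketch (not RoadRunner's actual code; execWithTTL is a hypothetical stand-in for a supervised worker exec) of how an exec_ttl-style guard produces exactly this error shape: the deadline covers the whole exec, so the TTL elapses even when the payload itself is fast.

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// execWithTTL is a hypothetical stand-in for a supervised worker exec:
// the whole call is wrapped in a context deadline equal to exec_ttl.
func execWithTTL(ttl time.Duration, work func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), ttl)
	defer cancel()

	done := make(chan error, 1)
	go func() { done <- work(ctx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		// Same error shape as in the log: the TTL elapsed even though
		// the payload (a fast query) never ran slowly.
		return fmt.Errorf("worker_exec_with_timeout: ExecTTL: %w", ctx.Err())
	}
}

func main() {
	err := execWithTTL(9*time.Second, func(ctx context.Context) error {
		<-ctx.Done() // simulate a worker that is stuck, not a slow query
		return ctx.Err()
	})
	if errors.Is(err, context.DeadlineExceeded) {
		fmt.Println(err) // worker_exec_with_timeout: ExecTTL: context deadline exceeded
	}
}

This matches the log above, where elapsed is 9005 ms, i.e. the TTL rather than the query time, which suggests the time is being spent somewhere other than the query itself.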

Version (rr --version)

v2024.1.5

How to reproduce the issue?

Didn't find a way to reproduce it; it happens randomly.

Relevant log output

No response

cv65kr added the B-bug (Bug: bug, exception) and F-need-verification labels on Aug 21, 2024
rustatian added this to the v2024.3 milestone on Aug 21, 2024
@rustatian (Member)

Hey @cv65kr 👋 Thanks for the report 👍

Note for myself: load test with exec_ttl set. Build RR with the -race flag.
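
A rough sketch of such a load test follows, under stated assumptions: it targets the standard grpc.health.v1 service, which RR's gRPC plugin may not expose for the setup above, so in practice you would swap in the client generated from your own proto; the address and concurrency numbers are placeholders. Run it against an RR binary built with the -race flag, as noted above, and watch the RR logs for the ExecTTL warning.

package main

import (
	"context"
	"log"
	"sync"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// Placeholder target address; point it at the RR gRPC listener.
	conn, err := grpc.NewClient("127.0.0.1:5002",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := healthpb.NewHealthClient(conn)

	var wg sync.WaitGroup
	for i := 0; i < 50; i++ { // many concurrent callers against a small worker pool
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
				_, err := client.Check(ctx, &healthpb.HealthCheckRequest{})
				cancel()
				if err != nil {
					log.Printf("call failed: %v", err) // look for ExecTTL / NoFreeWorkers errors
				}
			}
		}()
	}
	wg.Wait()
}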

@genhoi commented Aug 23, 2024

I am experiencing a similar problem with gRPC and exec_ttl.

rr version 2024.1.5 (build time: 2024-06-20T19:10:34+0000, go1.22.4), OS: linux, arch: amd64

@route33 commented Aug 29, 2024

@rustatian
I'm experiencing the same problem on the latest RR:
[INFO] RoadRunner server started; version: 2024.2.0, buildtime: 2024-07-25T13:25:10+0000
[INFO] sdnotify: not notified

{"level":"info","ts":1724864808550491636,"logger":"grpc","msg":"grpc server was started","address":"tcp://0.0.0.0:5002"}

{"level":"warn","ts":1724869469478675017,"logger":"server","msg":"worker stopped, and will be restarted","reason":"execTTL timeout elapsed","pid":148,"internal_event_name":"EventExecTTL","error":"worker_exec_with_timeout: ExecTTL: context canceled"}

{"level":"error","ts":1724869469478755068,"logger":"grpc","msg":"method call was finished with error","error":"rpc error: code = Internal desc = worker_exec_with_timeout: ExecTTL: context canceled","method":"/my.Service/HealthCheck","start":1724869468475914858,"elapsed":1002}

{"level":"warn","ts":1724890028555570695,"logger":"server","msg":"worker doesn't respond on stop command, killing process","PID":742}

{"level":"error","ts":1724890028555645505,"logger":"server","msg":"no free workers in the pool, wait timeout exceed","reason":"no free workers","internal_event_name":"EventNoFreeWorkers","error":"worker_watcher_get_free_worker: NoFreeWorkers:\n\tcontext canceled"}

{"level":"error","ts":1724890028555717208,"logger":"grpc","msg":"method call was finished with error","error":"rpc error: code = Internal desc = static_pool_exec: NoFreeWorkers:\n\tstatic_pool_exec:\n\tworker_watcher_get_free_worker:\n\tcontext canceled","method":"/my.Service/HealthCheck","start":1724890018477624974,"elapsed":10078}

That's the log from a pod with no traffic, just health checks every 5 seconds.
Configuration:

grpc:
  listen: tcp://0.0.0.0:5002
  max_concurrent_streams: 10
  max_connection_age: 0s
  max_connection_age_grace: 0s8h
  max_connection_idle: 0s
  max_recv_msg_size: 50
  max_send_msg_size: 50
  ping_time: 2h
  pool:
    allocate_timeout: 30s
    destroy_timeout: 60s
    max_jobs: 10000
    num_workers: 8
    supervisor:
      exec_ttl: 1m
      idle_ttl: 2m
      max_worker_memory: 100
      ttl: 1h
      watch_tick: 1s
  proto: ../my.service/service.proto
  timeout: 1h

@rustatian (Member)

Thank you, guys, for the comments. I hope next week I'll be able to dig into this problem. I'll keep you all posted.
