-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upstream 2.60.7 #219
Upstream 2.60.7 #219
Conversation
cherry pick of erigontech#10531 Co-authored-by: boyuan-chen <[email protected]>
Merging diagnostics code updates from main branch.
…#11560) Co-authored-by: Andrew Ashikhmin <[email protected]>
Removed unnecessary struct type Co-authored-by: Somnath <[email protected]>
…erigontech#11561) - Added CPU overall info to report file Example output: ![Screenshot 2024-08-06 at 09 07 51](https://github.com/user-attachments/assets/834c3f48-ec37-48ea-8688-ee6ea00186a4)
- Updated gopsutil version as it has improvements in getting processes and memory info.
…#11552) cherry-pick 391fc4b for E2 needed to unblock PR erigontech#11551 Co-authored-by: Andrew Ashikhmin <[email protected]>
…ch#11549) (erigontech#11551) cherry-pick 2a98f6a for E2 relates to: erigontech#10734 erigontech#11387 restart Erigon with `SAVE_HEAP_PROFILE = true` env variable wait until we reach 45% or more alloc in stage_headers when "noProgressCounter >= 5" or "Rejected header marked as bad"
Changes: - Collecting CPU and Memory usage info about all processes running on the machine - Running loop 5 times with 2 seconds delay and to calculate average - Sort by CPU usage - Write result to report file - Display totals for CPU and Memory usage to processes table - Display CPU usage by cores - Print CPU details to table Cherry pick from: - erigontech#11516 - erigontech#11526 - erigontech#11537 - erigontech#11544
- added flags which was applied to run command to report
Co-authored-by: Kewei <[email protected]>
…11583) Fix erigontech#11414 root cause: empty validator set
Fixed issue with setup diagnostics client on erigon start
…erigontech#11669) I have done specific integration test(on e2) and compare the result with etherscan (block number 0x3D08FF: 3999999). etherscan link: https://etherscan.io/tx/0x535bd65cfee8655a1ffd8a28e065b47bcf557b10641491a541ab1dd08496709e#statechange for account 0xEce701C76bD00D1C3f96410a0C69eA8Dfcf5f34E if use block_number + 1 the debug_accountRange() returns correctly for storage 0x0000000000000000000000000000000000000000000000000000000000000004 the new value: 0x000000000000000000000000000000000000000000000064a66fad67042ceb4a; using current software returns the old value (0x000000000000000000000000000000000000000000000064a3a922764918eb4b). debug_accountRange() returns : "storage": { ...... "0x0000000000000000000000000000000000000000000000000000000000000004": "64a66fad67042ceb4a", ---- } If approved I will make change for e3
…ontech#11649) - Added functionality to grab the heap profile by calling the diagnostics endpoint - Added support to pass heap profile file to diagnostics UI through WebSocket
Import Silkworm RPC Integration tests
…gon-2] (erigontech#11811) [Polygon] Bor: [PIP-30: increased max code size limit to 32KB](https://github.com/maticnetwork/Polygon-Improvement-Proposals/blob/main/PIPs/PIP-30.md)
…chan… (erigontech#11810) cherry-pick from PR erigontech#11061 from main (the regression is present also in release 2.60) Regression caused by this change in uint256 library: holiman/uint256@f24ed59 It was causing trace json to encode stack values as quoted (string) decimal values rather than a hex encoded value. Co-authored-by: bgelb <[email protected]>
In the first run, init validators was skipped due to the check. Updated it by checking the erigon3 code to init snapshot for 1st block (this logic is borrowed from main branch). cherry-pick adaptation of erigontech#11725
…ontech#11813) **Existing behaviour:** - Add up the possible value that user must pay beforehand to buy gas - Deduct that amount from the sender's account in `intraBlockState`, but: - Don't deduct the gas value amount if the user doesn't have enough, and `gasBailout` is set **New behaviour:** - Don't check if sender's balance is enough to pay gas value amount, nor deduct it, if `gasBailout` is set **More rationale** This would mean the sender's account would show `"balance": "="` in `trace_call` rpc method, that is, no change, if gas is the only thing the user pays for. This is fine because the gas price can fluctuate in a real transaction. This also removes the inconsistency of sometimes having to bother deducting the amount if it is less than sender's balance, thereby causing a bug/inconsistency.
…h#10672) (erigontech#11892) relates to erigontech#11890 cherry pick from E3 [2f2ce6a](erigontech@2f2ce6a) ----- #### Issue: At line 129 in `logsfilter.go`, we had the following line of code: ```go _, addrOk := filter.addrs[gointerfaces.ConvertH160toAddress(eventLog.Address)] ``` This line caused a panic due to a fatal error: ```logs fatal error: concurrent map read and map write goroutine 106 [running]: github.com/ledgerwatch/erigon/turbo/rpchelper.(*LogsFilterAggregator).distributeLog.func1({0xc009701db8?, 0x8?}, 0xc135d26050) github.com/ledgerwatch/erigon/turbo/rpchelper/logsfilter.go:129 +0xe7 github.com/ledgerwatch/erigon/turbo/rpchelper.(*SyncMap[...]).Range(0xc009701eb0?, 0xc009701e70?) github.com/ledgerwatch/erigon/turbo/rpchelper/subscription.go:97 +0x11a github.com/ledgerwatch/erigon/turbo/rpchelper.(*LogsFilterAggregator).distributeLog(0x25f4600?, 0xc0000ce090?) github.com/ledgerwatch/erigon/turbo/rpchelper/logsfilter.go:131 +0xc7 github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).OnNewLogs(...) github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go:547 github.com/ledgerwatch/erigon/cmd/rpcdaemon/rpcservices.(*RemoteBackend).SubscribeLogs(0xc0019c2f50, {0x32f0040, 0xc001b4a280}, 0xc001c0c0e0, 0x0?) github.com/ledgerwatch/erigon/cmd/rpcdaemon/rpcservices/eth_backend.go:227 +0x1d1 github.com/ledgerwatch/erigon/turbo/rpchelper.New.func2() github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go:102 +0xec created by github.com/ledgerwatch/erigon/turbo/rpchelper.New github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go:92 +0x652 ``` This error indicates that there were simultaneous read and write operations on the `filter.addrs` map, leading to a race condition. #### Solution: To resolve this issue, I implemented the following changes: - Moved SyncMap to erigon-lib common library: This allows us to utilize a thread-safe map across different packages that require synchronized map access. - Refactored logsFilter to use SyncMap: By replacing the standard map with SyncMap, we ensured that all map operations are thread-safe, thus preventing concurrent read and write errors. - Added documentation for SyncMap usage: Detailed documentation was provided to guide the usage of SyncMap and related refactored components, ensuring clarity and proper utilization. Co-authored-by: Bret <[email protected]>
…ontech#10718) (erigontech#11894) relates to erigontech#11890 cherry pick from E3 erigontech@adf3f43 ----- - Introduced `GetOrCreateGaugeVec` function to manage Prometheus `GaugeVec` metrics. - Implemented various subscription filter limits (max logs, max headers, max transactions, max addresses, and max topics). - Updated various test files to utilize the new `FiltersConfig` and added comprehensive tests for the new filter limits. - Added new flags for subscription filter limits. - Improved log filtering and event handling with better memory management and metric tracking. - Added configuration to handle RPC subscription filter limits. - Minor typo fixes. ------- RPC nodes would frequently OOM. Based on metrics and pprof, I observed high memory utilization in the rpchelper: ```text File: erigon Type: inuse_space Entering interactive mode (type "help" for commands, "o" for options) (pprof) top Showing nodes accounting for 69696.67MB, 95.37% of 73083.31MB total Dropped 1081 nodes (cum <= 365.42MB) Showing top 10 nodes out of 43 flat flat% sum% cum cum% 65925.80MB 90.21% 90.21% 65925.80MB 90.21% github.com/ledgerwatch/erigon/turbo/rpchelper.(*LogsFilterAggregator).distributeLog.func1 1944.80MB 2.66% 92.87% 1944.80MB 2.66% github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddLogs.func1 1055.97MB 1.44% 94.31% 1055.97MB 1.44% github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddPendingTxs.func1 362.04MB 0.5% 94.81% 572.20MB 0.78% github.com/ledgerwatch/erigon/core/state.(*stateObject).GetCommittedState 257.73MB 0.35% 95.16% 587.85MB 0.8% encoding/json.Marshal 57.85MB 0.079% 95.24% 1658.66MB 2.27% github.com/ledgerwatch/erigon/core/vm.(*EVMInterpreter).Run 51.48MB 0.07% 95.31% 2014.50MB 2.76% github.com/ledgerwatch/erigon/rpc.(*handler).handleCallMsg 23.01MB 0.031% 95.34% 1127.21MB 1.54% github.com/ledgerwatch/erigon/core/vm.opCall 11.50MB 0.016% 95.36% 1677.68MB 2.30% github.com/ledgerwatch/erigon/core/vm.(*EVM).call 6.50MB 0.0089% 95.37% 685.70MB 0.94% github.com/ledgerwatch/erigon/turbo/transactions.DoCall (pprof) list distributeLog.func1 Total: 71.37GB ROUTINE ======================== github.com/ledgerwatch/erigon/turbo/rpchelper.(*LogsFilterAggregator).distributeLog.func1 in github.com/ledgerwatch/erigon/turbo/rpchelper/logsfilter.go 64.38GB 64.38GB (flat, cum) 90.21% of Total . . 206: a.logsFilters.Range(func(k LogsSubID, filter *LogsFilter) error { . . 207: if filter.allAddrs == 0 { . . 208: _, addrOk := filter.addrs.Get(gointerfaces.ConvertH160toAddress(eventLog.Address)) . . 209: if !addrOk { . . 210: return nil . . 211: } . . 212: } . . 213: var topics []libcommon.Hash . . 214: for _, topic := range eventLog.Topics { 27.01GB 27.01GB 215: topics = append(topics, gointerfaces.ConvertH256ToHash(topic)) . . 216: } . . 217: if filter.allTopics == 0 { . . 218: if !a.chooseTopics(filter, topics) { . . 219: return nil . . 220: } . . 221: } 37.37GB 37.37GB 222: lg := &types2.Log{ . . 223: Address: gointerfaces.ConvertH160toAddress(eventLog.Address), . . 224: Topics: topics, . . 225: Data: eventLog.Data, . . 226: BlockNumber: eventLog.BlockNumber, . . 227: TxHash: gointerfaces.ConvertH256ToHash(eventLog.TransactionHash), (pprof) % ``` - Added Metrics Implemented metrics to gain visibility into memory utilization. - Optimized Allocations Optimized object and slice allocations, resulting in significant improvements in memory usage. ```text File: erigon Type: inuse_space Entering interactive mode (type "help" for commands, "o" for options) (pprof) top Showing nodes accounting for 25.92GB, 94.14% of 27.53GB total Dropped 785 nodes (cum <= 0.14GB) Showing top 10 nodes out of 70 flat flat% sum% cum cum% 15.37GB 55.82% 55.82% 15.37GB 55.82% github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddLogs.func1 8.15GB 29.60% 85.42% 8.15GB 29.60% github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddPendingTxs.func1 0.64GB 2.33% 87.75% 1GB 3.65% github.com/ledgerwatch/erigon/turbo/rpchelper.(*LogsFilterAggregator).distributeLog 0.50GB 1.82% 89.57% 0.65GB 2.35% github.com/ledgerwatch/erigon/core/state.(*stateObject).GetCommittedState 0.36GB 1.32% 90.89% 0.36GB 1.32% github.com/ledgerwatch/erigon/turbo/rpchelper.(*LogsFilterAggregator).distributeLog.func1 0.31GB 1.12% 92.01% 0.31GB 1.12% bytes.growSlice 0.30GB 1.08% 93.09% 0.30GB 1.08% github.com/ledgerwatch/erigon/core/vm.(*Memory).Resize (inline) 0.15GB 0.56% 93.65% 0.35GB 1.27% github.com/ledgerwatch/erigon/core/state.(*IntraBlockState).AddSlotToAccessList 0.07GB 0.24% 93.90% 0.23GB 0.84% github.com/ledgerwatch/erigon/core/types.codecSelfer2.decLogs 0.07GB 0.24% 94.14% 1.58GB 5.75% github.com/ledgerwatch/erigon/core/vm.(*EVMInterpreter).Run (pprof) list AddLogs.func1 Total: 27.53GB ROUTINE ======================== github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddLogs.func1 in github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go 15.37GB 15.37GB (flat, cum) 55.82% of Total . . 644: ff.logsStores.DoAndStore(id, func(st []*types.Log, ok bool) []*types.Log { . . 645: if !ok { . . 646: st = make([]*types.Log, 0) . . 647: } 15.37GB 15.37GB 648: st = append(st, logs) . . 649: return st . . 650: }) . . 651:} . . 652: . . 653:// ReadLogs reads logs from the store associated with the given subscription ID. (pprof) list AddPendingTxs.func1 Total: 27.53GB ROUTINE ======================== github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddPendingTxs.func1 in github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go 8.15GB 8.15GB (flat, cum) 29.60% of Total . . 686: ff.pendingTxsStores.DoAndStore(id, func(st [][]types.Transaction, ok bool) [][]types.Transaction { . . 687: if !ok { . . 688: st = make([][]types.Transaction, 0) . . 689: } 8.15GB 8.15GB 690: st = append(st, txs) . . 691: return st . . 692: }) . . 693:} . . 694: . . 695:// ReadPendingTxs reads pending transactions from the store associated with the given subscription ID. (pprof) % ``` - Identified unbound slices that could grow indefinitely, leading to memory leaks. This issue occurs if subscribers do not request updates using their subscription ID, especially when behind a load balancer that does not pin clients to RPC nodes. 1. Architecture Design Changes - Implement architectural changes to pin clients to RPC Nodes ensuring subscribers that request updates using their subscription IDs will hit the same node, which does clean up the objecs on each request. - Reason for not choosing: This still relies on clients with subscriptions to request updates and if they never do and do not call the unsubscribe function it's an unbound memory leak. The RPC node operator has no control over the behavior of the clients. 2. Implementing Timeouts - Introduce timeouts for subscriptions to automatically clean up unresponsive or inactive subscriptions. - Reason for not choosing: Implementing timeouts would inadvertently stop websocket subscriptions due to the current design of the code, leading to a potential loss of data for active users. This solution would be good but requires a lot more work. 3. Configurable Limits (Chosen Solution) - Set configurable limits for various subscription parameters (e.g., logs, headers, transactions, addresses, topics) to manage memory utilization effectively. - Reason for choosing: This approach provides flexibility to RPC node operators to configure limits as per their requirements. The default behavior remains unchanged, making it a non-breaking change. Additionally, it ensures current data is always available by pruning the oldest data first. ```bash --rpc.subscription.filters.maxlogs=60 --rpc.subscription.filters.maxheaders=60 --rpc.subscription.filters.maxtxs=10_000 --rpc.subscription.filters.maxaddresses=1_000 --rpc.subscription.filters.maxtopics=1_000 ``` ```text File: erigon Type: inuse_space Entering interactive mode (type "help" for commands, "o" for options) (pprof) top Showing nodes accounting for 1146.51MB, 89.28% of 1284.17MB total Dropped 403 nodes (cum <= 6.42MB) Showing top 10 nodes out of 124 flat flat% sum% cum cum% 923.64MB 71.92% 71.92% 923.64MB 71.92% github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddPendingTxs.func1 54.59MB 4.25% 76.18% 54.59MB 4.25% github.com/ledgerwatch/erigon/rlp.(*Stream).Bytes 38.21MB 2.98% 79.15% 38.21MB 2.98% bytes.growSlice 29.34MB 2.28% 81.44% 29.34MB 2.28% github.com/ledgerwatch/erigon/p2p/rlpx.growslice 26.44MB 2.06% 83.49% 34.16MB 2.66% compress/flate.NewWriter 18.66MB 1.45% 84.95% 21.16MB 1.65% github.com/ledgerwatch/erigon/core/state.(*stateObject).GetCommittedState 17.50MB 1.36% 86.31% 64.58MB 5.03% github.com/ledgerwatch/erigon/core/types.(*DynamicFeeTransaction).DecodeRLP 15MB 1.17% 87.48% 82.08MB 6.39% github.com/ledgerwatch/erigon/core/types.UnmarshalTransactionFromBinary 13.58MB 1.06% 88.54% 13.58MB 1.06% github.com/ledgerwatch/erigon/turbo/stages/headerdownload.(*HeaderDownload).addHeaderAsLink 9.55MB 0.74% 89.28% 9.55MB 0.74% github.com/ledgerwatch/erigon/turbo/rpchelper.newChanSub[...] (pprof) list AddPendingTxs.func1 Total: 1.25GB ROUTINE ======================== github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddPendingTxs.func1 in github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go 923.64MB 923.64MB (flat, cum) 71.92% of Total . . 698: ff.pendingTxsStores.DoAndStore(id, func(st [][]types.Transaction, ok bool) [][]types.Transaction { . . 699: if !ok { . . 700: st = make([][]types.Transaction, 0) . . 701: } . . 702: . . 703: // Calculate the total number of transactions in st . . 704: totalTxs := 0 . . 705: for _, txBatch := range st { . . 706: totalTxs += len(txBatch) . . 707: } . . 708: . . 709: maxTxs := ff.config.RpcSubscriptionFiltersMaxTxs . . 710: // If adding the new transactions would exceed maxTxs, remove oldest transactions . . 711: if maxTxs > 0 && totalTxs+len(txs) > maxTxs { . . 712: // Flatten st to a single slice 918.69MB 918.69MB 713: flatSt := make([]types.Transaction, 0, totalTxs) . . 714: for _, txBatch := range st { . . 715: flatSt = append(flatSt, txBatch...) . . 716: } . . 717: . . 718: // Remove the oldest transactions to make space for new ones . . 719: if len(flatSt)+len(txs) > maxTxs { . . 720: flatSt = flatSt[len(flatSt)+len(txs)-maxTxs:] . . 721: } . . 722: . . 723: // Convert flatSt back to [][]types.Transaction with a single batch . . 724: st = [][]types.Transaction{flatSt} . . 725: } . . 726: . . 727: // Append the new transactions as a new batch 5.95MB 4.95MB 728: st = append(st, txs) . . 729: return st . . 730: }) . . 731:} . . 732: . . 733:// ReadPendingTxs reads pending transactions from the store associated with the given subscription ID. ``` With these changes, the memory utilization has significantly improved, and the system is now more stable and predictable. ----- <img width="1567" alt="image" src="https://github.com/ledgerwatch/erigon/assets/787344/264c9757-93cf-4be8-9883-5ca3187acd73"> Blue line indicates a deployment Sharp drop of green line is an OOM Co-authored-by: Bret <[email protected]>
…ntech#11908) relates to erigontech#11890 cherry pick from E3 to E2: erigontech@b760da2 ---- - Simplify and enhance tests. - Add test for invalid slice operations panic. - Remove unused Mutex ---- ### Issue I experienced a rare panic with the new filter code. ```text panic: runtime error: slice bounds out of range [121:100] goroutine 25311363 [running]: github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddPendingTxs.func1({0xc011c67020?, 0xc00a049e20?, 0xc020720b40?}, 0x48?) github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go:720 +0x31a github.com/ledgerwatch/erigon-lib/common/concurrent.(*SyncMap[...]).DoAndStore.func1(0x20?) github.com/ledgerwatch/[email protected]/common/concurrent/concurrent.go:52 +0x22 github.com/ledgerwatch/erigon-lib/common/concurrent.(*SyncMap[...]).Do(0x33209a0, {0xc013409fa0, 0x20}, 0xc00a049ee0) github.com/ledgerwatch/[email protected]/common/concurrent/concurrent.go:40 +0xff github.com/ledgerwatch/erigon-lib/common/concurrent.(*SyncMap[...]).DoAndStore(0xc0097a3c70?, {0xc013409fa0?, 0x30?}, 0xc000bc7d40?) github.com/ledgerwatch/[email protected]/common/concurrent/concurrent.go:51 +0x4b github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddPendingTxs(0xc010d587d0?, {0xc013409fa0?, 0xc0097de0f0?}, {0xc01239a800?, 0xc00c820500?, 0xc011beee70?}) github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go:698 +0x6b github.com/ledgerwatch/erigon/turbo/jsonrpc.(*APIImpl).NewPendingTransactionFilter.func1() github.com/ledgerwatch/erigon/turbo/jsonrpc/eth_filters.go:24 +0x88 created by github.com/ledgerwatch/erigon/turbo/jsonrpc.(*APIImpl).NewPendingTransactionFilter github.com/ledgerwatch/erigon/turbo/jsonrpc/eth_filters.go:22 +0xca ``` ### Resolution 1. Create a unit test reproducing the panic. 2. Ensure the slicing indices are calculated correctly and do not produce an invalid range. ### Running the new unit test on unfixed code: ```bash $ go test --- FAIL: TestFilters_AddPendingTxs (0.00s) --- FAIL: TestFilters_AddPendingTxs/TriggerPanic (0.00s) filters_test.go:451: AddPendingTxs caused a panic: runtime error: slice bounds out of range [10:5] FAIL exit status 1 FAIL github.com/ledgerwatch/erigon/turbo/rpchelper 0.454s ``` Co-authored-by: Bret <[email protected]>
…ch#11912) cherry pick from E3 to E2 erigontech@d7d9ded relates to erigontech#11890
…r amoy network (erigontech#11901) [Polygon] Bor: Added Ahmedabad HF related configs and block number for amoy network This is the Ahmedabad block number - [11865856](https://amoy.polygonscan.com/block/countdown/11865856) PR in bor - [bor#1324](maticnetwork/bor#1324)
…rigontech#11912)" (erigontech#11913) This reverts commit 93ec6d9. based on advice from @bretep https://discord.com/channels/687972960811745322/983710221308416010/1281647424569344141 relates to erigontech#11890
Extended pprof read API to include: goroutine, threadcreate, heap, allocs, block, mutex
…1916) - Changed UI serving port to 5137 as 6060 is busy with diag API - Refactored request headers to be set in middleware
@@ -57,3 +95,134 @@ jobs: | |||
VERSION: ${{ steps.prepare.outputs.tag_name }} | |||
DOCKER_USERNAME: ${{ secrets.DOCKERHUB }} | |||
DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_KEY }} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented out new implementation of releasing other binaries of erigon. We are using original method of deploying only erigon to docker, since other components aren't tested.
#branches: | ||
# - 'master' | ||
#tags: | ||
## only trigger on release tags: | ||
#- 'v*.*.*' | ||
#- 'v*.*.*-*' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that release workflow now works on manual workflow trigger(build-release
) instead of automatically running when new tag is created. This change is introduced in upstream.
Sync with upstream Erigon's new release: https://github.com/erigontech/erigon/releases/tag/2.60.7