Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autocompletion index generation could be much faster #8889

Open
2 tasks
nairb774 opened this issue Aug 29, 2024 · 1 comment
Open
2 tasks

Autocompletion index generation could be much faster #8889

nairb774 opened this issue Aug 29, 2024 · 1 comment
Labels
feature-request A feature should be added or improved. p2 This is a standard priority issue

Comments

@nairb774
Copy link

Describe the feature

Currently when generating the autocomplete index, the underlying code populates a sqlite3 database with the needed completion information. The underlying generation code doesn't perform explicit transactions, and as a result each DML statement issued is run through the full transaction cycle.

This can be vastly improved by increasing the size of the transactions used when populating the database. As a proof of concept, I added two explicit transaction blocks in awscli/autocomplete/local/indexer.py and awscli/autocomplete/serverside/indexer.py on top of 2.17.40. Running gen-ac-index I get the following:

$ time ./scripts/gen-ac-index --include-builtin-index --index-location $(mktemp)

real    0m10.893s
user    0m10.412s
sys     0m0.351s

Compare this to the current implementation with a transaction per insert:

$ time ./scripts/gen-ac-index --include-builtin-index --index-location $(mktemp)

real    7m33.245s
user    0m27.051s
sys     0m12.241s

Which is a fairly significant (97%) reduction in time.

Use Case

Each time the aws-cli-v2 package is updated in Arch it results in a from-source build of the cli. This is one of the longest steps in upgrading the software on my machine - often longer than rebuilding kernel modules and the initrd for half a dozen kernels. Improving the build and install time for the aws-cli would make it less likely for me to skip out on upgrading the cli when an update becomes available.

Proposed Solution

This is not a proper solution given the hacky nature, but this patch off of 2.17.40 was enough to realize the 97% performance improvement:

$ git diff 2.17.40
diff --git a/awscli/autocomplete/local/indexer.py b/awscli/autocomplete/local/indexer.py
index 6691e7abd..e69739a8a 100644
--- a/awscli/autocomplete/local/indexer.py
+++ b/awscli/autocomplete/local/indexer.py
@@ -74,10 +74,12 @@ class ModelIndexer(object):
         )
         help_command_table = clidriver.create_help_command().command_table
         command_table = clidriver.subcommand_table
-        self._generate_arg_index(command=parent, parent='',
-                                 arg_table=clidriver.arg_table)
-        self._generate_command_index(command_table, parent=parent,
-                                     help_command_table=help_command_table)
+        with self._db_connection._connection as conn:
+            conn.execute("BEGIN")
+            self._generate_arg_index(command=parent, parent='',
+                                     arg_table=clidriver.arg_table)
+            self._generate_command_index(command_table, parent=parent,
+                                         help_command_table=help_command_table)
 
         self._generate_table_indexes()
 
diff --git a/awscli/autocomplete/serverside/indexer.py b/awscli/autocomplete/serverside/indexer.py
index 9a6e240b7..f3e152fb4 100644
--- a/awscli/autocomplete/serverside/indexer.py
+++ b/awscli/autocomplete/serverside/indexer.py
@@ -51,8 +51,10 @@ class APICallIndexer(object):
         self._create_tables()
         session = clidriver.session
         loader = session.get_component('data_loader')
-        for key, command in self._iter_all_commands(clidriver):
-            self._construct_completion_data(loader, command)
+        with self._db_connection._connection as conn:
+            conn.execute("BEGIN")
+            for key, command in self._iter_all_commands(clidriver):
+                self._construct_completion_data(loader, command)
 
     def _iter_all_commands(self, clidriver):
         stack = sorted(clidriver.subcommand_table.items())

Other Information

Thanks for considering this massive quality of live improvement.

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CLI version used

2.17.40

Environment details (OS name and version, etc.)

Arch Linux

@nairb774 nairb774 added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Aug 29, 2024
@tim-finnigan
Copy link
Contributor

Thanks for the feature request, we can track this for more discussion and input.

@tim-finnigan tim-finnigan added p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A feature should be added or improved. p2 This is a standard priority issue
Projects
None yet
Development

No branches or pull requests

2 participants