Since Anserini is often used for search performance benchmarks, enabling recursive graph bisection would help. For reference, most if not all PISA performance benchmarks seem to enable recursive graph bisection.
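Here is why reordering helps, in miniature. Postings lists store gaps between consecutive doc IDs, so giving nearby IDs to documents that share terms shrinks the gaps and therefore the index. The toy Java model below is not Lucene code: the class `LogGapCost` and its methods are hypothetical. It uses the number of bits needed to write each gap in binary as a rough proxy for compressed size, and Lucene's actual postings encoding differs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model (not Lucene code) of how doc ID order affects delta-encoded postings. */
final class LogGapCost {

    /** Bits needed to write a positive gap in binary: floor(log2(gap)) + 1. */
    static int bits(int gap) {
        return 32 - Integer.numberOfLeadingZeros(gap);
    }

    /**
     * docTerms[d] lists the (distinct) term IDs of original document d, and newIdOf[d] is
     * the doc ID that document d receives after reordering. Returns total bits over all gaps.
     */
    static long cost(int[][] docTerms, int[] newIdOf, int numTerms) {
        List<List<Integer>> postings = new ArrayList<>();
        for (int t = 0; t < numTerms; t++) {
            postings.add(new ArrayList<>());
        }
        for (int d = 0; d < docTerms.length; d++) {
            for (int t : docTerms[d]) {
                postings.get(t).add(newIdOf[d]);
            }
        }
        long total = 0;
        for (List<Integer> list : postings) {
            Collections.sort(list);
            int prev = -1; // the first gap is measured from -1, so every gap is >= 1
            for (int id : list) {
                total += bits(id - prev);
                prev = id;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Docs 0 and 2 use term 0; docs 1 and 3 use term 1.
        int[][] docTerms = {{0}, {1}, {0}, {1}};
        int[] identity = {0, 1, 2, 3};
        int[] clustered = {0, 2, 1, 3}; // doc 2 gets ID 1, doc 1 gets ID 2
        System.out.println(cost(docTerms, identity, 2));  // 7
        System.out.println(cost(docTerms, clustered, 2)); // 5
    }
}
```

With the identity order, the two terms' postings interleave. The clustered order makes term 0's postings contiguous (doc IDs 0 and 1), lowering the total from 7 to 5 bits in this model.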
Since Lucene 9.9, Lucene can hook recursive graph bisection into the merging process, which makes it easier to enable. For instance, if you plan on calling `IndexWriter.forceMerge(1)` before searching documents, you can do the following to enable recursive graph bisection in the final merge:
```java
IndexWriterConfig iwc = new IndexWriterConfig();
BPIndexReorderer reorderer = new BPIndexReorderer();
reorderer.setForkJoinPool(ForkJoinPool.commonPool()); // run reordering on multiple threads
BPReorderingMergePolicy mp = new BPReorderingMergePolicy(iwc.getMergePolicy(), reorderer);
mp.setMinNaturalMergeNumDocs(Integer.MAX_VALUE); // only run reordering on forced merges
iwc.setMergePolicy(mp);
```
But you can also enable it on background merges if you don't plan on doing a final force-merge, so that the bigger segments get reordered. Note: benchmarks on the Wikipedia dataset suggest that this approach adds an indexing-time overhead of roughly 30%.
```java
IndexWriterConfig iwc = new IndexWriterConfig();
BPIndexReorderer reorderer = new BPIndexReorderer();
BPReorderingMergePolicy mp = new BPReorderingMergePolicy(iwc.getMergePolicy(), reorderer);
mp.setMinNaturalMergeNumDocs(100_000); // only reorder segments that have more than 100k docs
iwc.setMergePolicy(mp);
```
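To make the difference between the two configurations above explicit, here is a tiny, hypothetical model of the rule that their code comments describe. It is not Lucene code and not the actual `BPReorderingMergePolicy` logic: the class `ReorderDecision`, the `shouldReorder` helper, the assumption that forced merges are always reordered, and the `>=` boundary are all illustrative simplifications. The real policy may apply further conditions, so check its javadocs.

```java
/**
 * Hypothetical model (not Lucene code) of when a merged segment gets reordered,
 * based only on the comments in the two configuration snippets.
 */
final class ReorderDecision {

    static boolean shouldReorder(boolean forcedMerge, int mergedNumDocs, int minNaturalMergeNumDocs) {
        if (forcedMerge) {
            return true; // assumption: forced merges such as forceMerge(1) are always reordered
        }
        // Natural (background) merges are reordered only if the merged segment is big enough.
        // The exact boundary (>= vs >) is an assumption of this model.
        return mergedNumDocs >= minNaturalMergeNumDocs;
    }

    public static void main(String[] args) {
        // First configuration: setMinNaturalMergeNumDocs(Integer.MAX_VALUE)
        System.out.println(shouldReorder(true, 5_000_000, Integer.MAX_VALUE));  // true
        System.out.println(shouldReorder(false, 5_000_000, Integer.MAX_VALUE)); // false
        // Second configuration: setMinNaturalMergeNumDocs(100_000)
        System.out.println(shouldReorder(false, 50_000, 100_000));  // false
        System.out.println(shouldReorder(false, 250_000, 100_000)); // true
    }
}
```

Setting the threshold to `Integer.MAX_VALUE` effectively turns reordering off for background merges. A finite threshold trades extra indexing time for reordered large segments even without a final force-merge.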
This assumes the default configuration for index reordering, which looks at all indexed fields, runs up to 20 iterations per level, etc. Much of this is configurable; see the BPIndexReorderer and BPReorderingMergePolicy javadocs.
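To make "iterations per level" concrete, here is a toy, in-memory sketch of recursive graph bisection. It is not Lucene's `BPIndexReorderer`, which is considerably more elaborate. The class `ToyBP`, the `leafSize` parameter and the exact scoring are assumptions for illustration. Each level splits a doc ID range in half, then runs up to `maxIters` refinement passes. Each pass scores every document by how much more its postings would cost on the left than on the right, using the estimate d·log2(n/(d+1)) for a term with d postings in a partition of n docs. It then stable-sorts the range by that score and re-splits at the midpoint, stopping early once no document leans across the split. Sorting and re-splitting is a simplification of the pairwise document swaps often used to describe the technique.

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Toy sketch of recursive graph bisection for doc ID reordering (hypothetical, not Lucene code).
 * Assumes each docTerms[d] lists distinct term IDs in [0, numTerms).
 */
final class ToyBP {

    /** Estimated bits for a term with d postings in a partition of n docs: d * log2(n / (d + 1)). */
    static double termCost(int d, int n) {
        return d == 0 ? 0.0 : d * Math.log((double) n / (d + 1)) / Math.log(2);
    }

    /** Returns original doc IDs in their new order, i.e. result[newDocId] = oldDocId. */
    static int[] reorder(int[][] docTerms, int numTerms, int maxIters, int leafSize) {
        int[] docs = new int[docTerms.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        bisect(docTerms, numTerms, docs, 0, docs.length, maxIters, leafSize);
        return docs;
    }

    private static void bisect(int[][] docTerms, int numTerms, int[] docs,
                               int from, int to, int maxIters, int leafSize) {
        if (to - from <= leafSize) {
            return; // ranges of at most leafSize docs keep their current order
        }
        int mid = from + (to - from) / 2;
        int n1 = mid - from;
        int n2 = to - mid;
        for (int iter = 0; iter < maxIters; iter++) {
            // Number of docs containing each term on each side of the current split.
            int[] left = new int[numTerms];
            int[] right = new int[numTerms];
            for (int i = from; i < to; i++) {
                for (int t : docTerms[docs[i]]) {
                    if (i < mid) {
                        left[t]++;
                    } else {
                        right[t]++;
                    }
                }
            }
            // bias > 0: the doc's postings look cheaper on the right; bias < 0: on the left.
            double[] bias = new double[docTerms.length];
            double maxLeftBias = Double.NEGATIVE_INFINITY;
            double minRightBias = Double.POSITIVE_INFINITY;
            for (int i = from; i < to; i++) {
                double b = 0.0;
                for (int t : docTerms[docs[i]]) {
                    double addLeft = termCost(left[t] + 1, n1) - termCost(left[t], n1);
                    double addRight = termCost(right[t] + 1, n2) - termCost(right[t], n2);
                    b += addLeft - addRight;
                }
                bias[docs[i]] = b;
                if (i < mid) {
                    maxLeftBias = Math.max(maxLeftBias, b);
                } else {
                    minRightBias = Math.min(minRightBias, b);
                }
            }
            if (maxLeftBias <= minRightBias) {
                break; // every left doc leans at least as far left as every right doc: converged
            }
            // Stable sort of the range by bias, then split at mid: left-leaning docs go left.
            Integer[] range = new Integer[to - from];
            for (int i = from; i < to; i++) {
                range[i - from] = docs[i];
            }
            Arrays.sort(range, Comparator.comparingDouble((Integer d) -> bias[d]));
            for (int i = from; i < to; i++) {
                docs[i] = range[i - from];
            }
        }
        bisect(docTerms, numTerms, docs, from, mid, maxIters, leafSize);
        bisect(docTerms, numTerms, docs, mid, to, maxIters, leafSize);
    }

    public static void main(String[] args) {
        // Docs 0, 1, 2 and 4 contain only term 0; docs 3, 5, 6 and 7 contain only term 1.
        int[][] docTerms = {{0}, {0}, {0}, {1}, {0}, {1}, {1}, {1}};
        int[] order = reorder(docTerms, 2, 20, 2);
        System.out.println(Arrays.toString(order)); // [0, 1, 2, 4, 3, 5, 6, 7]
    }
}
```

On the eight-document example, the first pass swaps doc 3 and doc 4 across the split, so all documents containing term 0 receive new doc IDs 0 to 3. The second pass finds nothing left to move and stops well before `maxIters`.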
These classes are in the lucene-misc module, which I can't see in Anserini's current dependencies, so it would need to be added.
I'm happy to help with this; let me know if you have questions.