Normalize paths for genome fetch and some of the genome indexer data managers, plus additional modernization #6489

natefoo · 2024-10-25T15:28:56Z

Update the genome fetch DM and the most commonly used indexer DMs to normalize the on-disk layout as proposed in galaxyproject/galaxy#19013.

In addition:

- For those DMs that had Python wrappers, I dropped the wrappers. In some cases this avoids building a special mulled container just for the DM.
- Updated some underlying tool versions.
- Added some tests of non-default options.
- For the STAR DM, automatically calculate the --genomeSAindexNbases and --genomeChrBinNbits options as recommended by the STAR manual, to drastically reduce the index size for small genomes.
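The STAR manual recommends scaling --genomeSAindexNbases to min(14, log2(GenomeLength)/2 - 1) and --genomeChrBinNbits to min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). The PR's actual implementation is not shown here, so the following is only a minimal Python sketch of those two formulas. It assumes the genome length and reference count are taken from a samtools-style `.fai` index, that results are truncated to integers, and that a read length of 100 is a sensible default; the helper names are hypothetical.

```python
import math
import sys


def read_fai(fai_path):
    """Return (total genome length, number of references) from a samtools .fai index."""
    total_length = 0
    n_refs = 0
    with open(fai_path) as fh:
        for line in fh:
            if not line.strip():
                continue
            # .fai columns: name, length, offset, linebases, linewidth
            total_length += int(line.split("\t")[1])
            n_refs += 1
    return total_length, n_refs


def sa_index_nbases(genome_length):
    # STAR manual: min(14, log2(GenomeLength)/2 - 1); truncation to int is an assumption
    return min(14, int(math.log2(genome_length) / 2 - 1))


def chr_bin_nbits(genome_length, n_refs, read_length=100):
    # STAR manual: min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength)))
    # read_length=100 is a hypothetical default, not taken from the DM
    return min(18, int(math.log2(max(genome_length / n_refs, read_length))))


if __name__ == "__main__":
    length, refs = read_fai(sys.argv[1])
    print(f"--genomeSAindexNbases {sa_index_nbases(length)} "
          f"--genomeChrBinNbits {chr_bin_nbits(length, refs)}")
```

For a human-sized genome both values hit their caps (14 and 18), so the defaults are unchanged; for a 1 Mb genome split into 1000 contigs they drop to 8 and 9, which is where the index size savings come from.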

FOR CONTRIBUTOR:

- I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
- License permits unrestricted use (educational + commercial)
- This PR adds a new tool or tool collection
- This PR updates an existing tool or tool collection
- This PR does something else (explain below)

bgruening

Great work Nate!

bgruening · 2024-10-26T15:10:05Z

data_managers/data_manager_star_index_builder/test-data/rnastar_index2x_versioned.loc

Does this file need to be here?

Yes, this is just a rename from rnastar_index2_versioned.loc, a name that was inconsistent with the one used in all other tables. An empty .loc file, as referenced in tool_data_table_conf.xml.test, must exist for the test to be able to write to it.
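Because the test can only write to .loc files that already exist, a quick pre-check is to list every `<file path="..."/>` referenced by the test table config and report the ones that are missing. This is a hypothetical helper, not part of the PR or of Galaxy; it assumes the test config uses `${__HERE__}` to refer to the tool directory and that paths are POSIX-style.

```python
import os
import xml.etree.ElementTree as ET


def loc_paths(conf_xml, here):
    """Return the .loc paths referenced by <file path="..."/> elements in a tool data table config."""
    root = ET.fromstring(conf_xml)
    paths = []
    for file_elem in root.iter("file"):
        path = file_elem.get("path")
        if path:
            # Assumption: test configs use ${__HERE__} for the tool directory
            paths.append(os.path.normpath(path.replace("${__HERE__}", here)))
    return paths


def missing_loc_files(conf_xml, here):
    """Return referenced .loc files that do not exist; these must exist (even if empty) for the test to write to them."""
    return [p for p in loc_paths(conf_xml, here) if not os.path.exists(p)]


# Abbreviated example config; the real table's <columns> element is omitted
EXAMPLE_CONF = """<tables>
    <table name="rnastar_index2x_versioned" comment_char="#">
        <file path="${__HERE__}/test-data/rnastar_index2x_versioned.loc" />
    </table>
</tables>"""
```

Running `missing_loc_files(open("tool_data_table_conf.xml.test").read(), ".")` from the DM directory would then flag any .loc file that still needs an empty placeholder.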

data_managers/data_manager_star_index_builder/data_manager/rna_star_index_builder.xml

bgruening · 2024-10-26T15:20:37Z

data_managers/data_manager_bwa_mem_index_builder/data_manager_conf.xml

@@ -10,13 +9,12 @@
                <column name="path" output_ref="out_file" >
                    <move type="directory" relativize_symlinks="True">
                        <!-- <source>${path}</source>--> <!-- out_file.extra_files_path is used as base by default --> <!-- if no source, eg for type=directory, then refers to base -->
-                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">${dbkey}/bwa_mem_index/${value}</target>
+                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">genomes/${dbkey}/bwa_mem_index/v1/${value}</target>


Where does the "v1" come from?

See the proposal; this is because it's version 1 of bwa-mem.

bgruening · 2024-10-26T15:23:56Z

data_managers/data_manager_bwa_mem2_index_builder/data_manager_conf.xml

                    </move>
-                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/${dbkey}/bwa_mem2_index/${value}/${path}</value_translation>
+                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/genomes/${dbkey}/bwa_mem_index/v2/${value}/${path}</value_translation>


Ah, now I see ... Not sure if the v1/v2 should be under bwa_mem_index. I would assume those are separate tools - separate indices.

Or in other words, as an admin, I would search for bwa_mem2_index.

I did consider that, but I ended up with this because under the scheme in the proposal every DM will now contain a version directory. So naming the directory after the indexer would result in:

genomes/${dbkey}/bwa_mem_index/v1/${value}

genomes/${dbkey}/bwa_mem2_index/v2/${value}

Which is kind of redundant and negates the purpose of the version directory in this case. That said I can understand how you would expect to have the directory name match the indexer name, although we already violate that for the other DMs, since the table name (bowtie_indexes) is plural and the tool ID/directory (bowtie_index) is not (to say nothing of the bowtie DMs using both "indexes" in the table/directory and "indices" in the loc file name).
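The two options can be compared by building paths with the normalized layout from the diffs above, `genomes/<dbkey>/<index_dir>/<version>/<value>`, with the `${path}` column value appended as in `value_translation`. This is an illustrative Python sketch only; the function name and the example values are hypothetical.

```python
from pathlib import PurePosixPath


def index_target(data_path, dbkey, index_dir, version, value, path=None):
    """Build <data_path>/genomes/<dbkey>/<index_dir>/<version>/<value>[/<path>]."""
    parts = [data_path, "genomes", dbkey, index_dir, version, value]
    if path is not None:
        # value_translation appends the original ${path} column value to the directory
        parts.append(path)
    return str(PurePosixPath(*parts))


if __name__ == "__main__":
    # Hypothetical values for illustration only
    data_path, dbkey = "/srv/galaxy/tool-data", "hg38"
    # Layout chosen in this PR: both aligners share bwa_mem_index; the version tells them apart
    print(index_target(data_path, dbkey, "bwa_mem_index", "v1", dbkey))
    print(index_target(data_path, dbkey, "bwa_mem_index", "v2", dbkey))
    # Alternative raised in review: directory named after the indexer, where "v2" is redundant
    print(index_target(data_path, dbkey, "bwa_mem2_index", "v2", dbkey))
```

With the PR's choice, the version directory carries the distinction between bwa-mem and bwa-mem2; with the alternative, it repeats information already in the directory name.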

natefoo · 2024-10-28T19:32:34Z

Converted to draft:

I need to add the sam_fasta_index DM as well.
I propose we remove the symlink to the reference genome as proposed here. @mvdbeek already tested this for one tool (bowtie2?).

bernt-matthias

Really appreciate that we have more and more DMs that do not require the extra python stuff.

I'm a bit worried about the side effects for admins.

bernt-matthias · 2024-10-30T11:23:23Z

data_managers/data_manager_bowtie2_index_builder/data_manager/bowtie2_index_builder.xml

    <inputs>
        <param name="all_fasta_source" type="select" label="Source FASTA Sequence">
            <options from_data_table="all_fasta"/>
        </param>
        <param name="sequence_name" type="text" value="" label="Name of sequence" />
        <param name="sequence_id" type="text" value="" label="ID for sequence" />
-        <param name="tophat2" type="boolean" truevalue="--data_table_name tophat2_indexes" falsevalue="" checked="True" label="Also make available for TopHat" help="Adds values to tophat2_indexes tool data table" />
+        <param name="tophat2" type="boolean" checked="True" label="Also make available for TopHat" help="Adds values to tophat2_indexes tool data table" />


Cool to cover this, but it seems SNAFU anyway, since the tophat2 data table refers to the bowtie2 loc file.

Maybe we just deprecate the tophat2 datatable (and update the tool to use the bowtie2 one)?

Yes, this is probably just legacy. Good idea to update the tophat2 tool, although in the unlikely case that an admin has indexes in the tophat2_indexes table that are not in bowtie2_indexes, they would disappear. And TopHat is, of course, deprecated itself.
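Previously the boolean's `truevalue` passed `--data_table_name tophat2_indexes` to the Python wrapper. Without a wrapper, the same choice can be made directly when writing the data manager's JSON output, which maps each data table name to a list of entries under `data_tables`. The sketch below is in Python for illustration only (the DM itself presumably does this in the tool XML); it assumes the table columns are `value`, `dbkey`, `name` and `path`, and reuses the bowtie2 entry for tophat2 because, as noted above, the tophat2 table points at the bowtie2 loc file. The function name is hypothetical.

```python
import json


def build_dm_output(value, dbkey, name, path, tophat2=True):
    """Return a data manager JSON dict, optionally also populating tophat2_indexes."""
    # Assumed column set for bowtie2_indexes
    entry = {"value": value, "dbkey": dbkey, "name": name, "path": path}
    tables = {"bowtie2_indexes": [entry]}
    if tophat2:
        # Same entry reused, since tophat2_indexes refers to the bowtie2 loc file
        tables["tophat2_indexes"] = [dict(entry)]
    return {"data_tables": tables}


if __name__ == "__main__":
    # Hypothetical values for illustration only
    print(json.dumps(build_dm_output("hg38", "hg38", "Human (hg38)", "hg38"), indent=2))
```

Deprecating the tophat2 table, as suggested, would reduce this to the single `bowtie2_indexes` entry.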

bernt-matthias · 2024-10-30T11:24:57Z

data_managers/data_manager_bowtie2_index_builder/data_manager_conf.xml

                    </move>
-                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/${dbkey}/bowtie2_index/${value}/${path}</value_translation>
+                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/genomes/${dbkey}/bowtie_index/v2/${value}/${path}</value_translation>


I do not understand why this is done. Won't it require all admins to restructure their reference data?
But certainly this would be nicer if we would start from scratch.

They don't have to: if you update to the new DMs and run them, they will just place data in a new folder, but the old loc with the old data at the old paths will continue to be loaded. That said, in the proposal I suggested that we recommend admins specify a new tool_data_path just for organizational purposes. I also said I'd write a script to restructure the data for anyone who preferred to unify it.
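A restructuring script would, at its core, need to map old relative paths to the new layout. The sketch below is not the script mentioned above; it is a hypothetical dry-run mapping that covers only the three renames visible in this PR's diffs (bwa-mem, bwa-mem2, bowtie2). It does not move files or rewrite .loc entries, and other DMs are deliberately left unmapped.

```python
from pathlib import PurePosixPath

# Old index directory -> (new index directory, version directory), taken from the diffs in this PR.
# Other DMs are omitted because their mapping is not shown here.
LAYOUT_MAP = {
    "bwa_mem_index": ("bwa_mem_index", "v1"),
    "bwa_mem2_index": ("bwa_mem_index", "v2"),
    "bowtie2_index": ("bowtie_index", "v2"),
}


def new_relative_path(old_relative):
    """Map '<dbkey>/<old_dir>/<value>[/...]' to 'genomes/<dbkey>/<new_dir>/<version>/<value>[/...]'.

    Returns None when the index directory is not in LAYOUT_MAP.
    """
    parts = PurePosixPath(old_relative).parts
    if len(parts) < 3 or parts[1] not in LAYOUT_MAP:
        return None
    dbkey, old_dir, rest = parts[0], parts[1], parts[2:]
    new_dir, version = LAYOUT_MAP[old_dir]
    return str(PurePosixPath("genomes", dbkey, new_dir, version, *rest))


if __name__ == "__main__":
    # Hypothetical old paths, relative to the data manager data path
    for old in ["hg38/bowtie2_index/hg38/hg38", "hg38/bwa_mem2_index/hg38"]:
        print(old, "->", new_relative_path(old))
```

Returning `None` for unknown directories keeps the dry run conservative: anything not explicitly mapped stays where it is, consistent with old loc entries continuing to load from their old paths.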

bernt-matthias · 2024-10-30T11:26:39Z

...managers/data_manager_bowtie_index_builder/data_manager/bowtie_color_space_index_builder.xml

@@ -1,34 +0,0 @@
-<tool id="bowtie_color_space_index_builder_data_manager" name="Bowtie Color index" tool_type="manage_data" version="1.2.1" profile="23.0">


We should move these to the deprecated/data_managers/ folder of this repo.

natefoo added 6 commits October 18, 2024 16:06

Update data_manager_fetch_genome_dbkeys_all_fasta for normalized layout

79d1f68

Update data_manager_bowtie_index_builder for normalized layout, plus:

cd4fda2

- Drop colorspace builder - Drop python wrapper - Update bowtie version - Add tests

Update data_manager_bowtie2_index_builder for normalized layout, plus:

e7ece05

- Drop python wrapper - Update bowtie2 version - Add test of non-default options

Update data_manager_bwa_mem_index_builder for normalized layout, plus:

7b7896a

- Drop python wrapper - Update bwa version - Add test of non-default options

Update data_manager_bwa_mem2_index_builder for normalized layout

6986a9a

Update data_manager_star_index_builder for normalized layout, plus:

cd78304

- Drop python wrapper - Add options to automatically calculate --genomeSAindexNbases and --genomeChrBinNbits

natefoo changed the title ~~Normalize paths for genome fetch and some~~ Normalize paths for genome fetch and some of the genome indexer data managers, plus additional modernization Oct 25, 2024

bgruening reviewed Oct 26, 2024

Update data_managers/data_manager_star_index_builder/data_manager/rna…

d1fa8a5

…_star_index_builder.xml Co-authored-by: Björn Grüning <[email protected]>

natefoo marked this pull request as draft October 28, 2024 19:31

bernt-matthias approved these changes Oct 30, 2024