Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A fix for the harvesting regression/tests introduced in 10836 #10990

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

landreev
Copy link
Contributor

@landreev landreev commented Oct 30, 2024

What this PR does / why we need it:

In one of the last commits in PR #10836, while addressing feedback from review, I rearranged/tried to clean up some validation and sanitizing code. Unfortunately, that introduced an error when importing harvested datasets (specifically, metadata-poor datasets created from oai_dc, where sanitizing invalid, or filling in missing required values is usually required). This PR fixes the regression.

The tests that are failing in develop have passed in https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/view/change-requests/job/PR-10990/1/.

Which issue(s) this PR closes:

Special notes for your reviewer:

Suggestions on how to test this:

Very straightforward; trying to harvest anything from demo.dataverse.org using oai_dc is going to fail in develop as of now.
This configuration, for example:

server url: https://demo.dataverse.org/oai
set: controlTestSet
metadata format: oai_dc 

There are 7 datasets in the set; all 7 will fail in a develop build; all 7 should succeed with this PR.

The 2 api tests that are failing in the dev. branch:

testHarvestingClientRun_AllowHarvestingMissingCVV_False
testHarvestingClientRun_AllowHarvestingMissingCVV_True

should now be passing.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@landreev landreev marked this pull request as ready for review October 31, 2024 00:14
@landreev landreev added Feature: Harvesting Size: 3 A percentage of a sprint. 2.1 hours. labels Oct 31, 2024
@cmbz cmbz added the FY25 Sprint 9 FY25 Sprint 9 (2024-10-23 - 2024-11-06) label Oct 31, 2024
Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - I see it addresses bad values which were the source of the test fails.

@ofahimIQSS
Copy link
Contributor

ofahimIQSS commented Oct 31, 2024

PR looks good but I'm having trouble passing the 2 API tests.

Here's what was done:

  1. Removed ./docker-dev-volumes directory
  2. Build PR on local
  3. Create a collection > Add Harvestable Client>
    server url: https://demo.dataverse.org/oai
    set: controlTestSet
    metadata format: oai_dc
  4. Run the harvest, verify that harvest ran successfully
  5. Run the API Tests: mvn test -Dtest=HarvestingClientsIT#testHarvestingClientRun_AllowHarvestingMissingCVV_False
    mvn test -Dtest=HarvestingClientsIT#testHarvestingClientRun_AllowHarvestingMissingCVV_True

Issue: API tests are both failing throwing similar error:
Note: I also tried running the scripts after a fresh build with no data (Before creating and getting harvested data) and it still failed.

image
image

@landreev
Copy link
Contributor Author

Note that you do not want to perform steps 3. and 4. before step 5. That will result in the API tests failing - because the tests will be trying again to harvest the datasets that are already harvested and exist locally on your system. (We should probably modify and improve the tests so that they work without the expectation that the datasets do not exist in the db; but that would be outside of this PR. They were written to run under Jenkins, and the database is always blank there. For now, make sure to remove the harvesting client and the associated datasets before running the API tests.)

But, since you reported that

I also tried running the scripts after a fresh build with no data (Before creating and getting harvested data) and it still failed.

there must be something else going on. Please copy and paste all the console output from these tests, plus any error messages from server.log around the time of failures, plus (probably most importantly) the dedicated log files left from the failed harvesting runs. These will be in the same directory as server.log, and named like harvest_<8 CHARACTER STRING>_<DATE STAMP>.log

3. Create a collection > Add Harvestable Client>
   server url: https://demo.dataverse.org/oai
   set: controlTestSet
   metadata format: oai_dc

4. Run the harvest, verify that harvest ran successfully

5. Run the API Tests: mvn test -Dtest=HarvestingClientsIT#testHarvestingClientRun_AllowHarvestingMissingCVV_False
   mvn test -Dtest=HarvestingClientsIT#testHarvestingClientRun_AllowHarvestingMissingCVV_True

@landreev
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature: Harvesting FY25 Sprint 9 FY25 Sprint 9 (2024-10-23 - 2024-11-06) Size: 3 A percentage of a sprint. 2.1 hours.
Projects
Status: QA ✅
Development

Successfully merging this pull request may close these issues.

PR 10836 broke harvesting tests, needs an urgent fix
4 participants