Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[one-optimize] Refactor RemoveDeadNodeWithQueryPass #13863

Open
icodo98 opened this issue Aug 30, 2024 · 13 comments
Open

[one-optimize] Refactor RemoveDeadNodeWithQueryPass #13863

icodo98 opened this issue Aug 30, 2024 · 13 comments
Assignees
Labels
help wanted Extra attention is needed SSAFY

Comments

@icodo98
Copy link
Contributor

icodo98 commented Aug 30, 2024

I think code here is bug.

// Find the nodes that should not be dead node in candidates
for (auto node : candidates)
{
if (auto service = node->dialect()->service<DeadNodeQueryService>())
{
if (!service->isDeadNode(node))
{
candidates.erase(node);
}
}
}

The current code has an issue where, if both normal and dead nodes are mixed in the candidates, it deletes the normal nodes but skips over checking the other candidate nodes.

@icodo98 icodo98 changed the title [one-optimize] Refactor [one-optimize] Refactor RemoveDeadNodeWithQueryPass Aug 30, 2024
@seanshpark
Copy link
Contributor

do you have a model for this?

@icodo98
Copy link
Contributor Author

icodo98 commented Aug 30, 2024

Background

I found this issue when I'm trying to support non-const paddings for pad ops in #13790 .
I made a draft code and successfully made .opt.circle without any -O options.
But with onecc -O1 one fails with this error below.
circle2circle: circle2circle: /home/ssafy/one/compiler/luci/export/src/CircleTensorExporter.cpp:169: void {anonymous}::MultiOutputDetector::store_outputs(luci::CircleNode*, uint32_t): Assertion 'outs.size() == count' failed.

How to reproduce

Since I found this issue with code in my own repo, I made some sample model to reproduce this issue.
badDead.zip
you could download and reproduce it with onecc -C badDead.cfg.

@icodo98
Copy link
Contributor Author

icodo98 commented Aug 30, 2024

When run with debugger, following is result.

After passing through SubstitutePackToReshapePass, five DeadNode candidates are generated. Among them, the nodes with indices 0, 3, and 4 are optimized dead nodes, while the nodes with indices 1 and 2 are actually not dead. However, it is observed that the node with index 2 (0x5555555bbb10) still remains in the list of candidates even after passing through the for-loop.

  • First, node at 0 index is considered as dead.
    image
  • Next node at 1 index, which is not dead, is considered as alive and erased from candidates
    image
  • Then, the iterator jumps to node 0x5555555bc2c0 which was originally at index 3. It passes node 0x5555555bbb10 and 0x5555555bbb10 remain in candidates
    image

@Hanjin-Choi
Copy link
Contributor

Hanjin-Choi commented Aug 30, 2024

@icodo98
I agree that the problem occurs when skipping after deletion.
If we change the code as follows, the skipping behavior after deleting a node will disappear.

for (auto it = candidates.begin(); it != candidates.end(); ) 
{
    if (auto service = (*it)->dialect()->service<DeadNodeQueryService>()) 
    {
        if (!service->isDeadNode(*it)) 
        {
            it = candidates.erase(it);
            continue;
        }
    }
    ++it;
}

@jinevening
Copy link
Contributor

Nice catch! Thanks for reporting. I'll make a fix soon :)

@jinevening jinevening self-assigned this Sep 2, 2024
@Hanjin-Choi
Copy link
Contributor

Nice catch! Thanks for reporting. I'll make a fix soon :)

If it's alright with you, I was working on a draft based on the code I wrote. Would you like me to continue writing it?

@jinevening
Copy link
Contributor

Of course I'm okay with it :) One concern is that this seems to be out of the scope of ssafy project. @shs-park Is it ok @Hanjin-Choi contribute this topic?

One thing to note: Patch itself would be simple, but I guess that writing a test code would be more laborious.

@Hanjin-Choi
Copy link
Contributor

One thing to note: Patch itself would be simple, but I guess that writing a test code would be more laborious.

I agree. But it would be an honor to have the opportunity to take on the challenge.

@jinevening jinevening assigned Hanjin-Choi and unassigned jinevening Sep 2, 2024
@jinevening
Copy link
Contributor

I assigned @Hanjin-Choi to this issue.

@shs-park
Copy link
Contributor

shs-park commented Sep 2, 2024

Is it ok @Hanjin-Choi contribute this topic?

@jinevening,

The SSAFY project, in this year, also includes covering issues that can be derived from the Shape inference topic.
This part seems like a good opportunity.
Thank you!

@shs-park shs-park added the SSAFY label Sep 2, 2024
@Hanjin-Choi
Copy link
Contributor

I feel sorry for dragging this Issue.
I tried to write test code, but I feel lost.
So, I need some advices.

I thought I need to make graph to check whether node is dead or not.
I tried to make a graph two ways.

  1. First, I tried implementing the graph using loco::Push like other pass tests.

    void create_empty_test_net(loco::Graph *graph)
    {
    assert(graph);
    auto const_node = graph->nodes()->create<loco::ConstGen>();
    {
    const_node->dtype(loco::DataType::FLOAT32);
    const_node->rank(1);
    const_node->dim(0) = 1;
    const_node->size<loco::DataType::FLOAT32>(1);
    const_node->at<loco::DataType::FLOAT32>(0) = 1.0f;
    }
    auto push_node = graph->nodes()->create<loco::Push>();
    {
    push_node->from(const_node);
    }
    auto graph_output = graph->outputs()->create();
    {
    graph_output->name("output");
    graph_output->dtype(loco::DataType::FLOAT32);
    loco::link(graph_output, push_node);
    }
    }

    However, with this approach, it's impossible to assign a value to Node->dialect()->service(), making it difficult to determine if a node is dead using isDeadNode()

  2. Second, I tried implementing the graph using luci::CircleNode.
    I confirmed that if a CircleNode is created, the service is injected.
    However, when I attempted to create a CircleNode using luci/IR/CircleNodes.h, it seems that a build error occurs because luci is not built before logo.

@Hanjin-Choi Hanjin-Choi added the help wanted Extra attention is needed label Sep 12, 2024
@jinevening
Copy link
Contributor

It seems that you try to implement gtest. If you take that approach, you should implement a test under luci, not logo. Although logo is changed, the impact is shown in luci.

Rather than implementing gtest, it would be easier to implement a regression test for the problematic model.

You can add a test in circle2circle-dredd-recipe-test as follows.

  1. Generate tflite recipe from the problematic model (you can use tflchef-reverse tool)

  2. Put the recipe with a proper test.rule under res/TensorFlowLiteRecipes. -> You can just check op count of pack and reshape.

  3. Add test to circle2circle-dredd-recipe-test w/ substitute_pack_to_reshape.

  4. Check the test fails without your change in the debug mode (this problem is only visible in debug mode).
    ./nncc test -R circle2circle_dredd_recipe_test

  5. Check the test passes with your change in the debug mode.

It would be great if you can post the result of 4 and 5 in the draft PR,

@Hanjin-Choi
Copy link
Contributor

Hanjin-Choi commented Sep 13, 2024

It would be great if you can post the result of 4 and 5 in the draft PR,

Thanks a lot! I will follow your suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed SSAFY
Projects
None yet
Development

No branches or pull requests

5 participants