Reactor dies unexpectedly #22

Open
avalanche123 opened this issue Dec 1, 2014 · 4 comments

Comments

@avalanche123
Contributor

After modifying io_reactor.rb by adding the following on line 136:

rescue => e
  puts "#{e.class.name}: #{e.message}\n" + Array(e.backtrace).join("\n")

I get the following error:

TypeError: can't convert Ione::Io::Connection to IO (Ione::Io::Connection#to_io gives NilClass)
/home/kishan/.rvm/gems/ruby-1.9.3-p547@global/gems/ione-1.2.0/lib/ione/io/io_reactor.rb:396:in `select'
/home/kishan/.rvm/gems/ruby-1.9.3-p547@global/gems/ione-1.2.0/lib/ione/io/io_reactor.rb:396:in `tick'
/home/kishan/.rvm/gems/ruby-1.9.3-p547@global/gems/ione-1.2.0/lib/ione/io/io_reactor.rb:133:in `block in start'
integration/test.rb:38:in `join': deadlock detected (fatal)
    from integration/test.rb:38:in `block in create_sessions_concurrently2'
    from integration/test.rb:38:in `each'
    from integration/test.rb:38:in `create_sessions_concurrently2'
    from integration/test.rb:15:in `run_test'
    from integration/test.rb:61:in `<main>'

The issue is that the connections are created and closed from different threads. Here is a trimmed-down sample that consistently fails on Linux but passes on OS X:

require 'bundler/setup'
require 'cassandra'
require 'cassandra/version'

puts Cassandra::VERSION

class Test
  def run_test
    cluster_list = []
    cluster_list << Cassandra.cluster

    session_list = create_sessions_concurrently2(cluster_list[0], 1)
    p session_list
    session_list = close_sessions_concurrently2(session_list, 1)
    p session_list

    session_list2 = create_sessions_concurrently2(cluster_list[0], 1) # DEADLOCK HERE
    p session_list2

    session_list = close_sessions_concurrently2(session_list2, 1)
    p session_list
    cluster_list[0].close
  end

  def create_sessions_concurrently2(cluster, num_sessions)
    sessions = []
    threads = (1..num_sessions).map do
      Thread.new do
        begin
          session = cluster.connect
          sessions << session
        rescue Exception => e
          # session.close
          raise RuntimeError.new("Error while creating a session. #{e.class.name}: #{e.message}
                                          Backtrace: #{e.backtrace.inspect}")
        end
      end
    end

    threads.each {|th| th.join} # DEADLOCK HERE
    sessions
  end

  def close_sessions_concurrently2(session_list, num_sessions)
    session_list2 = session_list[0...num_sessions]
    threads = session_list2.map do |session|
      Thread.new do
        begin
          session.close
          session_list.delete(session)
        rescue Exception => e
          raise RuntimeError.new("Error while closing a session. #{e.class.name}: #{e.message}
                                  Backtrace: #{e.backtrace.inspect}")
        end
      end
    end

    threads.each {|th| th.join}
    session_list
  end
end

Test.new.run_test

I believe there is a race between close and connected? that causes a closing socket to be added to the poll list.

@avalanche123
Contributor Author

Ione version is 1.2 (using cassandra-driver 1.0.0)
Ruby versions are 1.9.3, 2.0 and 2.1
Linux version 3.13.0-24-generic (buildd@batsu) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1))

@iconara
Owner

iconara commented Dec 2, 2014

Yes, it looks like a socket can be closed between the #closed? check in the reactor and the IO.select call. It's very curious that it doesn't happen on OS X. It also doesn't seem to happen in JRuby (on OS X at least, but it feels like it should be the same on Linux).

I have two solutions to this, but first a workaround: schedule a timer and close sockets in the callback.

reactor.schedule_timer(0).on_complete do
  session.close
end

The handler is always called from the reactor thread so this ensures that the socket is properly closed without any racing.

The simplest fix to the underlying issue is to remove line 32 of lib/ione/io/base_connection.rb (https://github.com/iconara/ione/blob/master/lib/ione/io/base_connection.rb#L32). This will cause IO.select to raise an IOError in the situation where it can currently raise a TypeError because of the nil IO. An IOError is something that the reactor expects (because there's always the possibility that a socket will be closed).

Could you try that out and see if it helps? If it does I can make a bug fix release.
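For anyone who wants to see the two failure modes side by side, here is a minimal sketch using plain pipes. FakeConnection and select_failure are hypothetical names made up for this illustration, not Ione's classes; the only assumption is that the wrapper exposes its socket through #to_io, as the error message in the original report suggests.

```ruby
# Hypothetical stand-in for a connection wrapper (NOT Ione's actual class).
class FakeConnection
  def initialize(io, clear_on_close)
    @io = io
    @clear_on_close = clear_on_close
  end

  # IO.select calls #to_io on objects that are not IOs themselves.
  def to_io
    @io
  end

  def close
    @io.close
    # Dropping the reference is what turns the later failure into a TypeError.
    @io = nil if @clear_on_close
  end
end

# Returns the class of the error IO.select raises, or nil if it does not raise.
def select_failure(connection)
  IO.select([connection], nil, nil, 0)
  nil
rescue TypeError, IOError => e
  e.class
end

reader_a, writer_a = IO.pipe
reader_b, writer_b = IO.pipe

cleared = FakeConnection.new(reader_a, true)   # closes and sets @io to nil
kept    = FakeConnection.new(reader_b, false)  # closes but keeps the closed IO

cleared.close
kept.close

puts "cleared: #{select_failure(cleared).inspect}"
puts "kept:    #{select_failure(kept).inspect}"

writer_a.close
writer_b.close
```

The wrapper that keeps its closed IO fails with the IOError that a select loop can reasonably rescue, while the one that returns nil from #to_io fails with a TypeError.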

A better solution is to rewrite socket operations so that they are always performed from the reactor thread. This is basically what happens when connecting or writing; the action is buffered and only performed when the socket is ready. #close should probably just set a flag that is acted on just before the select call. This will probably have to wait a bit, but it feels like a more robust solution. There have been a few issues with socket closing, and trying to make it work with concurrent modifications will just make it more complicated without adding any benefits.
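A rough sketch of the flag-based idea, to make it concrete. This is not how Ione is implemented; DeferredCloseConnection and TinyReactor are hypothetical names, and the loop is reduced to a single tick method that is assumed to be called only from the reactor thread.

```ruby
# Hypothetical connection whose #close only records the request (NOT Ione's API).
class DeferredCloseConnection
  def initialize(io)
    @io = io
    @close_requested = false
    @lock = Mutex.new
  end

  def to_io
    @io
  end

  # Safe to call from any thread: it never touches the socket itself.
  def close
    @lock.synchronize { @close_requested = true }
  end

  def closed?
    @io.closed?
  end

  # Called only from the reactor thread, just before select.
  def close_if_requested
    requested = @lock.synchronize { @close_requested }
    @io.close if requested && !@io.closed?
  end
end

# Hypothetical single-threaded loop body; a real reactor would also handle writes.
class TinyReactor
  def initialize(connections)
    @connections = connections
  end

  def tick(timeout = 0)
    # Act on pending close requests first, so select never sees a socket
    # that another thread is in the middle of closing.
    @connections.each(&:close_if_requested)
    @connections.reject!(&:closed?)
    return [] if @connections.empty?

    readable, = IO.select(@connections, nil, nil, timeout)
    readable || []
  end
end

demo_reader, demo_writer = IO.pipe
demo_conn = DeferredCloseConnection.new(demo_reader)
demo_reactor = TinyReactor.new([demo_conn])

Thread.new { demo_conn.close }.join  # request the close from another thread
demo_reactor.tick                    # the socket is actually closed here
puts "closed after tick: #{demo_conn.closed?}"
demo_writer.close
```

Because the socket is only ever closed inside tick, the closed check and the select call can no longer be interleaved with a close from another thread.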

@avalanche123
Contributor Author

I've introduced your suggestion in datastax/ruby-driver@8c45eee. This should resolve the issues that we faced in the ruby-driver.

I do think that the simplest quick fix is to not set @io to nil and let the regular error handling mechanism take care of cleaning up closed sockets, but we don't need it with the fix I mentioned above.

Thanks for your help!

@iconara
Owner

iconara commented Dec 6, 2014

I'll re-open this to use as a reminder to fix the underlying issue. In the meantime, the workaround should be applicable to all users of Ione, if someone stumbles on this issue.

@iconara iconara reopened this Dec 6, 2014