Skip to content

Commit

Permalink
[FIXED JENKINS-24050] don't let canceled keys kill the selector thread
Browse files Browse the repository at this point in the history
In looking at the proposed PR #24 (#24), I feel bit
uneasy to mask the problem like it does.

The code in question is looping through selected keys and processing it one by one.

The only code that calls key.cancel() is done from the selector thread that runs this loop.
So I don't understand how it is possible that the key picked up from selected key set is
already cancelled here. I wonder if something more is going on.

Regardless, I agree that this shouldn't kill the selector thread, which breaks all the slaves
in one go. This change flags and reports the problem, kill the connection related to that key,
 then continue to serve other connections.
  • Loading branch information
kohsuke committed Sep 5, 2014
1 parent 602dadc commit 1083a97
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions src/main/java/org/jenkinsci/remoting/nio/NioChannelHub.java
Expand Up @@ -17,6 +17,7 @@
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.io.OutputStream;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;
Expand Down Expand Up @@ -325,6 +326,9 @@ public void reregister() throws IOException {
}
}

/**
* If both directions are closed, cancel the whole key.
*/
@SelectorThreadOnly
private void maybeCancelKey() throws IOException {
SelectionKey key = ch.keyFor(selector);
Expand Down Expand Up @@ -579,6 +583,15 @@ public void run() {
} catch (IOException e) {
LOGGER.log(WARNING, "Communication problem", e);
t.abort(e);
} catch (CancelledKeyException e) {
// see JENKINS-24050. I don't understand how this can happen, given that the selector
// thread is the only thread that cancels keys. So to better understand what's going on,
// report the problem.
LOGGER.log(SEVERE, "Unexpected key cancellation for "+t, e);
// to be on the safe side, abort the communication. if we don't do this, it's possible
// that the key never gets re-registered to the selector, and the traffic will hang
// on this channel.
t.abort(e);
}
} else {
onSelected(key);
Expand Down

4 comments on commit 1083a97

@kbrowder
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder if anything should be allowed to kill the selector thread (with the possible exception of an InterrputedException), anyways just my 2c (although maybe it's worth less :D)

Also I guess I'm wondering the same things as: http://stackoverflow.com/questions/12912388/getting-java-nio-channels-cancelledkeyexception-without-explicit-cancellation-of
We do rip a ton of Slaves from Jenkins, usually not particularly abruptly (we remove them on the Jenkins side first then delete them).

Thanks a bunch for the fix, I'll pick it up as soon as it's released and toss the logs in JIRA when we get them.

@kohsuke
Copy link
Member Author

@kohsuke kohsuke commented on 1083a97 Sep 6, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbrowder, thanks for finding that stackoverflow thread. The ansewr from EJP is something I can test, and it'd shed light on the behaviour of this. I'll try that.

@kbrowder
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there's a new remoting jar (2.45) released with this change, will this make it into the next Jenkins (1.580)? Just wondering if I'm going to be updating our Jenkins install in the next week. Just noticed that the Jenkins pom is still on 2.44 on rc and master.

@oldelvet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've submitted pull request jenkinsci/jenkins#1393 that will pull in 2.45 as part of some other changes that I made to remoting. Hopefully that should get merged soon.

Please sign in to comment.