Monday, November 15, 2010

Introducing GCD based AsyncSocket

I've rewritten AsyncSocket from the ground up using Grand Central Dispatch. It's now thread-safe and up to 400% faster. It makes it trivial to parallelize data processing tasks, and it's an ideal candidate for writing scalable servers.



Apple released Grand Central Dispatch with Mac OS X 10.6. GCD comes with asynchronous socket monitoring utilities that are written on top of the fast and efficient kqueue architecture. And while they were at it, they also rewrote the core of the CFNetworking stack (CFSocket) to tie into these GCD tools. But while they rewrote the underlying architecture of CFSocket, they did NOT change the external APIs available to you and me. What this means is that Apple's CF/NS networking classes (CFSocket, CFStream, NSStream) are still RunLoop based, so if you want asynchronous notifications you'll need to schedule them on a run loop, and your callbacks will be invoked on that run loop's thread.

This runloop-based architecture makes it more difficult to parallelize tasks. It also makes it more difficult to write a server that can handle hundreds of clients simultaneously.

But this isn't the only problem. Both kqueues and GCD provide additional functionality that improves performance, and this functionality is hidden by Apple's current CF/NS networking APIs. Let's consider reading data from a socket:

The old way to do it would be to poll the socket until you're notified there is data available to be read. But how much data is available? The poll doesn't tell you. You're just going to have to guess. If you guess too little, you'll end up invoking read more often. Which means more kernel calls, and a slower application. Guess too much, and you'll end up allocating larger than necessary buffers and wasting memory. Want to know exactly how much data is available? Technically there is a method for that. But now we're back at additional kernel calls, and we may as well invoke read multiple times.
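
As a rough sketch of that trade-off in plain C: the hypothetical helper below (wait_then_count_bytes is my own name, not part of AsyncSocket) uses poll() to learn that a descriptor is readable, and then needs a second kernel call, ioctl() with FIONREAD, to learn how much data is waiting. I'm assuming FIONREAD is the "method for that"; the point is only that the size costs an extra trip into the kernel.

```c
#include <assert.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper for illustration only.
 * Waits up to timeout_ms for fd to become readable, then asks the kernel
 * how many bytes are pending. Returns the byte count, 0 on timeout, or -1
 * on error. Two kernel calls happen before any read() is issued. */
int wait_then_count_bytes(int fd, int timeout_ms)
{
    assert(fd >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    /* Kernel call 1: is there anything to read? poll() answers yes or no,
     * but says nothing about how much. */
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0) return -1;
    if (ready == 0) return 0;   /* timed out, nothing available */

    /* Kernel call 2: how many bytes are waiting? */
    int pending = 0;
    if (ioctl(fd, FIONREAD, &pending) < 0) return -1;

    return pending;
}
```

Only after both calls can you size a buffer exactly and issue the read(), which makes three kernel calls per chunk of data instead of one or two.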

Kqueues solve this problem. When you register to receive notifications of available data, the notification will tell you how much data is available to be read. Then you can slurp it all up in a single read, and go back to waiting for notifications. It's faster and more efficient.
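
Here is a minimal sketch of what that looks like with the raw kqueue API. This is not GCDAsyncSocket's code; kqueue_wait_for_bytes is a hypothetical helper I made up for illustration, and the HAVE_KQUEUE guard is only there so the snippet still compiles on systems without kqueue (where it simply returns -2). With the EVFILT_READ filter, the data field of the returned event carries the number of bytes available.

```c
#include <assert.h>
#include <unistd.h>

#if defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || \
    defined(__OpenBSD__) || defined(__DragonFly__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#define HAVE_KQUEUE 1
#else
#define HAVE_KQUEUE 0
#endif

/* Hypothetical helper for illustration only.
 * Waits up to timeout_ms for fd to become readable. Returns the number of
 * bytes the kernel reports as available, 0 on timeout, -1 on error, or -2
 * on platforms that do not provide kqueue. */
long kqueue_wait_for_bytes(int fd, int timeout_ms)
{
    assert(fd >= 0);
#if HAVE_KQUEUE
    int kq = kqueue();
    if (kq < 0) return -1;

    /* Register interest in "fd is readable". */
    struct kevent change;
    EV_SET(&change, fd, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, NULL);

    struct timespec ts = { timeout_ms / 1000,
                           (timeout_ms % 1000) * 1000000L };

    /* One kernel call both registers the filter and waits for the event. */
    struct kevent event;
    int n = kevent(kq, &change, 1, &event, 1, &ts);
    close(kq);

    if (n < 0) return -1;
    if (n == 0) return 0;                   /* timed out */
    if (event.flags & EV_ERROR) return -1;  /* registration failed */

    /* For EVFILT_READ, data holds the number of bytes available. */
    return (long)event.data;
#else
    (void)timeout_ms;
    return -2;  /* kqueue not available on this platform */
#endif
}
```

A real server would create the kqueue once and keep it around rather than opening and closing it per call; the sketch does it inline just to stay short. The same data field reports the size of the pending backlog when the descriptor is a listening socket.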

Apple's runloop-based socket APIs don't tell us how much data is available to be read. Similar to a poll, they just tell us that data is available.

In addition to this, if you're writing a server and listening for incoming connections, a poll could tell you if there is an available connection ready to be accepted. But again, it doesn't tell you how many connections are pending. Kqueues provide this information.

GCDAsyncSocket is based on Apple's GCD socket tools, and thus it can take advantage of the features that kqueues provide.

So how does it perform?

To answer this question I've been porting CocoaHTTPServer to run atop GCDAsyncSocket. Preliminary benchmarks indicate a performance boost of somewhere between 250% and 400%, depending upon the number of concurrent connections.

I'll be continuing the CocoaHTTPServer port this week. When it's finished, I'll post more detailed benchmarks. (With charts and graphs!)


Take a look at GCDAsyncSocket via the Google Code Page.

 

9 comments:

Anonymous said...

Very cool. I'd rather use an open source GCD networking class than the custom one I've been rolling with.

The only trivial bummer is that the delegate methods still have the "on" prefixes, presumably for compatibility. They go against the Cocoa convention of just starting with a noun. E.g., socket:didReadData:withTag: instead of onSocket:didReadData:withTag:. But that's just quibbling.

Robbie Hanson said...

That's a good point. Yes, I left them as-is for compatibility. But seeing as how this will "break" stuff anyway, now would be a good time to change it.

Robbie Hanson said...

Fixed in revision 118. Excellent suggestion. Thank you.

Anonymous said...

Thanks for the change. It's a superficial complaint, but the methods always looked unconventional in Cocoa code. I don't have to submit a patch now. :)

And thank you for the hard work on the project as a whole. It's appreciated and helps everybody.

Anonymous said...

Thanks for the excellent work. I am still pretty new to this and trying to read the source code to understand how this works. From what I see, "enableBackgroundingOnSocket" only works on the SSL/TLS communication with CFStream. Is this correct? I am not sure how GCD is handled in the iPhone background mode. I would appreciate it if you could shed some light on this.

Eric

Federico Rodriguez said...

Amazing work!

My main question is: have you or anyone you've talked to had this pass through the App Store reviewers successfully? I know that to enable backgrounding on a socket, for example, you need to have an "is a VOIP app" flag in your Info.plist, meaning your app must be a voice-over-IP app. Little things like these could lead to an App Store rejection, and I was wondering if there were any more little things you know of that we should be wary of when submitting to the App Store using your tool.

Thanks for your work!
-Fed

David S said...

I'm using the GCDAsyncUdpSocket class for multicast receive and it seems to have inherited an old bug from the non-GCD class.

When receiving UDP multicast packets, two packets show up on receive for each packet sent. Each duplicate packet has an IPv6 address as its source address. Packet trace over the air shows that only one packet is being sent, with an IPv4 address.

The workaround is to disable IPv6 after creating the socket. Then it works fine and the duplicates no longer appear.

Is it a bug, or perhaps a feature?

Regards,

David

Nick Redwood said...

Hey, this is great. It has really helped me out a lot with a project I am working on. Do you think you could point me in the direction of a way to send UDP messages over the net between a computer and an iPhone?

Anonymous said...

Can anyone compare the speed of this to (gasp) Windows I/O completion ports? Now that I've accepted I'm a Mac, I want to port my code over. But I want to make sure I can still support a large number of simultaneous connections and still have the ability to quickly close dead clients as well as quickly discard half-open sockets without having to implement one thread per connection.