Wednesday, January 29, 2014

Response: CoreData.next

Tony Arnold wrote a post entitled CoreData.next where he discusses what he "would investigate for a major revision to Core Data". At the end of the article he writes:

This is all pie in the sky, but it's nice to dream right?
And maybe to implement. Maybe. 
Hit me up on Twitter or App.net with your thoughts, or even better — write a post in response.

This is my post in response. I'm going to argue that it's not pie in the sky. And that it has been implemented. I'll go over every issue in his post, and discuss how YapDatabase has already addressed it.

But first, I'll answer a quick question you might have: "What is YapDatabase?"

YapDatabase is a powerful database framework for iOS and Mac OS X. It's built atop sqlite, and has 2 major components:

  1. The base layer: A key/value store.
  2. The extensions layer: A plugin system that allows you to build powerful extensions atop the base layer.
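To make the two-layer idea concrete, here is a minimal sketch in plain C (the language Objective-C builds on). It is not YapDatabase's code or API. `Store`, `store_set`, `store_register_extension` and the counting extension are hypothetical names, and a fixed-size array stands in for sqlite. The point is only the shape: the base layer stores values by (collection, key), and every write is handed to each registered extension, which is where something like a view or a secondary index would keep itself up to date.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical two-layer store; not YapDatabase's API. */

#define MAX_ROWS 64
#define MAX_EXTENSIONS 4

typedef struct {
    const char *collection;
    const char *key;
    const char *value;
} Row;

/* Extensions layer: a plugin is a callback that sees every write. */
typedef void (*ExtensionHook)(const Row *row, void *context);

typedef struct {
    Row rows[MAX_ROWS];                 /* base layer: a fixed array stands in for sqlite */
    size_t row_count;
    ExtensionHook hooks[MAX_EXTENSIONS];
    void *hook_contexts[MAX_EXTENSIONS];
    size_t hook_count;
} Store;

static int store_register_extension(Store *s, ExtensionHook hook, void *context) {
    if (s->hook_count == MAX_EXTENSIONS) return -1;
    s->hooks[s->hook_count] = hook;
    s->hook_contexts[s->hook_count] = context;
    s->hook_count++;
    return 0;
}

/* Base layer lookup by (collection, key); returns NULL if absent. */
static Row *store_find(Store *s, const char *collection, const char *key) {
    for (size_t i = 0; i < s->row_count; i++) {
        if (strcmp(s->rows[i].collection, collection) == 0 &&
            strcmp(s->rows[i].key, key) == 0) {
            return &s->rows[i];
        }
    }
    return NULL;
}

/* Insert or replace a value, then notify every registered extension. */
static int store_set(Store *s, const char *collection, const char *key, const char *value) {
    Row *row = store_find(s, collection, key);
    if (row == NULL) {
        if (s->row_count == MAX_ROWS) return -1;
        row = &s->rows[s->row_count++];
        row->collection = collection;
        row->key = key;
    }
    row->value = value;
    for (size_t i = 0; i < s->hook_count; i++) {
        s->hooks[i](row, s->hook_contexts[i]);
    }
    return 0;
}

/* Example extension: counts writes to one collection (a stand-in for a view or index). */
typedef struct {
    const char *collection;
    int writes;
} CountingExtension;

static void counting_extension_hook(const Row *row, void *context) {
    CountingExtension *ext = context;
    if (strcmp(row->collection, ext->collection) == 0) ext->writes++;
}
```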
If your mind is stuck on "key/value store" then you're missing the bigger picture. This is just the tip of the iceberg. The scaffolding for something much cooler. To give you an idea, here are a few features that are available in YapDatabase via extensions:

  • Views & a replacement for NSFetchedResultsController
  • Secondary Indexes & powerful query options
  • Full Text Search using the crazy fast module from Google
  • Relationships between objects and cascading delete rules
Not your typical key/value store. This thing was built from the ground up after many years of experience with Core Data. It's tailored to fit the requirements of a client side application. And it had concurrency in mind from the very start. For more information, check out the extensive documentation on the wiki.

And now back to the CoreData.next article, and its list of issues with Core Data:

Issue #1

Setting up is too much work
Creating a usable Core Data stack for your application by hand requires far too much boilerplate. Apple should be providing a very basic [NSCoreDataStack newStackWithOptions:…] or something similar that gets your app up and running with the best possible set of defaults. 
Libraries like MagicalRecord provide this now, but it seems odd that the Core Data framework does little to aid setting up the stack in a recommended manner.
I agree. There's a lot of boilerplate required to get up and running with Core Data. Here's what it looks like with YapDatabase:

YapDatabase *database = [[YapDatabase alloc] initWithPath:databasePath];

Of course, if you want to get more advanced, there are other init methods that allow for more options. But for a basic usable stack, this one-liner is about all you need.

Issue #2

It's workarounds all the way down and then you hit NSMangedObject
Yes, I meant to spell it that way.
In the last couple of years, Apple have made attempts to simplify and extend some parts of Core Data, including the addition of parent contexts, and methods to help run your code on the correct thread for a specific NSManagedObjectContext (i.e. -(void)performBlock: and -(void)performBlockAndWait:).
The problem is, these additions are essentially workarounds for problems that shouldn't really be the concern of the developer using the API.
The additions I cited above all stem from NSManagedObjectContext and NSManagedObject instances not being thread safe. I'd rather see Apple create new, non-backwards compatible API to solve this than continue to patch underlying design issues with new API. I get this one is not trivial. Threading never is.

And now we get to the crux of the problem. And he's onto something here.

Much of the pain of using Core Data stems from one architectural decision that led the Core Data team down a specific road:

    An object you fetch from the database should get updated in-place if it is changed on another thread/context.

If you think about this for a moment, you begin to see why some of the odd decisions were made.

You cannot store a plain-old NSString in Core Data. It must be wrapped inside an NSManagedObject. Why? Because of the mandate above. An NSString is immutable. And Core Data wants to update the object you have in your hand, at the moment the context merges changes from another context. And it can't change an immutable object. So it must wrap it inside something else, and change that.

And as we proceed down this road, we begin to see why NSManagedObjects are tied to a specific NSManagedObjectContext. Somebody needs to update the NSManagedObject, right? This is the job of the context. So the context must own the managed object.

This also has implications for threading. Why is it not safe to pass an NSManagedObject to background threads? Because that darn NSManagedObjectContext may mutate the thing at any time. So now threading becomes a nightmare. The fact is, NSManagedObject doesn't follow the same rules as all your other objects. Developers understand mutable vs immutable. Especially in Objective-C, where there are often 2 versions of a class, one mutable and one immutable. And developers understand the implications concerning threading: "Don't be mutating something on one thread if it's being used on another." This is something we've lived with for a long time, and we're quite adept at solving these problems. But NSManagedObject presents a whole new set of problems. We can't just wrap one of these things in a lock, or only access it through a serial queue, because NSManagedObjectContext isn't going to follow our rules. It won't play nicely, and so we're forced to play entirely by its rules.

It's not impossible to use Core Data in a highly concurrent application. I did it for a long time. But it presents many frustrations. First of all, Core Data must be taught (in depth) to the entire team. There are many parts of an app that can be "black boxed". One developer dives deep, solves the problems, and presents a clean API for the rest of us. We don't know and we don't care 'cause it works. Core Data escapes this concept. Everybody is forced to deal with it. Secondly, because NSManagedObjects are not normal objects, you need new APIs to deal with them. "I can't pass you an object, I need to pass you an NSManagedObjectID. And then you need to fetch your own NSManagedObject within your own context..."
I'd rather see Apple create new, non-backwards compatible API to solve this than continue to patch underlying design issues
I agree. Which is why YapDatabase is completely different. But different is sometimes scary. And it means you'd have to learn something new. So if this is too much for anyone, feel free to continue complaining about Core Data. Otherwise take the red pill, and learn about YapDatabase.

A "Hello World" introduction to YapDatabase.

YapDatabase doesn't use NSManagedObject, or any similar kind of silliness. You can store plain-old NSObjects. This includes your own NSObject subclasses, as well as the usual suspects such as NSString, NSNumber, UIColor, UIImage, etc. As long as the object can somehow be serialized & deserialized you're good to go. This could be via NSCoding, or any other way you want such as JSON serialization. It's completely configurable.

What kinds of objects does YapDatabase support?

YapDatabase will never mutate any of your objects. Ever. This gives you complete control over your objects, and how & when you access and mutate them. You know, the way you're used to dealing with objects.
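A tiny C sketch of what "never mutate" buys you, assuming records are treated as immutable once created. The `Drink` type and its helpers are hypothetical, not YapDatabase API. A change is expressed as a new object, so a pointer that another thread is holding never changes underneath it; the new object would then be written back through a transaction.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable record: after creation its fields are never written
   again, so any thread holding a pointer can read it without locks. */
typedef struct {
    const char *name;
    int price_cents;
} Drink;

static const Drink *drink_create(const char *name, int price_cents) {
    Drink *d = malloc(sizeof *d);
    if (d == NULL) return NULL;
    d->name = name;
    d->price_cents = price_cents;
    return d;
}

/* "Changing" a drink produces a new object; the original is left untouched. */
static const Drink *drink_with_price(const Drink *old, int new_price_cents) {
    return drink_create(old->name, new_price_cents);
}
```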

And concurrency is a breeze. Because YapDatabase was designed with concurrency in mind. You access the database through connections. Here's how hard it is to create a connection:

YapDatabaseConnection *connection = [database newConnection];

And you can create multiple connections for concurrency. Accessing the database? That's done through a transaction. Just like you're used to with normal databases:

__block id coffee = nil;
[connection readWithBlock:^(YapDatabaseReadTransaction *transaction) {
    coffee = [transaction objectForKey:drinkId inCollection:@"drinks"];
}];
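Here is a rough, single-threaded C model of block-based transactions, to show why batch updates fall out naturally. Everything in it (`Database`, `Connection`, `connection_read_write`, the example callbacks) is hypothetical and is not how YapDatabase is implemented: a read-write callback edits a private copy, and that copy replaces the committed state in one step only if the callback returns true. A real implementation also needs locking so that writers are serialized and readers on other threads see consistent data; that part is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of block-based transactions (single-threaded, no locking). */

#define SLOTS 8

typedef struct { int values[SLOTS]; } State;

typedef struct {
    State committed;
    unsigned long commit_number;
} Database;

typedef struct { Database *db; } Connection;

typedef void (*ReadBlock)(const State *state, void *context);
typedef bool (*ReadWriteBlock)(State *state, void *context);

static void connection_read(Connection *c, ReadBlock block, void *context) {
    block(&c->db->committed, context);
}

/* Runs the block on a private copy; publishes every change at once, or none. */
static bool connection_read_write(Connection *c, ReadWriteBlock block, void *context) {
    State scratch = c->db->committed;             /* private working copy */
    if (!block(&scratch, context)) return false;  /* block asked to roll back */
    c->db->committed = scratch;                   /* single commit for the whole batch */
    c->db->commit_number++;
    return true;
}

/* Example: a batch update touching several values in one transaction. */
static bool set_first_three(State *s, void *context) {
    int v = *(const int *)context;
    for (int i = 0; i < 3; i++) s->values[i] = v;
    return true;
}

/* Example: a write that is abandoned, so nothing is committed. */
static bool write_then_abort(State *s, void *context) {
    (void)context;
    s->values[0] = -1;
    return false;
}

static void sum_values(const State *s, void *context) {
    int *sum = context;
    *sum = 0;
    for (int i = 0; i < SLOTS; i++) *sum += s->values[i];
}
```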

Of course, you can do as much within a transaction as you need to, which means batch fetching and batch updates are straightforward. And if you're on the main thread, you can freeze your connection on a particular commit, and move it forward when ready, which means UI updates and animations are easy too.
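Freezing a connection on a commit can be modeled in a few lines. In this hypothetical C sketch (not YapDatabase's design; `VersionedDB`, `PinnedConnection` and the helpers are made-up names), the database keeps every committed version of a single value, and a connection keeps reading the version it is pinned to until it explicitly advances. Keeping every version forever is a simplification; a real store would discard versions that no connection needs anymore.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical versioned store: versions[n] is the value as of commit n. */

#define MAX_COMMITS 32

typedef struct {
    int versions[MAX_COMMITS];
    size_t latest;               /* index of the newest commit */
} VersionedDB;

typedef struct {
    const VersionedDB *db;
    size_t pinned;               /* commit this connection currently reads from */
} PinnedConnection;

static int db_commit(VersionedDB *db, int new_value) {
    if (db->latest + 1 >= MAX_COMMITS) return -1;
    db->versions[++db->latest] = new_value;
    return 0;
}

/* Reads see the pinned commit, no matter what has been committed since. */
static int conn_read(const PinnedConnection *c) {
    return c->db->versions[c->pinned];
}

/* Move to the newest commit; returns how many commits were skipped over,
   which a UI could use to work out what changed and what to animate. */
static size_t conn_advance(PinnedConnection *c) {
    size_t skipped = c->db->latest - c->pinned;
    c->pinned = c->db->latest;
    return skipped;
}
```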

Issue #3

Except the exceptions 
If you've used Core Data, chances are you've seen it do it's best impression of a monkey in a cage. It flings internal NSExceptions around like a monkey with poop. Debugging a Core Data app with lldb intercepting all exceptions is an exercise in frustration and sifting through "what's yours" and "what's mine". 
I'd like to see all exceptions thrown by the framework properly caught and dealt with before the developer ever sees them in Xcode.
I feel your pain. You may take solace in the fact that YapDatabase is open source. If there is an exception, and it's not in your own code, then at least you can see exactly where and why it happened. (But I'd like to think this is unlikely.)

Issue #4

Easier handling of binary data
One of the common things new users of Core Data try to do is store images and other data in their SQLite persistent store. It's entirely possible to do so, and the addition of external binary representations for Binary attributes helps improve performance when doing so.
The issue is, accessing that data can have significant memory implications as you can't really take advantage of any of the great optimisations Apple has made to image/file loading without first dumping the data of the attribute into a temporary file outside of your store.
Want to use any of the audio visual frameworks to access media that you've stored in your persistent store? Forget it. They all need a URL or file path.
Allowing direct access to binary data attributes external file URLs (and allowing access to read that file via the URL) would greatly simplify this scenario, and I'd imagine it would reduce disk I/O.

No arguments here. There are very good reasons why large data items should be stored directly in a file, as opposed to in the database. These reasons extend all the way down to the sqlite file format (and how it splits data into pages). Which is why YapDatabase has exactly the same recommendations that Core Data has:

    If size > 1mb, then store on disk and reference it inside of [database]

But in YapDatabase, everything is transparent. You have direct access to the filePath. And you can even use the relationship extension to specify that the database should automatically delete the associated file when the database object is deleted.
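The size rule and the cascading delete fit in a few lines of C. This sketch assumes "1mb" means 1024 * 1024 bytes, and `should_store_externally`, `BlobRecord` and `delete_record` are hypothetical helpers, not YapDatabase's relationship extension. The database row only holds the file path, and deleting the row also deletes the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: the 1mb rule of thumb, read as 1024 * 1024 bytes. */
#define INLINE_LIMIT_BYTES (1024u * 1024u)

/* Hypothetical helper: should this blob live in a file instead of the database? */
static bool should_store_externally(size_t blob_size) {
    return blob_size > INLINE_LIMIT_BYTES;
}

/* Hypothetical record for a blob stored on disk; only the path is kept in the row.
   (Inline storage of small blobs is omitted from this sketch.) */
typedef struct {
    const char *key;
    const char *file_path;
} BlobRecord;

/* Emulates a cascading delete rule: removing the record also removes its file.
   Returns 0 on success, or remove()'s nonzero result if the file could not be deleted. */
static int delete_record(BlobRecord *rec) {
    int rc = 0;
    if (rec->file_path != NULL) {
        rc = remove(rec->file_path);
        rec->file_path = NULL;
    }
    rec->key = NULL;
    return rc;
}
```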

Issue #5

NSFetchedResultsController on both platforms
I use NSFetchedResultsController in almost all of my iOS apps. I set it up and forget about it, and it takes care of marshalling any changes to my datasources. NSArrayController is the prototype that we have on OS X, and it's just not setup to handle things like animating UI changes in table views and the like.
A simple, fast way to automatically query and update a datasource and handle discrete changes on both platforms would be a huge time saver. Making it capable of doing it's work nicely on a background thread would be a big win, too.

Yup, YapDatabase has this already. It's available as part of the views extension, which means it's super easy to animate changes to your tableView / collectionView, regardless of platform.
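The core of what a view does for a table view can be sketched as a sorted list that reports where each insert landed. In this hypothetical C sketch, `SortedView` and `view_insert` are made-up names, and the real views extension also handles grouping, deletes and moves. Binary search finds the row, and the returned index is exactly what you would hand to a table or collection view to animate the insertion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sorted view of integer items. */

#define VIEW_CAPACITY 128

typedef struct {
    int items[VIEW_CAPACITY];
    size_t count;
} SortedView;

/* Inserts item in sorted position and returns the row index it landed at,
   or -1 if the view is full. Equal items are placed after existing ones. */
static long view_insert(SortedView *v, int item) {
    if (v->count == VIEW_CAPACITY) return -1;
    size_t lo = 0, hi = v->count;         /* binary search for first element > item */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (v->items[mid] <= item) lo = mid + 1; else hi = mid;
    }
    for (size_t i = v->count; i > lo; i--) v->items[i] = v->items[i - 1];
    v->items[lo] = item;
    v->count++;
    return (long)lo;
}
```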

Issue #6

Performance and simplicity
Any improvements to performance in any API are always a good thing, but Core Data has some specific areas that could use attention. Anything involving batch operations could use a bit of love (as outlined by Brent in his series of posts on Core Data in Vesper).
Performance and simplicity are the heart and soul of YapDatabase. A good example of this is caching. Caching is built into the system, and it caches your objects at the object level (post deserialization). Which means fetching can be crazy fast in common cases where you're fetching the same set of objects over and over (e.g., scrolling up and down). It's completely configurable too.

Understanding the built-in cache
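To illustrate what caching "at the object level (post deserialization)" saves, here is a hypothetical two-slot LRU cache in C. `ObjectCache`, `cache_fetch` and the counting `deserialize` stand-in are made-up names, not YapDatabase's cache, and keys are assumed to be strings that outlive the cache. Repeated fetches of the same key skip deserialization entirely, and the least recently used entry is evicted when the cache is full.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object-level LRU cache. */

#define CACHE_CAPACITY 2

typedef struct {
    const char *key;          /* NULL means the slot is empty */
    int object;               /* the already-deserialized value */
    unsigned long last_used;
} CacheSlot;

typedef struct {
    CacheSlot slots[CACHE_CAPACITY];
    unsigned long tick;
    int deserializations;
} ObjectCache;

/* Stand-in for decoding a stored blob: counts calls, "decodes" to the key length. */
static int deserialize(ObjectCache *c, const char *key) {
    c->deserializations++;
    return (int)strlen(key);
}

static int cache_fetch(ObjectCache *c, const char *key) {
    CacheSlot *victim = &c->slots[0];
    c->tick++;
    for (size_t i = 0; i < CACHE_CAPACITY; i++) {
        CacheSlot *s = &c->slots[i];
        if (s->key != NULL && strcmp(s->key, key) == 0) {
            s->last_used = c->tick;        /* hit: no deserialization needed */
            return s->object;
        }
        /* Prefer an empty slot; otherwise track the least recently used one. */
        if (victim->key != NULL && (s->key == NULL || s->last_used < victim->last_used)) {
            victim = s;
        }
    }
    victim->key = key;                     /* miss: decode once, then cache */
    victim->object = deserialize(c, key);
    victim->last_used = c->tick;
    return victim->object;
}
```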

And the transaction architecture means that batch operations are simple. Both to implement, and to understand.


YapDatabase isn't perfect. It's still missing features here and there. But it's still young. About 13 months old. I don't propose that it will solve all your problems. Or even that it's the right database for your app. But I do propose that it's a fairly solid alternative to Core Data. And it's getting better every day.


Wednesday, January 22, 2014

Mastering the XMPP Framework

Somebody wrote a book about one of my open source projects! (XMPPFramework)



Mastering the XMPP Framework: Develop XMPP Chat Applications for iOS

"In Mastering the XMPP Framework Peter explains step by step with great detail how to build professional iOS applications that make use of the XMPP framework using modern Objective C programming principles. If you are an iOS developer with some experience and want to develop a WhatsApp like application, this book is certainly worth adding to your library and reading it will save you precious time."

Author: Peter van de Put
ISBN: 978-1-483-51646-2
Release date: December 25th 2013
Publisher: Peter van de Put
Pages: 225
Format: eBook (Kindle, iPad)

Buy for iPad on iBooks
Buy for Kindle on Amazon

Tuesday, January 21, 2014

1Password Credits

A friend brought this to my attention the other day:
[Image: 1Password credits]
It's great to be mentioned in the credits. Even better when the application giving me credit is an application I use all the time!

Thursday, May 24, 2012

Does your Xcode do this?

[Screenshot: the Xcode console, with colored log statements]

Take a look at the screenshot above. Notice anything different in the Xcode console? Yup, those log statements have color!

When I'm implementing a new feature, or tracking down some odd bug, I end up writing a bunch of log statements. This helps me check all my assumptions, and trace the code flow. Ultimately it helps me narrow down the location of the problem.

But one thing I've noticed is that, when I have a bunch of debug log statements flying by in my console, it becomes easier to miss error messages. I think most developers have experienced this. The simple solution is to do something like this:

NSLog(@"*************** I'm going to see this log statement ***************");

But this only goes so far. I mean, it only works if you know what log statement you're looking for. What if things go wrong in other parts of the application, and some warning is spit out?

The problem becomes easier if you use a professional logging framework like Lumberjack. That way your error statements are different from your debugging statements.

DDLogError(@"Some important error message");
DDLogDebug(@"Just some debug message");

Using a tool like Lumberjack you could automatically format error messages differently than debug messages. (No changes needed to your log statements, just add a formatter.) In fact, if you wanted to, you could automatically add all those asterisks to any error messages.
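Something like this is all the asterisk trick takes once it lives in a formatter instead of in each log statement. This is a hypothetical C sketch, not Lumberjack's formatter protocol: the log level travels along with the message, and only error-level messages get decorated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical log levels and formatter; not Lumberjack's API. */
typedef enum { LOG_LEVEL_ERROR, LOG_LEVEL_WARN, LOG_LEVEL_INFO, LOG_LEVEL_DEBUG } LogLevel;

/* Writes the formatted message into out; returns snprintf's result.
   Error messages are wrapped in asterisks, everything else passes through. */
static int format_log_message(LogLevel level, const char *message, char *out, size_t out_size) {
    if (level == LOG_LEVEL_ERROR) {
        return snprintf(out, out_size, "*** %s ***", message);
    }
    return snprintf(out, out_size, "%s", message);
}
```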

But you know what would be better?  Color.

Color naturally catches your eye. You don't even have to think about it. For example, can you spot the error below?

2012-05-24 02:10:58:101 Printer[51197:403] Warming up printer
2012-05-24 02:10:58:101 Printer[51197:403] Checking toner levels
2012-05-24 02:10:58:101 Printer[51197:403] Spooling document
2012-05-24 02:10:58:101 Printer[51197:403] Dispatching paper
2012-05-24 02:10:58:101 Printer[51197:403] Paper jam
2012-05-24 02:10:58:101 Printer[51197:403] Spooling document
2012-05-24 02:10:58:101 Printer[51197:403] Printer warmed
2012-05-24 02:10:58:101 Printer[51197:403] Toner levels OK
2012-05-24 02:10:58:101 Printer[51197:403] Saving document

This is possible with the help of the XcodeColors plugin for Xcode!

And it gets even better with Lumberjack. Because Lumberjack now natively supports XcodeColors! Just tell Lumberjack what colors you'd like to use, for what log levels, and you're done!

And if color isn't available (e.g. XcodeColors isn't installed), then the framework just automatically does the right thing. So if you install XcodeColors on your machine, and enable colors in your team project, your teammates (without XcodeColors) won't suffer, or even notice.

Plus Lumberjack colors automatically work if you run your application from within a terminal! (E.g. Terminal.app, not Xcode) If your terminal supports color (xterm-color or xterm-256color) like the Terminal.app in Lion, then Lumberjack automatically maps your color customizations to the closest available color supported by the shell!
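"The closest available color" needs some definition of closest. One plausible approach, sketched here in C and not necessarily what Lumberjack does, is to compare the requested RGB value against the 6x6x6 color cube (palette indices 16-231) and the 24-step gray ramp (232-255) of a 256-color xterm, pick the smallest squared RGB distance, and emit the standard `ESC[38;5;<n>m` foreground sequence. The 16 system colors (0-15) are skipped because terminals let users redefine them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Channel levels used by the xterm 256-color cube. */
static const int CUBE_LEVELS[6] = { 0, 95, 135, 175, 215, 255 };

static long color_distance(int r1, int g1, int b1, int r2, int g2, int b2) {
    long dr = r1 - r2, dg = g1 - g2, db = b1 - b2;
    return dr * dr + dg * dg + db * db;   /* squared Euclidean distance in RGB */
}

/* Returns the palette index (16-255) nearest to the given RGB color. */
static int closest_xterm256(int r, int g, int b) {
    int best = 16;
    long best_dist = -1;
    for (int i = 0; i < 216; i++) {       /* color cube: 16 + 36*r + 6*g + b */
        int cr = CUBE_LEVELS[i / 36], cg = CUBE_LEVELS[(i / 6) % 6], cb = CUBE_LEVELS[i % 6];
        long d = color_distance(r, g, b, cr, cg, cb);
        if (best_dist < 0 || d < best_dist) { best_dist = d; best = 16 + i; }
    }
    for (int i = 0; i < 24; i++) {        /* gray ramp: 8, 18, ..., 238 */
        int gray = 8 + 10 * i;
        long d = color_distance(r, g, b, gray, gray, gray);
        if (d < best_dist) { best_dist = d; best = 232 + i; }
    }
    return best;
}

/* ANSI escape that sets the foreground to a 256-color palette index. */
static int xterm256_foreground(char *out, size_t out_size, int index) {
    return snprintf(out, out_size, "\033[38;5;%dm", index);
}
```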

Still using NSLog?!? Switch to Lumberjack. First, it's actually faster than NSLog. Second, most every other development community is already using a professional logging framework. Isn't it time the Apple development community caught up?

Tuesday, November 8, 2011

Logging & Grand Central Dispatch

Grand Central Dispatch is a marvelous technology. It provides our application with a thread-pool, and various methods to execute code on the threads in that thread pool. And best of all, it automatically manages the thread-pool size based on some very detailed knowledge of what the rest of the system is doing.


However, issuing log statements from a dispatch queue can be confusing. To see what I mean, take a look at what a normal NSLog statement spits out:

2011-11-08 17:09:47.642 MyAppName[50662:707] My log statement

Most of this is pretty self-explanatory. But the stuff in [brackets] may need a bit of explanation. It is actually [<process_id>:<mach_thread_id_in_hex>].

In the past, the thread id would help make it easier to read log statements coming from multi-threaded code. For example:

2011-11-08 17:09:47:643 MyAppName[50662:707] Log from main thread
2011-11-08 17:09:47:643 MyAppName[50662:2303] Log from thread 2
2011-11-08 17:09:47:644 MyAppName[50662:550b] Log from thread 3

However, with GCD, every block of code we hand it will get executed on some random thread from the thread-pool.  Even if we tell GCD to execute the code on a serial queue, it very well may execute each block on a different thread (albeit in a serial fashion). For example, this code:

for (int i = 0; i < count; i++)
{
    dispatch_async(mySerialQueue, ^{ NSLog(@"help me"); });
}

might give you this result:

2011-11-08 17:25:18:062 MyAppName[50706:4f07] help me
2011-11-08 17:25:18:062 MyAppName[50706:5603] help me
2011-11-08 17:25:18:064 MyAppName[50706:5107] help me
2011-11-08 17:25:18.062 MyAppName[50706:520b] help me
2011-11-08 17:25:18:066 MyAppName[50706:1a03] help me

Now add to this the fact that your application may have multiple dispatch queues, all running in parallel. So it sometimes becomes difficult to follow the log statements coming from your code.

What would be more helpful is if we could replace the mach_thread_id with the gcd_queue_name (when available).

Using CocoaLumberjack and a DispatchQueueLogFormatter, this becomes a simple operation.

If you've never heard of CocoaLumberjack before, you should really go check it out. There's a TON of information and documentation available on the GitHub project page.

Then you simply apply a DispatchQueueLogFormatter and you can get output like this:

2011-11-08 17:25:18:062 MyAppName[myQueue] help me
2011-11-08 17:25:18:062 MyAppName[myQueue] help me
2011-11-08 17:25:18:064 MyAppName[myQueue] help me
2011-11-08 17:25:18.062 MyAppName[myQueue] help me
2011-11-08 17:25:18:066 MyAppName[myQueue] help me


The log formatter comes with several different configuration options. But my favorite is the ability to set replacements for dispatch queue labels. For example, the main queue has the label "com.apple.main-thread". I find this to be a bit long, so I generally do something like this:

[formatter setReplacementString:@"main" forQueueLabel:@"com.apple.main-thread"];

Then I get:

2011-11-08 17:09:47:643 MyAppName[main] Log from main thread

Easy as cake!
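The formatting rule above boils down to a lookup with a fallback. This hypothetical C sketch (`format_context`, `short_label` and the replacement table are made-up names, not DispatchQueueLogFormatter's implementation) prints the queue label when one is available, substitutes a registered short name such as "main", and otherwise falls back to the hex thread id, the way NSLog shows it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical replacement table: long queue label -> short display name. */
typedef struct {
    const char *queue_label;
    const char *replacement;
} LabelReplacement;

static const LabelReplacement REPLACEMENTS[] = {
    { "com.apple.main-thread", "main" },
};

static const char *short_label(const char *queue_label) {
    for (size_t i = 0; i < sizeof REPLACEMENTS / sizeof REPLACEMENTS[0]; i++) {
        if (strcmp(REPLACEMENTS[i].queue_label, queue_label) == 0) {
            return REPLACEMENTS[i].replacement;
        }
    }
    return queue_label;
}

/* Builds the "[...]" part of a log line; queue_label may be NULL or empty. */
static int format_context(char *out, size_t out_size, const char *queue_label, unsigned thread_id) {
    if (queue_label != NULL && queue_label[0] != '\0') {
        return snprintf(out, out_size, "[%s]", short_label(queue_label));
    }
    return snprintf(out, out_size, "[%x]", thread_id);   /* no label: hex thread id */
}
```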

Thursday, June 9, 2011

XMPP Framework v3 released

Version 3 of the XMPP Framework was officially released yesterday.

XMPP stands for "eXtensible Messaging and Presence Protocol". It is the protocol used by Google Talk and Facebook Chat. It can also be used for many other applications outside the realm of chat. For example, I have seen it used in P2P file transfer applications, home automation software, and even in medical applications.


Version 3 can be summarized as follows: "Massive Parallelism". It has been redesigned to take full advantage of Grand Central Dispatch (GCD). The entire stack is now thread-safe, and everything runs within its own dispatch queue.

Here are some of the highlights:

GCD Based Networking

We switched to GCDAsyncSocket. This moves all networking IO off the main thread and onto GCD. (You may recall how CocoaHTTPServer switched to GCDAsyncSocket and saw a 200% performance improvement.)

XML parsing in its own dedicated queue

Parallel network IO and XML parsing anyone?

Thread-Safe XMPPStream

XMPPStream, the heart of the XMPP Framework, is now thread-safe. So feel free to send elements from any thread you wish. This is accomplished internally by having XMPPStream run within its own dispatch queue.

If you're following along, that means parallel network IO, parallel XML parsing, and parallel xmpp stanza routing. Seeing a trend?

Every XMPPModule can now be run in its own queue

The XMPP Framework has a module plug-in system for extensions. Extensions include things like roster support, multi-user chat, publish-subscribe (PubSub), capabilities, ping, etc. Now all of these extensions can run in parallel.

And combine this parallelism with what we've mentioned so far... But wait, there's more.

Delegates can specify their own queue

All this parallelism within the framework is great. But ultimately you, the developer, are going to have to do some processing of your own. With v3 you can specify a dispatch queue to invoke your delegate methods on. So if you wanted to continue to do your xmpp processing on the main thread you could do this:

[xmppStream addDelegate:self delegateQueue:dispatch_get_main_queue()];

But you can trivially parallelize your processing by creating and specifying your own queue:

[xmppStream addDelegate:self delegateQueue:myProcessingQueue];
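The delegate-queue idea can be modeled without GCD to show who runs what, and when. In this hypothetical C sketch (not XMPPFramework's multicast delegate; all names are made up), each delegate is registered together with its own queue, with a plain FIFO standing in for a dispatch queue. Broadcasting an event only enqueues a call on each delegate's queue; the callback runs when that queue is drained, so a main-queue delegate and a processing-queue delegate are serviced independently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-delegate queues; a FIFO stands in for a dispatch queue. */

#define QUEUE_CAPACITY 16
#define MAX_DELEGATES 8

typedef void (*DelegateCallback)(void *delegate, int event);

typedef struct {
    DelegateCallback callback;
    void *delegate;
    int event;
} PendingCall;

typedef struct {
    PendingCall calls[QUEUE_CAPACITY];
    size_t head, count;
} WorkQueue;

typedef struct {
    void *delegate;
    DelegateCallback callback;
    WorkQueue *queue;
} DelegateEntry;

typedef struct {
    DelegateEntry entries[MAX_DELEGATES];
    size_t count;
} MulticastDelegate;

static int queue_push(WorkQueue *q, PendingCall call) {
    if (q->count == QUEUE_CAPACITY) return -1;
    q->calls[(q->head + q->count) % QUEUE_CAPACITY] = call;
    q->count++;
    return 0;
}

/* Runs every pending call in FIFO order, as a serial queue would. */
static void queue_drain(WorkQueue *q) {
    while (q->count > 0) {
        PendingCall call = q->calls[q->head];
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
        call.callback(call.delegate, call.event);
    }
}

static int add_delegate(MulticastDelegate *m, void *delegate, DelegateCallback cb, WorkQueue *q) {
    if (m->count == MAX_DELEGATES) return -1;
    m->entries[m->count++] = (DelegateEntry){ delegate, cb, q };
    return 0;
}

/* Invokes nothing directly: each delegate hears about the event on its own queue. */
static void broadcast(MulticastDelegate *m, int event) {
    for (size_t i = 0; i < m->count; i++) {
        DelegateEntry *e = &m->entries[i];
        (void)queue_push(e->queue, (PendingCall){ e->callback, e->delegate, event });
    }
}

/* Example delegate: remembers the last event it received. */
static void record_event(void *delegate, int event) {
    *(int *)delegate = event;
}
```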


Dedicated logging framework

And if that wasn't enough, we threw in a GCD based logging framework. It's extremely fast and powerful, but more importantly it's extremely flexible. It gives you full control over what should be logged, and where those log statements should go.
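Controlling "what should be logged, and where" usually comes down to per-destination thresholds. This hypothetical C sketch (the `Logger` struct and `log_message` function are made up, not the framework's API) sends one log call to every logger whose level allows it; a counter stands in for actually writing the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical severity levels: lower numbers are more severe. */
typedef enum { LVL_ERROR = 1, LVL_WARN = 2, LVL_INFO = 3, LVL_VERBOSE = 4 } Level;

typedef struct {
    const char *name;     /* e.g. "console" or "file" */
    Level max_level;      /* accepts messages at this level or more severe */
    int received;         /* stand-in for actually writing the message */
} Logger;

/* Fans one log statement out to every logger whose threshold it meets. */
static void log_message(Logger *loggers, size_t count, Level level, const char *message) {
    (void)message;        /* a real logger would format and write this */
    for (size_t i = 0; i < count; i++) {
        if (level <= loggers[i].max_level) loggers[i].received++;
    }
}
```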

More Information on the Wiki

What's new in V3
Intro to XMPP Framework

 

Tuesday, June 7, 2011

CocoaHTTPServer gets WebDAV support

An open source contribution to the CocoaHTTPServer project has just added WebDAV support!

CocoaHTTPServer is a small embeddable HTTP server that you can add to your Mac or iOS application. WebDAV is a technology that allows users to mount web directories with the Finder. Put the two together, and now your users can use the Finder to manage the files in your iOS app!

Special thanks to Pierre-Olivier Latour for the contribution.