Wednesday, January 29, 2014

Response: CoreData.next

Tony Arnold wrote a post entitled CoreData.next where he discusses what he "would investigate for a major revision to Core Data". At the end of the article he writes:

> This is all pie in the sky, but it's nice to dream right?
> And maybe to implement. Maybe.
> Hit me up on Twitter or App.net with your thoughts, or even better — write a post in response.

This is my post in response. I'm going to argue that it's not pie in the sky. And that it has been implemented. I'll go over every issue in his post, and discuss how YapDatabase has already addressed it.

But first, I'll answer a quick question you might have: "What is YapDatabase?"

YapDatabase is a powerful database framework for iOS and Mac OS X. It's built atop sqlite, and has 2 major components:

  1. The base layer: A key/value store.
  2. The extensions layer: A plugin system that allows you to build powerful extensions atop the base layer.
If your mind is stuck on "key/value store" then you're missing the bigger picture. This is just the tip of the iceberg. The scaffolding for something much cooler. To give you an idea, here are a few features that are available in YapDatabase via extensions:

  • Views & a replacement for NSFetchedResultsController
  • Secondary Indexes & powerful query options
  • Full Text Search using the crazy fast module from Google
  • Relationships between objects and cascading delete rules
Not your typical key/value store. This thing was built from the ground up after many years of experience with Core Data. It's tailored to fit the requirements of a client side application. And it had concurrency in mind from the very start. For more information, check out the extensive documentation on the wiki.

And now back to the CoreData.next article, and its list of issues with Core Data:

Issue #1

> Setting up is too much work
> Creating a usable Core Data stack for your application by hand requires far too much boilerplate. Apple should be providing a very basic [NSCoreDataStack newStackWithOptions:…] or something similar that gets your app up and running with the best possible set of defaults.
> Libraries like MagicalRecord provide this now, but it seems odd that the Core Data framework does little to aid setting up the stack in a recommended manner.
I agree. There's a lot of boilerplate required to get up and running with Core Data. Here's what it looks like with YapDatabase:

YapDatabase *database = [[YapDatabase alloc] initWithPath:databasePath];

Of course, if you want to get more advanced, there are other init methods that allow for more options. But for a basic usable stack, this one-liner is about all you need.

Issue #2

> It's workarounds all the way down and then you hit NSMangedObject
> Yes, I meant to spell it that way.
> In the last couple of years, Apple have made attempts to simplify and extend some parts of Core Data, including the addition of parent contexts, and methods to help run your code on the correct thread for a specific NSManagedObjectContext (i.e. -(void)performBlock: and -(void)performBlockAndWait:).
> The problem is, these additions are essentially workarounds for problems that shouldn't really be the concern of the developer using the API.
> The additions I cited above all stem from NSManagedObjectContext and NSManagedObject instances not being thread safe. I'd rather see Apple create new, non-backwards compatible API to solve this than continue to patch underlying design issues with new API. I get this one is not trivial. Threading never is.

And now we get to the crux of the problem. And he's onto something here.

Much of the pain of using Core Data stems from one architectural decision that led the Core Data team down a specific road:

    An object you fetch from the database should get updated in-place if it is changed on another thread/context.

If you think about this for a moment, you begin to see why some of the odd decisions were made.

You cannot store a plain-old NSString in Core Data. It must be wrapped inside an NSManagedObject. Why? Because of the mandate above. An NSString is immutable, and Core Data wants to update the object you have in your hand at the moment the context merges changes from another context. It can't change an immutable object, so it must wrap it inside something else, and change that.

And as we proceed down this road, we begin to see why NSManagedObjects are tied to a specific NSManagedObjectContext. Somebody needs to update the NSManagedObject, right? This is the job of the context. So the context must own the managed object.

This also has implications for threading. Why is it not safe to pass an NSManagedObject to background threads? Because that darn NSManagedObjectContext may mutate the thing at any time. So now threading becomes a nightmare. The fact is, NSManagedObject doesn't follow the same rules as all your other objects. Developers understand mutable vs immutable, especially in Objective-C, where there are often 2 versions of a class, one mutable and one immutable. And developers understand the implications for threading: "Don't mutate something on one thread while it's being used on another." This is something we've lived with for a long time, and we're quite adept at solving these problems. But NSManagedObject presents a whole new set of problems. We can't just wrap one of these things in a lock, or only access it through a serial queue, because NSManagedObjectContext isn't going to follow our rules. It won't play nicely, and so we're forced to play entirely by its rules.

It's not impossible to use Core Data in a highly concurrent application. I did it for a long time. But it presents many frustrations. First of all, Core Data must be taught (in depth) to the entire team. There are many parts of an app that can be "black boxed": one developer dives deep, solves the problems, and presents a clean API for the rest of us. We don't know and we don't care, 'cause it works. Core Data escapes this concept. Everybody is forced to deal with it. Secondly, because NSManagedObjects are not normal objects, you need new APIs to deal with them. "I can't pass you an object, I need to pass you an NSManagedObjectID. And then you need to fetch your own NSManagedObject within your own context..."
> I'd rather see Apple create new, non-backwards compatible API to solve this than continue to patch underlying design issues
I agree. Which is why YapDatabase is completely different. But different is sometimes scary. And it means you'd have to learn something new. So if this is too much for anyone, feel free to continue complaining about Core Data. Otherwise take the red pill, and learn about YapDatabase.

A "Hello World" introduction to YapDatabase.

YapDatabase doesn't use NSManagedObject, or any similar kind of silliness. You can store plain-old NSObjects. This includes your own NSObject subclasses, as well as the usual suspects such as NSString, NSNumber, UIColor, UIImage, etc. As long as the object can somehow be serialized & deserialized you're good to go. This could be via NSCoding, or any other way you want such as JSON serialization. It's completely configurable.

Further reading: "What kinds of objects does YapDatabase support?"

YapDatabase will never mutate any of your objects. Ever. This gives you complete control over your objects, and how & when you access and mutate them. You know, the way you're used to dealing with objects.
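To make the contrast with Core Data concrete, here is a toy sketch in plain C (which is also valid Objective-C). It is not YapDatabase code: `ToyStore`, `toy_write` and `toy_read_copy` are hypothetical names, and copy-on-read is simply the easiest way to model the guarantee that a value you already hold never changes underneath you.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy store holding a single value. Not YapDatabase code. */
typedef struct {
    char value[64];
    unsigned long commit; /* incremented by every write */
} ToyStore;

/* A write replaces the stored value; it never touches copies handed out earlier. */
static void toy_write(ToyStore *s, const char *value) {
    assert(strlen(value) < sizeof s->value);
    strcpy(s->value, value);
    s->commit++;
}

/* A read returns a private copy that the caller owns (and must free()). */
static char *toy_read_copy(const ToyStore *s) {
    size_t n = strlen(s->value) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s->value, n);
    return copy;
}
```

A caller that read "latte" keeps holding "latte" even after another write stores "mocha"; it sees the new value only by reading again. Under Core Data's model, the fetched object itself would have been updated when the context merged the change.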

And concurrency is a breeze. Because YapDatabase was designed with concurrency in mind. You access the database through connections. Here's how hard it is to create a connection:

YapDatabaseConnection *connection = [database newConnection];

And you can create multiple connections for concurrency. Accessing the database? That's done through a transaction. Just like you're used to with normal databases:

__block id coffee = nil; // __block lets the block below assign to this variable
[connection readWithBlock:^(YapDatabaseReadTransaction *transaction) {
    coffee = [transaction objectForKey:drinkId inCollection:@"drinks"];
}];

Of course, you can do as much within a transaction as you need to, which means batch fetching and batch updates are straightforward. And if you're on the main thread, you can freeze your connection on a particular commit, and move it forward when ready, which means UI updates and animations are easy too.
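The "freeze on a commit, then move forward" idea can be illustrated with another toy model in plain C. This is a hypothetical sketch, not YapDatabase's implementation: the names are made up, the "database" just remembers the value that each commit produced for one key, and a connection only sees the commits that existed when it last advanced.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMMITS 16

/* Hypothetical toy: remembers the value of one key after each commit. */
typedef struct {
    const char *values[MAX_COMMITS];
    size_t count; /* number of commits so far */
} ToyDatabase;

/* A connection is frozen on the commits it could see when it last advanced. */
typedef struct {
    const ToyDatabase *db;
    size_t visible; /* number of commits this connection can see */
} ToyConnection;

static void toy_commit(ToyDatabase *db, const char *value) {
    assert(db->count < MAX_COMMITS);
    db->values[db->count++] = value;
}

static ToyConnection toy_connect(const ToyDatabase *db) {
    ToyConnection c = { db, db->count };
    return c;
}

/* Reads are answered from the frozen snapshot, never from newer commits. */
static const char *toy_connection_read(const ToyConnection *c) {
    return c->visible > 0 ? c->db->values[c->visible - 1] : NULL;
}

/* Jumps to the latest commit and returns how many commits were skipped,
   i.e. how many commits the caller now has to process (e.g. for UI updates). */
static size_t toy_connection_advance(ToyConnection *c) {
    size_t skipped = c->db->count - c->visible;
    c->visible = c->db->count;
    return skipped;
}
```

A UI connection opened after a first commit of "espresso" keeps reading "espresso" while a background write commits "cortado". Calling `toy_connection_advance` then reports one pending commit, and the next read returns "cortado". The main thread decides when that jump happens, which is what makes it safe to pair with animations.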

Issue #3

> Except the exceptions
> If you've used Core Data, chances are you've seen it do it's best impression of a monkey in a cage. It flings internal NSExceptions around like a monkey with poop. Debugging a Core Data app with lldb intercepting all exceptions is an exercise in frustration and sifting through "what's yours" and "what's mine".
> I'd like to see all exceptions thrown by the framework properly caught and dealt with before the developer ever sees them in Xcode.
I feel your pain. You may take solace in the fact that YapDatabase is open source. If there is an exception, and it's not in your own code, then at least you can see exactly where and why it happened. (But I'd like to think this is unlikely.)

Issue #4

> Easier handling of binary data
> One of the common things new users of Core Data try to do is store images and other data in their SQLite persistent store. It's entirely possible to do so, and the addition of external binary representations for Binary attributes helps improve performance when doing so.
> The issue is, accessing that data can have significant memory implications as you can't really take advantage of any of the great optimisations Apple has made to image/file loading without first dumping the data of the attribute into a temporary file outside of your store.
> Want to use any of the audio visual frameworks to access media that you've stored in your persistent store? Forget it. They all need a URL or file path.
> Allowing direct access to binary data attributes external file URLs (and allowing access to read that file via the URL) would greatly simplify this scenario, and I'd imagine it would reduce disk I/O.

No arguments here. There are very good reasons why large data items should be stored directly in a file, as opposed to in the database. These reasons extend all the way down to the sqlite file format (and how it splits data into pages). Which is why YapDatabase has exactly the same recommendations that Core Data has:

    If size > 1mb, then store on disk and reference it inside of [database]

But in YapDatabase, everything is transparent. You have direct access to the filePath. And you can even use the relationship extension to specify that the database should automatically delete the associated file when the database object is deleted.
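The 1 MB rule of thumb boils down to a small decision at write time. The sketch below is a hypothetical illustration in plain C: `BlobRef`, `make_blob_ref` and the `<dir>/<key>.bin` naming scheme are made up for this example, and actually writing the file is left out. Small blobs would be stored in the row itself (not modeled here); large ones get a file path, and the row stores that path instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Threshold from the recommendation above: larger than 1 MB goes to its own file. */
#define EXTERNAL_THRESHOLD_BYTES ((size_t)1024 * 1024)

/* Hypothetical record kept in the database row for a binary item. */
typedef struct {
    bool external;       /* true: the bytes live in a separate file */
    char file_path[256]; /* set only when external is true */
    size_t size;
} BlobRef;

static bool should_store_externally(size_t size) {
    return size > EXTERNAL_THRESHOLD_BYTES;
}

/* Decides where a blob goes; large blobs get the path <dir>/<key>.bin. */
static BlobRef make_blob_ref(const char *dir, const char *key, size_t size) {
    /* Keys become file names, so they must not contain path separators. */
    assert(strchr(key, '/') == NULL);
    BlobRef ref = { should_store_externally(size), "", size };
    if (ref.external) {
        int n = snprintf(ref.file_path, sizeof ref.file_path, "%s/%s.bin", dir, key);
        assert(n > 0 && (size_t)n < sizeof ref.file_path); /* path must fit */
    }
    return ref;
}
```

For example, a 500 KB thumbnail stays in the database, while a 5 MB video with key "video" under /tmp/media is assigned /tmp/media/video.bin, a plain file path that AV frameworks can open directly. The threshold is strict, so a blob of exactly 1 MB stays in the row.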

Issue #5

> NSFetchedResultsController on both platforms
> I use NSFetchedResultsController in almost all of my iOS apps. I set it up and forget about it, and it takes care of marshalling any changes to my datasources. NSArrayController is the prototype that we have on OS X, and it's just not setup to handle things like animating UI changes in table views and the like.
> A simple, fast way to automatically query and update a datasource and handle discrete changes on both platforms would be a huge time saver. Making it capable of doing it's work nicely on a background thread would be a big win, too.

Yup, YapDatabase has this already. It's available as part of the views extension, which means it's super easy to animate changes to your tableView / collectionView, regardless of platform.
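Animating a table view boils down to knowing the index at which each change lands. Here is a hypothetical plain-C sketch of that core idea: keep keys sorted, and have every insert report its row. The names are invented for illustration; YapDatabase's views extension does much more, such as grouping items into sections and sorting with your own rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VIEW_MAX 64

/* Hypothetical toy view: keys kept in ascending order. */
typedef struct {
    const char *keys[VIEW_MAX];
    size_t count;
} SortedView;

/* Binary search for the first position whose key is >= key. */
static size_t view_lower_bound(const SortedView *v, const char *key) {
    size_t lo = 0, hi = v->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(v->keys[mid], key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Inserts key in sorted position and returns the row a table view would animate. */
static size_t view_insert(SortedView *v, const char *key) {
    assert(v->count < VIEW_MAX);
    size_t row = view_lower_bound(v, key);
    /* Shift the tail one slot to the right to make room. */
    memmove(&v->keys[row + 1], &v->keys[row], (v->count - row) * sizeof v->keys[0]);
    v->keys[row] = key;
    v->count++;
    return row;
}
```

Inserting "mocha", then "americano", then "latte" reports rows 0, 0 and 1: exactly the index paths a table view needs to animate each insert.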

Issue #6

> Performance and simplicity
> Any improvements to performance in any API are always a good thing, but Core Data has some specific areas that could use attention. Anything involving batch operations could use a bit of love (as outlined by Brent in his series of posts on Core Data in Vesper).
Performance and simplicity are the heart and soul of YapDatabase. A good example of this is caching. Caching is built into the system, and it caches your objects at the object level (post deserialization). Which means fetching can be crazy fast in common cases where you're fetching the same set of objects over and over (e.g., scrolling up and down). It's completely configurable too.

Further reading: "Understanding the built-in cache"
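Caching at the object level means a cache hit skips both the sqlite read and the deserialization step. To illustrate the idea, here is a hypothetical fixed-size least-recently-used (LRU) cache in plain C. It is not YapDatabase's cache; the names and the capacity are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_CAPACITY 3
#define KEY_LEN 32

/* Hypothetical cache entry holding an already-deserialized object. */
typedef struct {
    char key[KEY_LEN];
    void *object;           /* deserialized object, ready to use */
    unsigned long last_use; /* logical clock value for LRU ordering */
    int used;
} CacheEntry;

typedef struct {
    CacheEntry entries[CACHE_CAPACITY];
    unsigned long clock;
    unsigned hits, misses;
} ObjectCache;

static CacheEntry *cache_find(ObjectCache *c, const char *key) {
    for (size_t i = 0; i < CACHE_CAPACITY; i++)
        if (c->entries[i].used && strcmp(c->entries[i].key, key) == 0)
            return &c->entries[i];
    return NULL;
}

/* Returns the cached object, or NULL on a miss (caller then reads + deserializes). */
static void *cache_get(ObjectCache *c, const char *key) {
    CacheEntry *e = cache_find(c, key);
    if (e == NULL) {
        c->misses++;
        return NULL;
    }
    c->hits++;
    e->last_use = ++c->clock;
    return e->object;
}

/* Inserts or replaces; when full, evicts the least recently used entry. */
static void cache_put(ObjectCache *c, const char *key, void *object) {
    assert(strlen(key) < KEY_LEN);
    CacheEntry *e = cache_find(c, key);
    if (e == NULL) {
        e = &c->entries[0];
        for (size_t i = 0; i < CACHE_CAPACITY; i++) {
            if (!c->entries[i].used) { e = &c->entries[i]; break; } /* free slot */
            if (c->entries[i].last_use < e->last_use) e = &c->entries[i];
        }
        strcpy(e->key, key);
        e->used = 1;
    }
    e->object = object;
    e->last_use = ++c->clock;
}
```

When scrolling back and forth over the same rows, most lookups become hits. The linear scan keeps the sketch short; a production cache would pair the LRU ordering with a hash table for constant-time lookups.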

And the transaction architecture means that batch operations are simple. Both to implement, and to understand.


YapDatabase isn't perfect. It's still missing features here and there. But it's still young. About 13 months old. I don't propose that it will solve all your problems. Or even that it's the right database for your app. But I do propose that it's a fairly solid alternative to Core Data. And it's getting better every day.


12 comments:

Unknown said...

YapDatabase sounds promising, but is it enough if it is not cross-platform? Right now mobile development is starting to require both iOS and Android versions, so it looks a bit constraining to go for a platform-specific solution (be it CoreData or YapDatabase) instead of using sqlite, which at least allows some portability (more if you write logic in C++ and reuse it on Android/iOS).

While YapDatabase seems to solve CoreData neatly, it is still a one platform developer investment. Contrast this to other projects like SQLCipher allowing data model reuse.

Robbie Hanson said...

YapDatabase is open source. Meaning it could be ported to Android.

SQLCipher is just for encryption. It runs "below" sqlite, not "above" it like YapDatabase and Core Data. But I understand what you mean.

Greg Jaskiewicz said...

At some stage you'll stop developing Yap, lose interest, get a better job, whatever. In which case, the time we invested in it is going to be lost.
Better to try to follow bigger OS projects, and build a community of developers and supporters around a project from day one.

Zav said...

You, sir, are an inspiration.

Zav said...

Robbie, it's = it is.

There are several places in your post where you use the wrong version of its/it's and it's (it is) confusing/looks bad.

it's = it is
its = the next thing belongs to it

You might want to proof your post and correct those.

Thanks man.

Robbie Hanson said...

>There are several places in your post where you use the wrong version of its/it's

Thank you. That has long been a weakness in my grammar. I think I fixed them all.

(Didn't change any quotations.)

Robbie Hanson said...

> Better to [...] build a community of developers and supporters around a project from day one.

What do you think I'm trying to do?

And if history is any indication of the future, here are two open source projects that I started, which are now community projects (read: community of developers and supporters)

CocoaLumberjack

XMPPFramework

Anonymous said...

thankYou thankYou thankYou for saying CoreData being another level of containers to reconcile data handling on different threads. Or persistent vs live

Now this total newbie can understand why mechanisms such as NSFetchedController and NSManagedObject exist. They are just the tools for the higher level of abstraction. No one has ever put it quite so succinctly.

I also suspect iCloud syncing is about to be taken to school.

--
if ( it's = it is )
{
gemma's = gemma is; // inverse of comment below
}

if ( its == possession )
{
gemmas = her ; // technically correct then, but looks like it wants to be as above
}

Sad that our language is slowly losing its diacriticals, so at the risk of opposing this trend, I'd like to introduce a third usage ( one that had me in trouble with the teacher from third grade )

gemmas' = her
gemmas = plurality
gemma's = she is

John Brunelle said...

This article really got me interested in the project. I went to the wiki and read all the documentation and I am really liking the concept. In my current project I have been trying to avoid using that beast known as Core Data and this looks to be a great fit for what I am trying to accomplish. I am going to give it a spin. One thing that I am wondering, is there is a sample project in the works or online demonstrating the various features of the framework? I must admit the wiki is fantastic but to see how you would go about a setting up and using the framework would be outstanding. Thanks so much!!!

John Brunelle said...

To add to my previous comment - one thing in particular that I would like to see is an example of the CRUD part. In particular, when a fresh feed pull comes in, how you would handle the insert, update, delete and orphans of rows in the DB? Thanks Again!

Robbie Hanson said...

@john

> is there is a sample project in the works

Several sample projects are desperately needed. There's a github issue to address it:
https://github.com/yaptv/YapDatabase/issues/33

Someone posted an example project there, which is a good start.

Although YapDatabase is a big project, with a lot of different features, and it needs multiple sample projects to properly demonstrate things in a simple & small fashion.

ocrickard said...

YapDatabase is awesome, if I were to start a new app today, there is no doubt in my mind which storage solution/data model layer I'd use. I particularly liked your innovative use of the WAL and I loved reading your threading model discussions. Lots for all of us to learn here.