Large EF Models and Large Databases

an architectural design problem · the technology I'm going to make you use demands that you lose capabilities you enjoy today

We are frequently asked what to do about very large database schemas - databases with hundreds, perhaps thousands of tables and views.

It is impractical to build an Entity Framework model with more than 300 tables ... frankly the development process becomes sluggish and unwieldy at more than 100 and performance hiccups begin to show at that level too.

We've opened a Cookbook Recipe on the topic here.

I thought I'd share with you a recent email exchange I had on the subject. I've sanitized all references to other persons, organizations, and projects.

The topic remains open ended ... with so much more to say and explore ... but I hope you find it a helpful place to start.

====================================================================

From: S 
To: P 
Subject: Entity Framework issues

P,

We are currently experiencing some pretty serious issues with Entity Framework:

1.)    The current single project (.edmx) can no longer be used to take existing tables from the database and create EF objects – A. is getting out of memory errors.  We have done research and there are a number of tactics to help with this:
a.       Split the edmx into smaller files – this raises issues of what tables to include in each file/context and had to related common tables with these individual contexts (i.e. Foreign Key issues).
b.      Work with key EF files (csdl, msl, ssdl) manually to aid with handling large databases – this seems painful and error prone to manually work with these files like this.

2.)    Performance on start up – since currently the DBContext is fairly large and growing, initial access times are long.  I’ve researched this and there are several strategies to improve – Pre-Generate Views in particular – but I want to get feedback from experts as to the best approach.

I really think we need some assistance from Microsoft or IdeaBlade to help us in implemented the best possible solution to these EF issues.  We have done some research, and will continue to do so, but I think some outside validation that we are doing the right thing or missing some steps would be of great benefit.

Of course, we need to address this ASAP, in particular the single EDMX file.

Thanks,

S 
Architect
-----------------------------------
From: P
To: Ward Bell
Subject: RE: Entity Framework issues

Hi Ward,

...
Lately we have run into some issues detailed below by S. We found your article: http://www.ideablade.com/WardsCorner/LargeEFModels.pdf, which was helpful, but we’re wondering if we’re making the right EF model design decisions.

If you have an opportunity I’d like to get your input.

Regards,
P
----------------------------------------------------
From: Ward Bell
To: P
Subject: RE: Entity Framework Big Model Issues

Hi P –

Now that you have read my article on Large Models you know where I stand: don’t do it J 

In sum, I agree with everyone who is disappointed that EF bogs down after some few hundred entities (many factors determine what your own practical limit happens to be).

But I feel even more strongly that large models are an architectural design problem. A community of developers cannot understand a business domain … cannot grasp what is going on … manage its complexities … unless that domain is small enough for them to have a truly shared understanding and be able to communicate about it. If anyone in the group doesn’t know what every single entity is for and how it works … that’s a troubling sign.

I have seen in too many companies pursue the futile dream that there could be one master database … one db to rule them all. That has never worked. It is not a technical problem. We have the computing power to do it. The obstacles are human and institutional.

Best to have smaller models that are suited to smaller “bounded business contexts”.

The perennial objections:
1.       There are entities in common and we don’t want to repeat the business logic
2.       Everything is interconnected and we can’t predict what entity will want to navigate to what other entity.

#2 is most easily addressed. It means you probably have not refined the analysis of the domain well enough. I’m being blunt. But that’s 35 years of experience talking.

#1 reflects a comforting illusion, the notion that “Customer” in “Domain A” is the same as “Customer” in “Domain B”. No doubt they point to the same “thing” in the real world. No doubt they have data values in common (e.g., the name). 

But when you start to understand “Domain A” and “Domain B” … what users actually do in those domains and the rules of the game in each domain … you discover that they have different data and/or have different rules/timing for creating and changing data values. In many cases “Domain A” can modify the Customer and “Domain B” cannot. What is needed is the ability to communicate effectively across the two domains when a business process (in either domain) stimulates such a communication. For example, if I’m filling an order for a customer in Domain A and realize I need to update the customer headquarters address. There’s no good reason why updating the HQ address is a Domain A activity. Even if I am a user who could perform both order-filling tasks and customer-update tasks, I still see those tasks as distinct and I don’t mind that the UI makes me address them in separate “modules” … as long as the transition is graceful and efficient. 

Until and unless you are prepared to do the analysis and find the smaller domain models amidst the mass of tables … there is really no way anyone else can help you.

As for S points:

1.)    Forget wrangling the EDMX parts (CSDL/MSL/SSDL); face the music and rethink how you model your business and communicate across “bounded contexts”.

2.)    You’ll get some benefit from pre-generating views but not as much as you think. I suspect you’re working with EF in 2-tier fashion and paying the metadata construction price for every user … or almost every user. An n-tier approach means a server constructs the metadata only once … only the first user suffers … and all subsequent users get speedy service. Where will you find an n-tier EF approach? That happens to be my company’s forte.

You didn’t mention pre-caching queries. I’m glad. At some point you will grasp at that straw … and be disappointed. I’ll save you the trouble: don’t bother … it doesn’t save you a thing except in the rarest of circumstances.

Finally, I’ll bet you are developing your application directly against the EF apparatus and the database. I’ll bet you make-change-compile-run-wake-up-EF-hit-the-database-discover-problem-stop-debugging-rinse-and-repeat.  The cycle takes 2 minutes, 4 minutes, longer? If so, you’re developing like 95% of all shops. It’s painful. It’s unnecessary.

Regards, 

Ward
----------------------------------------------

From: P
To: Ward Bell
Subject: RE: Entity Framework Big Model Issues 

Hi Ward,

Thanks for the immediate feedback, we appreciate your help.
...

We are already proceeding down the path to reducing the size of the model, and splitting it up into smaller models. However, the big concern here is around the complexity and performance related to referencing entities across multiple models. We’re trying to keep the classes together that are most often used together in joins or various business operations, but at some point there will be scenarios that require classes from multiple models. Do you have any ideas around how best to deal with that?

Regards,
P

-------------------------------------------------------

From: Ward Bell 
To: P
Subject: RE: Entity Framework Big Model Issues

Avoid building custom cross-model navigations within your entities. Although that is comparatively easy in our DevForce product, it adds complexity you should rarely (if ever) need.

The best approach is to map the entity twice. If Domain A and Domain B both have a notion of “Customer”  - and you happen to keep pertinent data values in a common Customer table, give each model a “Customer” entity. This works especially well if only one of the domains can update Customer. 

Aside: if both domains can update customer, consider refactoring the Customer table so that each domain’s mutable columns are in different tables. I say this because if is imprudent to maintain the mutable data of two domains in a single table – imprudent architecturally; it’s “easy” technically.

The customer entities do not have to be identical … and probably shouldn’t be. You can privatize access to setters and saving of the read-only customer. You can privatize properties you don’t need.

A more sophisticated and sometimes better approach is to create “Defined Queries” (akin to database view) over the Customer table within your read-only Domain Model. This can flatten (denormalize) customer information so you get exactly what you want to know about customer in that domain in a simple entity tuned for domain needs.

Sometimes you have a boat load of “reference entities” – stuff like codes – that you are keeping in the database, possibly for reporting purposes. You can eliminate those from your model – just keep the FK ids – and relegate them to a “Reference Entity” Service. A full explanation is out of scope for this email but we have an approach that is performing yeoman service - more than 150 reference entity types managed as a service - at one of our major clients.

Finally, you should be able to construct a UI that references more than one model. That is not difficult. Imagine split screen with Order Module on left and General Ledger on right. They look unified to the customer (you don’t really even need split screen … I’m just painting a stark picture). You only need a small bridge between. If you need a wide bridge, then I am more likely to suspect the domain analysis than curse the technology (and I curse at my technology a lot J ).

Cheers,

W

Edited by DFFadmin - 17-Oct-2013 at 3:41pm