
Large EF Models and Large Databases

WardBell (IdeaBlade)
Posted: 27-Sep-2010 at 2:56pm
We are frequently asked what to do about very large database schemas - databases with hundreds, perhaps thousands of tables and views.
 
It is impractical to build an Entity Framework model with more than 300 tables ... frankly, the development process becomes sluggish and unwieldy beyond 100 tables, and performance hiccups begin to show at that level too.
 
We've opened a Cookbook Recipe on the topic here.
 
I thought I'd share with you a recent email exchange I had on the subject. I've sanitized all references to other persons, organizations, and projects.
 
The topic remains open-ended ... with so much more to say and explore ... but I hope you find it a helpful place to start.
 
====================================================================
 
From: S
To: P
Subject: Entity Framework issues
 
P,
 
We are currently experiencing some pretty serious issues with Entity Framework:
 
1.)    The current single project (.edmx) can no longer be used to take existing tables from the database and create EF objects – A. is getting out-of-memory errors.  We have done research and there are a number of tactics to help with this:
a.       Split the .edmx into smaller files – this raises the issues of which tables to include in each file/context and how to relate common tables across these individual contexts (i.e., foreign-key issues).
b.      Work with the key EF files (CSDL, MSL, SSDL) manually to cope with the large database – but hand-editing these files seems painful and error-prone.
 
2.)    Performance on start-up – since the DbContext is currently fairly large and growing, initial access times are long.  I’ve researched this and there are several strategies for improvement – pre-generating views in particular – but I want to get feedback from experts as to the best approach.
 
I really think we need some assistance from Microsoft or IdeaBlade to help us implement the best possible solution to these EF issues.  We have done some research, and will continue to do so, but some outside validation that we are doing the right thing, or are missing some steps, would be of great benefit.
 
Of course, we need to address this ASAP, in particular the single EDMX file.
 
Thanks,
 
S
Architect
-----------------------------------

From: P
To: Ward Bell
Subject: RE: Entity Framework issues

 

Hi Ward,

...

Lately we have run into some issues detailed below by S. We found your article: http://www.ideablade.com/WardsCorner/LargeEFModels.pdf, which was helpful, but we’re wondering if we’re making the right EF model design decisions.

 

If you have an opportunity I’d like to get your input.

 

Regards,

P

----------------------------------------------------

From: Ward Bell
To: P
Subject: RE: Entity Framework Big Model Issues

 

Hi P –

Now that you have read my article on Large Models you know where I stand: don’t do it :-)

 
In sum, I agree with everyone who is disappointed that EF bogs down after a few hundred entities (many factors determine what your own practical limit happens to be).

 

But I feel even more strongly that large models are an architectural design problem. A community of developers cannot understand a business domain … cannot grasp what is going on … cannot manage its complexities … unless that domain is small enough for them to have a truly shared understanding and be able to communicate about it. If anyone in the group doesn’t know what every single entity is for and how it works … that’s a troubling sign.

 

I have seen too many companies pursue the futile dream that there could be one master database … one db to rule them all. That has never worked. It is not a technical problem. We have the computing power to do it. The obstacles are human and institutional.

 

Best to have smaller models that are suited to smaller “bounded business contexts”.

 

The perennial objections:

1.       There are entities in common and we don’t want to repeat the business logic

2.       Everything is interconnected and we can’t predict what entity will want to navigate to what other entity.

 

#2 is most easily addressed. It means you probably have not refined the analysis of the domain well enough. I’m being blunt. But that’s 35 years of experience talking.

 

#1 reflects a comforting illusion, the notion that “Customer” in “Domain A” is the same as “Customer” in “Domain B”. No doubt they point to the same “thing” in the real world. No doubt they have data values in common (e.g., the name).

 

But when you start to understand “Domain A” and “Domain B” … what users actually do in those domains and the rules of the game in each … you discover that they have different data and/or different rules and timing for creating and changing data values. In many cases “Domain A” can modify the Customer and “Domain B” cannot. What is needed is the ability to communicate effectively across the two domains when a business process (in either domain) stimulates such a communication. For example, suppose I’m filling an order for a customer in Domain A and realize I need to update the customer’s headquarters address. There’s no good reason why updating the HQ address should be a Domain A activity. Even if I am a user who could perform both order-filling tasks and customer-update tasks, I still see those tasks as distinct, and I don’t mind that the UI makes me address them in separate “modules” … as long as the transition is graceful and efficient.
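A minimal sketch of one way to wire up such cross-domain communication (all names here are hypothetical; this is neither EF nor DevForce API, and any simple pub/sub mechanism would do):

using System;

// The message contract lives in a small assembly shared by both domains.
public class CustomerAddressChangeRequested
{
    public int CustomerId { get; set; }
    public string ProposedHqAddress { get; set; }
}

public interface IDomainMessageBus
{
    void Publish<TMessage>(TMessage message);
    void Subscribe<TMessage>(Action<TMessage> handler);
}

// Domain A (order filling) publishes what it learned and moves on; it never
// touches Domain B's entities:
//   bus.Publish(new CustomerAddressChangeRequested { CustomerId = 42, ProposedHqAddress = "..." });
// Domain B (customer maintenance) subscribes and applies its own rules:
//   bus.Subscribe<CustomerAddressChangeRequested>(msg => customerService.Review(msg));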


Until and unless you are prepared to do the analysis and find the smaller domain models amidst the mass of tables … there is really no way anyone else can help you.

 

As for S’s points:

 

1.)    Forget wrangling the EDMX parts (CSDL/MSL/SSDL); face the music and rethink how you model your business and communicate across “bounded contexts”.

 

 

2.)    You’ll get some benefit from pre-generating views but not as much as you think. I suspect you’re working with EF in 2-tier fashion and paying the metadata construction price for every user … or almost every user. An n-tier approach means a server constructs the metadata only once … only the first user suffers … and all subsequent users get speedy service. Where will you find an n-tier EF approach? That happens to be my company’s forte.
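For reference, EF 4 pre-generates views with EdmGen.exe; a typical invocation (the model file names here are placeholders for your own CSDL/MSL/SSDL) looks like this:

EdmGen.exe /mode:ViewGeneration /language:CSharp /nologo ^
    "/inssdl:MyModel.ssdl" "/incsdl:MyModel.csdl" "/inmsl:MyModel.msl" ^
    "/outviews:MyModel.Views.cs"

Compile the generated MyModel.Views.cs into the same assembly as the model and EF skips view generation at run time.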

You didn’t mention pre-caching queries. I’m glad. At some point you will grasp at that straw … and be disappointed. I’ll save you the trouble: don’t bother … it doesn’t save you a thing except in the rarest of circumstances.

 

Finally, I’ll bet you are developing your application directly against the EF apparatus and the database. I’ll bet you make-change-compile-run-wake-up-EF-hit-the-database-discover-problem-stop-debugging-rinse-and-repeat.  The cycle takes 2 minutes, 4 minutes, longer? If so, you’re developing like 95% of all shops. It’s painful. It’s unnecessary.
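The generic shape of the remedy (an illustration only, not our actual mechanism) is to put an interface between the application and EF so that day-to-day development runs against an in-memory fake:

using System.Collections.Generic;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface ICustomerRepository
{
    Customer GetById(int id);
    void Add(Customer customer);
}

// Development/test implementation: no EF metadata load, no database round trip.
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly Dictionary<int, Customer> _store = new Dictionary<int, Customer>();
    public Customer GetById(int id) { return _store[id]; }
    public void Add(Customer customer) { _store[customer.Id] = customer; }
}

// The production implementation wraps the EF context; swap the two via
// configuration or dependency injection.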

 

Regards,

 

Ward

----------------------------------------------

From: P
To: Ward Bell
Subject: RE: Entity Framework Big Model Issues

 

Hi Ward,


Thanks for the immediate feedback; we appreciate your help.

...

 

We are already proceeding down the path of reducing the size of the model and splitting it into smaller models. However, the big concern is the complexity and performance of referencing entities across multiple models. We’re trying to keep together the classes that are most often used together in joins or various business operations, but at some point there will be scenarios that require classes from multiple models. Do you have any ideas about how best to deal with that?

 

Regards,

P

-------------------------------------------------------

From: Ward Bell

To: P
Subject: RE: Entity Framework Big Model Issues

 
Avoid building custom cross-model navigations within your entities. Although that is comparatively easy in our DevForce product, it adds complexity you should rarely (if ever) need.

 

The best approach is to map the entity twice. If Domain A and Domain B both have a notion of “Customer” - and you happen to keep the pertinent data values in a common Customer table - give each model its own “Customer” entity. This works especially well if only one of the domains can update Customer.

 

Aside: if both domains can update Customer, consider refactoring the Customer table so that each domain’s mutable columns live in different tables. I say this because it is imprudent to maintain the mutable data of two domains in a single table – imprudent architecturally; it’s “easy” technically.

 

The customer entities do not have to be identical … and probably shouldn’t be. You can privatize the property setters, and block saving, of the read-only customer. You can also privatize properties that the domain doesn’t need.
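Sketched below in later EF Code First syntax for brevity (in the EDMX world the mechanics differ, but the idea is the same: two models, one table; the names are illustrative):

using System.Data.Entity;

// Domain A: the full, updatable Customer.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string HqAddress { get; set; }   // Domain A may change this
}
public class OrderingContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
}

// Domain B: a slimmer, read-only Customer over the same table.
public class BillingCustomer
{
    public int Id { get; private set; }         // private setters: no updates here
    public string Name { get; private set; }
}
public class BillingContext : DbContext
{
    public DbSet<BillingCustomer> Customers { get; set; }
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<BillingCustomer>().ToTable("Customer");  // same table as Domain A
        base.OnModelCreating(modelBuilder);
    }
}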

 

A more sophisticated and sometimes better approach is to create “Defined Queries” (akin to a database view) over the Customer table within your read-only domain model. These can flatten (denormalize) customer information so you get exactly what you want to know about the customer in that domain, in a simple entity tuned to the domain’s needs.
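For example (the view name and columns below are made up), a flattened read-only summary entity mapped over a SQL view rather than base tables:

// A denormalized, read-only projection of Customer for this domain.
public class CustomerSummary
{
    public int CustomerId { get; private set; }
    public string Name { get; private set; }
    public string Region { get; private set; }        // flattened from related tables
    public int OpenOrderCount { get; private set; }   // precomputed in the view
}

// Mapped in the read-only model's context:
//   modelBuilder.Entity<CustomerSummary>()
//       .HasKey(cs => cs.CustomerId)
//       .ToTable("vw_CustomerSummary");   // a database view, not a base table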

 

Sometimes you have a boatload of “reference entities” – stuff like codes – that you are keeping in the database, possibly for reporting purposes. You can eliminate those from your model – just keep the FK ids – and relegate them to a “Reference Entity” service. A full explanation is out of scope for this email, but we have an approach that is performing yeoman service - more than 150 reference entity types managed as a service - at one of our major clients.
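The shape of such a service, in rough outline (hypothetical names; this is not the implementation at that client):

using System.Collections.Generic;
using System.Linq;

public class ReferenceItem
{
    public int Id { get; set; }
    public string Code { get; set; }
    public string DisplayName { get; set; }
}

public class ReferenceEntityService
{
    // typeName ("Country", "OrderStatus", ...) -> id -> item
    private readonly Dictionary<string, Dictionary<int, ReferenceItem>> _cache =
        new Dictionary<string, Dictionary<int, ReferenceItem>>();

    public void Load(string typeName, IEnumerable<ReferenceItem> items)
    {
        _cache[typeName] = items.ToDictionary(i => i.Id);
    }

    public ReferenceItem Lookup(string typeName, int id)
    {
        Dictionary<int, ReferenceItem> byId;
        ReferenceItem item = null;
        if (_cache.TryGetValue(typeName, out byId)) byId.TryGetValue(id, out item);
        return item;
    }
}

// Usage: an Order entity keeps only OrderStatusId; the UI resolves it with
//   refService.Lookup("OrderStatus", order.OrderStatusId).DisplayName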

 

Finally, you should be able to construct a UI that references more than one model. That is not difficult. Imagine a split screen with the Order Module on the left and the General Ledger on the right. They look unified to the customer (you don’t really even need a split screen … I’m just painting a stark picture). You only need a small bridge between them. If you need a wide bridge, then I am more likely to suspect the domain analysis than curse the technology (and I curse at my technology a lot :-) ).

 

Cheers,

 

W



HankFay (Newbie)
Posted: 04-Oct-2010 at 3:31pm
Hi Ward,

Our model is, well, large: 550 or so entities.  Employee has employee access rights linked to store group, also linked to merchandise hierarchy, and the ability to mark down merchandise, schedule discounts, or perform certain types of inventory adjustments, and it goes on and on.  In our present technology (Visual FoxPro, hitting against SQL Server) this is no problem.  Anyone who wants to understand the model can look in xCase and everything (including field and entity triggers, and a bunch of other stuff) is right there.

While I have been involved with apps where things could be separated out, there would be a rebellion if I told the domain programmers "hey, we've got to break it all up so the Entity Framework can handle it."  They tend to frown (and much worse <s>) on new technology that removes abilities they already have.

Assuming I can generate all the entity models I need from my xCase model (in theory it should work, as I have all the information there), I can see an application that has an assembly for each module, where the developer selects the entities they want for a given module, and then I generate the EF model for them (in some to-be-determined manner).  So if the Purchase Order form uses 40 or 50 entities, they would be in one model.  And if some of the same entities existed in another model (e.g., some would also appear in the Product form), no harm done.  There might be 100 or 200 models: would that be a problem?  We actually work that way, in effect, now, within VFP's datasessions: the customer table could be open and accessed, even modified, in different datasessions opened by one or many users.

tia,

Hank Fay
WardBell (IdeaBlade)
Posted: 05-Oct-2010 at 2:20am
I understand the difficulty of saying "the technology I'm going to make you use demands that you lose capabilities you enjoy today."  Not a good marketing angle <s>
 
On the other hand, when you write "Anyone who wants to understand the model can look in xCase and everything is right there" that kind of implies that ... although anyone can ... no one really does understand the uber-model that way.
 
It's a bit like saying that anyone who doesn't know a particular word can look it up in the dictionary. The issue isn't "do you know how to find it" but "what do you really know". And, in practice, perhaps only a very few folks "really know" more than a subset of the 550. To be less presumptuous: if it were MY MODEL, even if I'd spent years building it, I still couldn't claim to know more than about 100; the rest I'd always be looking up. Call it early dementia if you must.
 
So the case for breaking it up is about boundaries and defining domain contexts that developers can actually understand. We're talking about models that are coherent, without the confusion and uncertainty of having to deal with entities that seem extraneous to the present purpose.
 
It is encouraging that you work today within an analogous paradigm of "datasessions".
 
I liked where you were going ... until you said "There might be 100 or 200 models: would that be a problem?"
 
Technically, no problem at all. How you manage all of those models (and their associated modules) is beyond me; makes my brain hurt. If you can handle it ...
 
Good on ya, bro.
mikewishart (Groupie)
Posted: 08-Nov-2010 at 12:39pm
Hi Ward,

Any chance of a DevForce 2010 version of BigModelBreakup in the near future?

Thanks!
WardBell (IdeaBlade)
Posted: 08-Nov-2010 at 7:27pm

We are not planning to convert that to DF 2010 because our recommendation is that you refactor large models into smaller models ... a step that (at least in my mind) does not benefit substantially from a code sample.

Is there something about the explanation or example that you feel would be clearer or more compelling if we converted to DevForce 2010?
mikewishart (Groupie)
Posted: 08-Nov-2010 at 8:44pm
Ward, thanks for the quick reply.

That was actually one of the first things we did.  We split the db into proprietary information vs. client information and put it into two datastores.  The client datastore is about 90 tables and the proprietary one is 55.  The client one is multi-tenant.  Right now both edmx files are in the same DLL, which gives us the advantage of a single EntityManager.  Once the app has enough information, it simply reconnects using the tenant entity key and copies the cache to the new EM.  Having one EM also makes cross-db navigation properties fairly simple to write.  We can also cache a lot of data locally to avoid large downloads (sort of a manual replication).

The downside of this is the huge number of classes in the one namespace, and also the long startup time while MEF does its composition.  Some of our other challenges: we're using multiple EntityManagers to support suspending work, or saving a small subset of entities when necessary.  We're also relying fairly heavily on the EntityServerSaveInterceptor.

So given all this, I'm always hoping to eke out a bit more performance.  One option could be to split the two sets into separate DLLs, and maybe split them up into 4 or 5 more subsets to get us back down to the 30-table magic number.

In the end, a code example is always a bit helpful.  I'm not sure the BigModelBreakup model uses enough of the big DevForce elements to help substantially, but it might get me started.  Or just other suggestions...  :)

mikewishart (Groupie)
Posted: 21-Nov-2010 at 2:11pm
Ok, I took the plunge and started working on a breakup.  Two modules for now, as stated, plus a domain library module to hold common things - verifiers, interfaces, POCOs, etc.

To make this easy on us developers, I simply added another EntityManager partial class to one of the modules and copied the EntityQueries region from the generated (unused) EntityManager in the other module into the partial class.  The first module references the second.  The partial class implements an interface from the second module for the same EntityQuery properties, so that entities in the second module have easy access to other entities by simply casting the EntityManager to the interface.
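In outline, something like this (names simplified; the property body is whatever the DevForce code generator emitted, copied verbatim):

// using IdeaBlade.EntityModel;   // EntityQuery<T> comes from DevForce

// Defined in the second module, alongside its entities:
public interface ISecondModuleQueries
{
    EntityQuery<Invoice> Invoices { get; }   // Invoice is a second-module entity
}

// Partial class added in the first module (which references the second):
public partial class AppEntityManager : ISecondModuleQueries
{
    public EntityQuery<Invoice> Invoices
    {
        get { return _invoices; }            // body copied from the generated EntityQueries region
    }
    private EntityQuery<Invoice> _invoices;  // initialized exactly as in the generated code
}

// Second-module code reaches the shared manager through the interface:
//   var q = ((ISecondModuleQueries)entityManager).Invoices;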

This seems to have sped things up.  Cross-model navigation properties are still easy to write and all the back-end aspects continue to work without any changes.

So, by having a primary entity module which references other entity modules, have I defeated the purpose?  There's the real question.
WardBell (IdeaBlade)
Posted: 22-Nov-2010 at 9:41am
@mikewishart - if you're looking for papal dispensation I cannot help you ;-)
 
I have no objection in principle and it sounds like it is working for you in practice.
 
If I understand correctly, the "primary entity module" is your bridge between the other models/modules. It serves as the "isolation layer" at the "context boundary" that the DDD people talk about.
 
I assume you are only writing interfaces for the entities at the crossroads rather than for all entities in each model. It sounds like Model A refers directly to Model B while Model B uses interfaces to get back to Model A. I would be inclined to define interfaces in both directions so that neither A nor B referred to the other. The interfaces sit ... where? I would have expected them in what you're calling the "primary entity module".
 
That doesn't seem possible because of circular references. Did I misread? You say that the primary entity module refers to modules A and B. I expected it to be the other way around: that the primary module is referred to by A and B. I'm missing the diagram :-)
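The shape I'd expect, for what it's worth (illustrative names only):

// A shared contracts module referenced by both Model A and Model B, so that
// neither model references the other:
public interface ICustomerInfo
{
    int CustomerId { get; }
    string Name { get; }
}
public interface IOrderInfo
{
    int OrderId { get; }
    int CustomerId { get; }   // cross the boundary by id, not by navigation property
}

// Model A's Customer implements ICustomerInfo; Model B's Order implements
// IOrderInfo. Cross-domain code works only against the interfaces, keeping
// assembly references acyclic:
//   SharedContracts <-- ModelA,   SharedContracts <-- ModelB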
 
Devil is in the details as always.
 
Anyway, far from "defeating the purpose", it seems to be fulfilling the purpose.
 
Good luck!