
Batch Processing Best Practices with DevForce EF

danielp37
Newbie

Joined: 18-Mar-2008
Location: United States
Posts: 29
    Posted: 05-Dec-2008 at 1:50pm
Our application has web and WinForms front ends, and working with DevForce EF in these domains is fairly straightforward. We also have a batch-processing back end that handles importing and exporting files to partners, as well as various other scheduled tasks.

In the batch-processing scenarios we often deal with large amounts of data at a time. We may need to grab thousands of members with their associated account information. We want these batch processes to use the same business rules as the web and console sides, and therefore want them written using DevForce EF. Are there best practices or design patterns for implementing these kinds of things?

For example, say we have a service that needs to charge all of our members their monthly fees. To do that, it needs to pull in data for each member from around 10 tables (some tables have only a couple of records that would be the same for every member, but some might have hundreds of records that are unique to just that member).

We would need to page access to the members' data, since we won't be able to load all 100,000 members at once, and we would like to load as much of the data as possible in single queries rather than pulling individual records (e.g., we may get all of a member's transfer lines in a single query, but transfer lines are associated with payments, and we would like to get all the associated payments in one query rather than individually for each transfer line as we access its payment).

We are currently converting to DevForce from a home-grown ORM into which we had built some ways of handling these kinds of things, but I'm curious whether IdeaBlade has any best practices or examples for processing large amounts of data.
IdeaBlade
Moderator Group

Joined: 30-May-2007
Location: United States
Posts: 353

Posted: 07-Dec-2008 at 1:34pm
From Ward Bell:
 
An object-relational orientation is not always the best approach to processing high volumes of data. I always ask myself, “what will I do with the data once retrieved?”

 If I think I’m going to want to navigate object graphs and apply business logic (e.g., validations or in-object workflow), I’ll strongly favor the object approach and live with the performance consequences. On the other hand, if I’m reading a short record, adding something to it, and shoving the revised record out to the database again … well I’m not getting much value from converting data into an entity.

 As with all things performance-related, I would stick to my usual practices until I had a compelling reason to change them. It confuses everyone to change data-access modalities within an application. So, even in the second example, I’d stay with the entity approach until I was convinced that the performance was unacceptable. Even then I’d consider scaling with hardware before I changed the programming model. Why? Because humans (read “developers”) are your most expensive resource.

 On the other hand, if I sensed impending danger, I’d make sure that I had a reasonable alternative option … some place to move to if necessary.

 And you do. If you had to do so, you could author and invoke a server-side batch process that has direct access to the database and uses traditional ADO.NET techniques. According to a recent study, straight ADO could yield a 9x query improvement even with a 3-part object graph. Of course you won’t be able to re-use all of your entity business logic and you’ll spend a lot of time doing the plumbing – writing and testing – that could otherwise go into your application. I wouldn’t go here unless the logic were exceptionally simple and the volume / response time requirements were especially severe.

 Example: there are millions of ACH records in the database. ACH records are the basis for electronic funds transfers. I must generate a file of selected and reformatted ACH records for an external  ACH process that reads and processes flat files. My job is simple: query, format, write. DataReader and Writer sounds like the right call here.
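
A minimal sketch of that query-format-write approach, using a plain SqlDataReader and StreamWriter, might look like the following. The connection string, table, and column names are illustrative only, and batchDate and outputPath are assumed to be defined elsewhere:

    using System.Data.SqlClient;
    using System.IO;

    // Hypothetical ACH export: stream selected rows straight to a flat file,
    // with no entity materialization and no business-logic reuse.
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "SELECT RoutingNumber, AccountNumber, Amount FROM AchRecord WHERE BatchDate = @date",
        connection))
    using (var writer = new StreamWriter(outputPath))
    {
        command.Parameters.AddWithValue("@date", batchDate);
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // Query, format, write - one row at a time, in constant memory.
                writer.WriteLine("{0},{1},{2:F2}",
                    reader.GetString(0), reader.GetString(1), reader.GetDecimal(2));
            }
        }
    }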

 So much for general principles. What about your case?

 You describe a complex object graph (10 objects in a graph = complex to me) with business logic that is applicable to your high-volume task. You say you use a “home-grown ORM” today to handle the “batch process”. You don’t seem to be complaining about the performance of your own ORM. This suggests that, whatever the volume, the ORM approach is working for you. Will it work with DevForce EF? Well, I don’t know what “built-in some ways of handling” high-volume data you’ve got … but we can talk about some facilities available in DevForce EF:

·         You can page your queries (see the paging sketch after the include example below).

·         Adding “includes” to a query enables fetching of just those objects that are related to the query results. If your paged query returns 10 Order objects, with  appropriate include clauses, it could also return just the Customer, LineItem, and Product objects that are related to those 10 Orders … and will do so in the same trip to the server as the root Order query itself.

I’m betting you’ll do something like this with your member query:
   manager.TransferLines.Where(t => t.Member.Id == theCurrentMemberId).Include("Payments");
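
Putting paging and includes together, a rough sketch of the paged member query might look like the following. The Members query property, the ChargeMonthlyFee call, and the include paths are assumptions for illustration, and the Skip/Take calls assume the query supports the standard LINQ paging operators:

    int pageSize = 1000;
    for (int pageIndex = 0; ; pageIndex++)
    {
        // One page of members per round trip, with their transfer lines and the
        // payments behind those transfer lines fetched in the same trip.
        var members = manager.Members
            .OrderBy(m => m.Id)
            .Skip(pageIndex * pageSize)
            .Take(pageSize)
            .Include("TransferLines")
            .Include("TransferLines.Payments")
            .ToList();

        if (members.Count == 0) break;      // ran past the last page

        foreach (var member in members)
        {
            ChargeMonthlyFee(member);       // hypothetical: your existing business logic, unchanged
        }
    }
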
·         Caching small tables can save a lot of time. I often prime the cache with the complete “StatesOfTheUS” or “CountriesOfTheWorld” before performing any queries. Then you don’t have to include the states and countries in your queries.
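
Priming can be as simple as running each small-table query once at startup; the sketch below assumes StatesOfTheUS and CountriesOfTheWorld are entity queries exposed by your EntityManager:

    // Run each small-table query once up front. The resulting entities stay in the
    // EntityManager's cache, so later processing can navigate to them (e.g., member.State)
    // without dragging those tables into every member query.
    var allStates    = manager.StatesOfTheUS.ToList();
    var allCountries = manager.CountriesOfTheWorld.ToList();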

·         You can subdivide your batches into reasonable chunks (say, 1,000 members at a time). Let a single EntityManager (EM) hold the data for a single chunk, and create a new EM for each chunk, as follows (a rough sketch appears after the pattern reference below):

o   Break the process into a load phase, a calc phase, and a save phase.

o   Your calc phase is the main thread and will organize the workflow as well as perform calculations on a chunk.

o   In the “Load phase” you fetch a chunk’s worth of data asynchronously into one (or more) EMs; when the EM is primed, put it back on the main thread’s “work queue”; the async approach lets your main calculation phase thread iterate over the pre-loaded EMs without waiting for IO.

o   Use our async Save method to persist EMs after you’ve run your calculations and produced output going back to the database; again, the async approach frees your main thread from the slower IO task of saving data.

o   You don’t have to create your own background threads to do any of this; we do it for you. You do have to manage the queues and the multiple EMs.

 You might recognize this as the ACTIVE OBJECT pattern. It is well described in Robert Martin’s great book Agile Principles, Patterns, and Practices in C#, pp. 305-310.
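
A rough, heavily simplified sketch of that load/calc/save pipeline follows. The thread and queue plumbing is spelled out only to show the shape of the pattern; in practice DevForce's async query and async Save facilities described above would take the place of the hand-rolled loader thread, and GetMemberIdChunks, LoadChunk, and ChargeMonthlyFees are hypothetical helpers standing in for your own load and calculation logic:

    using System.Collections.Generic;
    using System.Threading;

    var loadedChunks = new Queue<EntityManager>();

    // LOAD phase (background): one EntityManager per ~1000-member chunk; a null
    // entry is enqueued as a sentinel once every chunk has been loaded.
    var loader = new Thread(() =>
    {
        foreach (var memberIdChunk in GetMemberIdChunks(1000))      // hypothetical helper
        {
            var em = new EntityManager();
            LoadChunk(em, memberIdChunk);                           // hypothetical: queries members + related rows
            lock (loadedChunks) { loadedChunks.Enqueue(em); }
        }
        lock (loadedChunks) { loadedChunks.Enqueue(null); }         // sentinel: no more chunks
    });
    loader.Start();

    // CALC + SAVE phases (main thread): work through pre-loaded EMs without waiting on IO.
    while (true)
    {
        EntityManager em;
        bool hasItem;
        lock (loadedChunks)
        {
            hasItem = loadedChunks.Count > 0;
            em = hasItem ? loadedChunks.Dequeue() : null;
        }
        if (!hasItem) { Thread.Sleep(100); continue; }              // loader hasn't caught up yet
        if (em == null) break;                                      // sentinel reached: all chunks done

        ChargeMonthlyFees(em);      // hypothetical: business logic against this chunk
        em.SaveChanges();           // synchronous stand-in for the async Save described above
    }
    loader.Join();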

·         For scalability you might manage this batch processing within a DAEMON client. Such a daemon would be written just like any DevForce client application ... except it has no UI. It could be hosted in a Windows Service running on a server and it would watch for work to do on a database “queue” table.  You can have any number of these daemons running.  I wouldn’t go here on the first day … but it’s nice to know that you can do these kinds of things if you have to and still retain the simplicity and convenience of your entity-oriented programming model.
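
A skeletal daemon along those lines, hosted as a Windows Service and polling a hypothetical WorkItems queue table, might be structured like this (the WorkItems query, the WorkItem type, and ProcessBatch are placeholders for your own queue schema and batch logic):

    using System.Linq;
    using System.ServiceProcess;
    using System.Timers;

    // Skeleton of a UI-less DevForce "daemon" client hosted as a Windows Service.
    public class BatchDaemonService : ServiceBase
    {
        private readonly Timer _pollTimer = new Timer(30000);   // poll the queue table every 30 seconds

        protected override void OnStart(string[] args)
        {
            _pollTimer.Elapsed += (sender, e) => PollForWork();
            _pollTimer.Start();
        }

        protected override void OnStop()
        {
            _pollTimer.Stop();
        }

        private void PollForWork()
        {
            // A production service would guard against overlapping polls; omitted for brevity.
            var manager = new EntityManager();

            // Hypothetical queue table: each row describes a batch waiting to be processed.
            var pending = manager.WorkItems.Where(w => w.Status == "Pending").ToList();
            foreach (var item in pending)
            {
                ProcessBatch(manager, item);    // e.g., the load/calc/save pipeline sketched above
                item.Status = "Done";
            }
            manager.SaveChanges();
        }

        private void ProcessBatch(EntityManager manager, WorkItem item)
        {
            // Placeholder for the chunked batch-processing logic.
        }
    }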


Edited by IdeaBlade - 07-Dec-2008 at 1:36pm
danielp37
Posted: 08-Dec-2008 at 7:27am
Thank you for your reply. This was basically what I was looking for. We currently handle our large batches in a "Load/Calculate/Save" manner, but I wasn't sure that this was the best pattern for ORM-based systems. Most of our batch processes do require quite a bit of business logic, so I would not want to rewrite them to work on the DB directly, as that would duplicate a lot of business logic.

We actually do have a Windows service that acts as our "DAEMON" client and is set up to perform our batch processes, so what you explained there is something we already have. Most of what you described we are already doing, so I guess this is confirmation that we are going down the right path.

Thanks,

Dan