New Posts New Posts RSS Feed: Slow Cache QueryStrategy
  FAQ FAQ  Forum Search   Calendar   Register Register  Login Login

Slow Cache QueryStrategy

 Post Reply Post Reply Page  12>
Author
jozza View Drop Down
Newbie
Newbie


Joined: 11-Jul-2007
Posts: 25
Post Options Post Options   Quote jozza Quote  Post ReplyReply Direct Link To This Post Topic: Slow Cache QueryStrategy
    Posted: 13-Nov-2007 at 2:52pm
Hi
 
In the current scenario, I go down to the database and retrieve scheduled leave records for 2000 employees for a given time period. As I understand it, these records are then put in the cache. I then have another  devforce query which has a querystrategy of CacheOnly. I then call this CacheOnly query for each personnel. It takes 80 seconds! I've checked SQL Profiler and devforce does NOT go back to the server (which is good).
 
80 seconds seems to me like a very long time in computer land, so I created another method which basically retrieves the 2000 employees and creates a Dictionary<string, EntityList<Leave>>. It now only takes 4 seconds to do the same task!
 
My question is: I was under the impression that caching should be blistering quick since it all happens in memory, why does there seem to be such a performance hit to go to the devforce cache?!?
 
Cheers.
Back to Top
davidklitzke View Drop Down
IdeaBlade
IdeaBlade
Avatar

Joined: 14-Jun-2007
Posts: 715
Post Options Post Options   Quote davidklitzke Quote  Post ReplyReply Direct Link To This Post Posted: 13-Nov-2007 at 7:50pm
People often forget that, in DevForce",  there are two parts to a query.  The first part is the "getting the data".    The second part is the "merging of the retrieved data with what already exists (in the list or in the cache)".  I suspect what is taking so long is the Merge portion of the query.  However, without knowing the exact specifics of your performance test, I can only guess.  Can you give me the source for your benchmark?
Back to Top
jozza View Drop Down
Newbie
Newbie


Joined: 11-Jul-2007
Posts: 25
Post Options Post Options   Quote jozza Quote  Post ReplyReply Direct Link To This Post Posted: 13-Nov-2007 at 8:14pm
I understand that there are two parts to the query. However, in this case all the data that is required is already in the cache and the subsequent query has a query strategy of CacheOnly. So I don't understand what exactly is getting merged behind the scenes.
 
Thanks.
Back to Top
davidklitzke View Drop Down
IdeaBlade
IdeaBlade
Avatar

Joined: 14-Jun-2007
Posts: 715
Post Options Post Options   Quote davidklitzke Quote  Post ReplyReply Direct Link To This Post Posted: 14-Nov-2007 at 8:02am
What is your Merge Strategy?  I believe that the default Merge Strategy is "PreserveChanges".
 
The result of every query must have a destination.  The destination may already have versions of the same objects.  For example, I may read from the database, and get an instance of the same object that I already have in the cache.  The Merge Strategy tells me what to do if the two versions are different.
 
Send me the source code for your performance test.  The best case would be if you could send me a complete solution along with a small database to support@ideablade,com
 
Back to Top
pnschofield View Drop Down
Newbie
Newbie
Avatar

Joined: 14-Sep-2007
Location: United States
Posts: 18
Post Options Post Options   Quote pnschofield Quote  Post ReplyReply Direct Link To This Post Posted: 14-Nov-2007 at 2:45pm
Does the initial query pull down all 2000 employees along with their leave records by adding a span to the query?  Or are you just pulling down the leave records, and then querying for each leave record's employee with the second round of queries?

If the initial query is not pulling down the employees, then the second round of 2000 cache-only queries is actually hitting the database.  A cache-only query will hit the database if the PM can't determine that the query can be fulfilled from the cache.

If you could spell out a little more explicitly what each query is doing, I think we'll have a better understanding what is going on.
Back to Top
jozza View Drop Down
Newbie
Newbie


Joined: 11-Jul-2007
Posts: 25
Post Options Post Options   Quote jozza Quote  Post ReplyReply Direct Link To This Post Posted: 14-Nov-2007 at 3:35pm
Hi pnschofield
 
I'm just pulling down the leave records. I'm running sql profiler and I have confirmed  that it does NOT hit the database on subsequent queries, as i have no need to retrieve the employee record.
 
I've sent davidklitzke an example of this showing that regardless of which merge strategy is used the performance is very slow. Hopefully he will be able to shed some light on the matter.
 
 
Back to Top
jozza View Drop Down
Newbie
Newbie


Joined: 11-Jul-2007
Posts: 25
Post Options Post Options   Quote jozza Quote  Post ReplyReply Direct Link To This Post Posted: 14-Nov-2007 at 3:56pm
Thanks David, it's on its way.
Back to Top
davidklitzke View Drop Down
IdeaBlade
IdeaBlade
Avatar

Joined: 14-Jun-2007
Posts: 715
Post Options Post Options   Quote davidklitzke Quote  Post ReplyReply Direct Link To This Post Posted: 14-Nov-2007 at 11:59pm
I received Joss's benchmark test, and did a performance test and analysis.  I found that (in accordance with Jozza's earlier claims) that the type of Merge Strategy did not make a measurable difference.  All Merge Strategies completed in less than 35 seconds while the CustomList Dictionary completed in around 7 seconds.
 
What I did find was that Jozza's dictionary is highly optimized for finding all orders that contained a specific Product Id:
 

m_Collection = new Dictionary<long, EntityList<OrderDetail>>();

m_List = new List<OrderDetail>();

foreach (OrderDetail order in result) {

  if (m_Collection.ContainsKey(order.ProductId) == false) {

    m_Collection.Add(order.ProductId, new EntityList<OrderDetail>());

  }

  m_Collection[order.ProductId].Add(order);

  m_List.Add(order);

}

 
However, this dictionary would not be helpful If I were looking for orders for a specific customer or a specific sales rep.  I think that what this shows is that it is possible to create a dictionary that will outperform the PersistenceManager cache it you target a very specific kind of query.  What is not clear is whether you can come up with a dictionary that will outperform the PersistenceManager cache (on average) on all kinds of queries.
 
You should also keep in mind that the PersistenceManager can support a wide variety of Query and Merge Strategies, something that a Dictionary cannot possibly support.


Edited by davidklitzke - 15-Nov-2007 at 12:11am
Back to Top
jozza View Drop Down
Newbie
Newbie


Joined: 11-Jul-2007
Posts: 25
Post Options Post Options   Quote jozza Quote  Post ReplyReply Direct Link To This Post Posted: 15-Nov-2007 at 2:05pm
Hi David
 
This is exactly the reason why I included a test that uses a Generic List. The Generic List is not optimized for searching for anything particular as you specify your own query as the delegate. The FindAll method in the Generic List is an order n operation.
 
I can easily substitute:
 
List<OrderDetail> result = m_List.FindAll(delegate(OrderDetail order) { return order.ProductId == productId && order.SalesRep == salesRep && order.Quantity > someQuantity; });
 
and still greatly outperform the persistence manager cache.
 
I thought i had demonstrated that in final method call of my test app.
 
Cheers
 
Jacob.
Back to Top
jozza View Drop Down
Newbie
Newbie


Joined: 11-Jul-2007
Posts: 25
Post Options Post Options   Quote jozza Quote  Post ReplyReply Direct Link To This Post Posted: 15-Nov-2007 at 2:13pm

Hi David

And in addition, I do realise that persistence manager supports a wide variety of query and merge strategies, however, if I'm only specifically querying the cache, then the merge strategy becomes irrelevant. Cache's by their very definition are "An area of computer memory devoted to  high-speed retrieval" and it concerns me that I can iterate over every entity in an entitylist searching for a specific criteria 5 times quicker than querying the persistence manager cache.
 
Cheers


Edited by jozza - 15-Nov-2007 at 2:20pm
Back to Top
davidklitzke View Drop Down
IdeaBlade
IdeaBlade
Avatar

Joined: 14-Jun-2007
Posts: 715
Post Options Post Options   Quote davidklitzke Quote  Post ReplyReply Direct Link To This Post Posted: 15-Nov-2007 at 2:37pm
But your custom list was tailored for this specific performance test.  To get the same great performance on another and different test, you would have to create a brand new custom list.  Not so for the Persistence Manager cache.
 
We have been looking at our performance numbers and have found a way to reduce execution times from 35 seconds to 11 seconds.  The mechanism that we used was to make just one query for each detail record instead of the 10 queries it was previously making.
 
I think that this modification in our test reflects the fact that you are not going to get great performance on any retrieval if you retrieve objects one by one.  If you have to retrieve a large number of objects in a short amount of time, do it by making as few queries as possible.
 
I'll send you the source for the revised performance test,
Back to Top
jozza View Drop Down
Newbie
Newbie


Joined: 11-Jul-2007
Posts: 25
Post Options Post Options   Quote jozza Quote  Post ReplyReply Direct Link To This Post Posted: 15-Nov-2007 at 3:18pm
Hi David
 
I disagree that my custom list is tailored to the specifc performance test. I merely have a List<OrderDetail> which contains all the records from the db. I then do a FindAll query passing in a predicate.
 
This is exactly the same as me doing:
 
List<OrderDetail> results = new List<OrderDetail>();
 
foreach(OrderDetail order in m_List)
{
   if(order.ProductId == productId && order.SalesRep == salesRep)
   {
        results.Add(order);
   }
}
 
return ReadOnlyEntityList<OrderDetail>(new EntityList(OrderDetail>(results));
 
I can't see how this list is tailored to any specific performance test. I agree that the Dictionary version was tailored to the test because it was keyed on productid. The generic list version however is not keyed on anything, and it is up to the developer to specify the predicate to be used in the query, much like it is up to the developer to create an EntityQuery to return the data from the persistence manager cache.
Back to Top
davidklitzke View Drop Down
IdeaBlade
IdeaBlade
Avatar

Joined: 14-Jun-2007
Posts: 715
Post Options Post Options   Quote davidklitzke Quote  Post ReplyReply Direct Link To This Post Posted: 15-Nov-2007 at 7:34pm
We continued to work on our performance and now have a result that is 5.9 seconds - a result 47% faster than our previous best of 11.1seconds.  This new result is even faster than the 6.2 seconds we got using Jozza's custom lookup.
 
       public EntityList<OrderDetail> GetOrderDetailFromCacheTable(int productId)  {

          EntityList<OrderDetail> list = new EntityList<OrderDetail>();

          EntityTable table = PersistenceManager.DefaultManager.GetTable(typeof(OrderDetail));

          foreach (DataRow row in table.Rows) {

            if ((long)row["ProductId"] == (long)productId)

              list.Add((OrderDetail)row);

          }

          return list;

        }

 

 



Edited by davidklitzke - 15-Nov-2007 at 8:11pm
Back to Top
jozza View Drop Down
Newbie
Newbie


Joined: 11-Jul-2007
Posts: 25
Post Options Post Options   Quote jozza Quote  Post ReplyReply Direct Link To This Post Posted: 15-Nov-2007 at 8:16pm
I've further increased the performance by using the Select method in the EntityTable to retrieve the rows of interest.
 
  public EntityList<OrderDetail> GetOrderDetailFromCacheTableSelect(int productId)
        {           
            EntityTable table = PersistenceManager.DefaultManager.GetTable(typeof(OrderDetail));
            EntityList<OrderDetail> list = new EntityList<OrderDetail>();
            DataRow[] result = table.Select("ProductId = " + productId);
           
            foreach(DataRow row in result)
            {
                list.Add((OrderDetail)row);
            }
            return list;
        }
 
This has further reduced the time taken to about 890 ms. It looks like the moral of the story is, if you want performance from the "cache", you need to bypass devforce entity queries, and deal with the underlying datatable directly instead.
Back to Top
davidklitzke View Drop Down
IdeaBlade
IdeaBlade
Avatar

Joined: 14-Jun-2007
Posts: 715
Post Options Post Options   Quote davidklitzke Quote  Post ReplyReply Direct Link To This Post Posted: 22-Nov-2007 at 10:37pm

Congratulations on your additional performance improvements.

While I don't quarrel with your results, I would like to make a few points.

Our use of DataSets is an implementation detail. By making use of this knowledge, you are breaking encapsulation as surely as if you had used reflection to probe the private members of a .NET class. We make this clear in our documentation. Just because the innards are accessible does not mean that you should exploit them.
 
Reviewing the thread. I think we made a mistake.  Instead of taking your initial post and trying to solve your particular problem we quickly moved to a benchmark test that made 10,000 small queries on 1,000 orderdetail records.  I now doubt whether that was a good benchmark test because the answer could have been obtained through a single query.
 
Let's get back to your original problem.  Do you really have a performance problem today?  How long does it take a user to get the answer to his/her query?  Does it really take a very large number of small queries, or can the answer be obtained efficiemtly with a few queries.
 
Can you give us a reproducible case for your original post:
 

Say for example I have a hierarchical table as follows:

 

Id          Level              Name             Parent

1           1                    Bob                 NULL

2           1                    Steve              NULL

3           2                    Rachel             1

4           2                    Brian               1

5           2                    Colin               1

6           2                    Tracy               2

7           2                    Sally                2

8           3                    Trevor             6

9           3                    Reginald          6

10         3                    Sue                 6

11         4                    Boris               8

 

So we have a hierarchy which is n levels deep. Where for example, Tracy's boss is Steve and Tracy has Trevor, Reginald, Sue underneath her as direct reports and Boris as an indirect report.

 

No I'd like to create a query that given an Id, will return all the reports for the given person as well as being able to reference their bosses, but at the sametime not allowing to traverse back down the tree from the boss level. Let me give an example to make this clear.

 

 



Edited by davidklitzke - 22-Nov-2007 at 11:02pm
Back to Top
jozza View Drop Down
Newbie
Newbie


Joined: 11-Jul-2007
Posts: 25
Post Options Post Options   Quote jozza Quote  Post ReplyReply Direct Link To This Post Posted: 25-Nov-2007 at 3:23pm
Hi David
 
Firstly the "original post" (Bob, Steve etc) that you have included in your response, is in fact not the original post for this question. It is totally unrelated to the problem.
 
Second of all, I think that you are missing my point. My concern lies in the fact that the "cache" should be high speed, where in reality, it is far from that. I expect some overhead from the devforce framework, but the fact that going to the dataset directly is so much quicker, is of great concern.
 
In the end, the solution for me is to retrieve all the records from the database and store them in a custom dictionary, and work on them that way. It's quick, it's easy, and only uses devforce minimally. Which is a shame, because it would be nice to be able to quickly query the devforce cache for data. I don't understand why the devforce cache can't be comparable to the speed of the dataset in the case when the QueryStrategy is specifically set to CacheOnly.
 
 
Cheers
 
Back to Top
 Post Reply Post Reply Page  12>

Forum Jump Forum Permissions View Drop Down