Getting Crafty with Queries

05/12/2011

I’ve recently found myself running into several pain points when it came to extracting data from salesforce.com using SOQL. Since salesforce.com/force.com runs on a shared infrastructure they don’t allow all the bells and whistles of traditional SQL as they need to protect their resources and prevent some crazy guy like me from bringing down then entire system with some redonkulous query. You may be thinking by now that I am referring to the specific governor limits salesforce.com imposes on your queries but I am actually talking about fairly simple extractions that don’t always have the most straight forward solution.

Let’s say that you have a table of data like this:

Now let’s also say the requirement is to query items from this table but only return items based on a unique Order_Number__c. If there are duplicate Order Numbers in the table return the oldest record based on create date. The records we need to return would be the ones highlighted green.

What some of you are probably thinking is this is a totally unrealistic example because in any well designed system a field like Order Number should be unique. Well guess what? Not everyone gets to live this this happy database fairly land where duplicates don’t exist. If you you, whoop-di-do, hooray for you, click here for your prize. If you don’t live in fairy database land this example could be a totally realistic problem.

So how to solve it? Out of the box SOQL doesn’t have a solution for this problem (if it does I just wasted some time writing this) so we need to get crafty! You could maybe use aggregate queries to group by Order Number and then determine the oldest record. Or maybe query records, add them to unique Lists based on Order Number, sort these, and then return the oldest record. These are all good…well…they would work, not so sure about the “good” part. Below is the way I approached this problem and I’m sure there may be better ways to do this but I thought it was pretty slick.

We know one of the requirement is to use the oldest records if there is a duplicate. In our query we can sort by Created Date ascending so the oldest records a processed first. Before running this query we will also create a Map where the key is a Integer representing the Order Number and the value is a salesforce object, in this example Opportunity. Data returned would be processed in this order top to bottom:

As we process the returned results we will check to see if the Map contains a value for the given Order Number on the record we are currently processing. If it does not, it means this is the oldest record for this Order Number and we add it to the map. If the map does return a value for any given Order Number it means there was already an older record found and this record should not be used. In the end with have a nice clean map with the keys being the Order Number and the values being the oldest record with this Order Number. If we then need to add all these records to a list we can do that with a simple values() method called on the Map. Below is the code for this, it assumes the object being queried is an Opportunity but this logic could be applied to any object/table.

//Create the map Order => Opportunity
Map<Integer,Opportunity> oppMap = new Map<Integer,Opportunity>();
 
//Query the records and sort by CreatedDate ascending so oldest records are returned first
for(Opportunity opp : [select Id, Name, Order_Number__c, CreatedDate from Opportunity order by CreatedDate asc]){
 
    //Check if map contains Opportunity for given Order Number
    if(oppMap.get(opp.Order_Number__c) == null){
	//If null add opp to map, it is the oldest
	oppMap.put(opp.Order_Number__c,opp);
    }
 
    //Else do nothing, ignore the record as we already found an older one.
}
 
//If we need the opps in a list just do this	
List<Opportunity> opps = oppMap.values();

The one down side to this approach is that you are querying duplicate records that will never be used. Ideally you would only query the exact records needed without the duplicates even returned at all. I know of no way to do this but if you do please leave a comment below. Even with this added overhead of returned records that are never used I like this approach over the other described earlier because it is super simple and I am a big fan of simple.

UPDATE:
Based on some great comments below Greg has pointed that if you sort in desc order you don’t need the if() condition because the oldest records will be processed last and when you add them to the map it will replace any value that is currently there for the given key. The for loop could be simplified to this:

//Query the records and sort by CreatedDate descend so oldest records are processed last
for(Opportunity opp : [select Id, Name, Order_Number__c, CreatedDate from Opportunity order by CreatedDate desc]){
 
    //Add opp to map, it will replace any existing value in the map with the same key
    oppMap.put(opp.Order_Number__c,opp);
 
}