…it is not contained in either an aggregate function or the GROUP BY clause

I cannot tell you how annoying I find this error, and before you post a comment, I know it’s not the fault of SQL Server. I’ve asked it for something that is not logically solvable. I found a pretty good explanation of it:

If I asked you “For each city in Massachusetts, what is the total number of residents and what is their age?” Now, think logically and carefully about what I am asking you.  Does it make sense?  The total number of people does — it is just a single value, the count of how many people live in the city.  But how can I return “their age”?  What is really being asked for?  A long list of every age of every person, all delimited by commas (e.g., a list of thousands of repeating numbers)?  A distinct list of all ages, also separated by commas (e.g., 0,1,2,3 ….) ?  Perhaps the average age?  The median age?  The minimum and/or the maximum age? The result of adding up the ages of everyone in the city?

As you can see, we cannot answer the question, since it does not make logical sense without providing more specific information about what you need.

So as I started digging into this some more I had an idea. What if I could aggregate my unique column (Tracking_ID) in a way that I know isn’t going to affect the results…

Here goes…

SELECT Description, FundCode, CounterpartyCode, TradingCity
FROM DupsToRemove
GROUP BY Description, FundCode, CounterpartyCode, TradingCity

The above query returns exactly what I am after. My  4861 rows are uniquely (on those 4 fields) brought down to 252 rows. The problem is I need to also include about 4 other columns that are not in the GROUP BY criteria.

So, I decided to aggregate Tracking_ID in a way that wouldn’t impact the actual data being returned…

SELECT Description, FundCode, CounterpartyCode, TradingCity, MAX(Tracking_ID) as Tracking_ID
FROM DupsToRemove
GROUP BY Description, FundCode, CounterpartyCode, TradingCity

The above query returns my same unique 252 rows (notice my GROUP BY has not changed) but now I have a derived column ‘Tracking_ID’. Since Tracking_ID is a GUID and a PK on the table it cannot be duplicated, therefore the MAX value will always be the ONLY value and therefore the UNIQUE value!

I Copy/Paste my ‘Tracking_ID’ column into Excel and build my quick UPDATE statement.

Excel Update Statement

Running that in SSMS I have now set my field IsUnique = TRUE for my unique records.

Now I can simply run this query:

SELECT * FROM DupsToRemove
WHERE IsUnique = 'TRUE'

I get ALL 8 columns I need, where they are unique. My results = 252 which is exactly how many unique records I have!