Sunday, March 25, 2012

DISTINCT question

Hello all!

I understand that if you place a DISTINCT in a SELECT statement, it will return only unique values of that row, depending how many coumns you have. So look at this.

Code Snippet

SELECT DISTINCT Active_Orders.First_Name, Active_Orders.Last_Name, Active_Orders.Account_Number, Active_Orders.Service_Date_Time, Active_Orders.Stat,

Order_Status.Status, Active_Orders.Order_ID, Active_Orders.Age, Active_Orders.DOB, Active_Orders.Rm_Desc, Active_Orders.Check_Out,

Active_Orders.Remarks, Locations.Loct_Desc, Active_Orders.FilePath, Active_Orders.Misc, Active_Orders.File_Date_Time

FROM Active_Orders INNER JOIN

Order_Status ON Active_Orders.Status_ID = Order_Status.Status_ID INNER JOIN

Locations ON Active_Orders.Location_ID = Locations.Location_ID

WHERE (Order_Status.Status = 'InProcess') OR

(Order_Status.Status = 'Pending') OR

(Order_Status.Status = 'OnHold') OR

(Order_Status.Status = 'C1') OR

(Order_Status.Status = 'C2') OR

(Order_Status.Status = 'C3') OR

(Order_Status.Status = 'C4') OR

(Order_Status.Status = 'C5') OR

(Order_Status.Status = 'C6') OR

(Order_Status.Status = 'C7')

ORDER BY Active_Orders.Stat DESC, Order_Status.Status DESC

I need to only return rows with unique Active_Orders.Account_Number. The way it is now, even if I have the same Account_Number, it will return both, because according the statment above, it still is a unique record despite the fact the account number is the same. The Order_ID is diffrent, Service_Date_time is different, etc.

So, How can I return all the fields above, but eliminate account numbers are the same?

Thanks!!

Rudy

Your query currently returns distinct rows.

You want it to return fewer rows than it does--specifically--only one row for each Account_Number. The question then is "Which rows do you want?" Unfortunately, there is no "any one but I don't care which one" aggregate in SQL, so you have to give a precise condition that can be evaluated in SQL to describe for a given account number, which row you want to see out of the many you're currently getting.

You might end up with something like (if your rule "the one with the latest File_Date_Time" value)

with YourQuery as (

rank() over (partition by Account_Number order by File_Date_Time desc) as rk,

<the rest of your current query>

)

select * from YourQuery as Q1

where rk = 1

If you aren't using SQL Server 2005, it's harder to write, but can still be done. In that case, you express this:

select <columns>

from <wherever>

where <whatever> as T1

and NOT EXISTS (

select *

from <same wherever> as T2

where <same whatever>

and T2.account_number = T1.account_number

and T2.File_Date_Time > T1.File_Date_Time

)

Steve Kass

Drew University

http://www.stevekass.com

|||

Just Try a Group statement instead

like this

SELECT a.* FROM mytable AS a

WHERE a.Account_Number IN(

SELECT B.Account_Number FROM (

SELECT Account_Number , count(Account_Number ) AS ACC

FROM mytable GROUP BY Account_Number

) AS B

WHERE B.ACC=1

)

So You'll get only the rows of The Account_numbers represented once in the table

I hope this will help

Best regards

Raimund

|||

Thank you guys for your suggestion!

Steve, I am using SQL 2005. I like the idea of the rule with the time. I'm actually going to have to put that in place anyway. If an order has the same account number within 2 minutes of the FILE_TIME, then only show one account number.

So do I put "with YourQuery as (

rank() over (partition by Account_Number order by File_Date_Time desc) as rk,

this on the very top of my procedure?

Then put in my query in,

then put in "select * from YourQuery as Q1

where rk = 1"

Sorry, I'm just sure of the order of this.

Thanks!

Rudy

sql

No comments:

Post a Comment