Wednesday, March 21, 2012

Distinct and Group By

*** This is a follow-up to my recent post "Distinct For Only Some Columns In
A Row".
Recently, I posted a question. That question was answered, but a new
question surfaced immediately afterwards. What follows is a combination of
my former post and the new question...
I'm trying to work with this query:
SELECT T1.UserID, T2.MiscID
FROM Table2 T2 INNER JOIN Table1 T1 ON T2.ID = T1.ID
WHERE (T2.Category IN (2, 3, 4, 5)) AND (T1.ZipCode IN (22201,22202,22203))
Based on the data in the tables, this query generates the following result:
UserID MiscID
10 105
10 107
11 109
11 120
11 122
The problem is that I do not want duplicate UserID's (I show only 10 and 11,
but the table actually has many rows with duplicate UserID values). Instead,
I want unique (distinct) UserID's and the lowest MiscID for each row. I
would therefore like the result set to look like this:
Using Group By, I get this query:
SELECT T1.UserID, MIN(T2.MiscID) AS MiscID
FROM Table1 T1 INNER JOIN
Table2 T2 ON T1.ID = T2.ID
WHERE (T2.Category IN (2, 3, 4, 5))
AND (T1.ZipCode IN (22201,22202,22203))
GROUP BY T1.UserID
UserID MiscID
10 105
11 109
This solves the need for non-duplicated UserID's. But now I require an
additional column in my query
results--but I still want only the unique UserID keys. My query is now:
SELECT T1.UserID, MIN(T2.MiscID) AS MiscID, T2.Age
FROM Table1 T1 INNER JOIN
Table2 T2 ON T1.ID = T2.ID
WHERE (T2.Category IN (2, 3, 4, 5))
AND (T1.ZipCode IN (22201,22202,22203))
GROUP BY T1.UserID, T2.Age
This is giving me the following results:
UserID MiscID Age
10 105 32
10 106 37
11 109 50
11 112 52
11 118 42
What I require of my results is only the minimum (MIN) MiscID for each
UserID--and its Age:
UserID MiscID Age
10 105 32
11 109 50
I need all three columns because they are returned to a third-tier (client)
application, but I can only have unique UserID values.
I think once I understand how to do this, I will be able to tackle similar
issues that would arise when I add more columns. If there are many ways to
approach this problem, I would prefer a more complex method that would
perform best. This is a query that will be used many times in an enterprise
environment, so response time is key.
Thank you so much for any help.
-- JimWe'd need to see your DDL for the tables involved. Here's a guess:
SELECT
x.UserID, x.MiscID, t3.Age
FROM
(
SELECT T1.UserID, MIN(T2.MiscID) AS MiscID
FROM Table1 T1 INNER JOIN
Table2 T2 ON T1.ID = T2.ID
WHERE (T2.Category IN (2, 3, 4, 5))
AND (T1.ZipCode IN (22201,22202,22203))
GROUP BY T1.UserID
) as x
JOIN Table2 t2 on t2.UserID = x.UserID and t2.MiscID = x.MiscID
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinnaclepublishing.com
.
"Jim Little" <123@.yahoo.com> wrote in message
news:Ly4Ud.60429$911.14807@.fe2.texas.rr.com...
*** This is a follow-up to my recent post "Distinct For Only Some Columns In
A Row".
Recently, I posted a question. That question was answered, but a new
question surfaced immediately afterwards. What follows is a combination of
my former post and the new question...
I'm trying to work with this query:
SELECT T1.UserID, T2.MiscID
FROM Table2 T2 INNER JOIN Table1 T1 ON T2.ID = T1.ID
WHERE (T2.Category IN (2, 3, 4, 5)) AND (T1.ZipCode IN (22201,22202,22203))
Based on the data in the tables, this query generates the following result:
UserID MiscID
10 105
10 107
11 109
11 120
11 122
The problem is that I do not want duplicate UserID's (I show only 10 and 11,
but the table actually has many rows with duplicate UserID values). Instead,
I want unique (distinct) UserID's and the lowest MiscID for each row. I
would therefore like the result set to look like this:
Using Group By, I get this query:
SELECT T1.UserID, MIN(T2.MiscID) AS MiscID
FROM Table1 T1 INNER JOIN
Table2 T2 ON T1.ID = T2.ID
WHERE (T2.Category IN (2, 3, 4, 5))
AND (T1.ZipCode IN (22201,22202,22203))
GROUP BY T1.UserID
UserID MiscID
10 105
11 109
This solves the need for non-duplicated UserID's. But now I require an
additional column in my query
results--but I still want only the unique UserID keys. My query is now:
SELECT T1.UserID, MIN(T2.MiscID) AS MiscID, T2.Age
FROM Table1 T1 INNER JOIN
Table2 T2 ON T1.ID = T2.ID
WHERE (T2.Category IN (2, 3, 4, 5))
AND (T1.ZipCode IN (22201,22202,22203))
GROUP BY T1.UserID, T2.Age
This is giving me the following results:
UserID MiscID Age
10 105 32
10 106 37
11 109 50
11 112 52
11 118 42
What I require of my results is only the minimum (MIN) MiscID for each
UserID--and its Age:
UserID MiscID Age
10 105 32
11 109 50
I need all three columns because they are returned to a third-tier (client)
application, but I can only have unique UserID values.
I think once I understand how to do this, I will be able to tackle similar
issues that would arise when I add more columns. If there are many ways to
approach this problem, I would prefer a more complex method that would
perform best. This is a query that will be used many times in an enterprise
environment, so response time is key.
Thank you so much for any help.
-- Jim|||Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are. Sample data is also a good idea, along with clear
specifications.
In particular, you have a "magical universal" id column in both tables
and a truly vague "misc_id" in another. That is absurd, since an
identifier has to identify a particular kind of entity.
The names table1 and Table2 give us no hint as to what they mean in the
data model, either. Can we get better specs?|||OK--Fair enough. I posted the "generic" tables because I'm not authorized to
give the actual ones. However, let me create some tables that illustrate my
issue and I think it could benefit everyone.
I'll upload those very soon.
Thanks,
-- Jim|||Hi Jim
Assuming you want the lowest ImageNo per user. Two possible (untested) ways:
SELECT T.UserID AS Expr1,
T.FirstName AS Expr2,
T.LastName AS Expr3,
M.LastLogon AS Expr4,
I.Description AS Expr5,
I.ImageNo AS Expr6,
I.PixelHeight AS Expr7,
I.PixelWidth AS Expr8,
I.FileName AS Expr9
FROM dbo.T_Images I
JOIN dbo.T_Members M ON I.UserID = M.UserID
JOIN dbo.T_Users U ON I.UserID = U.UserID
JOIN ( SELECT UserId, MIN(ImageNo) AS ImageNo FROM dbo.T_Images GROUP BY
UserId ) L ON I.UserID = L.UserID AND I.ImageNo = L.ImageNo
SELECT T.UserID AS Expr1,
T.FirstName AS Expr2,
T.LastName AS Expr3,
M.LastLogon AS Expr4,
I.Description AS Expr5,
I.ImageNo AS Expr6,
I.PixelHeight AS Expr7,
I.PixelWidth AS Expr8,
I.FileName AS Expr9
FROM dbo.T_Images I
JOIN dbo.T_Members M ON I.UserID = M.UserID
JOIN dbo.T_Users U ON I.UserID = U.UserID
WHERE I.ImageNo = ( SELECT MIN(L.ImageNo) FROM dbo.T_Images L WHERE I.UserID
= L.UserID )
You may want to look at example data as insert statements
http://vyaskn.tripod.com/code.htm#inserts
instead of attaching data files.
John
"Jim Little" <123@.yahoo.com> wrote in message
news:Bd6Ud.61154$911.53142@.fe2.texas.rr.com...
> Hello:
> I hope this helps: I've attached an image of the tables involved. I have
> also attached a text document with a Create script (no drops are included
> for safety in case you have the same table names in your db). I pasted the
> attached Create script below for convenience (it is the same as the
> text-file attachment).
> In addition, I attached three data files, one for each data table. These
> are ASCII delimitted format.
> The query I am using is:
> SELECT dbo.T_Users.UserID AS Expr1, dbo.T_Users.FirstName AS Expr2,
> dbo.T_Users.LastName AS Expr3, dbo.T_Members.LastLogon AS Expr4,
> dbo.T_Images.Description AS Expr5,
> dbo.T_Images.ImageNo AS Expr6, dbo.T_Images.PixelHeight AS Expr7,
> dbo.T_Images.PixelWidth AS Expr8,
> dbo.T_Images.FileName AS Expr9
> FROM dbo.T_Images INNER JOIN
> dbo.T_Members ON dbo.T_Images.UserID =
> dbo.T_Members.UserID INNER JOIN
> dbo.T_Users ON dbo.T_Images.UserID =
> dbo.T_Users.UserID
> This is giving me (sorry for the word wrapping):
> Expr1,Expr2,Expr3,Expr4,Expr5,Expr6,Expr
7,Expr8,Expr9
> 1,Jim ,Smith ,2005-01-05 00:00:00.000,Hawaii
> ,1,500,500,hawaii.jpg
> 1,Jim ,Smith ,2005-01-05 00:00:00.000,Grand
> Canyon ,2,350,350,gcanyon.jpg
> 1,Jim ,Smith ,2005-01-05 00:00:00.000,Time
> Square ,3,400,600,timessq.jpg
> 2,Steve ,Jones ,2005-01-22
> 00:00:00.000,Orlando ,2,500,500,orlando.jpg
> 2,Steve ,Jones ,2005-01-22 00:00:00.000,LA
> What I would like to see is:
> Expr1,Expr2,Expr3,Expr4,Expr5,Expr6,Expr
7,Expr8,Expr9
> 1,Jim ,Smith ,2005-01-05 00:00:00.000,Hawaii
> ,1,500,500,hawaii.jpg
> 2,Steve ,Jones ,2005-01-22
> 00:00:00.000,Orlando ,2,500,500,orlando.jpg
>
> Notice that the results I would like to see are unque to the first (Expr1,
> or UserID) column--but I get to see all the other columns that apply to
> that row.
> One idea I had is to first create a Select that selects only the UserIDs.
> Then, use those UserIDs in an "IN" or "EXISTS" clause.
> Thank you for any insights.
> Regards,
> -- Jim
> ===============
>
> CREATE TABLE [dbo].[T_Images] (
> [ImageID] [int] IDENTITY (1, 1) NOT NULL ,
> [UserID] [int] NOT NULL ,
> [ImageNo] [smallint] NOT NULL ,
> [Description] [char] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
> [PixelHeight] [smallint] NOT NULL ,
> [PixelWidth] [smallint] NOT NULL ,
> [FileName] [char] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
> ) ON [PRIMARY]
> GO
> CREATE TABLE [dbo].[T_Members] (
> [UserID] [int] NOT NULL ,
> [LastLogon] [datetime] NOT NULL ,
> [MemberAlias] [char] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
> [Password] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
> ) ON [PRIMARY]
> GO
> CREATE TABLE [dbo].[T_Users] (
> [UserID] [int] IDENTITY (1, 1) NOT NULL ,
> [FirstName] [char] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
> [LastName] [char] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
> ) ON [PRIMARY]
> GO
>|||Looks great--Thank you John. I will try both and I'll post feedback either
way within a day.
Thanks,
-- Jim

No comments:

Post a Comment