2012年2月26日星期日

Full join problem...

Hi,
My scenario is that i do a full join for check after duplicates name with
the difference function.
The problem is that i want to set a group value for each probably
duplicates.
After that i can set a distinct or something.
Help.
Here are the result i want to get help with (group value column):
name,id,name,id,diff_value,group_value
Smyth 2 Smythe 1 4 ?
Smith 3 Smythe 1 4 ?
Smythe 1 Smyth 2 4 ?
Smith 3 Smyth 2 4 ?
Smythe 1 Smith 3 4 ?
Smyth 2 Smith 3 4 ?
Anderson 5 Andersson 4 4 ?
Anderzon 6 Andersson 4 4 ?
Andersson 4 Anderson 5 4 ?
Anderzon 6 Anderson 5 4 ?
Andersson 4 Anderzon 6 4 ?
Anderson 5 Anderzon 6 4 ?
For all Smith spelling a want to set a group_value 1,
for anderson group_value 2.
And so on.
Here are the ddl example.
Create Table table1
(
ID Integer Identity(1,1),
name varchar(50)
)
Insert Into table1 (name)
Values ('Smythe')
Insert Into table1 (name)
Values ('Smyth')
Insert Into table1 (name)
Values ('Smith')
Insert Into table1 (name)
Values ('Andersson')
Insert Into table1 (name)
Values ('Anderson')
Insert Into table1 (name)
Values ('Anderzon')
And the query:
select a.name,a.id,b.name,b.id,DIFFERENCE(a.name,b.name) as diff_value,'?'
as group_value from
table1 as a,table1 as b
where DIFFERENCE(a.name,b.name)>2 and a.id<>b.id
Thanx
// twHi
I am not sure why you are want all (a,b) and (b,a) tuples as they are
effectively duplicates?
If not then maybe you can use a.id as the grouping value
e.g.
select a.name,a.id,b.name,b.id,DIFFERENCE(a.name,b.name) as diff_value,a.id
as group_value
from table1 as a
left join table1 as b ON a.id<b.id
WHERE DIFFERENCE(a.name,b.name)>2
John
"tw" <tw@.tactics.se> wrote in message
news:OMkWVRSlFHA.3828@.TK2MSFTNGP12.phx.gbl...
> Hi,
> My scenario is that i do a full join for check after duplicates name with
> the difference function.
> The problem is that i want to set a group value for each probably
> duplicates.
> After that i can set a distinct or something.
> Help.
> Here are the result i want to get help with (group value column):
> name,id,name,id,diff_value,group_value
> Smyth 2 Smythe 1 4 ?
> Smith 3 Smythe 1 4 ?
> Smythe 1 Smyth 2 4 ?
> Smith 3 Smyth 2 4 ?
> Smythe 1 Smith 3 4 ?
> Smyth 2 Smith 3 4 ?
> Anderson 5 Andersson 4 4 ?
> Anderzon 6 Andersson 4 4 ?
> Andersson 4 Anderson 5 4 ?
> Anderzon 6 Anderson 5 4 ?
> Andersson 4 Anderzon 6 4 ?
> Anderson 5 Anderzon 6 4 ?
> For all Smith spelling a want to set a group_value 1,
> for anderson group_value 2.
> And so on.
> Here are the ddl example.
> Create Table table1
> (
> ID Integer Identity(1,1),
> name varchar(50)
> )
> Insert Into table1 (name)
> Values ('Smythe')
> Insert Into table1 (name)
> Values ('Smyth')
> Insert Into table1 (name)
> Values ('Smith')
> Insert Into table1 (name)
> Values ('Andersson')
> Insert Into table1 (name)
> Values ('Anderson')
> Insert Into table1 (name)
> Values ('Anderzon')
>
> And the query:
> select a.name,a.id,b.name,b.id,DIFFERENCE(a.name,b.name) as diff_value,'?'
> as group_value from
> table1 as a,table1 as b
> where DIFFERENCE(a.name,b.name)>2 and a.id<>b.id
> Thanx
> // tw
>|||Thanx for the help.
But i want all "Smythe" spellings in the same group,
and all "Andersson" in the same.
Is that possible?
Now is the group value 1 and 2 for "Smith" and 4 and 5 for "Andersson".
// tw
"John Bell" <jbellnewsposts@.hotmail.com> skrev i meddelandet
news:OOVutGTlFHA.572@.TK2MSFTNGP15.phx.gbl...
> Hi
> I am not sure why you are want all (a,b) and (b,a) tuples as they are
> effectively duplicates?
> If not then maybe you can use a.id as the grouping value
> e.g.
> select a.name,a.id,b.name,b.id,DIFFERENCE(a.name,b.name) as
> diff_value,a.id as group_value
> from table1 as a
> left join table1 as b ON a.id<b.id
> WHERE DIFFERENCE(a.name,b.name)>2
> John
> "tw" <tw@.tactics.se> wrote in message
> news:OMkWVRSlFHA.3828@.TK2MSFTNGP12.phx.gbl...
>|||Hi
This may have already be posted but I can't be certain!
The group value for Smythe is 1
The group value for Smyth is 2
The group value for Andersson is 4
The group value for Anderson is 5
If you wish to remove the child group values then you could do
something like:
SELECT P.ParentName, P.parent_id, P.Childname, P.Child_id,
P.diff_value, P.GroupValue
FROM
( SELECT a.name as ParentName,a.id as parent_id,b.name as Childname,
b.id as Child_id,DIFFERENCE(a.name,b.name) as diff_value, A.id AS
GroupValue
from table1 as a
left join table1 as b ON a.id<b.id
WHERE DIFFERENCE(a.name,b.name)>2 ) P
WHERE P.parent_id NOT IN ( SELECT b.id FROM table1 as a
left join table1 as b ON a.id<b.id
WHERE DIFFERENCE(a.name,b.name)>2 )
But the ParentName will depend on the order of the ids.
John

没有评论:

发表评论