Home » SQL & PL/SQL » SQL & PL/SQL » Update based on grouping (11.0.2.10)
Update based on grouping [message #660521] |
Mon, 20 February 2017 04:41 |
|
OraFerro
Messages: 433 Registered: July 2011
|
Senior Member |
|
|
Hi All,
I have the following case of duplication that I need to resolve.
The following code resolves the problem by adding one sequence to all cases:
create table test_rem_dup
(
id number primary key,
fk_id number,
v1 char(22),
v2 char(10)
);
insert ALL
into TEST_REM_DUP values (10,100,'ppp','ooo')
into TEST_REM_DUP values (20,100,'ppp','ooo')
into TEST_REM_DUP values (30,300,'ccc','')
into TEST_REM_DUP values (40,300,'cc','')
into TEST_REM_DUP values (50,300,'cc','k')
into TEST_REM_DUP values (60,300,'cc','k')
into TEST_REM_DUP values (70,300,'cc','')
into TEST_REM_DUP values (80,300,'cc','')
into TEST_REM_DUP values (90,400,'uu','')
into TEST_REM_DUP values (100,100,'xx','')
select * from dual;
drop sequence seq_temp;
create sequence seq_temp start with 1;
update TEST_REM_DUP p set v2 = trim(v2) || '*IT' || seq_temp.NEXTVAL
where (p.ID) in
(
select ID from TEST_REM_DUP p, (select FK_ID, v1, v2 from TEST_REM_DUP group by FK_ID, v1, v2 having count(*)>1) src
where p.FK_ID = src.FK_ID
and trim(p.V1) = trim(src.V1)
and (
( trim(p.V2) = trim(src.v2) )
OR
(trim(p.V2) is null and trim(src.v2) is null)
)
);
However, I want to number all duplicate cases with the same fk_ID seprately such as:
ID FK_ID V1 V2
10 100 ppp ooo*IT1
20 100 ppp ooo*IT2
30 300 ccc (null)
40 300 cc *IT1
50 300 cc k*IT1
60 300 cc k*IT2
70 300 cc *IT2
80 300 cc *IT3
90 400 uu (null)
100 100 xx (null)
Thanks,
Ferro
[Updated on: Mon, 20 February 2017 04:42] Report message to a moderator
|
|
|
Re: Update based on grouping [message #660523 is a reply to message #660521] |
Mon, 20 February 2017 05:07 |
|
Michel Cadot
Messages: 68665 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
SQL> select id, fk_id, v1,
2 case
3 when count(*) over (partition by fk_id, v1) = 1 then null
4 else v2||'*IT'||row_number() over (partition by fk_id, v1, v2 order by id)
5 end v2
6 from TEST_REM_DUP
7 order by 1
8 /
ID FK_ID V1 V2
---------- ---------- ----- ----------
10 100 ppp ooo*IT1
20 100 ppp ooo*IT2
30 300 ccc
40 300 cc *IT1
50 300 cc k*IT1
60 300 cc k*IT2
70 300 cc *IT2
80 300 cc *IT3
90 400 uu
100 100 xx
[Updated on: Mon, 20 February 2017 05:09] Report message to a moderator
|
|
|
Re: Update based on grouping [message #660545 is a reply to message #660523] |
Mon, 20 February 2017 21:53 |
|
OraFerro
Messages: 433 Registered: July 2011
|
Senior Member |
|
|
Thanks Michel,
I tend to forget about the partitioning option every time!
My update query is:
MERGE INTO TEST_REM_DUP p
USING (
select id, fk_id, v1,
case
when count(*) over (partition by fk_id, v1,v2) = 1 then null
else trim(v2)||row_number() over (partition by fk_id, v1, v2 order by id)
end v2_new
from TEST_REM_DUP
) Src
ON (p.id = Src.id)
WHEN MATCHED THEN UPDATE SET p.v2 = src.v2_new;
For the sake of knowledge, Is there a way to do it using normal UPDATE clause instead of merge?
Thanks,
Ferro
|
|
|
|
|
|
|
|
|
|
Re: Update based on grouping [message #660554 is a reply to message #660553] |
Tue, 21 February 2017 02:14 |
|
OraFerro
Messages: 433 Registered: July 2011
|
Senior Member |
|
|
Sure and sorry for any confusion my words might have caused:
drop table test_rem_dup;
create table test_rem_dup
(
id number primary key,
fk_id number,
v1 char(22),
v2 char(2)
);
insert ALL
into TEST_REM_DUP values (10,100,'ppp','o')
into TEST_REM_DUP values (20,100,'ppp','o')
into TEST_REM_DUP values (30,300,'ccc','')
into TEST_REM_DUP values (40,300,'cc','')
into TEST_REM_DUP values (50,300,'cc','k')
into TEST_REM_DUP values (60,300,'cc','k')
into TEST_REM_DUP values (70,300,'cc','')
into TEST_REM_DUP values (80,300,'cc','')
into TEST_REM_DUP values (90,400,'uu','')
into TEST_REM_DUP values (100,100,'xx','l')
into TEST_REM_DUP values (110,100,'ppp','o2')
select * from dual;
update TEST_REM_DUP p set v2 =
(
select v2_new
FROM
(
select id, fk_id, v1,
case
when count(*) over (partition by fk_id, v1,v2) = 1 then v2
else trim(v2)||row_number() over (partition by fk_id, v1, v2 order by id)
end v2_new
from TEST_REM_DUP
) src
where p.ID = src.ID
)
;
MERGE INTO TEST_REM_DUP p
USING (
select id, fk_id, v1,
case
when count(*) over (partition by fk_id, v1,v2) = 1 then V2
else trim(v2)||row_number() over (partition by fk_id, v1, v2 order by id)
end v2_new
from TEST_REM_DUP
) Src
ON (p.id = Src.id)
WHEN MATCHED THEN UPDATE SET p.v2 = src.v2_new;
ALTER TABLE TEST_REM_DUP ADD CONSTRAINT UNQ_v1_v2 UNIQUE (v1, v2);
The constraint will fail due to:
1 ORA-02299: cannot validate (AFESD2.UNQ_V1_V2) - duplicate keys found
as the newly updated value (o2) created a new duplicate with an already existing value. I am trying to also avoid that.
Thanks,
Ferro
|
|
|
Re: Update based on grouping [message #660555 is a reply to message #660554] |
Tue, 21 February 2017 02:55 |
|
Michel Cadot
Messages: 68665 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
What is the purpose of your MERGE/UPDATE? Fix the current data (one time pass) or is it the permanent that should contain the table?
In the later case, why don't you insert the correct data instead?
In the former, how do you handle the case there are more than 9 or 99 duplicated rows?
In addition, define "duplicated rows", in my query I partitioned by "fk_id, v1" (which was I thought the duplicate condition from your first post), you partitioned by "fk_id, v1, v2".
What is the correct case?
In short, post a complete and deterministic specification of your issue.
|
|
|
|
|
Re: Update based on grouping [message #660563 is a reply to message #660558] |
Tue, 21 February 2017 03:39 |
|
OraFerro
Messages: 433 Registered: July 2011
|
Senior Member |
|
|
Dear Michel,
In the later example, the merge/update script will add rows that will create duplication with already existing rows.
The data before running the merge is:
ID FK_ID V1 V2
10 100 ppp o
20 100 ppp o
30 300 ccc (null)
40 300 cc (null)
50 300 cc k
60 300 cc k
70 300 cc (null)
80 300 cc (null)
90 400 uu (null)
100 100 xx l
110 100 ppp o2
running constraint will fail:
ALTER TABLE TEST_REM_DUP ADD CONSTRAINT UNQ_v1_v2 UNIQUE (fk_id, v1, v2);
ORA-02299: cannot validate (AFESD2.UNQ_V1_V2) - duplicate keys found
After running the merge statement:
MERGE INTO TEST_REM_DUP p
USING (
select id, fk_id, v1,
case
when count(*) over (partition by fk_id, v1,v2) = 1 then V2
else trim(v2)||row_number() over (partition by fk_id, v1, v2 order by id)
end v2_new
from TEST_REM_DUP
) Src
ON (p.id = Src.id)
WHEN MATCHED THEN UPDATE SET p.v2 = src.v2_new;
the data became:
ID FK_ID V1 V2
10 100 ppp o1
20 100 ppp o2
30 300 ccc (null)
40 300 cc 1
50 300 cc k1
60 300 cc k2
70 300 cc 2
80 300 cc 3
90 400 uu (null)
100 100 xx l
110 100 ppp o2
Now although all previous duplicate cases have been solved, a new one was created between
ID FK_ID V1 V2
20 100 ppp o2
110 100 ppp o2
and the constraint will fail for the same reason. This is the difference between the former and the later examples. I am trying to avoid this case in my script.
Thanks,
Ferro
|
|
|
Re: Update based on grouping [message #660564 is a reply to message #660563] |
Tue, 21 February 2017 03:45 |
|
Michel Cadot
Messages: 68665 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
Ah OK, I understand now.
In this case, I should either add a new column instead of updating the original one or use a temporary table, if you don't want to modify the original one and then determine the generated duplicated rows.
You have to specify what you want to do in this specific case.
Anyway, depending on what you want to do, it will most likely be a 2-pass process not a single statement.
[Updated on: Tue, 21 February 2017 03:47] Report message to a moderator
|
|
|
|
|
|
Re: Update based on grouping [message #660575 is a reply to message #660573] |
Tue, 21 February 2017 04:51 |
|
Michel Cadot
Messages: 68665 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
So your option is to change the new o2 to o3 or the first one that currently does not exist?
You could loop until no more duplicates exist. I have to go so can't do it now but the plan is: "after the first is made, loop "incrementing" the modified value until there is no more duplicates" (and so the necessary new column to compare to other old one). The number of rows should decrease at each loop.
[Updated on: Tue, 21 February 2017 04:53] Report message to a moderator
|
|
|
Re: Update based on grouping [message #660576 is a reply to message #660573] |
Tue, 21 February 2017 06:35 |
|
Barbara Boehmer
Messages: 9097 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
If the numbers are used only to eliminate duplication and it does not matter which id ends up with v2 value of o2 or o3, then you could do it all in one update, by trimming the trailing numbers during comparison and update, as demonstrated below.
SCOTT@orcl_12.1.0.2.0> SELECT * FROM test_rem_dup
2 /
ID FK_ID V1 V2
---------- ---------- ---------------------- --
10 100 ppp o
20 100 ppp o
30 300 ccc
40 300 cc
50 300 cc k
60 300 cc k
70 300 cc
80 300 cc
90 400 uu
100 100 xx l
110 100 ppp o2
11 rows selected.
SCOTT@orcl_12.1.0.2.0> UPDATE test_rem_dup p
2 SET p.v2 =
3 (SELECT v2_new
4 FROM (SELECT id,
5 CASE
6 WHEN COUNT (*) OVER (PARTITION BY fk_id, v1, RTRIM(v2,'1234567890 ')) = 1 THEN
7 RTRIM(v2,'1234567890 ')
8 ELSE RTRIM(v2,'123456790 ')
9 || ROW_NUMBER () OVER (PARTITION BY fk_id, v1, RTRIM(v2,'1234567890 ') ORDER BY id)
10 END v2_new
11 FROM test_rem_dup) src
12 WHERE p.id = src.id)
13 /
11 rows updated.
SCOTT@orcl_12.1.0.2.0> SELECT * FROM test_rem_dup
2 /
ID FK_ID V1 V2
---------- ---------- ---------------------- --
10 100 ppp o1
20 100 ppp o2
30 300 ccc
40 300 cc 1
50 300 cc k1
60 300 cc k2
70 300 cc 2
80 300 cc 3
90 400 uu
100 100 xx l
110 100 ppp o3
11 rows selected.
SCOTT@orcl_12.1.0.2.0> ALTER TABLE test_rem_dup ADD CONSTRAINT unq_v1_v2 UNIQUE (fk_id, v1, v2)
2 /
Table altered.
[Updated on: Tue, 21 February 2017 06:41] Report message to a moderator
|
|
|
|
Re: Update based on grouping [message #660599 is a reply to message #660598] |
Tue, 21 February 2017 22:58 |
|
Barbara Boehmer
Messages: 9097 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
SCOTT@orcl_12.1.0.2.0> SELECT * FROM test_rem_dup
2 /
ID FK_ID V1 V2
---------- ---------- ---------------------- --
10 100 ppp o
20 100 ppp o
30 300 ccc
40 300 cc
50 300 cc k
60 300 cc k
70 300 cc
80 300 cc
90 400 uu
100 100 xx l
110 100 ppp o2
11 rows selected.
SCOTT@orcl_12.1.0.2.0> DECLARE
2 v_seq NUMBER;
3 v_id NUMBER;
4 v_fk_id NUMBER;
5 v_v1 CHAR(22);
6 v_v2 CHAR(2);
7 BEGIN
8 FOR src IN
9 (SELECT id, fk_id, v1, v2
10 FROM (SELECT id, fk_id, v1, v2,
11 COUNT (*) OVER (PARTITION BY fk_id, v1, v2) cnt
12 FROM test_rem_dup)
13 WHERE cnt > 1
14 ORDER BY fk_id, v1, v2, id)
15 LOOP
16 IF v_id IS NULL OR v_fk_id != src.fk_id OR v_v1 != src.v1
17 OR NVL(v_v2,'XX') != NVL(src.v2,'XX') THEN
18 v_seq := 1;
19 ELSE
20 v_seq := v_seq + 1;
21 END IF;
22 LOOP
23 UPDATE test_rem_dup p
24 SET p.v2 = TRIM (p.v2) || v_seq
25 WHERE p.id = src.id
26 AND NOT EXISTS
27 (SELECT t.*
28 FROM test_rem_dup t
29 WHERE t.fk_id = p.fk_id
30 AND t.v1 = p.v1
31 AND t.v2 = TRIM (p.v2) || v_seq);
32 IF SQL%ROWCOUNT > 0 THEN EXIT;
33 ELSE v_seq := v_seq + 1;
34 END IF;
35 END LOOP;
36 v_id := src.id;
37 v_fk_id := src.fk_id;
38 v_v1 := src.v1;
39 v_v2 := src.v2;
40 END LOOP;
41 END;
42 /
PL/SQL procedure successfully completed.
SCOTT@orcl_12.1.0.2.0> SELECT * FROM test_rem_dup ORDER BY id
2 /
ID FK_ID V1 V2
---------- ---------- ---------------------- --
10 100 ppp o1
20 100 ppp o3
30 300 ccc
40 300 cc 1
50 300 cc k1
60 300 cc k2
70 300 cc 2
80 300 cc 3
90 400 uu
100 100 xx l
110 100 ppp o2
11 rows selected.
SCOTT@orcl_12.1.0.2.0> ALTER TABLE test_rem_dup ADD CONSTRAINT unq_v1_v2 UNIQUE (fk_id, v1, v2)
2 /
Table altered.
|
|
|
Re: Update based on grouping [message #660604 is a reply to message #660575] |
Wed, 22 February 2017 02:05 |
|
Michel Cadot
Messages: 68665 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
Here's the way I had in mind (I added one more duplicated row):
SQL> alter table test_rem_dup add v3 char(2);
Table altered.
SQL> select * from TEST_REM_DUP order by 1;
ID FK_ID V1 V2 V3
---------- ---------- ---------------------- -- --
10 100 ppp o
20 100 ppp o
30 300 ccc
40 300 cc
50 300 cc k
60 300 cc k
70 300 cc
80 300 cc
90 400 uu
100 100 xx l
110 100 ppp o2
120 100 ppp o
12 rows selected.
SQL> merge into test_rem_dup t1
2 using (select id,
3 case
4 when count(*) over (partition by fk_id, v1, v2) = 1 then v2
5 else trim(v2)||row_number() over (partition by fk_id, v1, v2 order by id)
6 end v3
7 from TEST_REM_DUP) t2
8 on (t2.id = t1.id)
9 when matched then update set t1.v3 = t2.v3
10 /
12 rows merged.
SQL> select * from TEST_REM_DUP order by 1;
ID FK_ID V1 V2 V3
---------- ---------- ---------------------- -- --
10 100 ppp o o1
20 100 ppp o o2
30 300 ccc
40 300 cc 1
50 300 cc k k1
60 300 cc k k2
70 300 cc 2
80 300 cc 3
90 400 uu
100 100 xx l l
110 100 ppp o2 o2
120 100 ppp o o3
12 rows selected.
SQL> begin
2 loop
3 update test_rem_dup t1
4 set v3 = rtrim(v3,'1234567890 ')
5 || to_char(to_number(regexp_substr(v3,'\d+ *$'))+1)
6 where nvl(v3,'mc') != nvl(v2,'mc')
7 and exists (select v2 from test_rem_dup t2
8 where t2.v3 = t1.v3
9 and ( ( t2.id != t1.id and t2.v3 = t2.v2 )
10 or ( t2.id < t1.id and t2.v3 != t2.v2 ) ));
11 exit when sql%rowcount = 0;
12 end loop;
13 end;
14 /
PL/SQL procedure successfully completed.
SQL> select * from TEST_REM_DUP order by 1;
ID FK_ID V1 V2 V3
---------- ---------- ---------------------- -- --
10 100 ppp o o1
20 100 ppp o o3
30 300 ccc
40 300 cc 1
50 300 cc k k1
60 300 cc k k2
70 300 cc 2
80 300 cc 3
90 400 uu
100 100 xx l l
110 100 ppp o2 o2
120 100 ppp o o4
12 rows selected.
SQL> alter table test_rem_dup drop column v2;
Table altered.
SQL> alter table test_rem_dup rename column v3 to v2;
Table altered.
SQL> ALTER TABLE test_rem_dup ADD CONSTRAINT unq_v1_v2 UNIQUE (fk_id, v1, v2);
Table altered.
SQL> select * from TEST_REM_DUP order by 1;
ID FK_ID V1 V2
---------- ---------- ---------------------- --
10 100 ppp o1
20 100 ppp o3
30 300 ccc
40 300 cc 1
50 300 cc k1
60 300 cc k2
70 300 cc 2
80 300 cc 3
90 400 uu
100 100 xx l
110 100 ppp o2
120 100 ppp o4
12 rows selected.
Now a question raises: "the number in v2 are no more in the same order than id, is this a problem?".
You have to choose: either the order of "v2" is not the same than "id" (in each partition), either you have to modify the original "v2" to keep this order.
[Updated on: Wed, 22 February 2017 02:09] Report message to a moderator
|
|
|
Re: Update based on grouping [message #660607 is a reply to message #660604] |
Wed, 22 February 2017 04:49 |
|
OraFerro
Messages: 433 Registered: July 2011
|
Senior Member |
|
|
@Barbara
@Michel
Thanks a lot.
@Michel, right you are considering the example. The production table has a date field and a description field that make different meanings to each row. Also in my script instead of dropping V3 I will replace it with IS_AUTO_UPDATED to indicate fields that were changed by IT until user review is completed then will drop it.
Thanks a lot,
Ferro
[Updated on: Wed, 22 February 2017 04:50] Report message to a moderator
|
|
|
|
Re: Update based on grouping [message #660609 is a reply to message #660608] |
Wed, 22 February 2017 05:24 |
|
Michel Cadot
Messages: 68665 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
You can change my script to match your new requirements like:
SQL> alter table test_rem_dup add IS_AUTO_UPDATED number(1);
Table altered.
SQL> select * from TEST_REM_DUP order by 1;
ID FK_ID V1 V2 IS_AUTO_UPDATED
---------- ---------- ---------------------- -- ---------------
10 100 ppp o
20 100 ppp o
30 300 ccc
40 300 cc
50 300 cc k
60 300 cc k
70 300 cc
80 300 cc
90 400 uu
100 100 xx l
110 100 ppp o2
120 100 ppp o
12 rows selected.
SQL> merge into test_rem_dup t1
2 using (select id,
3 case
4 when count(*) over (partition by fk_id, v1, v2) = 1 then v2
5 else trim(v2)||row_number() over (partition by fk_id, v1, v2 order by id)
6 end v2,
7 case
8 when count(*) over (partition by fk_id, v1, v2) = 1 then 0
9 else 1
10 end flag
11 from TEST_REM_DUP) t2
12 on (t2.id = t1.id)
13 when matched then update set t1.v2 = t2.v2, t1.IS_AUTO_UPDATED = t2.flag
14 /
12 rows merged.
SQL> select * from TEST_REM_DUP order by 1;
ID FK_ID V1 V2 IS_AUTO_UPDATED
---------- ---------- ---------------------- -- ---------------
10 100 ppp o1 1
20 100 ppp o2 1
30 300 ccc 0
40 300 cc 1 1
50 300 cc k1 1
60 300 cc k2 1
70 300 cc 2 1
80 300 cc 3 1
90 400 uu 0
100 100 xx l 0
110 100 ppp o2 0
120 100 ppp o3 1
12 rows selected.
SQL> begin
2 loop
3 update test_rem_dup t1
4 set v2 = rtrim(v2,'1234567890 ')
5 || to_char(to_number(regexp_substr(v2,'\d+ *$'))+1)
6 where IS_AUTO_UPDATED = 1
7 and exists (select v2 from test_rem_dup t2
8 where t2.v2 = t1.v2
9 and ( ( t2.id != t1.id and t2.IS_AUTO_UPDATED = 0 )
10 or ( t2.id < t1.id and t2.IS_AUTO_UPDATED = 1 ) ));
11 exit when sql%rowcount = 0;
12 end loop;
13 end;
14 /
PL/SQL procedure successfully completed.
SQL> select * from TEST_REM_DUP order by 1;
ID FK_ID V1 V2 IS_AUTO_UPDATED
---------- ---------- ---------------------- -- ---------------
10 100 ppp o1 1
20 100 ppp o3 1
30 300 ccc 0
40 300 cc 1 1
50 300 cc k1 1
60 300 cc k2 1
70 300 cc 2 1
80 300 cc 3 1
90 400 uu 0
100 100 xx l 0
110 100 ppp o2 0
120 100 ppp o4 1
12 rows selected.
|
|
|
Goto Forum:
Current Time: Fri Jun 28 09:52:05 CDT 2024
|