Home » SQL & PL/SQL » SQL & PL/SQL » percentages of data population (Oracle 11.2.0.3)
percentages of data population [message #652027] |
Tue, 31 May 2016 07:50 |
pointers
Messages: 451 Registered: May 2008
|
Senior Member |
|
|
Hi,
I have two tables 1. master 2. lookup
CREATE TABLE SCOTT.MASTER
(
ID NUMBER,
NAME VARCHAR2(20 BYTE),
LOC_ID VARCHAR2(20 BYTE),
REC_ID NUMBER
)
/
ID and NAME are key columns in the master table i.e. id and name forms a unique record and to get rid of duplicates.
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(1, 's', '1000', 100);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(2, NULL, '2000', NULL);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(NULL, NULL, NULL, NULL);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(NULL, 'i', NULL, NULL);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(NULL, 'k', NULL, 200);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(6, NULL, NULL, NULL);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(7, NULL, '4000', 400);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(8, NULL, NULL, NULL);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(1, 's', '1000', 100);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(9, 'z', NULL, NULL);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(9, 'z', NULL, NULL);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(9, 'z', NULL, NULL);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(9, 'z', NULL, NULL);
COMMIT;
CREATE TABLE SCOTT.LOOKUP
(
RECORDID NUMBER,
COLUMN_NAME VARCHAR2(20 BYTE)
)
/
Insert into SCOTT.LOOKUP(RECORDID, COLUMN_NAME)
Values(100, 'id');
Insert into SCOTT.LOOKUP(RECORDID, COLUMN_NAME)
Values(100, 'name');
Insert into SCOTT.LOOKUP(RECORDID, COLUMN_NAME)
Values(200, 'name');
Insert into SCOTT.LOOKUP(RECORDID, COLUMN_NAME)
Values(400, 'id');
Below query gives the unique records of master table -
SELECT m.*
FROM (SELECT ID, NAME, MAX (ROWID) rid
FROM MASTER
GROUP BY ID, NAME) i,
MASTER m
WHERE i.rid = m.ROWID
Now I would like to write a query to pull pecentage for each column of master table where the percentage is calculated
(number of not null values for a column) / (total number of unique rows)
ID column got total 9 records (after above query using rowid) and it got 6 rows populated and 3 rows are null
so the percentage for id is 6/9*100
I could achieve this by using -
SELECT count(m.id)/count(*) "%id", count(m.name)/count(*) "%name",count(m.loc_id)/count(*) "%loc_id"
FROM (SELECT ID, NAME, MAX (ROWID) rid
FROM MASTER
GROUP BY ID, NAME) i,
MASTER m
WHERE i.rid = m.ROWID
but now, I would like calculate the percentages after the below change -
I would like to consider the "master" table column value as NULL for a record if the rec_id of master table is mapped to any value in the "lookup" table.
and lookup table contains the column for which it need to be considered as NULL.
for e.g. rec_id=100 of master table is mapped to 2 columns in the lookup table i.e. id and name so id and name for this record need to be considered as null.
where as rec_id=200 of master table is mapped to only one column i.e. name. so name need to be considered as null.
Thank you in advance.
Regards,
Pointers
|
|
|
Re: percentages of data population [message #652052 is a reply to message #652027] |
Tue, 31 May 2016 09:39 |
|
Michel Cadot
Messages: 68693 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
Quote:ID and NAME are key columns in the master table i.e. id and name forms a unique record and to get rid of duplicates.
So how this is possible:
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(9, 'z', NULL, NULL);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(9, 'z', NULL, NULL);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(9, 'z', NULL, NULL);
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
Values(9, 'z', NULL, NULL);
1/ Do NOT post your statement with a schema, tablespace or other specific stuff like that, why should be put garbage in my SCOTT schema?
2/ Create your tables with all the constraints, this will prevent your from posting wring stuff
3/ What does MAX(ROWID) (and ROWID itself) has to do with the question?
4/ What is the purpose of LOC_ID and REC_ID in the problem? They are mentioned nowhere.
[Updated on: Tue, 31 May 2016 09:40] Report message to a moderator
|
|
|
Re: percentages of data population [message #652053 is a reply to message #652052] |
Tue, 31 May 2016 10:29 |
pointers
Messages: 451 Registered: May 2008
|
Senior Member |
|
|
- Sorry for the schema names.
- Its a warehouse environment. There are no constraints but as per the business ID and NAME forms a unique record.
-loc_id is just another column for which we calculate percentage.
- ROWID is used to pick the latest record incase of duplicate records w.r.t ID and NAME.
- REC_ID is a column which is used to join lookup table (recordid column)
Please let me know your suggestion.
Thank you.
Regards,
Pointers
|
|
|
|
|
Re: percentages of data population [message #652057 is a reply to message #652053] |
Tue, 31 May 2016 11:41 |
John Watson
Messages: 8951 Registered: January 2010 Location: Global Village
|
Senior Member |
|
|
This is off topic for your question, but this statement
Quote:- Its a warehouse environment. There are no constraints but as per the business ID and NAME forms a unique record
is awful. If you do not define constraints, you are crippling the optimizer. Constraints are not just about data integrity, you know: they give the optimizer the information it needs to develop good execution plans. In a warehouse, this is even more important than in TP systems.
|
|
|
|
|
Re: percentages of data population [message #652061 is a reply to message #652059] |
Tue, 31 May 2016 12:23 |
|
Michel Cadot
Messages: 68693 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
Ignoring your "key", assuming there is none, and ignoring for a first step the lookup table, the percentages query is:
SQL> select round(100*count(distinct id)/count(*),2) "%id",
2 round(100*count(distinct name)/count(*),2) "%name",
3 round(100*count(distinct loc_id)/count(*),2) "%loc_id"
4 from master
5 /
%id %name %loc_id
---------- ---------- ----------
46.15 30.77 23.08
Now your lookup table is just a way to dynamically modify the master table, so replace in the query the master table by an inline view reflecting these changes).
First you have to pivot the lookup table:
SQL> select recordid, id, name, loc_id
2 from lookup
3 pivot (max(column_name)
4 for column_name in ('id' "ID", 'name' "NAME", 'loc_id' "LOC_ID"))
5 /
RECORDID ID NAME LOC_ID
---------- -------------------- -------------------- --------------------
100 id name
400 id
200 name
Then join this to master table get the new master table:
SQL> select nvl2(l.id, null, m.id) id,
2 nvl2(l.name, null, m.name) name,
3 nvl2(l.loc_id, null, m.loc_id) loc_id,
4 rec_id
5 from master m,
6 ( select recordid, id, name, loc_id
7 from lookup
8 pivot (max(column_name)
9 for column_name in ('id' "ID", 'name' "NAME", 'loc_id' "LOC_ID")) ) l
10 where l.recordid (+) = m.rec_id
11 /
ID NAME LOC_ID REC_ID
---------- -------------------- -------------------- ----------
1000 100
1000 100
4000 400
200
9 z
9 z
9 z
9 z
8
6
i
2 2000
compare to:
SQL> select * from master;
ID NAME LOC_ID REC_ID
---------- -------------------- -------------------- ----------
1 s 1000 100
2 2000
i
k 200
6
7 4000 400
8
1 s 1000 100
9 z
9 z
9 z
9 z
Is this correct?
|
|
|
Re: percentages of data population [message #652062 is a reply to message #652061] |
Tue, 31 May 2016 12:57 |
pointers
Messages: 451 Registered: May 2008
|
Senior Member |
|
|
I cant stop myself thanking you Micheal.
This should really solve the issue.
I would test it tomorrow and let you know.
But as I believe this should help me.
Thank you very much again.
Just curious is there any other way to achieve the same (Just to learn other way. I am learning new techniques from you )
Regards,
Pointers
|
|
|
|
Goto Forum:
Current Time: Sat Sep 28 13:51:20 CDT 2024
|