Re: CS614 Assignment 3 Solution and Discussion
Assignment No. 3
Semester: Spring 2020
CS614 – Data Warehousing
Total Marks: 10
Due Date:
July 27, 2020
Objectives:
After completing this assignment, the students will be able to:
• compare Parallel Processing and Serial Processing
• describe what and when to Parallelize
• calculate Speed up using Amdahl’s Law
Instructions
Please read the following instructions carefully before submitting assignment:
It should be clear that your assignment will not get any credit if:
• Assignment is submitted after due date.
• Submitted assignment does not open or file is corrupt.
• Assignment is copied (from the internet or from other students).
• Assignment is submitted in a format other than Word (.doc, .docx).
Assignment
Scenario:
XYZ is a fabric manufacturing company. The company is planning to implement a DWH for its existing OLTP system, which is deployed in 300+ stores all over Pakistan. The company’s database is approximately 298 GB and grows by approximately 0.8 GB per day. The company has designed a state-of-the-art DWH, in terms of hardware and software, at its head office. The hardware consists of 7 systems with quad-core processors. In an ideal situation, a complex query returns results in 27 minutes on a single processor (serial execution).
After analyzing the scenario given above, you are required to answer the following questions:
Question # 1 – If a complex query is executed in parallel on 6 single-core processors, what would be the quantified speedup time?
Question # 2 – Calculate the speedup ratio using Amdahl’s Law if 80% of the query processing is done through parallel execution on the DWH hardware.
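The arithmetic behind both questions can be sketched as below. This is only an illustrative sketch, not the official solution: the function names are made up for this example, Question 1 is read as ideal (linear) speedup, and Question 2 assumes that "DWH hardware" means all 7 quad-core systems, i.e. 28 processors. Amdahl's Law is used in its usual form, speedup = 1 / ((1 - f) + f / N), where f is the parallel fraction and N the number of processors.

```python
def ideal_parallel_time(serial_minutes: float, processors: int) -> float:
    """Time under ideal (linear) speedup: the work divides evenly."""
    return serial_minutes / processors


def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's Law: 1 / ((1 - f) + f / N)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


SERIAL_MINUTES = 27  # from the scenario

# Question 1: 6 single-core processors, ideal case.
q1_minutes = ideal_parallel_time(SERIAL_MINUTES, 6)
print(f"Q1 ideal time: {q1_minutes} minutes")

# Question 2: ASSUMPTION - 7 systems x 4 cores = 28 processors.
DWH_PROCESSORS = 7 * 4
q2_speedup = amdahl_speedup(0.80, DWH_PROCESSORS)
print(f"Q2 speedup ratio: {q2_speedup:.3f}")
print(f"Q2 estimated time: {SERIAL_MINUTES / q2_speedup:.2f} minutes")
```

Under these assumptions the ideal time for Question 1 is 4.5 minutes and the Amdahl speedup for Question 2 is 35/8 = 4.375. If the intended processor count is different (for example 6 or 7), only `DWH_PROCESSORS` needs to change.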
Deadline:
Your assignment must be uploaded on VULMS on or before July 27, 2020. July 28, 2020 will be a bonus day for assignment submission. After the bonus day, no assignment will be entertained via email.
Re: CS614 Assignment No.2 Solution and Discussion
Assignment No. 2
Semester: Spring 2020
CS614 – Data Warehousing
Total Marks: 15
Due Date:
June 17, 2020
Objectives:
After completing this assignment, the students will be able to:
• De-Normalize the given table using horizontal splitting technique
• Calculate the Total space used with normalization.
• Calculate the Total space used after de-normalization.
Instructions
Please read the following instructions carefully before submitting assignment:
It should be clear that your assignment will not get any credit if:
• Assignment is submitted after due date.
• Submitted assignment does not open or file is corrupt.
• Assignment is copied (from the internet or from other students).
• Assignment is submitted in a format other than Word (.doc, .docx).
Assignment
Question No. 1
Consider the following table having the information of students of a university:
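| Student ID | Student Name | Campus ID | Student Age | Degree Program |
|---|---|---|---|---|
| 1 | Ali | VLHR01 | 27 | MS |
| 2 | Kamran | VISB01 | 24 | BS |
| 3 | Akmal | VRWP01 | 24 | BS |
| 4 | Ahmad | VLHR01 | 26 | MS |
| 5 | Rehan | VISB01 | 23 | BS |
| 6 | Rizwan | VRWP01 | 29 | MS |
| 7 | Umer | VISB01 | 25 | BS |
| 8 | Javed | VLHR01 | 26 | MS |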
You are required to completely de-normalize the above table using “horizontal splitting” on the basis of Degree Program.
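Horizontal splitting places the rows of one table into separate tables according to the value of a chosen column. A minimal sketch of the idea follows, using the student rows from the question; the helper name `horizontal_split` and the tuple layout are assumptions made for this illustration.

```python
from collections import defaultdict

# Rows copied from the question:
# (Student ID, Student Name, Campus ID, Student Age, Degree Program)
students = [
    (1, "Ali", "VLHR01", 27, "MS"),
    (2, "Kamran", "VISB01", 24, "BS"),
    (3, "Akmal", "VRWP01", 24, "BS"),
    (4, "Ahmad", "VLHR01", 26, "MS"),
    (5, "Rehan", "VISB01", 23, "BS"),
    (6, "Rizwan", "VRWP01", 29, "MS"),
    (7, "Umer", "VISB01", 25, "BS"),
    (8, "Javed", "VLHR01", 26, "MS"),
]


def horizontal_split(rows, key_index):
    """Group whole rows into one partition per distinct key value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key_index]].append(row)
    return dict(partitions)


# Split on Degree Program (column index 4).
parts = horizontal_split(students, key_index=4)
for program, rows in parts.items():
    print(program)
    for row in rows:
        print("  ", row)
```

Each partition keeps every column of the original table; only the set of rows differs. Here the MS partition holds students 1, 4, 6 and 8, and the BS partition holds students 2, 3, 5 and 7.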
Question No. 2
Consider the following normalized tables for a telecommunication company showing the daily call record details of customers:
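Customer Info

| Customer_ID | Customer Phone No. | Balance |
|---|---|---|
| 1 | 033XXXXX | 300 |
| 2 | 033YYYYY | 250 |
| 3 | 033ZZZZZZ | 300 |
| 4 | 033AAAAA | 1000 |
| 5 | 033BBBBB | 80 |
| 6 | 033CCCCC | 554 |
| … | … | … |

Call record detail

| Call_ID | Customer_ID | Dialed Phone Number | Duration | Call Charges |
|---|---|---|---|---|
| 1 | 1 | 032ABCVD | 1 minute | 2 RS |
| 2 | 1 | 032ABCVG | 2 minutes | 4 RS |
| 3 | 1 | 032ABCVD | 1 minute | 2 RS |
| 4 | 2 | 032ANNNN | 3 minutes | 6 RS |
| 5 | 2 | 032AMMM | 4 minutes | 8 RS |
| 6 | 3 | 033RRRRR | 1 minute | 2 RS |
| … | … | … | … | … |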
Due to certain performance factors, the company wants to de-normalize the tables using the pre-joining technique.
Table information is given below:
• Assume a 1:4 record count ratio between Customer Info (master) and Call record detail (detail).
• Assume 15 million customers.
• Assume a 10-byte Customer_ID.
• Assume a 50-byte header for the Customer Info (master) table and an 80-byte header for the Call record detail (detail) table.
You are required to perform the following tasks:
• Calculate the Total space in GBs used with normalization.
• Calculate the Total space in GBs used after de-normalization.
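One common way to set up this calculation is sketched below. It rests on assumptions that the question does not state outright: the "header" sizes are treated as per-record sizes, 1 GB is taken as 10^9 bytes, and the pre-joined table has one row per detail record whose width is master + detail minus the shared key (the key is stored only once). The function name `prejoin_space_gb` is made up for this example.

```python
def prejoin_space_gb(master_rows, detail_per_master, key_bytes,
                     master_bytes, detail_bytes):
    """Return (normalized_gb, denormalized_gb) under the stated assumptions."""
    detail_rows = master_rows * detail_per_master

    # Normalized: the two tables are stored separately.
    normalized = master_rows * master_bytes + detail_rows * detail_bytes

    # Pre-joined: every detail row also carries the master columns,
    # and the join key is kept only once per row.
    denormalized = detail_rows * (master_bytes + detail_bytes - key_bytes)

    bytes_per_gb = 1e9  # ASSUMPTION: decimal gigabytes
    return normalized / bytes_per_gb, denormalized / bytes_per_gb


# Values from the question: 15 million customers, 1:4 ratio,
# 10-byte Customer_ID, 50-byte master and 80-byte detail records.
normalized_gb, denormalized_gb = prejoin_space_gb(
    master_rows=15_000_000, detail_per_master=4,
    key_bytes=10, master_bytes=50, detail_bytes=80,
)
print(f"Normalized:    {normalized_gb:.2f} GB")
print(f"De-normalized: {denormalized_gb:.2f} GB")
```

With these inputs the normalized design comes to 5.55 GB and the pre-joined table to 7.2 GB, so pre-joining trades roughly 1.65 GB of extra storage for avoiding the join at query time.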
Deadline:
Your assignment must be uploaded on VULMS on or before June 17, 2020. June 18, 2020 will be a bonus day for assignment submission. After the bonus day, no assignment will be entertained via email.
Re: CS614 Assignment 1 Solution and Discussion
Assignment No. 1
Semester: Spring 2020
CS614 – Data Warehousing
Total Marks: 20
Due Date:
June 01, 2020
Objectives:
After completing this assignment, the students will be able to:
• Identify Database entities from a given scenario
• Understand and design the system’s constraints from a given scenario
• Understand the database table structure
• Normalize a database table up to 2nd Normal Form (2NF)
Instructions
Please read the following instructions carefully before submitting assignment:
It should be clear that your assignment will not get any credit if:
o Assignment is submitted after due date.
o Submitted assignment does not open or file is corrupt.
o Assignment is copied (from the internet or from other students).
o Assignment is submitted in a format other than Word (.doc, .docx).
Assignment
XYZ polyclinic is a well-reputed clinic providing good medical facilities in a posh area of Lahore. It can also admit a patient if the treatment requires it. XYZ polyclinic has been working manually for the last 10 years. You are hired by XYZ polyclinic management to design a database by reading the system’s requirements. The management has decided to improve the system so that the database can be converted into a Data Warehouse for XYZ polyclinic in the future.
Database Requirements:
The organization wants to store the following patient information in the database: name, patient’s CNIC, address, city, zip, province, country, phone number, referred by, admission date, and discharge date. The referred by (which contains the doctor’s id), admission date, and discharge date attributes are used only when a patient is admitted to a specific room. For doctors, the system must store doctor name, CNIC of doctor, address, city, zip, province, country, doctor’s phone number, and area of specialization. The system must also store the treatment that a doctor suggests for a specific patient, including the prescribed medicines. Medicine details must contain medicine id, name, dosage, and potency as attributes, along with a reference to the treatment suggested by the doctor. If a patient needs to be admitted to the polyclinic, the room type (executive/common), phone extension, and charges per day are stored in the system as room attributes.
Some additional constraints on the system are as follows:
One patient may be referred to one or more doctors at a time.
Multiple patients may be examined by a doctor in a day.
One patient may have multiple visits to a single doctor.
Room capacity/facilities for attendants will not be handled at this time.
One patient may use many medicines as suggested by the doctor.
Each visit to a doctor may result in a change of medicine or admission to the polyclinic.
TASK to Perform and Submit:
Identify relevant entities, primary keys, foreign keys, proper attributes and relations as per 2NF for the above scenario and provide database/schema/table-structure in MS-Word format that is Normalized up to 2nd Normal Form.
Important Note:
There’s no need to implement the solution using any DBMS
Attribute Names should be clearly mentioned
Deadline:
Your assignment must be uploaded on VULMS on or before June 01, 2020. June 02, 2020 will be a bonus day for assignment submission. After the bonus day, no assignment will be entertained via email.
Assignment No. 03
Semester: Fall 2019
CS614: Data Warehousing
Total Marks: 15
Due Date: 27-Jan-2020
Objective:
The objective of this assignment is to enhance the learning capabilities of the students about:
• Join Techniques
• Hash based join
• Sort-Merge join
Instructions:
Please read the following instructions carefully before submitting assignment:
You need to use an MS Word document to prepare and submit the assignment on VU-LMS.
It should be clear that your assignment will not get any credit if:
The assignment is submitted after the due date.
The assignment is not in the required format (.doc or .docx).
The submitted assignment does not open or the file is corrupt.
The assignment is copied (partially or fully) from any source (websites, forums, students, etc.).
Assignment
Question
HyperMart is an online shopping store currently acquiring a large number of customers.
Normalized table structures for the shopping store are given below:
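Order Table

| Order_ID | Order_Date |
|---|---|
| 100 | 2-Feb-2019 |
| 201 | 2-Feb-2019 |
| 300 | 4-Feb-2019 |
| … | … |

Order details Table

| Detail_ID | Order_ID | Product_ID | Product_Quantity | Sale_Amount |
|---|---|---|---|---|
| 100 | 100 | 2 | 1 | 100 |
| 101 | 100 | 3 | 2 | 200 |
| 200 | 201 | 5 | 1 | 50 |
| 201 | 201 | 7 | 1 | 90 |
| 300 | 300 | 15 | 1 | 600 |
| 301 | 300 | 56 | 1 | 800 |
| 302 | 300 | 57 | 1 | 850 |
| … | … | … | … | … |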
Table Information
• Assume 1:12 record count ratio between Order and Order detail for online store’s database
• Assume 10,000 orders
Task
For the given relations, you are required to calculate the following costs in terms of I/O operations and determine which joining technique is better for the given scenario. The tasks you are required to perform are as follows:
Cost of Sort-Merge Join
Cost of Hash-Join
On the basis of your calculations, suggest better joining technique between sort-merge and hash join for the given scenario
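A rough way to compare the two techniques is sketched below using simplified textbook cost formulas: M log M + N log N + (M + N) for sort-merge join and 3(M + N) for hash join. Several things here are assumptions and not part of the question: the record counts are used directly as the M and N of the formulas (the question gives no page or block sizes), the logarithm is taken to base 2, and the function names are invented for this example.

```python
import math


def sort_merge_join_cost(m: int, n: int) -> float:
    """Sort both inputs, then merge: M log M + N log N + (M + N)."""
    return m * math.log2(m) + n * math.log2(n) + (m + n)


def hash_join_cost(m: int, n: int) -> int:
    """Partition both inputs (read + write), then probe: 3 * (M + N)."""
    return 3 * (m + n)


ORDERS = 10_000               # from the question
ORDER_DETAILS = ORDERS * 12   # 1:12 record count ratio

smj = sort_merge_join_cost(ORDERS, ORDER_DETAILS)
hj = hash_join_cost(ORDERS, ORDER_DETAILS)

print(f"Sort-merge join cost: {smj:,.0f} I/Os")
print(f"Hash join cost:       {hj:,} I/Os")
print("Cheaper technique:", "hash join" if hj < smj else "sort-merge join")
```

Under these assumptions the hash join costs 390,000 I/Os, which is well below the sort-merge figure, because neither input arrives sorted and the sorting terms dominate. A different log base or an explicit page size would change the numbers but not the direction of the comparison.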
Best of Luck!
Total Marks 5
Starting Date Wednesday, January 15, 2020
Closing Date Thursday, January 16, 2020
Status Open
Question Title GDB-CS614
Question Description
Scenario
Pulse Globe Energy Limited (PGEL) is involved in the undertaking of gas exploration, development and production activities in Australia and Asia. PGEL has investments in upstream gas activities and electricity generation that complement wholesale energy contracts to support the retail customer base.
For one of the projects PGEL has already acquired the Seismic data which provides a “time picture” of subsurface structure to aid in gas exploration.
You are hired by the company as a Data Analyst to identify the most desirable areas for gas exploration using your analytical skills, as the company will have to invest a huge budget of 1 million dollars in this task.
After reading the above scenario you are required to answer the following question:
Suggest the most suitable data mining technique for the given scenario and support your answer with one valid reason (no longer than 2 – 3 lines).
Format of your answer would be as given below:
Technique Name: ________________________
1 Strong Reason to Choose: _______________________________________________________
Assignment No. 02
Semester: Fall 2019
CS614: Data Warehousing
Total Marks: 10
Due Date: November 28, 2019
Objective:
The objective of this assignment is to enhance the learning capabilities of the students about:
• De-Normalization
• Pre-Joining
• Storage Issues of Pre-joining
Instructions:
Please read the following instructions carefully before submitting assignment:
You need to use an MS Word document to prepare and submit the assignment on VU-LMS.
It should be clear that your assignment will not get any credit if:
The assignment is submitted after the due date.
The assignment is not in the required format (.doc or .docx).
The submitted assignment does not open or the file is corrupt.
The assignment is copied (partially or fully) from any source (websites, forums, students, etc.).
Assignment
Question
“Kare Pharma” is an online medical store currently acquiring a large number of customers. To manage some performance issues, this online medical store needs to de-normalize its database using the pre-joining technique for the Prescription and Prescription details tables.
Normalized table structures are given below:
Prescription Table

| Prescription_ID | Patient_Name | Doctor_Name | Prescription_Date |
|---|---|---|---|
| … | … | … | … |

Prescription details Table

| Transaction_ID | Prescription_ID | Med_ID | Med_Quantity | Sale_Amount |
|---|---|---|---|---|
| … | … | … | … | … |
Table Information
• Assume 1:11 record count ratio between Prescription table as master table and Prescription details table for online Medical store’s database.
• Assume 10 million records in Prescription Table.
• Assume 10 bytes reserved for Prescription_ID in memory.
• Assume 40 bytes header for master table and 70 bytes header for details table.
Task
You are required to perform the following tasks:
Calculate the total space reserved in memory using normalization
Calculate the total space reserved in memory after de-normalization using pre-joining technique
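The two totals can be estimated as sketched below. The sketch assumes that the "header" sizes act as per-record sizes, that the pre-joined table has one row per detail record with Prescription_ID stored only once, and that results are reported in decimal gigabytes (10^9 bytes). The variable names are chosen for this illustration only.

```python
# Values from the question.
MASTER_ROWS = 10_000_000          # Prescription records
DETAIL_ROWS = MASTER_ROWS * 11    # 1:11 record count ratio
KEY_BYTES = 10                    # Prescription_ID
MASTER_BYTES = 40                 # per master record (assumed meaning of "header")
DETAIL_BYTES = 70                 # per detail record (assumed meaning of "header")
BYTES_PER_GB = 1e9                # ASSUMPTION: decimal gigabytes

# Normalized: master and detail tables stored separately.
normalized_bytes = MASTER_ROWS * MASTER_BYTES + DETAIL_ROWS * DETAIL_BYTES

# Pre-joined: each detail row is widened by the master columns,
# minus the key that would otherwise be stored twice.
prejoined_row_bytes = MASTER_BYTES + DETAIL_BYTES - KEY_BYTES
prejoined_bytes = DETAIL_ROWS * prejoined_row_bytes

print(f"Normalized: {normalized_bytes / BYTES_PER_GB:.1f} GB")
print(f"Pre-joined: {prejoined_bytes / BYTES_PER_GB:.1f} GB")
```

Under these assumptions the normalized tables take 8.1 GB and the pre-joined table 11.0 GB. The gap grows with the record count ratio, since the master columns are repeated once per detail row.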
Best of Luck!
Assignment No. 1
Semester: Fall 2019
CS614 – Data Warehousing
Total Marks: 15
Due Date:
November 14, 2019
Objectives:
After completing this assignment the students will be able to:
• Identify Database entities from a given scenario
• Understand the database table structure
• Normalize a database table up to 2nd normal form
• De-normalize relationships using collapsing table technique
Instructions
Please read the following instructions carefully before submitting assignment:
It should be clear that your assignment will not get any credit if:
o Assignment is submitted after due date.
o Submitted assignment does not open or file is corrupt.
o Assignment is copied (from the internet or from other students).
o Assignment is submitted in a format other than Word (.doc, .docx).
Assignment
Question No. 1
Consider the following schema, named ‘userPosts’, for a social media website. You have to perform the following tasks on the provided schema:
1- Identify appropriate keys for following structure (Primary and/or foreign key(s))
2- Convert this schema into 2 NF
userPosts (userID, userName, password, address, postId, postDate, postContent)
Question No. 2
Consider the following schemas for a hotel booking website. You are required to de-normalize the given schemas using the Collapsing Tables technique.
roomVisitor (roomID, visitorCNIC, dateTime)
roomCharges (roomID, spentDays, roomRent)
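Collapsing tables merges two tables that share the same key into a single table. The idea can be sketched as below, assuming a one-to-one relationship on roomID. The sample rows and the helper name `collapse_tables` are hypothetical and exist only to make the example runnable.

```python
# Hypothetical sample rows; the assignment gives only the schemas.
room_visitor = [
    {"roomID": 101, "visitorCNIC": "00000-0000000-1", "dateTime": "2019-11-01 14:00"},
    {"roomID": 102, "visitorCNIC": "00000-0000000-2", "dateTime": "2019-11-02 09:30"},
]
room_charges = [
    {"roomID": 101, "spentDays": 2, "roomRent": 5000},
    {"roomID": 102, "spentDays": 1, "roomRent": 3500},
]


def collapse_tables(left, right, key):
    """Merge rows that share `key`; assumes at most one right row per key."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


collapsed = collapse_tables(room_visitor, room_charges, key="roomID")
for row in collapsed:
    print(row)
```

The collapsed rows have the columns (roomID, visitorCNIC, dateTime, spentDays, roomRent), so a query needing both visitor and charge data no longer requires a join. If one room can have several visitor records, the relationship is no longer one-to-one and the charge columns would be repeated per visit.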
Deadline:
Your assignment must be uploaded on VULMS on or before November 14, 2019. November 15, 2019 will be a bonus day for assignment submission. After the bonus day, no assignment will be entertained via email.
Total Marks 5
Starting Date Monday, July 15, 2019
Closing Date Tuesday, July 16, 2019
Status Open
Question Title GDB - CS614
Question Description
Scenario
“Choice Restaurant” is a chain of restaurants having five branches across the country. This restaurant typically deals in fast food and has 40 to 100 customers a day in each branch.
The management of this company has introduced an Online Transaction Processing (OLTP) system to handle the daily transactions. The management is very concerned about cutting costs and attaining customer satisfaction; therefore, there is currently only one computer system per branch, which handles all the computing tasks (including transaction processing), but they plan to introduce a DSS in the future.
To speed up online transaction processing, the database manager is pressing the management to implement parallelism for this OLTP system during business hours for daily sales calculations.
The management is yet to decide whether they should invest in parallelism or not.
GDB Question:
Keeping the above scenario in mind do you think that implementing parallelism in above situation is a good option? Justify your answer with appropriate reason(s) in either case.
GDB Answer Template:
Choice: Parallelism used or not
Points of Justification:
Point 1:
Point 2:
Important Notes:
NO GDB is accepted via e-mail in any case
Lengthy GDB replies will result in a deduction of marks, so write your answer precisely, within 3 to 5 lines, in 2 points along with your choice.
If you would not mention your choice at start of your answer, your GDB points would not be marked.
Spring 2019_CS614_1.doc
Assignment No. 1
Semester: Spring 2019
CS614 – Data Warehousing
Total Marks: 20
Due Date:
May 15, 2019
Objectives:
After completing this assignment the students will be able to:
• Identify Database entities from a given scenario
• Understand the database table structure
• Normalize a database table up to 3rd normal form
Instructions
Please read the following instructions carefully before submitting assignment:
It should be clear that your assignment will not get any credit if:
o Assignment is submitted after due date.
o Submitted assignment does not open or file is corrupt.
o Assignment is copied (from the internet or from other students).
o Assignment is submitted in a format other than Word (.doc, .docx).
Assignment
XYZ company was established in 1989 in Karachi and deals in the business of garments export. There are 250 permanent employees working with XYZ company. XYZ’s admin department maintains employees’ leave records in hard copy. The leave form currently in practice is shown below:
XYZ company now wants you to design a database solution for the organization. You are required to develop a database that is normalized up to the 3rd Normal Form. In the future, this database can be used for developing a front-end application.
A raw table structure, along with its attributes, has been provided by XYZ’s management as a starting point. It is as follows:
employeeLeaveForm (employeeID, empName, designation, officeName, dateOfFill, leaveType, leaveFromDate, leaveToDate, leaveReason, leaveAddress, leaveContactNo, leaveProcessDate, leaveStatus, leaveProcessedBy),
Sample data for the table is provided on VULMS for your reference. To view / download this data, use the following link;
https://vulms.vu.edu.pk/Courses/CS614/Downloads/CS614 - SP 2019 - Assignment 1 - Sample Data.xlsx
You have to use the exact attribute names given while normalizing this table to 3rd Normal Form.
Instructions to solve assignment:
Use the table employeeLeaveForm and its attributes. You are not allowed to remove any of the provided attributes.
You may introduce new tables using the same attributes as employeeLeaveForm, and you may add new attributes for indexing / normalization purposes.
You are also required to declare Primary and Foreign Keys from the provided list of attributes (also represent them with suitable DB notation while writing the table schema).
Transform the table into 3rd normal form step by step, i.e. first into 1st normal form, then into 2nd normal form, and then into 3rd. Also, mention the title of each Normal Form before the transformation.
Transforming the relation directly into 3rd normal form will result in zero marks, even if your solution is correct. This will be strictly enforced while marking the assignment, and no later excuses will be accepted.
Example of resultant tables / solution should be in the form provided below:
tableNo1 (attribute1, attribute2, …)
tableNo2 (attributeX, attributeY, …)
tableNo3 (attributeI, attributeX, …)
attribute1, attributeX and attributeI (with solid underline) are the primary keys of tableNo1, tableNo2 and tableNo3 respectively, while attributeX (with dotted underline) is a foreign key in tableNo3 referencing tableNo2.
Deadline:
Your assignment must be uploaded on VULMS on or before May 15, 2019. May 16, 2019 will be a bonus day for assignment submission. After the bonus day, no assignment will be entertained via email.
Normalization affects performance?
True
False
Full normalisation will generally not improve performance; in fact, it can often make it worse, but it will keep your data free of duplicates. In some special cases I’ve denormalised specific data in order to get a performance increase. Normalization comes from the mathematical concept of being “normal.”