Questions: First session
- Year: 2018-2019
- Duration: 2 hours
- Total: 15 points
- Documents: all documents allowed
- Electronic devices: not allowed
Question 1.a
What are the different ways to acquire data for the purpose of data analysis? (1 point)
Question 1.b
Online surveys are one way to obtain feedback on projects and products. However, we still see people asking questions in shopping malls and sometimes conducting door-to-door surveys. Why do you think manual or face-to-face surveys are still important? (1 point)
Question 2.a
What are the ACID properties (atomicity, consistency, isolation, durability)? Which of these properties do NoSQL data stores relax, and why? (1 point)
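To make atomicity and consistency concrete, the following minimal sketch uses Python's built-in sqlite3 module, a relational engine with ACID transactions. The accounts table, the names and the amounts are hypothetical. A transfer that would leave a negative balance is rolled back as a whole, so no partial update survives; this kind of strict, immediate guarantee is what many NoSQL stores relax (for example by accepting eventual consistency across replicas) in exchange for availability and horizontal scalability.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    # Used as a context manager, the connection commits if the block succeeds
    # and rolls back if it raises, so either both updates happen or neither does.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Consistency rule (no negative balances) violated: abort the transaction.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn):
    """Current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = make_bank()
transfer(conn, "alice", "bob", 30)        # succeeds and is committed
try:
    transfer(conn, "alice", "bob", 500)   # fails and is rolled back entirely
except ValueError:
    pass
print(balances(conn))  # {'alice': 70, 'bob': 80}
```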
Question 2.b
What are the different types of NoSQL data stores? Briefly explain them. (1 point)
Question 3
What is data visualization? Why are data visualization methods needed? Explain with some examples. (1 point)
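Anscombe's quartet is a classic example of why visualization is needed: its four small datasets (values as commonly reproduced) have almost the same means and the same x-y correlation, yet look completely different when plotted. The sketch below computes only the summary statistics with the standard library; a scatter plot of each dataset (for example with matplotlib) would reveal a linear trend, a curve, a line distorted by one outlier, and a vertical group of points with one extreme point, none of which the numbers show.

```python
import math
from statistics import mean

# Anscombe's quartet: four datasets with nearly identical summary statistics.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summary(xs, ys):
    """(mean of x, mean of y, correlation), rounded for display."""
    return round(mean(xs), 2), round(mean(ys), 2), round(pearson(xs, ys), 3)


for name, (xs, ys) in QUARTET.items():
    print(name, summary(xs, ys))
```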
Question 4
What are the goals of data mining? (1 point)
Question 5.a
What is a classifier? What are the different types of classifiers? (1 point)
Question 5.b
How do you evaluate and compare the performance of classifiers? (1 point)
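A common way to evaluate a binary classifier is to compare its predictions with the true labels of a held-out test set and derive metrics from the confusion matrix. The minimal sketch below assumes hypothetical labels and two hypothetical classifiers, A and B, evaluated on the same test set; in practice the comparison is usually repeated over several splits (cross-validation) and the scores averaged.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test-set labels and the predictions of two classifiers A and B.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
pred_a = [1, 0, 0, 1, 0, 1, 1, 0]
pred_b = [1, 1, 1, 1, 1, 1, 1, 0]

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, evaluate(y_true, pred))
```

Here B finds every positive (recall 1.0) but raises many false alarms, so its precision and accuracy are lower than A's; which classifier is preferable depends on the relative cost of false positives and false negatives.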
Question 5.c
What are the different clustering methods that you used in your practical sessions? What are their advantages and limitations? (1 point)
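The practical sessions are not described here, so as an assumed example the sketch below implements k-means, one widely used clustering method, on hypothetical 2-D points. It also makes two classic limitations visible: k must be chosen in advance, and the result depends on the initial centroids (this sketch naively uses the first k points).

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialisation
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Hypothetical data: two well-separated groups of points.
POINTS = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8)]
print(kmeans(POINTS, 2))
```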
Question 6
Consider a CSV file with the columns PhotographId, City, Year and ViewCount, containing details of the photographs published on a photography website:
- PhotographId: unique identifier of a photograph
- City: the city where the photograph was taken
- Year: the year in which the photograph was taken
- ViewCount: the number of times the photograph was viewed on the website
Your goal is to write a Python program (preferably using the pandas library) that reads this CSV file and performs the following tasks:
- Find the most viewed and least viewed photograph
- Find the cities with the highest and the lowest number of photographs
- Find the year with the highest number of photographs
- For every city, calculate the average number of views for photographs in the year 2018
(2.5 points)
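One possible pandas sketch for the four tasks is shown below. The sample rows are hypothetical, and the file name mentioned in the comment is an assumption; with a real file, `pd.read_csv` would be called on its path instead of on the in-memory string.

```python
import io

import pandas as pd

# Hypothetical sample rows; with a real file use pd.read_csv("photographs.csv")
# (the file name is an assumption).
SAMPLE_CSV = """PhotographId,City,Year,ViewCount
1,Paris,2018,120
2,Paris,2017,300
3,Lyon,2018,45
4,Nice,2018,80
5,Paris,2018,60
6,Lyon,2016,10
"""


def analyse(df: pd.DataFrame) -> dict:
    # idxmax/idxmin return the row label of the first maximum/minimum (ties -> first row).
    most_viewed = int(df.loc[df["ViewCount"].idxmax(), "PhotographId"])
    least_viewed = int(df.loc[df["ViewCount"].idxmin(), "PhotographId"])

    # Number of photographs per city and per year.
    per_city = df["City"].value_counts()
    per_year = df["Year"].value_counts()

    # Average view count per city, restricted to photographs taken in 2018.
    avg_2018 = df[df["Year"] == 2018].groupby("City")["ViewCount"].mean()

    return {
        "most_viewed": most_viewed,
        "least_viewed": least_viewed,
        "city_most_photos": per_city.idxmax(),
        "city_least_photos": per_city.idxmin(),
        "year_most_photos": int(per_year.idxmax()),
        "avg_views_2018": avg_2018.to_dict(),
    }


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(analyse(df))
```

Cities with no photograph taken in 2018 do not appear in the 2018 averages, and ties are resolved in favour of the first matching row or count.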
Question 7.a
What is an artificial neural network? (1 point)
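As an illustration of the basic building block, the sketch below models an artificial neuron as a weighted sum of its inputs plus a bias, passed through an activation function, and wires three such neurons into a two-layer network that computes XOR (a function a single neuron cannot compute). The weights are hand-picked for illustration; a real network learns its weights from data, typically by backpropagation with a differentiable activation such as the sigmoid.

```python
def threshold(z):
    """Step activation: the neuron fires (1) when its weighted input is positive."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    """Artificial neuron: weighted sum of inputs plus bias, then activation."""
    return threshold(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1, x2):
    # Hidden layer (hand-picked weights): one neuron acts as OR, the other as AND.
    h_or = neuron([x1, x2], [1, 1], -0.5)
    h_and = neuron([x1, x2], [1, 1], -1.5)
    # Output layer: "OR but not AND", which is XOR.
    return neuron([h_or, h_and], [1, -1], -0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```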
Question 7.b
Why do you think reinforcement learning is relevant for indoor and outdoor navigation by robots? (1 point)
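A toy sketch of the idea: with tabular Q-learning (one reinforcement learning algorithm among several), a simulated robot learns to reach a goal purely from trial-and-error rewards, without a map of the environment or labelled examples. Everything here is a hypothetical assumption: the 4x4 grid, the reward of -1 per move, and the hyperparameters. Real robots face continuous states and noisy sensors, so practical systems use function approximation rather than a lookup table.

```python
import random

SIZE = 4                                      # hypothetical 4x4 grid world
START, GOAL = (0, 0), (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def move(state, action):
    """Deterministic transition: -1 reward per move, episode ends at GOAL."""
    row, col = state[0] + action[0], state[1] + action[1]
    if not (0 <= row < SIZE and 0 <= col < SIZE):
        row, col = state  # bumping into a wall leaves the robot in place
    nxt = (row, col)
    return nxt, -1.0, nxt == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                best = max(q[state])             # exploit, breaking ties at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            nxt, reward, done = move(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the learned policy from START without exploration."""
    state, path = START, [START]
    while state != GOAL and len(path) <= max_steps:
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _, _ = move(state, ACTIONS[a])
        path.append(state)
    return path


print(greedy_path(train()))
```

In this tiny grid the learned greedy path is expected to follow one of the shortest 6-move routes to the goal.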
Question 8
An annotation website asked 10 users to describe a picture using a set of 5 hashtags. The table below shows which hashtags each user chose for this picture: it has one row per user (10 rows) and one column per hashtag (5 columns). A value of 1 means that the user used the hashtag and 0 means that they did not. Find all possible association rules from this table. What do you conclude about this picture? (1.5 points)
User | #Architecture | #Nature | #Paris | #StreetArt | #Fractals |
U1 | 1 | 0 | 0 | 1 | 0 |
U2 | 1 | 1 | 1 | 1 | 1 |
U3 | 1 | 0 | 0 | 1 | 0 |
U4 | 1 | 1 | 1 | 1 | 1 |
U5 | 0 | 1 | 0 | 0 | 1 |
U6 | 0 | 1 | 1 | 1 | 0 |
U7 | 0 | 0 | 0 | 0 | 0 |
U8 | 0 | 0 | 0 | 0 | 0 |
U9 | 0 | 1 | 1 | 1 | 1 |
U10 | 1 | 0 | 0 | 1 | 0 |
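The sketch below treats each user as a transaction (the set of hashtags they used) and enumerates candidate rules by brute force, which is feasible for only 5 hashtags; the Apriori algorithm prunes this search for larger item sets. The question gives no thresholds, so the minimum support of 0.3 and minimum confidence of 0.8 are assumptions, and users U7 and U8, who used no hashtag, still count in the support denominator.

```python
from itertools import combinations

HASHTAGS = ["#Architecture", "#Nature", "#Paris", "#StreetArt", "#Fractals"]
# Rows copied from the table above (1 = hashtag used by the user).
ROWS = {
    "U1": [1, 0, 0, 1, 0], "U2": [1, 1, 1, 1, 1], "U3": [1, 0, 0, 1, 0],
    "U4": [1, 1, 1, 1, 1], "U5": [0, 1, 0, 0, 1], "U6": [0, 1, 1, 1, 0],
    "U7": [0, 0, 0, 0, 0], "U8": [0, 0, 0, 0, 0], "U9": [0, 1, 1, 1, 1],
    "U10": [1, 0, 0, 1, 0],
}


def transactions(rows):
    """One set of used hashtags per user."""
    return [frozenset(tag for tag, v in zip(HASHTAGS, row) if v) for row in rows.values()]


def support_count(itemset, txns):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in txns if itemset <= t)


def association_rules(txns, min_support=0.3, min_confidence=0.8):
    """Brute-force rules lhs -> rhs as (lhs, rhs, support, confidence)."""
    n = len(txns)
    rules = []
    for size in range(2, len(HASHTAGS) + 1):
        for items in combinations(HASHTAGS, size):
            itemset = frozenset(items)
            count = support_count(itemset, txns)
            if count / n < min_support:
                continue  # itemset not frequent enough
            # Split the frequent itemset into every antecedent/consequent pair.
            for k in range(1, size):
                for lhs in map(frozenset, combinations(items, k)):
                    confidence = count / support_count(lhs, txns)
                    if confidence >= min_confidence:
                        rules.append((lhs, itemset - lhs, count / n, confidence))
    return rules


for lhs, rhs, sup, conf in association_rules(transactions(ROWS)):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.1f} confidence={conf:.2f}")
```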