NOTE: Article in Progress

Introduction

Depending on the domain, acquiring data can be a challenging task. In the internet age, automated mechanisms seem to be the dominant way to acquire data, yet in several fields data is still acquired manually. Manual data acquisition usually involves field visits, face-to-face conversations, interviews, etc. Automated methods, on the other hand, are used where direct human intervention is impossible or not feasible due to socio-economic reasons. It is essential to understand the characteristics and limitations of data acquisition tools before using them.

Data availability

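In the field of data science, we usually encounter one of the following situations:

  1. The data is already available somewhere, but we may or may not have access to it.
  2. The data is not available at all.
  3. Some data is available, but it needs to be complemented with data from supplementary sources.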

In the first case, we know that the data already exists somewhere, but we may or may not have access to it. In other words, the data sources are known to us, but we need to investigate how to access the data for our needs. This problem is usually referred to as data extraction, since we need an approach to extract the data from the source. An analogy with mining is useful here: we know that the desired mineral is available at a given location, but we still need a way to extract it.

In the second case, the data is not available at all, so we need a way to acquire it. In the third case, some data is available, but it needs to be complemented with data from supplementary sources. These are the two cases where data acquisition plays a role.
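As an illustration of the third case, the following minimal Python sketch fills gaps in an existing dataset with values from a supplementary source. It assumes that both sources share a common key (the hypothetical `station_id` field) and that values already present in the primary data take precedence; the function name `complement` and the sample records are made up for illustration.

```python
from typing import Dict, List

# A record maps field names to values; records from both sources are
# assumed (for this sketch) to share an identifying key field.
Record = Dict[str, object]


def complement(primary: List[Record], supplementary: List[Record], key: str) -> List[Record]:
    """Fill missing or None fields in `primary` using `supplementary` records
    with the same `key`. Existing primary values are never overwritten."""
    # Index the supplementary records by key for constant-time lookup.
    lookup = {rec[key]: rec for rec in supplementary if key in rec}
    result = []
    for rec in primary:
        merged = dict(rec)  # copy, so the primary input is left unchanged
        extra = lookup.get(rec.get(key), {})
        for field, value in extra.items():
            # Only fill fields that are absent or None in the primary record.
            if merged.get(field) is None:
                merged[field] = value
        result.append(merged)
    return result


if __name__ == "__main__":
    primary = [
        {"station_id": "A", "temperature": 21.5, "humidity": None},
        {"station_id": "B", "temperature": 19.0},
    ]
    supplementary = [
        {"station_id": "A", "humidity": 0.40},
        {"station_id": "B", "humidity": 0.55, "temperature": 25.0},
    ]
    for row in complement(primary, supplementary, key="station_id"):
        print(row)
```

Here station A gains its missing humidity value, while station B keeps its own temperature of 19.0 rather than the supplementary 25.0. Whether the primary or the supplementary source should win in a conflict is a design decision that depends on which source is considered more reliable.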

Common approaches

In the field of data science, some of the most commonly used approaches for data acquisition include the following:

  1. Surveys
    • Manual surveys
    • Online surveys
  2. Sensors
  3. Social networks
  4. Video surveillance cameras
  5. Web

Sensors

Sensors are commonly used for automated data acquisition. Temperature, pressure, rainfall, humidity and luminosity sensors, for example, are placed at strategic locations. Some of these sensors come with memory devices that can store data for quite a long period of time, whereas others have very limited storage capacity or none at all. Such low-memory sensors are installed in places with network connectivity so that they can periodically send their measurements to data centers.
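To make the behaviour of a low-memory sensor more concrete, here is a minimal Python sketch. The class name `LowMemorySensor`, its buffer capacity and its sending interval are hypothetical parameters chosen for illustration, and the "data center" is simply a list standing in for a network upload.

```python
from collections import deque
from typing import Deque, List, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, measured value)


class LowMemorySensor:
    """Hypothetical model of a sensor with very limited storage.

    It keeps at most `capacity` readings in memory and forwards them to a
    data center every `send_interval` seconds (illustrative assumptions,
    not properties of any real device)."""

    def __init__(self, capacity: int, send_interval: float) -> None:
        # A bounded deque drops the oldest reading when it is full, mimicking
        # data loss when memory fills up before the next transmission.
        self.buffer: Deque[Reading] = deque(maxlen=capacity)
        self.send_interval = send_interval
        self.last_sent = 0.0

    def record(self, timestamp: float, value: float, data_center: List[Reading]) -> None:
        """Store one reading and transmit the buffer if the interval has elapsed."""
        self.buffer.append((timestamp, value))
        if timestamp - self.last_sent >= self.send_interval:
            data_center.extend(self.buffer)  # stand-in for a network upload
            self.buffer.clear()
            self.last_sent = timestamp


if __name__ == "__main__":
    data_center: List[Reading] = []
    sensor = LowMemorySensor(capacity=3, send_interval=60.0)
    # One temperature reading every 10 seconds for two minutes.
    for t in range(0, 130, 10):
        sensor.record(float(t), 20.0 + 0.1 * t, data_center)
    print(len(data_center), "readings received;", len(sensor.buffer), "still buffered")
```

With room for only three readings and one transmission per minute, just 6 of the 13 readings reach the data center; the others are overwritten before they can be sent. A larger buffer or a shorter sending interval avoids this loss, which illustrates why such sensors depend on network connectivity.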

Privacy, licence and ethics

Privacy and ethics

Licence
