📥 Collecting Data

The Foundation of Descriptive Analytics

Collecting data is the first and most critical step in any data analytics process. Without reliable, comprehensive, and well-structured data, even the most advanced models and analyses cannot produce meaningful insights. The goal of this stage is to gather raw information that truly represents the phenomena we want to analyze.

🔎 Typical Sources of Data

[Figure: Data collection sources]

⚙️ ETL (Extract, Transform, Load) in Data Collection

[Figure: Data collection ETL process]

In professional environments, data collection is often part of an ETL process:

  1. Extract: Retrieve raw data from multiple sources (databases, APIs, flat files).
  2. Transform: Clean, normalize, and format data into a consistent structure.
  3. Load: Store the processed data in a target system, such as a data warehouse or cloud platform, ready for analysis.
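The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the inline CSV text, the `readings` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for a flat-file source (values are made up).
RAW_CSV = """id,city,reading
1, Madrid ,12.5
2,BARCELONA,
3,madrid,7
"""


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows, normalize text, cast types."""
    clean = []
    for row in rows:
        if not row["reading"].strip():
            continue  # skip rows with a missing measurement
        clean.append(
            (int(row["id"]), row["city"].strip().title(), float(row["reading"]))
        )
    return clean


def load(records, conn):
    """Load: store the cleaned records in the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(id INTEGER PRIMARY KEY, city TEXT, reading REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Chain the three stages in order."""
    load(transform(extract(text)), conn)


# In-memory SQLite is an assumption; a real target would be a warehouse.
conn = sqlite3.connect(":memory:")
run_etl(RAW_CSV, conn)
rows = conn.execute("SELECT id, city, reading FROM readings ORDER BY id").fetchall()
print(rows)
```

Keeping each stage in its own function makes it possible to test the cleaning rules in `transform` without touching the source or the target system.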

Tools commonly used in ETL processes include:

💡 Best Practices

[Figure: Data collection management]

📊 Example: Collecting Traffic Data

For a project predicting traffic accidents, data may come from multiple sources:

Integrating these sources through an ETL pipeline ensures analysts can work with a unified and consistent dataset.
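As a rough sketch of what such integration might look like, the snippet below joins two sources on a shared key. Everything in it is an assumption made for illustration: the choice of accident records and weather observations as sources, the field names (`date`, `road`, `severity`, `condition`), the sample values, and the `unify` helper.

```python
# Hypothetical records; sources, field names, and values are illustrative only.
accidents = [
    {"date": "2024-01-05", "road": "A-1", "severity": "minor"},
    {"date": "2024-01-06", "road": "A-2", "severity": "serious"},
]
weather = [
    {"date": "2024-01-05", "condition": "rain"},
]


def unify(accidents, weather):
    """Left-join weather observations onto accident records by date."""
    by_date = {w["date"]: w for w in weather}
    unified = []
    for accident in accidents:
        match = by_date.get(accident["date"], {})
        # Keep every accident; condition is None when no observation matches.
        unified.append({**accident, "condition": match.get("condition")})
    return unified


dataset = unify(accidents, weather)
for row in dataset:
    print(row)
```

A left join keeps every accident record even when the second source has no matching entry, so gaps in one source show up as explicit missing values rather than silently dropped rows.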
