Traditional Data Collection

Defining Sensitive Data

Understanding what constitutes sensitive data is fundamental for organizations aiming to comply with regulatory frameworks and ensure robust data protection. Sensitive data, as defined by various regulatory bodies, involves types of information that, if compromised, could lead to significant harm or discrimination against individuals. This chapter will delve into the specifics of sensitive data categories guided by the General Data Protection Regulation (GDPR) and other relevant frameworks.

Personal Information: Names, addresses, email addresses, phone numbers, and other identifying details.
Financial Data: Credit card numbers, bank account information, and transaction history.
Health Data: Medical records, genetic information, and biometric data.
Biometric Data: Fingerprints, facial recognition, and voice patterns.
Racial or Ethnic Origin: Information about an individual's race, ethnicity, or nationality.
Political Opinions: Information about an individual's political beliefs or affiliations.
Religious or Philosophical Beliefs: Information about an individual's religion or philosophical beliefs.
Trade Union Membership: Information about an individual's membership in a trade union.
Sexual Orientation: Information about an individual's sexual orientation.
Criminal Convictions: Information about an individual's criminal history.

Traditional Data Collection Models

The methodologies for collecting sensitive data have evolved considerably over the years. Traditional data collection models serve as the foundation for understanding current practices and the inherent vulnerabilities that necessitate robust data protection measures. This chapter outlines some of the key traditional methods organizations have employed to gather and manage sensitive data.

Centralized databases
represent one of the earliest and most widely-used methods for data collection and storage:
- Single Repository: Sensitive data is stored in a unified location, simplifying data management.
- Organizational Control: The organization maintains direct control over data access and security measures.
- Advantages: Simplifies data management, facilitates integrated analytics, and ensures consistent data governance.
- Disadvantages: Presents a single point of failure, making it a prime target for cyberattacks and data breaches.
Data brokers
specialize in collecting and aggregating data from various sources to sell to third parties:
- Aggregated Data Sources: These entities compile data from public records, online activities, purchases, and other available sources.
- Commercial Transactions: Data is sold to businesses seeking targeted marketing, risk assessment, and other purposes.
- Advantages: Provides organizations with extensive datasets without investing in their own data collection efforts.
- Disadvantages: Raises ethical and privacy concerns, and often lacks transparency about data origins and consent.
Third-Party Data Collection
Relying on third-party services for data collection has become a common practice among organizations:
- Outsourced Services: Third-party providers, such as analytics services and advertising platforms, collect data on behalf of the organization.
- Specialized Expertise: These providers offer advanced tools and methodologies for efficient data collection and analysis.
- Advantages: Reduces operational burden, leverages specialized expertise, and allows access to comprehensive data analytics tools.
- Disadvantages: Complicates data governance, introduces dependency risks, and may engender compliance issues due to varying data protection standards.
Cookies and Tracking

Websites and applications often utilize cookies and tracking technologies to gather data on user behavior and preferences:
- Cookies: Small text files stored on users’ devices, tracking their online activities, preferences, and session information.
- Tracking Technologies: Includes web beacons, pixels, and fingerprinting methods to monitor user interactions across multiple sessions.
- Advantages: Enhances user experience through personalization, provides valuable insights for targeted marketing, and tracks user engagement.
- Disadvantages: Invokes significant privacy concerns, necessitates transparent consent mechanisms, and faces increasing regulatory scrutiny.

Data Lifecycle

Understanding the data lifecycle is crucial for organizations to manage sensitive data responsibly, ensuring privacy and compliance with regulatory frameworks. The data lifecycle encompasses various stages, each with its own set of processes and challenges. This chapter delves into the typical stages of the data lifecycle and highlights a critical issue related to data collection and context retention.

Collection: Data is gathered from various sources, including user input, public records, and third-party providers.
Storage: Data is stored in databases, data warehouses, or other storage systems.
Processing: Data is analyzed, transformed, and used for various purposes, such as marketing, personalization, and decision-making.
Sharing: Data may be shared with third parties, such as business partners or service providers.
Disposal: Data is eventually deleted or anonymized when no longer needed.

A critical issue arises at the very beginning of this lifecycle. While data is often collected through specific interactions (e.g., filling out a form), the link to this original source is frequently lost. This means that data becomes detached from its context, making it difficult to trace its origin and for individuals to exercise control over its usage.

Data Ownership and Control

In the traditional model, organizations typically assert ownership over the data they collect. This grants them significant control over data usage, sharing, and retention. Individuals have limited rights to access, modify, or delete their data.

Privacy Concerns

Traditional data collection practices raise substantial privacy concerns:

Data Breaches: Centralized databases are vulnerable to cyberattacks, potentially exposing sensitive information to unauthorized parties.
Unauthorized Access: Employees, contractors, and third-party service providers may have access to sensitive data, increasing the risk of misuse.
Data Retention: Organizations often retain data for extended periods, even when it is no longer necessary, creating ongoing privacy risks.
Lack of Transparency: Individuals often lack visibility into how their data is collected, used, and shared.
Data Monetization: The practice of collecting and selling user data for commercial purposes raises ethical concerns.

To address these challenges, we propose a new approach based on the concept of Data Identities (DIDs).

A DID represents a unique identifier linked to a specific data collection event. It acts as a container for associated data and provides individuals with granular control over their information. By establishing a clear link between data and its origin, DIDs enable greater transparency, accountability, and user empowerment.

Revision #1
Created 4 August 2024 11:35:01 by Thorsten Zoerner
Updated 4 August 2024 11:46:21 by Thorsten Zoerner