What is privacy? Why should we care about it? How is my privacy endangered? These are burning questions that people ask every day.
Privacy is a person’s fundamental right to keep their personal data hidden from others, whether a known person, an anonymous user, or a server. The data can be anything from financial to medical records, from trade secrets to government plans, and from marketing materials to legal documents.
Privacy is essential to prevent the misuse of this confidential data. But in today’s technology-driven world, preserving privacy has become a bone of contention. Digital information is collected everywhere: by governments, corporations, hospitals, and every other sector. Driven by business and development needs, this data is exchanged among and published by different parties, which leaves it vulnerable and available across many platforms.
To minimize these risks, the field of Privacy-Preserving Data Publishing (PPDP) has been developed. It addresses how to publish the required data while preserving users’ privacy and without violating privacy policies. There is no clear consensus on the origin of this model, but randomisation is regarded as its primary approach (Fung, 2010).
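Randomisation can be illustrated with randomized response, one classic randomisation technique (the post does not say which randomisation method it has in mind, so this choice is an assumption). Each respondent answers truthfully with probability p and otherwise reports a fair coin flip, so no single answer can be trusted, yet the population rate π can still be estimated from P(yes) = pπ + (1 - p)/2. The population, the 30% rate, p = 0.75, and the helper names are hypothetical.

```python
import random


def randomized_response(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """With probability p_truth report the true answer; otherwise report a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(observed_yes_rate: float, p_truth: float) -> float:
    """Invert P(yes) = p*pi + (1 - p)/2 to recover the population rate pi."""
    return (observed_yes_rate - (1 - p_truth) / 2) / p_truth


rng = random.Random(42)  # fixed seed so the sketch is reproducible
# Hypothetical population in which roughly 30% would truthfully answer "yes".
true_answers = [rng.random() < 0.3 for _ in range(100_000)]
# What the data collector actually receives: individually deniable answers.
reports = [randomized_response(answer, 0.75, rng) for answer in true_answers]
estimate = estimate_true_rate(sum(reports) / len(reports), 0.75)
print(round(estimate, 3))  # close to 0.3, although no single report is reliable
```

The design trade-off is visible in `p_truth`: a lower value gives each respondent more deniability but makes the aggregate estimate noisier.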
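In the Privacy-Preserving Data Publishing model, the publisher creates a table whose attributes fall into four categories: Explicit Identifiers, Quasi-Identifiers, Sensitive Attributes, and other (non-sensitive) attributes (Wang, 2019). Explicit identifiers, such as a name or ID number, identify a person directly; quasi-identifiers, such as age, ZIP code, or sex, can identify a person when combined or linked with outside data; sensitive attributes, such as a disease, are the values the person wants kept private. Anonymization happens at this stage: the identities of the data owners and their sensitive data are hidden.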
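To make the attribute types concrete, here is a minimal sketch, assuming a table stored as a list of Python dicts. The records, the column-to-category assignment, and the generalization rules (10-year age ranges, masking the last two ZIP digits) are hypothetical choices for illustration: explicit identifiers are dropped, some quasi-identifiers are coarsened, and the sensitive attribute is published unchanged.

```python
# Hypothetical raw records; names, values, and column roles are illustrative only.
RECORDS = [
    {"name": "Alice", "ssn": "111-22-3333", "age": 34, "zip": "47677", "sex": "F", "disease": "Flu"},
    {"name": "Bob", "ssn": "222-33-4444", "age": 36, "zip": "47602", "sex": "M", "disease": "Cancer"},
    {"name": "Carol", "ssn": "333-44-5555", "age": 52, "zip": "47905", "sex": "F", "disease": "Flu"},
]

EXPLICIT_IDENTIFIERS = {"name", "ssn"}     # identify a person directly
QUASI_IDENTIFIERS = {"age", "zip", "sex"}  # identify a person only in combination
SENSITIVE_ATTRIBUTES = {"disease"}         # must not be linkable to a person


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a range, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Mask trailing ZIP digits, e.g. '47677' -> '476**'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


# Rule for each quasi-identifier that gets coarsened ("sex" has no rule, so it is kept as-is).
GENERALIZERS = {"age": generalize_age, "zip": generalize_zip}


def anonymize(record: dict) -> dict:
    """Drop explicit identifiers, generalize selected quasi-identifiers, keep the rest."""
    published = {}
    for column, value in record.items():
        if column in EXPLICIT_IDENTIFIERS:
            continue  # explicit identifiers are never published
        if column in QUASI_IDENTIFIERS and column in GENERALIZERS:
            published[column] = GENERALIZERS[column](value)
        else:
            published[column] = value  # sensitive and remaining attributes pass through
    return published


published_table = [anonymize(r) for r in RECORDS]
print(published_table[0])  # {'age': '30-39', 'zip': '476**', 'sex': 'F', 'disease': 'Flu'}
```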
The main Privacy-Preserving Data Publishing techniques in use today are (Abdalaal, 2012):
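- k-anonymity: each record is indistinguishable from at least (k-1) other records with respect to its quasi-identifiers, so every group of records sharing the same quasi-identifier values (an equivalence class) contains at least k records.
- l-diversity: every equivalence class contains at least l distinct (or, in stronger variants, "well-represented") values of the sensitive attribute.
- t-closeness: the distance between the distribution of a sensitive attribute within an equivalence class and its distribution in the whole table is no more than a chosen threshold t.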
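The three privacy models above can be checked mechanically. The sketch below assumes a table stored as a list of Python dicts; the column names, the toy table, and helpers such as `is_k_anonymous` are hypothetical. For t-closeness it uses the variational distance (half the L1 distance), which equals the Earth Mover's Distance for a categorical attribute whose distinct values are all equally far apart; other ground distances give different numbers.

```python
from collections import Counter, defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share identical quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    return list(classes.values())


def is_k_anonymous(rows, quasi_identifiers, k):
    """Every equivalence class must contain at least k rows."""
    return all(len(group) >= k for group in equivalence_classes(rows, quasi_identifiers))


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l_value):
    """Distinct l-diversity: at least l_value different sensitive values per class."""
    return all(
        len({row[sensitive] for row in group}) >= l_value
        for group in equivalence_classes(rows, quasi_identifiers)
    )


def distribution(rows, sensitive):
    """Relative frequency of each sensitive value in a list of rows."""
    counts = Counter(row[sensitive] for row in rows)
    return {value: n / len(rows) for value, n in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two categorical distributions."""
    values = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in values)


def is_t_close(rows, quasi_identifiers, sensitive, t):
    """Each class's sensitive-value distribution must be within t of the whole table's."""
    overall = distribution(rows, sensitive)
    return all(
        variational_distance(distribution(group, sensitive), overall) <= t
        for group in equivalence_classes(rows, quasi_identifiers)
    )


# Hypothetical published table whose quasi-identifiers are already generalized.
TABLE = [
    {"age": "30-39", "zip": "476**", "disease": "Flu"},
    {"age": "30-39", "zip": "476**", "disease": "Cancer"},
    {"age": "30-39", "zip": "476**", "disease": "Flu"},
    {"age": "50-59", "zip": "479**", "disease": "Flu"},
    {"age": "50-59", "zip": "479**", "disease": "Flu"},
    {"age": "50-59", "zip": "479**", "disease": "Flu"},
]
QI = ["age", "zip"]

print(is_k_anonymous(TABLE, QI, 3))                    # True
print(is_distinct_l_diverse(TABLE, QI, "disease", 2))  # False
print(is_t_close(TABLE, QI, "disease", 0.2))           # True
```

In this toy table both equivalence classes have three rows, so it is 3-anonymous, but every row in the 50-59 / 479** class has the same disease: an attacker who knows someone falls in that class learns their diagnosis without identifying their exact row. That homogeneity is what the l-diversity check catches.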
These techniques help protect the published dataset. In practice, data is collected from multiple users, integrated, and then transformed with the techniques above into an anonymous form before publishing, so that untrusted users cannot easily identify an individual from the published table.
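One simple way to reach k-anonymity is global generalization: coarsen the quasi-identifiers of every record by the same amount, one level at a time, until every equivalence class has at least k records. The sketch below is hypothetical throughout: the age-range and ZIP-masking hierarchy, the toy rows, and the `k_anonymize` helper are illustrative assumptions, and it assumes explicit identifiers were already removed. Published algorithms search the space of possible generalizations more carefully and often suppress outlier records instead of coarsening the whole table.

```python
from collections import Counter

# Hypothetical generalization hierarchy: list index = level (level 0 keeps exact ages).
AGE_WIDTHS = [1, 5, 10, 20, 100]
MAX_LEVEL = len(AGE_WIDTHS) - 1  # at level 4, four of the five ZIP digits are masked


def generalize(row, level):
    """Coarsen the quasi-identifiers (age, zip) of one row; the disease is kept as-is."""
    width = AGE_WIDTHS[level]
    low = (row["age"] // width) * width
    age = str(row["age"]) if width == 1 else f"{low}-{low + width - 1}"
    zip_code = row["zip"][: len(row["zip"]) - level] + "*" * level
    return {"age": age, "zip": zip_code, "disease": row["disease"]}


def k_anonymize(rows, k):
    """Return the lowest level at which every (age, zip) class has at least k rows."""
    for level in range(MAX_LEVEL + 1):
        candidate = [generalize(row, level) for row in rows]
        class_sizes = Counter((row["age"], row["zip"]) for row in candidate)
        if all(size >= k for size in class_sizes.values()):
            return level, candidate
    raise ValueError("k-anonymity is not reachable with this hierarchy")


# Hypothetical integrated data with explicit identifiers already removed.
RAW = [
    {"age": 34, "zip": "47677", "disease": "Flu"},
    {"age": 36, "zip": "47602", "disease": "Cancer"},
    {"age": 38, "zip": "47678", "disease": "Flu"},
    {"age": 52, "zip": "47905", "disease": "Flu"},
    {"age": 55, "zip": "47909", "disease": "Heart disease"},
    {"age": 57, "zip": "47906", "disease": "Flu"},
]

level, anonymized = k_anonymize(RAW, k=3)
print(level)          # 2
print(anonymized[0])  # {'age': '30-39', 'zip': '476**', 'disease': 'Flu'}
```

Stopping at the lowest level that works keeps as much detail as this simple hierarchy allows; every extra level trades data utility for privacy.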
Today, privacy and publishing are both qualitative goals, and neither should be achieved simply by compromising the other. Countries such as Russia, China, Sri Lanka, and India rely on the PPDP model to publish records such as Development Index data, data tallies, medical records, and social network interactions.
However, the PPDP scheme still has some loopholes. Its data-sanitization algorithms have high computational complexity on large datasets, which leads to long computation times; by the time sanitization finishes, the originally collected data may already have changed. Another loophole is the risk of re-identification, which arises from integrating collected data (for example, linking a published table with other available records).
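The re-identification risk is easiest to see with a linkage attack: an attacker joins a released table with auxiliary data that still carries names, using the shared quasi-identifiers. In the hypothetical sketch below, the released records, the "voter list", and the `link` helper are illustrative assumptions. Names were removed from the release, yet a unique match on age, ZIP code, and sex still discloses a person's diagnosis.

```python
# Hypothetical "de-identified" release: names removed, exact quasi-identifiers kept.
RELEASED = [
    {"age": 34, "zip": "47677", "sex": "F", "disease": "Flu"},
    {"age": 36, "zip": "47602", "sex": "M", "disease": "Cancer"},
    {"age": 52, "zip": "47905", "sex": "F", "disease": "Heart disease"},
]

# Hypothetical public auxiliary data (e.g. a voter list) that still includes names.
VOTER_LIST = [
    {"name": "Alice", "age": 34, "zip": "47677", "sex": "F"},
    {"name": "Bob", "age": 36, "zip": "47602", "sex": "M"},
    {"name": "Dan", "age": 41, "zip": "47301", "sex": "M"},
]

QI = ("age", "zip", "sex")


def link(released, auxiliary, quasi_identifiers, sensitive):
    """Join both tables on quasi-identifiers; a unique match re-identifies a person."""
    disclosed = {}
    for person in auxiliary:
        key = tuple(person[q] for q in quasi_identifiers)
        candidates = [
            row[sensitive]
            for row in released
            if tuple(row[q] for q in quasi_identifiers) == key
        ]
        if len(candidates) == 1:  # exactly one matching record: identity disclosed
            disclosed[person["name"]] = candidates[0]
    return disclosed


reidentified = link(RELEASED, VOTER_LIST, QI, "disease")
print(reidentified)  # {'Alice': 'Flu', 'Bob': 'Cancer'}
```

Generalizing the quasi-identifiers so that each combination matches at least k released records, as k-anonymity requires, is what removes these unique matches.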
The loopholes mentioned can surely be addressed in the near future. For now, though, we can rely on the PPDP model because its advantages outweigh its loopholes. The high complexity of its algorithms makes breaking into the database time-consuming for attackers, and even after breaking in, an attacker cannot single out a particular individual’s data.
Posted by: Saloni Gupta