Big Data Privacy in Biomedical Research
DOI:
https://doi.org/10.55524/
Keywords:
Data Privacy, Biomedical Research, Data Security, Bioethics, Genome Analysis
Abstract
Biomedical research commonly involves examining patient data that may contain personally identifiable information. Misuse of these data could disclose private patient information and put patients' right to privacy at risk. Protecting patient privacy in the era of big data has attracted growing attention in recent years, and many privacy approaches have been developed to defend against different attack models. This paper reviews the relevant topics in the context of biomedical research. It discusses how technology can protect privacy, particularly in record linkage, synthetic data generation, and genomic data privacy. We also analyse the ethical implications of big data privacy in biomedicine and highlight the challenges that lie ahead for future research aimed at strengthening data privacy in biomedical investigations. After the paper was first published, it was highlighted in the publication Biomedical Research.