Getting Hep on HIPAA
Privacy Rule's Impact on Medical Research
Data Management


       HIPAA, the Health Insurance Portability and Accountability Act of 1996, requires health plans, health care clearinghouses, and health care providers who handle protected health information (PHI) to comply with its Standards for Privacy of Individually Identifiable Health Information (the "Privacy Rule"). PHI is any individually identifiable information, in any medium, about a person's past, present or future medical or mental health, the health care provided to that person, or payment for that care. But more than that, anyone who comes into contact with PHI, employee and contractor alike, must understand HIPAA.  

    Compliance isn't optional; the law includes both civil and criminal penalties. The Privacy Rule took effect for most covered entities on April 14, 2003. These pages are aimed mostly at my clients, as well as other contract data managers, data analysts and statistical programmers: people like myself who generally do not do heavily regulated research and may accidentally fall into an information vacuum. DISCLAIMER: Please take these pages with a large grain of salt and not as "professional" or legal advice.


Contents/FAQs

Is My Data De-Identified? (The 18 Identifiers)

    If your datasets contain health care data along with ANY of the 18 identifiers listed below, the data counts as protected health information ("individual health information" is the buzzword used locally) and you must comply with HIPAA. Which rules apply depends upon what you're using the data for and where it came from. Put another way, if all of these items have been removed, then the dataset has been de-identified (via the so-called safe harbor method) and may be exempt from those requirements. The other way to have de-identified data is for somebody with the appropriate skills to vouch that you can't identify someone from your dataset. That person uses accepted scientific principles and methods to determine that the risk is very small that the data could be used to identify the individual, and needs to document the method used to justify this conclusion. HOW exactly you're supposed to do that is the $64K question.

    Please realize this is a simplified view of the rules.  Check with your IRB for their requirements.  Things change and your institution or state may have additional constraints.

    PIs may need to certify on paper that their data does not include any of the magic items. For example, here's New York University School of Medicine's HIPAA De-Identification Certification Form.  

    Limited data sets have had most of these items removed, but they may keep #3 (dates) and the parts of #2 other than the street address (town or city, state, and zip code).  
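    For a data manager, building a limited data set mostly means dropping columns. Here is a minimal Python sketch, assuming each record is a dict and using hypothetical column names (name, street_address, city, zip and so on) that you'd map to your own variables. The excluded list follows the limited-data-set exclusions as I understand them.

```python
# Hypothetical column names; map them to your own dataset's variables.
# Everything in this set must come out of a limited data set. Dates and
# town/city, state and zip code are allowed to stay, so they are not listed.
LDS_EXCLUDED = {
    "name", "street_address", "phone", "fax", "email", "ssn", "mrn",
    "health_plan_id", "account_number", "license_number", "vehicle_id",
    "device_id", "url", "ip_address", "biometric_id", "photo",
}


def to_limited_data_set(record: dict) -> dict:
    """Return a copy of one record with the excluded identifiers dropped."""
    return {field: value for field, value in record.items()
            if field not in LDS_EXCLUDED}
```

    A list of fields to drop passes through any column it doesn't recognize, such as a free-text comments field that happens to mention a patient's name, so look over what's left before you hand it off.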


The 18 Identifiers
 1. Names
 2. All geographic subdivisions smaller than a State, including:  Street address, city, county, precinct, zip code and equivalent geocodes  (NOTE:  You can keep the first 3 digits of the zip code IF the area they represent contains more than 20,000 people; otherwise replace them with 000)
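 3. Dates (except year) directly related to an individual, such as birth, admission, discharge and death dates; ages over 89 (and any dates that reveal them) must be lumped into a single "90 or older" category
 4. Telephone numbers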
 5. Fax numbers
 6. E-mail addresses
 7. Social Security numbers
 8. Medical record numbers
 9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/License numbers
12. Vehicle identifiers & serial numbers, including license plate numbers and VINs
13. Device identifiers & serial numbers
14. Web universal resource locators (URLs)
15. Internet protocol (IP) addresses
16. Biometric identifiers, including finger and voice prints
17. Facial photographs and any comparable images
18. Any other unique identifying number, characteristic or code
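    Here is a minimal Python sketch of the list above applied to a single record. It assumes dict records, hypothetical column names, and a restricted_zip3 set you supply yourself: the 3-digit zip prefixes whose combined population is 20,000 or fewer, which you'd have to build from Census data. It drops direct identifiers, keeps only the year of dates, replaces restricted zip prefixes with 000, and lumps ages over 89 into "90+". It does not handle #18 or free-text fields, and it won't notice a birth year that by itself implies an age over 89.

```python
from datetime import date

# Hypothetical column names; adjust to your dataset.
DIRECT_IDENTIFIERS = {
    "name", "street_address", "city", "county", "phone", "fax", "email",
    "ssn", "mrn", "health_plan_id", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address", "biometric_id", "photo",
}
DATE_FIELDS = {"birth_date", "admit_date", "discharge_date", "death_date"}


def safe_harbor(record: dict, restricted_zip3: set[str]) -> dict:
    """Strip or generalize identifiers in one record (a sketch, not a certified tool).

    restricted_zip3: 3-digit zip prefixes covering 20,000 or fewer people;
    you must build this set yourself from census data.
    """
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # direct identifier: drop it entirely
        if field in DATE_FIELDS and isinstance(value, date):
            out[field] = value.year  # keep only the year
        elif field == "zip":
            # Store zips as strings so leading zeros survive.
            zip3 = str(value)[:3]
            out["zip3"] = "000" if zip3 in restricted_zip3 else zip3
        elif field == "age":
            out[field] = "90+" if value > 89 else value
        else:
            out[field] = value  # everything else passes through unchanged
    return out
```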



Sources
    Go to Google, enter 'HIPAA de-identified safe harbor' and you'll find sources with the same list.  Here's a short selection:

Updated 3/20/03

How Do I "De-Identify" A Dataset?

    Why are you de-identifying the dataset?  Are you handing it off to another person or project to use, or are you archiving the data at the end of the study?  If you're just sanitizing one copy of the dataset, that's a different challenge from hunting down all the data generated by the project.
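    If you're hunting down everything a project generated, a crude pattern scan can at least point you at files that still hold obvious identifiers. This Python sketch uses regular expressions I made up for U.S.-style formats (Social Security numbers, phone numbers, e-mail and IP addresses); the patterns and the list of file suffixes are assumptions. They'll miss names, dates and anything formatted differently, and they'll flag some innocent numbers, so treat a clean result as "nothing obvious," not "de-identified."

```python
import re
from pathlib import Path

# Rough patterns for a few identifier types. They miss some formats and
# flag some innocent numbers, so treat hits as leads, not proof.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-.]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_text(text: str) -> dict[str, list[str]]:
    """Return identifier-like strings found in text, grouped by pattern name."""
    hits = {}
    for name, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits


def scan_tree(root: str, suffixes=(".csv", ".txt", ".log", ".sas")) -> dict:
    """Scan every file under root with one of the given suffixes; report files with hits."""
    report = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            hits = scan_text(path.read_text(errors="ignore"))
            if hits:
                report[str(path)] = hits
    return report
```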

    Confirm with your PIs which technique they want you to use: #1 "Strip Out All IDs" or #2 Mystery Method.  Also, confirm that what they really want isn't a "limited dataset," where you can still keep dates and the address, except for the street portion.  Do they want to leave in a re-identifier?  This part is confusing -- the U of M says the recipient can't know what the re-identification code is.  Hmm.  So where DOES that code go?  The joys of new, confusing regulations; like I said, don't take this as professional advice but rather a heads-up on things to learn about.
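    For what it's worth, my reading of the rule (45 CFR 164.514(c)) is that a re-identification code is allowed if it isn't derived from information about the person and whoever holds the data doesn't disclose how to re-identify it; the key linking codes back to patients stays with whoever released the data. Here is a minimal Python sketch of that idea, assuming dict records with a hypothetical mrn column: each patient gets a random code, and the key comes back as a separate object for the data source to keep.

```python
import secrets


def assign_reid_codes(records, id_field="mrn"):
    """Replace id_field with a random code; return (released_rows, key).

    Codes come from the secrets module, so they are not derived from the
    medical record number or any other patient information. The key
    (code -> original id) stays with the data source and is never sent
    along with the released rows.
    """
    key = {}
    code_for_id = {}
    released = []
    for rec in records:
        original = rec[id_field]
        if original not in code_for_id:
            code = secrets.token_hex(8)
            while code in key:  # guard against a (very unlikely) collision
                code = secrets.token_hex(8)
            code_for_id[original] = code
            key[code] = original
        row = {k: v for k, v in rec.items() if k != id_field}
        row["reid_code"] = code_for_id[original]  # same patient, same code
        released.append(row)
    return released, key
```

    Hashing the MRN instead would be simpler, but the result is derived from patient information, which is what that section rules out.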

    Technique #1:  "Safe Harbor"/Strip Everything Out

    Familiarize yourself with the 18 identifiers.  The kicker is #18 -- do you have any data that in combination could pinpoint an individual?  This may pop up if you work with rare diseases, small towns or unusual populations, where a savvy person who puts A, B and C together could figure out that this record belongs to Fred Flintstone.  You must delete all the pieces, which can add up to way more than 18 items depending upon your dataset.
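    One way to look for the A + B + C problem is to count how many records share each combination of the fields an outsider might know (sometimes called quasi-identifiers) and review the rare ones. This is essentially a k-anonymity check; the field names and the threshold k=5 below are hypothetical choices, since Safe Harbor doesn't give a number.

```python
from collections import Counter


def rare_combinations(records, quasi_ids, k=5):
    """Return each combination of quasi_ids values held by fewer than k records.

    records:   list of dicts, one per patient record
    quasi_ids: hypothetical column names an outsider might know
    k:         hypothetical minimum group size; Safe Harbor names no number
    """
    counts = Counter(tuple(rec.get(f) for f in quasi_ids) for rec in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Usage: six look-alike records and one that stands out.
records = [{"zip3": "554", "sex": "F", "dx": "flu"}] * 6
records.append({"zip3": "563", "sex": "M", "dx": "rare disease"})
print(rare_combinations(records, ["zip3", "sex", "dx"]))
# {('563', 'M', 'rare disease'): 1}
```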

    Technique #2:  Mystery Method

    Supposedly someone with the appropriate skills, using accepted scientific methods, can vouch that you can't identify the patient from the remaining data in the dataset.  When I find references on how exactly you do this, I'll add them here.
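    I haven't found an official recipe, but one family of measures from the statistical disclosure literature groups records into equivalence classes (records with identical values on the fields an outsider might know) and estimates the chance of matching a known person to the right record as 1 divided by the class size. This Python sketch computes the worst-case and average of that risk, plus one hypothetical remedy, collapsing exact ages into 10-year bands. It illustrates the kind of analysis an expert might document, not a substitute for one; the field names and band width are assumptions.

```python
from collections import Counter


def reidentification_risk(records, quasi_ids):
    """Worst-case and average re-identification risk over equivalence classes.

    An equivalence class is a group of records with identical quasi_ids
    values. Someone who knows a person's values can at best narrow them
    down to that class, so the chance of picking the right record is
    1 / (class size). Assumes records is non-empty.
    """
    sizes = Counter(tuple(rec.get(f) for f in quasi_ids) for rec in records)
    max_risk = 1 / min(sizes.values())
    # Averaging 1/size over every record: each class contributes
    # size * (1/size) = 1, so the mean is (number of classes) / (records).
    avg_risk = len(sizes) / len(records)
    return {"max_risk": max_risk, "avg_risk": avg_risk}


def age_band(age, width=10):
    """Generalize an exact age into a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"
```

    Banding ages (or any other coarsening) shrinks the number of distinct classes, which lowers both numbers at the cost of analytic detail; where to stop is exactly the judgment call the expert has to justify.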
   

Updated 3/21/03

Learning More About HIPAA


Tips for Better Security



Last updated January 8, 2004.   Updated sporadically.   calbright@visi.com    Back to Carol's Home Page