HIPAA, State Laws & De-identified Health Data for AI Research — FAQ

Question 1

What is HIPAA and whom does it cover?

Accepted Answer

The Health Insurance Portability and Accountability Act (HIPAA) is a federal law enacted in 1996 that established national standards for protecting sensitive patient health information. HIPAA covers three types of "covered entities":

Healthcare providers who transmit health information electronically in connection with certain transactions
Health plans including health insurance companies, HMOs, company health plans, and government programs like Medicare and Medicaid
Healthcare clearinghouses that process health information from nonstandard formats to standard formats

Additionally, HIPAA extends to "business associates" — persons or entities that perform functions or activities on behalf of covered entities that involve access to protected health information.

Question 2

What is Protected Health Information (PHI)?

Accepted Answer

Protected Health Information (PHI) is individually identifiable health information held or transmitted by a covered entity or business associate. PHI includes:

Demographic information (name, address, birth date, Social Security Number)
Medical histories, diagnoses, treatment information, and prognoses
Payment and billing information related to healthcare services
Any other information that identifies an individual and relates to their health status, healthcare provision, or healthcare payment

PHI can exist in any form: electronic, paper, or oral.

Question 3

What information is NOT protected by HIPAA?

Accepted Answer

HIPAA does not protect de-identified health information (information that cannot reasonably be linked to an individual), health information not held by covered entities or business associates, employment records held by employers in their capacity as employers, or education records covered by FERPA. Once properly de-identified, health information is no longer subject to HIPAA's Privacy Rule restrictions and can be used or disclosed without patient authorization under HIPAA.

Question 4

What are the main HIPAA Rules relevant to data sharing?

Accepted Answer

The two main HIPAA Rules relevant to data sharing are: (1) The Privacy Rule (45 CFR Part 164, Subparts A and E), which establishes national standards for protecting PHI and governs when and how covered entities can use and disclose PHI; and (2) The Security Rule (45 CFR Part 164, Subparts A and C), which establishes standards for protecting electronic PHI through administrative, physical, and technical safeguards.

Question 5

What is the difference between "use" and "disclosure" under HIPAA?

Accepted Answer

Under HIPAA, "use" refers to the internal employment, application, or utilization of PHI within a covered entity, for example a hospital using patient records for internal quality improvement. "Disclosure" refers to the release, transfer, or provision of access to PHI outside the covered entity, for example a hospital sharing patient data with an AI research company. HIPAA's restrictions on disclosure are generally more stringent than those on internal use. This distinction is critical when evaluating data sharing arrangements.

Question 6

What does "de-identified" mean under HIPAA?

Accepted Answer

Under 45 CFR § 164.514(a), health information is considered de-identified, and therefore not PHI, when it does not identify an individual and there is no reasonable basis to believe that the information can be used to identify an individual. Once properly de-identified, the information is no longer subject to HIPAA's Privacy Rule restrictions and can be used or disclosed without patient authorization under HIPAA.

Question 7

What are the two HIPAA-approved de-identification methods?

Accepted Answer

HIPAA provides two methods for achieving de-identification:

Expert Determination Method (45 CFR § 164.514(b)(1)): A qualified expert applies generally accepted statistical or scientific principles to determine that the risk of re-identification is "very small" and documents the methods and results of the analysis. This method offers flexibility to preserve data useful for AI training while maintaining compliance.
Safe Harbor Method (45 CFR § 164.514(b)(2)): A prescriptive checklist requiring removal of 18 specific identifiers. This method is straightforward to implement and produces clear compliance documentation.

A covered entity may use either method. Satisfying either method demonstrates compliance with the de-identification standard.

Question 8

What are the 18 identifiers that must be removed under the Safe Harbor Method?

Accepted Answer

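The following 18 identifier types, whether they relate to the individual or to the individual's relatives, employers, or household members, must be removed to satisfy HIPAA's Safe Harbor de-identification standard: (1) Names; (2) Geographic subdivisions smaller than a state, except that the first three digits of a ZIP code may be kept where the area they cover contains more than 20,000 people; (3) All elements of dates directly related to an individual except the year, with ages over 89 aggregated into a single "90 or older" category; (4) Phone numbers; (5) Fax numbers; (6) Email addresses; (7) Social Security numbers; (8) Medical record numbers; (9) Health plan beneficiary numbers; (10) Account numbers; (11) Certificate and license numbers; (12) Vehicle identifiers and serial numbers, including license plate numbers; (13) Device identifiers and serial numbers; (14) Web URLs; (15) IP addresses; (16) Biometric identifiers, including finger and voice prints; (17) Full-face photographs and comparable images; and (18) Any other unique identifying number, characteristic, or code. The covered entity must also have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual.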
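As a rough illustration of what a Safe Harbor pass over a single structured record might look like, here is a minimal Python sketch. The field names, the `low_population_zip3` input, and the choice to drop the birth date outright are assumptions made for the example, not a prescribed schema or a complete implementation of the rule; free-text fields and images need separate handling.

```python
# Hypothetical field names; a real schema will differ and will need its own mapping.
# birth_date is dropped outright here: keeping a birth year is only acceptable
# when it does not reveal an age over 89.
DROP_FIELDS = {
    "name", "street_address", "city", "county", "birth_date", "phone", "fax",
    "email", "ssn", "mrn", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_serial", "url", "ip_address",
    "biometric_id", "photo", "other_unique_id",
}
# Dates assumed to be ISO strings ("YYYY-MM-DD"); only the year is kept.
DATE_FIELDS = {"admission_date", "discharge_date"}


def safe_harbor_sketch(record, low_population_zip3=frozenset()):
    """Return a cleaned copy of one record (illustrative only).

    low_population_zip3 is an assumed, caller-supplied set of three-digit
    ZIP prefixes covering 20,000 or fewer people; those become "000".
    """
    cleaned = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue  # remove the identifier entirely
        if key in DATE_FIELDS:
            cleaned[key.replace("_date", "_year")] = str(value)[:4]
        elif key == "zip":
            zip3 = str(value)[:3]
            cleaned["zip3"] = "000" if zip3 in low_population_zip3 else zip3
        elif key == "age":
            # Ages over 89 are aggregated into one category.
            cleaned["age"] = "90+" if int(value) >= 90 else int(value)
        else:
            cleaned[key] = value
    return cleaned
```

A field-level pass like this does not satisfy the "no actual knowledge" condition on its own; that still requires review of what remains in the data.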

Question 9

What are the special considerations for medical imaging de-identification?

Accepted Answer

Medical imaging presents unique de-identification challenges beyond standard data field removal. DICOM headers — the metadata embedded in imaging files — may hold extensive patient information that is not visible on screen and must be fully scrubbed. Burnt-in PHI is a particular concern: certain modalities, especially ultrasound, can have patient information rendered directly into the pixel data of the image itself, requiring specialized image processing to detect and redact without compromising diagnostic value. Derived data — including structured reports, measurements, and annotations associated with imaging studies — may also contain PHI. Radiant AI Health Data, Inc. has developed purpose-built software that handles complete de-identification across all modalities, including burnt-in PHI, before data ever leaves the originating facility. No PHI ever leaves the facility's domain.
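The two imaging problems described above, header metadata and burnt-in pixels, can be sketched in plain Python. This is an illustration only and not Radiant AI Health Data's software: the header is modeled as an ordinary dictionary keyed by DICOM attribute keywords, the keyword list is a small illustrative subset (the DICOM standard's confidentiality profiles in PS3.15 Annex E list far more attributes), and the pixel grid is a list of lists rather than a real image buffer.

```python
# Illustrative subset of DICOM attribute keywords that commonly carry PHI.
# Assumption: the header has already been parsed into a plain dict.
PHI_KEYWORDS = {
    "PatientName", "PatientID", "PatientBirthDate", "PatientAddress",
    "InstitutionName", "ReferringPhysicianName", "AccessionNumber",
    "StudyDate",
}


def scrub_header(header):
    """Return a copy of the header without the listed PHI attributes."""
    return {k: v for k, v in header.items() if k not in PHI_KEYWORDS}


def redact_region(pixels, top, left, height, width, fill=0):
    """Return a copy of a 2D pixel grid with one rectangle overwritten.

    The rectangle is assumed to come from a separate detection step that
    locates burnt-in text; detection itself is out of scope here.
    """
    out = [row[:] for row in pixels]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = fill
    return out
```

Both functions return copies rather than modifying their inputs, so the original study stays untouched inside the facility while only the cleaned copy is prepared for export.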

Question 10

Is patient consent required to use de-identified data for AI research?

Accepted Answer

Under HIPAA, no individual patient authorization is required to use or share properly de-identified health information. Once data meets HIPAA's de-identification standard — through either the Safe Harbor or Expert Determination method — it is no longer considered PHI and is not subject to HIPAA's authorization requirements. This is one of HIPAA's most important provisions for enabling AI research: compliant de-identification removes the consent barrier while preserving the scientific value of the data.

Question 11

Can patients opt out of having their de-identified data used for research?

Accepted Answer

HIPAA does not require an opt-out mechanism for de-identified data. However, some hospitals include data use provisions within their standard HIPAA consent agreements, and individual facilities may offer patients the ability to express preferences.

Beyond individual patient opt-out, our model provides institutional-level controls: each partner facility can opt out of specific categories of research at the time of signing — for example, excluding their data from DNA modification research or any other category that conflicts with their institutional values or community expectations.

Question 12

When is a HIPAA Authorization required for research?

Accepted Answer

A HIPAA Authorization is required when a covered entity discloses identifiable PHI for research purposes, unless an exception applies, such as:

An IRB or Privacy Board waiver of authorization
Research involving only decedents' information
Use of a Limited Data Set with a Data Use Agreement
Reviews preparatory to research, provided no PHI is removed from the covered entity

Properly de-identified data bypasses this requirement entirely, as it is no longer PHI.
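The decision described in this answer can be written as a small check. The sketch below is a simplification for illustration, with hypothetical field names; each exception carries its own conditions under the Privacy Rule that a boolean flag does not capture, so it is a reading aid rather than a compliance tool.

```python
from dataclasses import dataclass


@dataclass
class ResearchDisclosure:
    """Hypothetical summary of a proposed research disclosure."""
    is_deidentified: bool = False
    has_irb_or_privacy_board_waiver: bool = False
    decedents_only: bool = False
    limited_data_set_with_dua: bool = False
    preparatory_review_no_phi_removed: bool = False


def authorization_required(d):
    """Return True when none of the listed routes applies."""
    if d.is_deidentified:
        return False  # de-identified data is not PHI
    exceptions = (
        d.has_irb_or_privacy_board_waiver,
        d.decedents_only,
        d.limited_data_set_with_dua,
        d.preparatory_review_no_phi_removed,
    )
    return not any(exceptions)
```

For example, `authorization_required(ResearchDisclosure())` returns `True`, while setting `is_deidentified=True` returns `False`.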

Question 13

Do state laws add requirements beyond HIPAA?

Accepted Answer

Yes — and this is a critically important consideration. Many states have enacted health privacy laws that impose requirements stricter than HIPAA, and HIPAA expressly permits states to do so. Organizations must comply with both federal HIPAA requirements and any applicable state laws, following the more stringent standard. State law compliance is not optional. Operating in multiple states requires a careful review of each state's specific health privacy requirements, as penalties can be significant.

Question 14

Which states have the most significant health data privacy laws?

Accepted Answer

Several states have enacted particularly notable health privacy legislation:

California — The Confidentiality of Medical Information Act (CMIA) and the California Consumer Privacy Act (CCPA/CPRA) impose additional requirements. California AB 713 (2020) harmonized CCPA de-identification with HIPAA and created new notice requirements for de-identified data sales.
Washington — The My Health My Data Act (MHMDA), effective March 2024, is among the strictest in the nation. It applies to consumer health data and creates a private right of action.
New York, Texas, Florida — Have specific health privacy statutes that supplement HIPAA requirements.
Illinois — The Genetic Information Privacy Act adds protections for genetic data.

As of 2023–2024, many states have enacted or are considering comprehensive privacy laws. Hospitals should monitor legislative activity in the states where they have significant patient populations.

Question 15

What governance structures should hospitals establish for data sharing?

Accepted Answer

Effective data governance for AI research sharing typically requires: a Data Governance Committee with representation from legal, compliance, clinical, and IT; written data governance policies covering classification, access controls, and sharing approvals; a formal data request and approval process for evaluating AI research partnerships; designated data stewards responsible for specific data domains; and regular audits of data sharing arrangements and de-identification processes.

Question 16

What should hospitals look for in a data sharing partner?

Accepted Answer

When evaluating a clinical data sharing partner, hospitals should assess: de-identification expertise — whether the partner has demonstrated capability across all relevant modalities including imaging; on-site processing — whether de-identification is performed at the facility before data leaves, or whether PHI travels to a third-party environment first; auditability — whether the partner maintains traceable, documentable records of how each dataset was processed; revenue share transparency — whether compensation terms are clearly defined and verifiable; research category controls — whether the facility can restrict which types of research their data may be used for; and BAA compliance — whether the partner is prepared to execute a HIPAA-compliant Business Associate Agreement. Radiant AI Health Data, Inc. performs all de-identification on-site, within the facility's own domain, before any data moves. PHI never leaves the originating facility.
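One way to picture the auditability criterion is a per-dataset processing record that can later be checked against the exported file. The sketch below is an assumption about what such a record could contain, not a description of any vendor's system; the field names and the choice of SHA-256 are illustrative.

```python
import hashlib
import json


def make_audit_record(dataset_id, output_bytes, method, steps, processed_at):
    """Build a record describing how one dataset was processed.

    processed_at is passed in (for example an ISO timestamp string) so the
    record is reproducible; all field names here are hypothetical.
    """
    return {
        "dataset_id": dataset_id,
        "method": method,                # e.g. "safe_harbor"
        "steps": list(steps),            # ordered processing steps
        "processed_at": processed_at,
        "output_sha256": hashlib.sha256(output_bytes).hexdigest(),
    }


def matches_record(record, output_bytes):
    """Check that a file is byte-for-byte the output that was recorded."""
    return record["output_sha256"] == hashlib.sha256(output_bytes).hexdigest()


def to_log_line(record):
    """Serialize with sorted keys so identical records give identical lines."""
    return json.dumps(record, sort_keys=True)
```

Storing the digest rather than the data means the log itself holds no health information while still letting either party confirm which output a given record refers to.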

Question 17

What makes health data particularly valuable for AI research?

Accepted Answer

AI models for healthcare require large, diverse, representative datasets to produce reliable, equitable results. Large academic medical centers alone cannot provide the breadth needed — AI trained only on data from major urban hospitals may perform poorly on populations seen in rural and community facilities.

Small and mid-sized hospitals hold data representing patient populations that are underrepresented in current AI training sets — including rural communities, underserved populations, and geographic regions with different disease prevalence patterns. Including this data is not just commercially valuable — it is essential for developing AI that works equitably across all patients.

Question 18

Are there special regulatory considerations for AI trained on health data?

Accepted Answer

Yes. When de-identified health data is used to train AI models, several regulatory frameworks may apply depending on the AI application. FDA oversight applies to AI and machine learning software used as a medical device (SaMD); the FDA has issued guidance on AI/ML in Software as a Medical Device. The Common Rule (45 CFR Part 46) may apply if AI development involves federally funded research. The NIST AI Risk Management Framework provides voluntary guidance for responsible AI development. An emerging area of state AI laws may also affect healthcare AI applications.

Question 19

What are the re-identification risks and how are they mitigated?

Accepted Answer

Re-identification — the process of linking de-identified data back to specific individuals — is a legitimate concern that must be addressed through robust de-identification architecture. Key mitigations include: comprehensive identifier removal covering all 18 Safe Harbor identifiers and additional quasi-identifiers that could enable re-identification in combination; geographic suppression removing not just patient location but also the originating facility, city, and state from each study; no reference preservation — ensuring no internal identifiers or cross-reference fields that could link back to source records remain in the data; and on-site processing, completing all de-identification within the facility's domain so only fully anonymized data ever moves. In a zero-retention architecture, identifiers are removed — not transformed — meaning re-identification attacks have no surface to work against.
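The point about quasi-identifiers that enable re-identification "in combination" can be made concrete with a group-size check in the style of k-anonymity. That technique is brought in here for illustration and is not named in this FAQ; the column names and any threshold k are assumptions, and a small group size is a warning sign rather than a legal test.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(
        tuple(r.get(q) for q in quasi_identifiers) for r in records
    )


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the rarest combination (0 for an empty dataset)."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def rare_combinations(records, quasi_identifiers, k):
    """Return combinations shared by fewer than k records, with their counts."""
    sizes = group_sizes(records, quasi_identifiers)
    return {combo: n for combo, n in sizes.items() if n < k}
```

A record whose combination of values is unique in the dataset (group size 1) is the kind of case where further generalization or suppression would be considered before release.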

Question 20

What are the penalties for HIPAA violations related to data sharing?

Accepted Answer

HIPAA violations carry civil and criminal penalties. Civil penalties range from $100 to $50,000 per violation depending on the level of culpability, with annual caps of up to roughly $1.9 million per violation category; HHS adjusts these dollar amounts annually for inflation. Criminal penalties include: for knowing violations, up to $50,000 and 1 year of imprisonment; for violations committed under false pretenses, up to $100,000 and 5 years; and for violations committed with intent to sell, transfer, or use PHI for commercial advantage, personal gain, or malicious harm, up to $250,000 and 10 years. State attorneys general may bring additional enforcement actions, and state laws such as Washington's MHMDA create private rights of action. Properly de-identified data is not PHI and therefore falls outside HIPAA's penalty framework. Thorough compliance documentation is essential protection.

Question 21

What contractual agreements govern health data sharing?

Accepted Answer

Several types of agreements may be required depending on the nature of the data sharing arrangement. A Business Associate Agreement (BAA) is required under HIPAA when a business associate will access PHI on behalf of a covered entity; it specifies permitted uses, safeguards, and breach notification obligations. A Data Use Agreement (DUA) is required when sharing Limited Data Sets that retain some dates and geographic information but exclude direct identifiers. A Data License Agreement governs commercial arrangements for de-identified data, including permitted uses, restrictions, revenue sharing, and audit rights. A Letter of Intent (LOI) is a non-binding agreement outlining proposed terms of a data partnership prior to executing definitive agreements.

Question 22

What key provisions should be included in a data sharing agreement?

Accepted Answer

Comprehensive data sharing agreements for AI research should address: permitted and prohibited uses, including specific research categories authorized, opt-out categories, and prohibition on re-identification attempts; de-identification standards specifying which HIPAA method applies and verification and audit rights; revenue sharing terms covering how compensation is calculated, tracked, and distributed back to originating facilities; data security requirements including technical and organizational safeguards and breach notification obligations; term and termination provisions covering duration, grounds for termination, and data return or destruction upon termination; and representations and warranties setting out each party's compliance obligations and liability limitations.

Question 23

What recent regulatory changes affect health data for AI?

Accepted Answer

The regulatory landscape has evolved significantly in 2023–2024. A growing number of U.S. states now have comprehensive privacy laws, many enacted in 2023–2024. Washington's My Health My Data Act became effective March 2024 and is one of the most expansive state health privacy laws in the country. Maryland's Online Data Privacy Act was enacted in 2024. The FDA has issued updated guidance on AI and machine learning in Software as a Medical Device. The FTC has brought enforcement actions against companies making deceptive claims about de-identification. No comprehensive federal privacy law has passed Congress as of 2024, but proposed legislation continues to be introduced. Organizations should monitor federal developments and build flexibility into their data governance programs.

Question 24

How are courts treating de-identification issues?

Accepted Answer

Judicial and regulatory scrutiny of de-identification practices is increasing. The FTC has brought enforcement actions against companies for deceptive claims about data de-identification, resulting in settlements requiring robust de-identification programs. Patients have brought class actions alleging that de-identified data was actually re-identifiable; courts are scrutinizing data sharing practices and consent mechanisms. State attorneys general are increasingly active in privacy enforcement; Washington's MHMDA creates a private right of action that may generate litigation. State AG offices are also increasingly coordinating on privacy enforcement actions. Best practice: document de-identification processes thoroughly and be prepared to demonstrate compliance if investigated. Auditability is the strongest defense.
