Radiant AI Health Data, Inc.
www.radiantaihealthdata.com
Compliance Resource
Frequently Asked Questions

HIPAA, State Laws & De-identified Health Data for AI Research

A comprehensive guide for hospitals, healthcare organizations, and AI researchers navigating the legal and regulatory framework governing compliant health data sharing.

Compliance Officers · Healthcare Counsel · AI Researchers · Healthcare IT · Data Governance Teams · IRBs · Healthcare Executives
01

HIPAA Fundamentals

The Health Insurance Portability and Accountability Act (HIPAA) is a federal law enacted in 1996 that established national standards for protecting sensitive patient health information. HIPAA covers three types of "covered entities":

  1. Healthcare providers who transmit health information electronically in connection with certain transactions
  2. Health plans including health insurance companies, HMOs, company health plans, and government programs like Medicare and Medicaid
  3. Healthcare clearinghouses that process health information from nonstandard formats to standard formats

Additionally, HIPAA extends to "business associates" — persons or entities that perform functions or activities on behalf of covered entities that involve access to protected health information.

Protected Health Information (PHI) is individually identifiable health information held or transmitted by a covered entity or business associate. PHI includes:

  • Demographic information (name, address, birth date, Social Security Number)
  • Medical histories, diagnoses, treatment information, and prognoses
  • Payment and billing information related to healthcare services
  • Any other information that identifies an individual and relates to their health status, healthcare provision, or healthcare payment

PHI can exist in any form: electronic, paper, or oral.

HIPAA does not protect:

  • De-identified health information — information that cannot be linked to an individual
  • Health information not held by covered entities or business associates
  • Employment records held by employers in their capacity as employers
  • Education records covered by FERPA
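
Once properly de-identified, health information is no longer subject to HIPAA's Privacy Rule restrictions and can be used or disclosed without patient authorization, although state law and contractual limits may still apply.

Two HIPAA rules are most relevant to data sharing:
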
  1. Privacy Rule (45 CFR Part 164, Subparts A and E) — Establishes national standards for protecting PHI and governs when and how covered entities can use and disclose PHI.
  2. Security Rule (45 CFR Part 164, Subparts A and C) — Establishes standards for protecting electronic PHI through administrative, physical, and technical safeguards.

Use refers to the internal employment, application, or utilization of PHI within a covered entity — for example, a hospital using patient records for internal quality improvement.

Disclosure refers to the release, transfer, or providing access to PHI outside the covered entity — for example, a hospital sharing patient data with an AI research company.

HIPAA's restrictions on disclosure are generally more stringent than those on internal use. This distinction is critical when evaluating data sharing arrangements.
02

De-identification Methods

Under 45 CFR § 164.514(a), health information is considered de-identified — and therefore not PHI — when it "does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual."

Once properly de-identified, the information is no longer subject to HIPAA's Privacy Rule restrictions and can be used or disclosed under HIPAA without patient authorization.

HIPAA provides two methods for achieving de-identification:

  1. Expert Determination Method (45 CFR § 164.514(b)(1)) — A qualified expert applies statistical or scientific principles to determine that re-identification risk is "very small." Offers flexibility to preserve data useful for AI training while maintaining compliance.
  2. Safe Harbor Method (45 CFR § 164.514(b)(2)) — A prescriptive checklist requiring removal of 18 specific identifiers. Straightforward to implement with clear compliance documentation.

A covered entity may use either method. Satisfying either method demonstrates compliance with the de-identification standard.

The following 18 identifier types must be removed to satisfy HIPAA's Safe Harbor de-identification standard:

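  • Names
  • Geographic subdivisions smaller than a state (street address, city, county, and ZIP code, apart from a limited exception for the first three ZIP digits)
  • All elements of dates other than the year that relate directly to an individual, and ages over 89
  • Telephone numbers
  • Fax numbers
  • Email addresses
  • Social Security numbers
  • Medical record numbers
  • Health plan beneficiary numbers
  • Account numbers
  • Certificate/license numbers
  • Vehicle identifiers and serial numbers, including license plate numbers
  • Device identifiers and serial numbers
  • Web URLs
  • IP addresses
  • Biometric identifiers, including fingerprints and voiceprints
  • Full-face photographs and comparable images
  • Any other unique identifying number, characteristic, or code

The covered entity must also have no actual knowledge that the remaining information could be used, alone or in combination, to identify an individual.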
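
As a rough illustration of how the Safe Harbor checklist might translate into a processing step, the Python sketch below drops identifier fields from a flat record and reduces dates to the year. The field names and the `safe_harbor_scrub` function are hypothetical, chosen only for this example. It is not a complete Safe Harbor implementation: it does not handle ages over 89, identifiers embedded in free text, or the actual-knowledge requirement.

```python
from datetime import date

# Hypothetical field names; a real schema will differ and needs its own review.
IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "zip_code", "phone", "fax", "email",
    "ssn", "mrn", "health_plan_id", "account_number", "license_number",
    "vehicle_id", "device_serial", "url", "ip_address", "biometric_id",
    "photo",
}
# Hypothetical date fields that should be reduced to the year only.
DATE_FIELDS = {"birth_date", "admission_date", "discharge_date"}


def safe_harbor_scrub(record: dict) -> dict:
    """Return a copy of `record` without identifier fields and with dates reduced to years."""
    cleaned = {}
    for field, value in record.items():
        if field in IDENTIFIER_FIELDS:
            continue  # drop the identifier entirely rather than masking it
        if field in DATE_FIELDS and isinstance(value, date):
            # Keep only the year, stored under a renamed key (e.g. admission_year).
            cleaned[field.replace("_date", "_year")] = value.year
            continue
        cleaned[field] = value
    return cleaned


example = {
    "name": "Jane Doe",
    "mrn": "12345",
    "state": "OH",
    "admission_date": date(2023, 5, 17),
    "diagnosis": "J18.9",
}
scrubbed = safe_harbor_scrub(example)
```

Here `scrubbed` keeps only the state, the admission year, and the diagnosis code. An allow-list of approved fields would be a stricter design than this deny-list, because unknown fields would be dropped by default.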

Medical imaging presents unique de-identification challenges that go beyond standard data field removal:

  • DICOM headers — Imaging files in DICOM format contain embedded metadata headers that may hold extensive patient information not visible on screen
  • Burnt-in PHI — Certain modalities, particularly ultrasound, can have patient information rendered directly into the pixel data of the image itself — requiring specialized image processing to detect and redact
  • Derived data — Structured reports, measurements, and annotations associated with imaging studies may also contain PHI
Radiant AI Health Data, Inc. has developed purpose-built software that handles complete de-identification across all modalities — including burnt-in PHI — before data ever leaves the originating facility. No PHI ever leaves the facility's domain.
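
To make the header and burnt-in PHI points concrete, here is a minimal Python sketch that treats a DICOM header as a plain dictionary keyed by standard attribute keywords. It is an illustration under stated assumptions, not a description of Radiant's software: the keyword set is a small subset rather than a complete confidentiality profile, the function names are hypothetical, and a real pipeline would read files with a DICOM library and inspect pixel data directly.

```python
# A small, illustrative subset of DICOM attribute keywords that can carry PHI.
PHI_KEYWORDS = {
    "PatientName", "PatientID", "PatientBirthDate", "PatientAddress",
    "AccessionNumber", "InstitutionName", "ReferringPhysicianName",
    "StudyDate",
}
# Assumption for this sketch: ultrasound ("US") is always sent for pixel review.
BURNT_IN_PRONE_MODALITIES = {"US"}


def scrub_header(header: dict) -> dict:
    """Return a copy of the header without the listed PHI attributes."""
    return {key: value for key, value in header.items() if key not in PHI_KEYWORDS}


def needs_pixel_review(header: dict) -> bool:
    """Flag studies whose pixel data may contain rendered patient text."""
    burned = str(header.get("BurnedInAnnotation", "")).upper() == "YES"
    return burned or header.get("Modality") in BURNT_IN_PRONE_MODALITIES


ultrasound = {"PatientName": "DOE^JANE", "PatientID": "123", "Modality": "US", "Rows": 512}
clean_header = scrub_header(ultrasound)
flagged = needs_pixel_review(ultrasound)
```

Note that the review check runs on the original header. Scrubbing the header alone says nothing about text rendered into the image, which is why the two steps are kept separate.
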
04

State Law Variations

Many states have enacted health privacy laws that impose requirements stricter than HIPAA, and HIPAA expressly permits states to do so. This is a critically important consideration: organizations must comply with both federal HIPAA requirements and any applicable state laws, following the more stringent standard.

State law compliance is not optional. Operating in multiple states requires a careful review of each state's specific health privacy requirements, as penalties can be significant.

Several states have enacted particularly notable health privacy legislation:

  • California — The Confidentiality of Medical Information Act (CMIA) and the California Consumer Privacy Act (CCPA/CPRA) impose additional requirements. California AB 713 (2020) harmonized CCPA de-identification with HIPAA and created new notice requirements for de-identified data sales.
  • Washington — The My Health My Data Act (MHMDA), effective March 2024, is among the strictest in the nation. It applies to consumer health data and creates a private right of action.
  • New York, Texas, Florida — Have specific health privacy statutes that supplement HIPAA requirements.
  • Illinois — The Genetic Information Privacy Act adds protections for genetic data.

As of 2024, most states have enacted or are considering comprehensive privacy laws. Hospitals should monitor the legislatures of states where they have significant patient populations.

05

Practical Implementation

Effective data governance for AI research sharing typically requires:

  • A Data Governance Committee with representation from legal, compliance, clinical, and IT
  • Written data governance policies covering classification, access controls, and sharing approvals
  • A formal data request and approval process for evaluating AI research partnerships
  • Designated data stewards responsible for specific data domains
  • Regular audits of data sharing arrangements and de-identification processes

When evaluating a clinical data sharing partner, hospitals should assess:

  • De-identification expertise — Does the partner have demonstrated capability across all relevant modalities, including imaging?
  • On-site processing — Is de-identification performed at the facility before data leaves, or does PHI travel to a third-party environment first?
  • Auditability — Does the partner maintain traceable, documentable records of how each dataset was processed?
  • Revenue share transparency — Are the terms of data use compensation clearly defined and verifiable?
  • Research category controls — Can the facility restrict which types of research their data may be used for?
  • BAA compliance — Is the partner prepared to execute a HIPAA-compliant Business Associate Agreement?
Radiant AI Health Data, Inc. performs all de-identification on-site, within the facility's own domain, before any data moves. PHI never leaves the originating facility.
06

AI-Specific Considerations

AI models for healthcare require large, diverse, representative datasets to produce reliable, equitable results. Large academic medical centers alone cannot provide the breadth needed — AI trained only on data from major urban hospitals may perform poorly on populations seen in rural and community facilities.

Small and mid-sized hospitals hold data representing patient populations that are underrepresented in current AI training sets — including rural communities, underserved populations, and geographic regions with different disease prevalence patterns. Including this data is not just commercially valuable — it is essential for developing AI that works equitably across all patients.

When de-identified health data is used to train AI models, several regulatory frameworks may still apply, depending on the AI application:

  • FDA oversight — AI/ML software used as a medical device (SaMD) is subject to FDA regulation. The FDA has issued guidance on AI/ML in Software as a Medical Device.
  • Common Rule — If AI development involves federally funded research, the Common Rule (45 CFR Part 46) may apply.
  • NIST AI RMF — The NIST AI Risk Management Framework provides voluntary guidance for responsible AI development.
  • State AI laws — An emerging area; several states are considering AI-specific regulations that may affect healthcare AI.
07

Risk Management

Re-identification — the process of linking de-identified data back to specific individuals — is a legitimate concern that must be addressed through robust de-identification architecture. Key mitigations include:

  • Comprehensive identifier removal — Including all 18 Safe Harbor identifiers and any additional quasi-identifiers that could enable re-identification in combination
  • Geographic suppression — Removing not just patient location, but also the originating facility, city, and state from each study
  • No reference preservation — Ensuring no internal identifiers or cross-reference fields that could link back to source records remain in the data
  • On-site processing — Completing all de-identification within the facility's domain, so only fully anonymized data ever moves

In our architecture, there are no encrypted fields to decrypt and no hashed values to attack. The identifiers are removed, not transformed, so attacks that rely on reversing a transformation have nothing to work against.
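
One common way to reason about quasi-identifiers "in combination" is a k-anonymity style check. The source does not name this technique, so it is introduced here only as an illustration. The sketch below counts how many records share each combination of quasi-identifier values and reports the smallest group. The function name and the field names are hypothetical.

```python
from collections import Counter


def smallest_group_size(records: list, quasi_identifiers: list) -> int:
    """Return the size of the smallest group of records sharing the same
    combination of quasi-identifier values (0 for an empty dataset)."""
    if not records:
        return 0
    groups = Counter(
        tuple(record.get(field) for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


records = [
    {"birth_year": 1950, "sex": "F", "state": "OH"},
    {"birth_year": 1950, "sex": "F", "state": "OH"},
    {"birth_year": 1987, "sex": "M", "state": "OH"},
]
k = smallest_group_size(records, ["birth_year", "sex"])
```

A result of 1 means at least one record is unique on the chosen fields and may warrant further generalization or suppression. What threshold is acceptable is a judgment for the covered entity or its expert, not something this sketch decides.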

HIPAA violations carry civil and criminal penalties:

  • Civil penalties: statutory base amounts range from $100 to $50,000 per violation depending on culpability, with an annual cap per violation category; HHS adjusts these amounts for inflation each year, which has raised the top annual cap from the statutory $1.5 million to roughly $1.9 million or more
  • Criminal penalties: for knowing violations, up to $50,000 and 1 year of imprisonment; for violations under false pretenses, up to $100,000 and 5 years; for violations with intent to sell or harm, up to $250,000 and 10 years
  • State penalties: state attorneys general may bring additional enforcement actions, and state laws such as Washington's MHMDA create private rights of action
Proper de-identification eliminates PHI status and removes data from HIPAA's penalty framework. Compliance documentation is your best protection.
08

Contractual Protections

Several types of agreements may be required depending on the nature of the data sharing arrangement:

  • Business Associate Agreement (BAA) — Required under HIPAA when a business associate will access PHI on behalf of a covered entity. Specifies permitted uses, safeguards, and breach notification obligations.
  • Data Use Agreement (DUA) — Required when sharing Limited Data Sets (which retain some dates and geographic information but exclude direct identifiers).
  • Data License Agreement — Governs commercial arrangements for de-identified data, including permitted uses, restrictions, revenue sharing, and audit rights.
  • Letter of Intent (LOI) — Non-binding agreement outlining the proposed terms of a data partnership prior to executing definitive agreements.
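
The relationship between data category and agreement type described above can be sketched as a simple lookup. This is a simplification for illustration only: the names `DataCategory` and `primary_agreement` are hypothetical, an arrangement may involve more than one agreement, and the right instrument for a given deal is a question for counsel.

```python
from enum import Enum


class DataCategory(Enum):
    PHI = "phi"
    LIMITED_DATA_SET = "limited_data_set"
    DE_IDENTIFIED = "de_identified"


# Mapping taken from the agreement descriptions in this section.
AGREEMENT_FOR = {
    DataCategory.PHI: "Business Associate Agreement (BAA)",
    DataCategory.LIMITED_DATA_SET: "Data Use Agreement (DUA)",
    DataCategory.DE_IDENTIFIED: "Data License Agreement",
}


def primary_agreement(category: DataCategory) -> str:
    """Return the agreement type this section associates with a data category."""
    return AGREEMENT_FOR[category]


agreement = primary_agreement(DataCategory.LIMITED_DATA_SET)
```

A Letter of Intent is left out of the mapping because it precedes the definitive agreements rather than depending on the data category.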

Comprehensive data sharing agreements for AI research should address:

  • Permitted and prohibited uses — Specific research categories authorized; opt-out categories; prohibition on re-identification attempts
  • De-identification standards — Which HIPAA method applies; verification and audit rights
  • Revenue sharing — How compensation is calculated, tracked, and distributed back to originating facilities
  • Data security — Technical and organizational safeguards; breach notification obligations
  • Term and termination — Duration; grounds for termination; data return or destruction upon termination
  • Representations and warranties — Each party's compliance obligations and liability limitations
09

Recent Developments

The regulatory landscape has evolved significantly in 2023–2024:

  • Roughly 20 U.S. states now have comprehensive privacy laws, many enacted in 2023–2024
  • Washington's My Health My Data Act became effective March 2024 — one of the most expansive state health privacy laws in the country
  • Maryland's Online Data Privacy Act was enacted in 2024
  • The FDA has issued updated guidance on AI/ML in Software as a Medical Device
  • FTC has brought enforcement actions against companies making deceptive claims about de-identification
No comprehensive federal privacy law has passed Congress as of 2024, but proposed legislation continues to be introduced. Organizations should monitor federal developments and build flexibility into their data governance programs.

Judicial and regulatory scrutiny of de-identification practices is increasing:

  • FTC enforcement — The FTC has brought actions against companies for deceptive claims about data de-identification, resulting in settlements requiring robust de-identification programs
  • Class action litigation — Patients have brought class actions alleging that "de-identified" data was actually re-identifiable; courts are scrutinizing data sharing practices and consent mechanisms
  • State AG enforcement — State attorneys general are increasingly active in privacy enforcement; Washington's MHMDA creates a private right of action that may generate litigation
  • Multi-state coordination — State AG offices are increasingly coordinating on privacy enforcement actions
Best practice: Document de-identification processes thoroughly and be prepared to demonstrate compliance if investigated. Auditability is your strongest defense.
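
One way to make de-identification processing auditable is to write a structured record for every dataset processed and attach a checksum, so that later edits to the record can be detected. The Python sketch below is a minimal example under stated assumptions: the record fields and both function names are hypothetical, and the checksum only provides tamper evidence if it is also stored somewhere the record's editor cannot change.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(dataset_id: str, method: str, removed_fields, record_count: int) -> dict:
    """Create an audit entry describing one de-identification run."""
    entry = {
        "dataset_id": dataset_id,
        "method": method,  # e.g. "safe_harbor" or "expert_determination"
        "removed_fields": sorted(removed_fields),
        "record_count": record_count,
        "processed_at": datetime.now(timezone.utc).isoformat(),
    }
    # Hash a canonical JSON form so the same content always yields the same digest.
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["sha256"] = hashlib.sha256(payload).hexdigest()
    return entry


def verify_audit_record(entry: dict) -> bool:
    """Recompute the digest and compare it with the stored one."""
    body = {key: value for key, value in entry.items() if key != "sha256"}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest() == entry.get("sha256")


audit = build_audit_record("ds-001", "safe_harbor", {"name", "mrn"}, 120)
tampered = dict(audit, record_count=121)
```

In this example `verify_audit_record(audit)` succeeds and `verify_audit_record(tampered)` fails, because the changed record count no longer matches the stored digest.
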
GL

Glossary

Business Associate
Under HIPAA, a person or entity that performs functions or activities involving the use or disclosure of PHI on behalf of a covered entity.
Common Rule
The Federal Policy for the Protection of Human Subjects, governing federally funded research involving human subjects. 45 CFR Part 46.
Consumer Health Data
Under state privacy laws, personal information related to a consumer's health status, condition, or diagnosis. Washington's MHMDA defines this broadly.
Covered Entity
Under HIPAA, a health plan, a healthcare clearinghouse, or a healthcare provider that transmits health information electronically in connection with certain transactions.
De-identification
The process of removing or modifying information in health data so that individuals cannot be identified.
Expert Determination
A HIPAA de-identification method where a qualified expert certifies that re-identification risk is very small.
IRB (Institutional Review Board)
A committee that reviews and oversees research involving human subjects to protect their rights and welfare.
Limited Data Set
Under HIPAA, a data set that excludes direct identifiers but may include dates and geographic information. Requires a data use agreement.
PHI (Protected Health Information)
Individually identifiable health information held by a covered entity or business associate.
Re-identification
The process of linking de-identified data back to specific individuals.
Safe Harbor
A HIPAA de-identification method requiring removal of 18 specified identifier types.
Sensitive Personal Information
Under state privacy laws, categories of personal data requiring additional protections, including health data.
RE

Official Resources

⚠ Legal Disclaimer: This FAQ is provided for informational purposes only and does not constitute legal advice. Laws and regulations governing health data privacy are complex, vary by jurisdiction, and change frequently. Consult with qualified legal counsel for specific legal questions regarding HIPAA compliance, de-identification, and data sharing arrangements applicable to your organization.