Overview
Japan's framework for health data de-identification balances privacy protection against the use of health data for research and innovation. It has two tiers: a general data protection law that applies to all sectors, and health-specific legislation that adds both requirements and opportunities for health data use.
Key Developments in Japan's Health Data Framework
- May 2017: Next Generation Medical Infrastructure Law enacted
- May 2018: Next Generation Medical Infrastructure Law came into effect
- June 2020: Major amendments to the Act on the Protection of Personal Information (APPI) passed
- April 2022: Amended APPI came into effect, introducing the concept of pseudonymously processed information
- October 2022: First certified medical information providers approved under NGMIL
- April 2023: Enhanced guidelines for anonymization of medical data published
- June 2023: Japan's Digital Health Strategy released, emphasizing secure health data utilization
- March 2024: Updated PPC guidelines on anonymization techniques published
- May 2024: New certification standards for medical information handling organizations
Legal Framework
Japan's health data de-identification framework is built on several key pieces of legislation:
Primary Legislation
- Act on the Protection of Personal Information (APPI): Japan's comprehensive data protection law, significantly amended in 2020 with implementation in 2022. It establishes the general framework for personal data protection, including special categories like health data.
- Next Generation Medical Infrastructure Law (NGMIL): Health-specific legislation enacted in 2017 to facilitate the use of medical data for research and development. NGMIL is the common short name for the Act on Anonymized Medical Data to Contribute to Research and Development in the Medical Field; it creates the framework, and the specific rules, for anonymizing and sharing medical data for research purposes.
- Medical Practitioners' Act and Medical Care Act: Contain provisions on medical confidentiality and management of medical records that complement data protection requirements.
- Digital Health Enhancement Act (2023): Promotes digital transformation in healthcare while ensuring appropriate data protection.
Reference Links:
- Personal Information Protection Commission Japan (PPC): https://www.ppc.go.jp/en/
- APPI English Translation (2022 version): https://www.ppc.go.jp/files/pdf/Act_on_the_Protection_of_Personal_Information.pdf
- Ministry of Health, Labour and Welfare (MHLW): https://www.mhlw.go.jp/english/
- Next Generation Medical Infrastructure Law Overview: https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/0000148944.html (Japanese)
- Japan Agency for Medical Research and Development (AMED): https://www.amed.go.jp/en/
- Cabinet Office Healthcare Policy: https://www.kantei.go.jp/jp/singi/kenkouiryou/en/
Key Concepts and Definitions
Under APPI
The APPI establishes several important categories of data:
| Concept | Definition | Regulatory Status | Examples in Health Context |
|---|---|---|---|
| Personal Information | Information that can identify a specific individual, including information that can be easily collated with other information to identify an individual | Fully regulated under APPI | Patient name, address, phone number, combined with medical record number |
| Special Care-Required Personal Information | Sensitive data including medical history and healthcare records that requires special handling | Subject to stricter requirements under APPI, including explicit consent for collection in most cases | Diagnosis information, treatment history, genetic test results, disability status |
| Pseudonymously Processed Information | Personal information that has been processed so that it cannot identify a specific individual without additional information | Still regulated but with some exemptions from APPI requirements, including for internal analysis purposes | Medical records with patient identifiers replaced by codes, with the mapping table stored separately |
| Anonymously Processed Information | Information that has been irreversibly processed to prevent identification of specific individuals | Falls outside most APPI requirements and can be used and shared with fewer restrictions | Statistical summaries of patient outcomes with all identifiers removed and data generalized |
Reference:
PPC Guidelines on Anonymously Processed Information (English): https://www.ppc.go.jp/files/pdf/211119_guidelines_anonymously_processed_information.pdf
PPC Guidelines on Pseudonymously Processed Information (English): https://www.ppc.go.jp/en/legal/guidelines/
Example: Different Data Categories in Practice
Original Data (Personal Information):
- Name: Tanaka Hiroshi
- Date of Birth: May 15, 1975
- Address: 1-2-3 Chiyoda, Tokyo
- Diagnosis: Type 2 Diabetes
- Treatment: Metformin 500mg twice daily
- Hospital: Tokyo Medical Center
- Insurance ID: 12345678
- Phone: 03-1234-5678
- Email: tanaka.h@example.jp
Pseudonymously Processed Information:
- Patient ID: PT-12345
- Age: 50
- Region: Tokyo
- Diagnosis: Type 2 Diabetes
- Treatment: Metformin 500mg twice daily
- Hospital: Hospital A
- Insurance Category: Employee Health Insurance
Anonymously Processed Information:
- Age Group: 50-54
- Region: Kanto
- Diagnosis: Type 2 Diabetes
- Treatment Category: Oral hypoglycemic agent
- Hospital Type: Large Urban Medical Center
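The three views above can be sketched as two small transformations. This is a minimal illustration rather than a procedure taken from the PPC guidelines: the lookup tables, the field names, and the reference date of 1 June 2025 are assumptions made for the example, and real anonymization would also need a re-identification risk assessment.

```python
from datetime import date

# Hypothetical lookup tables, for illustration only.
PREFECTURE_TO_REGION = {"Tokyo": "Kanto", "Osaka": "Kansai"}
DRUG_TO_CATEGORY = {"Metformin": "Oral hypoglycemic agent"}


def age_on(birth: date, today: date) -> int:
    """Completed years between birth and today."""
    before_birthday = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - before_birthday


def pseudonymize(record: dict, code: str, today: date) -> dict:
    """Replace direct identifiers with a code; keep clinical detail."""
    return {
        "patient_id": code,
        "age": age_on(record["date_of_birth"], today),
        "region": record["prefecture"],
        "diagnosis": record["diagnosis"],
        "treatment": record["treatment"],
    }


def anonymize(pseudo: dict, band: int = 5) -> dict:
    """Drop the code and generalize the remaining quasi-identifiers."""
    low = pseudo["age"] // band * band
    drug = pseudo["treatment"].split()[0]
    return {
        "age_group": f"{low}-{low + band - 1}",
        "region": PREFECTURE_TO_REGION[pseudo["region"]],
        "diagnosis": pseudo["diagnosis"],
        "treatment_category": DRUG_TO_CATEGORY[drug],
    }


original = {
    "name": "Tanaka Hiroshi",
    "date_of_birth": date(1975, 5, 15),
    "prefecture": "Tokyo",
    "diagnosis": "Type 2 Diabetes",
    "treatment": "Metformin 500mg twice daily",
}
# The reference date is an assumption chosen so that the age matches the example.
pseudo = pseudonymize(original, code="PT-12345", today=date(2025, 6, 1))
anon = anonymize(pseudo)
```

The pseudonymized view keeps a per-patient code, so whoever holds the mapping can link it back to the person; the anonymized view drops the code entirely and keeps only generalized values.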
Special Rules for Health Data
The APPI and NGMIL establish special rules for health data:
- Enhanced Consent Requirements: Collection of health data generally requires explicit consent unless specific exceptions apply
- Opt-out for Research: Under NGMIL, medical institutions can share data with certified providers based on opt-out consent
- Data Minimization: Only necessary health data should be collected and processed
- Purpose Limitation: Health data should only be used for specified purposes
- Security Requirements: Enhanced security measures are required for health data
Example: NGMIL Opt-out Notice
Under the NGMIL, medical institutions must provide patients with an opt-out notice that includes:
- The name and contact information of the medical institution
- The name and contact information of the certified medical information provider
- The types of medical information to be provided
- The purposes for which the anonymized medical information will be used
- The patient's right to opt out and the procedure for doing so
- The fact that the data will be anonymized before being provided to third parties
Example notice from Keio University Hospital (translated):
"Keio University Hospital participates in the Next Generation Medical Infrastructure program to support medical research. Your medical information may be provided to Life Data Initiative (certified medical information provider) for anonymization and research use. If you do not wish your data to be used, please notify our reception desk. For more information, please see our website or ask our staff."
Next Generation Medical Infrastructure Law (NGMIL)
The NGMIL creates a specific framework for the use of medical data for research and innovation:
Key Features
- Certified Medical Information Providers: Medical institutions can provide patient data to certified organizations (認定匿名加工医療情報作成事業者) that are authorized to collect and anonymize medical data
- Opt-out Mechanism: Uses an opt-out rather than opt-in approach for data sharing, where patients must be notified but don't need to provide explicit consent
- Anonymization Standards: Establishes specific standards for medical data anonymization that are more detailed than general APPI requirements
- Data Utilization Framework: Creates a structured process for researchers to access anonymized health data
- Prohibition of Re-identification: Explicitly prohibits attempts to re-identify anonymized medical data with significant penalties
- Certification Requirements: Detailed requirements for organizations seeking certification, including technical capabilities, security measures, and governance structures
Reference:
MHLW Next Generation Medical Infrastructure Law Information: https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/0000148944.html (Japanese)
Cabinet Office Healthcare Data Policy: https://www.kantei.go.jp/jp/singi/kenkouiryou/jisedai_kiban/ (Japanese)
MHLW Guidelines for Certified Medical Information Providers (2023 update): https://www.mhlw.go.jp/content/10800000/001088247.pdf (Japanese)
NGMIL Anonymization Process
The NGMIL establishes a specific process for medical data anonymization:
- Medical institutions provide data to certified medical information providers after notifying patients (with opt-out option)
- These providers process the data according to strict anonymization standards specified in MHLW guidelines
- Anonymized data can then be provided to researchers and companies for approved purposes
- Results of research using the anonymized data must be reported back to the certified provider
- Certified providers must conduct regular audits and report to MHLW
- MHLW conducts periodic inspections of certified providers
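The first step of this process, handing over only the records of patients who were notified and did not opt out, can be sketched as a simple filter. The `PatientRecord` type and its field names are hypothetical and exist only for this illustration; they are not drawn from the MHLW guidelines.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    patient_id: str
    notified: bool    # patient received the opt-out notice
    opted_out: bool   # patient declined data sharing
    payload: dict = field(default_factory=dict)


def eligible_for_transfer(records):
    """Keep only records of notified patients who did not opt out."""
    return [r for r in records if r.notified and not r.opted_out]


records = [
    PatientRecord("A", notified=True, opted_out=False),
    PatientRecord("B", notified=True, opted_out=True),
    PatientRecord("C", notified=False, opted_out=False),
]
to_provider = eligible_for_transfer(records)
```

Treating an un-notified patient as ineligible is the conservative choice in this sketch: the absence of an opt-out only counts once the notice has actually been given.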
Case Study: Life Data Initiative (LDI)
In 2022, the Life Data Initiative was certified as a medical information provider under NGMIL. The initiative:
- Collects medical data from over 100 participating hospitals and clinics
- Processes approximately 10 million patient records using standardized anonymization protocols
- Makes anonymized data available to approved researchers for medical innovation projects
- Has supported research on treatment efficacy for various conditions including diabetes, cardiovascular disease, and cancer
- Implements a secure data access environment with strict controls to prevent re-identification attempts
- Provides detailed reports to MHLW on data utilization and security measures
- Conducts regular risk assessments of anonymization techniques
This initiative demonstrates how the NGMIL framework enables large-scale health data utilization while maintaining privacy protections.
Example: NGMIL Certification Requirements
Organizations seeking certification under NGMIL must meet stringent requirements, including:
- Technical capability: Demonstrated expertise in medical data anonymization
- Security measures: Physical, technical, and administrative safeguards
- Governance structure: Independent oversight committee with medical and ethics experts
- Financial stability: Sufficient resources to maintain operations
- Operational procedures: Documented processes for data handling
- Personnel qualifications: Staff with appropriate expertise
- Audit capabilities: Systems for tracking data access and use
- Breach response plan: Procedures for handling security incidents
As of May 2024, only four organizations have received this certification, highlighting the rigorous nature of the requirements.
Technical Requirements for De-identification
Japanese regulations specify technical requirements for both pseudonymization and anonymization:
Pseudonymous Processing Requirements
To qualify as "Pseudonymously Processed Information" under APPI, data controllers must:
- Replace all direct identifiers with codes or pseudonyms
- Delete descriptions that could easily identify the individual (such as rare disease information)
- Store any information linking pseudonyms to original identities separately and securely
- Implement security measures to prevent unauthorized access
- Not attempt to re-identify the information
- Document the pseudonymization process and security measures
- Limit use to internal analysis, testing, and research purposes only
- Conduct risk assessment for potential re-identification vulnerabilities
Example: Hospital Pseudonymization Process
A large Tokyo hospital implements pseudonymization for internal quality improvement research as follows:
- Patient names replaced with randomly generated codes (e.g., "PT-2024-78945")
- Dates of birth converted to ages
- Addresses generalized to prefecture level
- Rare conditions (affecting fewer than 0.1% of population) grouped into broader categories
- The pseudonymization key (mapping between patient IDs and codes) stored in a separate secure database with access limited to the hospital's data protection officer
- Access to pseudonymized data restricted to authorized research staff
- All access logged and monitored
- Regular audits of access logs and security controls
- Specific training for staff on pseudonymized data handling
- Prohibition on combining the pseudonymized data with external datasets
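A minimal sketch of the code-assignment part of such a process is shown below. The class, the field names, and the rare-condition grouping table are hypothetical; the in-memory key table stands in for the separate, access-restricted database described above.

```python
import secrets

# Hypothetical grouping table for rare conditions, for illustration only.
RARE_CONDITION_GROUPS = {"Fabry disease": "Rare metabolic disorder"}


class Pseudonymizer:
    """Issues random codes and keeps the re-identification key separate."""

    def __init__(self, prefix: str = "PT-2024-"):
        self._prefix = prefix
        # patient_id -> code. In practice this mapping would live in a
        # separate secured store, not alongside the research data.
        self._key_table = {}

    def _new_code(self) -> str:
        # Random code, not derived from the patient ID, so it cannot be
        # reversed by hashing candidate IDs.
        return f"{self._prefix}{secrets.randbelow(100_000):05d}"

    def code_for(self, patient_id: str) -> str:
        """Return a stable code for this patient, creating one if needed."""
        if patient_id not in self._key_table:
            used = set(self._key_table.values())
            code = self._new_code()
            while code in used:
                code = self._new_code()
            self._key_table[patient_id] = code
        return self._key_table[patient_id]

    def pseudonymize(self, record: dict) -> dict:
        """Drop direct identifiers; keep coded and generalized fields only."""
        diagnosis = record["diagnosis"]
        return {
            "code": self.code_for(record["patient_id"]),
            "age": record["age"],
            "prefecture": record["prefecture"],
            "diagnosis": RARE_CONDITION_GROUPS.get(diagnosis, diagnosis),
        }


pseudonymizer = Pseudonymizer()
row = pseudonymizer.pseudonymize({
    "patient_id": "H-000123",
    "name": "Tanaka Hiroshi",
    "age": 50,
    "prefecture": "Tokyo",
    "diagnosis": "Fabry disease",
})
```

The same patient always receives the same code, which keeps records linkable for longitudinal analysis, while the name and the hospital's own patient ID never reach the research dataset.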
Anonymous Processing Requirements
To qualify as "Anonymously Processed Information" under APPI, data controllers must:
- Delete all direct personal identifiers
- Process any code numbers or indirect identifiers so that individuals cannot be identified
- Delete any free-text descriptions that could identify individuals
- Take additional measures based on a risk assessment considering properties of the data and processing methods
- Publicly disclose information about the categories of data being processed anonymously
- Not combine the anonymized data with other information to attempt re-identification
- Implement security measures to prevent unauthorized access to the data
- Document the anonymization process and risk assessment
- Regularly review the effectiveness of anonymization techniques
Reference:
PPC Guidelines on Anonymously Processed Information (English): https://www.ppc.go.jp/en/legal/guidelines/
MHLW Guidelines on Medical Data Anonymization (Japanese): https://www.mhlw.go.jp/content/10601000/000499627.pdf
PPC Handbook on Anonymization (2024 version, Japanese): https://www.ppc.go.jp/personalinfo/legal/anonymously_processed_information/
Technical Methods Recommended in Japanese Guidelines
| Method | Description | Example in Health Context |
|---|---|---|
| K-anonymity | Ensuring each record is indistinguishable from at least k-1 other records | Ensuring any combination of age, gender, and prefecture appears at least 5 times in the dataset |
| L-diversity | Ensuring sensitive attributes have at least l different values within each group | Ensuring patients with the same demographic characteristics have at least 3 different diagnoses |
| T-closeness | Ensuring the distribution of sensitive attributes within each group is similar to the overall distribution | Ensuring the distribution of diagnoses within each demographic group is similar to the overall patient population |
| Generalization | Reducing precision of data | Converting exact ages to 5-year ranges, specific locations to prefecture level |
| Data Suppression | Removing high-risk values | Removing information about very rare conditions or treatments |
| Noise Addition | Adding statistical noise to numerical values | Adding small random variations to laboratory values or vital signs |
| Top/Bottom Coding | Grouping extreme values | Reporting ages as "90+" for all patients over 90 years old |
| Differential Privacy | Adding calibrated random noise to query results | Adding noise to statistical queries on health data so that the output carries a formal, quantifiable privacy guarantee |
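The first two rows of the table can be checked mechanically. The sketch below measures the smallest equivalence class (for k-anonymity) and the smallest number of distinct sensitive values per class (the simple "distinct" form of l-diversity). The column names and sample rows are made up for illustration.

```python
from collections import defaultdict


def group_by_quasi_identifiers(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def smallest_group_size(rows, quasi_identifiers):
    """The dataset is k-anonymous for every k up to this value."""
    groups = group_by_quasi_identifiers(rows, quasi_identifiers)
    return min(len(g) for g in groups.values())


def smallest_diversity(rows, quasi_identifiers, sensitive):
    """Distinct l-diversity: fewest distinct sensitive values in any group."""
    groups = group_by_quasi_identifiers(rows, quasi_identifiers)
    return min(len({r[sensitive] for r in g}) for g in groups.values())


QI = ["age_band", "sex", "prefecture"]
rows = [
    {"age_band": "45-49", "sex": "F", "prefecture": "Tokyo", "diagnosis": "Type 2 Diabetes"},
    {"age_band": "45-49", "sex": "F", "prefecture": "Tokyo", "diagnosis": "Hypertension"},
    {"age_band": "45-49", "sex": "F", "prefecture": "Tokyo", "diagnosis": "Asthma"},
    {"age_band": "50-54", "sex": "M", "prefecture": "Osaka", "diagnosis": "Hypertension"},
    {"age_band": "50-54", "sex": "M", "prefecture": "Osaka", "diagnosis": "Hypertension"},
]
k = smallest_group_size(rows, QI)
l = smallest_diversity(rows, QI, "diagnosis")
```

In this sample the Osaka group has two rows with the same diagnosis, so the data is only 2-anonymous and 1-diverse. It would fail both the "at least 5" and the "at least 3 diagnoses" thresholds used as examples in the table.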
Example: Anonymization Process for a National Health Survey
Japan's National Health and Nutrition Survey data is anonymized using these steps:
- Direct identifier removal: Names, addresses, phone numbers, and other direct identifiers are completely removed
- Age generalization: Exact ages are converted to 5-year age bands (e.g., 45-49)
- Geographic aggregation: Specific locations are generalized to prefecture level
- Rare value suppression: Very rare conditions or characteristics affecting fewer than 0.5% of the population are either suppressed or grouped into broader categories
- K-anonymity verification: Data is checked to ensure each combination of quasi-identifiers appears at least 5 times
- L-diversity implementation: Ensuring sensitive attributes have sufficient diversity within each demographic group
- Date generalization: Exact dates are converted to months or seasons
- Free text processing: Any free text fields are either removed or processed to remove potential identifiers
- Outlier management: Extreme values are top/bottom coded
- Risk assessment: Final dataset undergoes re-identification risk assessment
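Several of the steps above (age banding, top coding, date generalization, and rare value suppression) are simple column transforms. The functions below are an illustrative sketch, not the survey's actual procedure: the names and defaults are assumptions, and the rarity threshold is applied to the share within the dataset as a stand-in for the share of the population.

```python
from collections import Counter
from datetime import date


def age_band(age: int, width: int = 5, top: int = 90) -> str:
    """Generalize age to a band and top-code ages at or above `top`."""
    if age >= top:
        return f"{top}+"
    low = age // width * width
    return f"{low}-{low + width - 1}"


def to_month(d: date) -> str:
    """Reduce an exact date to year and month."""
    return f"{d.year:04d}-{d.month:02d}"


def suppress_rare(values, min_share: float = 0.005, label: str = "Other"):
    """Replace values whose share of the dataset is below `min_share`."""
    counts = Counter(values)
    total = len(values)
    return [v if counts[v] / total >= min_share else label for v in values]


conditions = ["Hypertension"] * 999 + ["Very rare condition"]
cleaned = suppress_rare(conditions)
```

After these transforms, the output still needs the k-anonymity check and the final risk assessment listed above; generalization alone does not guarantee that every combination of quasi-identifiers appears often enough.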
Regulatory Oversight
Japan's framework includes strong regulatory oversight:
- Personal Information Protection Commission (PPC): The primary regulatory authority for personal data protection, with powers to investigate, issue orders, and impose penalties for APPI violations
- Ministry of Health, Labour and Welfare (MHLW): Oversees the implementation of the NGMIL and certifies medical information providers
- Certified Organizations: Special entities certified under the NGMIL to process medical information, subject to strict operational requirements and regular audits
- Healthcare Information Technical Committee: Advisory body that develops technical standards for health data security and anonymization
- Japan Medical Association: Professional body that provides guidance on ethical handling of medical data
- Digital Agency of Japan: Coordinates digital health initiatives and data protection standards
Enforcement Example: PPC Actions
In 2023, the PPC took enforcement action against a healthcare application provider for improper handling of health data. The company had:
- Failed to properly pseudonymize health data used for product development
- Not implemented adequate security measures for sensitive health information
- Shared what it claimed was anonymized data with third parties without meeting APPI anonymization standards
- Failed to disclose to users how their health data would be processed
The PPC issued an administrative guidance order requiring the company to:
- Implement proper pseudonymization procedures
- Enhance security measures
- Cease sharing data until proper anonymization could be verified
- Submit to regular audits for the following two years
- Revise privacy notices to clearly explain data processing
- Provide additional training to staff on data protection
Reference:
Personal Information Protection Commission: https://www.ppc.go.jp/en/
PPC Annual Report 2023 (English): https://www.ppc.go.jp/en/aboutus/roles/annual/
MHLW Certified Medical Information Providers List (Japanese): https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/0000202384.html
Digital Agency of Japan: https://www.digital.go.jp/en/
Penalties for Non-compliance
Japan's framework includes significant penalties for violations:
- APPI Violations: Fines up to 100 million yen (approximately $700,000) for corporations
- NGMIL Violations: Fines up to 1 million yen (approximately $7,000) and/or imprisonment up to 1 year for unauthorized re-identification attempts
- Administrative Actions: The PPC can issue improvement orders, business suspension orders, and public announcements of violations
- Certification Revocation: MHLW can revoke certification of medical information providers for serious violations
- Civil Liability: Affected individuals can seek damages for privacy violations
Practical Implementation
In practice, Japan's framework supports several mechanisms for health data use:
1. Jisedai Iryo Kiban (Next-Generation Medical Infrastructure Platform)
A system established under the NGMIL that:
- Collects and anonymizes medical data from hospitals and clinics
- Makes anonymized data available for research and development
- Operates under strict certification requirements
- Includes multiple certified medical information providers
- Supports both academic and commercial research initiatives
- Implements standardized data formats and anonymization protocols
- Provides secure access environments for researchers
Case Study: JMDC Health Data Bank
JMDC Inc. operates one of Japan's largest health data banks as a certified medical information provider. Their platform:
- Contains anonymized data from over 10 million individuals
- Includes health insurance claims, medical checkups, and prescription data
- Implements the NGMIL's opt-out mechanism through participating healthcare providers
- Applies standardized anonymization protocols developed with MHLW guidance
- Provides data access to pharmaceutical companies, medical device manufacturers, and academic researchers
- Has supported over 300 research projects, including COVID-19 treatment effectiveness studies
- Operates a secure cloud environment for data analysis with access controls and audit logging
- Conducts regular re-identification risk assessments
- Publishes annual transparency reports on data utilization
In 2023, JMDC data was used in a major study on diabetes treatment outcomes that led to updated clinical guidelines, demonstrating the practical value of the NGMIL framework.
2. Clinical Innovation Network (CIN)
A network that shares de-identified clinical data for research purposes. It:
- Connects multiple hospitals and research institutions
- Standardizes clinical data collection and de-identification
- Focuses on rare disease and specialty care research
- Operates under MHLW oversight
- Implements registry-based research using pseudonymized data
- Supports clinical trial optimization and real-world evidence generation
Reference:
Clinical Innovation Network: https://cinc.ncgm.go.jp/en/
CIN Research Publications: https://cinc.ncgm.go.jp/en/achievements/publications.html
3. National Database of Health Insurance Claims and Specific Health Checkups of Japan (NDB)
A large database of health insurance claims and health checkup data that provides anonymized data for research. The NDB:
- Contains data from approximately 95% of Japan's population
- Includes over 16 billion medical claims records
- Implements specialized anonymization techniques for large-scale health data
- Provides data access to approved researchers through a controlled process
- Supports policy research and healthcare system planning
- Enables population-level health analyses
- Implements tiered access controls based on research needs and data sensitivity
Reference:
NDB Open Data: https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/0000177221.html (Japanese)
MHLW NDB User Guide (2023 version): https://www.mhlw.go.jp/content/12400000/001053185.pdf (Japanese)
4. Medical Information Database Network (MID-NET)
A specialized network for pharmacovigilance and drug safety research that:
- Connects electronic health record systems from multiple hospitals
- Enables near real-time monitoring of drug safety signals
- Uses pseudonymized patient data for analysis
- Implements distributed analysis to minimize data transfer
- Supports regulatory decision-making on pharmaceuticals
- Operates under the Pharmaceuticals and Medical Devices Agency (PMDA)
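The idea behind distributed analysis is that each hospital runs the query locally and returns only aggregates, so patient-level rows never leave the site. The sketch below illustrates that pattern in general terms; it is not a description of MID-NET's actual architecture, and the record layout, drug name, and function names are hypothetical.

```python
def local_counts(site_records, drug, event):
    """Runs inside one hospital: returns aggregate counts only."""
    exposed = [r for r in site_records if drug in r["drugs"]]
    with_event = sum(1 for r in exposed if event in r["events"])
    return {"exposed": len(exposed), "with_event": with_event}


def pooled_rate(per_site_counts):
    """Runs at the coordinator: combines counts without seeing patient rows."""
    exposed = sum(c["exposed"] for c in per_site_counts)
    with_event = sum(c["with_event"] for c in per_site_counts)
    return with_event / exposed if exposed else None


site_a = [
    {"drugs": {"DrugX"}, "events": {"rash"}},
    {"drugs": {"DrugX"}, "events": set()},
    {"drugs": {"DrugX"}, "events": set()},
    {"drugs": {"DrugY"}, "events": {"rash"}},
]
site_b = [{"drugs": {"DrugX"}, "events": set()}]

counts = [local_counts(s, "DrugX", "rash") for s in (site_a, site_b)]
rate = pooled_rate(counts)
```

Small per-site counts can themselves be identifying, so a real deployment would likely add minimum cell sizes or similar disclosure controls before releasing aggregates.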
Reference:
PMDA MID-NET: https://www.pmda.go.jp/english/safety/surveillance-analysis/0018.html
Case Study: COVID-19 Data Platform
During the COVID-19 pandemic, Japan established a specialized data platform that demonstrated the flexibility of its health data framework:
- Combined data from multiple sources including testing centers, hospitals, and public health departments
- Implemented expedited anonymization protocols while maintaining privacy protections
- Created tiered access levels for different user groups (public health officials, researchers, policymakers)
- Enabled rapid analysis of treatment outcomes, vaccine effectiveness, and variant impacts
- Supported both domestic policy decisions and international research collaboration
- Demonstrated how Japan's framework could adapt to emergency situations while maintaining privacy principles
Recent Developments and Future Directions
Japan continues to evolve its approach to health data de-identification:
- AI and Healthcare: Development of specific guidelines for using de-identified health data in AI development (2023-2024)
- International Data Transfers: Japan received an adequacy decision from the EU in January 2019, which eases transfers of personal data from the EU to Japan, including health data used in research
- Genomic Data Initiatives: The Genomic Medicine Implementation Platform with specialized anonymization protocols for genetic information
- Enhanced Technical Standards: Ongoing development of more sophisticated technical standards for health data anonymization
- Public-Private Partnerships: Increased collaboration between government, academia, and industry for health data utilization
- Digital Health Strategy: Japan's 2023 Digital Health Strategy emphasizes secure data sharing for innovation
- Interoperability Standards: Development of standardized formats for health data exchange
- Federated Learning: Exploration of privacy-preserving analytics that don't require data centralization
Emerging Approach: Differential Privacy
Japan's National Institute of Information and Communications Technology (NICT) has been developing differential privacy techniques specifically for healthcare applications. This approach:
- Adds calibrated random noise, typically to query results rather than to the underlying data
- Provides provable privacy guarantees that hold regardless of what external information an attacker holds
- Is being piloted in several Japanese healthcare research initiatives
- May be incorporated into future MHLW guidelines for health data anonymization
- Has been implemented in a pilot project with three university hospitals for rare disease research
- Enables more precise analysis while maintaining strong privacy protections
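As a concrete illustration of noise added to a query result, the sketch below applies the standard Laplace mechanism to a counting query. It is a textbook construction, not NICT's implementation; the function name and the choice of epsilon are assumptions for the example.

```python
import random


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query (sensitivity 1).

    Adding or removing one patient changes a count by at most 1, so noise
    drawn from Laplace(0, 1/epsilon) gives epsilon-differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(0)  # seeded only to make the example repeatable
samples = [noisy_count(100, epsilon=1.0, rng=rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
```

A smaller epsilon means a larger noise scale and stronger privacy at the cost of accuracy. Individual answers vary, but the noise has mean zero, so the average over many draws stays close to the true count. In practice each released answer consumes privacy budget, so the same query cannot simply be repeated and averaged.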
Reference:
NICT Research on Healthcare Data Privacy: https://www.nict.go.jp/en/research/research-areas-and-research-centers.html
Japan's Digital Health Strategy (Cabinet Office): https://www.kantei.go.jp/jp/singi/kenkouiryou/suisin/ (Japanese)
AMED Genomic Medicine Program: https://www.amed.go.jp/en/program/list/04/01/genome_medicine.html
Challenges and Ongoing Work
Despite its sophisticated framework, Japan continues to address several challenges:
- Balancing Innovation and Privacy: Finding the right balance between data access for innovation and privacy protection
- Public Trust: Building and maintaining public trust in health data systems
- Technical Complexity: Managing the technical complexity of advanced anonymization techniques
- International Harmonization: Aligning Japanese standards with global frameworks
- Emerging Technologies: Adapting the framework to emerging technologies like AI and genomics
- Implementation Costs: Addressing the costs of implementing sophisticated de-identification systems
How It Compares to HIPAA Safe Harbor
Japan's approach differs from HIPAA Safe Harbor in several key ways:
- Legal Categories: Creates a clear legal distinction between pseudonymized data (still regulated) and anonymized data (less regulated), whereas HIPAA is largely binary (data is either PHI or de-identified, with the limited data set as a narrow intermediate category)
- Dedicated Framework: Provides a specific legal framework (NGMIL) dedicated to medical data sharing for innovation, more specialized than HIPAA
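- Consent Approach: Uses an opt-out rather than opt-in approach for health data sharing for research under NGMIL, while HIPAA generally requires patient authorization for research use unless the data is de-identified or an exception such as an IRB or Privacy Board waiver applies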
- De-identification Approach: Takes a more risk-based approach to de-identification rather than a specific list of identifiers as in HIPAA Safe Harbor
- Intermediary Model: Creates a certified intermediary model for processing health data, not present in HIPAA
- Purpose Emphasis: Places greater emphasis on the purpose and context of data use in determining appropriate de-identification methods
- Risk Assessment: Incorporates privacy impact assessments into the de-identification process more explicitly than HIPAA
- Technical Standards: Provides more specific guidance on technical methods like k-anonymity and l-diversity
- Certification Process: Implements a formal certification process for organizations handling health data
- Transparency Requirements: Has more explicit requirements for transparency about data processing
Practical Comparison Example
For a clinical research project using patient data:
- Under HIPAA Safe Harbor: Remove 18 specific identifiers to create a de-identified dataset that can be used without patient authorization
- Under Japan's Framework, either:
  - Work with a certified medical information provider under NGMIL, who collects data from medical institutions (with patient opt-out option) and anonymizes it according to MHLW standards, or
  - Create anonymously processed information according to APPI and PPC guidelines, with a documented risk assessment and disclosure of data categories being processed
The Japanese approach typically involves more stakeholders and formal processes, but may enable more flexible use of the data while maintaining strong privacy protections.
Reference:
Comparative Analysis of Global Health Data Protection Frameworks (MHLW): https://www.mhlw.go.jp/content/10904750/000923639.pdf (Japanese)
Japan-US Digital Health Cooperation: https://www.mofa.go.jp/files/100064068.pdf