Unsupervised Thoughts

Deep Learning - Week 7

Written on

Week 7 Learning Material

Structuring Machine Learning Projects - Week 2

This week covered the following topics:

  • Carrying Out Error Analysis
  • Cleaning Up Incorrectly Labeled Data
  • Build your First System Quickly, then Iterate
  • Training and Testing on Different Distributions
  • Bias and Variance with Mismatched Data Distributions
  • Addressing Data Mismatch
  • Transfer Learning
  • Multi-task Learning
  • What is End-to-end Deep Learning?
  • Whether to use End-to-end Deep Learning

Applied Techniques

I had a discussion with a physician at work who is working on taking unstructured data from clinical notes from patient records, uploading to Microsoft Fabric and testing the Microsoft models to convert the data to the OMOP Common Data Model (CDM). There's an opportunity to assist in this work so I started focusing on methods to convert unstructured clinical data to the OMOP CDM.

I don't have any actual clinical notes to work with so I started looking for data sets when I stumbled across Synthea that allows you to generate "synthetic, realistic (but not real), patient data and associated health records in a variety of formats". Plan so far is to:

  • Generate simple synthetic patient data including unstructured clinical notes
  • Test different prompts to convert this data to OMOP CDM
  • Test different models, both local and over API
  • Measure accuracy of prompt/model combinations

This week I got as far as generating a few hundred patient records with notes, setting up Mistral-7B-Instruct-v0.2 and starting to test.