Following up on MELLODDY and the accuracy and privacy frontier
Also: I'm resuming writing with an expanded focus!
Sorry for not writing for a bit over a month. There was one week where I found myself too busy to write and I let my habit slip after that.
I’m excited to start writing regularly again and I’ll be doing so with an expanded focus. Instead of only focusing on blockchain and healthcare, I’ll be writing about healthcare and technology more broadly. To be clear, that includes blockchain and healthcare but I won’t limit what I write about to just that. Staying within the limits of only blockchain and healthcare was a bit of a struggle for me - I’m simply a very curious person with a lot I want to write about - and this was a change a few readers had requested as well.
An expanded focus also reflects my personal progression in writing. I started this newsletter 85 issues ago with a plan to simply share links to the ~5 articles you would need to read each week to stay up to date on blockchain and healthcare. Over time I began to share more of my thoughts on each week’s happenings and that gave way to writing essays. And now I think the natural next step for my newsletter is to expand the scope of things I write about generally.
As before I plan to send out a regular newsletter and occasional essays. Feedback is always appreciated and feel free to drop an email to say hi as well.
Now, on to this week’s edition.
This week’s edition
Last newsletter I wrote about the tradeoff between accuracy and privacy in federated learning networks. There are two key ideas to recall. The first is that as you fit an algorithm closer and closer to a dataset, the algorithm becomes more accurate, but the risk also grows that someone could reverse engineer it and recover some of your sensitive data. This forms a privacy-accuracy frontier, represented by the purple line above. The second is that participants in federated networks will demand a minimum level of privacy - the green line above - and proving that you can meet this minimum is very difficult.
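One way to make the shape of that frontier concrete is differential privacy, where a privacy parameter (epsilon) controls how much random noise is added to a released result. The sketch below is only an illustrative stand-in: it uses the Laplace mechanism on a simple mean rather than a trained model, it is not something MELLODDY is known to use, and the function names and toy dataset are assumptions made up for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two independent Exponential(1) draws is Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_mean(values, epsilon, lower, upper, rng):
    """Release the mean of `values` using the Laplace mechanism (hypothetical helper)."""
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    # Replacing one record can move the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


def average_error(values, epsilon, trials=2000, seed=0):
    """Average absolute error of the private mean over many noisy releases."""
    rng = random.Random(seed)
    true_mean = sum(values) / len(values)
    total = 0.0
    for _ in range(trials):
        total += abs(private_mean(values, epsilon, 0.0, 1.0, rng) - true_mean)
    return total / trials


if __name__ == "__main__":
    # Toy dataset: 100 values spread evenly between 0 and 1 (an assumption).
    data = [i / 99 for i in range(100)]
    for epsilon in (0.1, 1.0, 10.0):
        # Smaller epsilon means stronger privacy and a larger average error.
        print(epsilon, average_error(data, epsilon))
```

Running it shows the error shrinking as epsilon grows, which is the same direction of travel as moving along the frontier: weaker privacy guarantees buy more accuracy.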
When I wrote that newsletter, MELLODDY had just launched the first real training run in their federated learning network. A little under two months later, they have announced that the run was successful. From the press release:
We now have an operational platform, rigorously vetted by the consortium’s 10 pharmaceutical partners, found to be secure to host their data – an enormous accomplishment. Over the next year we’ll turn our focus on studying the hypothesis that multi-partnered modeling will yield superior predictive models for drug discovery.
The rigorous vetting included “extensive and rigorous security audits by an external company and by the IT teams of each pharmaceutical partner to ensure data privacy and protection.” In other words, they were ensuring that the platform can meet the minimum level of privacy shown by the green line above.
With this milestone achieved MELLODDY’s focus can now turn towards applying their platform and improving the accuracy of the models it generates. Or: they are focusing on moving from “A” to “B.”
It is also worth briefly noting what a competitive advantage MELLODDY has now. MELLODDY’s technology has passed audits by 10 pharma companies and a third party. That is a very high bar for competitors to reach and a compelling value proposition for future users.
if we are to have scientific revolutions, we must have scientific revolutionaries. the historical course of science has been punctuated by the concentrated works of small groups and individuals. in our era, science operates at a dehumanizing scale, resulting in a metrics-driven mechanization of inquiry from which nothing original can ever hope to emerge, yet consuming enormous resources to do so. scientific work must be decoupled from the external forces introduced by dehumanizing scale, and the work must be driven foremost by the norms, tastes, relationships and talents of small groups.
A thought-provoking essay on the effects of “science at scale” - the author’s term for the paradigm of today’s scientific establishment. The author argues for shifting the production of scientific work to a smaller scale, thus rehumanizing and reinvigorating science and, aspirationally, enabling more people to pursue disjunctive ideas.
It should be possible to realize the author’s suggestions today, at least on a small scale. There are many influential people in and adjacent to Silicon Valley (e.g. Patrick Collison and Tyler Cowen) who are willing to support experiments in new ways of funding science. One such experiment has been the Fast Grants program. What is needed, then, are groups of talented and dedicated individuals who are willing to break from “science at scale” to pursue different organizational models.
A little over a year ago Nebula announced an offering that lets you get your genome sequenced anonymously. A few weeks later, a NYTimes article on Oasis Labs, which creates privacy-preserving technology, highlighted a pilot they were conducting with Nebula. I described the tech and pilot in an issue at the time:
Oasis is building tools to let people control their data by leveraging blockchains and special hardware called trusted execution environments (“TEEs”). In short, TEEs provide an isolated area to store sensitive data and run software on it, and this area is both private and resistant to tampering even if you have access to the physical hardware. Your mobile phone likely uses a TEE to store your biometrics or financial data.
Oasis hopes to marry the confidentiality and integrity of TEEs with smart contracts to give people real “control” over their data. The article highlights a pilot being done with Nebula Genomics, a blockchain and genomics company, where users will retain control over their genomic data while still enabling Nebula to run specific analysis on their data without revealing it to Nebula.
Now, a year later, the two parties are moving out of the pilot and offering a public beta. Specifically, Nebula’s customers can control access to their data and track how it is being used with a blockchain-enabled audit log, while trusted execution environments are used to run privacy-preserving analysis.
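To give a rough feel for why a blockchain-style audit log is useful here, below is a minimal sketch of a hash-chained log, where each entry's hash depends on the entry before it. This is not Oasis's or Nebula's implementation: the `AuditLog` class, the record fields, and the actor names are all hypothetical, and a real system would add signatures, consensus, and the TEE itself.

```python
import hashlib
import json


def entry_hash(prev_hash, record):
    """Hash a record together with the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which every entry commits to all earlier entries."""

    GENESIS = "0" * 64  # placeholder hash used before the first entry

    def __init__(self):
        self.entries = []  # list of (record, hash) pairs

    def append(self, record):
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        digest = entry_hash(prev, record)
        self.entries.append((record, digest))
        return digest

    def verify(self):
        """Recompute the chain and report whether every stored hash still matches."""
        prev = self.GENESIS
        for record, digest in self.entries:
            if entry_hash(prev, record) != digest:
                return False
            prev = digest
        return True


if __name__ == "__main__":
    log = AuditLog()
    # Hypothetical access events for one customer's genomic data.
    log.append({"actor": "researcher-1", "action": "run_analysis"})
    log.append({"actor": "customer", "action": "revoke_access"})
    print(log.verify())
```

Because each hash covers the previous one, quietly editing an old access record breaks verification for that entry, which is what lets a customer trust the history of how their data was used.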
This weekend’s musings
iDASH is an NIH-funded yearly workshop/challenge on applying various privacy-preserving techniques to health data. This year I’m leading a team to take up the third challenge, which is to create a differentially private federated learning network for genomic data. One place we could use help is in developing ways of drastically reducing the dimensionality of gene expression data. If you have any insights into this, please reach out!
Books I’m reading
Podcasts I’m listening to
I have found myself reading fewer articles and more books these days, so I will probably share more content like this in the future.
Enjoy the week everyone.