
If you spend much time in the patient community, you meet someone who has been badly burned by the “out of network” game that insurance companies play with/against healthcare providers.

It’s simple. You get insurance plan A from company Z. Then you go to a specialist or get a scan or something and you ask, “Do you take company Z insurance?” They say “sure”. You hand them the insurance card. What they don’t tell you is that they will be billing “out of network”, which means the visit will hardly be covered at all.

You go to the insurance company, they point to the provider. You go to the provider, they point to the insurance company. Who is left with the huge bill? The patient.

Sometimes this gets really bad. In the worst cases, important treatments to relieve suffering are delayed.

Are you tired of this? In order to fix this, we need to be able to build systems that tell us for sure which providers are in a given plan at a given time. We need to have that system available when we purchase our health insurance so that we can buy insurance that covers the doctors that we already use, or the ones that we want to use. We can imagine a theoretical tool called JustShowMeTheDoctorNetwork.com that solves this problem in a user friendly way.

There are lots of companies and journalists in the DocGraph community that would love to be able to build such a tool. DocGraph would love to provide the data for such a tool but right now that would require that we scrape the websites of every insurance company provider directory in the country. Those websites are really unfriendly to such efforts. The following text was taken from the user agreement of the doctor finder tool for Aetna:

By using DocFind, you acknowledge and agree that DocFind and all of the data contained in DocFind belongs exclusively to Aetna Inc. and is protected by copyright and other law. DocFind is provided solely for the personal, non-commercial use of current and prospective Aetna members and providers. Use of any robot, spider or other intelligent agent to copy content from DocFind, extract any portion of it or otherwise cause DocFind to be burdened with unwarranted high access or transaction activity is strictly prohibited. Aetna reserves all rights to take appropriate civil, criminal or injunctive action to enforce these terms of use. 

Provider information contained in this directory is updated 6 days per week, excluding holidays, Sundays, or interruptions due to system maintenance, upgrades or unplanned outages. This information is subject to change at any time. Therefore please check with the provider before scheduling your appointment or receiving services to confirm he or she is participating in Aetna’s network. Participating physicians, hospitals and other health care providers are independent contractors and are neither agents nor employees of Aetna. The availability of any particular provider cannot be guaranteed, and provider network composition is subject to change. Notice of the change shall be provided in accordance with applicable state law.

The underlines are mine.

First, Aetna does not want anyone scraping their website. They do not want people like DocGraph to create these data sets. They view their list of providers as a protected information asset that only they can leverage.

But more importantly, they put the responsibility for “who is in what plan” squarely on the doctors. Which really means the patients, because the doctors’ websites will just say “check the insurance company website”. See what I mean about finger pointing?

Insurance companies and healthcare providers need to be held accountable for their in vs out status. The only way to do this is to create an open data set that maps Plans to Providers so that projects like JustShowMeTheDoctorNetwork.com are really easy to build.

The policy wonks at HHS/CMS/ONC et al get this. They have recently added the following text to the rules for the 2016 insurance plans.

…we propose that a QHP issuer must publish an up-to-date, accurate, and complete provider directory, including information on which providers are accepting new patients, the provider’s location, contact information, specialty, medical group, and any institutional affiliations, in a manner that is easily accessible to plan enrollees, prospective enrollees, the State, the Exchange, HHS and OPM. As part of this requirement, we propose that a QHP issuer must update the directory information at least once a month, and that a provider directory will be considered easily accessible when the general public is able to view all of the current providers for a plan on the plan’s public Web site through a clearly identifiable link or tab without having to create or access an account or enter a policy number….(blah blah)…We also are considering requiring issuers to make this information publicly available on their Web sites in a machine-readable file and format specified by HHS.

underlines are mine…

This would solve the problem. Anyone who wanted to create a website that showed what plans any given provider accepted would be able to do so easily.
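To make that concrete, here is a minimal sketch in Python of what a site could do with such a file. HHS has not specified a format, so everything here is an assumption for illustration: the JSON layout, the field names (`plan_id`, `npi`, `effective`, `terminated`), and the sample values are all hypothetical.

```python
import json
from datetime import date

# Hypothetical machine-readable directory: one row per plan/provider pairing.
# The layout and field names are assumptions, not a format specified by HHS.
SAMPLE = """
[
  {"plan_id": "PLAN-A", "npi": "1234567890",
   "effective": "2015-01-01", "terminated": null},
  {"plan_id": "PLAN-A", "npi": "1098765432",
   "effective": "2015-01-01", "terminated": "2015-06-30"}
]
"""


def load_directory(text):
    """Parse the JSON text and convert the date strings to date objects."""
    rows = json.loads(text)
    for row in rows:
        row["effective"] = date.fromisoformat(row["effective"])
        if row["terminated"] is not None:
            row["terminated"] = date.fromisoformat(row["terminated"])
    return rows


def is_active(row, on):
    """True if the pairing is in force on the given date (inclusive)."""
    started = row["effective"] <= on
    not_ended = row["terminated"] is None or on <= row["terminated"]
    return started and not_ended


def in_network(rows, plan_id, npi, on):
    """Is this provider in this plan on this date?"""
    return any(
        row["plan_id"] == plan_id and row["npi"] == npi and is_active(row, on)
        for row in rows
    )


def plans_for_provider(rows, npi, on):
    """All plans that list this provider on the given date."""
    return sorted(
        {row["plan_id"] for row in rows if row["npi"] == npi and is_active(row, on)}
    )
```

With data like this, the question “is my doctor in this plan on the day of my appointment?” becomes a single call such as `in_network(rows, "PLAN-A", "1234567890", date(2015, 3, 1))`, rather than a round of phone calls.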

But the key word here is “propose”. Insurance companies in this country benefit greatly from the confusion about in network and out of network, and so do some unethical healthcare providers. There will be lots of people who oppose this proposal.

I hope that I have made the case that this information needs to be open and machine readable. If you’re convinced, then you can find the comment page to support this policy here. If you disagree with us, and you still want to submit a comment, you can use this page.

Please take a few moments and write in to support this policy change. The comments are due Dec 22nd 2014 which is basically tomorrow.

If you would like to read the in-progress comments from the DocGraph Journal you can go here. Feel free to cut and paste from our comments into your own comments, we would be flattered.

Feel free to tell them that I sent you ;)

-Fred Trotter



The DocGraph Summit is just around the corner!

This “unconference” will include short presentations on current projects of the participants, and discussions on the topics, challenges, and ideas deemed most relevant and paramount to the open health data community. Our goal is to set an atmosphere conducive to in-depth dialogue, concept mapping, networking, and brainstorming.

The Summit will also review DocGraph’s open healthcare data initiatives. These projects include food, medical, doctor, and hospital data, as well as other fun topics that are not easily categorized.

Currently we have academics, corporate delegates, researchers and entrepreneurs attending the Summit. Their areas of focus include data analytics, open source drug databases, EHRs, gene/drug interactions, VistA, Health IT, ACOs, statistics, etc. Attendees are coming from Rice, Stony Brook, UTHSC, e-mds, PwC, Baylor Medicine, the DocGraph community, and more.

Join us!

Eventbrite - The DocGraph Summit

Email atrotter@docgraph.org for university student and faculty discount codes.

The DocGraph Summit is being held alongside the International Conference on Biomedical Ontology (ICBO) 14.


DocGraph Omni was a website that we used to display a merged set of the open data that is available on healthcare providers.

It was a good idea, but it did not work. Or at least, it is used so infrequently that it is not worth the resources that DocGraph is spending on it. Omni was interesting only to the degree that it could serve as a crowdsourcing mechanism for even more awesome open data about doctors and hospitals. Omni is just not doing its job as a crowdsourcing tool.

More importantly, two of our informal journalism partners, ProPublica and US News, have both begun offering more popular consumer-facing systems using our data.

We would rather invest in doing a better job providing US News and ProPublica with data than offer a clearly inferior consumer-facing product ourselves. We will do our best to ensure that both ProPublica and US News at least have the option of replicating all of the functionality of DocGraph Omni.

More importantly, CareSet Systems, the sister company to DocGraph which focuses on healthcare system analytics, is offering a commercial product called Patch that does far more than Omni ever did. But Patch functionality is focused on the needs of healthcare organizations, like Hospitals, ACOs, SNFs and LTACs.

We have decided to retire Omni, and invest in our relationships with other data journalists and in the CareSet Patch service. We will leave the Omni server up for the next few days, but expect that site to forward to DocGraph.org soon.

-Fred Trotter


The DocGraph Journal creates multiple, unprecedented datasets to improve healthcare. It is focused on building an open community of data scientists primed to share analysis of the torrential amount of new healthcare data posted by federal and state governments. The DocGraph Journal serves as an interface between government (HHS and CMS), journalism organizations (O’Reilly Media, US News, ProPublica), academics, and entrepreneurs. The journal is supported by research grants from Merck, athenahealth, and the Robert Wood Johnson Foundation.

The 2014 Summit will review DocGraph’s open data healthcare projects. These projects include food, medical, doctor, and hospital data, as well as other fun topics that are not easily categorized.

Join fellow health data enthusiasts for an engaging day of unconference-style discussions and presentations, as well as meals and happy hour within walking distance of the venue.

Date: October 8, 2014

Location: Houston Technology Center

Eventbrite - The DocGraph Summit

The DocGraph Summit is being held alongside the International Conference on Biomedical Ontology (ICBO) 14.

ICBO 14 runs Oct 6-9 and we are encouraging DocGraph Summit participants to attend the first two days of ICBO (Oct 6, 7), which will feature workshops discussing the options for an Open Source Medication Database.


Today, word came out that NY released taxi data that has been entirely reidentified.

The technique and concepts to conduct the attack can be found here, and I also found the slashdot discussion interesting.

The result is that the identities and paths of specific named taxi cabs are now public information. This is not entirely bad, since now the data set will be extensively used to detect specific bad actors. Still, it was more than the NY government intended and will probably result in a lawsuit.

That lawsuit will be mostly justified, since security professionals understand well how to do de-identification right, and the rules were not followed. If you are doing this with health data, I can recommend fellow O’Reilly author Khaled El Emam, who wrote both Anonymizing Health Data and Guide to the De-Identification of Personal Health Information. You can hire him through Privacy Analytics. He is the de-identification expert that I know best and can endorse, but he is far from the only one.

Generally, hashing can be a reasonable approach as long as salts are used in combination with a secure hash algorithm. I prefer to use a different salt for every id, which makes a rainbow table attack (like this one) pretty hard to do.

More importantly, it is also entirely appropriate to simply use a randomly generated number instead of a hash. Hashes are convenient when you need to rely on a dynamic and extensible process, rather than static data. Hashing also allows you to throw away the original data, and know that you can reliably repeat the process given new data. That is why it is used so frequently in password storage.
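For readers who want to see the difference, here is a minimal sketch in Python of the three approaches. It is an illustration under my own assumptions, not the code used in the taxi release or in the attack: the class and function names are made up, and the four-digit identifiers are a hypothetical stand-in for any small, guessable id space. Note that with a per-id salt, the salts have to be kept private (or destroyed), and if you keep them you are effectively maintaining a secret lookup table, which is why a plain random number works just as well.

```python
import hashlib
import secrets


def unsalted_md5(identifier: str) -> str:
    """The weak approach: a bare hash of a guessable identifier."""
    return hashlib.md5(identifier.encode("utf-8")).hexdigest()


def build_lookup_table(candidates):
    """Attacker's side: precompute hash -> identifier for every candidate."""
    return {unsalted_md5(c): c for c in candidates}


class PerIdSaltedHasher:
    """Hash each identifier with its own random salt, which is never published."""

    def __init__(self):
        self._salts = {}  # identifier -> salt; keep private or destroy

    def pseudonym(self, identifier: str) -> str:
        if identifier not in self._salts:
            self._salts[identifier] = secrets.token_bytes(16)
        salt = self._salts[identifier]
        return hashlib.sha256(salt + identifier.encode("utf-8")).hexdigest()


class RandomPseudonymizer:
    """Skip hashing entirely: hand out a random token per identifier."""

    def __init__(self):
        self._tokens = {}  # identifier -> token; keep private or destroy

    def pseudonym(self, identifier: str) -> str:
        if identifier not in self._tokens:
            self._tokens[identifier] = secrets.token_hex(16)
        return self._tokens[identifier]


# Hypothetical id space: every four-digit identifier from 0000 to 9999.
CANDIDATES = [f"{n:04d}" for n in range(10000)]
```

Calling `build_lookup_table(CANDIDATES)` takes a fraction of a second, and after that any unsalted hash of a four-digit id can be reversed with one dictionary lookup. The other two classes give the same id the same pseudonym every time within a release, but an outsider without the private mapping has nothing to precompute against.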

Unfortunately, this will have a chilling effect on open data releases, but I am glad it happened. This is a relatively unimportant data set. Which is to say, this could have been much worse. This could have happened with patient data. I work with stuff like HIV and TB infection data, as well as EHR notes containing infidelities etc. I hate to say it, but it’s better for governments to learn on taxi cabs.

Lastly, I would encourage those who are considering doing data releases like this to reach out to organizations like ProPublica and/or DocGraph. If you cannot afford to hire Khaled, we can at least help to ensure that you avoid the basic mistakes. Believe it or not, data journalists like myself are not interested in violating legitimate privacy rights (although we can have a healthy debate around the word “legitimate”) and we would be more than happy to help ensure that a data release is free from reidentification drama.

Part of me wonders why they didn’t just release the taxi data with the taxi numbers intact. I strongly prefer real-name accountability in data sets like this. It might be because by learning the identity of the taxi, you might be able to infer the identity of the passenger, who has a legitimate privacy concern.

Accidents like this will happen, and NY was right to make a release rather than hold back a release because there “might” be a way to reidentify a data set. My hat is off again to NY state/city… innovators in open data.

-Fred Trotter


The DocGraph Alliance is a new group of organizations committed to supporting data journalism and data science community efforts. Three global leaders in healthcare, athenahealth, CareSet, and Merck (known as MSD outside the United States and Canada), have signed on as founding members of the Alliance.

The DocGraph Alliance’s community mission is to encourage an ecosystem of innovators to collaborate and share tools and research methodologies around open healthcare datasets. This Alliance will help further develop technical analysis and methods around data released by federal, county, and state entities, as well as those originated by the community.

The DocGraph Alliance is a project of The DocGraph Journal, which shares data with a community of quantitatively minded professionals who mine publicly available clinical datasets to uncover interesting and meaningful insights. Support from the Alliance members means the DocGraph Journal can continue providing support for the growing community of data scientists focused on leveraging initiatives of transparency in healthcare. As a result of the community’s work, specific news coverage has incorporated DocGraph data, including work from US News, ProPublica and the Kansas City Star.

“The DocGraph project created a platform for data scientists to collaborate openly on publicly available health data sources where nothing existed before”, said James Ciriello, Associate Vice President of Merck IT Strategy and Innovation, “and as we watched this community become more and more active in trying to address significant problems, we wanted to support it and help it grow. As publicly available healthcare data continues to grow at a fast pace, coordination and comparatives of care become commonplace, and insights on therapy start to drive novel innovation.”

“We are thrilled to partner with the DocGraph Alliance. Fred Trotter in particular has taken on ambitious and important work to socialize open data assets in healthcare and to leverage data in meaningful ways to advance the industry,” said Todd Rothenhaus, chief medical officer, athenahealth, Inc. “At athenahealth, we believe healthcare could benefit from more data openness and transparency. Access to expanded and new types of data through the DocGraph Alliance will support our work to improve our cloud-based services and further innovate based on evidence-based insights and industry trends.”

“Our business, as well as countless others, rely on the availability of Open Healthcare datasets. Our healthcare system modeling tools improve with every Open Data release,” said Ashish Patel, founder of CareSet Systems. “We want to ensure that DocGraph continues to flourish! The healthcare system needs a cadence of Open Data in order to effectively pursue the Triple Aim.”

DocGraph will work to grow and nurture an open community of data professionals through a series of trainings and events with a focus on further use of open health datasets and development of new methods and tools to analyze those datasets.

About The DocGraph Journal

The DocGraph Journal seeks to create and disseminate new open healthcare data sets, and to foster a community of data scientists who contribute tools and expertise to the analyses of open healthcare data. The Journal was founded after Fred Trotter’s crowdfunding of the first DocGraph data set demonstrated a demand for open healthcare data. The original data set, created from a FOIA request, showed how physicians and other healthcare providers collaborate to deliver care to Medicare patients. This original DocGraph data set remains the largest real-name social graph available to the public.


DocGraph Journal

Alma Trotter, atrotter@docgraph.org