Fred Trotter

Healthcare Data Journalist

FOSS Culture, NPI, Values, VistA

Computer Science should be required for Medical School

Hi,

Currently, in Texas,  one is required to take Physics I and II and Calculus I (or equivalent stats class) to apply for Medical School. That is not all, of course, but they are requirements.

So far, I have never meet a Medical Doctor who needed to use calculus. In fact the only ones that might need to really understand the subject are those who are doing high-level mathematical modeling for Bio-Informatics. For these researchers Calculus is not enough. Your average primary care doctor, *never* uses calculus. They also *never* use Physics! Of course they have all kinds of systems that obey the laws of physics, including blood pressure, syringes etc. But they never treat a patient and think “hmm.. what was the relationship between volume and temperature of a gas….”. They think in higher level abstractions like “When blood pressure is high that could mean X, or Y or Z depending on….”

The real benefit of Physics and Calculus is that they introduce you to new ways of thinking. That new way of thinking makes it easier to understand higher level concepts that you *will* use everyday as a doctor.

I recently had a conversation with a clinician about some work that I am doing on the NPI database which lists every doctor (that prescribes medicine) in the United States. The conversation went like this.

Fred: “I need to munge the data, I need to process the data in a different way than it is listed (a flat CVS file), I need to turn into a real normalized database before I am going to be able to use it effectively”

Clinician: “Wow… Ok… How long will that take you”

Fred: “Well you do not want me to work on this full-time do you? I only have about 5 hours a week I can work on this in my current schedule..”

Clinician: “Yea.. but your other projects are pretty important… given only 5 hours a week, how long would this take?”

Fred: “three months.”

Clinician: “Boy… that’s a long time… I know! Why don’t you just create a database with Texas doctors, instead of all the doctors in the United States! How long will that take?”

Fred: “three months.”

Clinician: “That makes no sense at all. How can that be possible?”

Fred: “Well making the database smaller does not really help me at all, that is the part of the problem that the computer takes care of, not me”

Clincian: “Cmon. A smaller database should mean a shorter time, this seems almost obvious to me. You should be able to do it faster than that. ”

Fred: “Ok.”

Clincian: “Ok, so how long for just a Texas-only database?”

Fred: “three months”

and so on…

Now lets point out… first of all… that this Clinician is not stupid. He was ignorant of how computers, and more importantly of the process of programming computers. This issue is worth discussing in detail because it really illustrates the thought gap between someone who knows how programming works, and someone who does not.

There are many NPI records, specifically in a recent (May 2008) release of the database there were 2,557,650  lines in the comma delimited file as revealed by “wc -l” (subtracting one to account for the first line in the file, which is full of labels… not a real NPI record)

The changes that I need to make to fix the NPI database are pretty complex (fodder for another post) but for now, I will just say that it is a “Complex Reordering of each Record”. Here is how my process for approaching this problem looks:

First I need how to process a single record. So I write a function to do that. For the sake of prose, I will call that function

complexReorderingOfEachRecord( $Record )

I will look at the one Record in the NPI database and then try to pass that Record into the function and see if it does the right thing. The complexReorderingOfEachRecord is a long function, it does lots of really complex things. So complex in fact that I really cannot keep all of its functionality in my head at one time. I use various ways of abstracting the problem so that I can think about the problem in useful chunks, and figure out if each chunk is working.

I am going to actually include some psuedocode in this post.  Psudeocode is code that is not exact enough for a computer to execute, but is clear enough that a human can read it and understand what it does. Programs are like recipies, they are simply exact  instructions that the computer will follow. I will use some basic programming elements in my examples (Note to programmers: this blog is also for clinicians… so you can safely skip this…)

  • Sequential execution – Each line of code is read and executed by the computer before moving down to the next line
  • Variables – a variable is a changing placeholder for information. Each time a program is run, it is possible that the variables will contain different values. I use php, so I mark my variables with dollar signs “$”. This ends of working alot like the “x” in an algebra problem, it can have different values depending…
  • Functions – It is often useful to merge many simple lines of code into a single function. Later you can execute all of the code inside the function by calling the function name and passing data to the function by putting inside the parens “()” after the function. It is basically a way to group useful bits of commands together.
  • If/Else statements – when the computer reaches an IF statment it looks at the contents of the paranthesis “()”beside the if statment. If the contents are “true” then the code inside the braket symbols “{}” following the if statement is run. If the statement in the parens “()” is “false” then the code in brackets “{}” following the ELSE statement is followed.

So the inside of the complexReordingOfEachRecord looks like this

function complexReorderingOfEachRecord( $Record){

reorderingStepOne($Record);

reorderingStepTwo($Record);

reorderingStepThree($Record);

}

(Note to Programmers: I am actually using an OOP design for my project, so in reality these would be function calls on objects, but I want to keep this on a procedural level to make my point)

complexReordingOfEachRecord, reordingStepOne,  reordingStepTwo, reordingStepThree are all functions. The contents of recordingStepOne are not shown, but they are custom functions, meaning that I wrote them. $Record is a variable. There are no IF statements yet.

Ok.. I write this code, test it, debug it about 15 times before it works to import 1 record. But then I run the code on the first 10 records my system blows up! Some NPI records are not for Doctors at all, they are for organizations that provide healthcare: Doh! I need to run the program differently for people vs organizations!

So I modify my function to look like this:

function complexReorderingOfEachRecord( $Record){

reorderingStepOne($Record);

reorderingStepTwo($Record);

$isAPerson = reorderingStepThree($Record);

if($isAPerson){

doOneThing($Record);

}else{

doAnother($Record);

}

}

Now the function has one IF statement that looks in the variable isAPerson and then executes either doOneThing or doAnother based on the contents of $isAPerson.

I have to code, test and debug this another 30 times to get it working. I have to test it more times because the new function calls doOneThing and doAnother do not work without modifications to reorderingStepOne and reorderingStepTwo. I have to switch between thinking about different part of the problem very quickly to make sure it works. To start, everything breaks, but as I discover why, by running the program again and again, I make small changes that eventually make the whole process work correctly. The shorthand for this process is code, test, debug, repeat.

As I am working I start to run the program on the first 100 records. I notice that often the person in the record is not an M.D., there are also dentists and other clinicians who are in the database ! But my work is focused only on M.Ds. So I modify the code again:

function complexReorderingOfEachRecord( $Record){

reorderingStepOne($Record);

reorderingStepTwo($Record);

$isAPerson = reorderingStepThree($Record);

if($isAPerson){

$isAnMD = doOneThing($Record);

if($isAnMD){

processMD($Record);

}else{

processNonMD($Record);

}

}else{

doAnother($Record);

}

}

Now I have a “nested” IF statement, an IF statement that exists in another IF statement.

As before all of the other functions must be modified to make my two new functions processMD and processNonMD work correctly. This requires 50 repetitions of code, test and debug. Sometimes one code, test and debug cycle takes 30 seconds. Usually it takes about 5 minutes. Sometimes it takes as much as 15 hours.

Now I am testing against 1000 rows of the NPI database, and it works perfectly! I have put in about 40-50 hours (or about 3 months at 5 hours a week)

But now what! I have only imported only 1000 rows of the database. Now I will explain how I ran the code on one row, 100 rows and then 1000 rows. I will introduce the WHILE statement to my simple psuedo code.

$i = 1

while($i < 1000){

$Record = getANewRecord()

complexReorderingOfEachRecord($Record);

$i = $i + 1

}

The “while”is just like an if statement, except that when the contents of the curly brackets “{}” are done, then the contents in the parens “()” are re-evaluated. If they are still true, then the contents of the “{}” are run again. The $i variable starts at 1, and then grows by one every time the contents of the curly braces are run “{}”

So how do I import the whole NPI database? I change to code to look like this:

$i = 1

while($i < 2557650){

$Record = getANewRecord()

complexReorderingOfEachRecord($Record);

$i = $i + 1

}

Then I start the program and go to sleep. In the morning, all 2,557,650 records are correctly processed.

Once I had done the work to determine “How to change an NPI record” the computer simply repeated that process for as long as I wanted. Computers are so fast now, that even very very complex processes can be repeated very quickly.

You see *I* never import any data. The computer does that part. *I* the programmer tell it *how* to import that data. Like doctors, when programmers have a simple concept with big implications, we create an important sounding word for it. The important word for *how* to do an information task is “algorithm“.

If you get an algorithm right, computers do just what you want. If you get the algorithm wrong… computers do other things. If you get the algorithm badly wrong… God help you.  This is why computers often seem to have a “mind of their own”; when programmers tell them to do the right thing, they do exactly that. When programmers tell the computer to do the wrong thing… they do exactly that.

Any programmer reading this is likely going blind with boredom. But someone who has not programmed might likely be asking “Wait… what’s a function?” This is actually a pretty terrible introduction to programming. For something more real, I suggest you start here.

My point is this, computers make some types of tasks really easy. Getting to them to do those tasks, without making a serious mistake, is pretty difficult and time-consuming work. If you, as a clinician, do not understand what tasks are hard, and what tasks are easy, then it is almost impossible to evaluate the software you are using. I cannot tell how many times a clinician has requested a “simple change” that has taken me three weeks of programming. On the other hand, I cannot tell you how many times I have seen clinicians (or more often clinicians staff members) subject them selves to terrible software designs that would be trivial to fix.

To create an algorithm you need to understand two things:

  • What the computer can do
  • What the computer should do

There are some people, like my friend Ignacio Valdes, who have been extensively trained in Computer Science and Medicine. These people are amazing, because you can watch them switching back and forth between one part of healthcare IT (Clinical know-how) and the other (Computer Science know-how). But even these few gems (rare as hens teeth), cannot actually hold the complexity of even a single clinical IT problem in their head at one time. That is just not the way that programming, clinical care or anything truly complex works! Programmers must ignore parts of a program to improve on any given part. Clinicians must ignore parts of a patients body to address a problem with one part. (Most heart surgeons, for instance, remain unconcerned about the flaky skin problem while their patients are in open heart surgery.) Knowing what to ignore, and what deserves attention is often the true test of expertise.

The only way to deal with Healthcare IT is to create teams of people to manage the complexity together. The problem with that is that for any given problem domain, there is a danger that the communication cost will grow exponentially in relation to the number of participants. It is common for the communication costs to totally destroy all productivity in a given group. But at the same time, it is simply not possible for a single person to correctly navigate the complexity of even a simple Health IT software project.

The solution to this problem is found in the VA VistA development model. Here are the rules:

  1. You do not work on “the system”, you work on part of the system. VistA is actually hundreds of programs that work together.
  2. Whenever possible you work in pairs. Any more gets unmanageable.
  3. One person must understand everything they need to about the programming of the clinical issue. We can call this person the Programmer. (In the VA this is a Programmer or a CAC)
  4. It helps if the Programmer has a basic healthcare vocabulary.
  5. Another person must understand everything they need to about the clinical problem itself. We can call this person the Clinician.
  6. It helps if the Clinician understands, basically, what is easy, what is hard, what is possible and what is impossible with computers.
  7. You rely on other pairs to address other clinical problems.
  8. You intentionally have redundant “programming pairs” so that you are forced to compete to make better solutions.
  9. When another pair makes a better solution to your problem, you celebrate that and adopt their code as the new starting point.

Its number 6 that this article is focused on. It would be really helpful if Physicians in particular were required to know what a “for loop” meant. Just like calculus and physics they will rarely, if ever, use that information. But for the time being, the fundamental lack of understanding of computer science in clinicians is holding healthcare back. Can you imagine speaking to your doctor if he or she had no idea what the word “pressure” meant in the phrase “blood pressure”. As it stands, most doctors do not really understand what the implications of the word “Information” in the phrase “Health Information Systems”.

What scares the hell out of me is not that the clinician above did not know how the programming process worked. Ignorance has a simple cure: learning. What scares me is that he was willing to pressure me to speed up the schedule, even after I explained how things worked. Trying to force a programmer to take short-cuts to make a deadline is a very very bad idea (see point number 4 here). Doctors, like military officers, often fail to recognize that in “being in charge” is contextual. It does not matter if a Doctor is right about a clinical issue, if they are wrong about a software design issue. The resulting software will fail to perform, despite its clinical correctness. Doctors cannot “be in charge” in software design the way they can in an operating room or in clinical practice. That does not mean they are not vital, it just means they should not be in charge. The programmers should not be “in charge” either. The “Clinical Pair Programming” that I am describing above is a description of the peer thinking that is required to solve these problems. When someone is “the boss”, (meaning they actively back only their own priorities) the system breaks.

The irony is that the few Doctors I know who are my peers with regards to computer science education, are often more hesitant to challenge me regarding my information systems opinions. Do not get me wrong; they often disagree with me, but not more than any programmer would disagree with any other programmer.

This is why I support an undergraduate computer science prerequisite for medical school.

-FT