SXSW Action - Sample Data

We've posted some data to illustrate the process of fitting model parameters to data.

Sample Data File

You can get the sample file "data.txt" here.

The file is a tab-delimited table containing:

  • An iteration_id in the first column.
  • A lossfunction value (a measure of goodness of fit) in the second column.
  • Parameter values in the remaining columns. (In this case, 24 parameters are required to parameterize the model.)
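
As a rough illustration of the layout described above, here is a minimal Python sketch that parses tab-delimited text into (iteration_id, loss, parameters) rows. It assumes that iteration_id is an integer and that any non-numeric line (such as a possible header row) can be skipped; neither assumption is confirmed for "data.txt". The function name parse_rows and the inline sample values are made up for the example and are not taken from the real file.

```python
import csv
import io


def parse_rows(text):
    """Parse tab-delimited text into (iteration_id, loss, params) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        try:
            iteration_id = int(fields[0])             # column 1 (assumed integer)
            loss = float(fields[1])                   # column 2: loss function value
            params = [float(v) for v in fields[2:]]   # remaining columns: parameters
        except (ValueError, IndexError):
            # Assumption: a line that does not parse as numbers is a header; skip it.
            continue
        rows.append((iteration_id, loss, params))
    return rows


# Made-up sample with two parameter columns (the real file has 24).
sample = (
    "iteration_id\tlossfunction\tparameter_3\tparameter_4\n"
    "1\t10.5\t0.1\t0.2\n"
    "2\t8.25\t0.3\t0.4\n"
)
rows = parse_rows(sample)
```

To try this on the real file, read it with open("data.txt").read() and pass the text to parse_rows.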

The data come from an optimization run that uses a genetic algorithm to guide the iterative sampling of new parameter proposals.


Goal

The goal is to minimize the loss function.
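
A minimal sketch of what this means for the sample data: pick the row with the smallest loss, and track the best loss seen so far to see how quickly the run improves. The helper names best_row and running_minimum are hypothetical, rows are assumed to be (iteration_id, loss, params) tuples, and the numbers are made up for illustration.

```python
def best_row(rows):
    """Return the (iteration_id, loss, params) row with the smallest loss."""
    return min(rows, key=lambda row: row[1])


def running_minimum(losses):
    """Return the best (lowest) loss seen up to and including each iteration."""
    best = float("inf")
    result = []
    for loss in losses:
        best = min(best, loss)
        result.append(best)
    return result


# Made-up rows in sampling order; the real file has 24 parameters per row.
rows = [
    (1, 10.5, [0.1, 0.2]),
    (2, 8.25, [0.3, 0.4]),
    (3, 9.0, [0.5, 0.6]),
]
best = best_row(rows)
trace = running_minimum([row[1] for row in rows])
```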


Your Input

  • If you have any thoughts about how to work with this data, please add them to the "Stats Methods" page. (Be sure to take a look first at the general overview.)
  • Editing and adding pages to this documentation is easy; here's how.
    • For example, if you have a very detailed suggestion, please feel free to create a page for it and link to it from a quick overview in "Stats Methods".
  • Please also join the discussion.

Q&A About the Sample Files

Q: Do the iteration numbers accurately reflect the order of runs by the evolutionary algorithm? In many cases (like parameter_5) the choices seem to be getting more widely spread over time, rather than narrowing in on an ideal value. Or is this just a hint that this parameter is not particularly important?

A: Yes, these ids are ordered by sampling order. I don't have a good answer to the second question, except to say that the current algorithm does not vary the sampling range of parameter values as a function of iteration number.
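
One way to check whether a parameter's sampled values widen or narrow over the run is to compare its spread across consecutive blocks of iterations. The sketch below is only an illustration: the function name spread_by_window, the window size, and the sample values are all made up, and the population standard deviation is just one possible measure of spread.

```python
import statistics


def spread_by_window(values, window):
    """Population standard deviation of each consecutive, non-overlapping window.

    `values` must be in sampling order; a trailing partial window is ignored.
    """
    return [
        statistics.pstdev(values[start:start + window])
        for start in range(0, len(values) - window + 1, window)
    ]


# Made-up values for one parameter, in sampling order.
values = [1.0, 1.0, 1.0, 1.0, 0.0, 2.0, 0.0, 2.0]
spreads = spread_by_window(values, window=4)
```

A sequence of spreads that grows from window to window would match the widening described in the question; a roughly flat sequence would be consistent with a sampling range that does not change with iteration number.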

Q: I haven't read the linked papers - do they describe the particular model being evaluated here? It might be helpful to have some idea of how the parameters interact in the model. The data set has parameters numbered 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 24, 25, 26, 27, 28, 29, and 30. What's going on? Is this purely a strange numbering, or did someone already decide that parameters 1 and 2, for example, were no good?

A: This publication gives an overview of the baseline model, and Table 1 lists the parameters to be estimated. I can provide a partial mapping of the numbers in the dataset to these parameters if that would be helpful. (Some new parameters have been added in the model variants we're currently fitting, and others are not relevant for the new models.)


More on Malaria Models

(See links under "about the project" here.)