## Error message

Deprecated function: Array and string offset access syntax with curly braces is deprecated in include_once() (line 20 of /mnt/web104/a0/07/510451407/htdocs/includes/file.phar.inc).

# Games and genetics

From theory to application
Author/s:
Roberto Lucchetti
Issue number:
6 - March 2015
Series:
From Theory to Application
Year:
2015
It is well known that some diseases have a genetic origin, often originated by a group of genes. Differently for the case of the diseases originated by a single gene, it is more difficult to have a clear picture of the group of genes that are crucial for the onset of these diseases. Even though statistics is quite useful in this field, it is not fully satisfactory, since in general one faces a huge number of objects to deal with (the genes), and few samples to analyze (the patients involved in a single experiment). Thus other tools, enforcing the statistical intuitions, can be of some help. In particular, in recent years some game theoretical models have been proposed, with the aim of identifying genes with an important role in the onset of a disease. Here I briefly describe one of them.

The basic idea is to use cooperative game theory, in particular simple games. These are games where the coalitions are partitioned into two groups, the group of the winning ones, and its complement. As a typical situation modeled by a simple game, one can consider a parliament. The parties are the players, and subsets of parties controlling the prescribed majority of members form winning coalitions. In this setting, the values are indices defined in order to detect the strength of parties, i.e. of the players. The most famous values are the Shapley and the Banzhaf indices, but many more were defined and used, among them the semivalues, a particular class of probabilistic values.

Let us now see how this game theoretical machinery is used in the above genetic context. The idea is the following (see [M]). The game is built by means of a binary matrix, where each row represents a gene (thus the rows are several thousand) and each column represents a patient (usually less than fifty). When the coefficient in position ij  is 1  this means that the gene i is abnormally expressed in patient j. The expression of the genes of patients is obtained with the microarray technique, by analyzing the DNA expression of two groups: the patients and a reference group of healthy people.  Abnormal expression is defined in some reasonable way, by comparing the expression of the healthy and sick people.

The basic idea underlying the model is that, for a given patient, the group of the abnormally expressed genes, as a whole, generates the disease. In this way, a unanimity game is naturally defined for each column, where the unique minimal winning coalition is formed by those genes that have an expression equal to 1 over that column. The microarray game is then the average of the unanimity games obtained from the columns, and some probabilistix index can be applied to the game in order to get a ranking of the genes. It is interesting to observe that in principle the Banzhaf and Shapley indices do provide different ranking. Banzhaf tends to give more relevance to those genes whose corresponding row contains possibly very few 1’s (meaning that the gene is rarely abnormally expressed), but such that, where a 1 is found, the corresponding column must contain very few 1’s. In other words, when the gene is abnormally expressed in some patient then there are very few genes abnormally expressed for the same patient. This depends from the fact that the Banzhaf index assigns to the abnormally expressed genes a value that is exponentially inverse with respect to the size of the coalition of 1’s, while the same effect is much less remarkable with the Shapley index, which has an inverse linear behavior. This fact suggests to consider new intermediate indices, defined in the paper [L2], where we show that they belong to the set of the regular semivalues. For, it is quite important to find out if significant differences in the ranking, which are theoretically possible, actually tend to disappear in real cases.

The reason for modeling the microarray game as an average of unanimity games is not only theoretical. As already mentioned, this game has several thousand of players, making the evaluation of any value intractable, unless the game itself has a very particular structure. This is the case for the unanimity games. Thus, not only a theoretical model is provided to possibly detect relevant genes, but also the relative calculations can be performed on real cases.

A generalization of the former model was also proposed in [L1]. First, the initial matrix generating the microarray game is not binary, since for each gene it is also taken into account how much it is over and under expressed. Thus for instance, finding a 5 instead of a 1 in some place not only tells us that the gene is abnormally expressed, but also that its expression is far away from normality. Then an asymmetric Shapley index is used to get a first ranking. Now a second step is performed.  The first n genes in the ranking are considered (usually n can range from 50 to 100) and a weighted majority game is played among these genes (for each patient): the index for the gene, in a given patient, is the digit of the matrix.  Finally, new and classical indices are applied to the new game (also for the weighted majority games fast algorithms for the evaluation of the indices are known). This procedure allows for a much better differentiation of the genes, while the previous one generates too many ties.

Our procedure was finally tested on a real case. We analyzed data from early onset of a colon rectal cancer. Gene expression analysis was performed by using Human Genome.  The data set contained ten healthy samples and twelve derived from tumor tissues. We found that seven genes were mentioned in the literature as a potential prediction of early onset of a colon rectal cancer. Only one of these seven genes did not appear in the first fifty places according to our first ranking. Thus, it was excluded from the subsequent weighted majority game. On the contrary, five of the remaining six were ranked in the first ten places of our ranking by the three indices we used in the experiment: an interesting result that cannot be explained by a random variability.

To conclude, a paper using this model, jointly with statistical tools, appeared in the Journal Cancer, showing that this type of game theoretical models are of interest also for the medical community.

References

[M] Moretti S., Patrone F., Bonassi S. (2007): The class of microarray games and the relevance index for genes. TOP 15, 256-280.

[A] Albino, D., Scaruffi, P., Moretti, S., Coco, S., Di Cristofano, C., Cavazzana, A., Truini, M., Stigliani, S., Bonassi, S., Tonini, G.P. (2008): Stroma poor and stroma rich gene signatures show a low intratumoural gene expression heterogeneity in neuroblastic tumours. Cancer 113, 1412–1422

[L1] Lucchetti R.,  Radrizzani P. (2010): Microarray Data Analysis Via Weighted Indices and Weighted Majority Games Computational Intelligent Methods for Bioinformatics and Biostatistics II, Masulli, Peterson, Tagliaferri (Eds), Lecture Notes in Computer Science, Springer, 179-190

[L2]  Lucchetti R., Radrizzani P., Munarini E (2011) . A new family of regular semivalues and applications, Int. J. Game Theory 40,  655-675