**Geometry of gene cophylogenies as relates to genome evolution and speciation** (Supported by NIH R01 grant.) See UK Statistical Phylogenetics Group site

In this project, we are interested in two of the basic processes underlying speciation: mutation (in a rather broad sense) and genetic isolation. We are specifically interested in refining phylogenetic and phylogenomic methods to elucidate the history of these processes by analysis of macromolecular sequences. The basic innovations are (1) to develop methods for simultaneous derivation of gene trees (cophylogeny), to allow rigorous tests of their codivergence or deviation from codivergence while moving away from the mathematical assumption of independent evolution; and (2) to apply such methods to the large number of genes available from genome sequences, in order to better assess the history of speciation and genome evolution.

Among the biologically important problems to which we will apply these methods are testing the evolutionary relationships (codivergence, orthology, etc.) of genes within a genotype (such as those coding for multimodular enzymes); and studying codivergence and coevolution in host-parasite relationships, including fungal endophytes of grasses.

The data sets we are working with include housekeeping genes from endophytes (Schardl's lab) and genes from salamanders (Weisrock's lab).

**Computing Markov subbases for contingency tables**

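As background for the project described below, the following minimal sketch illustrates what a Markov move is in the simplest setting: a two-way contingency table under the independence model, where the basic moves (adding +1 and -1 on the corners of a 2x2 minor) preserve the row and column sums. The function names, the example table, and the choice of a plain rejection walk are illustrative assumptions and are not taken from the project itself. The walk shown here only demonstrates how moves connect tables with the same margins; a Metropolis acceptance step would have to be added to sample from the hypergeometric distribution used in an exact test.

```python
import random


def margins(table):
    """Return (row sums, column sums) of a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols


def step(table, rng):
    """Propose one basic move and apply it if no cell becomes negative.

    The move adds +1 at (i1, j1) and (i2, j2) and -1 at (i1, j2) and (i2, j1),
    so every row sum and column sum is unchanged.
    """
    i1, i2 = rng.sample(range(len(table)), 2)
    j1, j2 = rng.sample(range(len(table[0])), 2)
    if table[i1][j2] == 0 or table[i2][j1] == 0:
        return table  # move would leave the set of valid tables: stay put
    new = [row[:] for row in table]
    new[i1][j1] += 1
    new[i2][j2] += 1
    new[i1][j2] -= 1
    new[i2][j1] -= 1
    return new


def walk(table, n_steps, seed=0):
    """Run the random walk for n_steps and return the final table."""
    rng = random.Random(seed)
    for _ in range(n_steps):
        table = step(table, rng)
    return table


# Hypothetical 3x3 table of counts, used only for illustration.
start = [[3, 1, 2], [0, 4, 1], [2, 2, 5]]
end = walk(start, 1000)
```

After the walk, `margins(end)` equals `margins(start)` and every cell of `end` is non-negative. For more complex models the analogous set of moves can be far larger, which is the motivation for looking for smaller connecting sets.
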
Since Diaconis and Sturmfels showed that a set of binomial generators of the toric ideal of a discrete exponential family model is equivalent to a Markov basis, and initiated a Markov chain Monte Carlo approach, based on Groebner basis computation, for testing the fit of a given model, many researchers have extensively studied the structure of Markov bases for models in computational algebraic statistics. Despite the computational advances, there are applied problems where one may never be able to compute a Markov basis. In general, the number of elements in a minimal Markov basis for a model can be exponentially large. Thus, it is important to compute a smaller set of moves that connects all tables, instead of computing a full Markov basis. In some cases, such as logistic regression, positive margins are shown to allow a set of connecting Markov moves that is much simpler than the full Markov basis. Accordingly, in this project we are interested in computing Markov subbases for contingency tables under the assumption of positive margins.

**Genome-level resolution of species boundaries and phylogeny of the North American tiger salamander radiation** (Supported by NSF).

Tiger salamanders are used in a range of biological and biomedical research, with much emphasis placed on one species, the Mexican axolotl, due to its ease of rearing and experimental manipulation. However, it is just one of a diverse radiation of tiger salamanders. Collectively, they can serve as a model system for research ranging from speciation to regeneration, but little is known about how many species exist and their evolutionary relationships. This research will leverage new DNA sequencing technologies and existing genome resources to provide a comprehensive assessment of species diversity across North America and place these species into a comparative evolutionary framework. Concurrently, new statistical tools will be developed to identify genes uniquely influenced by idiosyncratic factors, such as natural selection.

This research forges strong links between statistical and biological research programs, an increasingly important form of collaboration as biology moves further towards genomics. Support will be provided at the postdoctoral and graduate levels, and persons employed by this grant will be integrated into a multidisciplinary research program. This work will result in one of the largest population genomic data sets for a naturally distributed organism and will help guide the way for future integrations of genomics into evolutionary biology.