Guohua Yan

Professor

PhD

Mathematics and Statistics

Tilley Hall 424

Fredericton

gyan@unb.ca

1 506 458 7360

http://www.math.unb.ca/~gyan/

Research interests

Methods for cluster analysis

Cluster analysis seeks to discover groups of objects, or clusters, such that objects within a cluster are similar to each other and objects in different clusters are dissimilar. Each object is usually represented as a vector of measurements, that is, a point in a multidimensional space. Linear clustering, or hyperplane clustering, extends traditional clustering to find linear structures in a dataset.

For example, in allometry studies, some groups of animal species follow one linear relationship between body weight and brain weight while other groups follow a different one; in single nucleotide polymorphism genotyping, the fluorescent signals of patients with the same genotype may scatter around a straight line. One focus of my research is the development of efficient and robust methods and algorithms for cluster analysis in general and linear clustering in particular.
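To make the idea of linear clustering concrete, here is a minimal sketch of one simple textbook-style scheme: alternate between fitting a line to each cluster by orthogonal regression and reassigning each point to its nearest line. It is only an illustration, not the methods developed in this research. The function names (fit_line, orthogonal_distance, k_lines) and the assumption that an initial labelling is supplied are hypothetical choices made for this example.

```python
import numpy as np

def fit_line(points):
    """Orthogonal-regression line fit: return (centroid, unit direction)."""
    center = points.mean(axis=0)
    # The first right singular vector of the centred data points along
    # the direction of largest variance.
    _, _, vt = np.linalg.svd(points - center, full_matrices=False)
    return center, vt[0]

def orthogonal_distance(points, center, direction):
    """Distance from each point to the line through center along direction."""
    diff = points - center
    along = diff @ direction
    return np.linalg.norm(diff - np.outer(along, direction), axis=1)

def k_lines(points, labels, n_iter=50):
    """Alternate line fitting and reassignment, starting from given labels.

    Assumes every initial cluster contains at least two points.
    """
    labels = np.asarray(labels)
    k = int(labels.max()) + 1
    for _ in range(n_iter):
        lines = [fit_line(points[labels == j]) for j in range(k)]
        dists = np.column_stack(
            [orthogonal_distance(points, c, d) for c, d in lines]
        )
        new_labels = dists.argmin(axis=1)
        counts = np.bincount(new_labels, minlength=k)
        # Stop at convergence, or before a cluster becomes too small to fit.
        if np.array_equal(new_labels, labels) or counts.min() < 2:
            break
        labels = new_labels
    # Refit so the returned lines match the returned labels.
    lines = [fit_line(points[labels == j]) for j in range(k)]
    return labels, lines
```

Like k-means, a scheme of this kind depends on its starting labels and is sensitive to outliers, which is one reason efficient and robust alternatives are of interest.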

Inferences for clustered count data

Clustered count data with excess zeros are common in medical, health, ecological and biological studies. One approach to modelling these data introduces cluster-level random effects. For example, in the analysis of clustered data in medical studies, random effects are often used to model heterogeneity between subjects; they characterize the subjects' varying susceptibility to certain diseases.

However, for many diseases a considerable proportion of subjects are not susceptible at all, which produces excess zeros in the observed counts. In large-scale studies, for example in epidemiology, missing data and measurement error in covariates pose further challenges to model building; clustered count data may also be longitudinal or spatial. In collaboration with colleagues, I am working on modelling strategies for complex clustered count data.
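The following minimal sketch simulates the kind of data described here, to show how non-susceptible subjects and subject-level random effects combine to produce excess zeros. It assumes a zero-inflated Poisson structure with a normally distributed random intercept; the function name simulate_clustered_counts and all parameter names and values are hypothetical, chosen only for illustration rather than taken from any particular study or model in this research.

```python
import numpy as np

def simulate_clustered_counts(n_subjects, n_obs, p_nonsusceptible,
                              beta0, sigma, rng):
    """Simulate repeated counts per subject with excess zeros.

    Non-susceptible subjects always yield zero counts. Susceptible subjects
    yield Poisson counts whose log-mean is beta0 plus a subject-level
    random intercept (assumed normal with standard deviation sigma).
    """
    susceptible = rng.random(n_subjects) >= p_nonsusceptible
    b = rng.normal(0.0, sigma, size=n_subjects)   # random intercepts
    mu = np.exp(beta0 + b)                        # subject-specific means
    counts = rng.poisson(mu[:, None], size=(n_subjects, n_obs))
    counts[~susceptible] = 0                      # structural zeros
    return counts, susceptible

# Example usage with illustrative values.
rng = np.random.default_rng(0)
counts, susceptible = simulate_clustered_counts(
    n_subjects=200, n_obs=5, p_nonsusceptible=0.3,
    beta0=1.0, sigma=0.5, rng=rng,
)
zero_fraction = (counts == 0).mean()
```

In such data the observed fraction of zeros is at least the fraction of non-susceptible subjects, since susceptible subjects can also produce zero counts by chance.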