International
Conference on Statistical Distributions and Applications Oct. 10-12, 2019, at Eberhard Conference
Center, Grand Rapids, MI, USA |
Titles and abstracts for Keynote and Plenary
speakers are on the ‘Keynotes & Plenary
Speakers’ Page.
Abstracts –
Topic-Invited Speakers (Alphabetically Ordered)
TI_1_0 |
Abdelrazeq,
Ibrahim |
Rhodes College |
Title |
Goodness
of fit Tests |
|
In general, goodness-of-fit tests are used to test whether sampled data fit a claimed distribution, a particular model, or even a stochastic process. This area has become very broad, and many approaches are now used to find the appropriate goodness-of-fit test: parametric, non-parametric, classical, or even Bayesian. In this session's talks, you will explore goodness-of-fit tests that exemplify many of these different approaches. |
||
TI_1_4 |
Abdelrazeq,
Ibrahim |
Rhodes College |
Title |
The
Spread Dynamics of S&P 500 vs Levy-Driven OU Processes |
|
When an Ornstein-Uhlenbeck process is assumed and observed at discrete times 0, h, 2h, ..., [T/h]h, the unobserved driving process can be approximated from the observed process. Approximated increments of the driving process are used to test the assumption that the process is Lévy-driven. The asymptotic behavior of the test statistic at high sampling frequencies is developed assuming that the model parameters are known. The behavior of the test statistic using an estimated parameter is also studied. Performance of the test is illustrated through simulation. |
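As a hypothetical illustration of the increment-recovery idea behind this test (a sketch assuming a Brownian-driven OU process with known rate lam; not the speaker's code), one can simulate the process at step h and back out approximate driving increments X_{ih} - exp(-lam*h) X_{(i-1)h}:

```python
import math
import random

def ou_increment_recovery(lam=1.0, h=0.01, n=10_000, seed=42):
    """Simulate a Brownian-driven OU process dX = -lam*X dt + dW
    observed at times 0, h, 2h, ..., nh, then recover approximate
    increments of the unobserved driving process from the discrete path."""
    rng = random.Random(seed)
    x = 0.0
    path = [x]
    for _ in range(n):
        # exact one-step OU transition under a Brownian driver
        mean = math.exp(-lam * h) * x
        sd = math.sqrt((1.0 - math.exp(-2.0 * lam * h)) / (2.0 * lam))
        x = rng.gauss(mean, sd)
        path.append(x)
    # approximate driving increments: X_{ih} - exp(-lam*h) * X_{(i-1)h}
    return [path[i] - math.exp(-lam * h) * path[i - 1] for i in range(1, n + 1)]
```

Under the Brownian driver these recovered increments are i.i.d. normal with variance close to h for small lam*h, so a goodness-of-fit test applied to them probes the assumed driving process.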
||
TI_3_4 |
Abdurasul,
Emad |
James Madison University |
Title |
The
Product Limit survival function Distribution with Small Sample Inference |
|
Our contribution is to derive the exact distribution of the product limit estimator and to develop a mid-p population tolerance interval for it. We then develop a saddlepoint-based method for the population survival function from the product limit (PL) survival function estimator, under the proportional hazards model, to generate small-sample confidence bands for it. The saddlepoint technique depends upon the Mellin transform of the zero-truncated product limit estimator. This transform is inverted through saddlepoint approximations to yield highly accurate approximations to the cumulative distribution function of the cumulative hazard function estimator. We then compare our saddlepoint confidence interval with the one obtained from the exact distribution and with the one obtained from the large-sample method. From our simulation study we found that the saddlepoint confidence interval is very close to the confidence interval derived from the exact distribution, while being much less difficult to compute, and outperforms the competing large-sample methods in terms of coverage probability. |
||
TI_48_4 |
Aburweis,
Mohamed |
University of Central Florida |
Title |
Comparative study of the distribution of repetitive DNA in model organisms |
|
Repetitive
DNA elements are abundant in the genome of a wide range of organisms. In
mammals, repetitive elements comprise about 40-50% of the total genomes.
However, their biological functions remain largely unknown. Analysis of their
abundance and distribution may shed some light on how they affect genome
structure, function, and evolution. We conducted a detailed comparative
analysis of repetitive DNA elements across ten different eukaryotic organisms,
including chicken (G. gallus), zebrafish (D.
rerio), Fugu (T. rubripes), fruit fly (D.
melanogaster), and nematode worm (C. elegans), along with five mammalian
organisms: human (H. sapiens), mouse (M. musculus), cow (B. taurus), rat (R. norvegicus), and rhesus (M. mulatta).
Our results show that repetitive DNA content varies widely, from 7.3% in the Fugu genome to 52% in the zebrafish, based on RepeatMasker data. The most
frequently observed transposable elements (TEs) in mammals are SINEs (Short
Interspersed Nuclear Elements), followed by LINEs (Long Interspersed Nuclear
Elements). In contrast, LINEs, DNA transposons, simple repeats, and low
complexity repeats are the most frequently observed repeat classes in the
chicken, zebrafish, fruit fly, and nematode worm genomes, respectively. LTRs
(Long Terminal Repeats) have significant genomic coverage and diversity,
which may make them suitable for regulatory roles. With the exception of the
nematode worm and fruit fly, the frequency of the repetitive elements follows
a log-normal distribution, characterized by a few highly prevalent repeats in
each organism. In mammals, SINEs are enriched near genic regions, and LINEs
are often found away from genes. We also identified many LTRs that are
specifically enriched in promoter regions, some with a strong bias towards
the same strand as the nearby gene. This raises the possibility that the LTRs
may play a regulatory role. Surprisingly, most intronic repeats, with the exception of DNA transposons, have a strong tendency to be on the opposite DNA strand from the host gene. One possible explanation is that intronic RNAs which result from splicing may contribute to retrotransposition to the original intronic loci. |
||
TI_2_3 |
Ahmad, Morad |
University of Jordan |
Title |
On
the class of Transmuted-G Distributions |
|
In
this talk, we compare the reliability and the hazard function between a
baseline distribution and the corresponding transmuted-G distribution. Some
examples based on existing transmuted-G distributions in literature are
used. Three tests of parameter significance are utilized to test the importance of a transmuted-G distribution over the baseline distribution, and real data are used in an application of the inference about the importance of transmuted-G distributions. |
||
TI_47_0 |
Akinsete,
Alfred |
Marshall University, Huntington,
WV |
Title |
A
new class of generalized distributions |
|
This session presents a new class of generalized statistical distributions, which may provide robustness and versatility for scientists and practitioners dealing with real-life data. Each paper presents detailed mathematical and statistical properties of the distribution, parameter estimation, and applications to various types of datasets. |
||
TI_2_0 |
Al-Aqtash,
Raid |
Marshall University |
Title |
Generalized
Distributions and Applications |
|
The
first speaker, Dr. Elkadry, presents his work that
relates to Bayesian statistics with application to real-life data. The other
speakers, Drs. Aljarrah, Ahmed & Al-Aqtash, present their work
on recently developed generalized statistical distributions
with application to real data. |
||
TI_2_4 |
Al-Aqtash,
Raid |
Marshall University |
Title |
On
the Gumbel-Burr XII Distribution; Regression and Application |
|
Additional properties of the Gumbel-Burr XII distribution GBXII(L) are studied. We consider useful characterizations of the GBXII(L) distribution in addition to some structural properties, including mean deviations and the distribution of the order statistics. A simulation study is conducted to assess the performance of the MLEs, and then the usefulness of the GBXII(L) distribution is illustrated by means of real data. A log-GBXII(L) regression model is proposed and a survival data set is used in an application of the proposed regression model. |
||
TI_5_3 |
Aldeni,
Mahmoud |
Western Carolina
University |
Title |
TX
Family and Survival Models |
|
We
introduce a generalized family of lifetime distributions, namely, the
uniform-R{generalized lambda} (U-R{GL}) and derive the corresponding survival
models. Two members of this family are derived, namely, the U-Weibull{GL}
(U-W{GL}), a generalized Weibull distribution, and U-loglogistic{GL}
(U-LL{GL}), a generalized loglogistic distribution. The hazard function of the U-R{GL} family can be monotonic, bathtub, upside-down bathtub, N-shaped, or bimodal. The U-W{GL} distribution is applied to fit two lifetime data
sets. The survival model, based on the U-W{GL} distribution, is applied to
fit a right censored lifetime data set. |
||
TI_2_2 |
Aljarrah,
Mohammad A. |
Tafila Technical
University, Tafila, Jordan |
Title |
A
new generalized normal regression model. |
|
We develop a regression model using the new generalized normal distribution. Assuming censored data, maximum likelihood estimates for the model parameters are obtained. The implementation of this model is demonstrated through applications to censored survival data. A diagnostic analysis and a model check are performed based on martingale-type residuals. |
||
TI_1_1 |
Al-Labadi, Luai |
University of Toronto, Mississauga |
Title |
A
Bayesian Nonparametric Test for Assessing Multivariate Normality |
|
A
novel Bayesian nonparametric test for assessing multivariate normal models is
presented. The use of the procedure has been illustrated through several
examples, in which the proposed approach shows excellent performance. |
||
TI_16_1 |
Al-Mofleh,
Hazem |
Tafila
Technical University, Tafila, Jordan |
Title |
Wrapped
Circular Statistical Distributions and Applications |
|
Measurement of direction is common in science and in real-life data observations. Therefore, a circular distribution with a random angle is used to describe these phenomena. There are many techniques for obtaining a circular distribution from the underlying density function; one of the most effective is called “wrapping”. |
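To make the wrapping technique concrete, here is a minimal editor-added sketch (assuming a normal base density; function names are illustrative): the wrapped density sums the base density over all 2π translates, f_w(θ) = Σ_k f(θ + 2πk):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def wrapped_density(f, theta, k_max=50):
    """Wrap a density f on the real line onto the circle [0, 2*pi):
    f_w(theta) = sum over k of f(theta + 2*pi*k), truncated at |k| <= k_max."""
    return sum(f(theta + 2.0 * math.pi * k) for k in range(-k_max, k_max + 1))

def wrapped_normal(theta):
    # wrapped normal density on [0, 2*pi)
    return wrapped_density(normal_pdf, theta)
```

Truncating the sum at a modest k_max suffices because the tail terms decay rapidly; the resulting density integrates to one over [0, 2π).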
||
TI_31_3 |
Almohalwas,
Akram |
UCLA |
Title |
Analysis of
Donald Trump's Twitter Data Using Text Mining and Social Network
Analysis |
|
As the U.S. grows more accustomed to social media, it has been incorporated into many aspects of American life and has become one of the most efficient “weapons” for politicians campaigning and communicating with people. One of the most famous examples is Donald Trump on Twitter. Twitter is one of the best-known social media tools; it generates a huge amount of data that needs to be sifted through to gain insight into the owner of a Twitter account. |
||
TI_5_1 |
Almomani,
Ayman |
Almomany Trade |
Title |
TX: The Extended Family |
|
Consider two CDFs T and F with supports [0,1] and S, respectively. Then G(x) = T∘F(x) is a CDF whose support is S and whose parameters include those of both T and F. The distribution T is called a complementary distribution, and its choice is crucial in defining the distributional properties and moments of the newly generated G. We investigate the connection between complementary distributions and the TX family and present different ways of extending the TX family through different choices of the function T. We make recommendations on how to select appropriate T-transformations. |
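A minimal editor-added sketch of this composition (the example choices of T and F are hypothetical, not taken from the talk):

```python
import math

def tx_cdf(T, F):
    """Given a CDF T supported on [0, 1] and a CDF F with support S,
    return the composed CDF G(x) = T(F(x)), which has support S."""
    return lambda x: T(F(x))

# illustrative choices: T = Beta(2, 1) CDF on [0, 1], F = Exp(1) CDF on (0, inf)
T = lambda u: u ** 2
F = lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0
G = tx_cdf(T, F)   # G(x) = (1 - exp(-x))**2, an exponentiated exponential CDF
```

The parameters of G are the union of those of T and F, which is why the choice of the complementary distribution T governs the shape of the generated family.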
||
TI_14_3 |
Alzaatreh,
Ayman |
American University of Sharjah |
Title |
Truncated
T-X family of distributions |
|
The
time and cost to start a business are highly related to the degree of
transparency of business information, which strongly impacts the loss due to
illicit financial flows. In order to study the distributional characteristics
of time and cost to start a business, we introduce right-truncated and
left-truncated T-X families of distributions. These families are used to
construct new generalized families of continuous distributions. Relationships
between the families are investigated. Real data sets including time and cost
to start a business are analyzed and the results show that the truncated
families perform very well for fitting highly skewed data. |
||
TI_3_0 |
Alzaghal,
Ahmad |
State University of New York at
Farmingdale |
Title |
Distributions
and Applications |
|
|
||
TI_37_2 |
Alzaghal,
Ahmad |
State University of New York at
Farmingdale |
Title |
A
Generalized Family of Lindley Distribution: Properties and Applications |
|
In this talk, we introduce new families of generalized Lindley distributions, using the T-R{Y} framework, named the T-Lindley family of distributions. The new families are generated using the quantile functions of the uniform, exponential, Weibull, logistic, log-logistic and Cauchy distributions. Several general
properties of the T-Lindley family are studied in detail including moments,
mean deviations, mode and Shannon’s entropy. Several new members of T-Lindley
distributions are studied in more detail. The distributions in the T-Lindley
family can be skewed to the right, symmetric, skewed to the left, or bimodal.
A data set is used to demonstrate the flexibility and usefulness of the
T-Lindley family of distributions. |
||
TI_4_0 |
Amezziane,
Mohamed |
Central Michigan
University |
Title |
Models
for Complex Data |
|
Models for densities, spatial autoregressive inference, post-selection inference, and false discovery rate control. |
||
TI_15_2 |
Andrews,
Beth |
Northwestern University |
Title |
Partially
specified spatial autoregressive model with artificial neural network |
|
For spatial modeling and prediction, we propose a spatial autoregressive model with a nonlinear neural network component. This allows for model flexibility in describing the relationship between the dependent variable and covariates. We consider model/variable selection and use a maximum likelihood technique for parameter estimation. The estimators are consistent and asymptotically normal under general conditions. Simulation results indicate the asymptotic theory holds in large finite samples, and we use our methods to model United States voting patterns. |
||
TI_6_0 |
Arslan,
Olcay |
Ankara University |
Title |
Some non-normal distributions and their
applications in robust statistical analysis |
|
In this topic-invited
session, some non-Gaussian distributions used for modeling as alternatives to
the normal distribution will be discussed and some new
extensions of these distributions will be
proposed. Several different applications of these
distributions will be given to demonstrate the performances of
these distributions for conducting robust
statistical analysis of data sets that may have
non-normal empirical distributions. |
||
TI_6_1 |
Arslan,
Olcay |
Ankara University |
Title |
Multivariate Laplace and multivariate skewed
Laplace distributions and their applications in robust statistical
analysis |
|
In this study, we will consider the multivariate Laplace distribution and its skew extension, which can be used as alternatives to the multivariate normal or other multivariate distributions for modeling non-normal data sets. One of the advantages of these distributions is that they can model thick-tailed and skewed datasets and have a simpler form than other multivariate or skew multivariate distributions. Concerning the number of parameters, these distributions have the same number of parameters as the multivariate normal distribution and its skew extensions, which is an advantage in terms of parameter estimation. We will explore some properties of these distributions and study parameter estimation via the EM algorithm. We will also discuss some applications to demonstrate the modeling strength of these distributions. |
||
TI_47_1 |
Aryal, Gokarna |
Purdue University Northwest, Hammond, IN |
Title |
Transmuted-G
Poisson Family |
|
In this talk, we present a new family of distributions called the Transmuted-G Poisson (TGP) family. This family of distributions is constructed by using the genesis of the zero-truncated Poisson (ZTP) distribution and the transmutation map. Some mathematical and statistical properties of the TGP family are provided. The parameter estimation and simulation procedures are also discussed. The usefulness of the TGP family is illustrated by modeling a couple of real-life data sets. |
||
TI_9_3 |
Babic, Sladana |
Ghent University |
Title |
Comparison
and classification of flexible distributions for multivariate skew and
heavy-tailed data |
|
We
present, compare and classify the most popular families of flexible
multivariate distributions. By flexible distribution we mean that, besides
the usual location and scale parameters, the distribution has also both
skewness and tail parameters. The following families are presented:
elliptical distributions, skew-elliptical distributions, multiple scaled
mixtures of multinormal distributions, multivariate distributions based on
the transformation approach, copula-based multivariate distributions and
meta-elliptical distributions. Our classification is based on the tail
behavior (a single tail weight parameter or multiple tail weight
parameters) and the type of symmetry (spherical, elliptical, central
symmetry or asymmetry). We compare the flexible families both theoretically
(comparing the relevant properties and distinctive features) and with a Monte
Carlo study (comparing the fitting abilities in finite samples). |
||
TI_5_4 |
Bahadi, Taoufik |
University of Tampa |
Title |
TX Family of Link functions for Binary
Regression |
|
The link function in binary regression is used
to specify how the probability of success is linked to the model’s systematic
component. These link functions are chosen to be quantile functions of
popular distributions such as the logistic (logit), Gaussian (probit) and Gumbel (cloglog)
distributions. We choose new flexible link functions from the TX family of
distributions, build an inference framework for their regression models and
derive a new model validation procedure. |
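For orientation, an editor-added sketch of the quantile-function view of links (only the standard logit/probit/cloglog cases named in the abstract; the new TX links would slot in the same way): the inverse link applied to the linear predictor is a CDF, so swapping CDFs swaps link functions:

```python
import math

# each inverse link is a CDF evaluated at the linear predictor eta
def inv_logit(eta):
    """Logistic CDF -> logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Standard normal CDF -> probit link."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Gumbel (minimum) CDF -> complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))

def success_prob(beta, x, inv_link=inv_logit):
    """P(Y = 1 | x) = inv_link(x' beta) in a binary regression model."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    return inv_link(eta)
```

Any CDF from a flexible family can replace `inv_link`, which is the mechanism by which the TX family yields new, more flexible links.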
||
TI_46_1 |
Bandyopadhyay,
Tathagata |
St. Ambrose University |
Title |
Inference problems in binary regression model
with misclassified responses |
|
The
problem of predicting a future outcome based on the past and currently
available samples arises in many applications. Applications of prediction
intervals (PIs) based on continuous distributions are well-known. Compared to continuous distributions, results on constructing PIs for discrete
distributions are very limited. The problems of constructing prediction
intervals for the binomial, Poisson and negative binomial distributions are
considered here. Available approximate, exact and conditional methods for
these distributions are reviewed and compared. Simple approximate prediction
intervals based on the joint distribution of the past samples and the future
sample are proposed. Exact coverage studies and expected widths of prediction
intervals show that the new prediction intervals are comparable to or better
than the available ones in most cases. |
||
TI_7_3 |
Baron,
Michael |
American University |
Title |
Sequential
testing and post-analysis of credibility |
|
Actuaries
routinely make decisions that are sequential in nature. During each insured
period, the new claims and losses data are collected, and together with the
new economic and financial situation and other factors, they are taken into account for the calculation of revised premiums
and risks. This talk focuses on the assessment of credibility,
estimation of credibility factors, and testing for full credibility based on
sequentially collected actuarial data. Proposed sequential tests for full
credibility control the overall error rate and power. They result in a
rigorous set of conditions under which an insured cohort becomes fully
credible. Following sequential decisions, methods are developed for the
computation of sequential p-values. Inversion of the derived sequential
test leads to a construction of a sequence of repeated confidence
intervals for the credibility factor. Methods are detailed for Gamma,
Weibull, and Pareto loss distributions and applied to CAS Public Loss
Simulator data sets. |
||
TI_9_2 |
Bekker,
Andriette |
University of Pretoria, South Africa. |
Title |
Class of matrix variate distributions: a
flexible approach based on the mean-mixture of normal model |
|
Limited research has been conducted on matrix variate distributions that can describe skewness present in data. This paper
introduces a new class of matrix variate distributions based on the
mean-mixture of normal (MMN) model. The properties of the new matrix variate
class - stochastic representation, moments and characteristic function,
linear and quadratic forms as well as marginal, conditional distributions are
investigated. Three special cases including the restricted skew-normal,
exponentiated MMN and the half-normal exponentiated MMN matrix variate distributions
are highlighted. An EM-algorithm is implemented to obtain maximum likelihood
estimates of the parameters. The usefulness and practical utility of the
proposed methodology are illustrated using two simulation studies. To investigate the performance of the developed model in a real-world analysis, Landsat satellite data (LSD), originally obtained from NASA, are used. Numerical results show that the new models,
within this proposed class, performed well when applied to skewed matrix
variate experimental data. |
||
TI_15_0 |
Berrocal,
Veronica |
University of California Irvine |
Title |
Comparing Spatial Fields |
|
In weather forecast verification, the need for more advanced methods for analyzing high-resolution forecasts has prompted the introduction of much new methodology, largely from image analysis and computer vision, and some from spatial statistics. In this setting, it is important to capture information about how similar features within the fields are, yet there has not been much, if any, work done on statistical inference in this arena, which is a more general topic than just weather forecast verification. Deciding how close, or far apart, two spatial fields are in some context is an important question in many areas of research. |
||
TI_15_1 |
Berrocal,
Veronica |
University of California Irvine |
Title |
Comparing spatial fields to detect systematic
biases in regional climate models |
|
Since their introduction in 1990, regional
climate models (RCMs) have been widely used to study the impact of climate change
on human health, ecology, and epidemiology. To ensure that the conclusions of
impact studies are well founded, it is necessary to assess the uncertainty in
RCMs. This is not an easy task because two major sources of uncertainties can
undermine an RCM: uncertainty in the boundary conditions needed to initialize
the model and uncertainty in the model itself. Using climate data for
Southern Sweden over 45 years, in this paper, we present a statistical
modeling framework to assess an RCM driven by analyses. More specifically,
our scientific interest here is determining whether there exist time periods during which the RCM under consideration displays the same type of spatial discrepancies from the observations. The proposed model can be seen as an
exploratory tool for atmospheric modelers to identify time periods that
require a further in-depth examination. Focusing on seasonal average
temperature, our model relates the corresponding observed seasonal fields to
the RCM output via a hierarchical Bayesian statistical model that includes a spatio-temporal calibration term. The latter, which
represents the spatial error of the RCM, is in turn provided with a Dirichlet
process prior, enabling clustering of the errors in time. We apply our
modeling framework to data from Southern Sweden spanning the period 1
December 1962 to 30 November 2007, revealing intriguing tendencies with
respect to the RCM spatial errors of seasonal average temperature. |
||
TI_4_2 |
Bhattacharjee,
Abhishek |
University of Northern Colorado |
Title |
Empirical
Bayes Intervals for the Selected Mean |
|
Empirical Bayes (EB) methods are very useful
for post selection inference. Following Datta et al. (2002), construction of
EB confidence intervals for the selected population mean will be discussed in
this presentation. The EB intervals are adjusted to achieve the target
coverage probabilities asymptotically up to the second order. Both
unconditional coverage probabilities of EB intervals and corresponding
probabilities conditional on ancillary statistics are found. |
||
TI_27_1 |
Bonner,
Simon |
University of Western Ontario |
Title |
Modelling
Score Based Data from Photo-Identification Studies of Wild Animals |
|
Photographic
identification has become an invaluable tool for studying populations of animals
that are hard to follow in the wild. Photographs are often compared in silico with computer algorithms that produce continuous
scores which are then classified to identify matches based on some predefined
cut-off. This process is prone to errors (false positive or negative matches)
which bias estimates of the population’s demographics. We present a general
framework for modelling photo-id data based on the raw scores, describe the
Bayesian framework for fitting this model, discuss computational issues, and
present an application to a long-term study of whale sharks (Rhincodon typus). |
||
TI_7_0 |
Brazauskas,
Vytaras |
University of Wisconsin-Milwaukee |
Title |
Actuarial Statistics |
|
In this session, we will discuss several statistical
methodological techniques that appear in actuarial studies,
including credibility, modeling of random variables affected by coverage
modifications and dependence, and non-standard distributions relevant to
insurance data. |
||
TI_7_4 |
Brazauskas,
Vytaras |
University of Wisconsin-Milwaukee |
Title |
Modeling severity and measuring tail risk of
Norwegian fire claims |
|
The probabilistic behavior of the claim
severity variable plays a fundamental role in calculation of deductibles, layers,
loss elimination ratios, effects of inflation, and other quantities arising
in insurance. Among several alternatives for modeling severity, the
parametric approach continues to maintain the leading position, which is
primarily due to its parsimony and flexibility. In this paper, several
parametric families are employed to model severity of Norwegian fire claims
for the years 1981 through 1992. The probability distributions we consider
include: generalized Pareto, lognormal-Pareto (two versions), Weibull-Pareto
(two versions), and folded-t. Except for the generalized Pareto distribution,
the other five models are fairly new proposals that recently appeared in the
actuarial literature. We use the maximum likelihood procedure to fit the
models and assess the quality of their fits using basic graphical tools
(quantile-quantile plots), two goodness-of-fit statistics (Kolmogorov-Smirnov
and Anderson-Darling), and two information criteria (AIC and BIC). In
addition, we estimate the tail risk of 'ground up' Norwegian fire claims
using the value-at-risk and tail-conditional median measures. We monitor the
tail risk levels over time, for the period 1981 to 1992, and analyze
predictive performances of the six probability models. In particular, we
compute the next-year probability for a few upper tail events using the
fitted models and compare them with the actual probabilities. |
||
TI_16_4 |
Broniatowski,
Michel |
Université Pierre
et Marie Curie (Sorbonne Université) |
Title |
A review on
divergence-based inference in parametric and semiparametric models |
|
The Csiszar class of divergences has the main advantage of fitting both parametric and non-parametric settings, in contrast with other classes of dissimilarity indexes. Starting from the dual representation of Csiszar divergences, the talk will first provide a unified treatment of parametric inference, with some emphasis on non-regular models, as occurs for the number and the nature of components in mixture models. We will then turn to semiparametric models of two kinds: firstly, we will consider mixtures with a parametric component and a nonparametric one, a useful class of models for applications. Other semiparametric models defined by moment conditions have been widely considered in the present literature, rooted in the well-known empirical likelihood paradigm (Owen 1988). We will show that divergence-based approaches can be applied in semiparametric models defined by conditions on moments of L-statistics; typical examples are provided when considering models defined as neighborhoods of parametric classes, such as Weibull or Pareto ones, when those neighborhoods are defined through conditions on their first L-moments. The basic dual representations of divergences in parametric and nonparametric models have been considered independently by Liese and Vajda (2006) and Broniatowski and Keziou (2006, 2009). Semiparametric mixtures have been considered in the frame of Csiszar divergence-based inference in Al Mohamad and Boumahdaf (2016), and inference under L-moment conditions has been studied by Broniatowski and Decurninge (2017). |
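For readers new to this toolkit, a small editor-added illustrative sketch (not from the talk): a Csiszar phi-divergence between discrete distributions p and q is D_phi(p||q) = Σ_i q_i phi(p_i/q_i); the choice phi(t) = t log t recovers the Kullback-Leibler divergence:

```python
import math

def csiszar_divergence(p, q, phi):
    """Csiszar phi-divergence D_phi(p || q) = sum_i q_i * phi(p_i / q_i)
    for discrete distributions p, q with all q_i > 0."""
    return sum(qi * phi(pi / qi) for pi, qi in zip(p, q))

def kl(p, q):
    # phi(t) = t * log(t) recovers the Kullback-Leibler divergence
    return csiszar_divergence(p, q, lambda t: t * math.log(t) if t > 0 else 0.0)
```

Other convex choices of phi (e.g. for Hellinger or chi-squared) slot into the same template, which is what makes the class convenient for a unified inference theory.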
||
TI_6_3 |
Bulut, Yakup Murat |
Eskişehir
Osman Gazi University |
Title |
Matrix variate extensions of symmetric and skew
Laplace distributions: Properties, parameter
estimation and applications |
|
In this work, we introduce symmetric and skew matrix variate Laplace distributions using mixture approaches. To obtain the symmetric version of the matrix variate Laplace distribution, we use the scale mixture approach. To derive a skew version of the matrix variate Laplace distribution, we apply the variance-mean mixture approach. Some statistical properties of the newly defined distributions are investigated. Further, we give an EM-based algorithm to estimate the unknown parameters. A small simulation study and a real data example are given to explore the performance of the proposed algorithm for finding the parameter estimates and also to illustrate the capacity of the proposed distribution for modeling matrix variate data sets. |
||
TI_27_3 |
Burkett,
Kelly |
University of Ottawa |
Title |
Markov chain Monte Carlo sampling of gene
genealogies conditional on genotype data from trios |
|
To discover genetic associations with disease,
it is useful to model the latent ancestral trees (gene genealogies) that gave
rise to the observed genetic variability. Though the true tree is unknown, we
model its distribution conditional on observed genetic data and use Monte
Carlo methods to sample from this distribution. In this presentation, I first
describe my sampler, ‘sampletrees’,
that conditions on data from unrelated individuals. I then discuss an
extension to the algorithm when the observed data is from trios, consisting
of two parents and a child. Finally, as illustration, the trio-based sampler
will be applied to real data. |
||
TI_06_2 |
Çelikbıçak,
Müge B. |
Gendarmerie and Coast Guard Academy |
Title |
Parameter Estimation in MANOVA with Repeated
Non-normal Measures |
|
Repeated measures designs, in which multiple observations are made on each experimental unit, play an important role in the health and behavioral sciences. In these designs, there are many methods for the analysis of repeated measures data. Statistically, the difference between these methods lies in the assumptions underlying the models. Many of these methods are based on normality assumptions. In this study, we introduce an alternative non-normal distribution, as a scale mixture of normal distributions, to analyze multivariate repeated measures data. We use the EM algorithm to obtain maximum likelihood estimators of the parameters of the analysis of variance model for multivariate repeated measures. |
||
TI_19_3 |
Chacko, Manjo |
University of Kerala, India |
Title |
Bayesian Analysis of Weibull distribution based
on Progressive type-II Censored Competing Risks Data |
|
In this work, we consider the analysis of competing risks data under progressive type-II censoring by assuming the number of units removed at each stage is random and follows a binomial distribution. Bayes estimators are obtained by assuming the population under consideration follows a Weibull distribution. A simulation study is carried out to study the performance of the different estimators derived in this paper. A real data set is also used for illustration. |
||
TI_11_4 |
Chaganty,
Rao |
Old Dominion University |
Title |
Models
for selecting differentially expressed genes in microarray experiments |
|
There have been many advances in microarray
technology, enabling researchers to quantitatively analyze expression levels
of thousands of genes simultaneously. Two types of microarray chips are
currently in practice - the spotted cDNA chip developed by microbiologists at
Stanford University in the mid-1990s and the oligonucleotide array first
commercially released by Affymetrix Corporation in 1996. Our focus
is on the spotted cDNA chip, which is more popular than the latter microarray.
In a cDNA microarray, or “two-channel array,” the experimental sample is
tagged with red dye and hybridized along with a reference sample tagged with
green dye on a chip which consists of thousands of spots. Each spot contains
preset oligonucleotides. The red and green intensities are measured at each
spot by using a fluorescent scanner. In this talk, we aim to discuss
bivariate statistical models for the red and green intensities, which enable us
to select differentially expressed genes. |
||
TI_41_1 |
Chang, Won |
University of Cincinnati |
Title |
Ice Model Calibration using
Semi-continuous Spatial Data |
|
Rapid changes in
Earth's cryosphere caused by human activity can lead to significant environmental
impacts. Computer models provide a useful tool for understanding the behavior
and projecting the future of Arctic and Antarctic ice sheets. However, these
models are typically subject to large parametric uncertainties due to poorly
constrained model input parameters that govern the behavior of simulated ice
sheets. Computer model calibration provides a formal statistical framework to
reduce and quantify the uncertainty due to such parameters.
Calibration of ice sheet models is often challenging because the
relevant model output and observational data take the form of semi-continuous
spatial data, with a point mass at zero and a right-skewed continuous
distribution for positive values. Current calibration approaches cannot
readily handle this data type. Here we introduce a hierarchical latent
variable model that sequentially handles binary spatial patterns and positive
continuous spatial patterns in two stages. To overcome challenges due to
high-dimensionality we use likelihood-based generalized principal component
analysis to impose low-dimensional structures on the latent variables for
spatial dependence. We demonstrate that our proposed reduced-dimension method
can successfully overcome the aforementioned challenges in the example of
calibrating the PSU-3D ice model for the Antarctic ice sheet and
provide improved future ice-volume change projections. |
||
TI_8_0 |
Chatterjee,
Arpita |
Georgia Southern University |
Title |
Statistical
Advancements in Health Sciences |
|
Statistics plays a pivotal role in research,
planning, and decision-making in the health sciences. In recent years there
has been increasing interest in new statistical methodologies
in the field of biomedical sciences. This session will address statistical
advances to explore complex data emerging from non-inferiority clinical
trials and microarray experiments. |
||
TI_8_4 |
Chatterjee,
Arpita |
Georgia Southern University |
Title |
An
Alternative Bayesian Test to Establish
Non-inferiority |
|
Noninferiority clinical trials have gained
immense popularity in recent decades. Such trials are designed to
demonstrate that a new experimental drug is not unacceptably worse than an
active control by more than a pre-specified small margin. Three-arm non-inferiority
trials have been widely acknowledged as the gold standard because they can
simultaneously establish both non-inferiority and assay sensitivity.
Bayesian tests based on the posterior probability have already been
established for non-inferiority trials in the context of continuous and count
data. We propose a Bayesian non-inferiority test based on Bayes factors. The
performance of our proposed test is demonstrated through simulated data. |
||
TI_22_0 |
Chen, Din
(Org Lio, Yuhlong) |
University of North Carolina at Chapel
Hill |
Title |
Statistical Modeling for Degradation Data I |
|
In recent
years, statistical modeling and inference techniques have been developed
based on different degradation measures. This invited session is based on the
book “Statistical Modeling for Degradation Data” co-edited by Professors
Ding-Geng (Din) Chen, Yuhlong Lio, Hon Keung Tony
Ng, Tzong-Ru Tsai, published by Springer in
2017. The book strives to bring
together experts engaged in statistical modeling and inference to present and
discuss the most recent important advances in degradation data analysis and
related applications. The speakers in
this session are contributors to this book and will present their
recent developments in this research area. |
||
TI_32_1 |
Chen, Din |
University of North Carolina at Chapel
Hill |
Title |
Homoscedasticity in the Accelerated Failure
Time Model |
|
The
semiparametric accelerated failure time (AFT) model is a popular linear model
in survival analysis. Current research based on the AFT model assumes
homoscedasticity of the survival data. Violation of this assumption has been
shown to lead to inefficient and even unreliable estimation, and hence,
misleading conclusions for survival data analysis. However, there is no valid
statistical test in the literature that can be utilized to test this
homoscedasticity assumption. This talk will discuss a novel quasi-likelihood
ratio test for the homoscedasticity assumption in the AFT model. Simulation
studies are conducted to show the satisfactory performance of this novel
statistical test. A real dataset is used to demonstrate the application of
this developed test. |
||
TI_9_1 |
Chen, Ding-Geng |
University of Pretoria, South Africa. |
Title |
A statistical distribution for simultaneously modeling
skewness, kurtosis and bimodality |
|
In our funded research on cusp catastrophe
modelling supported by a USA NIH R01 grant, we revitalized a family
of distributions defined as f(x; α, β) = φ·exp[αx + (1/2)βx² − (1/4)x⁴],
where α is the asymmetry parameter,
β is the bifurcation parameter, and φ is the normalizing
constant. This distribution comes from cusp catastrophe theory, which was
developed in the early 1970s by Rene Thom (Thom, R. 1975. Structural
stability and morphogenesis. New York, NY: Benjamin-Addison-Wesley.) as part
of catastrophe theory in topology and which comprises 7 elementary
catastrophes (Fold, Cusp, Swallowtail, Elliptic Umbilic, Hyperbolic
Umbilic, Butterfly, and Parabolic Umbilic). This distribution also belongs to
the classical exponential family and can be used to statistically analyze
data with skewness, kurtosis and bimodality simultaneously. In this talk, we
will show the properties of this distribution and the parameter estimation
with the theory of maximum likelihood estimation. We further demonstrate the
applications of this distribution to analyze real data. |
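The normalizing constant φ has no closed form, so it must be computed numerically. A minimal sketch (an illustration, not the project's code) that normalizes the density by quadrature and exhibits the bimodal case α = 0, β = 3:

```python
import numpy as np
from scipy.integrate import quad

def cusp_density(alpha, beta):
    """Return the normalized cusp-catastrophe density
    f(x) = phi * exp(alpha*x + 0.5*beta*x**2 - 0.25*x**4)."""
    kernel = lambda x: np.exp(alpha * x + 0.5 * beta * x ** 2 - 0.25 * x ** 4)
    total, _ = quad(kernel, -np.inf, np.inf)   # total = 1 / phi
    return lambda x: kernel(x) / total

# Stationary points solve alpha + beta*x - x**3 = 0, so with
# alpha = 0 and beta = 3 the two modes sit at x = +/- sqrt(3).
f = cusp_density(alpha=0.0, beta=3.0)
area, _ = quad(f, -np.inf, np.inf)   # should be 1 after normalization
```

Setting β ≤ 0 with α = 0 instead leaves a single mode at zero, which is how the bifurcation parameter switches the shape between unimodal and bimodal.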
||
TI_21_2 |
Chen, Guangliang |
San Jose State University |
Title |
All data are "documents": A scalable
spectral clustering framework based on landmark points and cosine similarity |
|
We present a unified scalable computing
framework for various versions of spectral clustering. We first consider the
special setting of cosine similarity for clustering sparse or low-dimensional
data and show that in such cases, spectral clustering can be implemented
without computing the weight matrix. Next, for general similarity, we
introduce a landmark-based technique to convert the given data (and the
selected landmarks) into a “document-term” matrix and then apply
the scalable implementation of spectral clustering with cosine similarity to
cluster them. We demonstrate the performance of our proposed algorithm on
several benchmark data sets while comparing it with other methods. |
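One way to see the scalability claim for the cosine-similarity case: with unit-normalized rows A, the weight matrix is W = AAᵀ, so the spectral embedding can be read off an SVD of the thin matrix D^(−1/2)A and the n × n matrix W never needs to be formed. A rough sketch of this idea (self-similarity on the diagonal of W is kept for simplicity, and nonnegative data are assumed so all degrees are positive):

```python
import numpy as np

def spectral_embed_cosine(X, k):
    """Top-k spectral embedding under cosine similarity, computed from
    an SVD of an n-by-d matrix instead of the n-by-n W = A @ A.T."""
    A = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm rows
    d = A @ A.sum(axis=0)              # degrees d_i = sum_j <a_i, a_j>
    B = A / np.sqrt(d)[:, None]        # rows of D^{-1/2} A
    # Left singular vectors of B are eigenvectors of D^{-1/2} W D^{-1/2}.
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :k]                    # feed these rows to k-means

# Toy nonnegative data with two directional clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.random((20, 5)) + [5, 0, 0, 0, 0],
               rng.random((20, 5)) + [0, 5, 0, 0, 0]])
E = spectral_embed_cosine(X, 2)
```

Running k-means on the rows of `E` completes the clustering; the landmark-based step in the talk is what maps general data into this cosine-friendly form.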
||
TI_10_2 |
Cheng,
Chin-I |
Central Michigan University |
Title |
Bayesian estimators of the Odd Weibull
distribution with actuarial application |
|
The Odd Weibull distribution is a
three-parameter generalization of the Weibull and the inverse Weibull
distributions. A Bayesian approach with a Jeffreys-type prior
for estimating the parameters of the Odd Weibull distribution is considered. The
propriety of the posterior distribution under the proposed prior is established. The
Metropolis-Hastings algorithm and Adaptive Rejection Metropolis Sampling
(ARMS) are adapted to generate random samples from the full conditionals for
inference on the parameters. Estimates based on the Bayesian and maximum
likelihood approaches are compared in an application to an actuarial dataset. |
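For readers unfamiliar with the machinery, a generic random-walk Metropolis-Hastings update of the kind used inside such samplers can be sketched as follows; the standard normal target below is a stand-in for the actual Odd Weibull full conditionals:

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=1):
    """Random-walk Metropolis-Hastings: propose x' = x + N(0, step^2),
    accept with probability min(1, target(x') / target(x))."""
    rng = random.Random(seed)
    x, out = x0, []
    for _ in range(n_samples):
        prop = x + rng.gauss(0.0, step)
        if math.log(rng.random()) < log_target(prop) - log_target(x):
            x = prop                   # accept the proposal
        out.append(x)                  # otherwise keep the current state
    return out

# Standard normal target (log density up to a constant) as a stand-in.
draws = metropolis_hastings(lambda t: -0.5 * t * t, x0=0.0, n_samples=20000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

ARMS replaces the fixed random-walk proposal with an adaptive envelope, which matters when a full conditional is expensive or awkwardly shaped.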
||
TI_47_3 |
Chhetri,
Sher B. |
University
of South Carolina, Sumter |
Title |
On the Beta-G Poisson Family |
|
In this talk, we present a new family of
distributions which is defined by using the genesis of the truncated Poisson
distribution and the beta distribution. Some mathematical properties of the new
family will be discussed. We also discuss the parameter estimation procedures
and potential applications of such a generalized family of distributions. |
||
TI_9_0 |
Coelho,
Carlos Agra |
Universidade
Nova de Lisboa, Portugal |
Title |
Contemporary Methods in Distribution Theory and
Likelihood Inference |
|
Recent results in the areas of Distribution
Theory and Likelihood Inference that will be
presented include: distributions adequate for simultaneously
modeling skewness, kurtosis and bimodality, as well as multivariate
skewness and heavy tails, and likelihood ratio tests for elaborate
covariance structures based on samples of random sizes. |
||
TI_10_0 |
Cooray,
Kahadawala |
Central Michigan University |
Title |
Parametric models for Actuarial
Applications |
|
This session presents a new copula to
account for negative association with a financial application, a new
Pareto extension with applications to insurance data,
new copula families by distorting the existing copulas with
applications in financial risk management, and Bayesian estimation of the Odd
Weibull parameters with applications to insurance data. |
||
TI_15_4 |
Daniels,
John |
Central Michigan University |
Title |
Seeing
RED: A New Statistical Solution to an
Old Categorical Data Problem |
|
Dental
morphological traits (DMT) are often used to conduct inference on cultural
populations. Often, the statistical
“distance” between various populations is described using techniques such as
Mean Measure of Divergence (MMD) or pseudo-Mahalanobis
D². These techniques, although common
in anthropological research, have some significant drawbacks. First, MMD requires data compression into a
dichotomized presence/absence indication at some arbitrary cutoff point. Second, the total sample size will be
reduced in the presence of any missing values. This can be problematic with compromised or
smaller data sets. A newly developed
non-parametric method, Robust Estimator of Differences (RED) is proposed as a
viable alternative. Utilizing both
actual data and simulated data (with a known relationship), we will use both
PCA and Cluster Analysis to determine the relationships between various
cultural groups. The results will show
that RED can outperform either method and is a viable alternative for
Anthropologists to consider. |
||
TI_46_4 |
Davies,
Katherine |
University of Manitoba |
Title |
Progressively Type-II Censored Competing Risks
Data from the Linear Exponential Distribution |
|
Across different types of lifetime studies,
whether it be in the medical or engineering sciences, the possibility of
competing causes of failure needs to be addressed. Such causes are typically
referred to as competing risks; in this paper, we consider progressively type-II censored
competing risks data when the lifetimes are assumed to come from a linear
exponential distribution. We develop likelihood inference and demonstrate the
performance of the estimators via an extensive Monte Carlo simulation study.
We also provide an illustrative example using a small data set. |
||
TI_20_3 |
Davila,
Victor Hugo Lachos |
University of Connecticut |
Title |
Finite mixture modeling of censored data using the multivariate skew-normal
distribution |
|
Longitudinal
HIV-1 RNA viral load measures are often subjected to censoring due to upper
and lower detection limits depending on the quantification assays. A
complication arises when these continuous measures present a heavy-tailed
behavior because inference can be seriously affected by the misspecification
of their parametric distribution. For such data structures, we propose a
robust nonlinear censored regression model based on the scale mixtures of
normal distributions. For taking into account the autocorrelation
existing among irregularly observed measures, a damped exponential
correlation structure is considered. A stochastic approximation of the EM
algorithm is developed to obtain the maximum likelihood estimates of the
model parameters. The main advantage of this new procedure is that it allows us to
estimate the parameters of interest and evaluate the log-likelihood function
in an easy and fast way. Furthermore, the standard errors of the fixed
effects and predictions of unobservable values of the response can be
obtained as a by-product. The practical utility of the proposed method is
exemplified using both simulated and real data. |
||
TI_19_2 |
Dharmaja, S.H.S. |
Govt. College for Women, Trivandrum,
India |
Title |
On logarithmic Kies
distribution |
|
In this paper we consider a logarithmic form of
the Kies distribution and discuss some of its important
properties. We derive explicit expressions for its percentile measures, raw
moments, reliability measures, etc., and attempt the maximum likelihood
estimation of the parameters of the distribution. Certain real-life
applications are also considered to illustrate the usefulness of the
proposed distribution compared to existing models. Also, the asymptotic behaviour of the likelihood estimators is studied using
simulated data sets. |
||
TI_11_0 |
Diawara,
Norou |
Old Dominion
University |
Title |
Statistical Methods for Space and Time
Applications |
|
|
||
TI_15_3 |
Diawara,
Norou |
Old Dominion
University |
Title |
Density Estimation of Spatio-temporal
Point Patterns using Moran's Statistic |
|
In this paper, an Inflated Size-biased Modified
Power Series Distribution (ISBMPSD), where inflation occurs at any of the
support points, is studied. This class includes, among others, the size-biased
generalized Poisson distribution, size-biased generalized negative binomial
distribution and size-biased generalized logarithmic series distribution as
its particular cases. We obtain the recurrence relations among ordinary,
central and factorial moments. The maximum likelihood and Bayesian estimation
of the parameters of the Inflated Size-biased MPSD is obtained. As special
cases, results are extracted for size-biased generalized Poisson
distribution, size-biased generalized negative binomial distribution and
size-biased generalized logarithmic series distribution. Finally, an example
is presented for the size-biased generalized Poisson distribution to
illustrate the results and a goodness of fit test is done using the maximum
likelihood and Bayes estimators. |
||
TI_43_1 |
Dong, Yuexiao |
Temple University |
Title |
On dual model-free variable selection with two groups
of variables |
|
In the presence of two groups of variables,
existing model-free variable selection methods only reduce the dimensionality
of the predictors. We extend the popular marginal coordinate hypotheses
(Cook, 2004) in the sufficient dimension reduction literature and consider
the dual marginal coordinate hypotheses, where the roles of the predictor and
the response are interchangeable. Motivated by canonical correlation analysis
(CCA), we propose a CCA-based test for the dual marginal coordinate hypotheses
and devise a joint backward selection algorithm for dual model-free variable
selection. The performances of the proposed test and the variable selection
procedure are evaluated through synthetic examples and a real data
analysis. |
||
TI_33_4 |
Duval,
Francis |
Université du
Québec à Montréal (UQAM) |
Title |
Gradient Boosting-Based Model for Individual
Loss Reserving |
|
Modeling based on data information is one of
the most challenging research topics in actuarial
science. Statistical learning approaches offer a set of tools
that could be used to evaluate loss reserves in an individual
framework. In this talk, we contrast some traditional aggregate
techniques with individual models based on both parametric and gradient
boosting algorithms. These models use information about each of the
payments made for each of the claims in the portfolio, as well as
characteristics of the insured. We provide an example based on a dataset
from an insurance company and we discuss some points related to practical
applications. |
||
TI_1_3 |
El Ktaibi, Farid |
ZAYED university, UAE |
Title |
Bootstrapping the Empirical Distribution of a
Stationary Process with Change-point |
|
When detecting a change-point in the marginal distribution
of a stationary time series, bootstrap techniques are required to determine
critical values for the tests when the pre-change distribution is unknown. In
this presentation, we propose a sequential moving block bootstrap and
demonstrate its validity under a converging alternative. Furthermore, we
demonstrate that power is still achieved by the bootstrap under a
non-converging alternative. These results are applied to a linear process and
are shown to be valid under very mild conditions on the existence of any
moment of the innovations and a corresponding condition of summability of the coefficients. |
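The moving-block resampling step that underlies such bootstrap procedures can be sketched in a few lines (a plain moving block bootstrap; the sequential, change-point-aware version proposed in the talk is more elaborate):

```python
import random

def moving_block_bootstrap(series, block_len, seed=0):
    """Resample a time series by concatenating randomly chosen
    overlapping blocks of length block_len, preserving short-range
    dependence within each block."""
    rng = random.Random(seed)
    n = len(series)
    n_starts = n - block_len + 1           # allowed block start points
    out = []
    while len(out) < n:
        s = rng.randrange(n_starts)
        out.extend(series[s:s + block_len])
    return out[:n]                          # trim to the original length

x = [0.1 * t + (t % 7) for t in range(100)]   # toy dependent series
xb = moving_block_bootstrap(x, block_len=10)
```

The block length trades off dependence preservation against resampling variability; choosing it well is part of what validity results for such procedures must address.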
||
TI_2_1 |
Elkadry, Alaa |
Marshall University |
Title |
Analyzing Continuous Randomized Response Data
with an Indifference-Zone Selection Procedure |
|
A randomized response model applicable to
continuous data, based on a mixture of two normal distributions, is
considered. The target here is to select the population with the best
parameter value. A study on how to choose the best population between k distinct
populations using an indifference-zone procedure is provided. Also, the
operating characteristics (OCs) of a subset ranking and
selection procedure are derived for the randomized response model for
continuous data considered. The operating characteristics for the subset
selection procedures are considered for two parameter configurations,
the slippage configuration and the equi-spaced
configuration. |
||
TI_23_2 |
Ferreira,
Johan |
University of Pretoria |
Title |
Alternative Dirichlet priors for estimation of
Shannon entropy using countably discrete likelihoods |
|
Claude Shannon's seminal paper “A Mathematical
Theory of Communication” is widely considered as the basis of information
theory. Shannon entropy is a functional of a probability structure and is a
measurement of information contained in a system. It has been applied as a
cryptographic measure for a key generator module, forming part of the
security of the cipher system. In a machine-learning context, entropy is used
to define an error function as part of the learning of weights in multilayer
perceptrons in neural networks. The practical problem of estimating entropy
from samples (sometimes small samples) in many applied settings remains a
challenging and relevant problem. In this presentation, previously
unconsidered Dirichlet generators are introduced as possible priors for an
underlying countably discrete model (in particular, the multinomial model).
Resultant estimators for the entropy H(p) under the considered priors and
assuming squared error loss will be presented. Particular cases of these
proposed priors will be of interest and their effect on the estimation of
entropy subject to different parameter scenarios will be investigated. |
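For the baseline symmetric Dirichlet(a) prior, the Bayes estimator of H(p) under squared error loss is the posterior mean, which has a known closed form in the digamma function ψ: with α_i = n_i + a and A = Σ α_i, E[H | n] = ψ(A + 1) − Σ_i (α_i/A)·ψ(α_i + 1). A sketch of this plug-in (the alternative generators of the talk replace the symmetric prior, which is an assumption here):

```python
import math
import numpy as np
from scipy.special import digamma

def bayes_entropy(counts, a=1.0):
    """Posterior-mean (squared-error-loss Bayes) estimate of Shannon
    entropy, in nats, under a symmetric Dirichlet(a) prior on the
    multinomial probabilities:
    E[H | n] = psi(A + 1) - sum_i (alpha_i / A) * psi(alpha_i + 1),
    with alpha_i = n_i + a and A = sum_i alpha_i."""
    alpha = np.asarray(counts, dtype=float) + a
    A = alpha.sum()
    return digamma(A + 1.0) - np.sum((alpha / A) * digamma(alpha + 1.0))

# With large balanced counts over K cells the estimate approaches log K.
est = bayes_entropy([1000] * 8)
```

The choice of `a` matters most for small samples, which is precisely the regime the abstract highlights.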
||
TI_44_4 |
Fisher,
Thomas |
Miami University |
Title |
A split and merge strategy to
variable selection |
|
The
curse of dimensionality, where p is large relative to n, is
a well-known problem that can affect variable selection methods as well as
model performance. We consider an algorithm similar to k-fold cross-validation
in which we segment the feature variables into subsets, variable selection
(LASSO or others) is performed within each subset, and the final set of
selected variables is aggregated for a final model. Simulations show that
this approach has comparable performance to standard techniques with the
added benefit of improved computational run time. The method can easily be
parallelized for further improved efficiency. |
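A minimal sketch of the split-and-merge idea. For self-containedness a simple marginal-correlation screen stands in for LASSO as the within-subset selector, and `n_splits` and `keep` are illustrative tuning choices, not values from the talk:

```python
import numpy as np

def split_and_merge_select(X, y, n_splits=5, keep=2, seed=0):
    """Partition the p features into n_splits subsets, run a selector
    inside each subset, and merge the survivors into one final set.
    Here the per-subset selector keeps the `keep` features with the
    largest absolute marginal correlation with y."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[1])
    selected = []
    for subset in np.array_split(perm, n_splits):
        r = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset]
        order = np.argsort(r)[::-1][:keep]
        selected.extend(subset[order])
    return sorted(selected)    # fit the final model on these columns

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
y = 3.0 * X[:, 0] - 2.0 * X[:, 7] + rng.standard_normal(200)
picked = split_and_merge_select(X, y)
```

Because each subset is screened independently, the per-subset selections can run on separate workers, which is the source of the parallel speedup mentioned above.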
||
TI_12_0 |
Flegal,
James M. |
University of California,
Riverside |
Title |
Advances in
Bayesian Theory and Computation |
|
Bayesian
computation remains an active theoretical and
practical research area. Talks in this session consider
Bayesian penalized regression models under a unified
framework, locally adaptive shrinkage in the Bayesian framework, weighted
batch means variance estimators for MCMC output analysis, and
recent developments concerning a graph-based Bayesian approach
to semi-supervised learning. |
||
TI_12_3 |
Flegal,
James M. |
University of California,
Riverside |
Title |
Weighted batch
means estimators in Markov chain Monte Carlo |
|
We propose a
family of weighted batch means variance estimators, which are computationally
efficient and can be conveniently applied in practice. The focus is on Markov
chain Monte Carlo simulations and estimation of the asymptotic covariance
matrix in the Markov chain central limit theorem, where conditions ensuring
strong consistency are provided. Finite sample performance is evaluated
through auto-regressive, Bayesian spatial-temporal, and Bayesian logistic
regression examples, where the new estimators show significant computational
gains with a minor sacrifice in variance compared with existing
methods. |
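For orientation, the ordinary (equal-weight) batch means estimator that this family generalizes fits in a few lines: split the chain into batches and scale the sample variance of the batch means by the batch length. A sketch on an AR(1) chain, where positive autocorrelation pushes the asymptotic variance well above the marginal variance:

```python
import random

def batch_means_variance(chain, n_batches=50):
    """Ordinary batch means estimate of the asymptotic variance sigma^2
    in sqrt(n) * (xbar - mu) -> N(0, sigma^2): split the chain into
    batches and scale the sample variance of the batch means by the
    batch length."""
    b = len(chain) // n_batches            # batch length
    chain = chain[: b * n_batches]         # drop any remainder
    means = [sum(chain[i * b:(i + 1) * b]) / b for i in range(n_batches)]
    xbar = sum(means) / n_batches
    return b * sum((m - xbar) ** 2 for m in means) / (n_batches - 1)

# AR(1) chain with phi = 0.5: true asymptotic variance is
# 1 / (1 - phi)^2 = 4, versus a marginal variance of only 4/3.
rng = random.Random(3)
x, chain = 0.0, []
for _ in range(100000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    chain.append(x)
sigma2_bm = batch_means_variance(chain)
```

The weighted versions studied in the talk recombine the batches with non-uniform weights to improve this estimator's finite-sample behavior.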
||
TI_11_3 |
Fofana, Demba |
University of Texas Rio Grande Valley |
Title |
Combining
Assumptions and Graphical Network into Gene Expression Data Analysis |
|
Properly
analyzing gene expression data is a daunting task that requires taking both
assumptions and network relationships among genes into consideration.
Combining these different elements can not only improve statistical power, but
also provide a better framework through which gene expression can be better
analyzed. We propose a novel statistical model that combines assumptions and
gene network information into the analysis. Assumptions are important since
every test statistic is valid only when required assumptions hold. We
incorporate gene network information into the analysis because neighboring
genes share biological functions. This correlation factor is taken into account via similar prior probabilities for
neighboring genes. With a series of simulations our approach is compared with
other approaches. Our method that combines assumptions and network
information into the analysis is shown to be more powerful. We will provide
an R package to help use this approach. |
||
TI_31_2 |
Galoppo,
Travis + Kogan, Clark |
ABB US Corporate Research |
Title |
A GPU
Enhanced Bayesian Ordinal Logistic Regression Model of Hospital Antimicrobial
Usage |
|
Bayesian data
analysis has a high computational demand, with a critical bottleneck in the
evaluation of data likelihood. When data samples are independent, there is
significant opportunity for parallelization of the data likelihood
calculation. We demonstrate a prototype GPU enhanced Gibbs sampler
implementation using NVIDIA CUDA, applying a Bayesian ordinal logistic
regression to a large dataset of antimicrobial usage in hospitals. Our
implementation offloads only the data likelihood calculation to the GPU,
while maintaining the core sampling logic on the CPU. We compare our results
to other popular software packages, both to verify correctness and to
showcase performance. |
||
TI_22_2 |
Gao, Yong |
Ohio University |
Title |
A
Hierarchical Bayesian Bi-exponential Wiener Process for Luminosity
Degradation of Display Products |
|
This
presentation will discuss a nonlinear Wiener process degradation model for
analyzing the luminosity degradation of display products. To account for the
nonlinear two-phase pattern in the observed degradation paths, we assume the
bi-exponential function as the drift function of the Wiener process
degradation model. The hierarchical Bayesian modeling framework is adopted to
construct the model. The failure-time distribution of a unit randomly
selected from the population is obtained.
Prediction results are compared to the results from two alternative
models, a bi-exponential degradation-path model and a time-scale transformed
linear Wiener process. |
||
TI_13_0 |
George,
Olusegun |
The University of Memphis |
Title |
Exchangeability
in Statistical Inference - Theory and Applications |
|
It is well
documented that exchangeability is at the heart of statistical
inference. The ground-breaking representation theorem
of De Finetti (1931) on infinite
exchangeability has had a profound impact on the modeling of
clustered data. This special session is
dedicated to recent applications of finite and
infinite exchangeability to the analysis of clustered data. |
||
TI_5_0 |
George,
Tyler (Org - Amezziane,M.) |
Central Michigan University |
Title |
TX Family:
Extensions and Inference |
|
The TX family is
a class of families formed through the compounding of distributions. Such an
operation allows the generated distribution to inherit the parameters of the compounded
distributions but not necessarily their properties. This session explores
different problems that can be solved using the flexibility of the TX
distributions. |
||
TI_14_0 |
Ghosh,
Indranil |
University of North Carolina, Wilmington |
Title |
Probability
and Statistical models with applications |
|
This session
represents some of the recent developments and some of the noteworthy results
in distribution theory (both in the discrete and the continuous
paradigms). In addition, several applications and a thorough discussion
of the associated statistical inference are also presented. |
||
TI_32_2 |
Ghosh,
Indranil |
University of North Carolina, Wilmington |
Title |
Bivariate
Beta and Kumaraswamy Models developed using the Arnold-Ng Bivariate
Beta Distribution |
|
In this
paper we explore some mechanisms for constructing bivariate and multivariate
beta and Kumaraswamy distributions. Specifically, we focus our
attention on the Arnold-Ng (2011) eight parameter bivariate beta model.
Several models in the literature are identified as special cases of this
distribution including the Jones-Olkin-Liu-Libby-Novick bivariate beta
model, and certain Kotz and Nadarajah bivariate
beta models among others. The utility of such models in constructing
bivariate Kumaraswamy models is investigated. Structural properties
of such derived models are studied. Parameter estimation for the models is
also discussed. For illustrative purposes, a real-life data set is considered
to exhibit the applicability of these models in comparison with rival
bivariate beta and Kumaraswamy models. |
||
TI_8_1 |
Ghosh, Santu |
Medical College of Georgia, Augusta
University |
Title |
Two-sample
Tests for High Dimensional Means with Prepivoting and Random
Projection |
|
Within the
medical field, the demand to store and analyze small sample, large variable
data has become ever-abundant. Several two-sample tests for equality of
means, including the revered Hotelling's T² test,
have already been established when the combined sample size of both
populations exceeds the dimension of the variables. However, tests such as Hotelling's T² become either unusable or
output small power when the number of variables is greater than the combined
sample size. We propose a test using both pre-pivoting and an
Edgeworth expansion that maintains high power in this higher
dimensional scenario, known as the “large p, small n”
problem. Our test's finite sample performance is compared with other recently
proposed tests designed to also handle the large p small n situation.
We apply our test to a microarray gene expression data set and
report competitive rates for both power and Type-I error. |
||
TI_14_1 |
Ghosh, Souparno |
Texas Tech University |
Title |
Coherent
Multivariate Feature Selection and Inference across multiple databases |
|
Random forest
(RF) has become a widely popular prediction generating mechanism. Its
strength lies in its flexibility, interpretability and ability to handle a
large number of features, typically larger than the sample size. However,
this methodology is of limited use if one wishes to identify statistically
significant features. Several ranking schemes are available that provide
information on the relative importance of the features, but there is a
paucity of general inferential mechanisms, particularly in a multivariate
setup. We use the conditional inference tree framework to generate a RF
where features are deleted sequentially based on explicit hypothesis testing.
The resulting sequential algorithm offers an inferentially justifiable, but
model-free, variable selection procedure. Significant features are then used
to generate predictive RF. An added advantage of our methodology is that both
variable selection and prediction are based on conditional inference
framework and hence are coherent. Next, we extend this methodology to
model paired observations obtained from two pharmacogenomics databases where
the predictors are measured under different experimental protocols. Instead
of simply taking the average of the paired predictors, we offer a latent
variable approach that can impute over the databases and then perform
variable selection over the full set of paired samples across the
databases. We illustrate the performance of our Sequential
Multi-Response Feature Selection approach through simulation studies and
finally apply this methodology on Genomics of Drug Sensitivity for Cancer and
Cancer Cell line Encyclopedia databases to identify genetic characteristics
that significantly impact drug sensitivities. Significant set of predictors
obtained from our method are further validated from biological
perspective. |
||
TI_26_3 |
Gunasekera, Sumith |
The University of Tennessee at
Chattanooga |
Title |
On
Estimating the Reliability in a Multicomponent System based on
Progressively-Censored Data from Chen Distribution |
|
This research
deals with the classical, Bayesian, and generalized estimation of
stress-strength reliability parameter, R_{s,k} = Pr(at least
s of (X_{1}, X_{2}, ..., X_{k}) exceed Y) = Pr(X_{k-s+1:k} > Y)
of an s-out-of-k: G multicomponent system, based on progressively type-II
right censored samples with random removals when stress (Y) and strength (X)
are two independent Chen random variables. Under squared-error and
LINEX loss functions, Bayes estimates are developed by using
Lindley's approximation and the Markov Chain Monte Carlo method. Generalized
estimates are developed by using generalized variable method while classical
estimates, the maximum likelihood estimators, their asymptotic
distributions, asymptotic confidence intervals, bootstrap-based confidence
intervals - are also developed. A simulation study and a real-world data
analysis are given to illustrate the proposed procedures. The size of the
test, adjusted and unadjusted power of the test, coverage probability and
expected lengths of the confidence intervals, and biases of the
estimators are also computed, compared, and contrasted. |
||
TI_3_2 |
Hamdan,
Hasan |
James Madison University |
Title |
Approximating
and Characterizing Infinite Scale Mixtures |
|
In this
talk, an efficient method for approximating any infinite scale mixture by a
finite scale mixture up to a specified tolerance level will be presented. Then
this method will be applied to approximate many common classes of infinite
scale mixtures. In particular, the method will be used to approximate
infinite scale mixtures of normals, infinite
scale mixtures of exponentials and infinite scale mixtures of uniforms.
Several important results related to infinite scale mixtures will be
presented with the focus on scale mixtures of normals.
An extension to the multivariate infinite scale mixtures and
to the class of infinite scale-location mixtures will be discussed. |
||
TI_3_1 |
Hamed, Duha |
Winthrop University |
Title |
New Families
of Generalized Lomax Distributions: Properties and Applications |
|
In this
talk, we propose some families of generalized Lomax distributions
named T-Lomax{Y} by using the methodology of
the T-R{Y} framework. The T-Lomax{Y} families introduced
arise from the quantile functions of exponential, logistic, log-logistic
and Weibull distributions. The shapes of
these T-Lomax{Y} distributions vary between unimodal and
bimodal. Various structural properties of the new families are derived
including moments, modes and Shannon entropies. Several new generalized Lomax
distributions are studied, and the estimation of the model parameters for a
member of the newly defined families of distributions is performed by the
maximum likelihood method. An application to a real data set is used to
demonstrate the flexibility of this family of distributions. |
||
TI_16_0 |
Hannig, Jan
(organizer: Jana Jureckova) |
The Czech Academy of Sciences, Charles University |
Title |
Nonlinear Functionals of Probability Distributions |
|
The talks of
the session characterize and estimate various functionals of probability distributions
that are not merely parameters but also describe the shape of the
distribution and its relation to other distributions, such as their mutual
dependence or divergence. |
||
TI_16_3 |
Hannig, Jan |
University of North Carolina at Chapel
Hill |
Title |
Model
Selection without penalty using
Generalized Fiducial Inference |
|
Standard
penalized methods of variable selection and parameter estimation rely on the
magnitude of coefficient estimates to decide which variables to include in
the final model. However, coefficient estimates are unreliable when,
for example, the design matrix is collinear. To overcome this
challenge an entirely new perspective on variable selection is presented
within a generalized fiducial inference framework. We apply this idea
to two different problems. First, this new procedure is able to effectively
account for linear dependencies among subsets of covariates in a
high-dimensional regression setting. Second, we apply our variable selection
method to the sparse vector AR(1) model. |
||
TI_35_3 |
He, Wenqing |
Western University |
Title |
Perturbed
Variance Based Null Hypothesis Tests with An Application to Clayton
Models |
|
Null
hypothesis tests are popularly used when there is no appropriate alternative hypothesis
available, especially in model assessment where the assumed model is
evaluated with no model being considered an alternative. Motivated by
the test of the Clayton models in multivariate survival analysis, a simple
perturbed variance resampling method is proposed for null hypothesis testing.
The proposed methods make use of the perturbation method to estimate the
covariance matrix of the estimator, avoiding an intractable variance estimate for
the estimator. The proposed tests enjoy simplicity and theoretical
justification. We apply the proposed method to modify the tests for the
assessment of Clayton models. The proposed methods have simpler
procedures than both the parametric bootstrap and the nonparametric bootstrap
and present promising performance as shown in the simulation studies. A
colon cancer study further illustrates the proposed methods. |
||
TI_33_3 |
Herrmann,
Klaus |
University of Sherbrooke |
Title |
The Extreme
Value Limit Theorem for Dependent Sequences of Random Variables |
|
Extreme value
theory is concerned with the limiting distribution of location-scale
transformed block maxima M_n = max(X_1, ..., X_n) of a
sequence of identically distributed random variables (X_i)_{i=1}^n
defined on a common probability space (Ω, F, P). In case the X_i, i ∈ N, are independent, the weak limiting behaviour of appropriately location-scale transformed M_n
is adequately described by the classical Fisher-Tippett-Gnedenko theorem. In this presentation we are interested
in the case of dependent random variables X_i, i ∈ N, while keeping a common marginal
distribution function F for all X_i, i ∈ N. As dependence structures we consider Archimedean copulas and
discuss the connection between block maxima and copula diagonals. This allows
one to derive a generalization of the Fisher-Tippett-Gnedenko theorem for X_i, i ∈ N, dependent according to Archimedean
copulas. We discuss connections to exchangeability and upper tail
independence. Finally, we illustrate the resulting limit laws and discuss
their properties. |
||
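The connection between block maxima and copula diagonals in TI_33_3 above can be checked numerically: for an Archimedean copula with generator psi, P(max U_i <= u) equals the diagonal delta_n(u) = psi(n * psi^{-1}(u)). A hedged sketch using the standard Marshall-Olkin frailty sampler for a Clayton copula; the copula family, theta, and block size are illustrative choices, not from the talk:

```python
import numpy as np

# For an Archimedean copula with generator psi, the componentwise maximum of
# (U_1, ..., U_n) has distribution equal to the copula diagonal:
# P(max U_i <= u) = C(u, ..., u) = psi(n * psi^{-1}(u)).
# Clayton example, sampled via the Marshall-Olkin frailty construction.
theta, n, m = 2.0, 5, 200_000
rng = np.random.default_rng(0)
V = rng.gamma(1.0 / theta, 1.0, size=m)             # Gamma frailty
E = rng.exponential(size=(m, n))
U = (1.0 + E / V[:, None]) ** (-1.0 / theta)        # Clayton-dependent uniforms
M = U.max(axis=1)                                   # block maxima

u = 0.9
diag = (1.0 + n * (u ** -theta - 1.0)) ** (-1.0 / theta)  # delta_n(u)
emp = (M <= u).mean()                               # empirical counterpart
```

The empirical probability matches the diagonal closely, which is the identity the talk builds on before passing to the limit.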
TI_11_2 |
Hitchcock,
David |
University of South Carolina |
Title |
A Spatio-temporal Model Relating Gage Height Data to
Precipitation at South Carolina Locations |
|
The
gage height of rivers (i.e., the height of the water’s surface) can be used
to help define flood events. We use a Conditionally Autoregressive
(CAR) model to relate gage height measured daily over five years (2011-2015)
at nearly 100 locations across South Carolina to several covariates. An
important covariate is the daily precipitation at these locations. Other
covariates considered include the elevation at the locations and a
fall-season indicator variable. We also include interactions in our
model. The spatial dependency is specified by defining catchment basins
as neighborhoods. We use a Bayesian approach to estimate our model
parameters. Both the temporal and spatial correlations in the model are
significant. Precipitation appears to have a positive effect on gage
height, and this effect is significantly greater during the fall season.
This is joint work with Haigang Liu and
S. Zahra Samadi. |
||
TI_41_3 |
Hu, Guanyu |
University of Connecticut |
Title |
A Bayesian
Joint Model of Marker and Intensity of Marked Spatial
Point Processes with Application to Basketball Shot Chart |
|
The success
rate of a basketball shot may be higher at locations in the court where
a player makes more shots. In a marked spatial point process model, this
means that the markers are dependent on the intensity of the process. We
develop a Bayesian joint model of the marker and the intensity of marked
spatial point processes, where the intensity is incorporated in the
model of the marker as a covariate. Further, we allow variable selection
through the spike-slab prior. Inferences are developed with a Markov
chain Monte Carlo algorithm to sample from the posterior distribution.
Two Bayesian model comparison criteria, the modified Deviance
Information Criterion and the modified Logarithm of the Pseudo-Marginal
Likelihood, are developed to assess the fit of different joint
models. The empirical performance of the proposed methods is
examined in extensive simulation studies. We apply the
proposed methodology to the 2017--2018 regular season shot data of
four professional basketball players in the NBA to analyze the spatial structure
of shot selection and field goal percentage. The results suggest that
the field goal percentages of all four players are significantly
positively dependent on their shot intensities, and that different
players have different predictors for their field goal percentages. |
||
TI_48_0 |
Huang,
Hsin-Hsiung |
University of Central Florida |
Title |
Statistical
Methodology for Big Data |
|
In this
session, the speakers will present various novel methods for handling problems
in real data, which may involve large sample sizes from different locations,
missing values, and other challenges. |
||
TI_48_1 |
Huang,
Hsin-Hsiung |
University of Central Florida |
Title |
A new
statistical strategy for predicting major depressive disorder using
whole-exome genotyping data |
|
Major
depressive disorder (MDD) is a common and serious psychiatric disorder, which may
cause significant morbidity and mortality, and lead to high rates of suicide.
Genetic factors have been proven to play important roles in the development
of MDD. Recently, genome-wide association studies on common variants have
been conducted. However, the large amount of missing values influences the
analysis results. In this paper, we propose to treat the missing values as
distinct categories in various statistical classification models. The
classification results improve significantly compared to imputing the
missing values. |
||
TI_22_4 |
Jayalath,
Kalanka |
University of Houston - Clear Lake |
Title |
A Bayesian Survival
Analysis for the Inverse Gaussian Data |
|
This talk
focuses on a comprehensive survival analysis for the inverse Gaussian
distribution employing Bayesian and Fiducial approaches. The analysis
previously made in the literature required the distribution mean to be known,
which is unrealistic, and thus it restricted the scope of the investigation.
No such assumption is made here. This study also includes an
illustration of survival analysis for data with randomly right-censored
observations. Gibbs sampling is employed in estimation, and bootstrap
comparisons are made between the Bayesian and Fiducial estimates. It is
concluded that the size of censoring in data and the shape of inverse
Gaussian distribution have the most impact on the two analyses, Bayesian vs
Fiducial. |
||
TI_3_3 |
Johnston,
Douglas E |
State University of New York at
Farmingdale |
Title |
A Recursive
Bayesian Model for the Excess Distribution with Stochastic Parameters |
|
The
generalized extreme value (GEV) and Pareto (GPD) distributions are important
tools for analyzing extreme values such as large losses in financial
markets. In particular, the GPD is the canonical distribution for
modelling excess losses above a “high” threshold. This conditional
distribution is typically used for the computation of risk-metrics such as
expected shortfall (i.e., the conditional mean) and extreme quantiles. In our
work, we propose a new approach for analyzing extreme values by applying a
stochastic parametrization to the GPD distribution with the parameters
following a hidden stochastic process which results in a non-linear,
non-Gaussian state-space model with unknown static parameters. This
approach allows for dependencies, such as clustering of extremes, often
witnessed in financial data. To compute the predictive excess loss
distribution, we derive a Rao-Blackwellized particle
filter that reduces the parameter space, and a concise, recursive solution is
obtained. This has the benefit of improved filter performance and permits
real-time implementation. We introduce a new risk-measure that is a
more robust estimate for the expected shortfall and we illustrate
our results using both simulated data and actual stock market returns from
1928-2018. Finally, we compare our results to traditional methods of
estimating the excess loss distribution, such as maximum likelihood, to show
the improvement obtained. |
||
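The peaks-over-threshold computation behind TI_3_3 above can be sketched with standard tools: fit a generalized Pareto distribution to excesses above a high threshold and read off the expected shortfall E[L | L > u] = u + beta/(1 - xi), valid for xi < 1. This is a generic static illustration, not the speaker's recursive Bayesian filter; the simulated data and the 99% threshold are assumptions:

```python
import numpy as np
from scipy import stats

# Peaks-over-threshold sketch: fit a GPD to excesses above a high threshold u.
# For GPD excesses, the conditional mean loss is E[L | L > u] = u + beta/(1 - xi).
rng = np.random.default_rng(0)
losses = stats.t.rvs(df=4, size=100_000, random_state=rng)  # heavy-tailed "losses"

u = np.quantile(losses, 0.99)                   # a "high" threshold (assumption)
excess = losses[losses > u] - u
xi, loc, beta = stats.genpareto.fit(excess, floc=0)  # MLE fit of GPD to excesses

es = u + beta / (1.0 - xi)                      # model-based expected shortfall
es_emp = losses[losses > u].mean()              # empirical tail mean, for comparison
```

The model-based expected shortfall agrees closely with the empirical tail mean; the talk's contribution is to let (xi, beta) evolve stochastically rather than stay fixed as here.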
TI_12_1 |
Jones, Galin L. |
University of Minnesota |
Title |
Fully
Bayesian Penalized Regression with a Generalized Bridge Prior |
|
We consider penalized
regression models under a unified framework. The particular method is
determined by the form of the penalty term, which is typically chosen by
cross validation. We introduce a fully Bayesian approach that incorporates
both sparse and dense settings and show how to use a type of model averaging
approach to eliminate the nuisance penalty parameters and perform inference
through the marginal posterior distribution of the regression coefficients.
We establish tail robustness of the resulting estimator as well as
conditional and marginal posterior consistency for the Bayesian model. We
develop a component-wise Markov chain Monte Carlo algorithm for sampling.
Numerical results show that the method tends to select the optimal penalty
and performs well in both variable selection and prediction and is comparable
to, and often better than, alternative methods. Both simulated and real data
examples are provided. |
||
TI_34_4 |
Kang, Sang
(John) |
The University of Western Ontario |
Title |
Moment-based density approximation techniques
as applied to heavy-tailed distributions |
|
Several
advances for the approximation and estimation of heavy-tailed distributions
are proposed. It is first explained that on initially applying
the Esscher transform to
heavy-tailed density functions, one can utilize a moment-based technique
whereby the tilted density functions are expressed as the product of a base
density function and a polynomial adjustment. Alternatively, density
approximants can be secured by appropriately truncating the distributions or
mapping them onto compact supports. Extensions to the context of density
estimation, in which sample moments are employed in lieu of exact
moments, are discussed, and illustrative applications involving actuarial data
sets are presented. |
||
TI_17_0 |
Kao,
Ming-Hung (Jason) |
Arizona State University |
Title |
Design and
analysis of complex experiments: Theory and applications |
|
The four
talks on the design and analysis of complex experiments in this session
include sub-data selection for big data, a large-data issue in computer
experiments, a study on order-of-addition experiments, and an optimal
experimental design approach for functional data analysis. |
||
TI_24_4 |
Kapenga,
John |
Western Michigan University |
Title |
Computation of
High-Dimensional Integrals |
|
Integrals in
dimensions from 20 to a few thousand have recently been used in several
applications including finance, Bayesian statistics and
quantum physics. Even infinite-dimensional integrals have been
attacked numerically. Traditional numerical methods and the usual
Monte Carlo methods cannot be applied as the
dimension increases beyond perhaps 20. A brief history and the
status of effective current lattice methods, such as the fast CBC
construction, will be presented. Several examples and timings
will be included. |
||
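As a rough illustration of TI_24_4 above, low-discrepancy point sets can beat plain Monte Carlo on a smooth moderate-dimensional integrand. The sketch below uses scrambled Sobol points from SciPy rather than the CBC-constructed lattice rules discussed in the talk; that substitution, and the toy product integrand, are illustrative assumptions:

```python
import numpy as np
from scipy.stats import qmc

# Quasi-Monte Carlo sketch: integrate f(x) = prod_j (0.5 + x_j) over [0,1]^d.
# The exact integral is 1, since each factor integrates to 1 on [0,1].
d = 20
sob = qmc.Sobol(d, scramble=True, seed=0)
pts = sob.random_base2(m=14)                 # 2^14 = 16384 Sobol points
est = np.mean(np.prod(0.5 + pts, axis=1))    # QMC estimate

mc = np.random.default_rng(0).random((16_384, d))
est_mc = np.mean(np.prod(0.5 + mc, axis=1))  # plain Monte Carlo, same budget
```

With the same point budget, the Sobol estimate is typically much closer to the true value 1 than the plain Monte Carlo one; lattice rules with CBC-optimized generating vectors play the analogous role in the methods surveyed in the talk.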
TI_30_3 |
Kim, Jong
Min |
University of Minnesota-Morris |
Title |
Change point
detection method with copula conditional distribution to multistage
sequential control chart |
|
In this research,
we propose a change point model of the multistage Statistical Process Control
(SPC) chart for highly correlated multivariate data via copula conditional
distribution, principal component analysis (PCA) and functional PCA.
Furthermore, we review the currently available multistage statistical process
control charts. In addition, to verify our proposed change point model, we
compare the current change point models of the single-stage SPC chart via PCA
with our change point model for the multistage SPC chart via copula
conditional distribution, PCA and functional PCA, using highly correlated
multistage simulated and real data. |
||
TI_18_0 |
Kozubowski,
Tomasz |
University of Nevada |
Title |
Discrete
Stochastic Models and Applications |
|
Discrete
stochastic models are an essential part of statistician’s toolbox, as they
are widely used across many areas of applications. The session focuses on
recent developments in this important area, and its scope is rather broad,
from univariate to multivariate discrete distributions, including
hybrid models with discrete as well as continuous components, heavy-tailed
distributions, and their applications. |
||
TI_36_3 |
Kozubowski,
Tomasz |
University of Nevada |
Title |
Multivariate
models connected with random sums and maxima of dependent Pareto
components |
|
We present recent results concerning stochastic models for (X, Y, N), where X and Y, respectively, are the sum and the maximum of N dependent,
heavy-tailed Pareto components.
Models of this form are desirable in many applications,
ranging from hydro-climatology to finance and insurance. Our construction is built upon
a pivotal model involving a deterministic number of IID exponential variables, where the basic characteristics of the involved multivariate distributions admit explicit forms.
In addition to theoretical results, we shall present real data examples illustrating the usefulness of these models. |
||
TI_26_2 |
Krishnamoorthy,
Kalimuthu |
University of Louisiana at
Lafayette |
Title |
Fiducial
Inference with Applications |
|
The fiducial
distribution for a parameter is essentially the posterior distribution with
no prior distribution on the parameter. In this talk, we shall describe
Fisher's method of finding a fiducial distribution for normal parameters and
fiducial inference through examples involving well-known distributions such
as the normal and related distributions. We then describe the approach for
finding fiducial distributions for the parameters of a location-scale family
and for discrete distributions. We illustrate the approach for the Weibull
distribution and the delta-lognormal distribution. In particular, we shall
present fiducial methods for finding confidence intervals, prediction intervals,
and prediction limits for the mean of a future sample. |
||
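Fisher's fiducial recipe for the normal model mentioned in TI_26_2 above has a simple Monte Carlo form: draw sigma^2 = (n-1)s^2 / chi^2_{n-1} and mu = xbar + Z * sigma / sqrt(n); the resulting fiducial interval for mu coincides with the classical t-interval. A minimal sketch; the simulated sample and the Monte Carlo size are illustrative:

```python
import numpy as np

# Fiducial Monte Carlo for the normal model N(mu, sigma^2):
#   sigma^2 = (n-1) s^2 / chi2_{n-1},  mu = xbar + Z * sigma / sqrt(n).
rng = np.random.default_rng(1)
x = rng.normal(10.0, 2.0, size=25)           # illustrative data
n, xbar, s2 = len(x), x.mean(), x.var(ddof=1)

m = 100_000                                   # fiducial Monte Carlo size
sigma = np.sqrt((n - 1) * s2 / rng.chisquare(n - 1, m))
mu = xbar + rng.standard_normal(m) * sigma / np.sqrt(n)

lo, hi = np.quantile(mu, [0.025, 0.975])      # 95% fiducial interval for mu
```

For the normal mean, this interval reproduces the classical t-interval; for parameters like those of the Weibull or delta-lognormal in the talk, the same sampling idea applies but no closed-form frequentist match exists.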
TI_19_0 |
Kumar, C.
Satheesh |
University of Kerala, Trivandrum, India |
Title |
Distribution
Theory |
|
The session
consists of four talks: the first two will be on Weibull-related
classes of distributions, while the third will be on the analysis of competing-risk
data under progressive type-II censoring. The session concludes with a talk
on certain classes of discrete distributions of order k. |
||
TI_19_4 |
Kumar, C.
Satheesh |
University of Kerala |
Title |
On a Wide
Class of Discrete Distributions |
|
Several
types of discrete distributions of order k are available in the literature,
and they have found extensive applications in many areas of scientific
research. In the present talk, we discuss certain new classes of discrete
distributions of order k, which are developed as distributions of the random
sum of certain independent and identically distributed Hirano type random
variables. We attempt to outline several important distributional properties
of these families of distributions along with a brief discussion on their
mixtures and limiting cases. |
||
TI_7_2 |
Lee, Gee |
Michigan State University |
Title |
General
insurance deductible ratemaking (and extensions) |
|
Insurance
claims have deductibles, which must be considered when pricing insurance premiums.
The deductible may cause censoring and truncation of the insurance claims. In
this talk, an overview of deductible ratemaking will be provided, and the
pros and cons of two deductible ratemaking approaches will be compared: the
regression approach and the maximum likelihood approach. The regression
approach turns out to have an advantage in predicting aggregate claims, while
the maximum likelihood approach has an advantage when calculating
theoretically correct relativities for deductible levels beyond those
observed in empirical data. A comparison of selected models shows that
the usage of long-tail severity distributions may improve the deductible
rating, while the zero-one inflated frequency model may have limited advantages due
to estimation issues under censoring and truncation. For demonstration,
loss models fit to the Wisconsin Local Government Property Insurance Fund
(LGPIF) data will be illustrated, and examples will be provided for the
ratemaking of per-loss deductibles offered by the fund. |
||
TI_22_3 |
Lee, I-Chen |
National Cheng-Kung University |
Title |
Global
Planning of Accelerated Degradation Tests |
|
The
accelerated degradation test (ADT) is an efficient tool for assessing the
lifetime information of highly reliable products. Recently, without taking the
experimental cost into consideration, an analytical approach was proposed in the
literature to determine the optimum stress levels and the corresponding
optimum sample size allocation simultaneously in a general class of
exponential dispersion (ED) degradation models. However, conducting an ADT is
very expensive. Therefore, how to construct a cost-constrained ADT plan is a
challenging issue for reliability analysts. By taking the experimental
cost into consideration, this study further proposes a semi-analytical
procedure to determine the total sample size, the measurement frequencies,
and the number of measurements (within a degradation path) globally under the
class of ED degradation models. An example is used to demonstrate that our
proposed method is very efficient in obtaining the cost-constrained ADT plan,
compared with the conventional optimum plan obtained by a grid search algorithm. |
||
TI_24_2 |
Lee, Kevin |
Western Michigan University |
Title |
Temporal Exponential-Family
Random Graph Models with Time-Evolving Latent Block Structure for Dynamic
Networks |
|
Model-based
clustering of dynamic networks has emerged as an essential research topic in
statistical network analysis. We present a principled statistical clustering
of dynamic networks through the temporal exponential-family random graph
models with a hidden Markov structure. The temporal exponential-family random
graph models allow us to detect groups based on interesting features of the
dynamic networks and the hidden Markov structure is used to infer the
time-evolving block structure of dynamic networks. The power of our proposed
method is demonstrated in real-world applications. |
||
TI_20_0 |
Levine,
Michael |
Purdue University |
Title |
Recent
advances involving latent variable models for various distributions |
|
This session
is dedicated to some new developments in latent variable models. Models for
specific distributions that are widely used in practice as well as the
nonparametric latent variable models will be discussed. Moreover, some
models for new types of data lying in non-Euclidean spaces will also be
considered. Taken together, the models discussed in this session are capable
of modeling a very wide range of data with some hidden/unobservable structure. |
||
TI_20_1 |
Levine,
Michael |
Purdue University |
Title |
Estimation
of two-component skew normal mixtures where one component is known |
|
Two
component mixtures have a special relevance for binary classification
problems. In the standard setting for binary classification, labeled samples
from both components are available in the form of training data. However,
many real-world problems do not fall in this standard paradigm. For example,
in social networks users may only be allowed to click `like' (if there is no
`dislike' button) for a particular product. Thus, labeled data can be
collected only for one of the components (a sample containing users who
clicked `like'). In addition, unlabeled data from the mixture (a sample
containing all users) is also available. To guarantee unimodality of the
components and allow for skewness, we model the components with a skew
normal family, a generalization of the Gaussian family with good theoretical
properties and tractable inference. An efficient algorithm that
estimates a mixture proportion as well as the parameters of the unknown
component is proposed. We illustrate its performance using a
well-designed simulation study. |
||
TI_21_0 |
Li, Daoji |
California State University
Fullerton |
Title |
Big Data and
Dimension Reduction |
|
This session
will present recent advances in big data and dimension reduction, including
optimal subsampling for massive data, scalable spectral clustering framework,
robust PCA, and high-dimensional interaction detection. |
||
TI_21_4 |
Li, Daoji |
California State
University Fullerton |
Title |
High-dimensional
interaction detection with false sign rate control |
|
Understanding
how features interact with each other is of paramount importance in many
scientific discoveries and contemporary applications. Yet
interaction identification becomes challenging even for a moderate
number of covariates. In this paper, we suggest an efficient and
flexible procedure for interaction identification in ultra-high
dimensions. Under a fairly general framework, we establish that for both
interactions and main effects, the method enjoys oracle inequalities
in selection. We prove that our method admits an explicit
bound on the false sign rate, which can be asymptotically vanishing. Our
method and theoretical results are supported by several simulation and
real data examples. |
||
TI_48_2 |
Li, Keren |
Northwestern University |
Title |
Score-Matching
Representative Approach for Big Data Analysis with Generalized Linear Models |
|
We propose a
fast and efficient strategy, called the representative approach, for big data
analysis with linear models and generalized linear models. With a given
partition of big dataset, this approach constructs a representative data
point for each data block and fits the target model using the representative
dataset. In terms of time complexity, it is as fast as the subsampling
approaches in the literature. As for efficiency, its accuracy in estimating
parameters is better than the divide-and-conquer method. With comprehensive
simulation studies and theoretical justifications, we recommend two
representative approaches. For linear models or generalized linear models
with a flat inverse link function and moderate coefficients of continuous
variables, we recommend mean representatives (MR). For other cases, we
recommend score-matching representatives (SMR). In an illustrative
application to the Airline on-time performance data, MR and SMR are as good
as the full-data estimate when it is available. Furthermore, the proposed representative
strategy is ideal for analyzing massive data dispersed over a network of
interconnected computers. |
||
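The mean-representative (MR) idea in TI_48_2 above can be sketched for a linear model: replace each block of a given partition by its mean point and fit weighted least squares on the representatives. This toy version uses an arbitrary equal-size random partition and simulated data; the paper's construction and recommendations are more refined:

```python
import numpy as np

# Mean-representative (MR) sketch for a linear model: each data block is
# replaced by its mean point, and the model is fit on the representatives
# with block sizes as weights.
rng = np.random.default_rng(0)
n, d = 100_000, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

blocks = np.array_split(rng.permutation(n), 200)     # a given partition (assumption)
Xr = np.stack([X[b].mean(axis=0) for b in blocks])   # representative points
yr = np.array([y[b].mean() for b in blocks])
w = np.array([len(b) for b in blocks], dtype=float)  # block sizes as weights

# Weighted least squares on the 200-point representative dataset
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(sw[:, None] * Xr, sw * yr, rcond=None)
```

Because block means of (X, y) satisfy the same linear relation as the raw data, the 200-point fit recovers the coefficients of the 100,000-point model at a fraction of the cost, which is the appeal of the representative strategy.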
TI_46_0 |
Lio, Yuhlong |
University of South Dakota |
Title |
Statistical
Modeling for Degradation Data II |
|
In recent years,
statistical modeling and inference techniques have been developed based on
different degradation measures. This invited session is based on the book
“Statistical Modeling for Degradation Data” co-edited by Professors Ding-Geng (Din) Chen, Yuhlong Lio, Hon Keung Tony Ng, Tzong-Ru Tsai, published by Springer in 2017. The book strives to bring together experts
engaged in statistical modeling and inference to present and discuss the most
recent important advances in degradation data analysis and related
applications. The speakers in this
session are contributors to this book and will further present their
recent developments in this research area. |
||
TI_32_3 |
Lio, Yuhlong |
University of South Dakota |
Title |
Estimation of
Stress-Strength for Burr XII distribution based on the progressively first
failure-censored samples |
|
Stress-strength
is studied under progressively first-failure-censored samples from Burr
XII distributions. Confidence intervals for stress-strength,
constructed using various procedures, are discussed.
Some computational results from a simulation study are presented, and an
illustrative example is provided for demonstration. |
||
TI_40_2 |
Liu, Ruiqi |
Indiana University Purdue University Indianapolis |
Title |
Optimal
Nonparametric Inference via Deep Neural Network |
|
The
deep neural network is a state-of-the-art method in modern science and
technology. Much statistical literature has been devoted to understanding its
performance in nonparametric estimation, whereas the results are suboptimal
due to a redundant logarithmic factor. In this work, we show that such
log-factors are not necessary. We derive upper bounds for the L^2
minimax risk in nonparametric estimation. Sufficient conditions on network
architectures are provided such that the upper bounds become optimal (without
the logarithmic factor). Our proof relies on an explicitly constructed network
estimator based on tensor product B-splines. We also derive asymptotic
distributions for the constructed network and a related hypothesis testing
procedure. The testing procedure is further proven to be minimax optimal under
suitable network architectures. |
||
TI_47_2 |
Long,
Hongwei |
Florida Atlantic University,
Boca Raton, FL |
Title |
The Beta Transmuted
Pareto Distribution: Theory and Applications |
|
In this
talk, we present a composite generalizer of the Pareto distribution. The
genesis of the beta distribution and transmuted map is used to develop the
so-called beta transmuted Pareto (BTP) distribution. Several mathematical
properties including moments, mean deviation, probability weighted moments,
residual life, distribution of order statistics and the reliability analysis
are discussed. The method of maximum likelihood is proposed to estimate the
parameters of the distribution. We illustrate the usefulness of the proposed
distribution by presenting its application to model real-life data
sets. |
||
TI_33_2 |
Mailhot,
Melina |
Concordia University |
Title |
Multivariate
geometric expectiles and range value-at-risk |
|
Geometric
generalizations of expectiles and Range
Value-at-Risk for d-dimensional multivariate distribution functions will be
introduced. Multivariate geometric expectiles are unique
solutions to a convex risk minimization problem and are given by
d-dimensional vectors. Multivariate geometric Range Value-at-Risk is also a
risk measure considering tail events, which has TVaR
as a special case. They are well behaved under common data transformations.
Properties and highlights on the influence of varying margins and dependence
structures will be presented. |
||
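The univariate building block behind the geometric expectiles of TI_33_2 above is simple: the tau-expectile minimizes an asymmetrically weighted squared loss, and the 0.5-expectile is the mean. A one-dimensional sketch (the talk's geometric, d-dimensional generalization is not attempted here; the sample is illustrative):

```python
import numpy as np
from scipy import optimize

# The tau-expectile of a sample minimizes the asymmetrically weighted
# squared loss e -> mean(|tau - 1{x < e}| * (x - e)^2).
def expectile(x, tau):
    loss = lambda e: np.mean(np.abs(tau - (x < e)) * (x - e) ** 2)
    res = optimize.minimize_scalar(loss, bounds=(x.min(), x.max()), method="bounded")
    return res.x

rng = np.random.default_rng(0)
x = rng.normal(size=50_000)
e50 = expectile(x, 0.5)   # the 0.5-expectile coincides with the sample mean
e90 = expectile(x, 0.9)   # higher tau shifts the expectile into the upper tail
```

Expectiles are increasing in tau and, unlike quantiles, depend on the full tail, which is what makes their multivariate geometric versions natural tail-sensitive risk measures.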
TI_8_2 |
Maity,
Arnab Kumar |
Pfizer Inc. |
Title |
Bayesian
Data Integration and Variable Selection for Pan-Cancer Survival Prediction
using Protein Expression Data |
|
Accurate
prognostic prediction using molecular information is a challenging area of
research which is essential to develop precision medicine. In this paper, we
develop translational models to identify major actionable proteins that are
associated with clinical outcomes like the survival time of the patients.
There are considerable statistical and computational challenges due to the
large dimension of the problems. Furthermore, the data are available for
different tumor types hence data integration for various tumors is desirable.
Having censored survival outcomes escalates one more level of
complexity in the inferential procedure. We develop Bayesian hierarchical
survival models which accommodate all of the aforementioned challenges.
We use a hierarchical Bayesian accelerated failure time (AFT) model for the
survival regression. Furthermore, we assume sparse horseshoe prior
distribution for the regression coefficients to identify the major proteomic
drivers. We allow borrowing of strength across tumor groups by introducing a
correlation structure among the prior distributions. The proposed methods
have been used to analyze data from the recently curated The Cancer Proteome
Atlas (TCPA) which contains RPPA based high quality protein expression data
as well as detailed clinical annotation including survival times. Our
simulation and the TCPA data analysis illustrate the efficacy of the proposed
integrative model which links different tumors with the correlated
prior structures. |
||
TI_30_2 |
Makubate,
Boikanyo |
Botswana International University of
Science and Technology |
Title |
A New
Generalized Weibull Distribution with Applications to Lifetime
Data |
|
A
new and generalized Weibull-type distribution is developed and
presented. Its properties are explored in detail. Some estimation
techniques including maximum likelihood estimation method are used
to estimate the model parameters and finally applications of the model to
real data sets are presented to illustrate the usefulness of the
proposed generalized distribution. |
||
TI_14_4 |
Mallick, Avishek |
Marshall University, West
Virginia |
Title |
An Inflated
Geometric Distribution and its application |
|
Count data
with an excess number of zeros, ones, twos, or threes are commonplace in
experimental studies. These inflated frequencies at particular counts may
lead to overdispersion and thus cause difficulty in data analysis. To obtain
appropriate results and to overcome possible anomalies in parameter
estimation, a suitable inflated distribution may need to be considered.
The inflated Poisson and inflated negative binomial distributions are the
most commonly used for modeling and analyzing such data; the geometric
distribution is a special case of the negative binomial. This work deals
with parameter estimation of a geometric distribution inflated at certain
counts, which we call the Generalized Inflated Geometric (GIG) distribution.
Parameters are estimated by the method of moments, an empirical probability
generating function based method, and maximum likelihood. The three types of
estimators are then compared in simulation studies, and finally a Swedish
fertility data set is modeled using a GIG distribution. |
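As a toy illustration of fitting such an inflated model, the sketch below simulates from a geometric distribution inflated at the counts 0 and 2 and recovers the parameters by maximum likelihood; the parameter values, the choice of inflation points, and the optimizer are our own assumptions, not details from the talk.

```python
# Hypothetical sketch: MLE for a geometric distribution inflated at 0 and 2.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def gig_pmf(k, p, pi0, pi2):
    """pmf of a geometric(p) on {0, 1, 2, ...} inflated at k = 0 and k = 2."""
    base = (1 - p) ** k * p
    return (1 - pi0 - pi2) * base + pi0 * (k == 0) + pi2 * (k == 2)

# Simulate: 15% extra mass at 0, 10% extra mass at 2, geometric otherwise.
n, p_true, pi0_true, pi2_true = 2000, 0.4, 0.15, 0.10
x = rng.geometric(p_true, size=n) - 1          # geometric on {0, 1, ...}
u = rng.random(n)
x[u < pi0_true] = 0
x[(u >= pi0_true) & (u < pi0_true + pi2_true)] = 2

def negloglik(theta):
    p, pi0, pi2 = theta
    if pi0 + pi2 >= 1:                         # keep mixture weights valid
        return np.inf
    return -np.sum(np.log(gig_pmf(x, p, pi0, pi2)))

res = minimize(negloglik, x0=[0.5, 0.1, 0.1],
               bounds=[(1e-4, 1 - 1e-4)] * 3, method="L-BFGS-B")
p_hat, pi0_hat, pi2_hat = res.x
```

With a sample this size the numerical MLE typically lands close to the generating values, which is the behavior the simulation comparison in the talk examines across estimator types.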
||
TI_42_1 |
Mandal,
Saumen |
University of Manitoba |
Title |
Constrained
optimal designs for estimating probabilities in contingency tables |
|
Construction
of optimizing probability distributions plays an important role in many areas
of statistical research. One example is estimation of cell probabilities in
contingency tables. It is well known that the unconstrained maximum
likelihood estimation of the cell probabilities is quite straightforward.
However, the presence of constraints on the probabilities makes the problem
quite challenging. For example, the constraints could be based on a
hypothesis of marginal homogeneity. In this work, we attempt to solve the
constrained maximum likelihood problem using optimal design theory, Lagrangian theory and simultaneous optimization
techniques. This is an optimization problem with respect to variables that
satisfy several constraints. We first formulate the Lagrangian function
with the constraints, and then transform the problem to that of maximizing a
number of functions of the cell probabilities simultaneously. These functions
have a common maximum of zero that is simultaneously attained at the optimum.
We then apply the methodology to some real data sets. Finally, we discuss
how our approach is flexible and provides a unified framework for various
types of constrained optimization problems. |
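As a small numerical companion (our own construction, using a generic constrained optimizer rather than the simultaneous-optimization machinery described above), the following fits the cell probabilities of a hypothetical 3x3 table by maximum likelihood under marginal homogeneity:

```python
# Sketch: constrained ML estimation of cell probabilities under the
# hypothesis of marginal homogeneity, via SLSQP with equality constraints.
import numpy as np
from scipy.optimize import minimize

n = np.array([[30., 10.,  5.],
              [12., 40.,  8.],
              [ 6., 14., 25.]])               # hypothetical observed counts

def negloglik(p):
    """Negative multinomial log-likelihood in the 9 cell probabilities."""
    return -np.sum(n.ravel() * np.log(p))

cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]
for i in range(2):                            # third margin constraint is implied
    cons.append({"type": "eq",
                 "fun": lambda p, i=i: p.reshape(3, 3)[i].sum()
                                      - p.reshape(3, 3)[:, i].sum()})

p0 = np.full(9, 1.0 / 9.0)
res = minimize(negloglik, p0, constraints=cons,
               bounds=[(1e-6, 1.0)] * 9, method="SLSQP")
p_hat = res.x.reshape(3, 3)
```

Unlike the unconstrained MLE (cell counts over the total), the constrained solution forces each row margin to equal the matching column margin.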
||
TI_23_0 |
Marques,
Filipe |
Universidade
NOVA de Lisboa, Portugal |
Title |
Advances in
distribution theory and statistical methodologies |
|
|
||
TI_24_0 |
McKean,
Joseph |
Western Michigan University |
Title |
Big Data: Algorithms,
Methodology, and Applications |
|
Statisticians
and data scientists must face the challenges of Big Data. In these
talks, new algorithms and procedures (robust and traditional) are
discussed to handle these challenges. Algorithm optimization in
terms of error distributions is discussed. Application areas
covered include astronomical data, network analysis, and numerical
integration. |
||
TI_10_1 |
Mdziniso,
Nonhle Channon |
Bloomsburg University of
Pennsylvania |
Title |
Odd Pareto
families of distributions for modeling loss payment data |
|
A
three-parameter Odd Pareto (OP) distribution is presented with density
function having a flexible upper tail in modeling loss payment data. The OP
distribution is derived by considering the distributions of the odds of the
Pareto and inverse Pareto distributions. Basic properties of the OP
distribution are studied. Simulation studies based on the maximum likelihood
method are conducted to compare the OP with other Pareto-type distributions.
provided to illustrate the upper-tail flexibility of the distribution.
provided to illustrate the upper-tail flexibility of the distribution.
Extensions of the Odd Pareto distribution are also considered to improve the
fitting of data. |
||
TI_46_3 |
Melnykov,
Volodymyr |
The
University of Alabama |
Title |
On
Model-Based Clustering of Time-Dependent Categorical Sequences |
|
Clustering
categorical sequences is an important problem that arises in many fields such
as medicine, sociology, and economics. It is a challenging task because
techniques for clustering categorical data are scarce:
the majority of traditional clustering procedures are designed for handling
quantitative observations. Situations with categorical data being related to
time are even more troublesome. We propose a mixture-based approach for
clustering categorical sequences and apply the developed methodology to a
real-life data set containing sequences of life events for respondents
participating in the British Household Panel Survey. |
||
TI_25_4 |
Melnykov,
Igor |
Colorado State University |
Title |
Positive and
negative equivalence constraints in the semi-supervised K-means algorithm |
|
The K-means algorithm
is a widely used clustering procedure thanks to its intuitive design and
computational simplicity. The objective function of the algorithm has a clear
interpretation when the algorithm is applied as an unsupervised method. In a
semi-supervised setting, when certain restrictions are imposed on the
solution, modifications of the objective function are necessary. We consider
two classes of equivalence constraints that may influence the proposed
clustering solution. We propose a method that makes both kinds of restrictions
part of the fabric of the algorithm and provide the necessary modifications
of its objective function. |
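A minimal sketch of the idea (a greedy COP-KMeans-style assignment step of our own devising, not the authors' modified objective function): must-link and cannot-link pairs are respected while each point joins the nearest feasible cluster.

```python
# Sketch: K-means with positive (must-link) and negative (cannot-link)
# equivalence constraints enforced during the assignment step.
import numpy as np

def feasible(i, c, labels, must_link, cannot_link):
    """Can point i join cluster c without violating any constraint?"""
    for a, b in must_link:
        j = b if a == i else (a if b == i else None)
        if j is not None and labels[j] not in (-1, c):
            return False
    for a, b in cannot_link:
        j = b if a == i else (a if b == i else None)
        if j is not None and labels[j] == c:
            return False
    return True

def constrained_kmeans(X, k, must_link, cannot_link, iters=30, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(iters):
        labels[:] = -1
        for i in rng.permutation(len(X)):          # assignment step
            d = np.sum((centers - X[i]) ** 2, axis=1)
            for c in np.argsort(d):                # nearest feasible cluster
                if feasible(i, c, labels, must_link, cannot_link):
                    labels[i] = c
                    break
        for c in range(k):                         # update step
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(4, 0.5, (20, 2))])
ml = [(0, 1)]                  # positive constraint: points 0 and 1 together
cl = [(0, 20)]                 # negative constraint: points 0 and 20 apart
labels, centers = constrained_kmeans(X, 2, ml, cl)
```

The constraints are enforced as hard restrictions here; the talk's contribution is instead to fold both kinds of restrictions into the objective function itself.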
||
TI_25_0 |
Melnykov,
Volodymyr |
The University of Alabama |
Title |
New
developments in finite mixture modeling with applications |
|
Finite
mixtures present a flexible tool for modeling heterogeneity in data.
Model-based cluster analysis is the most famous application of mixture
models. The session covers novel methodological developments in this area and
considers various applications. |
||
TI_25_1 |
Melnykov,
Yana |
The University of Alabama |
Title |
On finite
mixture modeling of processes with change points |
|
We consider
a novel framework for modeling heterogeneous processes with change points.
The proposed finite mixture model can effectively take into
account the potential presence of change points. Conducted simulation
studies show that the model can correctly assess the mixture order as well as
the location of change points within mixture components. The application to
real-life data yields promising results. |
||
TI_25_2 |
Michael, Semhar |
South Dakota State University |
Title |
Finite
mixture of regression models for data from complex survey design |
|
We explored
the use of finite mixture regression models when the samples were drawn using
a stratified sampling design. We developed a new design-based inference where
we integrated sampling weights in the complete-data log-likelihood function.
The expectation-maximization algorithm was derived accordingly. A simulation
study was conducted to compare the proposed method with the finite mixture of
a regression model. The comparison was done using bias-variance components of
mean square error with interesting results. Additionally, a simulation study
was conducted to assess the ability of the Bayesian information criterion to
select the optimal number of components under the proposed modeling approach. |
||
TI_34_3 |
Mohsenipour,
Akbar |
Vivametrica |
Title |
Approximating
the distribution
of various types of quadratic expressions on
the basis of their moments |
|
Several
moment-based approximations to the distribution of various types
of quadratic forms and expressions, including those in singular
Gaussian and in elliptically contoured
random vectors are proposed. In the normal
case, the moments are obtained recursively from the
cumulants and the distribution of positive definite quadratic
forms is approximated by means of two- and three-parameter gamma-type
distributions. Approximations to the density functions of Hermitian
quadratic forms in normal vectors and quadratic forms in order
statistics from a uniform population are provided as well. |
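The simplest instance of such an approximation in the normal case is the familiar two-parameter gamma match on the first two cumulants; the sketch below, with an arbitrary matrix of our own choosing, compares it against Monte Carlo:

```python
# Sketch: two-parameter gamma (moment-matching) approximation to the
# distribution of the quadratic form Q = x'Ax with x ~ N(0, I).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
A = np.diag([3.0, 2.0, 1.0, 0.5])             # a positive definite matrix

k1 = np.trace(A)                              # first cumulant of Q: tr(A)
k2 = 2 * np.trace(A @ A)                      # second cumulant: 2 tr(A^2)
shape, scale = k1**2 / k2, k2 / k1            # match gamma mean and variance

x = rng.standard_normal((100_000, 4))
q = np.einsum('ni,ij,nj->n', x, A, x)         # Monte Carlo sample of Q

emp = np.mean(q <= 8.0)                       # empirical CDF at 8
approx = stats.gamma.cdf(8.0, a=shape, scale=scale)
```

The three-parameter and higher-moment refinements mentioned in the abstract tighten this basic match, especially in the tails.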
||
TI_27_0 |
Muthukumarana,
Saman |
University of Manitoba |
Title |
Bayesian
Methods with Applications |
|
This session
will highlight the use of Bayesian modelling and inferential methods
in discovering genetic associations with diseases, image analysis,
and the study of animal populations and sports. Bayesian regression
tree models, latent ancestral tree models, semi-parametric Bayesian
methods using the Dirichlet process, and Bayesian models
for photographic identification in animal populations are
discussed. |
||
TI_27_4 |
Muthukumarana,
Saman |
University of Manitoba |
Title |
Model Based
Estimation of Baseball Batting Metrics |
|
We consider
the modeling of batting outcomes of baseball batters using a weighted
likelihood approach and a semi-parametric Bayesian approach. The weighted
likelihood allows the other batters to contribute to the inference so that
the relevant information they contain is not lost and the weights are
determined based on their dissimilarities with the target batter. Minimum
Averaged Mean Squared Error (MAMSE) weights are used as the likelihood
weights. We then propose a semi-parametric Bayesian approach based
on the Dirichlet process that enables borrowing information across
batters. We demonstrate and compare these approaches using 2018 Major League
Baseball data. |
||
TI_28_0 |
Nayak, Tapan |
George Washington University |
Title |
Protection
of Respondents' Privacy and Data Confidentiality |
|
Protecting respondents’
privacy and data confidentiality has become a very important topic in
recent years. This session is devoted to discussing recent developments in
this area. |
||
TI_28_4 |
Nayak, Tapan |
George Washington University |
Title |
Discussion |
|
I shall
present some concluding remarks on protecting respondents’ privacy and data
confidentiality. |
||
TI_22_1 |
Ng, Hon
Keung Tony |
Southern Methodist University |
Title |
Improved
Techniques for Parametric and Nonparametric Evaluations of the First-Passage Time
of Degradation Processes |
|
Determining
the first-passage time (FPT) distribution is an important topic in
reliability analysis based on degradation data because the FPT distribution
provides some valuable information on the reliability characteristics. In
this paper, we propose some improved techniques based on saddlepoint
approximation to improve upon some existing methods to approximate the FPT
distribution of degradation processes. Numerical examples and Monte Carlo
simulation studies are used to illustrate the advantages of the proposed
techniques. The limitations related to the improved techniques are
discussed and some possible solutions to these limitations are proposed.
Concluding remarks and practical recommendations are provided based on the
results. |
||
TI_32_0 |
Ng, Hon
Keung Tony |
Southern Methodist University |
Title |
Statistical
Models and Methods for Analysis of Reliability and Survival Data |
|
This session
focuses on statistical methodologies for analyzing different kinds
of reliability and survival data in industrial and
medical studies. These methods are important to reliability
engineers and medical researchers because they make the extraction
of lifetime characteristics possible through suitable statistical analysis and lead
to better decision making. |
||
TI_4_4 |
Nguyen, Yet |
Old Dominion University |
Title |
A
Histogram-Based Method for False Discovery Rate Control in Two Independent
Experiments |
|
In this
talk, we present a new method to estimate and control false discovery rate
(FDR) when identifying simultaneous signals in two independent experiments.
In one experiment, thousands or millions of features are tested for
significance with respect to some factor of interest. In a second experiment,
the same features are also tested for significance. Researchers are
interested in identifying simultaneous signals, i.e., features that are
significant in both experiments. We develop an FDR estimation and control
procedure that is a generalization of the histogram-based FDR estimation and
control procedure for one experiment. Asymptotic results and simulation
studies are presented to investigate the performance of the proposed method and other
existing methods. |
||
TI_34_2 |
Nkurunziza, Sévérien |
University of Windsor |
Title |
Some
identities for the risk and bias of shrinkage-type estimators in elliptically
contoured distributions |
|
We consider
an estimation problem regarding the mean of a random matrix whose
distribution is elliptically contoured. In particular, we study the
properties of a class of multidimensional shrinkage-type estimators in the
context where the variance-covariance matrix of the shrinking
random component is the sum of two Kronecker products. We present
some identities for computing some mixed moments as well as two general
formulas for the bias and risk functions of shrinkage-type estimators. As a
by-product, we generalize some identities established in Gaussian sample
cases for which the
shrinking random component is represented by a single
Kronecker product. |
||
TI_36_2 |
Nolan, John |
American University |
Title |
Multivariate
Generalized Logistic Laws |
|
Multivariate Fréchet laws
are a class of extreme value distributions that exhibit heavy tails and
directional dependence controlled by an angular measure. Multivariate
generalized logistic laws are a recently described sub-class that is dense
in a certain sense. It is shown that these laws are related
to positive multivariate sum stable laws, which gives a way to simulate from
these laws. The corresponding angular measure density is described, and
expressions for the density of
the distribution are given. |
||
TI_13_4 |
Olufadi, Yunusa |
University of Memphis |
Title |
EM Bayesian
variable selection for clustered discrete and continuous outcomes |
|
Feature
selection for Gaussian and non-Gaussian linear models is common in the
literature. However, to our knowledge, there is little work on clustered
discrete and continuous outcomes that are high-dimensional. Mixed-outcome
data of this kind are becoming increasingly common in developmental toxicity
(DT) studies and several other settings. In a toxico-epigenomics
study, for example, the interest might be in extracting biomarkers of DT or
detecting new ones. We develop a Bayesian hierarchical modeling procedure
to guide both the estimation and the efficient extraction of the most useful
features. |
||
TI_30_0 |
Oluyede,
Broderick |
Georgia Southern University |
Title |
Copulas, Informational
Energy, Exponential Dominance and Uncertainty for Generalized and
Multivariate Distributions |
|
Copulas, exponential
dominance and uncertainty for generalized distributions are explored and
comparisons via informational energy functional and differential
entropy are presented in this session. More importantly, the first talk
deals with stochastic dominance and bounds for cross-discrimination and
uncertainty measures for weighted reliability functions. In
the second talk, new generalized
distributions are developed. In the third talk, a change-point
model for highly correlated multivariate data via copula
conditional distributions, principal component analysis (PCA), and functional
PCA is presented. Finally, the last presentation deals
with a class of stochastic SEIRS epidemic dynamic models. |
||
TI_30_1 |
Oluyede,
Broderick |
Georgia Southern University |
Title |
Informational
Energy, Stochastic Inequalities and Bounds for Weighted Weibull-Type
Distributions. |
|
In this
talk, generalized distributions that are weighted distributions are
presented. Inequalities and dominance, uncertainty and
informational measures for weighted and parent generalized
Weibull-type distributions are developed. Comparisons of the weighted
and parent generalized Weibull-type distributions via
informational energy function and the differential entropy are
presented. Moment-type and stochastic inequalities as well as bounds
for cross-discrimination and uncertainty measures in weighted
and parent life distribution functions and related reliability
measures are given. |
||
TI_31_0 |
Omolo,
Bernard |
University of South Carolina – Upstate |
Title |
Statistical Methods for High‐Dimensional Data Analysis: Application to Genomics |
|
|
||
TI_31_1 |
Omolo,
Bernard |
University of South Carolina – Upstate |
Title |
A
Model-based Approach to Genetic Association Testing in Malaria Studies |
|
In this
study, we propose a two-step approach to genetic association testing in
malaria studies in a GWAS setting that may enhance the power of the tests, by
identifying the underlying genetic model first before applying the
association tests. This is performed through tests of significance of a given
genetic effect, noting the minimum p-values across all the models and the
proportion of tests that a given genetic model was deemed the best, using simulated
data. In addition, we fit generalized linear models for the genetic effects,
using case-control genotype data from Kenya, Gambia and Malawi, available
from MalariaGEN®. |
||
TI_1_2 |
Oraby,
Tamer |
University of Texas - Rio Grande Valley |
Title |
Modeling
Progression of Co-Morbidity Using Bivariate Markov Chains |
|
In
this work, we use a bivariate Markov chain (MC) to model the progression of two
diseases or morbidities, such as obesity and diabetes, and the correlation
between the two processes. We postulate that the MC has transition rates that
depend on a set of covariates, such as age and
gender, as well as treatment. The data include individuals who are dependent
due to familial relationships. We will present the estimation of the model’s
parameters and discuss its goodness of fit. |
||
TI_18_3 |
Otunuga, Olusegun |
Marshall University |
Title |
Closed form
probability distribution of number of infections at a given time in a
stochastic SIS epidemic model |
|
We study the
effect of external fluctuation in the transmission rate of certain diseases
and how this perturbation affects the distribution of the number of
infections over time. To do this, we introduce random noise in the
transmission rate in a deterministic SIS model and study how the number of
infections behaves over time. The closed form probability distribution of the
number of infections at a given time in the resulting stochastic SIS epidemic
model is derived. Using the Fokker-Planck equation, we reduce the differential
equation governing the number of infections to a generalized Laguerre
differential equation. The distribution is demonstrated using U.S. influenza
data. |
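To make the setup concrete, here is a minimal Euler-Maruyama simulation of an SIS model whose transmission rate is perturbed by white noise; the parameter values and the exact form of the diffusion term are our illustrative assumptions, not those derived in the talk.

```python
# Sketch: SIS model with a randomly perturbed transmission rate,
# dI = [beta*I*(N-I)/N - gamma*I] dt + sigma*I*(N-I)/N dW,
# simulated by the Euler-Maruyama scheme.
import numpy as np

rng = np.random.default_rng(4)
N, beta, gamma, sigma = 1000.0, 0.5, 0.25, 0.05
T, dt = 100.0, 0.01
steps = int(T / dt)

I = np.empty(steps + 1)
I[0] = 10.0                                    # initial number of infections
for t in range(steps):
    s = I[t] * (N - I[t]) / N                  # mass-action contact term
    drift = beta * s - gamma * I[t]
    dW = rng.normal(0.0, np.sqrt(dt))          # Brownian increment
    I[t + 1] = np.clip(I[t] + drift * dt + sigma * s * dW, 0.0, N)

endemic = N * (1 - gamma / beta)               # deterministic equilibrium
```

With beta > gamma the path fluctuates around the deterministic endemic level N(1 - gamma/beta); the talk derives the exact time-t distribution of such fluctuations via the Fokker-Planck equation.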
||
TI_23_1 |
Oyamakin S.
O. |
Universidade de
São Paulo |
Title |
Some New
Nonlinear Growth Models For Biological Processes
based on Hyperbolic Sine Function |
|
In
this paper, we propose maximum a posteriori (MAP) estimators
for the parameters of some survival distributions, which have a
simple closed-form expression. In particular, we focus on the Nakagami distribution, which plays an essential role
in communication engineering problems, particularly to model fading of radio
signals. Moreover, we show that the obtained results can be extended to
other survival probability distributions, such as the gamma and generalized
gamma ones. Numerical results reveal that the MAP estimators
outperform the existing estimators and produce almost unbiased
estimates even for small sample sizes. Our applications are driven by
embedded systems, which are commonly used in communication engineering.
Particularly, they can consist of an electronic system inside a
microcontroller, which can be programmed to maintain communication between a
transmitting antenna and mobile antennas, which are operating at the same
frequency. In this context, from the statistical point of view,
closed-form estimators are needed, since they are embedded in mobile devices
and need to be sequentially recalculated in real time. |
||
TI_6_4 |
Ozdemir, Senay |
Afyon Kocatepe University |
Title |
Combining
Heavy-Tailed Distributions and Empirical
Likelihood method for Linear Regression Model |
|
The empirical
likelihood (EL) estimation method proposed by Owen (1991)
is one of the nonparametric methods for estimating the
parameters of a linear regression model. In the EL method, an
EL function is maximized under constraints formed using the likelihood
scores under normally distributed errors. In this paper, an
alternative EL estimator for the parameter
vector of a linear regression model is proposed, using the score functions of
some popular heavy-tailed distributions as the constraints
in the EL estimation method. Our numerical
studies show that, when the data are heavy-tailed, the performance
of the proposed EL estimator is remarkably superior to that of the EL estimator
obtained under normally distributed error terms. |
||
TI_32_4 |
Pal, Suvra |
University of Texas at Arlington |
Title |
A New
Estimation Algorithm for a Flexible Cure Rate Model |
|
In this
talk, I will first present a flexible cure rate model that contains the
mixture cure rate model and promotion time cure rate model as special cases.
For the estimation of the model parameters, I will present the results of the
well-known EM algorithm and then discuss some of the issues associated with
the EM algorithm. To circumvent these issues, I will present a new
optimization procedure based on non-linear conjugate gradient (NCG)
algorithm. Through a simulation study, I will show the advantages of NCG
algorithm over the EM algorithm. |
||
TI_41_2 |
Pal, Subhadip |
University of Louisville |
Title |
A Bayesian
Framework for Modeling Data on the Stiefel Manifold |
|
Directional
data emerges in a wide array of applications, ranging from atmospheric
sciences to medical imaging. Modeling such data, however, poses unique
challenges by virtue of their being constrained to non-Euclidean spaces like
manifolds. Here, we present a Bayesian framework for inference on
the Stiefel manifold using the
Matrix Langevin distribution. Specifically, we propose a novel
family of conjugate priors and establish a number of theoretical properties
relevant to statistical inference. Conjugacy enables translation of
these properties to their corresponding posteriors, which we exploit to
develop the posterior inference scheme. For the implementation of the
posterior computation, including the posterior sampling, we adopt a novel
computational procedure for evaluating the hypergeometric function of matrix
arguments that appears as normalization constants in the relevant
densities. |
||
TI_18_2 |
Panorska,
Anna K. |
University of Nevada, Reno |
Title |
Discrete
Pareto Distributions, Butterfly Diet Breadth, and Climate Change |
|
We propose a
new discrete distribution with finite support, which generalizes truncated
Pareto and beta distributions as well as uniform and Benford’s
laws. We present its fundamental properties and consider
parameter estimation. We include an illustration of the applications of
this new stochastic model in ecology. |
||
TI_37_3 |
Pararai,
Mavis |
Indiana University of Pennsylvania |
Title |
The Weibull
Linear Failure Rate Distribution and Its Applications |
|
A new distribution
called the Weibull Linear Failure Rate distribution is introduced and its
properties are explored. The properties of this new distribution and its
sub-models will be discussed, along with maximum likelihood
estimation of the parameters. A simulation study
examining the bias and mean square error of the maximum likelihood
estimator of each parameter is presented. Finally, an application of the
model to a real data set is presented to illustrate how useful the model
is. |
||
TI_38_0 |
Peng, Hanxiang |
Binghamton University |
Title |
Empirical
Likelihood |
|
The session
addresses topics centered around the empirical likelihood approach. |
||
TI_38_1 |
Peng, Hanxiang |
Indiana University-Purdue University
Indianapolis |
Title |
Maximum
empirical likelihood estimation in U-statistics based general estimating
equations. |
|
In this
talk, we discuss maximum empirical likelihood estimates (MELE's) in
U-statistics based general estimating equations. Our approach is the jackknife
empirical likelihood (JEL). We derive the estimating equations for
MELE's and establish their asymptotic normality. We provide a class of
MELE's which have less computational burden than the usual MELE's and
can be implemented using existing software. We show that the MELE's are
efficient. We present several examples for constructing efficient
estimates for moment based distribution characteristics
in the presence of side information. In the end, we report some simulation
results. |
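A compact sketch of the JEL mechanics (our own minimal implementation, with a bisection solve for the Lagrange multiplier, not code from the talk) for a single U-statistic, the sample variance with kernel h(x, y) = (x - y)^2 / 2:

```python
# Sketch: jackknife empirical likelihood (JEL) for a U-statistic.
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 80)
n = len(x)

def ustat(z):
    """U-statistic with kernel h(a, b) = (a - b)^2 / 2 (the sample variance)."""
    m = len(z)
    diffs = z[:, None] - z[None, :]
    return np.sum(diffs ** 2 / 2) / (m * (m - 1))

U = ustat(x)
# Jackknife pseudo-values: V_i = n*U_n - (n-1)*U_{n-1}^(-i); their mean is U_n.
V = np.array([n * U - (n - 1) * ustat(np.delete(x, i)) for i in range(n)])

def jel_stat(theta):
    """-2 log jackknife EL ratio for E[V] = theta (asymptotically chi^2_1)."""
    d = V - theta
    if d.min() >= 0 or d.max() <= 0:
        return np.inf                      # theta outside the convex hull
    lo, hi = -1 / d.max() + 1e-10, -1 / d.min() - 1e-10
    for _ in range(200):                   # bisection: the EL score decreases in lam
        lam = (lo + hi) / 2
        if np.sum(d / (1 + lam * d)) > 0:
            lo = lam
        else:
            hi = lam
    return 2 * np.sum(np.log1p(lam * d))
```

Treating the pseudo-values as approximately independent turns the U-statistic problem into an ordinary EL problem, which is what keeps the computational burden modest.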
||
TI_13_2 |
Peng, Hanxiang |
Indiana University-Purdue University
Indianapolis |
Title |
An Empirical
Likelihood Approach to Testing Multivariate Symmetries |
|
We propose
several empirical likelihood tests for testing spherical symmetry, rotational
symmetry, antipodal symmetry, coordinate-wise symmetry, and exchangeability.
We construct the tests by exploiting the characterizations of these
symmetries. The jackknife empirical likelihood for vector U-statistics is
employed to incorporate side information. We show that the tests are
distribution free and asymptotically chi-square distributed. We report
some simulation results about the numerical performance of the tests. |
||
TI_26_0 |
Peng, Jianan |
Acadia University |
Title |
Generalized
and Fiducial Inference with Applications |
|
Generalized
inference, introduced by Weerahandi, has many
applications. Fiducial inference, initiated by Fisher, is
enjoying a revival, due largely to Hannig and
other researchers. In this session we have two talks (including the one
by Weerahandi) on generalized inference and two talks on
(generalized) fiducial inference. |
||
TI_26_4 |
Peng, Jianan |
Acadia University |
Title |
Successive Comparisons
for One-way Layout under Heteroscedasticity |
|
Suppose that
k (k>2) treatments in a one-way layout are ordered
in a certain way. For example, the treatments may be increasing dose levels
of a drug in dose response studies. The experimenters may be interested
in the successive comparisons of the treatments. In this talk, we
consider the simultaneous confidence intervals for the successive comparisons
under heteroscedasticity. We propose several methods, including the maxT method, the minP method,
and the generalized fiducial confidence intervals, among
others. We show that the generalized fiducial confidence
intervals have correct coverage probability asymptotically. A
simulation study and a real data example are given to illustrate the proposed
procedures. |
||
TI_11_1 |
Peng,
Stephen |
Georgetown University |
Title |
A Flexible
Univariate Autoregressive Time-Series Model for Dispersed Count Data |
|
Integer-valued
time series data have an ever-increasing presence in various applications and
need to be analyzed properly. While a Poisson autoregressive (PAR) model
would seem like a natural choice to model such data, it is constrained by
the equi-dispersion assumption. Hence, data
that are over- or under-dispersed are improperly modeled, resulting in
biased estimates and inaccurate forecasts. This work (coauthored by Stephen
Peng and Ali Arab) instead develops a flexible integer-valued autoregressive
(INAR) model for count data that contain over- or under-dispersion. Using the
Conway-Maxwell-Poisson (COM-Poisson or CMP) distribution and related
distributions as motivation, we develop a first-order
sum-of-Conway-Maxwell-Poisson autoregressive (SCMPAR(1)) model that will
instead offer a generalizable construct that captures the PAR, negative
binomial AR (NBAR), and binomial AR (BAR) models respectively as special
cases, and serve as an overarching representation connecting these three
special cases through the dispersion parameter. We illustrate the SCMPAR
model's flexibility through simulated and real data examples. |
||
TI_17_2 |
Phoa,
Frederick |
Academia Sinica |
Title |
A systematic
construction of cost-efficient designs for order-of-addition experiments |
|
An
order-of-addition (OofA) experiment aims at
investigating how the order of factor inputs affects the experimental
response, which is of great interest in clinical trials and industrial
processes. Recent studies on the OofA designs
focused on their properties of algebraic optimality rather than
cost-efficiency. In this talk, we propose a systematic construction of
cost-efficient designs for OofA experiments,
in which each pair of level settings from two different factors appears exactly
once. Furthermore, unlike recent studies on OofA
experiments, our designs can handle experimental factors with more than one
level. Note that the use of a placebo or the choice of different doses reveals
the practicality of our designs in clinical trials, for example. |
||
TI_33_0 |
Pigeon,
Mathieu |
Université du
Québec à Montréal (UQAM), Canada |
Title |
Recent
developments in predictive distribution modelling with applications in
insurance |
|
|
||
TI_23_3 |
Piperigou,
Violetta |
University of Patras, Greece |
Title |
Maximum
Likelihood Estimators for a Class of Bivariate Discrete Distributions |
|
The
method of maximum likelihood (ML) yields estimators which, asymptotically,
are normally distributed, unbiased and with minimum variance. In this method,
computational difficulties are encountered when families of univariate
discrete distributions are considered such as convolutions and compound
distributions. For these types of distributions the
probabilities are given through recurrence relations and consequently the ML
estimators require iterative procedures to be obtained. It has been shown
that in a large class of univariate discrete distributions, the ML equations
can be reduced by one, which is replaced by the first equation of the method
of moments. As examples of two-parameter distributions, the Charlier and the Neyman are
presented, where only a single equation need be solved iteratively to derive
the estimators. The parameterization used, when working with these
distributions, often leads to extremely high correlations of the ML
estimators. A reparameterization that reduces or eliminates such correlation
is desirable. If the MLE's are asymptotically uncorrelated, the parameterization
is orthogonal. Such a reparameterization is discussed for a class of
discrete distributions, where one of the orthogonal parameters is the mean.
This class includes, among others, Delaporte and Hermite univariate distributions. These
results are extended to a class of bivariate discrete distributions and the
properties of MLE's are given. The case of a three-parameter bivariate
Poisson is extensively discussed and some examples of
applications are given. |
||
TI_47_4 |
Pokhrel,
Keshav P. |
University of Michigan-Dearborn |
Title |
Reliability
Models Using the Composite Generalizers of Weibull Distribution |
|
In this
article, we study the composite generalizers of Weibull distribution
using exponentiated, Kumaraswamy, transmuted and beta
distributions. The composite generalizers are constructed using
both forward and reverse order of each of these distributions. The
usefulness and effectiveness of the composite generalizers and their order of
composition is investigated by studying the reliability behavior of the
resulting distributions. Two sets of real-world data are analyzed using
the proposed generalized Weibull distributions. |
||
TI_27_2 |
Pratola,
Matthew |
The Ohio State University |
Title |
Adaptive
Splitting Bayesian Regression Tree Models for Image Analysis |
|
Bayesian
regression tree models are competitive with leading machine learning
algorithms yet retain the ability to capture uncertainties, making them
incredibly useful for many modern statistical applications where one requires
more than point prediction. However, a key limitation is the variable
split rules, which are determined using static candidates. This limits
the ability of the model to capture local sources of variation,
and increasing the number of candidates is computationally
burdensome. We introduce a novel adaptive strategy that replaces static
splits with a dynamic grid that allows the tree bases to adapt, thereby more
efficiently capturing patterns of local variation. Combining this with a clever
dimension-reduction prior enables low-dimensional tree representations of
processes. We demonstrate these advances on an image analysis study
investigating beach visitor counts in San Diego. |
||
TI_34_0 |
Provost,
Serge |
The University of Western Ontario |
Title |
Recent Distributional Advances
Involving Population and Sample Moments |
|
This session
features novel advances in connection with the application
of certain moment-based
methodologies to data modeling, the approximation
of the distribution of quadratic
forms and the estimation of heavy-tailed distributions.
As well, a shrinkage-type estimator of the mean of an elliptically
contoured random vector is introduced. |
||
TI_34_1 |
Provost,
Serge |
The University of Western Ontario |
Title |
On recovering
sample points from their associated moments and certain
moment-based density estimation methodologies |
|
A theorem asserting that, given
the first n moments of a sample of size n, one can retrieve
the original n sample points, will be discussed. In
particular, this result entails that all the
information available in a sample of size n is
contained in its first n moments, which substantiates the
utilization of sample moments in statistical modeling and
inference. Clearly, only a limited number of these n moments are usable in
practice. Certain density
estimation methodologies relying on such sample
moments shall be presented. |
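The recovery result above can be sketched numerically. Assuming the moments are supplied as power sums p_k = sum_i x_i^k (the function name and test values below are illustrative), Newton's identities convert the power sums into elementary symmetric polynomials, and the sample points are the roots of the associated monic polynomial:

```python
import numpy as np

def recover_sample_from_moments(power_sums):
    """Recover n sample points from their first n power sums
    p_k = sum_i x_i^k, via Newton's identities and root finding."""
    n = len(power_sums)
    # Newton's identities: e_k = (1/k) * sum_{j=1}^{k} (-1)^(j-1) e_{k-j} p_j
    e = [1.0]  # e_0 = 1
    for k in range(1, n + 1):
        s = sum((-1) ** (j - 1) * e[k - j] * power_sums[j - 1]
                for j in range(1, k + 1))
        e.append(s / k)
    # The sample points are the roots of x^n - e_1 x^(n-1) + e_2 x^(n-2) - ...
    coeffs = [(-1) ** k * e[k] for k in range(n + 1)]
    return np.sort(np.roots(coeffs).real)

x = np.array([0.5, 1.0, 2.5])
p = [np.sum(x ** k) for k in (1, 2, 3)]
print(recover_sample_from_moments(p))  # recovers [0.5, 1.0, 2.5]
```

As the abstract notes, this exact inversion is numerically delicate for larger n, which is why only the first few sample moments are usable in practice.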
||
TI_35_0 |
Qingcong
Yuan (org: Qian, Lianfen) |
University of Kentucky |
Title |
Recent
Advances in Analyzing Medical Data and Dimension Reduction |
|
The purpose
of this invited session is to disseminate the most recent advances in analyzing
medical data and dimension reduction methods. Specific
interests include modeling semi-competing risks
data, imputation methods for missing data, and dimension
reduction. |
||
TI_17_3 |
Rha, Hyungmin |
Arizona State University |
Title |
A
probabilistic subset search (PSS) algorithm for optimizing functional data
sampling designs |
|
We study
optimal sampling times for functional data. Our main objective is to find the
best sampling schedule on the predictor time axis to precisely recover the
trajectory of predictor function and predict the scalar/functional response
through functional linear regression models. Three optimal designs are
considered: the schedule maximizing the precision of recovering predictor
function, the schedule best for predicting response, and the schedule
optimizing a user-defined mixture of the relative efficiencies of the two
objectives. We propose an algorithm that can efficiently generate nearly
optimal designs, and demonstrate that our approach
outperforms the previously proposed methods. |
||
TI_36_0 |
Richter,
Wolf-Dieter |
University of Rostock |
Title |
Multivariate
distributions |
|
Authors of this session discuss: a new methodology for evaluating probabilities and normalizing constants of probability distributions, in particular extreme value distributions that exhibit heavy tails and controlled directional dependence; the construction and application of models connected with sums and maxima of dependent Pareto components; and the stochastic representation, simulation, and dynamic geometric disintegration of (p_1,…,p_k)-spherical probability laws. |
||
TI_36_4 |
Richter,
Wolf-Dieter |
University of Rostock |
Title |
On (p_1,...,p_k)-spherical distributions |
|
The class of
(p_1, … , p_k)-spherical probability laws
and a method of simulating random vectors following such distributions
are introduced using a new stochastic vector representation. A
dynamic geometric disintegration method and a corresponding geometric measure
representation are used for generalizing the
classical Chi-square-, t- and F- distributions. Combining the principles
of specialization and marginalization gives rise to an effective method of
dependence modeling. |
||
TI_10_3 |
Samanthi,
Ranadeera |
Central Michigan University |
Title |
On bivariate distorted copulas |
|
In this
talk, we propose families of bivariate copulas based on the distortions of
existing copulas. The beta and Kumaraswamy cumulative distribution
functions are employed to construct the proposed distorted copulas.
With the additional two parameters in the distributions, the distorted
copulas permit more flexibility in the dependence behaviors. Two theorems
linking the original tail dependence behaviors and those of the distorted
copula are derived for distortions that are asymptotically proportional to
the power transformation in the lower tail and the dual-power transformation
in the upper tail. Simulation results and an application to financial risk
management are presented. |
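The distortion construction described above can be sketched as follows (an illustrative example, not the proposed families: a Clayton base copula with a Beta(a, 1) distortion, where a <= 1 makes the distortion concave):

```python
import numpy as np
from scipy import stats

def clayton(u, v, theta=2.0):
    # Clayton base copula C(u, v)
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def distorted_copula(u, v, a=0.5, b=1.0):
    # Distorted copula C_T(u, v) = T(C(T^{-1}(u), T^{-1}(v))),
    # with T the Beta(a, b) CDF; concavity of T (here a <= 1, b = 1)
    # is needed for C_T to remain a valid copula.
    T = lambda t: stats.beta.cdf(t, a, b)
    Tinv = lambda t: stats.beta.ppf(t, a, b)
    return T(clayton(Tinv(u), Tinv(v)))

u = np.linspace(0.05, 0.95, 10)
# Uniform margins are preserved: C_T(u, 1) = u.
print(np.max(np.abs(distorted_copula(u, np.ones_like(u)) - u)))
```

The two extra parameters of the distorting CDF control how mass is pushed toward the tails, which is what permits the additional flexibility in tail dependence.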
||
TI_45_4 |
Samanthi,
Ranadeera |
Central Michigan University |
Title |
Methods for
Generating Coherent Distortion Risk Measures |
|
In this
talk, we present methods for generating new distortion functions by utilizing
distribution functions and composite distribution functions. To ensure the
coherency of the corresponding distortion risk measures, the concavity of the
proposed distortion functions is established by restricting the parameter
space of the generating distribution. Closed-form expressions for risk
measures are derived for some cases. Numerical and graphical results are
presented to demonstrate the effects of parameter values on the risk measures
for exponential, Pareto and log-normal losses. In addition, we apply the
proposed distortion functions to derive risk measures for a segregated fund
guarantee. (This is a joint work with Jungsywan Sepanski, Central Michigan
University.) |
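As a worked example of a distortion risk measure (using the classical proportional-hazard distortion g(u) = u^gamma, not the new generators proposed in the talk), the measure rho(X) = integral of g(S(x)) over x >= 0 has the closed form theta/gamma for an exponential loss with mean theta:

```python
import numpy as np
from scipy import integrate

# Distortion risk measure rho(X) = int_0^inf g(S(x)) dx for a nonnegative
# loss X with survival function S; concave g yields a coherent measure.
def distortion_risk(survival, g, upper=200.0):
    val, _ = integrate.quad(lambda x: g(survival(x)), 0.0, upper)
    return val

theta = 2.0                               # exponential loss with mean theta
S = lambda x: np.exp(-x / theta)
g = lambda u: u ** 0.5                    # proportional-hazard distortion
print(distortion_risk(S, g), theta / 0.5) # numerical vs closed form theta/gamma
```

Restricting the parameter of the generating distribution (here, the exponent gamma <= 1) is exactly what keeps g concave and the resulting risk measure coherent.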
||
TI_12_4 |
Sanz-Alonzo,
Daniel |
University of Chicago |
Title |
Scalable
graph-based Bayesian semi-supervised learning |
|
The aim of
this talk is to present some new theoretical and methodological developments
concerning the graph-based, Bayesian approach to semi-supervised learning. I
will show suitable scalings of graph parameters that provably lead
to robust Bayesian solutions in the limit of a large number of unlabeled data points. The analysis relies on a careful choice of topology
and on the study of the spectrum of graph Laplacians. Besides
guaranteeing the consistency of graph-based methods, our theory explains the
robustness of discretized function space MCMC methods in semi-supervised
learning settings. |
||
TI_28_2 |
Sarathy, Rathindra |
Oklahoma State University |
Title |
Statistical
Basis for Data Privacy and Confidentiality |
|
Statistical disclosure
limitation methods are occasionally viewed as ad hoc methods,
providing no strong privacy or confidentiality guarantees. Although this
view is not accurate, it has been the primary motivation for recent standards such as
differential privacy and their associated methods. In this talk, we explore
the statistical basis for data confidentiality and methods that satisfy
privacy and confidentiality requirements. We discuss the concepts underlying
differential privacy to provide a comparison, as well as the potential
utility trade-offs under both these frameworks. |
||
TI_37_0 |
Sarhan, Ammar |
Dalhousie University |
Title |
Generalization
of lifetime distributions |
|
Generalization
of lifetime distribution is one of the important tools in lifetime analysis.
Most of the commonly used lifetime distributions have monotonic hazard rate
functions. In applications, many data sets show non-monotonic shapes of the
hazard rates. In this session, some of the generalizations of lifetime
distributions will be discussed. |
||
TI_37_1 |
Sarhan, Ammar |
Dalhousie University |
Title |
A
new extension of the two-parameter bathtub hazard shaped distribution |
|
This
article proposes a new generalization of the two-parameter bathtub shaped
lifetime distribution, named the odd generalized exponential two-parameter
bathtub shaped. Statistical properties of the proposed distribution are
discussed. The maximum likelihood and Bayesian procedures are used to
estimate the model parameters and some of its reliability measures. To
discuss the applicability of the proposed distribution, two real
data sets are analyzed using different sampling scenarios. A simulation
study is provided to investigate the properties of the methods applied. |
||
TI_25_3 |
Sarkar, Shuchismita |
Bowling Green State University |
Title |
Finite
mixture modeling and model-based clustering for directed weighted networks |
|
A novel
approach relying on the notion of mixture models is proposed for modeling and
clustering directed weighted networks. The developed methodology can be used
in a variety of settings including multilayer networks. Computational issues
associated with the developed procedure are effectively addressed by the use
of MCMC techniques. The utility of the methodology is illustrated on the set
of experiments as well as applications to real-life data containing export
trade amounts for European countries. |
||
TI_24_1 |
Schafer,
Chad |
Carnegie Mellon University |
Title |
Astrostatistics in
the Era of LSST |
|
The Large
Synoptic Survey Telescope (LSST) will yield 15 Terabytes of data each evening
over a ten-year period, revolutionizing
our understanding of the Universe. In this talk I will describe
some of the opportunities, focusing on the recurring challenges
when working with high-dimensional and noisy astronomical data. In
their raw form, these data are difficult to model, and assumptions that
may have been reasonable at small sample sizes could be revealed to
be inadequate by LSST-scale data. Such inference challenges provide
statisticians with opportunities to both contribute to science, and
to advance statistical methodology. |
||
TI_18_4 |
Schissler,
A. Grant |
University of Nevada |
Title |
On
Simulating Ultra High-Dimensional Multivariate Discrete Data |
|
It is
critical to conduct realistic Monte Carlo studies, which is problematic when data
are inherently multivariate and high dimensional. This situation appears
frequently in high-throughput biomedical experiments (e.g., RNA-sequencing).
Researchers, however, often resort to simulation designs that posit
independence --- greatly diminishing insights into the empirical operating
characteristics of any proposed methodology. To address this gap, we propose a
procedure to simulate high-dimensional multivariate discrete distributions
and study its performance. We apply our method to simulate RNA-sequencing
data sets (dimension > 20,000) with negative
binomial marginals. |
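The abstract does not spell out the simulation procedure; one common approach to correlated counts with negative binomial marginals is a Gaussian-copula (NORTA-style) construction, sketched here at low dimension (all parameter values are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_nb_gaussian_copula(n, corr, size, prob):
    """Draw n samples of a correlated count vector with negative
    binomial marginals via a Gaussian copula (NORTA-style)."""
    d = corr.shape[0]
    z = rng.multivariate_normal(np.zeros(d), corr, size=n)  # correlated normals
    u = stats.norm.cdf(z)                                   # uniform marginals
    return stats.nbinom.ppf(u, size, prob).astype(int)      # NB marginals

d = 4
corr = 0.6 * np.ones((d, d)) + 0.4 * np.eye(d)  # exchangeable correlation
counts = simulate_nb_gaussian_copula(2000, corr, size=5.0, prob=0.3)
print(counts.mean(axis=0))  # each column mean near 5 * 0.7 / 0.3
```

Scaling this idea to dimension above 20,000, as in the RNA-sequencing application, is precisely where the computational and correlation-matching challenges studied in the talk arise.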
||
TI_5_2 |
Schmegner,
Claudia |
DePaul University |
Title |
TX Family
and Horseshoe Priors |
|
Consider the
problem of estimating the vector of normal means θ= (θ1,...,θn) in the ultra-sparse
normal means model (yi|θi)∼N(θi,1) for i= 1,...,n. Horseshoe
priors are very effective at handling cases in which many components of θ are
exactly or approximately 0. The name “horseshoe” does not describe the shape
of the density of θi, but rather the shape of
the implied prior for the shrinkage coefficient associated with θi. We use the TX technique for generating
distributions to propose new classes of Horseshoe priors, investigate their
properties and compare their performances to those of the usual ones. |
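The "horseshoe" shape mentioned above can be checked numerically: with a standard half-Cauchy local scale λ_i (unit global scale assumed here), the implied prior on the shrinkage coefficient κ_i = 1/(1 + λ_i²) is Beta(1/2, 1/2), which is U-shaped like a horseshoe:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Horseshoe: lambda_i ~ half-Cauchy(0, 1); shrinkage coefficient
# kappa_i = 1 / (1 + lambda_i^2), with kappa near 0 meaning no shrinkage
# and kappa near 1 meaning total shrinkage to zero.
lam = np.abs(rng.standard_cauchy(100_000))
kappa = 1.0 / (1.0 + lam ** 2)

# The implied density of kappa is Beta(1/2, 1/2): U-shaped, piling mass
# at both endpoints, which is the "horseshoe" of the name.
hist, edges = np.histogram(kappa, bins=20, range=(0, 1), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
beta_pdf = stats.beta.pdf(mids, 0.5, 0.5)
print(np.max(np.abs(hist - beta_pdf)[1:-1]))  # interior bins match closely
```

A TX-generated prior replaces the half-Cauchy with another transformed distribution, changing the shape of this implied density on κ.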
||
TI_8_3 |
Sen, Ananda |
University of Michigan, Ann Arbor |
Title |
Honey I
Shrunk the Intercept |
|
In
logistic regression, separation occurs when a linear combination of
predictors perfectly discriminates the binary outcome. This is the premise of
the current discourse. Because finite valued maximum likelihood parameter
estimates do not exist under separation, Bayesian regressions with
informative shrinkage of the regression coefficients offer a suitable
alternative. Classical studies of separation imply that efficiency in
estimating regression coefficients may also depend upon the choice of
intercept prior, yet relatively little focus has been given on whether and
how to shrink the intercept parameter. Alternative prior distributions for
the intercept are proposed that down-weight implausibly extreme regions of
the parameter space, yielding regression estimates that are less sensitive to
separation. Through extensive simulation, differences across priors are
assessed using statistics measuring the degree of separation. Relative to
diffuse priors, the proposed priors generally yield more efficient estimation
of the regression coefficients themselves when the data are separated or
nearly so. Moreover, they are equally efficient in non-separated datasets,
making them suitable for default use. These numerical studies also highlight
the interplay between priors for the intercept and the regression
coefficients. Finally, the methodology is illustrated through implementation
on a couple of datasets in the biomedical context. |
||
TI_44_3 |
Shahzad,
Mirza Naveed |
University of Gujrat |
Title |
Singh-Maddala Distribution: A new candidate to analyze the
extreme value data by linear moment estimation |
|
Modeling,
accurate inference, and prediction of extreme events by probabilistic models
are very important in every field for minimizing the damage due to
extremes. To this end, the Singh-Maddala
distribution is considered in this article as a new candidate for the
analysis of extreme events. Extreme value datasets are frequently
heavy-tailed; for such datasets, the methods of L-moments
and TL-moments are proposed to estimate the parameters of the
distribution. The results of the simulation study and a real dataset
indicate that the linear-moment estimates have the least bias among the
methods considered. |
||
TI_43_2 |
Shao, Xiaofeng |
University of Illinois at Urbana
Champaign |
Title |
Inference
for change points in high dimensional data |
|
In
this talk, I will present some recent work on change point
testing and estimation for high dimensional data. In the case of
testing for a mean shift, we propose a new test which is based on
U-statistics and utilizes the self-normalization principle. Our test targets
dense alternatives in the high dimensional setting and involves no
tuning parameters. We show the weak convergence of a sequential U-statistic
based process to derive the pivotal limit under the null and also obtain the
asymptotic power under the local alternatives. Time
permitting, we illustrate how our approach can be used in combination
with wild binary segmentation to estimate the number and
location of multiple unknown change points. |
||
TI_42_2 |
Shay,
Garrett Charlie |
Brock University |
Title |
Probabilistic
and non-probabilistic methods of active learning for classification |
|
Active
learning is a useful learning process for classification. With a fixed size
of training data, an active classifier selects the most beneficial data to
learn from and achieves better classification accuracy than a passive
classifier. We discuss the methods of developing optimal active learning
processes, including both probabilistic and non-probabilistic ones. For a
comparison study, we adapt a probabilistic classifier obtained by logistic
regression, as well as a non-probabilistic classifier derived from an
estimated discriminant function. Performance of proposed active classifiers
is investigated under varying conditions and assumptions. Optimal two-stage
and sequential active classification procedures have been developed. Monte Carlo simulations have shown improved
classification accuracy of the proposed active learning process compared to
the passive learning process for all scenarios considered. |
||
TI_33_1 |
Shi, Peng |
University of Wisconsin-Madison |
Title |
Regression
for Copula-linked Compound Distributions with Applications in Modeling
Aggregate Insurance Claims |
|
In actuarial
research, a task of particular interest and importance is to predict the loss
cost for individual risks so that informative decisions are made in
various insurance operations such as underwriting, ratemaking, and
capital management. The loss cost is typically viewed to follow a
compound distribution where the summation of the severity variables is
stopped by the frequency variable. A challenging issue in modeling such
outcome is to accommodate the potential dependence between the number of
claims and the size of each individual claim. In this article, we
introduce a novel regression framework for compound distributions that uses a
copula to accommodate the association between the frequency and the severity
variables, and thus allows for arbitrary dependence between the two
components. We further show that the new model is very flexible and is
easily modified to account for incomplete data due to censoring or
truncation. The flexibility of the proposed model is illustrated using both
simulated and real data sets. In the analysis of granular claims data
from property insurance, we find a substantive negative relationship
between the number and the size of insurance claims. In addition, we
demonstrate that ignoring the frequency-severity association could lead to
biased decision-making in insurance operations. |
||
TI_37_4 |
Sinha,
Sanjoy K. |
Carleton University |
Title |
Joint
modeling of longitudinal and time-to-event data with covariates subject to
detection limits |
|
In many
clinical studies, subjects are measured repeatedly over a fixed period of
time. Longitudinal measurements from a given subject are naturally correlated.
Linear and generalized linear mixed models are widely used for modeling the
dependence among longitudinal outcomes. In addition to the longitudinal data,
we often collect time-to-event data (e.g., recurrence time of a tumor) from
the subjects. When multiple outcomes are observed from a given subject with a
clear dependence among the outcomes, a natural way of analyzing these
outcomes and their associations would be the use of a joint model. I will
discuss a likelihood approach for jointly analyzing the longitudinal and
time-to-event data. The method would be useful for dealing with left-censored
covariates often observed in clinical studies due to limits of detection. The
finite-sample properties of the proposed estimators will be discussed using
results from a Monte Carlo study. An application of the proposed method will
be presented using a large clinical dataset of pneumonia patients obtained
from the Genetic and Inflammatory Markers of Sepsis (GenIMS)
study. |
||
TI_43_4 |
Sriperumbudur,
Bharath |
Penn State University |
Title |
Approximate Kernel PCA: Computational
vs. Statistical Trade-off |
|
Kernel
principal component analysis (KPCA) is a popular non-linear dimensionality
reduction technique, which generalizes classical linear PCA by finding
functions in a reproducing kernel Hilbert space (RKHS) such that the function
evaluation at a random variable X has maximum variance. Despite its
popularity, kernel PCA suffers from poor scalability in big data scenarios as
it involves solving an n x n eigensystem leading to a computational
complexity of O(n^3) with n being the number of samples. To address this
issue, in this work, we consider a random feature approximation to kernel PCA
which requires solving an m x m eigenvalue problem and therefore has a computational
complexity of O(m^3), implying that the approximate method is
computationally efficient if m<n with m being the number of random
features. The goal of this work is to investigate the trade-off between
computational and statistical behaviors of approximate KPCA, i.e., whether
the computational gain is achieved at the cost of statistical efficiency. We
show that the approximate KPCA is both computationally and statistically
efficient compared to KPCA in terms of the error associated with reconstructing
a kernel function based on its projection onto the
corresponding eigenspaces. Depending on the eigenvalue decay behavior of
the covariance operator, we show that only n^{2/3} features (polynomial
decay) or \sqrt{n} features (exponential decay) are needed to match the
statistical performance of KPCA, which means that, without any statistical loss,
approximate KPCA has a computational complexity of O(n^2) or O(n^{3/2})
depending on the eigenvalue decay behavior. We also investigate the
statistical behavior of approximate KPCA in terms of the convergence
of eigenspaces wherein we show that only \sqrt{n} features are
required to match the performance of KPCA and if fewer than
\sqrt{n} features are used, then approximate KPCA has a worse statistical
behavior than that of KPCA. |
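The random-feature idea above can be sketched as follows (using random Fourier features for an RBF kernel; the kernel, dimensions, and constants here are illustrative rather than those of the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def rff_features(X, m, gamma=1.0):
    """Random Fourier features approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, m))
    b = rng.uniform(0, 2 * np.pi, size=m)
    return np.sqrt(2.0 / m) * np.cos(X @ W + b)

# Approximate KPCA: an eigensystem on the m-dim feature covariance costs
# O(m^3) instead of O(n^3) for the exact n x n kernel eigensystem.
n, m = 500, 50
X = rng.normal(size=(n, 3))
Z = rff_features(X, m)
Zc = Z - Z.mean(axis=0)
cov = Zc.T @ Zc / n                      # m x m instead of n x n
eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, ::-1][:, :5]            # leading approximate KPCA directions
scores = Zc @ top                        # projections of the data
print(scores.shape)
```

The trade-off studied in the talk is how large m must be, as a function of n and the eigenvalue decay, for these approximate scores to match exact KPCA statistically.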
||
TI_7_1 |
Su, Jianxi |
Purdue University |
Title |
Full-range
tail dependence copulas for modeling dependent insurance and financial
data |
|
Copulas are
important tools when it comes to formulating models for multivariate data
analysis. An ideal copula should conform to a wide range of
problems at hand by allowing for symmetry and asymmetry as well as
for varying strengths of tail dependence. The copulas I plan to
introduce are exactly such in that they satisfy all the
aforementioned criteria. Specifically, in this talk, I shall introduce a
class of full-range tail dependence copulas, which have proved to
be very useful for modeling dependent financial/insurance data. I shall
discuss the key mechanisms for constructing full-range tail dependence
copulas and some fundamental properties of these structures. Future
research directions will be also discussed. |
||
TI_19_1 |
Subha, R.
Nair |
HHMSPB NSS College for Women |
Title |
A
generalization to the log-Weibull distribution and its applications in cancer
research |
|
Through this
paper we consider a generalization of a log-transformed version of the
inverse Weibull distribution of Keller et al (Reliability Engineering,
1982). The theoretical properties of the distribution are investigated in detail,
including expressions for its cumulative distribution function, reliability
function, hazard rate function, quantile function, characteristic function,
raw moments, percentile measures, entropy measures, median, mode etc. Some
reliability aspects as well as the distribution and moments of order
statistics are also discussed. The maximum likelihood estimation of the
parameters of the proposed distribution is attempted and certain applications
of the distribution in modelling data sets arising from industrial as well as
bio-medical cancer related backgrounds are illustrated using real life
examples. Further, the asymptotic behaviour of the
estimators is examined with the help of simulated data sets. |
||
TI_45_1 |
Sun, Ning |
Western University |
Title |
The Pareto
Optimal Design for Earthquake Index-based Insurance Based on Exponential
Utilities |
|
We obtain a
necessary condition for the Pareto optimal earthquake index-based insurance
design based on the decomposition of catastrophe risks. Moreover, we derive
the explicit form of this Pareto optimal insurance design under the
exponential utility assumption. Besides, minimization of the basis risk for
this index-based insurance design is also discussed. Finally, we illustrate
how a typical design of such an insurance product could be obtained from the
observed data using historical economic losses due to earthquakes in mainland
China. |
||
TI_17_1 |
Sung, Chih-Li(Charlie) |
Michigan State University |
Title |
Exploiting
variance reduction potential in local Gaussian process search for large
computer experiments |
|
Gaussian
process models are commonly used as emulators for computer experiments.
However, developing a Gaussian process emulator can be computationally
prohibitive when the number of experimental samples is even moderately large.
Local Gaussian process approximation (Gramacy and Apley (2015)) was proposed as an accurate and
computationally feasible emulation alternative. Constructing local
sub-designs specific to predictions at a particular location of interest
remains a substantial computational bottleneck to the technique. In this
talk, two computationally efficient neighborhood search limiting techniques
are introduced, and two examples demonstrate that the proposed methods indeed
save substantial computation while retaining emulation accuracy. |
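The local-approximation idea can be illustrated with a basic nearest-neighbor sub-design (this plain k-nearest-neighbor neighborhood is a stand-in for the search-limiting techniques of the talk; the kernel and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(A, B, ls=0.2):
    # Squared-exponential kernel matrix between row sets A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def local_gp_predict(X, y, xstar, k=30, nugget=1e-6):
    """Predict at xstar using only its k nearest neighbors,
    reducing the O(n^3) GP solve to O(k^3)."""
    nn = np.argsort(((X - xstar) ** 2).sum(1))[:k]
    Xn, yn = X[nn], y[nn]
    K = rbf(Xn, Xn) + nugget * np.eye(k)
    ks = rbf(xstar[None, :], Xn)[0]
    return ks @ np.linalg.solve(K, yn)

n = 2000
X = rng.uniform(size=(n, 2))
y = np.sin(4 * X[:, 0]) * np.cos(3 * X[:, 1])     # toy deterministic simulator
xs = np.array([0.5, 0.5])
print(local_gp_predict(X, y, xs), np.sin(2.0) * np.cos(1.5))  # close
```

Choosing the sub-design (here, the brute-force neighbor sort) is the computational bottleneck that the proposed search-limiting techniques address.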
||
TI_13_3 |
Szabo, Aniko |
Medical College of Wisconsin |
Title |
Semi-parametric
Model for Exchangeable Clustered Binary Outcomes |
|
Dependent or
correlated binary data occur in repeated measurement studies, longitudinal experiments,
teratological risk assessment, and other important experimental studies. Both
parametric and non-parametric models have been proposed for dose-response
experiments with such data. In this work we propose semi-parametric models
that combine a non-parametric baseline describing the within-cluster
dependence structure with a parametric between-group effect. We develop an
Expectation-Maximization Minorize-Maximize algorithm to fit the model, apply
it to several datasets, and compare the semi-parametric estimates of joint
probabilities from different dose levels with corresponding GEE and
non-parametric estimates. |
||
TI_36_1 |
Takemura, Akimichi |
Shiga University |
Title |
Holonomic gradient method for evaluation of multivariate probabilities |
|
In 2011 we developed a new methodology, the "holonomic gradient method" (HGM), which is useful for the evaluation of probabilities and normalizing constants of probability distributions. Since then we have applied
HGM to various problems, including the distribution of roots of Wishart matrices, orthant probabilities, and some distributional problems related to wireless communication.
In this talk we give an introduction to HGM
and present applications of the method to the evaluation of multivariate probabilities. |
||
TI_18_1 |
Imoto,
Tomoaki |
University of Shizuoka |
Title |
Bivariate
GIT distribution |
|
In this
talk, we propose a bivariate discrete distribution, which is derived from a
first passage point of the two dimensional random
walk on lattice. This distribution is seen as a convolution of bivariate
binomial and negative binomial distributions. Moreover its
marginal distributions are also seen as a convolution of univariate binomial
and negative binomial distributions and can model both over- and
under-dispersion relative to the Poisson distribution. These
properties make the proposed distribution a flexible model in terms of its dispersion
and correlation. The other stochastic processes and operations derived
for the proposed distribution are also discussed in this talk. |
||
TI_40_4 |
Torkashvand,
Elaheh |
University of Waterloo |
Title |
Spatial
Dynamical Autocorrelation of fMRI Images |
|
The concept
of dynamical correlation is extended to functional time series.
The dynamical autocorrelation is a measure of functional autocorrelation of a
functional time series. The proposed method can be applied to true,
i.e., continuously measured, functional data or possibly to approximated
functional data, for example after applying a smoothing step to observations
measured in discrete time. An estimator of the dynamical autocorrelation is
presented based on the Karhunen-Loève expansion
of the time series. The central limit theorem is applied to obtain the
asymptotic distribution of the proposed estimator of the dynamical
autocorrelation under the assumption of m-dependency. |
||
TI_4_1 |
Vinogradov,
Vladimir |
Ohio University |
Title |
On two
extensions of Feller-Spitzer class of Bessel densities |
|
We introduce
two different extensions of Feller-Spitzer class of Bessel densities. Various
properties of members of these classes are derived and compared. |
||
TI_21_1 |
Wang, Haiying |
University of Connecticut |
Title |
Optimal
Subsampling: Sampling with Replacement vs Poisson Sampling |
|
Faced with
massive data, subsampling is a commonly used technique to improve computational
efficiency, and
using nonuniform subsampling distributions is an effective approach
to improve estimation efficiency. In the context of maximizing a general
target function, this paper derives optimal subsampling distributions for
both subsampling with replacement and Poisson subsampling. The optimal
subsampling distributions minimize functions of the subsampling approximation
variances. Furthermore, they provide deep insights on the theoretical
difference and similarity between subsampling with replacement and Poisson
subsampling. Practically implementable algorithms are proposed based on the
optimal structure results, which are evaluated by both theoretical and
empirical analysis. |
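A toy comparison of the two schemes for estimating a population mean is sketched below (the probabilities are taken proportional to the response purely for illustration; the paper's optimal distributions minimize the asymptotic variance for general target functions):

```python
import numpy as np

rng = np.random.default_rng(7)

N = 100_000
x = rng.lognormal(size=N)
pi = x / x.sum()                 # nonuniform probabilities, here prop. to x
r = 1_000                        # (expected) subsample size

# Subsampling with replacement: fixed size r.
# Hansen-Hurwitz: average x_i / (N * p_i) over the r draws.
idx = rng.choice(N, size=r, replace=True, p=pi)
est_wr = np.mean(x[idx] / (N * pi[idx]))

# Poisson subsampling: independent inclusions, random realized size.
# Horvitz-Thompson: sum x_i / (N * min(r * p_i, 1)) over the kept units.
incl = np.minimum(r * pi, 1.0)
keep = rng.random(N) < incl
est_pois = np.sum(x[keep] / (N * incl[keep]))

print(x.mean(), est_wr, est_pois)  # both estimates close to the true mean
```

Note that with probabilities exactly proportional to x, each with-replacement term equals the true mean, a degenerate case of the variance-minimizing principle behind the optimal subsampling distributions.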
||
TI_40_0 |
Wang, Shan |
University of San Francisco |
Title |
Recent
Development in Nonparametric and Semiparametric Techniques |
|
In recent
years, semiparametric and nonparametric models have become a popular choice
in many areas of statistics since they are more realistic and flexible than
parametric models. This invited session focuses on the recent development in
these methods and their applications. |
||
TI_40_1 |
Wang, Shan |
University of San Francisco |
Title |
Estimation
of SEM with MELE approach |
|
In this work,
we construct improved estimates of linear functionals of a probability
measure with side information using an easy empirical likelihood approach. We
allow the constraint functions, which determine the side information, to grow with the
sample size, and allow the use of estimated constraint functions. This is the case
in applications to structural equation models. In one case the random
errors are modeled as independent of the covariates. In another case, we
estimate the model with side information of known marginal medians for an
observed variable. We report some simulation results on efficiency gains. |
||
TI_41_0 |
Wang, Xia |
University of Cincinnati |
Title |
Bayesian
Modeling of Dependent Non-Gaussian Data |
|
Dependent
non-Gaussian data keep posing new challenges through their rapidly increasing
size and structural complexity. Bayesian perspectives provide feasible
and flexible approaches. The session presents new methodological
developments in Bayesian modeling, computation, and model comparison related
to semi-continuous data, directional data, intensity data, and ordinal data. |
||
TI_41_4 |
Wang, Xia |
University of Cincinnati |
Title |
Power Link
Functions in Ordinal Regression Models with Gaussian Process Priors |
|
Link
functions and random effects structures are the two important components in
building flexible regression models for dependent ordinal data. The power
link functions include the commonly used links as special cases but have an
additional skewness parameter making the probability response
curves adaptive to the data structure. This overcomes the arbitrary
symmetry assumption imposed by the commonly used logistic or probit links as well as the
fixed skewness in the complementary log-log or log-log
links. By employing Gaussian processes, the regression model can
incorporate various dependence structures in the data, such as temporal and
spatial correlations. The full Bayesian estimation of the proposed
model is conveniently implemented through Rstan. Extensive
simulation studies are carried out for discussion in model
computation, parameterization, and evaluation in terms of estimation
bias and overall model performance. The proposed model is applied to
the PM2.5 data in Beijing and the Berberis thunbergii abundance data in New England. The
results suggest the proposed model leads to important improvement in
estimation and prediction in modeling dependent ordinal response data. |
||
TI_46_2 |
Wang, Yueyao |
Virginia Tech |
Title |
Building
Degradation Index Using Multivariate Sensory Data with Variable Selection |
|
The modeling
and analysis of degradation data have been an active research area in
reliability and system health management. Most of the existing research
on degradation modeling assumes that the degradation index is provided.
However, there are situations that a degradation index is not available. For
example, modern sensor technology allows one to collect multi-channel sensor
data that are related to the underlying degradation process, which may not be
sufficiently represented by any single channel. Without a degradation index,
most existing methods cannot be applied. Thus, constructing a degradation index is a
fundamental step in degradation modeling. In this paper, we develop a general
approach for degradation index building based on an additive-nonlinear model
with variable selection. The approach is more flexible than a linear
combination of sensor signals, and it can automatically select the most
informative variables to be used in the degradation index. Maximum likelihood
estimation with an adaptive group penalty is developed based on a training
dataset. We use extensive simulations to validate the performance of the
developed method. The NASA jet engine sensor dataset is then used for
illustrations. The paper is concluded with some discussions and areas for
future research. This is joint work with I-Chen Lee and Yili Hong. |
||
TI_26_1 |
Weerahandi, Samaradasa |
X-Techniques, Inc, New York |
Title |
Generalized Inference
with Application to Business and Clinical Analytics |
|
In
applications, such as the ANOVA under unequal error variances, and Mixed
Models, the classical approach can produce only asymptotic tests and
confidence intervals for parameters of interest. This article reviews
the notions and methods of Generalized Inference and shows how such inferences
can be based on exact probability statements. The approach is illustrated by
an application concerning Variance Components in Mixed Models with
applications in Business and Clinical Analytics. In such problems one
may wish to use the Bayesian approach, but doing so requires a prior. In
the absence of a proper prior, Bayesian inferences are highly sensitive to
the non-informative prior family and the choice of hyper-parameters, and could
take days to run for models involving a large number of parameters, such as
those arising in estimating consumer response to TV ads by county or DMA. The
task is easily accomplished by using the BLUP in Mixed Models with parameters
tackled by the approach of Generalized Inference. It will also be argued that
the generalized approach can reproduce Parametric Bootstrap inferences when
they exist, and works even when the Parametric Bootstrap approach fails.
Moreover, one can reproduce equivalent generalized tests and generalized
confidence intervals for any generalized fiducial inference method without
having to treat fixed parameters as variables. |
||
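As a small illustration of the flavor of generalized inference (a textbook example, not the Mixed Models application of the talk), the sketch below computes a Monte Carlo generalized confidence interval for a difference of two normal means with unequal variances (the Behrens-Fisher problem), using the standard generalized pivotal quantity. The function name and defaults are our own.

```python
import numpy as np

def behrens_fisher_gci(x, y, level=0.95, B=20000, seed=0):
    """Generalized confidence interval for mu_x - mu_y with unequal
    variances, via the generalized pivotal quantity
    R = (xbar - ybar) - (T1 * sx/sqrt(nx) - T2 * sy/sqrt(ny)),
    where T_i is Student-t with n_i - 1 degrees of freedom."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    sx, sy = x.std(ddof=1), y.std(ddof=1)
    t1 = rng.standard_t(nx - 1, size=B)
    t2 = rng.standard_t(ny - 1, size=B)
    # Monte Carlo draws from the distribution of the pivotal quantity
    R = (x.mean() - y.mean()) - (t1 * sx / np.sqrt(nx) - t2 * sy / np.sqrt(ny))
    a = (1 - level) / 2
    return float(np.quantile(R, a)), float(np.quantile(R, 1 - a))
```

Unlike the Welch approximation, the interval rests on an exact probability statement about the pivotal quantity, approximated here only by Monte Carlo sampling.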
TI_12_2 |
Womack,
Andrew |
Indiana University |
Title |
Horseshoes, Shape
Mixing, and Ultra-sparse Locally Adaptive Shrinkage |
|
Locally
adaptive shrinkage in the Bayesian framework provides one method for
continuously relaxing discrete selection problems. We present extensions of
the Horseshoe prior framework that arise from mixing both the scale and shape
parameters from the hierarchical specification of the model. Mixing on the
shape parameter provides both better spike and slab behavior as well as a way
to model ultra-sparse signals. The reduction in risk comes from a better
approximation of the hard thresholding rule that gives rise to discrete
selection. As with other local-global priors, these models have non-convex,
multimodal posterior distributions. This multi-modality, especially from the
infinite spike at the origin, creates issues for fitting the models using
out-of-the-box methods like Gibbs samplers or EM algorithms. To address these
problems, we implement a new MCMC algorithm that includes mode switching
jumps that are akin to doing Stochastic Search Variable Selection for
continuous local-global shrinkage models. |
||
TI_45_2 |
Wu, Jiang |
Central University of Finance and
Economics |
Title |
A Financial
Contagion Measure Based on the Maximal Tail Dependence Coefficient for
Financial Time Series |
|
A novel
financial contagion measure is proposed. It is based on the maximal tail
dependence (MTD) coefficient of the financial time series of returns.
Estimators for this contagion measure are provided for popular families of
copulas, and a simulation study is employed to analyze the performance of
these estimators. Applications are presented to illustrate the use of spatial
contagion measures for determining asymmetric linkages in financial markets,
and for creating clusters of financial time series. The methodology is also
useful for selecting diversified portfolios of asset returns. |
||
TI_43_3 |
Wu, Wenbo |
University of Texas |
Title |
Simultaneous
estimation for semi-parametric multi-index models |
|
Estimation
of a general multi-index model comprises determining the number of linear combinations
of predictors (structural dimension) that are related to the response,
estimating the loadings of each index vector, selecting the active
predictors, and estimating the underlying link function. These objectives are
often achieved sequentially at different stages of the estimation process. In
this study, we propose a unified estimation approach under a semi-parametric
model framework to attain these estimation goals simultaneously. The proposed
estimation method is more efficient and stable than many existing methods
where the estimation error in the structural dimension may propagate to the
estimation of the index vectors and variable selection stages. A detailed
algorithm is provided to implement the proposed method. Comprehensive
simulations and a real data analysis illustrate the effectiveness of the
proposed method. |
||
TI_20_2 |
Wu, Yichao |
UIC |
Title |
Nonparametric
estimation of multivariate mixtures |
|
A
multivariate mixture model is determined by three elements: the number of
components, the mixing proportions and the component distributions. Assuming
that the number of components is given and that each mixture component
has independent marginal distributions, we propose a non-parametric
method to estimate the component distributions. The basic idea is to
convert the estimation of component density functions to a problem of
estimating the coordinates of the component density functions with
respect to a good set of basis functions.
Specifically, we construct a set of basis functions by
using conditional density functions and try to recover the coordinates
of component density functions with respect to this set
of basis functions. Furthermore, we show that our
estimator for the component density functions is consistent. Numerical
studies are used to compare our algorithm with other existing
non-parametric methods of estimating component distributions under the
assumption of conditionally independent marginals. |
||
TI_16_2 |
Xia, Aihua |
University of Melbourne |
Title |
Probability
Density Quantiles: Their Divergence from or Convergence to
Uniformity |
|
For each
continuous distribution with square-integrable density, there is a
probability density quantile (pdQ), which
is an absolutely continuous distribution on the unit interval. The pdQ is representative of a location-scale family and
carries essential information regarding shape and tail behavior of the
family. We demonstrate that questions of convergence and divergence regarding
shapes of distributions can be carried out in a location- and scale-free
environment via their pdQs. We also establish
a map of the Kullback-Leibler divergences
from uniformity of these pdQs. Some numerical
calculations point to a phenomenon that each application of the pdQ mapping seems to lower the Kullback-Leibler divergence from uniformity and
hence we obtain new fixed point theorems for repeated
applications of the pdQ mappings. This is
joint work with Robert G. Staudte. |
||
TI_38_4 |
Xie, Yanmei |
University of Toledo |
Title |
Analysis of nonignorable
missingness in risk factors for hypertension |
|
The
prevention of hypertension is a critical public health challenge across
the world. In the current study, we propose a novel
empirical-likelihood-based method to estimate the effect of potential risk
factors for hypertension. We adopt a semiparametric perspective on regression
analysis with nonignorable missing covariates, which is motivated by the
alcohol consumption and blood pressure data from the US National Health and
Nutrition Examination Survey. The missingness in alcohol consumption is
missing not at random since it is likely to depend largely on alcohol
consumption itself. To overcome the difficulty of handling this nonignorable
covariate-missing data problem, we propose a unified approach to constructing
a system of unbiased estimating equations, which naturally incorporate the
incomplete data into the data analysis, making it possible to gain estimation
efficiency over complete case analysis. Our analyses demonstrate that
increased alcohol consumption per day is significantly associated with
increased systolic blood pressure. In addition, having a higher body mass
index and being of older age are associated with a significantly higher risk
of hypertension. |
||
TI_48_3 |
Xu, Mengyu |
University of Central Florida |
Title |
Simultaneous
Prediction intervals for high-dimensional Vector Autoregressive model |
|
We study the
simultaneous prediction intervals for high-dimensional vector autoregressive
model. We consider a de-biased calibration for the lasso prediction and
propose a Gaussian-multiplier-bootstrap-based method
for one-step-ahead prediction. The asymptotic coverage consistency of the
prediction intervals is obtained. We also present simulation results to
evaluate the finite-sample performance of the procedure. |
||
TI_42_0 |
Xu, Xiaojian |
Brock University |
Title |
Optimal
design, active learning, and efficient statistics for big data |
|
This session
emphasizes the efficient statistical process when dealing with big data. Such
efficiency consideration appears at both stages: the stage of optimal and
robust designs for data selection (in Talks 1, 2, and 4) and the stage of
estimation/prediction after data are obtained (Talks 3 and 4). Our
speakers of this session discuss a variety of statistical methods, including
probability estimation, quantile regression, optimally weighted least
squares, and incomplete U-statistics. |
||
TI_42_4 |
Xu, Xiaojian |
Brock University |
Title |
Robust
active learning for approximate linear models |
|
In this
paper, we point out the common nature of active learning in the machine
learning field and robust experimental design in the statistics field,
and present methods of robust regression design that can be
implemented in a robust active learning process. We consider approximate
linear regression models and weighted least squares estimation. Both optimal
weighting schemes and robust optimal designs of the training data used for
active learning are discussed for various scenarios. An analytical form for
robust design density is derived. The simulation results and comparison study
using practical examples indicate improved efficiencies. |
||
TI_14_2 |
Yanev,
George P. |
The University of Texas |
Title |
On Arnold-Villaseñor conjectures for characterizing the exponential
distribution |
|
Characterizations
of the exponential distribution are abundant. Arnold and Villaseñor [1] obtained a series of new
characterizations based on random samples of size two and conjectured possible
generalizations for larger sample sizes. Extending their techniques, we will
prove Arnold and Villaseñor’s conjectures for an arbitrary but fixed sample
size n. We will discuss results published in [2] as well as more recent
findings. |
||
TI_35_1 |
Yin,
Xiangrong |
University of Kentucky |
Title |
Moment
Kernel for Estimating Central Mean Subspace and Central Subspace |
|
The T-central
subspace, introduced by Luo, Li and Yin (2014), allows one to perform
sufficient dimension reduction for any statistical functional of interest. We
propose a general estimator using (third) moment kernel to estimate
the T-central subspace. In this talk, we particularly focus on central
mean subspace via the regression mean function, and central subspace via
Fourier transform or slicing. Theoretical results are established and simulation studies show the advantages of
our proposed methods. |
||
TI_43_0 |
Yin,
Xiangrong |
University of Kentucky |
Title |
Variable
selection and dimension reduction for high-dimension data problems |
|
Variable
selection and dimension reduction are important research topics,
especially for high-dimensional data analysis. This session
consists of talks in these areas. Dr. Dong’s talk focuses on
variable selection over two sets of variables. Dr. Shao’s topic
is inference for high-dimensional data, while Dr.
Wu presents a semi-parametric method to estimate
multiple dimensions simultaneously, and Dr. Sriperumbudur’s topic
is the study of kernel PCA, a popular dimension reduction method. |
||
TI_38_2 |
Yu, Jihnhee |
University of Buffalo |
Title |
Bayesian
empirical likelihood approach to compare quantiles |
|
Bayes
factors, practical tools of applied statistics, have been dealt
with extensively in the literature in the context of hypothesis testing.
The Bayes factor based on parametric likelihoods can be considered both as a
pure Bayesian approach as well as a standard technique for computing
P-values for hypothesis testing. We employ empirical likelihood methodology
to modify Bayes factor type procedures for the nonparametric setting,
establishing asymptotic approximations to the proposed procedures. These
approximations are shown to be similar to those of the classical parametric
Bayes factor approach. The proposed approach is applied towards developing
testing methods involving quantiles, which are commonly used to characterize
distributions. We present and evaluate one- and two-sample distribution-free
Bayes-factor-type methods for testing quantiles based on indicators and
smooth kernel functions. |
||
TI_44_1 |
Yuan, Qingcong |
Miami University |
Title |
A two-stage
variable selection approach in the analysis of metabolomics and microbiome
data |
|
We propose a
two-stage variable selection approach to analyze a mouse data set. Mice under
different health conditions (obese or not) and different exposure levels to
biodiesel ultrafine particles (UFPs) are considered. Their metabolomics and
microbiome information is also recorded. We first perform sure variable
screening on the metabolite and microbial species data, respectively, and
then use the Bayesian lasso to obtain a selected variable set. Multivariate
analysis methods are then applied to the resulting dataset. The study focuses
on the effects of UFP exposure on gut microbial composition and function, and
then evaluates the impact of UFPs on obese host health. |
||
TI_4_3 |
Zhang,
Yuanqing |
Shanghai University of International
Business and Economics |
Title |
Inference
for Partially Linear Additive Higher Order Spatial Autoregressive Model with
Spatial Autoregressive Error and unknown Heteroskedasticity |
|
This article
extends spatial autoregressive model with spatial autoregressive disturbances
(SARAR(1,1)) which is the most popular spatial econometric model to the case
of an arbitrary finite number of nonparametric additive terms and spatial
autoregressive models with spatial autoregressive disturbances of arbitrary
finite order (SARAR(R,S)). We propose a sieve two stage least squares (S2SLS)
regression and generalized method of moments (GMM) procedure for the high-order
spatial autoregressive parameters of the disturbance process. Under
some sufficient conditions, we show that the proposed estimator for the
finite dimensional parameter is √n consistent and asymptotically
normally distributed. |
||
TI_24_3 |
Zeitler,
David |
Grand Valley State University |
Title |
Rank Based
Estimation With Skew Normal Error
Distributions Using Big Data Sets |
|
Skew normal
distributions are a generalization of the normal distribution adding a
parameter controlling the direction and magnitude of asymmetry. We will
address a rank-based algorithm for fitting
linear models with skew normal errors on big data sets using distributed
computation with limited inter-process communication. Distributed computation
may use multiple cores as well as clustered hardware resources. Both
theoretical development and a simulation demonstration using R will be
discussed. |
||
TI_13_1 |
Zelterman,
Dan |
Yale University |
Title |
Distributions
for Exchangeable p-Values under an unspecified Alternative
Hypothesis |
|
A typical
biomarker study may result in many p-values testing multiple
hypotheses. Several methods have been proposed to adjust for
multiple comparisons without exceeding the false discovery rate (FDR).
Under an unspecified alternative hypothesis, we propose a marginal
distribution for p-values whose joint distribution facilitates the
description of exchangeable p-values. This model is used to describe
the behavior of the number of statistically significant findings under Simes’ (1986, Biometrika) rule
controlling FDR. We apply our model to a published biomarker
study in which no statistically significant findings were observed by the
authors, and provide new power analyses for the study. |
||
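For concreteness, the counting of significant findings under Simes’ step-up rule (as used in Benjamini-Hochberg FDR control) can be sketched as follows; this is a generic illustration of the rule, not the exchangeable-p-value model of the talk, and the function name is ours.

```python
def simes_significant(pvals, alpha=0.05):
    """Step-up rule: with sorted p-values p_(1) <= ... <= p_(m),
    find the largest k with p_(k) <= k * alpha / m; the k smallest
    p-values are declared significant (k = 0 if none qualify)."""
    p = sorted(pvals)
    m = len(p)
    k = 0
    for i, pi in enumerate(p, start=1):
        if pi <= i * alpha / m:
            k = i
    return k
```

The number of declared findings is exactly the quantity whose null behavior the abstract’s model describes.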
TI_28_3 |
Zhang, Cheng |
Medstar Cardiovascular Research Network |
Title |
Novel
Post-randomization Methods for Controlling Identity Disclosure and Preserving
Data Utility |
|
Even when
direct identifiers such as name and social security number are removed,
identity disclosure of a survey unit in a data set is possible via matching
demographic variables whose values are easily known from other sources. So,
data agencies need to release a perturbed version of survey data. Ideally, a
perturbation mechanism should protect individuals’ identities while preserving
inferences about the population. For categorical key variables, we propose a
novel approach to measuring identification risk for setting strict disclosure
control goals. Specifically, we suggest ensuring
that the probability of identifying any survey unit is at most a given value
ξ. We develop an unbiased post-randomization method that achieves this
goal with little data quality loss. |
||
TI_44_0 |
Zhang, Jing |
Miami University |
Title |
New Explorations for High-Dimensional Big Data Analysis |
|
Standard
statistical methods are no longer computationally
efficient, or even feasible, for the analysis of high-dimensional big
data. This session collects ideas on variable
selection, dimension reduction, and predictive modeling, exploring how to
pick out the true "signals" amid the noise and how to handle the
volume of data. |
||
TI_44_2 |
Zhang, Jing |
Miami University |
Title |
A “Split
and Resample” Approach in Big Data Analysis |
|
Big data are
massive in volume, intensity, and complexity. Analysis of big data requires
picking out the true "signals" amid the noise
and handling the volume of data. We introduce a "split
and subsampling" algorithm that handles both variable
selection and prediction for high-dimensional big data.
Simulation studies are conducted to show that the proposed
algorithm is robust to multicollinearity among
the predictors in both linear and generalized linear models, selects
the signal variables with better sensitivity and specificity,
and achieves better prediction with lower MSPE values. |
||
TI_20_4 |
Zhang, Lingsong |
Purdue University |
Title |
On the
analysis of data that lies in the cone |
|
Complex data
arise increasingly often in applications such as imaging and genomics.
Traditionally, data were analyzed under the theoretical assumption that they
lie in Euclidean space. In recent years, many new data types are confined to
restricted spaces or sets, and require new theory and
methodology to analyze them. In this talk, we focus on two types of data
that lie in cones, and propose generalized
principal-component-type tools to reveal underlying structure (or hidden
factors) within such data. The approach naturally forms a nested structure
and thus is suitable for future investigation of the optimal dimension.
Applications of this method, such as to diffusion tensor images, will be
shown in this talk as well. |
||
TI_28_1 |
Zhang, Linjun |
Rutgers University |
Title |
The Cost of
Privacy: Optimal Rates of Convergence for Parameter Estimation with
Differential Privacy |
|
With the
unprecedented availability of datasets containing personal information, there
are increasing concerns that statistical analysis of such datasets may
compromise individual privacy. These concerns give rise to statistical
methods that provide privacy guarantees at the cost of some statistical
accuracy. A fundamental question is: to satisfy a certain desired level of
privacy, what is the best statistical accuracy one can achieve?
Standard statistical methods fail to yield sharp results, and new technical
tools are called for. In this talk, I will present a general lower bound
argument to investigate the tradeoff between statistical accuracy and
privacy, with application to three problems: mean estimation, linear
regression and classification, in both the classical low-dimensional and
modern high-dimensional settings. For these statistical problems, we also
design computationally efficient algorithms that match the minimax lower
bound under the privacy constraints. Finally, I
will show the applications of those privacy-preserving algorithms to real
data such as SNPs containing sensitive information, for which
privacy-preserving statistical methods are necessary. |
||
TI_21_3 |
Zhang, Teng |
University of Central Florida |
Title |
Robust PCA
by Manifold Optimization |
|
Robust PCA
is a widely used statistical procedure to recover an underlying
low-rank matrix with grossly corrupted observations. This work considers the
problem of robust PCA as a nonconvex optimization problem on the manifold of
low-rank matrices, and proposes two algorithms (for
two versions of retractions) based on manifold optimization. It is shown
that, with a properly designed initialization, the proposed algorithms are
guaranteed to converge to the underlying low-rank matrix linearly. Compared
with previous work based on the Burer-Monteiro decomposition
of low-rank matrices, the proposed algorithms theoretically reduce the
dependence on the condition number of the underlying low-rank matrix.
Simulations and real data examples confirm the competitive performance of our
method. |
||
TI_35_2 |
Zhang, Wei |
University of Arkansas at Little Rock |
Title |
Imputation
of Missing Data in the State Inpatient Databases |
|
Eliminating
healthcare disparities so that underserved populations are assured access to
quality medical care remains a national priority. Large, population-based
studies necessary to address healthcare disparities can be costly and
difficult to perform. An efficient alternative that is becoming increasingly
attractive is the use of the State Inpatient Databases (SID). This study aimed at
identifying appropriate imputation methods for SID and applying the imputed
data sets for healthcare disparities research. We compared six imputation
methods for missing data (i.e., complete case analysis, mean
imputation, marginal draw method, hot deck imputation,
joint multiple imputation (MI), conditional MI) through a novel
simulation. |
||
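Two of the simpler methods compared above can be sketched in a few lines; the function names are ours, and a real SID analysis would of course operate on multivariate records rather than a single column.

```python
import random
import statistics

def mean_impute(xs):
    """Replace each missing value (None) with the mean of the observed values."""
    obs = [x for x in xs if x is not None]
    m = statistics.mean(obs)
    return [m if x is None else x for x in xs]

def hot_deck_impute(xs, seed=0):
    """Replace each missing value with a randomly drawn observed value
    (a simple random hot deck)."""
    rng = random.Random(seed)
    obs = [x for x in xs if x is not None]
    return [rng.choice(obs) if x is None else x for x in xs]
```

Mean imputation preserves the observed mean but shrinks the variance, one reason draw-based methods such as the hot deck are often preferred in comparisons like the one above.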
TI_40_3 |
Zhao, Wei |
Indiana University Purdue University
Indianapolis |
Title |
Optimal
Sampling Distributions for Generalized Linear Models |
|
One of the
popular approaches to dealing with large sample data is subsampling, that is,
a small portion of the full data set is subsampled with certain weights and
used as a surrogate for the subsequent computation and simulation. The
crucial part of the method of subsampling is constructing the sampling
weights. In this paper, we propose A-optimal sampling distributions after
investigating the consistency and asymptotic normality of the subsample
estimator to the maximum likelihood estimator in generalized linear models. A
two-step algorithm is proposed to approximate the A-optimal subsampling
estimator. Simulation results show that our subsampling method outperforms
the other subsampling methods with a smaller mean square error of estimation. |
||
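The two-step idea can be sketched for logistic regression as follows. This is a hypothetical rendering of a generic weighted-subsampling scheme: a uniform pilot subsample gives a pilot estimate, sampling probabilities are then formed from an A-optimality-inspired score, and the final fit uses inverse-probability weights. The exact A-optimal weights derived in the paper may differ; all function names here are ours.

```python
import numpy as np

def fit_weighted_logistic(X, y, w, iters=25):
    """Weighted logistic MLE by Newton's method (tiny ridge for stability)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1 - p))[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def two_step_subsample(X, y, r0, r, seed=0):
    """Step 1: uniform pilot subsample of size r0 -> pilot estimate.
    Step 2: subsample r points with probabilities proportional to
    |y - p(x)| * ||x|| (an A-optimality-inspired score), then refit
    with inverse-probability weights."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx0 = rng.choice(n, size=r0, replace=False)
    beta0 = fit_weighted_logistic(X[idx0], y[idx0], np.ones(r0))
    p = 1.0 / (1.0 + np.exp(-X @ beta0))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    probs = score / score.sum()
    idx = rng.choice(n, size=r, replace=True, p=probs)
    return fit_weighted_logistic(X[idx], y[idx], 1.0 / probs[idx])
```

The inverse-probability weights in the final fit are what make the subsample estimator consistent for the full-data MLE.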
TI_42_3 |
Zheng, Wei |
The University of Tennessee |
Title |
Incomplete
U-statistic based on division and orthogonal array |
|
U-statistics
are an important class of statistics. Unfortunately, their computation easily becomes
impractical as the data size $n$ increases. In particular, the number of
combinations, say $m$, that a U-statistic of order $d$ has to evaluate is of
the order $O(n^d)$. Since Blom (1976), who coined the term
incomplete U-statistic, many efforts have been made to approximate the
original U-statistic by a small subset of the combinations. To the best of our knowledge, all existing methods
require $m$ to grow at least faster than $n$, albeit much slower than $n^d$, in order for the corresponding incomplete
U-statistic to be asymptotically efficient in the sense of mean squared
error. In this paper, we introduce a new type of incomplete U-statistics,
which can be asymptotically efficient even when $m$ grows slower than $n$. In
some cases, $m$ is only required to grow faster than $\sqrt{n}$. The results
are also extended to the degenerate case and the multi-sample case. |
||
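To fix ideas, the sketch below contrasts a complete U-statistic of order two with a naive incomplete version that averages the kernel over $m$ randomly drawn pairs, the simplest of the subset constructions going back to Blom (the division/orthogonal-array design of the talk is a more structured choice of subset). With kernel $h(x,y) = (x-y)^2/2$, the complete U-statistic is exactly the unbiased sample variance.

```python
import itertools
import random
import statistics

def complete_U(data, h):
    """Average of h over all C(n, 2) pairs: O(n^2) kernel evaluations."""
    vals = [h(x, y) for x, y in itertools.combinations(data, 2)]
    return sum(vals) / len(vals)

def incomplete_U(data, h, m, seed=0):
    """Average of h over m randomly chosen pairs: only O(m) evaluations."""
    rng = random.Random(seed)
    n = len(data)
    total = 0.0
    for _ in range(m):
        i, j = rng.sample(range(n), 2)
        total += h(data[i], data[j])
    return total / m

# This kernel's U-statistic is the unbiased sample variance.
h = lambda x, y: (x - y) ** 2 / 2
```

The design question the abstract addresses is how slowly $m$ can grow with $n$ while keeping the incomplete version asymptotically as efficient as the complete one.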
TI_38_3 |
Zhong,
Ping-Shou |
The University of Illinois at Chicago |
Title |
Order-restricted
inference for means with missing values |
|
Missing
values appear very often in many applications, but the problem
of missing values has not received much attention in testing
order-restricted alternatives. Under the missing at random (MAR)
assumption, we impute the missing values nonparametrically using kernel
regression. For data with imputation, the classical likelihood ratio
test designed for testing the order-restricted means is no longer
applicable since the likelihood does not exist. This article proposes a
novel method for constructing test statistics for assessing means with
an increasing order or a decreasing order based on jackknife empirical
likelihood (JEL) ratio. It is shown that the JEL ratio statistic
evaluated under the null hypothesis converges to a chi-bar-square
distribution, whose weights depend on missing probabilities
and nonparametric imputation. A simulation study shows that the
proposed test performs well under various missing scenarios and
is robust for normally and non-normally distributed data. The proposed
method is applied to an Alzheimer's disease neuroimaging initiative
data set to find a biomarker for the diagnosis of Alzheimer's
disease. |
||
TI_45_0 |
Zitikis, Ricardas |
Western University |
Title |
Risk
Measures: Theory, Inference, and Applications |
|
|
||
TI_45_3 |
Zitikis, Ricardas |
Western University |
Title |
Gini
Shortfall: A Coherent Risk Measure |
|
For quite
some time, the value-at-risk (VaR) was an appealing
risk measure, and even an industry and regulatory standard
for calculating risk capital in banking and insurance. The VaR is still a standard, though
criticized in many theoretical and empirical works. In this context, the expected shortfall (ES) has been a remarkable
innovation that rewards diversification and captures
the magnitude of tail risk. But what about tail variability? The coherent risk measure, called the
Gini shortfall (GS), takes care of both the magnitude and the variability of
tail risk, thus providing a much-needed missing
piece in the encompassing risk-measurement puzzle. In this
talk, we shall discuss various aspects of the GS,
including its origins, properties, and statistical
inference. |
Abstracts
for General-Invited Speakers (Alphabetic Order)
G_1_1 |
Abujarad,
Mohammed H.A. |
Aligarh
Muslim University |
Title |
Bayesian Survival Analysis of Topp-Leone
Generalized Family with Stan |
|
In this article, we discuss generalizations of three
distributions by means of the exponential, exponentiated
exponential, and exponentiated extension distributions. We set up three- and
four-parameter life models called the Topp-Leone exponential
distribution, the Topp-Leone exponentiated exponential
distribution, and the Topp-Leone exponentiated extension
distribution. We give extensive results for the survival function and
hazard rate function. To fit these models as survival and hazard rate
models, we adopt a Bayesian approach. A real survival data set is
used for illustration. The application is done in R and Stan, and suitable
illustrations are prepared. R and Stan code is given to implement the
censoring mechanism via optimization as well as simulation tools. |
||
G_2_1 |
Ahmed, Bilal Peer |
Islamic
University of Science & Technology, Awantipora,
Pulwama (J&K), India |
Title |
Inflated Size-Biased Modified Power Series Distributions and its
Applications |
|
In this paper, the class of Inflated Size-biased Modified Power Series
Distributions (ISBMPSD), where inflation occurs at any of the support points,
is studied. This class includes, among others, the size-biased generalized
Poisson distribution, size-biased generalized negative binomial distribution
and size-biased generalized logarithmic series distribution as its particular
cases. We obtain the recurrence relations among ordinary, central and
factorial moments. The maximum likelihood and Bayesian estimation of the parameters
of the Inflated Size-biased MPSD is obtained. As special cases, results are
extracted for size-biased generalized Poisson distribution, size-biased
generalized negative binomial distribution and size-biased generalized
logarithmic series distribution. Finally, an example is presented for the
size-biased generalized Poisson distribution to illustrate the results and a
goodness of fit test is done using the maximum likelihood and Bayes
estimators. |
||
G_6_4 |
Bulut,
Murat |
Osmangazi University, Turkey |
Title |
Robust Logistic Regression based on Liu estimator |
|
In this study, we propose a new estimator in logistic regression
to handle multicollinearity and outlier problems simultaneously. There are
some biased estimators proposed for the solution of the multicollinearity
problem. Also, there are some studies that cope with outlier problems. But
there are only a few studies in the
literature for the case where the multicollinearity and outlier problems
exist at the same time in the logistic model. In this study, we introduce a
robust logistic estimator based on the Liu estimator. We compare the proposed
estimator with some other existing estimators by means of a simulation study. |
||
G_5_1 |
Feng, Yaqin |
Ohio
University |
Title |
Stability and instability of steady states for a branching
random walk |
|
We consider the time evolution of a lattice branching random
walk with local perturbations. Under certain conditions, we prove a Carleman-type bound on the moment growth of a particle subpopulation number and show
the existence of a steady state. |
||
G_5_3 |
Lazar, Drew |
Ball State University |
Title |
Robust and scalable optimization on manifolds |
|
In this talk a robust and scalable procedure for estimation on
classes of manifolds that generalizes the classical idea of “median of means”
estimation is proposed. This procedure is motivated by statistical inference
problems in data science which can be cast as optimization problems over
manifolds. A key lemma that characterizes a property of the geometric median
on manifolds is shown. This lemma allows the formulation of bounds on an
estimator which aggregates subset estimators by taking their geometric
median. The robustness and scalability of the procedure are illustrated in
numerical examples on both simulated and real data sets. |
||
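In the Euclidean special case, the “median of means” aggregation can be sketched as below: split the sample into blocks, compute an estimate per block, and combine the block estimates through their geometric median (computed here with Weiszfeld’s fixed-point iteration). On a manifold, the blockwise estimators and the median would instead use intrinsic distances; this flat-space version is only illustrative, and the function names are ours.

```python
import numpy as np

def geometric_median(points, iters=200, eps=1e-9):
    """Weiszfeld's algorithm: fixed-point iteration for the point
    minimizing the sum of Euclidean distances to the rows of `points`."""
    y = points.mean(axis=0)
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(points - y, axis=1), eps)
        w = 1.0 / d
        y = (w[:, None] * points).sum(axis=0) / w.sum()
    return y

def median_of_means(X, k):
    """Split rows of X into k blocks, average each block, then
    aggregate the block means by their geometric median."""
    block_means = np.vstack([b.mean(axis=0) for b in np.array_split(X, k)])
    return geometric_median(block_means)
```

Because the geometric median is insensitive to a minority of wild block estimates, the aggregate stays near the truth even when a whole block is grossly corrupted.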
G_1_3 |
Louzada-Neto, Francisco |
ICMC, University of Sao Paulo |
Title |
Efficient Closed-Form MAP Estimators for Some
Survival Distributions and Their Applications to Embedded
Systems |
|
In this paper, we propose maximum a posteriori (MAP) estimators
for the parameters of some survival distributions, which have a simple
closed-form expression. In particular, we focus on the Nakagami
distribution, which plays an essential role in communication engineering
problems, particularly to model fading of radio signals. Moreover, we show
that the obtained results can be extended to other survival probability
distributions, such as the gamma and generalized gamma ones. Numerical
results reveal that the MAP estimators outperform the existing estimators and
produce almost unbiased estimates even for small sample sizes. Our
applications are driven by embedded systems, which are commonly used in
communication engineering. Particularly, they can consist of an electronic
system inside a microcontroller, which can be programmed to maintain
communication between a transmitting antenna and mobile antennas, which are
operating at the same frequency. In
this context, from the statistical point of view, closed-form estimators are
needed, since they are embedded in mobile devices and need to be sequentially
recalculated in real time. |
||
G_6_1 |
McTague, Jaclyn |
LogEcal Analytics |
Title |
Repeated Significance Testing of Normal Variables with Unknown
Variance |
|
In clinical trials, where data are accumulated over time,
sequential hypothesis testing requires control of the type-1 error. It is
typically assumed that the sample sizes are large so that, even with an
unknown variance, the test statistics are approximately normal. This leads to
the reliance on the multivariate normal distribution to calculate the
critical values. We develop the exact
joint distribution of the test statistics for any sample size and provide
critical values that ensure type-1 error control. We introduce an efficient
numerical method that works for any number of tests commonly encountered in
the so-called group sequential clinical trials. |
||
G_6_3 |
Mesbah, Mounir |
Sorbonne
University |
Title |
Current statistical issues in HRQoL research: Testing local
independence in latent variable models |
|
In this talk, I will give a quick overview of current
research in Health-Related Quality of Life (HRQoL). I will focus on a few important and
challenging statistical issues that occur when latent variable models are used. Local
independence is a strong assumption of such models that needs to be checked.
I will review the psychometric literature on the
subject, which deals mainly with the effect of local dependence on parameter
inference and with its detection.
I will discuss the challenging theoretical and computational issues
and present recent simulation results and applications to real data sets. |
||
G_1_2 |
Mynbaev,
Kairat |
International
School of Economics, Kazakh-British Technical University |
Title |
Nonparametric kernel estimation of unrestricted
distributions |
|
We consider nonparametric estimation of a distribution F that is
unrestricted in the sense that it may, or may not, be absolutely continuous. Three
problems are considered: estimation of F(x) at a continuity point x,
estimation of F(y)-F(x), where x and y are continuity points, and estimation
of jumps of F. Contrary to the extant literature, we make no restriction on
the existence or smoothness of the derivatives of F. The key insight for our
result is the use of Lebesgue-Stieltjes integrals.
The method is also applied to inversion theorems for characteristic
functions, where we provide explicit estimates for convergence rates. |
||
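For orientation, a basic smoothed kernel estimator of a CDF F(x) can be written in a few lines; this is the textbook construction (an average of Gaussian CDFs centered at the data), not the Lebesgue-Stieltjes approach of the abstract.

```python
import numpy as np
from scipy.stats import norm

def kernel_cdf(x, sample, h):
    """Smoothed CDF estimate: average of Gaussian CDFs centered at each datum."""
    return norm.cdf((x - np.asarray(sample)) / h).mean()

rng = np.random.default_rng(0)
data = rng.exponential(size=2000)
est = kernel_cdf(1.0, data, h=0.1)   # true F(1) for Exp(1) is 1 - e^{-1} ≈ 0.632
```

The bandwidth `h` trades bias for variance exactly as in kernel density estimation.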
G_2_2 |
Odhiambo, Collins |
Strathmore
University |
Title |
Extended version of Zero-inflated Negative Binomial Distribution
with Application to HIV Exposed Infant Count Data |
|
Routinely collected HIV-exposed infant (HEI) data show many
zero HIV-positive counts due to the prevention of mother-to-child transmission (PMTCT) policy.
However, implementation of PMTCT differs and results in structural zeros for
HEI-positive numbers (optimum PMTCT) and non-structural zeros (sub-optimum
PMTCT). Hence standard zero-inflated models may not be appropriate. We seek
to extend the zero-inflated Negative Binomial (ZINB) model by incorporating a
variable α. Extensive simulations were conducted by varying α,
the dispersion, and the sample size, and the results were compared using BC. The model was applied to HEI data sampled
from six high-HIV-burden counties in Kenya and
yielded better performance. |
||
G_2_3 |
Ogawa, Mitsunori |
University of Tokyo |
Title |
Parameter estimation for discrete exponential families under the
presence of nuisance parameters |
|
The parameter estimation problem for discrete exponential family
models is discussed under the presence of nuisance parameters. Maximizing the conditional likelihood
usually yields an estimator with statistically nice properties. However, the computation of its
normalization constant often prevents its practical use. In this talk, we derive a class of
computationally tractable estimators for such a situation based on the
framework of composite local Bregman divergence with simultaneous use of
tools from algebraic statistics. |
||
G_2_4 |
Peng, Jie |
St.
Ambrose University |
Title |
Improved Prediction Intervals for Discrete Distributions |
|
The problem of predicting a future outcome based on the past and
currently available samples arises in many applications. Applications of
prediction intervals (PIs) based on continuous distributions are well-known.
Compared to continuous distributions, results on constructing PIs for
discrete distributions are very limited. The problems of constructing
prediction intervals for the binomial, Poisson and negative binomial
distributions are considered here. Available approximate, exact and
conditional methods for these distributions are reviewed and compared. Simple
approximate prediction intervals based on the joint distribution of the past
samples and the future sample are proposed. Exact coverage studies and
expected widths of prediction intervals show that the new prediction
intervals are comparable to or better than the available ones in most cases. |
||
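The joint-distribution idea in the abstract can be illustrated with a standard normal-approximation prediction interval for a single future Poisson count; this is a textbook approximation, not the authors' proposed interval.

```python
import math
from scipy.stats import norm

def poisson_pred_interval(past_counts, conf=0.95):
    """Approximate PI for one future Poisson count.

    Since Var(Y - lam_hat) = lam * (1 + 1/n), plug in lam_hat and
    use the normal quantile, then round outward to integers."""
    n = len(past_counts)
    lam = sum(past_counts) / n
    z = norm.ppf(0.5 + conf / 2)
    half = z * math.sqrt(lam * (1 + 1 / n))
    return max(0, math.floor(lam - half)), math.ceil(lam + half)

lo, hi = poisson_pred_interval([4, 5, 6, 5, 4, 6, 5, 5, 4, 6])
```

Exact and conditional intervals, as compared in the abstract, refine this crude normal approximation.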
G_5_2 |
Sepanski, Jungsywan |
Central
Michigan University |
Title |
Constructing Bivariate Copulas with Distributional Distortions |
|
Distortion of existing copulas provides a way to construct new
copulas. We propose distributional distortions that are distribution
functions with support on the unit interval. Specifically, the distortion
considered in this presentation is the distribution of a unit-Burr random
variable formed by the exponential transformation of a negative Burr random
variable. The induced new copulas include the well-known BB1, BB2 and BB4
copulas as special cases. The dependence properties and relationships between
the base bivariate copula and the induced copula in tail dependence
coefficients and tail orders are studied.
The unit-Burr distortion of existing bivariate copulas may result in
copulas that allow a maximum range of dependence and permit both lower and
upper tail coefficients. Contour plots
and numerical results are also presented.
|
||
G_5_4 |
Smith, Scott |
University
of the Incarnate Word |
Title |
A Generalization of the Farlie-Gumbel-Morgenstern
and Ali-Mikhail-Haq Copulas |
|
An important aspect of modeling bivariate relationships is the
choice of underlying copula. One-parameter copulas may be too restrictive to
provide adequate fit. We present a two-parameter copula which possesses the Farlie-Gumbel-Morgenstern and Ali-Mikhail-Haq copulas as special cases. We then discuss dependence
properties and simulation. Finally, we use the new copula to model two data
sets and compare the fit to that of the FGM and AMH copulas. |
||
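The FGM copula named in the title has a simple closed form, and sampling from it reduces to inverting a quadratic conditional CDF; a minimal sketch (the two-parameter generalization of the abstract is not reproduced here):

```python
import numpy as np

def sample_fgm(n, theta, rng):
    """Sample (U, V) from the FGM copula
    C(u, v) = u v [1 + theta (1 - u)(1 - v)],  |theta| <= 1,
    by inverting the conditional CDF of V given U (a quadratic in v)."""
    u = rng.random(n)
    w = rng.random(n)
    a = theta * (1 - 2 * u)
    # Solve a v^2 - (1 + a) v + w = 0; the root in [0, 1] tends to w as a -> 0.
    a_safe = np.where(np.abs(a) < 1e-10, 1.0, a)
    root = (1 + a - np.sqrt((1 + a) ** 2 - 4 * a * w)) / (2 * a_safe)
    return u, np.where(np.abs(a) < 1e-10, w, root)

rng = np.random.default_rng(1)
u, v = sample_fgm(20_000, 0.8, rng)   # Pearson correlation of (U, V) is theta/3
```

The weak dependence range (correlation at most 1/3) is exactly why two-parameter extensions such as the one in this talk are of interest.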
G_6_2 |
Wang, Dongliang |
SUNY
Upstate Medical University |
Title |
Empirical likelihood inference for Kolmogorov-Smirnov test given
censored data |
|
The Kolmogorov-Smirnov (K-S) test is commonly used for comparing two
distributions and may be particularly valuable for censored data, since the K-S
test statistic can be interpreted as the maximum survival difference. In this
work, the smoothed empirical likelihood (SEL) is developed for the K-S
statistic given censored data with desirable asymptotic properties. The
developed results not only lead to a new test procedure, but also a reliable
interval estimator for maximum survival difference. The SEL method is
evaluated by empirical simulations in terms of the coverage probability of
the interval estimator, and illustrated by application
to a real-life dataset. |
Abstracts
for Student Posters
(Alphabetically
Ordered)
P-01 |
Amponsah, Charles |
University
of Nevada, Reno |
||
Title |
A Bivariate Gamma Mixture Discrete Pareto Distribution |
|||
We propose a new stochastic model describing the joint
distribution of (X, N), where N has a heavy-tail discrete Pareto distribution
while X is the sum of N independent gamma random
variables. We present the main properties of this distribution, including marginal
and conditional distributions, moments, representations, and parameter
estimation. An example from finance illustrates the modeling potential of this
new mixed bivariate distribution. |
||||
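The construction of (X, N) above can be sketched directly, using the additivity of the gamma law (a sum of N i.i.d. gammas with common scale is again gamma). As a stand-in for the discrete Pareto law, this sketch draws N from a Zipf distribution, which is an assumption for illustration only.

```python
import numpy as np

def sample_pair(n_draws, shape, scale, tail, rng):
    """Draw (X, N): N heavy-tailed (Zipf stand-in for discrete Pareto),
    X | N = sum of N iid Gamma(shape, scale) = Gamma(N * shape, scale)."""
    N = rng.zipf(tail, size=n_draws)
    X = rng.gamma(shape * N, scale)
    return X, N

rng = np.random.default_rng(0)
X, N = sample_pair(10_000, shape=2.0, scale=1.0, tail=3.5, rng=rng)
```

The heavy tail of N propagates to X, which is what makes the pair useful for finance applications.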
P-02 |
Ash, Jeremy |
North
Carolina State University |
||
Title |
Confidence band estimation methods for accumulation curves at
extremely small fractions with applications to drug discovery |
|||
Accumulation curves are used to assess the effectiveness of
ranking algorithms. Items are ranked according to the algorithm's belief that
they possess some desired feature, then items are tested according to
relative rank. In a typical virtual screen in drug discovery, millions of
chemicals are screened, while only tens of chemicals are tested. We propose
modifications to previously developed confidence band estimation methods that
have good coverage probabilities and expected widths under these conditions
in simulation. We also perform power
analyses to determine whether accumulation curves or other lift curves are
better for detecting significant differences between ranking algorithms. |
||||
P-03 |
Cho, Min Ho |
The
Ohio State University |
||
Title |
Aggregated Pairwise Classification of Statistical Shapes |
|||
The classification of shapes is of great interest in diverse areas.
Statistical shape data have two main properties: (i) shapes are inherently
infinite dimensional with strong dependence among the position of nearby
points; (ii) shape space is not Euclidean, but is
fundamentally curved. To accommodate these features, we work with the square
root velocity function, pass to tangent spaces of the manifold of shapes at
different projection points, and use principal components within these
tangent spaces. We illustrate the impact of the projection point and choice
of subspace on the misclassification rate with a novel method of combining
pairwise classifiers. |
||||
P-04 |
Damarjian, Hanna |
Purdue
University Northwest |
||
Title |
On the Transmuted Exponential Pareto Distribution |
|||
There has been growing interest in developing statistical
distributions capable of modeling a wide variety of data. The purpose of this research project is to
construct a new model with strong flexibility for various types
of data. This new model will be called
the Transmuted Exponential Pareto (TEP) Distribution. Several lifetime distributions are embedded
in this distribution. We provide various mathematical characteristics
including the parameter estimation methods and simulation. Finally, the importance and flexibility of
the proposed model will be illustrated by means of some real-life data
analysis. |
||||
P-05 |
Das, Manjari |
Carnegie
Mellon University |
||
Title |
Efficient nonparametric estimation of population size from
incomplete lists |
|||
Estimation of total population size using incomplete lists has
long been an important problem across many biological and social sciences.
For example, partial, overlapping lists of casualties in the Syrian war by
multiple organizations, are of great importance to estimate the magnitude of
destruction. Earlier approaches have either used strong parametric
assumptions or suboptimal nonparametric techniques, which can lead to bias via
model misspecification and smoothing. Assuming conditional independence of two
lists, we derive a nonparametric efficiency bound for estimating the capture
probability and construct a bias-corrected estimator. We apply our methods to
estimate HIV prevalence in Alameda County, California. |
||||
P-06 |
Farazi,
Md Manzur Rahman |
Marquette
University |
||
Title |
Feature Selection for a Predictive Model using Machine Learning
Techniques on Mosquito’s Spectral Data |
|||
A mosquito’s age is a key indicator of its capability to
spread diseases and of the effectiveness of mosquito
control interventions. Traditional methods of estimating age via dissection
are expensive and require skilled personnel. Near-Infrared (NIR) spectroscopy,
which measures the amount of light absorbed by a mosquito’s head or thorax, is
used as a non-invasive method to estimate age. Standard methods do not consider
the physiological changes mosquitoes go through as they age. We propose a
change-point model to estimate age from spectra using a partial least squares
regression (PLSR) model. The change-point PLSR model performs better in
estimating the age of mosquitoes. |
||||
P-07 |
Galarza, Christian |
State
University of Campinas |
||
Title |
On moments of folded and truncated multivariate extended
skew-normal distributions |
|||
Following Kan & Robotti
(2017), this paper develops recurrence relations for integrals that involve
the density of multivariate extended skew-normal distributions, which
includes the well-known skew-normal distribution introduced by Azzalini &
Dalla-Valle (1996) and the popular multivariate normal distribution. These
recursions offer fast computation of arbitrary order product moments of
truncated multivariate extended skew-normal and folded multivariate extended
skew-normal distributions with the product moments of the multivariate
truncated skew-normal, folded skew-normal, truncated multivariate normal and
folded normal distributions as a byproduct. Finally, from the application
point of view, these moments open the way to propose analytical expressions
on the E-step of the Expectation-Maximization (EM) algorithm for complex
data, such as asymmetric longitudinal data with censored and/or missing
observations. These new methods are provided to practitioners in the R MomTrunc package, an efficient R library incorporating
C++ and FORTRAN subroutines through Rcpp. |
||||
P-08 |
George, Tyler |
Central
Michigan University |
||
Title |
Lack-of-fit Testing Without Replicates Available |
|||
We develop a new technique for testing lack-of-fit (LOF) in a linear regression
model when replicates are not available. Most applications
yield data without replicates in the predictors. The
classical lack-of-fit test found in most linear regression textbooks is then not
applicable. Many current solutions use close points as "pseudo"
replicates, but "close" is not well defined. Presented in this paper is a more
general and robust methodology for testing LOF using a new grouping
procedure. Power simulations are used to compare the new test against
previous tests for various alternative models. |
||||
P-09 |
Goward, Kenneth |
Central
Michigan University |
||
Title |
A New Generalized Inverse Gaussian Distribution with Bayesian
Estimators |
|||
A four-parameter family of transformed inverse Gaussian (TIG)
distribution is described. A three-parameter family derived from the
four-parameter TIG family is considered, with a specific new distribution
referred to as the Generalized Inverse Gaussian (GIG) distribution being
considered. Two different versions of this distribution are provided and
computational and theoretical advantages of one over the other are discussed.
Maximum likelihood techniques are discussed alongside Bayesian approaches
with Jeffreys-type priors for parameter estimation. A simulation study was
conducted and results from the Bayesian approach and approximations to the
maximum likelihood estimators were analyzed using the Kolmogorov-Smirnov
test. The applicability of this distribution is considered on a real-world data set. |
||||
P-10 |
Ihtisham,
Shumaila |
Islamia
College, Peshawar, Pakistan |
||
Title |
Alpha Power Inverse Pareto Distribution and its Properties |
|||
In this study, a new distribution referred to as Alpha-Power
Inverse Pareto distribution is introduced by including an extra parameter.
Several properties of the proposed distribution are obtained including moment
generating function, quantiles, entropies, order statistics, mean residual
life function and stochastic ordering. Method of maximum likelihood is used
to find estimates of the parameters. Two real datasets are considered to
examine the usefulness of the proposed distribution. |
||||
P-11 |
Ijaz, Muhammad |
University
of Peshawar Pakistan |
||
Title |
A New Family of Distributions with Applications |
|||
In this paper, the main goal is to introduce a new family of
distributions, called the new alpha power transformed (NAPT) family. On the basis of the
proposed family, we fit the CDF of the exponential
distribution and call the result the new alpha power transformed exponential
(NAPTE) distribution. Some of its statistical properties are discussed,
including the mean residual life, quantile function, skewness, and kurtosis. The
hazard rate function and probability density function are also plotted for
various values of the parameters. The
parameters are estimated by maximum likelihood. Furthermore,
the paper presents a simulation study. To illustrate the usefulness
of the new family of distributions, two real-life data sets were used. The
comparison is made on the basis of goodness-of-fit criteria, including the
Akaike Information Criterion, the Consistent Akaike Information Criterion, and
others. The results show that the new alpha power
transformed exponential distribution is more flexible than other
existing distributions for the two data sets under study. |
||||
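For reference, the original alpha power transform (Mahdavi & Kundu, 2017) maps a base CDF F to G(x) = (α^F(x) − 1)/(α − 1) for α ≠ 1; the "new" variant of this abstract is not specified here, so the sketch below shows only the standard construction applied to the Exp(1) base.

```python
import math

def apt_cdf(base_cdf, alpha):
    """Alpha power transform: G(x) = (alpha^F(x) - 1) / (alpha - 1), alpha != 1."""
    def G(x):
        return (alpha ** base_cdf(x) - 1) / (alpha - 1)
    return G

exp_cdf = lambda x: 1 - math.exp(-x)   # base distribution: Exp(1)
G = apt_cdf(exp_cdf, alpha=2.0)        # alpha power transformed exponential CDF
```

Since α^t is monotone in t, G inherits the CDF properties of F while gaining the extra shape parameter α.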
P-12 |
Lee, Joo Chul |
University
of Connecticut |
||
Title |
Online Updating Method to Correct for Measurement Error in Big
Data Streams |
|||
When huge amounts of data arrive in streams, online updating is
an important method to alleviate both computational and data storage issues.
This paper extends the scope of previous research for online updating in the
context of the classical linear measurement error model. In the case where
some covariates are unknowingly measured with error at the beginning of the
stream, but then are measured without error after a particular point along
the data stream, the updated estimators ignoring the measurement error are
biased for the true parameters. We propose a method to correct the bias of
the estimators, as well as correct their variances, once the covariates
measured without error are first observed; after correction, the traditional
online updating method can then proceed as usual. We further derive the
asymptotic distributions for the corrected and updated estimators. We provide
simulation studies and a real data analysis with the Airline on-time data to
illustrate the performance of our proposed method. |
||||
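The classical online updating that this paper extends can be sketched by accumulating the sufficient statistics X'X and X'y block by block; the measurement-error bias correction of the abstract is not reproduced here.

```python
import numpy as np

class OnlineOLS:
    """Classical online updating for linear regression: only the
    accumulated X'X and X'y are stored, never the raw stream."""
    def __init__(self, p):
        self.xtx = np.zeros((p, p))
        self.xty = np.zeros(p)

    def update(self, X, y):
        self.xtx += X.T @ X
        self.xty += X.T @ y

    def coef(self):
        return np.linalg.solve(self.xtx, self.xty)

rng = np.random.default_rng(0)
beta = np.array([1.0, 2.0])
ols = OnlineOLS(2)
for _ in range(5):                      # five arriving data blocks
    X = rng.standard_normal((100, 2))
    ols.update(X, X @ beta)             # noiseless stream for illustration
```

Because only p-by-p summaries are kept, storage does not grow with the stream, which is the point of online updating.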
P-13 |
Lun, Zhixin |
Oakland
University |
||
Title |
Simulating from Skewed Multivariate Distributions: The Cases of
Lomax, Mardia’s Pareto (Type 1), Logistic, Burr and
F Distributions |
|||
Convenient, easy-to-use programs are available to simulate
data from several common multivariate distributions (e.g., normal, t).
However, functions for directly generating data from other, less common
multivariate distributions are not as readily available. We will illustrate
how to generate random numbers from the multivariate Lomax distribution (a flexible family of
skewed multivariate distributions). Further, the multivariate Mardia’s Pareto of type I, Logistic, Burr, and F distributions can
also be handled easily by applying useful properties of the multivariate
Lomax distribution. This work provides a useful tool for practitioners when
they need to simulate skewed multivariate distributions for various studies. |
||||
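One well-known route to multivariate Lomax variates is gamma mixing of independent exponentials; the sketch below uses that representation for illustration (it is a standard construction, not necessarily the algorithm of this poster).

```python
import numpy as np

def rmvlomax(n, a, thetas, rng):
    """Multivariate Lomax via gamma mixing: draw eta ~ Gamma(a, 1), then
    X_i | eta ~ Exponential with rate eta * theta_i, independently over i.
    Marginally, P(X_i > x) = (1 + theta_i * x) ** (-a)."""
    thetas = np.asarray(thetas, dtype=float)
    eta = rng.gamma(a, 1.0, size=(n, 1))
    return rng.exponential(1.0, size=(n, len(thetas))) / (eta * thetas)

rng = np.random.default_rng(0)
x = rmvlomax(100_000, a=3.0, thetas=[1.0, 2.0], rng=rng)
frac = (x[:, 0] > 1.0).mean()   # should be near (1 + 1)^(-3) = 0.125
```

The shared mixing variable eta is what induces the positive dependence among the components.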
P-14 |
Matuk, James |
The
Ohio State University |
||
Title |
Function Estimation through Phase and Amplitude Separation |
|||
An important task in functional data analysis is to estimate
functional observations based on sparse and noisy observations on a time
interval. To address this problem, we
define a Bayesian model that can fit individual functions on a per subject
basis, as well as multiple functions simultaneously by borrowing information
across subjects. A distinguishing
property of this work is that our model considers amplitude and phase
variabilities separately which describe y-axis and x-axis variability,
respectively. We validate the proposed framework using multiple simulated
examples as well as real data including ECG signals and measurements from
Diffusion Tensor Imaging. |
||||
P-15 |
Maxwell, Obubu |
Nnamdi
Azikiwe University Awka |
||
Title |
The Kumaraswamy Inverse Lomax Distribution (K-IL): Properties
and Applications |
|||
For the first time, the Kumaraswamy Inverse Lomax (K-IL) distribution
is introduced and studied. Some of its basic statistical properties are
investigated in detail, including explicit expressions for the
survival function, failure rate, reversed hazard, odds ratio, order
statistics, moments, quantiles, and median. The model parameters are estimated
using the maximum likelihood method. Real-life applications are
provided, and the K-IL distribution offers better
fits. Performance is assessed on the basis of the distributions’
log-likelihoods and the Akaike information criterion (AIC). |
||||
P-16 |
May, Paul |
South
Dakota State University |
||
Title |
Multiresolution Techniques for High Precision Agriculture |
|||
High Precision Agriculture is the use of data to observe and
respond to variations in crop fields on both a macroscopic and granular
level. Remote sensing techniques have created a wealth of data, but the size
of these data sets leads to computational challenges. This has historically
forced the use of less computationally expensive, but also less accurate
methods. Recent development of multiresolution approximations for spatial
covariance structures (Katzfuss 2015; Sang & Huang
2011) allow for the use of GLS and Kriging on very large data sets to make
inferences that farmers can turn into profitable actions. |
||||
P-17 |
Melchert, Bryan |
Purdue
University Fort Wayne |
||
Title |
Forecasting Migration Timing of Sockeye Salmon to Bristol Bay,
AK |
|||
Arrival of Sockeye Salmon (Oncorhynchus nerka)
to the Bristol Bay river system of Alaska is notoriously compact, with about
75% of the annual run arriving within 4 weeks. This research seeks to
leverage increased data access and modern statistical learning methods to
generate an accurate migration timing forecast that can be reproduced
annually, which currently does not exist for the fishery. Included topics
are dimensionality reduction, general additive modeling with time series
data, gradient boosting methods, and model validation. |
||||
P-18 |
Mohammed, Mohanad |
University
of KwaZulu-Natal, Pietermaritzburg, South Africa |
||
Title |
Using stacking ensemble for microarray-based cancer
classification |
|||
Microarray technology has produced a massive amount of gene
expression data. This data can be used efficiently for classification that
facilitates disease diagnosis and prognosis. There are many computational
methods that are utilized for cancer classification using these gene
expression data. Artificial neural networks (ANN), support vector machines
(SVM), and random forests (RF) are among the most successful methods for
classifying tumors. Recent research shows that combining many classifiers can
yield better results than using one classifier. In this paper, we used
stacking ensemble to combine different classifiers, namely, ANN, SVM, RF,
naive Bayes (NB), and k-nearest neighbors (KNN) for microarray-based cancer
classification. Results show that stacking ensemble performed better in terms
of accuracy, kappa coefficient, sensitivity, specificity, area under the
curve (AUC), and receiver operating characteristic (ROC) curve, when applied
to publicly available microarray data. |
||||
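The stacking idea described above can be shown from scratch with two toy base learners whose scores become features for a meta-learner; a minimal sketch, with a least-squares meta-model standing in for the usual logistic one (proper stacking would also use cross-validated base predictions rather than in-sample scores).

```python
import numpy as np

def centroid_score(Xtr, ytr, X):
    """Base learner 1: nearest-centroid score for class 1 (larger = class 1)."""
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    return np.linalg.norm(X - c0, axis=1) - np.linalg.norm(X - c1, axis=1)

def knn_score(Xtr, ytr, X, k=5):
    """Base learner 2: fraction of class-1 labels among the k nearest points."""
    d = ((X[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return ytr[np.argsort(d, axis=1)[:, :k]].mean(1)

def stack_predict(Xtr, ytr, Xte):
    """Stacking: base-learner scores become features for a meta-learner."""
    Z_tr = np.column_stack([centroid_score(Xtr, ytr, Xtr),
                            knn_score(Xtr, ytr, Xtr), np.ones(len(Xtr))])
    w, *_ = np.linalg.lstsq(Z_tr, ytr.astype(float), rcond=None)
    Z_te = np.column_stack([centroid_score(Xtr, ytr, Xte),
                            knn_score(Xtr, ytr, Xte), np.ones(len(Xte))])
    return (Z_te @ w > 0.5).astype(int)

rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(100, 4))     # class 0
X1 = rng.normal(1.5, 1.0, size=(100, 4))     # class 1, shifted
Xtr = np.vstack([X0[:80], X1[:80]]); ytr = np.r_[np.zeros(80), np.ones(80)].astype(int)
Xte = np.vstack([X0[80:], X1[80:]]); yte = np.r_[np.zeros(20), np.ones(20)].astype(int)
acc = (stack_predict(Xtr, ytr, Xte) == yte).mean()
```

The meta-learner weights the base learners by how informative their scores are, which is why stacking can beat any single constituent classifier.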
P-19 |
Ordoñez, José Alejandro |
Campinas
State University |
||
Title |
Objective Bayesian Analysis for the Spatial Student t Regression
model |
|||
We develop an objective Bayesian analysis for the spatial
Student-t regression model with unknown degrees of freedom, based on the
reference prior method. Like the degrees of freedom, the spatial parameter
is typically difficult to elicit: the propriety of the posterior distribution
is not always guaranteed, whereas proper prior distributions may dominate the
analysis. We show that the Bayesian analysis using this method yields
a proper posterior distribution, and we use it to develop model selection and
prediction. Finally, we assess the performance of the method through
simulation and illustrate it using a real data application. |
||||
P-20 |
Saha, Dheeman |
University
of New Mexico |
||
Title |
Sparse Bayesian Envelope |
|||
Due to the complexity of high dimensional datasets, it is
difficult to evaluate them efficiently. However, using a Bayesian framework
for dimension reduction and variable selection techniques can help to
identify the material and immaterial parts. This, in turn, leads to improved
efficiency in the estimation of the regression coefficients. In this work, we
combined the idea of dimension reduction with Spike-and-Slab variable
selection and proposed a Bayesian sparse Envelope method. In addition,
since the true structural dimension of the Envelope is unknown, we used
Reversible Jump Markov Chain Monte Carlo to draw samples from the posterior
distribution. |
||||
P-21 |
Shen, Luyi |
University of Notre Dame |
||
Title |
Bayesian community detection for weighted sparse
networks using mixture of SBM model |
|||
We propose a novel mixture of stochastic block models
for community detection in weighted networks. Our model allows modeling the
sparsity of the network and performing community detection simultaneously by
combining the spike-and-slab prior with a stochastic block model. A
Chinese restaurant process prior is used for modeling the random partition of
the model, which does not require the number of communities
to be known a priori. Another appealing feature of our model is that it
allows the sparsity level of the network to vary across communities. That is,
the sparsity information in the network is incorporated for community
detection. Efficient MCMC algorithms are derived for sampling from the posterior
distribution for inference, and our model and algorithms are demonstrated
using both simulated and real data sets. |
||||
P-22 |
Shubhadeep, Chakraborty |
Texas
A&M University |
||
Title |
A New Framework for Distance and Kernel-based Metrics in High
Dimensions |
|||
The paper presents new metrics to quantify and test for (i) the
equality of distributions and (ii) the independence between two
high-dimensional random vectors. We show that the energy distance based on
the usual Euclidean distance cannot completely characterize the homogeneity
of two high-dimensional distributions in the sense that it only detects the
equality of means and the traces of covariance matrices in the
high-dimensional setup. We propose a new class of metrics which inherit the
desirable properties of the energy distance, maximum mean
discrepancy/(generalized) distance covariance, and the Hilbert-Schmidt
Independence Criterion in the low-dimensional setting and are capable of
detecting the homogeneity of, or completely characterizing independence between,
the low-dimensional marginal distributions in the high-dimensional setup. We
further propose t-tests based on the new metrics to perform high-dimensional
two-sample testing/independence testing and study their asymptotic behavior
under both high dimension low sample size (HDLSS) and high dimension medium
sample size (HDMSS) setups. The computational complexity of the t-tests only
grows linearly with the dimension and thus is scalable to very high
dimensional data. We demonstrate the superior power behavior of the proposed
tests for homogeneity of distributions and independence via both simulated
and real datasets. |
||||
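The Euclidean energy distance whose high-dimensional limitations the abstract describes is simple to compute; a minimal V-statistic sketch (for intuition only, not the new metrics of the paper):

```python
import numpy as np

def energy_distance(X, Y):
    """V-statistic estimate of the energy distance
    2 E|X - Y| - E|X - X'| - E|Y - Y'| (Euclidean norms)."""
    def mean_dist(A, B):
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2).mean()
    return 2 * mean_dist(X, Y) - mean_dist(X, X) - mean_dist(Y, Y)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
Y = rng.standard_normal((200, 5))          # same distribution as X
Z = rng.standard_normal((200, 5)) + 1.0    # mean-shifted alternative
```

In low dimensions this statistic fully characterizes equality of distributions; the abstract's point is that this characterization breaks down as the dimension grows.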
P-23 |
Soale, Abdul-Nasah |
Temple
University |
||
Title |
On expectile-assisted inverse
regression estimation for sufficient dimension reduction |
|||
Sufficient dimension reduction (SDR) has become an important
tool for multivariate analysis. Among the existing SDR methods in the
literature, sliced inverse regression, sliced average variance estimation,
and directional regression are popular due to their estimation accuracy and
easy implementation. However, these estimators all rely on slicing the
response, and may not work well under heteroscedasticity. To improve these
estimators, we propose to first estimate the conditional expectile
of the response given the predictor and then perform inverse regression based
on slicing the expectile. The superior performances
of the new estimators are demonstrated through numerical studies and real
data analysis. |
||||
P-24 |
Wang, Yang |
The
University of Alabama |
||
Title |
On variable selection in matrix mixture modeling |
|||
Finite mixture models are widely used for cluster analysis,
including clustering matrix data. Nowadays, high-dimensional matrix
observations arise in many fields. It is known that irrelevant variables can
severely affect the performance of clustering procedures. Therefore, it is
important to develop algorithms capable of excluding irrelevant variables and
focusing on informative attributes in order to achieve good clustering
results. Several variable selection approaches have been proposed in the
multivariate framework. We introduce and study a variable selection procedure
that can be applied in the matrix-variate context. The methodological
developments are supported by several simulation studies and an application to a
real-life dataset. |
||||
P-25 |
Wang, Runmin |
University
of Illinois at Urbana-Champaign |
||
Title |
Self-Normalization for High Dimensional Time Series |
|||
Self-normalization has attracted considerable attention in the
recent literature on time series analysis, but its
scope of applicability has been limited to low/fixed-dimensional parameters
for low-dimensional time series. In this article, we propose a new
formulation of self-normalization for the inference of the mean of high
dimensional stationary processes. Our original test statistic is a
U-statistic with a trimming parameter to remove the bias caused by weak
dependence. Under the framework of nonlinear causal processes, we show the
asymptotic normality of our U-statistic with the convergence rate dependent
upon the order of the Frobenius norm of the long
run variance matrix. The self-normalized test statistic is then formulated on
the basis of recursive subsampled U-statistic and its limiting null
distribution is shown to be a functional of time-changed Brownian motion,
which differs from the pivotal limit used in the low dimensional setting. An
interesting phenomenon associated with self-normalization is that it works in
the high dimensional context even if the convergence rate is unknown. We also
present applications to testing for bandedness in the
covariance matrix and testing for white noise for high-dimensional stationary
time series and compare the finite sample performance with existing methods
in simulation studies. At the root of our theoretical argument, we extend the
martingale approximation to the high dimensional setting, which could be of
independent theoretical interest. |
||||
P-26 |
Xing, Lin |
University of Notre Dame |
||
Title |
A metric geometry approach to the weight prediction
problem |
|||
Many real data sets can be represented as a hypergraph,
which is a pair consisting of two sets: the set of data
points, and a set of higher-order relations among the data points,
called hyperedges. A standard example of hypergraph data is a
collaboration network in which the data points are mathematicians, and
each hyperedge is formed from a group of mathematicians
having a joint publication. In this work, we propose a geometric approach to
studying problems related to hypergraph data, with emphasis on the weight
prediction problem, one of the main problems in machine learning. We
introduce several classes of metrics on the set of data points, and also on
the set of hyperedges, to make these sets metric spaces. Using the
metric-space structures on such hypergraph data, we propose modified
k-nearest-neighbors methods that apply to weight
prediction on data points or hyperedges of hypergraph data. We illustrate the
techniques in our work with experimental analysis on several data sets. |
||||
P-27 |
Yang, Tiantian |
Clemson
University |
||
Title |
A Comparison of Several Missing Data Imputation Techniques for
Analyzing Different Types of Missingness |
|||
Missing data are common in real-world studies and can create
issues in statistical inference. Discarding cases that have missing values, or
replacing the missing values with inappropriate imputation techniques, can
both result in biased estimates. Many imputation techniques rest on assumptions
that are hard to assess in practice, so the appropriate
imputation technique is often unclear. To address this issue, a factorial
simulation design was developed to measure the impact of certain data set
characteristics on the validity of several popular imputation techniques. The
factors in the study were the missingness mechanism, the percentage of missing
data, and the imputation method. The evaluation included parameter estimates, bias, and
confidence interval coverage and width for the parameters of interest.
Simulation results suggest all three factors have a significant impact on the
quality of the estimation. Additional factors, such as the number of variables,
the types of variables, and the correlations among variables, are being incorporated
into the simulation. Finally, real data examples are discussed to illustrate the
applicability of different missing data imputation methods. |
||||
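As a minimal sketch of one cell of the kind of factorial simulation described above, the following toy example (hypothetical data and parameter values) generates missing-at-random data where y is dropped whenever the observed x is large, and shows that both complete-case analysis and naive mean imputation leave the same bias in the estimated mean:

```python
import random

random.seed(7)

# Sketch: MAR missingness, y dropped whenever x > 0, so missingness
# depends only on the observed x. True E[y] = 0.
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]

observed = [yi for xi, yi in zip(x, y) if xi <= 0]

cc_mean = sum(observed) / len(observed)   # complete-case estimate (biased)

# Naive mean imputation re-inserts cc_mean for every missing value,
# so the imputed-data mean equals the complete-case mean: bias remains.
imputed = observed + [cc_mean] * (n - len(observed))
imp_mean = sum(imputed) / n
```

Under MAR, model-based approaches (e.g., multiple imputation conditioning on x) can remove this bias, which is exactly the kind of contrast the factorial design above quantifies.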
P-28 |
Yao, Yaqiong |
University
of Connecticut |
||
Title |
Optimal two-stage adaptive subsampling design for softmax regression |
|||
For massive datasets, statistical analysis using the full data
can be extremely time-demanding, so subsamples are often taken and analyzed
according to the available computing power. For this purpose, Wang et al. (2018)
developed a novel two-stage subsampling design for logistic regression. We
generalize this method to softmax regression. We derive the asymptotic
distribution of the estimator obtained from subsamples drawn according to
arbitrary subsampling probabilities, and then derive the optimal subsampling
probabilities that minimize the asymptotic variance-covariance matrix under
the A-optimality and the L-optimality criteria. The optimal subsampling
probabilities involve unknown parameters, so we adopt the idea of optimal
adaptive design and use a small subsample to obtain pilot estimates. We also
consider Poisson subsampling for its higher computational and estimation
efficiency. We provide simulation and real data examples to demonstrate the
performance of our algorithm. |
||||
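A minimal sketch of the subsampling mechanism only: Poisson subsampling with non-uniform inclusion probabilities, corrected by inverse-probability (Horvitz-Thompson) weighting. The probabilities below are arbitrary stand-ins for illustration, not the A-/L-optimal probabilities derived in the work above, and the estimand is a simple mean rather than a softmax regression coefficient:

```python
import random

random.seed(1)

# Full data: 1, 2, ..., 1000, true mean 500.5.
data = [float(i) for i in range(1, 1001)]
total = sum(data)

# Poisson subsampling: include each point independently with an
# (arbitrary, size-proportional) probability, capped at 1.
pi = [min(1.0, 2000 * v / total) for v in data]
sample = [(v, p) for v, p in zip(data, pi) if random.random() < p]

# Horvitz-Thompson correction: weight each sampled value by 1/p.
ht_mean = sum(v / p for v, p in sample) / len(data)

# The unweighted sample mean overrepresents the large values.
naive_mean = sum(v for v, _ in sample) / len(sample)
```

The weighted estimate stays near the full-data mean while the unweighted one is pulled upward, which is why the asymptotic variance in the abstract is derived for the weighted estimator.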
P-29 |
Yuu, Elizabeth |
Robert
Koch Institute |
||
Title |
Quantifying microbial dark matter using generalized linear
models and its impact on metagenome analyses |
|||
We previously introduced DiTASiC
(Differential Taxa Abundance including Similarity Correction) to resolve
shared-read ambiguity based on a regularized, generalized linear
model (GLM) framework. This approach, like other similar ones, does not address
the remaining unmapped reads, or “microbial dark matter”. We extend our
approach by analyzing sub-mappings with different error tolerances and
integrating dark matter variables, in an effort to create a more appropriate
GLM. This new idea has the potential to provide more accurate estimates of
taxa abundance and inherent variation, which in turn can lead to improved taxa
quantification and differential testing. |
||||
P-30 |
Zang, Xiao |
The
Ohio State University |
||
Title |
Clustering Functional Data using Fisher-Rao Metric |
|||
Functional data are infinite dimensional,
so histograms are no longer applicable for discovering multimodality. Also,
due to misalignment, pointwise summaries such as cross-sectional means and
standard deviations cannot faithfully describe the typical form and
variability. Therefore, we developed a functional k-means clustering
algorithm that uses the Fisher-Rao metric as the distance measure and
simultaneously aligns the functions within each cluster using a flexible family
of domain warpings, with a BIC criterion to choose the optimal number of clusters.
In simulation studies our method outperformed Sangalli
et al.'s method in terms of clustering accuracy. Real-world applications
will be illustrated on several datasets. |
||||
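A minimal sketch of the amplitude part of the Fisher-Rao framework, via the square-root velocity function (SRVF) q(t) = f'(t)/sqrt(|f'(t)|): for already-aligned functions, the Fisher-Rao distance reduces to the L2 distance between SRVFs. The warping/alignment optimization and the k-means loop from the work above are omitted; the grid and test functions are illustrative:

```python
import math

def srvf(f, dt):
    """Discrete SRVF of samples f on a uniform grid with spacing dt."""
    q = []
    for i in range(len(f) - 1):
        d = (f[i + 1] - f[i]) / dt          # forward-difference derivative
        q.append(math.copysign(math.sqrt(abs(d)), d))
    return q

def l2_dist(q1, q2, dt):
    """Discretized L2 distance between two SRVFs."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)) * dt)

n = 200
dt = 1.0 / n
grid = [i * dt for i in range(n + 1)]
f1 = [math.sin(2 * math.pi * t) for t in grid]
f2 = [math.sin(2 * math.pi * t) for t in grid]        # identical function
f3 = [0.5 * math.sin(2 * math.pi * t) for t in grid]  # damped amplitude

d_same = l2_dist(srvf(f1, dt), srvf(f2, dt), dt)      # 0: same amplitude
d_diff = l2_dist(srvf(f1, dt), srvf(f3, dt), dt)      # > 0: amplitudes differ
```

A k-means step would assign each curve to the cluster whose template minimizes this elastic distance after warping.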
P-31 |
Zhang, Han |
The
University of Alabama |
||
Title |
Aggregate Estimation in Sufficient Dimension Reduction for
Binary Responses |
|||
Many successful inverse-regression-based
sufficient dimension reduction methods have been developed since Sliced
Inverse Regression was introduced. However, most of them target problems
with continuous responses. Although some claim to be applicable to both
categorical and numerical responses, they may work poorly for binary
classification problems, since binary responses provide very limited
information. In this paper, we put forward an aggregate estimation method for
binary responses, which involves a decomposition step and a combination step.
As an ensemble learning approach, aggregate estimation is proved to
effectively decrease the bias and exhaustively estimate the dimension
reduction space. |
||||
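A minimal sketch of why binary responses provide limited information for inverse regression: with a binary y, Sliced Inverse Regression has only two slices, so the slice means span at most one direction (the difference of class means), even when the true central subspace has higher dimension. The toy model below is hypothetical and shows plain SIR, not the aggregate decomposition/combination method of the work above:

```python
import random

random.seed(5)

def slice_mean(rows):
    """Componentwise mean of a list of p-dimensional points."""
    n, p = len(rows), len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(p)]

p = 3
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(5000)]
# True model depends on TWO directions (x0 linearly, x1 through x1^2),
# but y is binary, so SIR sees only two slices.
y = [1 if row[0] + row[1] ** 2 > 1 else 0 for row in X]

m1 = slice_mean([r for r, yi in zip(X, y) if yi == 1])
m0 = slice_mean([r for r, yi in zip(X, y) if yi == 0])
sir_direction = [a - b for a, b in zip(m1, m0)]
# sir_direction picks up the x0 component but, by the symmetry of x1^2,
# misses the x1 direction entirely -- the limitation noted above.
```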
P-32 |
Zhang, Yangfan |
University
of Illinois Urbana-Champaign |
||
Title |
High Dimensional Regression Change Point Detection |
|||
In this article, we propose a method to detect a possible change
point in linear regression. We construct a U-statistic-based test statistic with
self-normalization and derive its null distribution, which turns out to be
pivotal. Our method allows an intercept in the model while detecting a
change point in the slope, which is more general than the existing
literature. Under certain conditions, we also derive an approximation to the
power. The performance is reasonably good in terms of both size and power.
Furthermore, our method can be combined with wild binary segmentation to handle
the multiple-change-point case and estimate the locations. |
||||
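A minimal sketch of the underlying idea only: scan candidate split points and compare the OLS slopes fitted on each side. The actual statistic in the work above is a self-normalized U-statistic with a pivotal limit; this toy version with noise-free data merely illustrates slope-change detection:

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def scan_change_point(x, y, min_seg=10):
    """Return the split index maximizing the absolute slope difference."""
    best_k, best_gap = None, -1.0
    for k in range(min_seg, len(x) - min_seg):
        gap = abs(ols_slope(x[:k], y[:k]) - ols_slope(x[k:], y[k:]))
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k, best_gap

# Noise-free toy data: slope 1 before index 50, slope 3 after
# (the two lines meet at x = 0.5, so the path is continuous).
x = [i / 100 for i in range(100)]
y = [xi if i < 50 else 3 * xi - 1 for i, xi in enumerate(x)]
k_hat, gap = scan_change_point(x, y)
```

Self-normalization replaces the explicit variance estimate this naive scan would need for calibration with noisy data.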
P-33 |
Zhang, Yingying |
The
University of Alabama |
||
Title |
On model-based clustering of time-dependent categorical
sequences |
|||
Clustering categorical sequences is an important problem that
arises in many fields, such as medicine, sociology, and economics. It is a
challenging task because techniques for clustering categorical data are
scarce: the majority of traditional clustering procedures are designed for
handling quantitative observations. Situations in which the categorical data
are related to time are even more troublesome. We employ a mixture of
first-order Markov models, with transition probabilities that are functions of
time, to develop a new approach for clustering categorical
time-related data. The proposed methodology is illustrated on synthetic data
and applied to a real-life data set containing sequences of life events for
respondents participating in the British Household Panel Survey. |
||||
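A minimal sketch of the building block of the approach above: maximum-likelihood estimation of a first-order Markov transition matrix from categorical sequences. The mixture over clusters and the time-dependent transition probabilities are omitted; the life-event states and sequences are hypothetical:

```python
from collections import defaultdict

def fit_transition_matrix(sequences):
    """MLE of a first-order Markov transition matrix: row-normalized
    counts of observed one-step transitions."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    P = {}
    for a, row in counts.items():
        total = sum(row.values())
        P[a] = {b: c / total for b, c in row.items()}
    return P

# Toy life-event sequences over states {'school', 'work', 'retired'}
seqs = [
    ['school', 'work', 'work', 'retired'],
    ['school', 'school', 'work', 'work'],
    ['school', 'work', 'retired'],
]
P = fit_transition_matrix(seqs)
```

In the mixture setting, one such matrix (with time-varying entries) is fitted per cluster inside an EM loop.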
P-34 |
Zhu, Changbo |
University
of Illinois at Urbana-Champaign |
||
Title |
Interpoint Distance Based Two Sample Tests in High Dimension |
|||
In this paper, we study a class of two-sample test statistics
based on inter-point distances in the high dimensional, low sample size
setting. Our test statistics include the well-known energy distance and the
maximum mean discrepancy with Gaussian and Laplacian kernels, and the
critical values are obtained via permutations. We show that all of these tests
are inconsistent when the two high dimensional distributions have
the same marginal distributions but differ in other aspects. The
tests based on energy distance and maximum mean discrepancy mainly target
differences between marginal means and variances, whereas the test based on
the L1-distance can capture differences in the marginal distributions. Our
theory sheds new light on the limitations of inter-point distance based tests,
the impact of different distance metrics, and the behavior of permutation tests
in high dimension. Simulation results and a real data illustration are also
presented to corroborate our theoretical findings. |
||||
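A minimal univariate sketch of the test class studied above: the energy distance statistic with a permutation p-value. The high dimensional phenomena in the work are not visible in this toy example; it shows only the statistic and the permutation calibration, with hypothetical simulated samples:

```python
import random

random.seed(3)

def energy_distance(x, y):
    """Sample energy distance 2*E|X-Y| - E|X-X'| - E|Y-Y'| (1-D)."""
    exy = sum(abs(a - b) for a in x for b in y) / (len(x) * len(y))
    exx = sum(abs(a - b) for a in x for b in x) / len(x) ** 2
    eyy = sum(abs(a - b) for a in y for b in y) / len(y) ** 2
    return 2 * exy - exx - eyy

def permutation_pvalue(x, y, n_perm=200):
    """p-value from randomly re-splitting the pooled sample."""
    obs = energy_distance(x, y)
    pooled = x + y
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        if energy_distance(pooled[:len(x)], pooled[len(x):]) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)

x = [random.gauss(0, 1) for _ in range(30)]
y = [random.gauss(2, 1) for _ in range(30)]   # mean shift: should reject
z = [random.gauss(0, 1) for _ in range(30)]   # same distribution as x

p_shift = permutation_pvalue(x, y)
p_null = permutation_pvalue(x, z)
```

The paper's point is that in high dimension such tests remain sensitive to marginal mean/variance shifts like this one, but can fail against differences beyond the marginals.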