International
Conference on Statistical Distributions and Applications Oct. 10-12, 2019, at Eberhard Conference
Center, Grand Rapids, MI, USA |
Titles and abstracts for Keynote and Plenary
speakers are on the ‘Keynotes & Plenary
Speakers’ Page.
Abstracts –
Topic-Invited Speakers (Alphabetically Ordered)
TI_1_0 |
Abdelrazeq,
Ibrahim |
Rhodes College |
Title |
Goodness
of fit Tests |
|
In general, goodness-of-fit tests are used to test whether sampled data fit a claimed distribution, a particular model, or even a stochastic process. This area has become very broad, and many approaches are now used to find the appropriate goodness-of-fit test: parametric, non-parametric, classical, or even Bayesian. In this session's talks, you will explore goodness-of-fit tests that exemplify many of these different approaches. |
||
TI_1_4 |
Abdelrazeq,
Ibrahim |
Rhodes College |
Title |
The
Spread Dynamics of S&P 500 vs Levy-Driven OU Processes |
|
When an Ornstein-Uhlenbeck process is assumed and observed at discrete times 0, h, 2h, ..., [T/h]h, the unobserved driving process can be approximated from the observed process. Approximated increments of the driving process are used to test the assumption that the process is Lévy-driven. The asymptotic behavior of the test statistic at high sampling frequencies is developed assuming that the model parameters are known. The behavior of the test statistic using an estimated parameter is also studied. Performance of the test is illustrated through simulation. |
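As a hypothetical illustration of the increment-recovery idea behind this test (a sketch assuming a Brownian-driven OU process with known rate lam; not the speaker's code), one can simulate the process at step h and back out approximate driving increments X_{ih} - exp(-lam*h) X_{(i-1)h}:

```python
import math
import random

def ou_increment_recovery(lam=1.0, h=0.01, n=10_000, seed=42):
    """Simulate a Brownian-driven OU process dX = -lam*X dt + dW
    observed at times 0, h, 2h, ..., nh, then recover approximate
    increments of the unobserved driving process from the discrete path."""
    rng = random.Random(seed)
    x = 0.0
    path = [x]
    for _ in range(n):
        # exact one-step OU transition under a Brownian driver
        mean = math.exp(-lam * h) * x
        sd = math.sqrt((1.0 - math.exp(-2.0 * lam * h)) / (2.0 * lam))
        x = rng.gauss(mean, sd)
        path.append(x)
    # approximate driving increments: X_{ih} - exp(-lam*h) * X_{(i-1)h}
    return [path[i] - math.exp(-lam * h) * path[i - 1] for i in range(1, n + 1)]
```

Under the Brownian driver these recovered increments are i.i.d. normal with variance close to h for small lam*h, so a goodness-of-fit test applied to them probes the assumed driving process.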
||
TI_3_4 |
Abdurasul,
Emad |
James Madison University |
Title |
The
Product Limit survival function Distribution with Small Sample Inference |
|
Our contribution is to derive the exact distribution of the product limit estimator and to develop a mid-p population tolerance interval for it. We then develop a saddlepoint-based method for the population survival function from the product limit (PL) survival function estimator, under the proportional hazards model, to generate small-sample confidence bands for it. The saddlepoint technique depends upon the Mellin transform of the zero-truncated product limit estimator. This transform is inverted through saddlepoint approximations to yield highly accurate approximations to the cumulative distribution function of the cumulative hazard function estimator. We then compare our saddlepoint confidence interval with the one obtained from the exact distribution and with the one obtained from the large-sample method. From our simulation study we found that the saddlepoint confidence interval is very close to the confidence interval derived from the exact distribution, while being much less difficult to compute, and outperforms the competing large-sample methods in terms of coverage probability. |
||
TI_48_4 |
Aburweis,
Mohamed |
University of Central Florida |
Title |
Comparative study of the distribution of repetitive DNA in model organisms |
|
Repetitive
DNA elements are abundant in the genome of a wide range of organisms. In
mammals, repetitive elements comprise about 40-50% of the total genomes.
However, their biological functions remain largely unknown. Analysis of their
abundance and distribution may shed some light on how they affect genome
structure, function, and evolution. We conducted a detailed comparative
analysis of repetitive DNA elements across ten different eukaryotic organisms,
including chicken (G. gallus), zebrafish (D.
rerio), Fugu (T. rubripes), fruit fly (D.
melanogaster), and nematode worm (C. elegans), along with five mammalian
organisms: human (H. sapiens), mouse (M. musculus), cow (B. taurus), rat (R. norvegicus), and rhesus (M. mulatta).
Our results show that repetitive DNA content varies widely, from 7.3% in the Fugu genome to 52% in the zebrafish, based on RepeatMasker data. The most
frequently observed transposable elements (TEs) in mammals are SINEs (Short
Interspersed Nuclear Elements), followed by LINEs (Long Interspersed Nuclear
Elements). In contrast, LINEs, DNA transposons, simple repeats, and low
complexity repeats are the most frequently observed repeat classes in the
chicken, zebrafish, fruit fly, and nematode worm genomes, respectively. LTRs
(Long Terminal Repeats) have significant genomic coverage and diversity,
which may make them suitable for regulatory roles. With the exception of the
nematode worm and fruit fly, the frequency of the repetitive elements follows
a log-normal distribution, characterized by a few highly prevalent repeats in
each organism. In mammals, SINEs are enriched near genic regions, and LINEs
are often found away from genes. We also identified many LTRs that are
specifically enriched in promoter regions, some with a strong bias towards
the same strand as the nearby gene. This raises the possibility that the LTRs
may play a regulatory role. Surprisingly, most intronic repeats, with the exception of DNA transposons, have a strong tendency to be on the opposite DNA strand from the host gene. One possible explanation is that intronic RNAs which result from splicing may contribute to retrotransposition to the original intronic loci. |
||
TI_2_3 |
Ahmad, Morad |
University of Jordan |
Title |
On
the class of Transmuted-G Distributions |
|
In
this talk, we compare the reliability and the hazard function between a
baseline distribution and the corresponding transmuted-G distribution. Some
examples based on existing transmuted-G distributions in literature are
used. Three tests of parameter significance are utilized to test the importance of a transmuted-G distribution over the baseline distribution, and real data are used in an application of the inference about the importance of transmuted-G distributions. |
||
TI_47_0 |
Akinsete,
Alfred |
Marshall University, Huntington,
WV |
Title |
A
new class of generalized distributions |
|
This session presents a new class of generalized statistical distributions, which may provide robustness and versatility for scientists and practitioners dealing with real-life data. Each paper presents detailed mathematical and statistical properties of the distribution, parameter estimation, and applications to various types of datasets. |
||
TI_2_0 |
Al-Aqtash,
Raid |
Marshall University |
Title |
Generalized
Distributions and Applications |
|
The
first speaker, Dr. Elkadry, presents his work that
relates to Bayesian statistics with application to real-life data. The other
speakers, Drs. Aljarrah, Ahmed & Al-Aqtash, present their work
on recently developed generalized statistical distributions
with application to real data. |
||
TI_2_4 |
Al-Aqtash,
Raid |
Marshall University |
Title |
On
the Gumbel-Burr XII Distribution; Regression and Application |
|
Additional properties of the Gumbel-Burr XII distribution GBXII(L) are studied. We consider useful characterizations of the GBXII(L) distribution in addition to some structural properties, including mean deviations and the distribution of the order statistics. A simulation study is conducted to assess the performance of the MLEs, and then the usefulness of the GBXII(L) distribution is illustrated by means of real data. A log-GBXII(L) regression model is proposed and a survival data set is used in an application of the proposed regression model. |
||
TI_5_3 |
Aldeni,
Mahmoud |
Western Carolina
University |
Title |
TX
Family and Survival Models |
|
We
introduce a generalized family of lifetime distributions, namely, the
uniform-R{generalized lambda} (U-R{GL}) and derive the corresponding survival
models. Two members of this family are derived, namely, the U-Weibull{GL}
(U-W{GL}), a generalized Weibull distribution, and U-loglogistic{GL}
(U-LL{GL}), a generalized loglogistic distribution. The hazard function of the U-R{GL} family can be monotonic, bathtub, upside-down bathtub, N-shaped, or bimodal. The U-W{GL} distribution is applied to fit two lifetime data
sets. The survival model, based on the U-W{GL} distribution, is applied to
fit a right censored lifetime data set. |
||
TI_2_2 |
Aljarrah,
Mohammad A. |
Tafila Technical
University, Tafila, Jordan |
Title |
A
new generalized normal regression model. |
|
We develop a regression model using the new generalized normal distribution. Assuming censored data, maximum likelihood estimates for the model parameters are obtained. The implementation of this model is demonstrated through applications to censored survival data. A diagnostic analysis and a model check are performed based on martingale-type residuals. |
||
TI_1_1 |
Al-Labadi, Luai |
University of Toronto, Mississauga |
Title |
A
Bayesian Nonparametric Test for Assessing Multivariate Normality |
|
A
novel Bayesian nonparametric test for assessing multivariate normal models is
presented. The use of the procedure has been illustrated through several
examples, in which the proposed approach shows excellent performance. |
||
TI_16_1 |
Al-Mofleh,
Hazem |
Tafila
Technical University, Tafila, Jordan |
Title |
Wrapped
Circular Statistical Distributions and Applications |
|
Measurement of direction is common in science and in real-life data observations. Therefore, a circular distribution with a random angle is used to describe these phenomena. There are many techniques for obtaining a circular distribution from the underlying density function; one of the most effective is called “wrapping”. |
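To make the wrapping technique concrete, here is a minimal editor-added sketch (assuming a normal base density; function names are illustrative): the wrapped density sums the base density over all 2π translates, f_w(θ) = Σ_k f(θ + 2πk):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def wrapped_density(f, theta, k_max=50):
    """Wrap a density f on the real line onto the circle [0, 2*pi):
    f_w(theta) = sum over k of f(theta + 2*pi*k), truncated at |k| <= k_max."""
    return sum(f(theta + 2.0 * math.pi * k) for k in range(-k_max, k_max + 1))

def wrapped_normal(theta):
    # wrapped normal density on [0, 2*pi)
    return wrapped_density(normal_pdf, theta)
```

Truncating the sum at a modest k_max suffices because the tail terms decay rapidly; the resulting density integrates to one over [0, 2π).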
||
TI_31_3 |
Almohalwas,
Akram |
UCLA |
Title |
Analysis of
Donald Trump's Twitter Data Using Text Mining and Social Network
Analysis |
|
As the U.S. grows more accustomed to social media, it has been incorporated into many aspects of American life and has become one of the most efficient “weapons” for politicians campaigning and communicating with people. One of the most famous examples is Donald Trump on Twitter. Twitter is one of the best-known social media tools; it generates a huge amount of data that needs to be sifted through to gain insight into the owner of a Twitter account. |
||
TI_5_1 |
Almomani,
Ayman |
Almomany Trade |
Title |
TX: The Extended Family |
|
Consider two CDFs T and F with supports [0,1] and S, respectively. Then G(x) = T∘F(x) is a CDF whose support is S and whose parameters include those of both T and F. The distribution T is called a complementary distribution, and its choice is crucial in defining the distributional properties and moments of the newly generated G. We investigate the connection between complementary distributions and the TX family and present different ways of extending the TX family through different choices of the function T. We make recommendations on how to select appropriate T-transformations. |
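A minimal editor-added sketch of this composition (the example choices of T and F are hypothetical, not taken from the talk):

```python
import math

def tx_cdf(T, F):
    """Given a CDF T supported on [0, 1] and a CDF F with support S,
    return the composed CDF G(x) = T(F(x)), which has support S."""
    return lambda x: T(F(x))

# illustrative choices: T = Beta(2, 1) CDF on [0, 1], F = Exp(1) CDF on (0, inf)
T = lambda u: u ** 2
F = lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0
G = tx_cdf(T, F)   # G(x) = (1 - exp(-x))**2, an exponentiated exponential CDF
```

The parameters of G are the union of those of T and F, which is why the choice of the complementary distribution T governs the shape of the generated family.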
||
TI_14_3 |
Alzaatreh,
Ayman |
American University of Sharjah |
Title |
Truncated
T-X family of distributions |
|
The
time and cost to start a business are highly related to the degree of
transparency of business information, which strongly impacts the loss due to
illicit financial flows. In order to study the distributional characteristics
of time and cost to start a business, we introduce right-truncated and
left-truncated T-X families of distributions. These families are used to
construct new generalized families of continuous distributions. Relationships
between the families are investigated. Real data sets including time and cost
to start a business are analyzed and the results show that the truncated
families perform very well for fitting highly skewed data. |
||
TI_3_0 |
Alzaghal,
Ahmad |
State University of New York at
Farmingdale |
Title |
Distributions
and Applications |
|
|
||
TI_37_2 |
Alzaghal,
Ahmad |
State University of New York at
Farmingdale |
Title |
A
Generalized Family of Lindley Distribution: Properties and Applications |
|
In this talk, we introduce new families of generalized Lindley distributions, using the T-R{Y} framework, named the T-Lindley family of distributions. The new families are generated using the quantile functions of the uniform, exponential, Weibull, logistic, log-logistic and Cauchy distributions. Several general
properties of the T-Lindley family are studied in detail including moments,
mean deviations, mode and Shannon’s entropy. Several new members of T-Lindley
distributions are studied in more detail. The distributions in the T-Lindley
family can be skewed to the right, symmetric, skewed to the left, or bimodal.
A data set is used to demonstrate the flexibility and usefulness of the
T-Lindley family of distributions. |
||
TI_4_0 |
Amezziane,
Mohamed |
Central Michigan
University |
Title |
Models
for Complex Data |
|
Models for densities, spatial autoregressive inference, post-selection inference, and false discovery rate control. |
||
TI_15_2 |
Andrews,
Beth |
Northwestern University |
Title |
Partially
specified spatial autoregressive model with artificial neural network |
|
For spatial modeling and prediction, we propose a spatial autoregressive model with a nonlinear neural network component. This allows for model flexibility in describing the relationship between the dependent variable and covariates. We consider model/variable selection and use a maximum likelihood technique for parameter estimation. The estimators are consistent and asymptotically normal under general conditions. Simulation results indicate the asymptotic theory holds in large finite samples, and we use our methods to model United States voting patterns. |
||
TI_6_0 |
Arslan,
Olcay |
Ankara University |
Title |
Some non-normal distributions and their
applications in robust statistical analysis |
|
In this topic-invited
session, some non-Gaussian distributions used for modeling as alternatives to
the normal distribution will be discussed and some new
extensions of these distributions will be
proposed. Several different applications of these
distributions will be given to demonstrate the performances of
these distributions for conducting robust
statistical analysis of data sets that may have
non-normal empirical distributions. |
||
TI_6_1 |
Arslan,
Olcay |
Ankara University |
Title |
Multivariate Laplace and multivariate skewed
Laplace distributions and their applications in robust statistical
analysis |
|
In this study, we will consider the multivariate Laplace distribution and its skew extension, which can be used as alternatives to the multivariate normal or other multivariate distributions for modeling non-normal data sets. One of the advantages of these distributions is that they can model thick-tailed and skewed datasets and have a simpler form than other multivariate or skew multivariate distributions. Concerning the number of parameters, these distributions have the same number of parameters as the multivariate normal distribution and its skew extensions, which is an advantage in terms of parameter estimation. We will explore some properties of these distributions and study parameter estimation via the EM algorithm. We will also discuss some applications to demonstrate the modeling strength of these distributions. |
||
TI_47_1 |
Aryal, Gokarna |
Purdue University Northwest, Hammond, IN |
Title |
Transmuted-G
Poisson Family |
|
In this talk, we present a new family of distributions called the Transmuted-G Poisson (TGP) family. This family of distributions is constructed by using the genesis of the zero-truncated Poisson (ZTP) distribution and the transmutation map. Some mathematical and statistical properties of the TGP family are provided. The parameter estimation and simulation procedures are also discussed. The usefulness of the TGP family is illustrated by modeling a couple of real-life data sets. |
||
TI_9_3 |
Babic, Sladana |
Ghent University |
Title |
Comparison
and classification of flexible distributions for multivariate skew and
heavy-tailed data |
|
We
present, compare and classify the most popular families of flexible
multivariate distributions. By flexible distribution we mean that, besides
the usual location and scale parameters, the distribution has also both
skewness and tail parameters. The following families are presented:
elliptical distributions, skew-elliptical distributions, multiple scaled
mixtures of multinormal distributions, multivariate distributions based on
the transformation approach, copula-based multivariate distributions and
meta-elliptical distributions. Our classification is based on the tail
behavior (a single tail weight parameter or multiple tail weight
parameters) and the type of symmetry (spherical, elliptical, central
symmetry or asymmetry). We compare the flexible families both theoretically
(comparing the relevant properties and distinctive features) and with a Monte
Carlo study (comparing the fitting abilities in finite samples). |
||
TI_5_4 |
Bahadi, Taoufik |
University of Tampa |
Title |
TX Family of Link functions for Binary
Regression |
|
The link function in binary regression is used
to specify how the probability of success is linked to the model’s systematic
component. These link functions are chosen to be quantile functions of
popular distributions such as the logistic (logit), Gaussian (probit) and Gumbel (cloglog)
distributions. We choose new flexible link functions from the TX family of
distributions, build an inference framework for their regression models and
derive a new model validation procedure. |
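For orientation, an editor-added sketch of the quantile-function view of links (only the standard logit/probit/cloglog cases named in the abstract; the new TX links would slot in the same way): the inverse link applied to the linear predictor is a CDF, so swapping CDFs swaps link functions:

```python
import math

# each inverse link is a CDF evaluated at the linear predictor eta
def inv_logit(eta):
    """Logistic CDF -> logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Standard normal CDF -> probit link."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Gumbel (minimum) CDF -> complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))

def success_prob(beta, x, inv_link=inv_logit):
    """P(Y = 1 | x) = inv_link(x' beta) in a binary regression model."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    return inv_link(eta)
```

Any CDF from a flexible family can replace `inv_link`, which is the mechanism by which the TX family yields new, more flexible links.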
||
TI_46_1 |
Bandyopadhyay,
Tathagata |
St. Ambrose University |
Title |
Inference problems in binary regression model
with misclassified responses |
|
The
problem of predicting a future outcome based on the past and currently
available samples arises in many applications. Applications of prediction
intervals (PIs) based on continuous distributions are well-known. Compared to continuous distributions, results on constructing PIs for discrete
distributions are very limited. The problems of constructing prediction
intervals for the binomial, Poisson and negative binomial distributions are
considered here. Available approximate, exact and conditional methods for
these distributions are reviewed and compared. Simple approximate prediction
intervals based on the joint distribution of the past samples and the future
sample are proposed. Exact coverage studies and expected widths of prediction
intervals show that the new prediction intervals are comparable to or better
than the available ones in most cases. |
||
TI_7_3 |
Baron,
Michael |
American University |
Title |
Sequential
testing and post-analysis of credibility |
|
Actuaries
routinely make decisions that are sequential in nature. During each insured
period, the new claims and losses data are collected, and together with the
new economic and financial situation and other factors, they are taken into account for the calculation of revised premiums
and risks. This talk focuses on the assessment of credibility,
estimation of credibility factors, and testing for full credibility based on
sequentially collected actuarial data. Proposed sequential tests for full
credibility control the overall error rate and power. They result in a
rigorous set of conditions under which an insured cohort becomes fully
credible. Following sequential decisions, methods are developed for the
computation of sequential p-values. Inversion of the derived sequential
test leads to a construction of a sequence of repeated confidence
intervals for the credibility factor. Methods are detailed for Gamma,
Weibull, and Pareto loss distributions and applied to CAS Public Loss
Simulator data sets. |
||
TI_9_2 |
Bekker,
Andriette |
University of Pretoria, South Africa. |
Title |
Class of matrix variate distributions: a
flexible approach based on the mean-mixture of normal model |
|
Limited research has been conducted on matrix variate distributions that can describe skewness present in data. This paper
introduces a new class of matrix variate distributions based on the
mean-mixture of normal (MMN) model. The properties of the new matrix variate
class - stochastic representation, moments and characteristic function,
linear and quadratic forms as well as marginal, conditional distributions are
investigated. Three special cases including the restricted skew-normal,
exponentiated MMN and the half-normal exponentiated MMN matrix variate distributions
are highlighted. An EM-algorithm is implemented to obtain maximum likelihood
estimates of the parameters. The usefulness and practical utility of the
proposed methodology are illustrated using two simulation studies. To investigate the performance of the developed model in a real-world analysis, Landsat satellite data (LSD), originally obtained from NASA, are used. Numerical results show that the new models,
within this proposed class, performed well when applied to skewed matrix
variate experimental data. |
||
TI_15_0 |
Berrocal,
Veronica |
University of California Irvine |
Title |
Comparing Spatial Fields |
|
In weather forecast verification, the need for more advanced methods for analyzing high-resolution forecasts has prompted the introduction of much new methodology, largely from image analysis and computer vision, and some from spatial statistics. In this setting, it is important to capture information about how similar features within the fields are, yet there has not been much, if any, work done on statistical inference in this arena, which is a more general topic than just weather forecast verification. Deciding how close, or far apart, two spatial fields are in some context is an important question in many areas of research. |
||
TI_15_1 |
Berrocal,
Veronica |
University of California Irvine |
Title |
Comparing spatial fields to detect systematic
biases in regional climate models |
|
Since their introduction in 1990, regional
climate models (RCMs) have been widely used to study the impact of climate change
on human health, ecology, and epidemiology. To ensure that the conclusions of
impact studies are well founded, it is necessary to assess the uncertainty in
RCMs. This is not an easy task because two major sources of uncertainties can
undermine an RCM: uncertainty in the boundary conditions needed to initialize
the model and uncertainty in the model itself. Using climate data for
Southern Sweden over 45 years, in this paper, we present a statistical
modeling framework to assess an RCM driven by analyses. More specifically,
our scientific interest here is determining whether there exist time periods during which the RCM under consideration displays the same type of spatial discrepancies from the observations. The proposed model can be seen as an
exploratory tool for atmospheric modelers to identify time periods that
require a further in-depth examination. Focusing on seasonal average
temperature, our model relates the corresponding observed seasonal fields to
the RCM output via a hierarchical Bayesian statistical model that includes a spatio-temporal calibration term. The latter, which
represents the spatial error of the RCM, is in turn provided with a Dirichlet
process prior, enabling clustering of the errors in time. We apply our
modeling framework to data from Southern Sweden spanning the period 1
December 1962 to 30 November 2007, revealing intriguing tendencies with
respect to the RCM spatial errors of seasonal average temperature. |
||
TI_4_2 |
Bhattacharjee,
Abhishek |
University of Northern Colorado |
Title |
Empirical
Bayes Intervals for the Selected Mean |
|
Empirical Bayes (EB) methods are very useful
for post selection inference. Following Datta et al. (2002), construction of
EB confidence intervals for the selected population mean will be discussed in
this presentation. The EB intervals are adjusted to achieve the target
coverage probabilities asymptotically up to the second order. Both
unconditional coverage probabilities of EB intervals and corresponding
probabilities conditional on ancillary statistics are found. |
||
TI_27_1 |
Bonner,
Simon |
University of Western Ontario |
Title |
Modelling
Score Based Data from Photo-Identification Studies of Wild Animals |
|
Photographic
identification has become an invaluable tool for studying populations of animals
that are hard to follow in the wild. Photographs are often compared in silico with computer algorithms that produce continuous
scores which are then classified to identify matches based on some predefined
cut-off. This process is prone to errors (false positive or negative matches)
which bias estimates of the population’s demographics. We present a general
framework for modelling photo-id data based on the raw scores, describe the
Bayesian framework for fitting this model, discuss computational issues, and
present an application to a long-term study of whale sharks (Rhincodon typus). |
||
TI_7_0 |
Brazauskas,
Vytaras |
University of Wisconsin-Milwaukee |
Title |
Actuarial Statistics |
|
In this session, we will discuss several statistical
methodological techniques that appear in actuarial studies,
including credibility, modeling of random variables affected by coverage
modifications and dependence, and non-standard distributions relevant to
insurance data. |
||
TI_7_4 |
Brazauskas,
Vytaras |
University of Wisconsin-Milwaukee |
Title |
Modeling severity and measuring tail risk of
Norwegian fire claims |
|
The probabilistic behavior of the claim
severity variable plays a fundamental role in calculation of deductibles, layers,
loss elimination ratios, effects of inflation, and other quantities arising
in insurance. Among several alternatives for modeling severity, the
parametric approach continues to maintain the leading position, which is
primarily due to its parsimony and flexibility. In this paper, several
parametric families are employed to model severity of Norwegian fire claims
for the years 1981 through 1992. The probability distributions we consider
include: generalized Pareto, lognormal-Pareto (two versions), Weibull-Pareto
(two versions), and folded-t. Except for the generalized Pareto distribution,
the other five models are fairly new proposals that recently appeared in the
actuarial literature. We use the maximum likelihood procedure to fit the
models and assess the quality of their fits using basic graphical tools
(quantile-quantile plots), two goodness-of-fit statistics (Kolmogorov-Smirnov
and Anderson-Darling), and two information criteria (AIC and BIC). In
addition, we estimate the tail risk of 'ground up' Norwegian fire claims
using the value-at-risk and tail-conditional median measures. We monitor the
tail risk levels over time, for the period 1981 to 1992, and analyze
predictive performances of the six probability models. In particular, we
compute the next-year probability for a few upper tail events using the
fitted models and compare them with the actual probabilities. |
||
TI_16_4 |
Broniatowski,
Michel |
Université Pierre
et Marie Curie (Sorbonne Université) |
Title |
A review on
divergence-based inference in parametric and semiparametric models |
|
The Csiszar class of divergences has the main advantage of fitting both parametric and non-parametric settings, in contrast with other classes of dissimilarity indexes. Starting from the dual representation of Csiszar divergences, the talk will first provide a unified treatment of parametric inference, with some emphasis on non-regular models, as occurs for the number and the nature of components in mixture models. We will then turn to semiparametric models of two kinds: firstly, we will consider mixtures with a parametric component and a nonparametric one, a useful class of models for applications. Other semiparametric models defined by moment conditions have been widely considered in the present literature, rooted in the well-known empirical likelihood paradigm (Owen 1988). We will show that divergence-based approaches can be applied in semiparametric models defined by conditions on moments of L-statistics; typical examples are provided when considering models defined as neighborhoods of parametric classes, such as Weibull or Pareto ones, when those neighborhoods are defined through conditions on their first L-moments. The basic dual representations of divergences in parametric and nonparametric models have been considered independently by Liese and Vajda (2006) and Broniatowski and Keziou (2006, 2009). Semiparametric mixtures have been considered in the frame of Csiszar divergence-based inference in Al Mohamad and Boumahdaf (2016), and inference under L-moment conditions has been studied by Broniatowski and Decurninge (2017). |
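For readers new to this toolkit, a small editor-added illustrative sketch (not from the talk): a Csiszar phi-divergence between discrete distributions p and q is D_phi(p||q) = Σ_i q_i phi(p_i/q_i); the choice phi(t) = t log t recovers the Kullback-Leibler divergence:

```python
import math

def csiszar_divergence(p, q, phi):
    """Csiszar phi-divergence D_phi(p || q) = sum_i q_i * phi(p_i / q_i)
    for discrete distributions p, q with all q_i > 0."""
    return sum(qi * phi(pi / qi) for pi, qi in zip(p, q))

def kl(p, q):
    # phi(t) = t * log(t) recovers the Kullback-Leibler divergence
    return csiszar_divergence(p, q, lambda t: t * math.log(t) if t > 0 else 0.0)
```

Other convex choices of phi (e.g. for Hellinger or chi-squared) slot into the same template, which is what makes the class convenient for a unified inference theory.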
||
TI_6_3 |
Bulut, Yakup Murat |
Eskişehir
Osman Gazi University |
Title |
Matrix variate extensions of symmetric and skew
Laplace distributions: Properties, parameter
estimation and applications |
|
In this work, we introduce symmetric and skew matrix variate Laplace distributions using mixture approaches. To obtain the symmetric version of the matrix variate Laplace distribution, we use the scale mixture approach. To derive a skew version of the matrix variate Laplace distribution, we apply the variance-mean mixture approach. Some statistical properties of the newly defined distributions are investigated. Further, we give an EM-based algorithm to estimate the unknown parameters. A small simulation study and a real data example are given to explore the performance of the proposed algorithm for finding the parameter estimates and also to illustrate the capacity of the proposed distribution for modeling matrix variate data sets. |
||
TI_27_3 |
Burkett,
Kelly |
University of Ottawa |
Title |
Markov chain Monte Carlo sampling of gene
genealogies conditional on genotype data from trios |
|
To discover genetic associations with disease,
it is useful to model the latent ancestral trees (gene genealogies) that gave
rise to the observed genetic variability. Though the true tree is unknown, we
model its distribution conditional on observed genetic data and use Monte
Carlo methods to sample from this distribution. In this presentation, I first
describe my sampler, ‘sampletrees’,
that conditions on data from unrelated individuals. I then discuss an
extension to the algorithm when the observed data is from trios, consisting
of two parents and a child. Finally, as illustration, the trio-based sampler
will be applied to real data. |
||
TI_06_2 |
Çelikbıçak,
Müge B. |
Gendarmerie and Coast Guard Academy |
Title |
Parameter Estimation in MANOVA with Repeated
Non-normal Measures |
|
Repeated measures designs, in which multiple observations are made on each experimental unit, play an important role in the health and behavioral sciences. In these designs, there are many methods for the analysis of repeated measures data. Statistically, the difference between these methods lies in the assumptions underlying the models. Many of these methods are based on normality assumptions. In this study, we introduce an alternative non-normal distribution, as a scale mixture of normal distributions, to analyze multivariate repeated measures data. We use the EM algorithm to obtain maximum likelihood estimators of the parameters of the analysis of variance model for multivariate repeated measures. |
||
TI_19_3 |
Chacko, Manjo |
University of Kerala, India |
Title |
Bayesian Analysis of Weibull distribution based
on Progressive type-II Censored Competing Risks Data |
|
In this work, we consider the analysis of competing risks data under progressive type-II censoring by assuming the number of units removed at each stage is random and follows a binomial distribution. Bayes estimators are obtained by assuming the population under consideration follows a Weibull distribution. A simulation study is carried out to study the performance of the different estimators derived in this paper. A real data set is also used for illustration. |
||
TI_11_4 |
Chaganty,
Rao |
Old Dominion University |
Title |
Models
for selecting differentially expressed genes in microarray experiments |
|
There have been many advances in microarray
technology, enabling researchers to quantitatively analyze expression levels
of thousands of genes simultaneously. Two types of microarray chips are
currently in practice - the spotted cDNA chip developed by microbiologists at
Stanford University in the mid-1990s and the oligonucleotide array first
commercially released by Affymetrix Corporation in 1996. Our focus
is on the spotted cDNA chip, which is more popular than the latter microarray.
In a cDNA microarray, or “two-channel array,” the experimental sample is
tagged with red dye and hybridized along with a reference sample tagged with
green dye on a chip which consists of thousands of spots. Each spot contains
preset oligonucleotides. The red and green intensities are measured at each
spot by using a fluorescent scanner. In this talk, we aim to discuss
bivariate statistical models for the red and green intensities, which enable us
to select differentially expressed genes. |
||
TI_41_1 |
Chang, Won |
University of Cincinnati |
Title |
Ice Model Calibration using
Semi-continuous Spatial Data |
|
Rapid changes in
Earth's cryosphere caused by human activity can lead to significant environmental
impacts. Computer models provide a useful tool for understanding the behavior
and projecting the future of Arctic and Antarctic ice sheets. However, these
models are typically subject to large parametric uncertainties due to poorly
constrained model input parameters that govern the behavior of simulated ice
sheets. Computer model calibration provides a formal statistical framework to
reduce and quantify the uncertainty due to such parameters.
Calibration of ice sheet models is often challenging because the
relevant model output and observational data take the form of semi-continuous
spatial data, with a point mass at zero and a right-skewed continuous
distribution for positive values. Current calibration approaches cannot
readily handle this data type. Here we introduce a hierarchical latent
variable model that sequentially handles binary spatial patterns and positive
continuous spatial patterns in two stages. To overcome challenges due to
high-dimensionality we use likelihood-based generalized principal component
analysis to impose low-dimensional structures on the latent variables for
spatial dependence. We demonstrate that our proposed reduced-dimension method
can successfully overcome the aforementioned challenges in the example of
calibrating the PSU-3D ice model for the Antarctic ice sheet and
provide improved future ice-volume change projections. |
||
TI_8_0 |
Chatterjee,
Arpita |
Georgia Southern University |
Title |
Statistical
Advancements in Health Sciences |
|
Statistics plays a pivotal role in research,
planning, and decision-making in the health sciences. In recent years there
has been increasing interest in new statistical methodologies
in the field of biomedical sciences. This session will address statistical
advances to explore complex data emerging from non-inferiority clinical
trials and microarray experiments. |
||
TI_8_4 |
Chatterjee,
Arpita |
Georgia Southern University |
Title |
An
Alternative Bayesian Test to Establish
Non-inferiority |
|
Noninferiority clinical trials have gained
immense popularity in recent decades. Such trials are designed to
demonstrate that a new experimental drug is not unacceptably worse than an
active control by more than a pre-specified small margin. Three-arm non-inferiority
trials have been widely acknowledged as the gold standard because they can
simultaneously establish both non-inferiority and assay sensitivity.
Bayesian tests based on the posterior probability have already been
established for non-inferiority trials in the context of continuous and count
data. We propose a Bayesian non-inferiority test based on Bayes factors. The
performance of our proposed test is demonstrated through simulated data. |
||
TI_22_0 |
Chen, Din
(Org Lio, Yuhlong) |
University of North Carolina at Chapel
Hill |
Title |
Statistical Modeling for Degradation Data I |
|
In recent
years, statistical modeling and inference techniques have been developed
based on different degradation measures. This invited session is based on the
book “Statistical Modeling for Degradation Data” co-edited by Professors
Ding-Geng (Din) Chen, Yuhlong Lio, Hon Keung Tony
Ng, Tzong-Ru Tsai, published by Springer in
2017. The book strives to bring
together experts engaged in statistical modeling and inference to present and
discuss the most recent important advances in degradation data analysis and
related applications. The speakers in
this session are contributors to this book and will present their
recent developments in this research area. |
||
TI_32_1 |
Chen, Din |
University of North Carolina at Chapel
Hill |
Title |
Homoscedasticity in the Accelerated Failure
Time Model |
|
The
semiparametric accelerated failure time (AFT) model is a popular linear model
in survival analysis. Current research based on the AFT model assumes
homoscedasticity of the survival data. Violation of this assumption has been
shown to lead to inefficient and even unreliable estimation, and hence,
misleading conclusions for survival data analysis. However, there is no valid
statistical test in the literature that can be utilized to test this
homoscedasticity assumption. This talk will discuss a novel quasi-likelihood
ratio test for the homoscedasticity assumption in the AFT model. Simulation
studies are conducted to show the satisfactory performance of this novel
statistical test. A real dataset is used to demonstrate the application of
this developed test. |
||
TI_9_1 |
Chen, Ding-Geng |
University of Pretoria, South Africa. |
Title |
A statistical distribution for simultaneously modeling
skewness, kurtosis and bimodality |
|
In our funded research on cusp catastrophe
modelling supported by a USA NIH R01 grant, we revitalized a family
of distributions defined as f(x; α, β) = φ·exp[αx + (1/2)βx² − (1/4)x⁴],
where α is the asymmetry parameter,
β is the bifurcation parameter, and φ is the normalizing
constant. This distribution comes from cusp catastrophe theory, which was
developed in the early 1970s by Rene Thom (Thom, R. 1975. Structural
stability and morphogenesis. New York, NY: Benjamin-Addison-Wesley.) as part
of catastrophe theory in topology and which comprises 7 elementary
catastrophes (Fold, Cusp, Swallowtail, Elliptic Umbilic, Hyperbolic
Umbilic, Butterfly, and Parabolic Umbilic). This distribution also belongs to
the classical exponential family and can be used to statistically analyze
data with skewness, kurtosis and bimodality simultaneously. In this talk, we
will show the properties of this distribution and the parameter estimation
with the theory of maximum likelihood estimation. We further demonstrate the
applications of this distribution to analyze real data. |
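The normalizing constant φ has no closed form, so it must be computed numerically. A minimal sketch (an illustration, not the project's code) that normalizes the density by quadrature and exhibits the bimodal case α = 0, β = 3:

```python
import numpy as np
from scipy.integrate import quad

def cusp_density(alpha, beta):
    """Return the normalized cusp-catastrophe density
    f(x) = phi * exp(alpha*x + 0.5*beta*x**2 - 0.25*x**4)."""
    kernel = lambda x: np.exp(alpha * x + 0.5 * beta * x ** 2 - 0.25 * x ** 4)
    total, _ = quad(kernel, -np.inf, np.inf)   # total = 1 / phi
    return lambda x: kernel(x) / total

# Stationary points solve alpha + beta*x - x**3 = 0, so with
# alpha = 0 and beta = 3 the two modes sit at x = +/- sqrt(3).
f = cusp_density(alpha=0.0, beta=3.0)
area, _ = quad(f, -np.inf, np.inf)   # should be 1 after normalization
```

Setting β ≤ 0 with α = 0 instead leaves a single mode at zero, which is how the bifurcation parameter switches the shape between unimodal and bimodal.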
||
TI_21_2 |
Chen, Guangliang |
San Jose State University |
Title |
All data are "documents": A scalable
spectral clustering framework based on landmark points and cosine similarity |
|
We present a unified scalable computing
framework for various versions of spectral clustering. We first consider the
special setting of cosine similarity for clustering sparse or low-dimensional
data and show that in such cases, spectral clustering can be implemented
without computing the weight matrix. Next, for general similarity, we
introduce a landmark-based technique to convert the given data (and the
selected landmarks) into a “document-term” matrix and then apply
the scalable implementation of spectral clustering with cosine similarity to
cluster them. We demonstrate the performance of our proposed algorithm on
several benchmark data sets while comparing it with other methods. |
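One way to see the scalability claim for the cosine-similarity case: with unit-normalized rows A, the weight matrix is W = AAᵀ, so the spectral embedding can be read off an SVD of the thin matrix D^(−1/2)A and the n × n matrix W never needs to be formed. A rough sketch of this idea (self-similarity on the diagonal of W is kept for simplicity, and nonnegative data are assumed so all degrees are positive):

```python
import numpy as np

def spectral_embed_cosine(X, k):
    """Top-k spectral embedding under cosine similarity, computed from
    an SVD of an n-by-d matrix instead of the n-by-n W = A @ A.T."""
    A = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm rows
    d = A @ A.sum(axis=0)              # degrees d_i = sum_j <a_i, a_j>
    B = A / np.sqrt(d)[:, None]        # rows of D^{-1/2} A
    # Left singular vectors of B are eigenvectors of D^{-1/2} W D^{-1/2}.
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :k]                    # feed these rows to k-means

# Toy nonnegative data with two directional clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.random((20, 5)) + [5, 0, 0, 0, 0],
               rng.random((20, 5)) + [0, 5, 0, 0, 0]])
E = spectral_embed_cosine(X, 2)
```

Running k-means on the rows of `E` completes the clustering; the landmark-based step in the talk is what maps general data into this cosine-friendly form.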
||
TI_10_2 |
Cheng,
Chin-I |
Central Michigan University |
Title |
Bayesian estimators of the Odd Weibull
distribution with actuarial application |
|
The Odd Weibull distribution is a
three-parameter generalization of the Weibull and the inverse Weibull
distributions. A Bayesian approach with a Jeffreys-type prior
for estimating the parameters of the Odd Weibull distribution is considered. The
propriety of the posterior distribution under the proposed prior is established. The
Metropolis-Hastings algorithm and Adaptive Rejection Metropolis Sampling
(ARMS) are adapted to generate random samples from the full conditionals for
inference on the parameters. Estimates based on the Bayesian and maximum
likelihood approaches are compared in an application to an actuarial dataset. |
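For readers unfamiliar with the machinery, a generic random-walk Metropolis-Hastings update of the kind used inside such samplers can be sketched as follows; the standard normal target below is a stand-in for the actual Odd Weibull full conditionals:

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=1):
    """Random-walk Metropolis-Hastings: propose x' = x + N(0, step^2),
    accept with probability min(1, target(x') / target(x))."""
    rng = random.Random(seed)
    x, out = x0, []
    for _ in range(n_samples):
        prop = x + rng.gauss(0.0, step)
        if math.log(rng.random()) < log_target(prop) - log_target(x):
            x = prop                   # accept the proposal
        out.append(x)                  # otherwise keep the current state
    return out

# Standard normal target (log density up to a constant) as a stand-in.
draws = metropolis_hastings(lambda t: -0.5 * t * t, x0=0.0, n_samples=20000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

ARMS replaces the fixed random-walk proposal with an adaptive envelope, which matters when a full conditional is expensive or awkwardly shaped.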
||
TI_47_3 |
Chhetri,
Sher B. |
University
of South Carolina, Sumter |
Title |
On the Beta-G Poisson Family |
|
In this talk, we present a new family of
distributions which is defined by using the genesis of the truncated Poisson
distribution and the beta distribution. Some mathematical properties of the new
family will be discussed. We also discuss the parameter estimation procedures
and potential applications of such a generalized family of distributions. |
||
TI_9_0 |
Coelho,
Carlos Agra |
Universidade
Nova de Lisboa, Portugal |
Title |
Contemporary Methods in Distribution Theory and
Likelihood Inference |
|
Recent results in the areas of Distribution
Theory and Likelihood Inference that will be
presented include: distributions adequate for simultaneously
modeling skewness, kurtosis and bimodality, as well as multivariate
skewness and heavy tails, and likelihood ratio tests for elaborate
covariance structures based on samples of random sizes. |
||
TI_10_0 |
Cooray,
Kahadawala |
Central Michigan University |
Title |
Parametric models for Actuarial
Applications |
|
This session presents a new copula to
account for negative association with a financial application, a new
Pareto extension with applications to insurance data,
new copula families by distorting the existing copulas with
applications in financial risk management, and Bayesian estimation of the Odd
Weibull parameters with applications to insurance data. |
||
TI_15_4 |
Daniels,
John |
Central Michigan University |
Title |
Seeing
RED: A New Statistical Solution to an
Old Categorical Data Problem |
|
Dental
morphological traits (DMT) are often used to conduct inference on cultural
populations. Often, the statistical
“distance” between various populations is described using techniques such as
Mean Measure of Divergence (MMD) or pseudo-Mahalanobis
D². These techniques, although common
in anthropological research, have some significant drawbacks. First, MMD requires data compression into a
dichotomized presence/absence indication at some arbitrary cutoff point. Second, the total sample size will be
reduced in the presence of any missing values. This can be problematic with compromised or
smaller data sets. A newly developed
non-parametric method, Robust Estimator of Differences (RED) is proposed as a
viable alternative. Utilizing both
actual data and simulated data (with a known relationship), we will use both
PCA and Cluster Analysis to determine the relationships between various
cultural groups. The results will show
that RED can outperform either method and is a viable alternative for
Anthropologists to consider. |
||
TI_46_4 |
Davies,
Katherine |
University of Manitoba |
Title |
Progressively Type-II Censored Competing Risks
Data from the Linear Exponential Distribution |
|
Across different types of lifetime studies,
whether it be in the medical or engineering sciences, the possibility of
competing causes of failure needs to be addressed. Such causes are typically
referred to as competing risks; in this paper, we consider progressively type-II censored
competing risks data when the lifetimes are assumed to come from a linear
exponential distribution. We develop likelihood inference and demonstrate the
performance of the estimators via an extensive Monte Carlo simulation study.
We also provide an illustrative example using a small data set. |
||
TI_20_3 |
Davila,
Victor Hugo Lachos |
University of Connecticut |
Title |
Finite mixture modeling of censored data using the multivariate skew-normal
distribution |
|
Longitudinal
HIV-1 RNA viral load measures are often subjected to censoring due to upper
and lower detection limits depending on the quantification assays. A
complication arises when these continuous measures present a heavy-tailed
behavior because inference can be seriously affected by the misspecification
of their parametric distribution. For such data structures, we propose a
robust nonlinear censored regression model based on the scale mixtures of
normal distributions. For taking into account the autocorrelation
existing among irregularly observed measures, a damped exponential
correlation structure is considered. A stochastic approximation of the EM
algorithm is developed to obtain the maximum likelihood estimates of the
model parameters. The main advantage of this new procedure is that it allows us to
estimate the parameters of interest and evaluate the log-likelihood function
in an easy and fast way. Furthermore, the standard errors of the fixed
effects and predictions of unobservable values of the response can be
obtained as a by-product. The practical utility of the proposed method is
exemplified using both simulated and real data. |
||
TI_19_2 |
Dharmaja, S.H.S. |
Govt. College for Women, Trivandrum,
India |
Title |
On logarithmic Kies
distribution |
|
In this paper we consider a logarithmic form of
the Kies distribution and discuss some of its important
properties. We derive explicit expressions for its percentile measures, raw
moments, reliability measures, etc., and attempt the maximum likelihood
estimation of the parameters of the distribution. Certain real-life
applications are also considered to illustrate the usefulness of the
proposed distribution compared to existing models. Also, the asymptotic behaviour of the likelihood estimators is studied using
simulated data sets. |
||
TI_11_0 |
Diawara,
Norou |
Old Dominion
University |
Title |
Statistical Methods for Space and Time
Applications |
|
|
||
TI_15_3 |
Diawara,
Norou |
Old Dominion
University |
Title |
Density Estimation of Spatio-temporal
Point Patterns using Moran's Statistic |
|
In this paper, an Inflated Size-biased Modified
Power Series Distribution (ISBMPSD), where inflation occurs at any of the
support points, is studied. This class includes, among others, the size-biased
generalized Poisson distribution, size-biased generalized negative binomial
distribution and size-biased generalized logarithmic series distribution as
its particular cases. We obtain the recurrence relations among ordinary,
central and factorial moments. The maximum likelihood and Bayesian estimation
of the parameters of the Inflated Size-biased MPSD is obtained. As special
cases, results are extracted for size-biased generalized Poisson
distribution, size-biased generalized negative binomial distribution and
size-biased generalized logarithmic series distribution. Finally, an example
is presented for the size-biased generalized Poisson distribution to
illustrate the results and a goodness of fit test is done using the maximum
likelihood and Bayes estimators. |
||
TI_43_1 |
Dong, Yuexiao |
Temple University |
Title |
On dual model-free variable selection with two groups
of variables |
|
In the presence of two groups of variables,
existing model-free variable selection methods only reduce the dimensionality
of the predictors. We extend the popular marginal coordinate hypotheses
(Cook, 2004) in the sufficient dimension reduction literature and consider
the dual marginal coordinate hypotheses, where the roles of the predictor and
the response are interchangeable. Motivated by canonical correlation analysis
(CCA), we propose a CCA-based test for the dual marginal coordinate hypotheses
and devise a joint backward selection algorithm for dual model-free variable
selection. The performances of the proposed test and the variable selection
procedure are evaluated through synthetic examples and a real data
analysis. |
||
TI_33_4 |
Duval,
Francis |
Université du
Québec à Montréal (UQAM) |
Title |
Gradient Boosting-Based Model for Individual
Loss Reserving |
|
Modeling based on data information is one of
the most challenging research topics in actuarial
science. Statistical learning approaches offer a set of tools
that could be used to evaluate loss reserves in an individual
framework. In this talk, we contrast some traditional aggregate
techniques with individual models based on both parametric and gradient
boosting algorithms. These models use information about each of the
payments made for each of the claims in the portfolio, as well as
characteristics of the insured. We provide an example based on a dataset
from an insurance company and we discuss some points related to practical
applications. |
||
TI_1_3 |
El Ktaibi, Farid |
ZAYED university, UAE |
Title |
Bootstrapping the Empirical Distribution of a
Stationary Process with Change-point |
|
When detecting a change-point in the marginal distribution
of a stationary time series, bootstrap techniques are required to determine
critical values for the tests when the pre-change distribution is unknown. In
this presentation, we propose a sequential moving block bootstrap and
demonstrate its validity under a converging alternative. Furthermore, we
demonstrate that power is still achieved by the bootstrap under a
non-converging alternative. These results are applied to a linear process and
are shown to be valid under very mild conditions on the existence of any
moment of the innovations and a corresponding condition of summability of the coefficients. |
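The moving-block resampling step that underlies such bootstrap procedures can be sketched in a few lines (a plain moving block bootstrap; the sequential, change-point-aware version proposed in the talk is more elaborate):

```python
import random

def moving_block_bootstrap(series, block_len, seed=0):
    """Resample a time series by concatenating randomly chosen
    overlapping blocks of length block_len, preserving short-range
    dependence within each block."""
    rng = random.Random(seed)
    n = len(series)
    n_starts = n - block_len + 1           # allowed block start points
    out = []
    while len(out) < n:
        s = rng.randrange(n_starts)
        out.extend(series[s:s + block_len])
    return out[:n]                          # trim to the original length

x = [0.1 * t + (t % 7) for t in range(100)]   # toy dependent series
xb = moving_block_bootstrap(x, block_len=10)
```

The block length trades off dependence preservation against resampling variability; choosing it well is part of what validity results for such procedures must address.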
||
TI_2_1 |
Elkadry, Alaa |
Marshall University |
Title |
Analyzing Continuous Randomized Response Data
with an Indifference-Zone Selection Procedure |
|
A randomized response model applicable to
continuous data, based on a mixture of two normal distributions, is
considered. The target here is to select the population with the best
parameter value. A study on how to choose the best population between k distinct
populations using an indifference-zone procedure is provided. Also, the
operating characteristics (OCs) of a subset ranking and
selection procedure are derived for the randomized response model for
continuous data considered. The operating characteristics for the subset
selection procedures are considered for two parameter configurations,
the slippage configuration and the equi-spaced
configuration. |
||
TI_23_2 |
Ferreira,
Johan |
University of Pretoria |
Title |
Alternative Dirichlet priors for estimation of
Shannon entropy using countably discrete likelihoods |
|
Claude Shannon's seminal paper “A Mathematical
Theory of Communication” is widely considered as the basis of information
theory. Shannon entropy is a functional of a probability structure and is a
measurement of information contained in a system. It has been applied as a
cryptographic measure for a key generator module, forming part of the
security of the cipher system. In a machine-learning context, entropy is used
to define an error function as part of the learning of weights in multilayer
perceptrons in neural networks. The practical problem of estimating entropy
from samples (sometimes small samples) in many applied settings remains a
challenging and relevant problem. In this presentation, previously
unconsidered Dirichlet generators are introduced as possible priors for an
underlying countably discrete model (in particular, the multinomial model).
Resultant estimators for the entropy H(p) under the considered priors and
assuming squared error loss will be presented. Particular cases of these
proposed priors will be of interest and their effect on the estimation of
entropy subject to different parameter scenarios will be investigated. |
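For the baseline symmetric Dirichlet(a) prior, the Bayes estimator of H(p) under squared error loss is the posterior mean, which has a known closed form in the digamma function ψ: with α_i = n_i + a and A = Σ α_i, E[H | n] = ψ(A + 1) − Σ_i (α_i/A)·ψ(α_i + 1). A sketch of this plug-in (the alternative generators of the talk replace the symmetric prior, which is an assumption here):

```python
import math
import numpy as np
from scipy.special import digamma

def bayes_entropy(counts, a=1.0):
    """Posterior-mean (squared-error-loss Bayes) estimate of Shannon
    entropy, in nats, under a symmetric Dirichlet(a) prior on the
    multinomial probabilities:
    E[H | n] = psi(A + 1) - sum_i (alpha_i / A) * psi(alpha_i + 1),
    with alpha_i = n_i + a and A = sum_i alpha_i."""
    alpha = np.asarray(counts, dtype=float) + a
    A = alpha.sum()
    return digamma(A + 1.0) - np.sum((alpha / A) * digamma(alpha + 1.0))

# With large balanced counts over K cells the estimate approaches log K.
est = bayes_entropy([1000] * 8)
```

The choice of `a` matters most for small samples, which is precisely the regime the abstract highlights.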
||
TI_44_4 |
Fisher,
Thomas |
Miami University |
Title |
A split and merge strategy to
variable selection |
|
The
curse of dimensionality, where p is large relative to n, is
a well-known problem that can affect variable selection methods as well as
model performance. We consider an algorithm similar to k-fold cross-validation
in which we segment the feature variables into subsets, variable selection
(LASSO or others) is performed within each subset, and the final set of
selected variables is aggregated for a final model. Simulations show that
this approach has comparable performance to standard techniques with the
added benefit of improved computational run time. The method can easily be
parallelized for further improved efficiency. |
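A minimal sketch of the split-and-merge idea. For self-containedness a simple marginal-correlation screen stands in for LASSO as the within-subset selector, and `n_splits` and `keep` are illustrative tuning choices, not values from the talk:

```python
import numpy as np

def split_and_merge_select(X, y, n_splits=5, keep=2, seed=0):
    """Partition the p features into n_splits subsets, run a selector
    inside each subset, and merge the survivors into one final set.
    Here the per-subset selector keeps the `keep` features with the
    largest absolute marginal correlation with y."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[1])
    selected = []
    for subset in np.array_split(perm, n_splits):
        r = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset]
        order = np.argsort(r)[::-1][:keep]
        selected.extend(subset[order])
    return sorted(selected)    # fit the final model on these columns

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
y = 3.0 * X[:, 0] - 2.0 * X[:, 7] + rng.standard_normal(200)
picked = split_and_merge_select(X, y)
```

Because each subset is screened independently, the per-subset selections can run on separate workers, which is the source of the parallel speedup mentioned above.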
||
TI_12_0 |
Flegal,
James M. |
University of California,
Riverside |
Title |
Advances in
Bayesian Theory and Computation |
|
Bayesian
computation remains an active theoretical and
practical research area. Talks in this session consider
Bayesian penalized regression models under a unified
framework, locally adaptive shrinkage in the Bayesian framework, weighted
batch means variance estimators for MCMC output analysis, and
recent developments concerning a graph-based Bayesian approach
to semi-supervised learning. |
||
TI_12_3 |
Flegal,
James M. |
University of California,
Riverside |
Title |
Weighted batch
means estimators in Markov chain Monte Carlo |
|
We propose a
family of weighted batch means variance estimators, which are computationally
efficient and can be conveniently applied in practice. The focus is on Markov
chain Monte Carlo simulations and estimation of the asymptotic covariance
matrix in the Markov chain central limit theorem, where conditions ensuring
strong consistency are provided. Finite sample performance is evaluated
through auto-regressive, Bayesian spatial-temporal, and Bayesian logistic
regression examples, where the new estimators show significant computational
gains with a minor sacrifice in variance compared with existing
methods. |
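For orientation, the ordinary (equal-weight) batch means estimator that this family generalizes fits in a few lines: split the chain into batches and scale the sample variance of the batch means by the batch length. A sketch on an AR(1) chain, where positive autocorrelation pushes the asymptotic variance well above the marginal variance:

```python
import random

def batch_means_variance(chain, n_batches=50):
    """Ordinary batch means estimate of the asymptotic variance sigma^2
    in sqrt(n) * (xbar - mu) -> N(0, sigma^2): split the chain into
    batches and scale the sample variance of the batch means by the
    batch length."""
    b = len(chain) // n_batches            # batch length
    chain = chain[: b * n_batches]         # drop any remainder
    means = [sum(chain[i * b:(i + 1) * b]) / b for i in range(n_batches)]
    xbar = sum(means) / n_batches
    return b * sum((m - xbar) ** 2 for m in means) / (n_batches - 1)

# AR(1) chain with phi = 0.5: true asymptotic variance is
# 1 / (1 - phi)^2 = 4, versus a marginal variance of only 4/3.
rng = random.Random(3)
x, chain = 0.0, []
for _ in range(100000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    chain.append(x)
sigma2_bm = batch_means_variance(chain)
```

The weighted versions studied in the talk recombine the batches with non-uniform weights to improve this estimator's finite-sample behavior.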
||
TI_11_3 |
Fofana, Demba |
University of Texas Rio Grande Valley |
Title |
Combining
Assumptions and Graphical Network into Gene Expression Data Analysis |
|
Properly
analyzing gene expression data is a daunting task that requires taking both
assumptions and network relationships among genes into consideration.
Combining these different elements can not only improve statistical power, but
also provide a better framework through which gene expression can be better
analyzed. We propose a novel statistical model that combines assumptions and
gene network information into the analysis. Assumptions are important since
every test statistic is valid only when required assumptions hold. We
incorporate gene network information into the analysis because neighboring
genes share biological functions. This correlation factor is taken into account via similar prior probabilities for
neighboring genes. With a series of simulations our approach is compared with
other approaches. Our method that combines assumptions and network
information into the analysis is shown to be more powerful. We will provide
an R package to help use this approach. |
||
TI_31_2 |
Galoppo,
Travis + Kogan, Clark |
ABB US Corporate Research |
Title |
A GPU
Enhanced Bayesian Ordinal Logistic Regression Model of Hospital Antimicrobial
Usage |
|
Bayesian data
analysis has a high computational demand, with a critical bottleneck in the
evaluation of data likelihood. When data samples are independent, there is
significant opportunity for parallelization of the data likelihood
calculation. We demonstrate a prototype GPU enhanced Gibbs sampler
implementation using NVIDIA CUDA, applying a Bayesian ordinal logistic
regression to a large dataset of antimicrobial usage in hospitals. Our
implementation offloads only the data likelihood calculation to the GPU,
while maintaining the core sampling logic on the CPU. We compare our results
to other popular software packages, both to verify correctness and to
showcase performance. |
||
TI_22_2 |
Gao, Yong |
Ohio University |
Title |
A
Hierarchical Bayesian Bi-exponential Wiener Process for Luminosity
Degradation of Display Products |
|
This
presentation will discuss a nonlinear Wiener process degradation model for
analyzing the luminosity degradation of display products. To account for the
nonlinear two-phase pattern in the observed degradation paths, we assume the
bi-exponential function as the drift function of the Wiener process
degradation model. The hierarchical Bayesian modeling framework is adopted to
construct the model. The failure-time distribution of a unit randomly
selected from the population is obtained.
Prediction results are compared to the results from two alternative
models, a bi-exponential degradation-path model and a time-scale transformed
linear Wiener process. |
||
TI_13_0 |
George,
Olusegun |
The University of Memphis |
Title |
Exchangeability
in Statistical Inference - Theory and Applications |
|
It is well
documented that exchangeability is at the heart of statistical
inference. The ground-breaking representation theorem
of De Finetti (1931) on infinite
exchangeability has had a profound impact on the modeling of
clustered data. This special session is
dedicated to recent applications of finite and
infinite exchangeability to the analysis of clustered data. |
||
TI_5_0 |
George,
Tyler (Org - Amezziane,M.) |
Central Michigan University |
Title |
TX Family:
Extensions and Inference |
|
The TX family is
a class of families formed through the compounding of distributions. Such an
operation allows the generated distribution to inherit the parameters of the compounded
distributions but not necessarily their properties. This session explores
different problems that can be solved using the flexibility of the TX
distributions. |
||
TI_14_0 |
Ghosh,
Indranil |
University of North Carolina, Wilmington |
Title |
Probability
and Statistical models with applications |
|
This session
represents some of the recent developments and some of the noteworthy results
in distribution theory (both in the discrete and the continuous
paradigms). In addition, several applications and a thorough discussion
of the associated statistical inference are also presented. |
||
TI_32_2 |
Ghosh,
Indranil |
University of North Carolina, Wilmington |
Title |
Bivariate
Beta and Kumaraswamy Models developed using the Arnold-Ng Bivariate
Beta Distribution |
|
In this
paper we explore some mechanisms for constructing bivariate and multivariate
beta and Kumaraswamy distributions. Specifically, we focus our
attention on the Arnold-Ng (2011) eight parameter bivariate beta model.
Several models in the literature are identified as special cases of this
distribution including the Jones-Olkin-Liu-Libby-Novick bivariate beta
model, and certain Kotz and Nadarajah bivariate
beta models among others. The utility of such models in constructing
bivariate Kumaraswamy models is investigated. Structural properties
of such derived models are studied. Parameter estimation for the models is
also discussed. For illustrative purposes, a real-life data set is considered
to exhibit the applicability of these models in comparison with rival
bivariate beta and Kumaraswamy models. |
||
TI_8_1 |
Ghosh, Santu |
Medical College of Georgia, Augusta
University |
Title |
Two-sample
Tests for High Dimensional Means with Prepivoting and Random
Projection |
|
Within the
medical field, the demand to store and analyze small sample, large variable
data has become ever-abundant. Several two-sample tests for equality of
means, including the revered Hotelling's T² test,
have already been established when the combined sample size of both
populations exceeds the dimension of the variables. However, tests such as Hotelling's T² become either unusable or
output small power when the number of variables is greater than the combined
sample size. We propose a test using both pre-pivoting and an
Edgeworth expansion that maintains high power in this higher
dimensional scenario, known as the “large p, small n”
problem. Our test's finite sample performance is compared with other recently
proposed tests designed to also handle the large p small n situation.
We apply our test to a microarray gene expression data set and
report competitive rates for both power and Type-I error. |
||
TI_14_1 |
Ghosh, Souparno |
Texas Tech University |
Title |
Coherent
Multivariate Feature Selection and Inference across multiple databases |
|
Random forest
(RF) has become a widely popular prediction generating mechanism. Its
strength lies in its flexibility, interpretability and ability to handle a
large number of features, typically larger than the sample size. However,
this methodology is of limited use if one wishes to identify statistically
significant features. Several ranking schemes are available that provide
information on the relative importance of the features, but there is a
paucity of general inferential mechanisms, particularly in a multivariate
setup. We use the conditional inference tree framework to generate a RF
where features are deleted sequentially based on explicit hypothesis testing.
The resulting sequential algorithm offers an inferentially justifiable, but
model-free, variable selection procedure. Significant features are then used
to generate predictive RF. An added advantage of our methodology is that both
variable selection and prediction are based on conditional inference
framework and hence are coherent. Next, we extend this methodology to
model paired observations obtained from two pharmacogenomics databases where
the predictors are measured under different experimental protocols. Instead
of simply taking the average of the paired predictors, we offer a latent
variable approach that can impute over the databases and then perform
variable selection over the full set of paired samples across the
databases. We illustrate the performance of our Sequential
Multi-Response Feature Selection approach through simulation studies and
finally apply this methodology on Genomics of Drug Sensitivity for Cancer and
Cancer Cell line Encyclopedia databases to identify genetic characteristics
that significantly impact drug sensitivities. Significant set of predictors
obtained from our method are further validated from biological
perspective. |
||
TI_26_3 |
Gunasekera, Sumith |
The University of Tennessee at
Chattanooga |
Title |
On
Estimating the Reliability in a Multicomponent System based on
Progressively-Censored Data from Chen Distribution |
|
This research
deals with the classical, Bayesian, and generalized estimation of
stress-strength reliability parameter, R_{s,k} = Pr(at least
s of (X_{1}, X_{2}, ..., X_{k}) exceed Y) = Pr(X_{k-s+1:k} > Y)
of an s-out-of-k: G multicomponent system, based on progressively type-II
right censored samples with random removals when stress (Y) and strength (X)
are two independent Chen random variables. Under squared-error and
LINEX loss functions, Bayes estimates are developed by using
Lindley's approximation and the Markov Chain Monte Carlo method. Generalized
estimates are developed by using generalized variable method while classical
estimates, the maximum likelihood estimators, their asymptotic
distributions, asymptotic confidence intervals, bootstrap-based confidence
intervals - are also developed. A simulation study and a real-world data
analysis are given to illustrate the proposed procedures. The size of the
test, adjusted and unadjusted power of the test, coverage probability and
expected lengths of the confidence intervals, and biases of the
estimators are also computed, compared, and contrasted. |
||
TI_3_2 |
Hamdan,
Hasan |
James Madison University |
Title |
Approximating
and Characterizing Infinite Scale Mixtures |
|
In this
talk, an efficient method for approximating any infinite scale mixture by a
finite scale mixture up to a specified tolerance level will be presented. Then
this method will be applied to approximate many common classes of infinite
scale mixtures. In particular, the method will be used to approximate
infinite scale mixtures of normals, infinite
scale mixtures of exponentials and infinite scale mixtures of uniforms.
Several important results related to infinite scale mixtures will be
presented with the focus on scale mixtures of normals.
An extension to the multivariate infinite scale mixtures and
to the class of infinite scale-location mixtures will be discussed. |
||
TI_3_1 |
Hamed, Duha |
Winthrop University |
Title |
New Families
of Generalized Lomax Distributions: Properties and Applications |
|
In this
talk, we propose some families of generalized Lomax distributions
named T-Lomax{Y} by using the methodology of
the T-R{Y} framework. The T-Lomax{Y} families introduced
arise from the quantile functions of exponential, logistic, log-logistic
and Weibull distributions. The shapes of
these T-Lomax{Y} distributions vary between unimodal and
bimodal. Various structural properties of the new families are derived
including moments, modes and Shannon entropies. Several new generalized Lomax
distributions are studied, and the estimation of the model parameters for a
member of the newly defined families of distributions is performed by the
maximum likelihood method. An application to a real data set is used to
demonstrate the flexibility of this family of distributions. |
||
TI_16_0 |
Hannig, Jan
(organizer: Jana Jureckova) |
The Czech Academy of Sciences, Charles University |
Title |
Nonlinear Functionals of Probability Distributions |
|
The talks of
the session characterize and estimate various functionals of probability distributions
that are not merely parameters but also describe the shape of the
distribution and its relation to other distributions, such as their mutual
dependence or divergence. |
||
TI_16_3 |
Hannig, Jan |
University of North Carolina at Chapel
Hill |
Title |
Model
Selection without penalty using
Generalized Fiducial Inference |
|
Standard
penalized methods of variable selection and parameter estimation rely on the
magnitude of coefficient estimates to decide which variables to include in
the final model. However, coefficient estimates are unreliable when,
for example, the design matrix is collinear. To overcome this
challenge an entirely new perspective on variable selection is presented
within a generalized fiducial inference framework. We apply this idea
to two different problems. First, this new procedure is able to effectively
account for linear dependencies among subsets of covariates in a
high-dimensional regression setting. Second, we apply our variable selection
method to the sparse vector AR(1) model. |
||
TI_35_3 |
He, Wenqing |
Western University |
Title |
Perturbed
Variance Based Null Hypothesis Tests with An Application to Clayton
Models |
|
Null
hypothesis tests are popularly used when there is no appropriate alternative hypothesis
available, especially in model assessment where the assumed model is
evaluated with no model being considered an alternative. Motivated by
the test of the Clayton models in multivariate survival analysis, a simple
perturbed variance resampling method is proposed for null hypothesis testing.
The proposed methods make use of the perturbation method to estimate the
covariance matrix of the estimator, avoiding an intractable variance estimate for
the estimator. The proposed tests enjoy simplicity and theoretical
justification. We apply the proposed method to modify the tests for the
assessment of Clayton models. The proposed methods have simpler
procedures than both the parametric bootstrap and the nonparametric bootstrap
and present promising performance as shown in the simulation studies. A
colon cancer study further illustrates the proposed methods. |
||
TI_33_3 |
Herrmann,
Klaus |
University of Sherbrooke |
Title |
The Extreme
Value Limit Theorem for Dependent Sequences of Random Variables |
|
Extreme value
theory is concerned with the limiting distribution of location-scale
transformed block maxima M_n = max(X_1, ..., X_n) of a
sequence of identically distributed random variables (X_i)_{i=1}^n
defined on a common probability space (Ω, F, P). In case the X_i, i ∈ N, are independent, the weak limiting behaviour of appropriately location-scale transformed M_n
is adequately described by the classical Fisher-Tippett-Gnedenko theorem. In this presentation we are interested
in the case of dependent random variables X_i, i ∈ N, while keeping a common marginal
distribution function F for all X_i, i ∈ N. As dependence structures we consider Archimedean copulas and
discuss the connection between block maxima and copula diagonals. This allows
one to derive a generalization of the Fisher-Tippett-Gnedenko theorem for X_i, i ∈ N, dependent according to Archimedean
copulas. We discuss connections to exchangeability and upper tail
independence. Finally, we illustrate the resulting limit laws and discuss
their properties. |
||
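The connection between block maxima and copula diagonals in TI_33_3 above can be checked numerically: for an Archimedean copula with generator psi, P(max U_i <= u) equals the diagonal delta_n(u) = psi(n * psi^{-1}(u)). A hedged sketch using the standard Marshall-Olkin frailty sampler for a Clayton copula; the copula family, theta, and block size are illustrative choices, not from the talk:

```python
import numpy as np

# For an Archimedean copula with generator psi, the componentwise maximum of
# (U_1, ..., U_n) has distribution equal to the copula diagonal:
# P(max U_i <= u) = C(u, ..., u) = psi(n * psi^{-1}(u)).
# Clayton example, sampled via the Marshall-Olkin frailty construction.
theta, n, m = 2.0, 5, 200_000
rng = np.random.default_rng(0)
V = rng.gamma(1.0 / theta, 1.0, size=m)             # Gamma frailty
E = rng.exponential(size=(m, n))
U = (1.0 + E / V[:, None]) ** (-1.0 / theta)        # Clayton-dependent uniforms
M = U.max(axis=1)                                   # block maxima

u = 0.9
diag = (1.0 + n * (u ** -theta - 1.0)) ** (-1.0 / theta)  # delta_n(u)
emp = (M <= u).mean()                               # empirical counterpart
```

The empirical probability matches the diagonal closely, which is the identity the talk builds on before passing to the limit.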
TI_11_2 |
Hitchcock,
David |
University of South Carolina |
Title |
A Spatio-temporal Model Relating Gage Height Data to
Precipitation at South Carolina Locations |
|
The
gage height of rivers (i.e., the height of the water’s surface) can be used
to help define flood events. We use a Conditionally Autoregressive
(CAR) model to relate gage height measured daily over five years (2011-2015)
at nearly 100 locations across South Carolina to several covariates. An
important covariate is the daily precipitation at these locations. Other
covariates considered include the elevation at the locations and a
fall-season indicator variable. We also include interactions in our
model. The spatial dependency is specified by defining catchment basins
as neighborhoods. We use a Bayesian approach to estimate our model
parameters. Both the temporal and spatial correlations in the model are
significant. Precipitation appears to have a positive effect on gage
height, and this effect is significantly greater during the fall season.
This is joint work with Haigang Liu and
S. Zahra Samadi. |
||
TI_41_3 |
Hu, Guanyu |
University of Connecticut |
Title |
A Bayesian
Joint Model of Marker and Intensity of Marked Spatial
Point Processes with Application to Basketball Shot Chart |
|
The success
rate of a basketball shot may be higher at locations in the court where
a player makes more shots. In a marked spatial point process model, this
means that the markers are dependent on the intensity of the process. We
develop a Bayesian joint model of the marker and the intensity of marked
spatial point processes, where the intensity is incorporated in the
model of the marker as a covariate. Further, we allow variable selection
through the spike-slab prior. Inferences are developed with a Markov
chain Monte Carlo algorithm to sample from the posterior distribution.
Two Bayesian model comparison criteria, the modified Deviance
Information Criterion and the modified Logarithm of the Pseudo-Marginal
Likelihood, are developed to assess the fit of different joint
models. The empirical performance of the proposed methods is
examined in extensive simulation studies. We apply the
proposed methodology to the 2017--2018 regular season shot data of
four professional basketball players in the NBA to analyze the spatial structure
of shot selection and field goal percentage. The results suggest that
the field goal percentages of all four players are significantly
positively dependent on their shot intensities, and that different
players have different predictors for their field goal percentages. |
||
TI_48_0 |
Huang,
Hsin-Hsiung |
University of Central Florida |
Title |
Statistical
Methodology for Big Data |
|
In this
session, the speakers will present various novel methods for handling problems
in real data, which may involve large sample sizes from different locations,
missing values, and other challenges. |
||
TI_48_1 |
Huang,
Hsin-Hsiung |
University of Central Florida |
Title |
A new
statistical strategy for predicting major depressive disorder using
whole-exome genotyping data |
|
Major
depressive disorder (MDD) is a common and serious psychiatric disorder, which may
cause significant morbidity and mortality, and lead to high rates of suicide.
Genetic factors have been proven to play important roles in the development
of MDD. Recently, genome-wide association studies on common variants have
been conducted. However, the large amount of missing values influences the
analysis results. In this paper, we propose to treat the missing values as
distinct categories in various statistical classification models. The
classification results improve significantly compared to imputing the
missing values. |
||
TI_22_4 |
Jayalath,
Kalanka |
University of Houston - Clear Lake |
Title |
A Bayesian Survival
Analysis for the Inverse Gaussian Data |
|
This talk
focuses on a comprehensive survival analysis for the inverse Gaussian
distribution employing Bayesian and Fiducial approaches. The analysis
previously made in the literature required the distribution mean to be known,
which is unrealistic, and thus it restricted the scope of the investigation.
No such assumption is made here. This study also includes an
illustration of survival analysis for data with randomly right-censored
observations. Gibbs sampling is employed in estimation, and bootstrap
comparisons are made between the Bayesian and Fiducial estimates. It is
concluded that the size of censoring in data and the shape of inverse
Gaussian distribution have the most impact on the two analyses, Bayesian vs
Fiducial. |
||
TI_3_3 |
Johnston,
Douglas E |
State University of New York at
Farmingdale |
Title |
A Recursive
Bayesian Model for the Excess Distribution with Stochastic Parameters |
|
The
generalized extreme value (GEV) and Pareto (GPD) distributions are important
tools for analyzing extreme values such as large losses in financial
markets. In particular, the GPD is the canonical distribution for
modelling excess losses above a “high” threshold. This conditional
distribution is typically used for the computation of risk-metrics such as
expected shortfall (i.e., the conditional mean) and extreme quantiles. In our
work, we propose a new approach for analyzing extreme values by applying a
stochastic parametrization to the GPD distribution with the parameters
following a hidden stochastic process which results in a non-linear,
non-Gaussian state-space model with unknown static parameters. This
approach allows for dependencies, such as clustering of extremes, often
witnessed in financial data. To compute the predictive excess loss
distribution, we derive a Rao-Blackwellized particle
filter that reduces the parameter space, and a concise, recursive solution is
obtained. This has the benefit of improved filter performance and permits
real-time implementation. We introduce a new risk-measure that is a
more robust estimate for the expected shortfall and we illustrate
our results using both simulated data and actual stock market returns from
1928-2018. Finally, we compare our results to traditional methods of
estimating the excess loss distribution, such as maximum likelihood, to show
the improvement obtained. |
||
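The peaks-over-threshold computation behind TI_3_3 above can be sketched with standard tools: fit a generalized Pareto distribution to excesses above a high threshold and read off the expected shortfall E[L | L > u] = u + beta/(1 - xi), valid for xi < 1. This is a generic static illustration, not the speaker's recursive Bayesian filter; the simulated data and the 99% threshold are assumptions:

```python
import numpy as np
from scipy import stats

# Peaks-over-threshold sketch: fit a GPD to excesses above a high threshold u.
# For GPD excesses, the conditional mean loss is E[L | L > u] = u + beta/(1 - xi).
rng = np.random.default_rng(0)
losses = stats.t.rvs(df=4, size=100_000, random_state=rng)  # heavy-tailed "losses"

u = np.quantile(losses, 0.99)                   # a "high" threshold (assumption)
excess = losses[losses > u] - u
xi, loc, beta = stats.genpareto.fit(excess, floc=0)  # MLE fit of GPD to excesses

es = u + beta / (1.0 - xi)                      # model-based expected shortfall
es_emp = losses[losses > u].mean()              # empirical tail mean, for comparison
```

The model-based expected shortfall agrees closely with the empirical tail mean; the talk's contribution is to let (xi, beta) evolve stochastically rather than stay fixed as here.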
TI_12_1 |
Jones, Galin L. |
University of Minnesota |
Title |
Fully
Bayesian Penalized Regression with a Generalized Bridge Prior |
|
We consider penalized
regression models under a unified framework. The particular method is
determined by the form of the penalty term, which is typically chosen by
cross validation. We introduce a fully Bayesian approach that incorporates
both sparse and dense settings and show how to use a type of model averaging
approach to eliminate the nuisance penalty parameters and perform inference
through the marginal posterior distribution of the regression coefficients.
We establish tail robustness of the resulting estimator as well as
conditional and marginal posterior consistency for the Bayesian model. We
develop a component-wise Markov chain Monte Carlo algorithm for sampling.
Numerical results show that the method tends to select the optimal penalty
and performs well in both variable selection and prediction and is comparable
to, and often better than, alternative methods. Both simulated and real data
examples are provided. |
||
TI_34_4 |
Kang, Sang
(John) |
The University of Western Ontario |
Title |
Moment-based density approximation techniques
as applied to heavy-tailed distributions |
|
Several
advances for the approximation and estimation of heavy-tailed distributions
are proposed. It is first explained that on initially applying
the Esscher transform to
heavy-tailed density functions, one can utilize a moment-based technique
whereby the tilted density functions are expressed as the product of a base
density function and a polynomial adjustment. Alternatively, density
approximants can be secured by appropriately truncating the distributions or
mapping them onto compact supports. Extensions to the context of density
estimation, in which sample moments are employed in lieu of exact
moments, are discussed, and illustrative applications involving actuarial data
sets are presented. |
||
TI_17_0 |
Kao,
Ming-Hung (Jason) |
Arizona State University |
Title |
Design and
analysis of complex experiments: Theory and applications |
|
The four
talks on the design and analysis of complex experiments in this session
include sub-data selection for big data, a large-data issue in computer
experiments, a study on order-of-addition experiments, and an optimal
experimental design approach for functional data analysis. |
||
TI_24_4 |
Kapenga,
John |
Western Michigan University |
Title |
Computation of
High-Dimensional Integrals |
|
Integrals in
dimensions from 20 to a few thousand have recently been used in several
applications including finance, Bayesian statistics and
quantum physics. Even infinite-dimensional integrals have been
attacked numerically. Traditional numerical methods and the usual
Monte Carlo methods cannot be applied as the
dimension increases beyond perhaps 20. A brief history and the
status of effective current lattice methods, such as the fast CBC
construction, will be presented. Several examples and timings
will be included. |
||
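As a rough illustration of TI_24_4 above, low-discrepancy point sets can beat plain Monte Carlo on a smooth moderate-dimensional integrand. The sketch below uses scrambled Sobol points from SciPy rather than the CBC-constructed lattice rules discussed in the talk; that substitution, and the toy product integrand, are illustrative assumptions:

```python
import numpy as np
from scipy.stats import qmc

# Quasi-Monte Carlo sketch: integrate f(x) = prod_j (0.5 + x_j) over [0,1]^d.
# The exact integral is 1, since each factor integrates to 1 on [0,1].
d = 20
sob = qmc.Sobol(d, scramble=True, seed=0)
pts = sob.random_base2(m=14)                 # 2^14 = 16384 Sobol points
est = np.mean(np.prod(0.5 + pts, axis=1))    # QMC estimate

mc = np.random.default_rng(0).random((16_384, d))
est_mc = np.mean(np.prod(0.5 + mc, axis=1))  # plain Monte Carlo, same budget
```

With the same point budget, the Sobol estimate is typically much closer to the true value 1 than the plain Monte Carlo one; lattice rules with CBC-optimized generating vectors play the analogous role in the methods surveyed in the talk.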
TI_30_3 |
Kim, Jong
Min |
University of Minnesota-Morris |
Title |
Change point
detection method with copula conditional distribution to multistage
sequential control chart |
|
In this research,
we propose a change point model of the multistage Statistical Process Control
(SPC) chart for highly correlated multivariate data via copula conditional
distribution, principal component analysis (PCA) and functional PCA.
Furthermore, we review the currently available multistage statistical process
control charts. In addition, to verify our proposed change point model, we
compare the current change point models of the single-stage SPC chart via PCA
with our change point model for the multistage SPC chart via copula
conditional distribution, PCA and functional PCA, using highly correlated
multistage simulated and real data. |
||
TI_18_0 |
Kozubowski,
Tomasz |
University of Nevada |
Title |
Discrete
Stochastic Models and Applications |
|
Discrete
stochastic models are an essential part of statistician’s toolbox, as they
are widely used across many areas of applications. The session focuses on
recent developments in this important area, and its scope is rather broad,
from univariate to multivariate discrete distributions, including
hybrid models with discrete as well as continuous components, heavy-tailed
distributions, and their applications. |
||
TI_36_3 |
Kozubowski,
Tomasz |
University of Nevada |
Title |
Multivariate
models connected with random sums and maxima of dependent Pareto
components |
|
We present recent results concerning stochastic models for (X, Y, N), where X and Y, respectively, are the sum and the maximum of N dependent,
heavy-tailed Pareto components.
Models of this form are desirable in many applications,
ranging from hydro-climatology to finance and insurance. Our construction is built upon
a pivotal model involving a deterministic number of IID exponential variables, where the basic characteristics of the involved multivariate distributions admit explicit forms.
In addition to theoretical results, we shall present real data examples illustrating the usefulness of these models. |
||
TI_26_2 |
Krishnamoorthy,
Kalimuthu |
University of Louisiana at
Lafayette |
Title |
Fiducial
Inference with Applications |
|
The fiducial
distribution for a parameter is essentially the posterior distribution with
no prior distribution on the parameter. In this talk, we shall describe
Fisher's method of finding a fiducial distribution for normal parameters and
fiducial inference through examples involving well-known distributions such
as the normal and related distributions. We then describe the approach for
finding fiducial distributions for the parameters of a location-scale family
and for discrete distributions. We illustrate the approach for the Weibull
distribution and the delta-lognormal distribution. In particular, we shall
present fiducial methods for finding confidence intervals, prediction intervals,
and prediction limits for the mean of a future sample. |
||
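Fisher's fiducial recipe for the normal model mentioned in TI_26_2 above has a simple Monte Carlo form: draw sigma^2 = (n-1)s^2 / chi^2_{n-1} and mu = xbar + Z * sigma / sqrt(n); the resulting fiducial interval for mu coincides with the classical t-interval. A minimal sketch; the simulated sample and the Monte Carlo size are illustrative:

```python
import numpy as np

# Fiducial Monte Carlo for the normal model N(mu, sigma^2):
#   sigma^2 = (n-1) s^2 / chi2_{n-1},  mu = xbar + Z * sigma / sqrt(n).
rng = np.random.default_rng(1)
x = rng.normal(10.0, 2.0, size=25)           # illustrative data
n, xbar, s2 = len(x), x.mean(), x.var(ddof=1)

m = 100_000                                   # fiducial Monte Carlo size
sigma = np.sqrt((n - 1) * s2 / rng.chisquare(n - 1, m))
mu = xbar + rng.standard_normal(m) * sigma / np.sqrt(n)

lo, hi = np.quantile(mu, [0.025, 0.975])      # 95% fiducial interval for mu
```

For the normal mean, this interval reproduces the classical t-interval; for parameters like those of the Weibull or delta-lognormal in the talk, the same sampling idea applies but no closed-form frequentist match exists.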
TI_19_0 |
Kumar, C.
Satheesh |
University of Kerala, Trivandrum, India |
Title |
Distribution
Theory |
|
The session
consists of four talks: the first two will be on Weibull-related
classes of distributions, while the third will be on the analysis of competing-risk
data under progressive type-II censoring. The session concludes with a talk
on certain classes of discrete distributions of order k. |
||
TI_19_4 |
Kumar, C.
Satheesh |
University of Kerala |
Title |
On a Wide
Class of Discrete Distributions |
|
Several
types of discrete distributions of order k are available in the literature,
and they have found extensive applications in many areas of scientific
research. In the present talk, we discuss certain new classes of discrete
distributions of order k, which are developed as distributions of the random
sum of certain independent and identically distributed Hirano type random
variables. We attempt to outline several important distributional properties
of these families of distributions along with a brief discussion on their
mixtures and limiting cases. |
||
TI_7_2 |
Lee, Gee |
Michigan State University |
Title |
General
insurance deductible ratemaking (and extensions) |
|
Insurance
claims have deductibles, which must be considered when pricing insurance premiums.
The deductible may cause censoring and truncation of the insurance claims. In
this talk, an overview of deductible ratemaking will be provided, and the
pros and cons of two deductible ratemaking approaches will be compared: the
regression approach and the maximum likelihood approach. The regression
approach turns out to have an advantage in predicting aggregate claims, while
the maximum likelihood approach has an advantage when calculating
theoretically correct relativities for deductible levels beyond those
observed in empirical data. A comparison of selected models shows that
the usage of long-tail severity distributions may improve the deductible
rating, while the zero-one inflated frequency model may have limited advantages due
to estimation issues under censoring and truncation. For demonstration,
loss models fit to the Wisconsin Local Government Property Insurance Fund
(LGPIF) data will be illustrated, and examples will be provided for the
ratemaking of per-loss deductibles offered by the fund. |
||
TI_22_3 |
Lee, I-Chen |
National Cheng-Kung University |
Title |
Global
Planning of Accelerated Degradation Tests |
|
The
accelerated degradation test (ADT) is an efficient tool for assessing the
lifetime information of highly reliable products. Recently, without taking the
experimental cost into consideration, an analytical approach was proposed in the
literature to determine the optimum stress levels and the corresponding
optimum sample size allocation simultaneously in a general class of
exponential dispersion (ED) degradation models. However, conducting an ADT is
very expensive. Therefore, how to construct a cost-constrained ADT plan is a
challenging issue for reliability analysts. By taking the experimental
cost into consideration, this study further proposes a semi-analytical
procedure to determine the total sample size, the measurement frequencies,
and the number of measurements (within a degradation path) globally under the
class of ED degradation models. An example is used to demonstrate that our
proposed method is very efficient in obtaining the cost-constrained ADT plan,
compared with the conventional optimum plan obtained by a grid search algorithm. |
||
TI_24_2 |
Lee, Kevin |
Western Michigan University |
Title |
Temporal Exponential-Family
Random Graph Models with Time-Evolving Latent Block Structure for Dynamic
Networks |
|
Model-based
clustering of dynamic networks has emerged as an essential research topic in
statistical network analysis. We present a principled statistical clustering
of dynamic networks through the temporal exponential-family random graph
models with a hidden Markov structure. The temporal exponential-family random
graph models allow us to detect groups based on interesting features of the
dynamic networks and the hidden Markov structure is used to infer the
time-evolving block structure of dynamic networks. The power of our proposed
method is demonstrated in real-world applications. |
||
TI_20_0 |
Levine,
Michael |
Purdue University |
Title |
Recent
advances involving latent variable models for various distributions |
|
This session
is dedicated to some new developments in latent variable models. Models for
specific distributions that are widely used in practice as well as the
nonparametric latent variable models will be discussed. Moreover, some
models for new types of data lying in non-Euclidean spaces will also be
considered. Taken together, the models discussed in this session are capable
of modeling a very wide range of data with some hidden/unobservable structure. |
||
TI_20_1 |
Levine,
Michael |
Purdue University |
Title |
Estimation
of two-component skew normal mixtures where one component is known |
|
Two
component mixtures have a special relevance for binary classification
problems. In the standard setting for binary classification, labeled samples
from both components are available in the form of training data. However,
many real-world problems do not fall in this standard paradigm. For example,
in social networks users may only be allowed to click `like' (if there is no
`dislike' button) for a particular product. Thus, labeled data can be
collected only for one of the components (a sample containing users who
clicked `like'). In addition, unlabeled data from the mixture (a sample
containing all users) is also available. To guarantee unimodality of the
components and allow for skewness, we model the components with a skew
normal family, a generalization of the Gaussian family with good theoretical
properties and tractable inference. An efficient algorithm that
estimates a mixture proportion as well as the parameters of the unknown
component is proposed. We illustrate its performance using a
well-designed simulation study. |
||
TI_21_0 |
Li, Daoji |
California State University
Fullerton |
Title |
Big Data and
Dimension Reduction |
|
This session
will present recent advances in big data and dimension reduction, including
optimal subsampling for massive data, scalable spectral clustering framework,
robust PCA, and high-dimensional interaction detection. |
||
TI_21_4 |
Li, Daoji |
California State
University Fullerton |
Title |
High-dimensional
interaction detection with false sign rate control |
|
Understanding
how features interact with each other is of paramount importance in many
scientific discoveries and contemporary applications. Yet
interaction identification becomes challenging even for a moderate
number of covariates. In this paper, we suggest an efficient and
flexible procedure for interaction identification in ultra-high
dimensions. Under a fairly general framework, we establish that for both
interactions and main effects, the method enjoys oracle inequalities
in selection. We prove that our method admits an explicit
bound on the false sign rate, which can be asymptotically vanishing. Our
method and theoretical results are supported by several simulation and
real data examples. |
||
TI_48_2 |
Li, Keren |
Northwestern University |
Title |
Score-Matching
Representative Approach for Big Data Analysis with Generalized Linear Models |
|
We propose a
fast and efficient strategy, called the representative approach, for big data
analysis with linear models and generalized linear models. With a given
partition of big dataset, this approach constructs a representative data
point for each data block and fits the target model using the representative
dataset. In terms of time complexity, it is as fast as the subsampling
approaches in the literature. As for efficiency, its accuracy in estimating
parameters is better than the divide-and-conquer method. With comprehensive
simulation studies and theoretical justifications, we recommend two
representative approaches. For linear models or generalized linear models
with a flat inverse link function and moderate coefficients of continuous
variables, we recommend mean representatives (MR). For other cases, we
recommend score-matching representatives (SMR). In an illustrative
application to the Airline on-time performance data, MR and SMR are as good
as the full-data estimate when it is available. Furthermore, the proposed representative
strategy is ideal for analyzing massive data dispersed over a network of
interconnected computers. |
||
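The mean-representative (MR) idea in TI_48_2 above can be sketched for a linear model: replace each block of a given partition by its mean point and fit weighted least squares on the representatives. This toy version uses an arbitrary equal-size random partition and simulated data; the paper's construction and recommendations are more refined:

```python
import numpy as np

# Mean-representative (MR) sketch for a linear model: each data block is
# replaced by its mean point, and the model is fit on the representatives
# with block sizes as weights.
rng = np.random.default_rng(0)
n, d = 100_000, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

blocks = np.array_split(rng.permutation(n), 200)     # a given partition (assumption)
Xr = np.stack([X[b].mean(axis=0) for b in blocks])   # representative points
yr = np.array([y[b].mean() for b in blocks])
w = np.array([len(b) for b in blocks], dtype=float)  # block sizes as weights

# Weighted least squares on the 200-point representative dataset
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(sw[:, None] * Xr, sw * yr, rcond=None)
```

Because block means of (X, y) satisfy the same linear relation as the raw data, the 200-point fit recovers the coefficients of the 100,000-point model at a fraction of the cost, which is the appeal of the representative strategy.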
TI_46_0 |
Lio, Yuhlong |
University of South Dakota |
Title |
Statistical
Modeling for Degradation Data II |
|
In recent years,
statistical modeling and inference techniques have been developed based on
different degradation measures. This invited session is based on the book
“Statistical Modeling for Degradation Data” co-edited by Professors Ding-Geng (Din) Chen, Yuhlong Lio, Hon Keung Tony Ng, Tzong-Ru Tsai, published by Springer in 2017. The book strives to bring together experts
engaged in statistical modeling and inference to present and discuss the most
recent important advances in degradation data analysis and related
applications. The speakers in this
session are contributors to this book and will further present their
recent developments in this research area. |
||
TI_32_3 |
Lio, Yuhlong |
University of South Dakota |
Title |
Estimation of
Stress-Strength for Burr XII distribution based on the progressively first
failure-censored samples |
|
Stress-strength
is studied under progressively first-failure-censored samples from Burr
XII distributions. Confidence intervals for stress-strength,
constructed using various procedures, are discussed.
Some computational results from a simulation study are presented, and an
illustrative example is provided for demonstration. |
||
TI_40_2 |
Liu, Ruiqi |
Indiana University Purdue University Indianapolis |
Title |
Optimal
Nonparametric Inference via Deep Neural Network |
|
The
deep neural network is a state-of-the-art method in modern science and
technology. Much statistical literature has been devoted to understanding its
performance in nonparametric estimation, whereas the results are suboptimal
due to a redundant logarithmic factor. In this work, we show that such
log-factors are not necessary. We derive upper bounds for the L^2
minimax risk in nonparametric estimation. Sufficient conditions on network
architectures are provided such that the upper bounds become optimal (without
the logarithmic factor). Our proof relies on an explicitly constructed network
estimator based on tensor product B-splines. We also derive asymptotic
distributions for the constructed network and a related hypothesis testing
procedure. The testing procedure is further proven to be minimax optimal under
suitable network architectures. |
||
TI_47_2 |
Long,
Hongwei |
Florida Atlantic University,
Boca Raton, FL |
Title |
The Beta Transmuted
Pareto Distribution: Theory and Applications |
|
In this
talk, we present a composite generalizer of the Pareto distribution. The
genesis of the beta distribution and transmuted map is used to develop the
so-called beta transmuted Pareto (BTP) distribution. Several mathematical
properties including moments, mean deviation, probability weighted moments,
residual life, distribution of order statistics and the reliability analysis
are discussed. The method of maximum likelihood is proposed to estimate the
parameters of the distribution. We illustrate the usefulness of the proposed
distribution by presenting its application to model real-life data
sets. |
||
TI_33_2 |
Mailhot,
Melina |
Concordia University |
Title |
Multivariate
geometric expectiles and range value-at-risk |
|
Geometric
generalizations of expectiles and Range
Value-at-Risk for d-dimensional multivariate distribution functions will be
introduced. Multivariate geometric expectiles are unique
solutions to a convex risk minimization problem and are given by
d-dimensional vectors. Multivariate geometric Range Value-at-Risk is also a
risk measure considering tail events, which has TVaR
as a special case. They are well behaved under common data transformations.
Properties and highlights on the influence of varying margins and dependence
structures will be presented. |
||
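The univariate building block behind the geometric expectiles of TI_33_2 above is simple: the tau-expectile minimizes an asymmetrically weighted squared loss, and the 0.5-expectile is the mean. A one-dimensional sketch (the talk's geometric, d-dimensional generalization is not attempted here; the sample is illustrative):

```python
import numpy as np
from scipy import optimize

# The tau-expectile of a sample minimizes the asymmetrically weighted
# squared loss e -> mean(|tau - 1{x < e}| * (x - e)^2).
def expectile(x, tau):
    loss = lambda e: np.mean(np.abs(tau - (x < e)) * (x - e) ** 2)
    res = optimize.minimize_scalar(loss, bounds=(x.min(), x.max()), method="bounded")
    return res.x

rng = np.random.default_rng(0)
x = rng.normal(size=50_000)
e50 = expectile(x, 0.5)   # the 0.5-expectile coincides with the sample mean
e90 = expectile(x, 0.9)   # higher tau shifts the expectile into the upper tail
```

Expectiles are increasing in tau and, unlike quantiles, depend on the full tail, which is what makes their multivariate geometric versions natural tail-sensitive risk measures.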
TI_8_2 |
Maity,
Arnab Kumar |
Pfizer Inc. |
Title |
Bayesian
Data Integration and Variable Selection for Pan-Cancer Survival Prediction
using Protein Expression Data |
|
Accurate
prognostic prediction using molecular information is a challenging area of
research which is essential to develop precision medicine. In this paper, we
develop translational models to identify major actionable proteins that are
associated with clinical outcomes like the survival time of the patients.
There are considerable statistical and computational challenges due to the
large dimension of the problems. Furthermore, the data are available for
different tumor types hence data integration for various tumors is desirable.
Having censored survival outcomes escalates one more level of
complexity in the inferential procedure. We develop Bayesian hierarchical
survival models which accommodate all of the aforementioned challenges.
We use a hierarchical Bayesian accelerated failure time (AFT) model for the
survival regression. Furthermore, we assume sparse horseshoe prior
distribution for the regression coefficients to identify the major proteomic
drivers. We allow borrowing of strength across tumor groups by introducing a
correlation structure among the prior distributions. The proposed methods
have been used to analyze data from the recently curated The Cancer Proteome
Atlas (TCPA) which contains RPPA based high quality protein expression data
as well as detailed clinical annotation including survival times. Our
simulation and the TCPA data analysis illustrate the efficacy of the proposed
integrative model which links different tumors with the correlated
prior structures. |
||
TI_30_2 |
Makubate,
Boikanyo |
Botswana International University of
Science and Technology |
Title |
A New
Generalized Weibull Distribution with Applications to Lifetime
Data |
|
A
new and generalized Weibull-type distribution is developed and
presented. Its properties are explored in detail. Some estimation
techniques including maximum likelihood estimation method are used
to estimate the model parameters and finally applications of the model to
real data sets are presented to illustrate the usefulness of the
proposed generalized distribution. |
||
TI_14_4 |
Mallick, Avishek |
Marshall University, West
Virginia |
Title |
An Inflated
Geometric Distribution and its application |
|
Count data
with an excess number of zeros, ones, twos, or threes are commonplace in
experimental studies. These inflated frequencies at particular counts may
lead to overdispersion and thus cause difficulty in data analysis. To obtain
appropriate results and to overcome possible anomalies in parameter
estimation, a suitable inflated distribution may need to be considered.
The inflated Poisson and inflated negative binomial distributions are the
most commonly used for modeling and analyzing such data; the geometric
distribution is a special case of the negative binomial. This work deals
with parameter estimation of a geometric distribution inflated at certain
counts, which we call the Generalized Inflated Geometric (GIG) distribution.
Parameters are estimated by the method of moments, an empirical probability
generating function based method, and maximum likelihood. The three types of
estimators are then compared in simulation studies, and finally a Swedish
fertility data set is modeled using a GIG distribution. |
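As a toy illustration of fitting such an inflated model, the sketch below simulates from a geometric distribution inflated at the counts 0 and 2 and recovers the parameters by maximum likelihood; the parameter values, the choice of inflation points, and the optimizer are our own assumptions, not details from the talk.

```python
# Hypothetical sketch: MLE for a geometric distribution inflated at 0 and 2.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def gig_pmf(k, p, pi0, pi2):
    """pmf of a geometric(p) on {0, 1, 2, ...} inflated at k = 0 and k = 2."""
    base = (1 - p) ** k * p
    return (1 - pi0 - pi2) * base + pi0 * (k == 0) + pi2 * (k == 2)

# Simulate: 15% extra mass at 0, 10% extra mass at 2, geometric otherwise.
n, p_true, pi0_true, pi2_true = 2000, 0.4, 0.15, 0.10
x = rng.geometric(p_true, size=n) - 1          # geometric on {0, 1, ...}
u = rng.random(n)
x[u < pi0_true] = 0
x[(u >= pi0_true) & (u < pi0_true + pi2_true)] = 2

def negloglik(theta):
    p, pi0, pi2 = theta
    if pi0 + pi2 >= 1:                         # keep mixture weights valid
        return np.inf
    return -np.sum(np.log(gig_pmf(x, p, pi0, pi2)))

res = minimize(negloglik, x0=[0.5, 0.1, 0.1],
               bounds=[(1e-4, 1 - 1e-4)] * 3, method="L-BFGS-B")
p_hat, pi0_hat, pi2_hat = res.x
```

With a sample this size the numerical MLE typically lands close to the generating values, which is the behavior the simulation comparison in the talk examines across estimator types.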
||
TI_42_1 |
Mandal,
Saumen |
University of Manitoba |
Title |
Constrained
optimal designs for estimating probabilities in contingency tables |
|
Construction
of optimizing probability distributions plays an important role in many areas
of statistical research. One example is estimation of cell probabilities in
contingency tables. It is well known that the unconstrained maximum
likelihood estimation of the cell probabilities is quite straightforward.
However, the presence of constraints on the probabilities makes the problem
quite challenging. For example, the constraints could be based on a
hypothesis of marginal homogeneity. In this work, we attempt to solve the
constrained maximum likelihood problem using optimal design theory, Lagrangian theory and simultaneous optimization
techniques. This is an optimization problem with respect to variables that
satisfy several constraints. We first formulate the Lagrangian function
with the constraints, and then transform the problem to that of maximizing a
number of functions of the cell probabilities simultaneously. These functions
have a common maximum of zero that is simultaneously attained at the optimum.
We then apply the methodology to some real data sets. Finally, we discuss
how our approach is flexible and provides a unified framework for various
types of constrained optimization problems. |
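As a small numerical companion (our own construction, using a generic constrained optimizer rather than the simultaneous-optimization machinery described above), the following fits the cell probabilities of a hypothetical 3x3 table by maximum likelihood under marginal homogeneity:

```python
# Sketch: constrained ML estimation of cell probabilities under the
# hypothesis of marginal homogeneity, via SLSQP with equality constraints.
import numpy as np
from scipy.optimize import minimize

n = np.array([[30., 10.,  5.],
              [12., 40.,  8.],
              [ 6., 14., 25.]])               # hypothetical observed counts

def negloglik(p):
    """Negative multinomial log-likelihood in the 9 cell probabilities."""
    return -np.sum(n.ravel() * np.log(p))

cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]
for i in range(2):                            # third margin constraint is implied
    cons.append({"type": "eq",
                 "fun": lambda p, i=i: p.reshape(3, 3)[i].sum()
                                      - p.reshape(3, 3)[:, i].sum()})

p0 = np.full(9, 1.0 / 9.0)
res = minimize(negloglik, p0, constraints=cons,
               bounds=[(1e-6, 1.0)] * 9, method="SLSQP")
p_hat = res.x.reshape(3, 3)
```

Unlike the unconstrained MLE (cell counts over the total), the constrained solution forces each row margin to equal the matching column margin.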
||
TI_23_0 |
Marques,
Filipe |
Universidade
NOVA de Lisboa, Portugal |
Title |
Advances in
distribution theory and statistical methodologies |
|
|
||
TI_24_0 |
McKean,
Joseph |
Western Michigan University |
Title |
Big Data: Algorithms,
Methodology, and Applications |
|
Statisticians
and data scientists must face the challenges of Big Data. In these
talks, new algorithms and procedures (robust and traditional) are
discussed to handle these challenges. Algorithm optimization in
terms of error distributions is discussed. Application areas
covered include astronomical data, network analysis, and numerical
integration. |
||
TI_10_1 |
Mdziniso,
Nonhle Channon |
Bloomsburg University of
Pennsylvania |
Title |
Odd Pareto
families of distributions for modeling loss payment data |
|
A
three-parameter Odd Pareto (OP) distribution is presented with density
function having a flexible upper tail in modeling loss payment data. The OP
distribution is derived by considering the distributions of the odds of the
Pareto and inverse Pareto distributions. Basic properties of the OP
distribution are studied. Simulation studies based on the maximum likelihood
method are conducted to compare the OP with other Pareto-type distributions.
provided to illustrate the upper-tail flexibility of the distribution.
provided to illustrate the upper-tail flexibility of the distribution.
Extensions of the Odd Pareto distribution are also considered to improve the
fitting of data. |
||
TI_46_3 |
Melnykov,
Volodymyr |
The
University of Alabama |
Title |
On
Model-Based Clustering of Time-Dependent Categorical Sequences |
|
Clustering
categorical sequences is an important problem that arises in many fields such
as medicine, sociology, and economics. It is a challenging task because
techniques for clustering categorical data are scarce:
the majority of traditional clustering procedures are designed for handling
quantitative observations. Situations with categorical data being related to
time are even more troublesome. We propose a mixture-based approach for
clustering categorical sequences and apply the developed methodology to a
real-life data set containing sequences of life events for respondents
participating in the British Household Panel Survey. |
||
TI_25_4 |
Melnykov,
Igor |
Colorado State University |
Title |
Positive and
negative equivalence constraints in the semi-supervised K-means algorithm |
|
The K-means algorithm
is a widely used clustering procedure thanks to its intuitive design and
computational simplicity. The objective function of the algorithm has a clear
interpretation when the algorithm is applied as an unsupervised method. In a
semi-supervised setting, when certain restrictions are imposed on the
solution, modifications of the objective function are necessary. We consider
two classes of equivalence constraints that may influence the proposed
clustering solution. We propose a method that makes both kinds of restrictions
part of the fabric of the algorithm and provide the necessary modifications
of its objective function. |
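A minimal sketch of the idea (a greedy COP-KMeans-style assignment step of our own devising, not the authors' modified objective function): must-link and cannot-link pairs are respected while each point joins the nearest feasible cluster.

```python
# Sketch: K-means with positive (must-link) and negative (cannot-link)
# equivalence constraints enforced during the assignment step.
import numpy as np

def feasible(i, c, labels, must_link, cannot_link):
    """Can point i join cluster c without violating any constraint?"""
    for a, b in must_link:
        j = b if a == i else (a if b == i else None)
        if j is not None and labels[j] not in (-1, c):
            return False
    for a, b in cannot_link:
        j = b if a == i else (a if b == i else None)
        if j is not None and labels[j] == c:
            return False
    return True

def constrained_kmeans(X, k, must_link, cannot_link, iters=30, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(iters):
        labels[:] = -1
        for i in rng.permutation(len(X)):          # assignment step
            d = np.sum((centers - X[i]) ** 2, axis=1)
            for c in np.argsort(d):                # nearest feasible cluster
                if feasible(i, c, labels, must_link, cannot_link):
                    labels[i] = c
                    break
        for c in range(k):                         # update step
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(4, 0.5, (20, 2))])
ml = [(0, 1)]                  # positive constraint: points 0 and 1 together
cl = [(0, 20)]                 # negative constraint: points 0 and 20 apart
labels, centers = constrained_kmeans(X, 2, ml, cl)
```

The constraints are enforced as hard restrictions here; the talk's contribution is instead to fold both kinds of restrictions into the objective function itself.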
||
TI_25_0 |
Melnykov,
Volodymyr |
The University of Alabama |
Title |
New
developments in finite mixture modeling with applications |
|
Finite
mixtures present a flexible tool for modeling heterogeneity in data.
Model-based cluster analysis is the most famous application of mixture
models. The session covers novel methodological developments in this area and
considers various applications. |
||
TI_25_1 |
Melnykov,
Yana |
The University of Alabama |
Title |
On finite
mixture modeling of processes with change points |
|
We consider
a novel framework for modeling heterogeneous processes with change points.
The proposed finite mixture model can effectively take into
account the potential presence of change points. Conducted simulation
studies show that the model can correctly assess the mixture order as well as
the location of change points within mixture components. The application to
real-life data yields promising results. |
||
TI_25_2 |
Michael, Semhar |
South Dakota State University |
Title |
Finite
mixture of regression models for data from complex survey design |
|
We explored
the use of finite mixture regression models when the samples were drawn using
a stratified sampling design. We developed a new design-based inference where
we integrated sampling weights in the complete-data log-likelihood function.
The expectation-maximization algorithm was derived accordingly. A simulation
study was conducted to compare the proposed method with the finite mixture of
a regression model. The comparison was done using bias-variance components of
mean square error with interesting results. Additionally, a simulation study
was conducted to assess the ability of the Bayesian information criterion to
select the optimal number of components under the proposed modeling approach. |
||
TI_34_3 |
Mohsenipour,
Akbar |
Vivametrica |
Title |
Approximating
the distribution
of various types of quadratic expressions on
the basis of their moments |
|
Several
moment-based approximations to the distribution of various types
of quadratic forms and expressions, including those in singular
Gaussian and in elliptically contoured
random vectors are proposed. In the normal
case, the moments are obtained recursively from the
cumulants and the distribution of positive definite quadratic
forms is approximated by means of two- and three-parameter gamma-type
distributions. Approximations to the density functions of Hermitian
quadratic forms in normal vectors and quadratic forms in order
statistics from a uniform population are provided as well. |
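The simplest instance of such an approximation in the normal case is the familiar two-parameter gamma match on the first two cumulants; the sketch below, with an arbitrary matrix of our own choosing, compares it against Monte Carlo:

```python
# Sketch: two-parameter gamma (moment-matching) approximation to the
# distribution of the quadratic form Q = x'Ax with x ~ N(0, I).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
A = np.diag([3.0, 2.0, 1.0, 0.5])             # a positive definite matrix

k1 = np.trace(A)                              # first cumulant of Q: tr(A)
k2 = 2 * np.trace(A @ A)                      # second cumulant: 2 tr(A^2)
shape, scale = k1**2 / k2, k2 / k1            # match gamma mean and variance

x = rng.standard_normal((100_000, 4))
q = np.einsum('ni,ij,nj->n', x, A, x)         # Monte Carlo sample of Q

emp = np.mean(q <= 8.0)                       # empirical CDF at 8
approx = stats.gamma.cdf(8.0, a=shape, scale=scale)
```

The three-parameter and higher-moment refinements mentioned in the abstract tighten this basic match, especially in the tails.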
||
TI_27_0 |
Muthukumarana,
Saman |
University of Manitoba |
Title |
Bayesian
Methods with Applications |
|
This session
will highlight the use of Bayesian modelling and inferential methods
in discovering genetic associations with diseases, image analysis,
and the study of animal populations and sports. Bayesian regression
tree models, latent ancestral tree models, semi-parametric Bayesian
methods using the Dirichlet process, and Bayesian models
for photographic identification in animal populations are
discussed. |
||
TI_27_4 |
Muthukumarana,
Saman |
University of Manitoba |
Title |
Model Based
Estimation of Baseball Batting Metrics |
|
We consider
the modeling of batting outcomes of baseball batters using a weighted
likelihood approach and a semi-parametric Bayesian approach. The weighted
likelihood allows the other batters to contribute to the inference so that
the relevant information they contain is not lost and the weights are
determined based on their dissimilarities with the target batter. Minimum
Averaged Mean Squared Error (MAMSE) weights are used as the likelihood
weights. We then propose a semi-parametric Bayesian approach based
on the Dirichlet process that enables borrowing information across
batters. We demonstrate and compare these approaches using 2018 Major League
Baseball data. |
||
TI_28_0 |
Nayak, Tapan |
George Washington University |
Title |
Protection
of Respondents' Privacy and Data Confidentiality |
|
Protecting respondents’
privacy and data confidentiality has become a very important topic in
recent years. This session is devoted to discussing recent developments in
this area. |
||
TI_28_4 |
Nayak, Tapan |
George Washington University |
Title |
Discussion |
|
I shall
present some concluding remarks on protecting respondents’ privacy and data
confidentiality. |
||
TI_22_1 |
Ng, Hon
Keung Tony |
Southern Methodist University |
Title |
Improved
Techniques for Parametric and Nonparametric Evaluations of the First-Passage Time
of Degradation Processes |
|
Determining
the first-passage time (FPT) distribution is an important topic in
reliability analysis based on degradation data because the FPT distribution
provides some valuable information on the reliability characteristics. In
this paper, we propose some improved techniques based on saddlepoint
approximation to improve upon some existing methods to approximate the FPT
distribution of degradation processes. Numerical examples and Monte Carlo
simulation studies are used to illustrate the advantages of the proposed
techniques. The limitations related to the improved techniques are
discussed and some possible solutions to these limitations are proposed.
Concluding remarks and practical recommendations are provided based on the
results. |
||
TI_32_0 |
Ng, Hon
Keung Tony |
Southern Methodist University |
Title |
Statistical
Models and Methods for Analysis of Reliability and Survival Data |
|
This session
focuses on statistical methodologies for analyzing different kinds
of reliability and survival data in industrial and
medical studies. These methods are important to reliability
engineers and medical researchers because they make the extraction
of lifetime characteristics possible through suitable statistical analysis and lead
to better decision making. |
||
TI_4_4 |
Nguyen, Yet |
Old Dominion University |
Title |
A
Histogram-Based Method for False Discovery Rate Control in Two Independent
Experiments |
|
In this
talk, we present a new method to estimate and control false discovery rate
(FDR) when identifying simultaneous signals in two independent experiments.
In one experiment, thousands or millions of features are tested for
significance with respect to some factor of interest. In a second experiment,
the same features are also tested for significance. Researchers are
interested in identifying simultaneous signals, i.e., features that are
significant in both experiments. We develop an FDR estimation and control
procedure that is a generalization of the histogram-based FDR estimation and
control procedure for one experiment. Asymptotic results and simulation
studies are presented to investigate the performance of the proposed method and other
existing methods. |
||
TI_34_2 |
Nkurunziza, Sévérien |
University of Windsor |
Title |
Some
identities for the risk and bias of shrinkage-type estimators in elliptically
contoured distributions |
|
We consider
an estimation problem regarding the mean of a random matrix whose
distribution is elliptically contoured. In particular, we study the
properties of a class of multidimensional shrinkage-type estimators in the
context where the variance-covariance matrix of the shrinking
random component is the sum of two Kronecker products. We present
some identities for computing some mixed moments as well as two general
formulas for the bias and risk functions of shrinkage-type estimators. As a
by-product, we generalize some identities established in Gaussian sample
cases for which the
shrinking random component is represented by a single
Kronecker product. |
||
TI_36_2 |
Nolan, John |
American University |
Title |
Multivariate
Generalized Logistic Laws |
|
Multivariate Fréchet laws
are a class of extreme value distributions that exhibit heavy tails and
directional dependence controlled by an angular measure. Multivariate
generalized logistic laws are a recently described sub-class that is dense
in a certain sense. It is shown that these laws are related
to positive multivariate sum stable laws, which gives a way to simulate from
these laws. The corresponding angular measure density is described, and
expressions for the density of
the distribution are given. |
||
TI_13_4 |
Olufadi, Yunusa |
University of Memphis |
Title |
EM Bayesian
variable selection for clustered discrete and continuous outcomes |
|
Feature
selection for Gaussian and non-Gaussian linear models is common in the
literature. However, to our knowledge, there is little work on clustered
discrete and continuous outcomes that are high-dimensional. Mixed-outcome
data of this kind are becoming increasingly common in developmental toxicity
(DT) studies and several other settings. In a toxico-epigenomics
study, for example, the interest might be in extracting biomarkers of DT or
detecting new ones. We develop a Bayesian hierarchical modeling procedure
to guide both the estimation and the efficient extraction of the most useful
features. |
||
TI_30_0 |
Oluyede,
Broderick |
Georgia Southern University |
Title |
Copulas, Informational
Energy, Exponential Dominance and Uncertainty for Generalized and
Multivariate Distributions |
|
Copulas, exponential
dominance and uncertainty for generalized distributions are explored and
comparisons via informational energy functional and differential
entropy are presented in this session. More importantly, the first talk
deals with stochastic dominance and bounds for cross-discrimination and
uncertainty measures for weighted reliability functions. In
the second talk, new generalized
distributions are developed. In the third talk, a change-point
model for highly correlated multivariate data via copula
conditional distributions, principal component analysis (PCA), and functional
PCA is presented. Finally, the last presentation deals
with a class of stochastic SEIRS epidemic dynamic models. |
||
TI_30_1 |
Oluyede,
Broderick |
Georgia Southern University |
Title |
Informational
Energy, Stochastic Inequalities and Bounds for Weighted Weibull-Type
Distributions. |
|
In this
talk, generalized distributions that are weighted distributions are
presented. Inequalities and dominance, uncertainty and
informational measures for weighted and parent generalized
Weibull-type distributions are developed. Comparisons of the weighted
and parent generalized Weibull-type distributions via
informational energy function and the differential entropy are
presented. Moment-type and stochastic inequalities as well as bounds
for cross-discrimination and uncertainty measures in weighted
and parent life distribution functions and related reliability
measures are given. |
||
TI_31_0 |
Omolo,
Bernard |
University of South Carolina – Upstate |
Title |
Statistical Methods for High‐Dimensional Data Analysis: Application to Genomics |
|
|
||
TI_31_1 |
Omolo,
Bernard |
University of South Carolina – Upstate |
Title |
A
Model-based Approach to Genetic Association Testing in Malaria Studies |
|
In this
study, we propose a two-step approach to genetic association testing in
malaria studies in a GWAS setting that may enhance the power of the tests, by
identifying the underlying genetic model first before applying the
association tests. This is performed through tests of significance of a given
genetic effect, noting the minimum p-values across all the models and the
proportion of tests that a given genetic model was deemed the best, using simulated
data. In addition, we fit generalized linear models for the genetic effects,
using case-control genotype data from Kenya, Gambia and Malawi, available
from MalariaGEN®. |
||
TI_1_2 |
Oraby,
Tamer |
University of Texas - Rio Grande Valley |
Title |
Modeling
Progression of Co-Morbidity Using Bivariate Markov Chains |
|
In
this work, we use a bivariate Markov chain (MC) to model the progression of two
diseases or morbidities, such as obesity and diabetes, and the correlation
between the two processes. We postulate that the MC has transition rates that
depend on a set of covariates, such as age and
gender, as well as treatment. The data include individuals who are dependent
due to familial relationships. We will present the estimation of the model’s
parameters and discuss its goodness of fit. |
||
TI_18_3 |
Otunuga, Olusegun |
Marshall University |
Title |
Closed form
probability distribution of number of infections at a given time in a
stochastic SIS epidemic model |
|
We study the
effect of external fluctuation in the transmission rate of certain diseases
and how this perturbation affects the distribution of the number of
infections over time. To do this, we introduce random noise in the
transmission rate in a deterministic SIS model and study how the number of
infections behaves over time. The closed form probability distribution of the
number of infections at a given time in the resulting stochastic SIS epidemic
model is derived. Using the Fokker-Planck equation, we reduce the differential
equation governing the number of infections to a generalized Laguerre
differential equation. The distribution is demonstrated using U.S. influenza
data. |
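To make the setup concrete, here is a minimal Euler-Maruyama simulation of an SIS model whose transmission rate is perturbed by white noise; the parameter values and the exact form of the diffusion term are our illustrative assumptions, not those derived in the talk.

```python
# Sketch: SIS model with a randomly perturbed transmission rate,
# dI = [beta*I*(N-I)/N - gamma*I] dt + sigma*I*(N-I)/N dW,
# simulated by the Euler-Maruyama scheme.
import numpy as np

rng = np.random.default_rng(4)
N, beta, gamma, sigma = 1000.0, 0.5, 0.25, 0.05
T, dt = 100.0, 0.01
steps = int(T / dt)

I = np.empty(steps + 1)
I[0] = 10.0                                    # initial number of infections
for t in range(steps):
    s = I[t] * (N - I[t]) / N                  # mass-action contact term
    drift = beta * s - gamma * I[t]
    dW = rng.normal(0.0, np.sqrt(dt))          # Brownian increment
    I[t + 1] = np.clip(I[t] + drift * dt + sigma * s * dW, 0.0, N)

endemic = N * (1 - gamma / beta)               # deterministic equilibrium
```

With beta > gamma the path fluctuates around the deterministic endemic level N(1 - gamma/beta); the talk derives the exact time-t distribution of such fluctuations via the Fokker-Planck equation.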
||
TI_23_1 |
Oyamakin S.
O. |
Universidade de
São Paulo |
Title |
Some New
Nonlinear Growth Models For Biological Processes
based on Hyperbolic Sine Function |
|
In
this paper, we propose maximum a posteriori (MAP) estimators
for the parameters of some survival distributions, which have a
simple closed-form expression. In particular, we focus on the Nakagami distribution, which plays an essential role
in communication engineering problems, particularly to model fading of radio
signals. Moreover, we show that the obtained results can be extended to
other survival probability distributions, such as the gamma and generalized
gamma ones. Numerical results reveal that the MAP estimators
outperform the existing estimators and produce almost unbiased
estimates even for small sample sizes. Our applications are driven by
embedded systems, which are commonly used in communication engineering.
Particularly, they can consist of an electronic system inside a
microcontroller, which can be programmed to maintain communication between a
transmitting antenna and mobile antennas, which are operating at the same
frequency. In this context, from the statistical point of view,
closed-form estimators are needed, since they are embedded in mobile devices
and need to be sequentially recalculated in real time. |
||
TI_6_4 |
Ozdemir, Senay |
Afyon Kocatepe University |
Title |
Combining
Heavy-Tailed Distributions and Empirical
Likelihood method for Linear Regression Model |
|
The empirical
likelihood (EL) estimation method proposed by Owen (1991)
is one of the nonparametric methods for estimating the
parameters of a linear regression model. In the EL method, an
EL function is maximized under constraints formed using the likelihood
scores under normally distributed errors. In this paper, an
alternative EL estimator for the parameter
vector of a linear regression model is proposed, using the score functions of
some popular heavy-tailed distributions as the constraints
in the EL estimation method. Our numerical
studies show that, when the data are heavy-tailed, the performance
of the proposed EL estimator is remarkably superior to that of the EL estimator
obtained under normally distributed error terms. |
||
TI_32_4 |
Pal, Suvra |
University of Texas at Arlington |
Title |
A New
Estimation Algorithm for a Flexible Cure Rate Model |
|
In this
talk, I will first present a flexible cure rate model that contains the
mixture cure rate model and promotion time cure rate model as special cases.
For the estimation of the model parameters, I will present the results of the
well-known EM algorithm and then discuss some of the issues associated with
the EM algorithm. To circumvent these issues, I will present a new
optimization procedure based on non-linear conjugate gradient (NCG)
algorithm. Through a simulation study, I will show the advantages of NCG
algorithm over the EM algorithm. |
||
TI_41_2 |
Pal, Subhadip |
University of Louisville |
Title |
A Bayesian
Framework for Modeling Data on the Stiefel Manifold |
|
Directional
data emerges in a wide array of applications, ranging from atmospheric
sciences to medical imaging. Modeling such data, however, poses unique
challenges by virtue of their being constrained to non-Euclidean spaces like
manifolds. Here, we present a Bayesian framework for inference on
the Stiefel manifold using the
Matrix Langevin distribution. Specifically, we propose a novel
family of conjugate priors and establish a number of theoretical properties
relevant to statistical inference. Conjugacy enables translation of
these properties to their corresponding posteriors, which we exploit to
develop the posterior inference scheme. For the implementation of the
posterior computation, including the posterior sampling, we adopt a novel
computational procedure for evaluating the hypergeometric function of matrix
arguments that appears as normalization constants in the relevant
densities. |
||
TI_18_2 |
Panorska,
Anna K. |
University of Nevada, Reno |
Title |
Discrete
Pareto Distributions, Butterfly Diet Breadth, and Climate Change |
|
We propose a
new discrete distribution with finite support, which generalizes truncated
Pareto and beta distributions as well as uniform and Benford’s
laws. We present its fundamental properties and consider
parameter estimation. We include an illustration of the applications of
this new stochastic model in ecology. |
||
TI_37_3 |
Pararai,
Mavis |
Indiana University of Pennsylvania |
Title |
The Weibull
Linear Failure Rate Distribution and Its Applications |
|
A new distribution
called the Weibull Linear Failure Rate distribution is introduced and its
properties are explored. The properties of this new distribution and its
sub-models will be discussed, along with maximum likelihood
estimation of the parameters. A simulation study
examining the bias and mean square error of the maximum likelihood
estimator of each parameter is presented. Finally, an application of the
model to a real data set is presented to illustrate how useful the model
is. |
||
TI_38_0 |
Peng, Hanxiang |
Binghamton University |
Title |
Empirical
Likelihood |
|
The session
addresses topics centered around the empirical likelihood approach. |
||
TI_38_1 |
Peng, Hanxiang |
Indiana University-Purdue University
Indianapolis |
Title |
Maximum
empirical likelihood estimation in U-statistics based general estimating
equations. |
|
In this
talk, we discuss maximum empirical likelihood estimates (MELE's) in
U-statistics based general estimating equations. Our approach is the jackknife
empirical likelihood (JEL). We derive the estimating equations for
MELE's and establish their asymptotic normality. We provide a class of
MELE's which have less computational burden than the usual MELE's and
can be implemented using existing software. We show that the MELE's are
efficient. We present several examples for constructing efficient
estimates for moment based distribution characteristics
in the presence of side information. In the end, we report some simulation
results. |
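A compact sketch of the JEL mechanics (our own minimal implementation, with a bisection solve for the Lagrange multiplier, not code from the talk) for a single U-statistic, the sample variance with kernel h(x, y) = (x - y)^2 / 2:

```python
# Sketch: jackknife empirical likelihood (JEL) for a U-statistic.
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 80)
n = len(x)

def ustat(z):
    """U-statistic with kernel h(a, b) = (a - b)^2 / 2 (the sample variance)."""
    m = len(z)
    diffs = z[:, None] - z[None, :]
    return np.sum(diffs ** 2 / 2) / (m * (m - 1))

U = ustat(x)
# Jackknife pseudo-values: V_i = n*U_n - (n-1)*U_{n-1}^(-i); their mean is U_n.
V = np.array([n * U - (n - 1) * ustat(np.delete(x, i)) for i in range(n)])

def jel_stat(theta):
    """-2 log jackknife EL ratio for E[V] = theta (asymptotically chi^2_1)."""
    d = V - theta
    if d.min() >= 0 or d.max() <= 0:
        return np.inf                      # theta outside the convex hull
    lo, hi = -1 / d.max() + 1e-10, -1 / d.min() - 1e-10
    for _ in range(200):                   # bisection: the EL score decreases in lam
        lam = (lo + hi) / 2
        if np.sum(d / (1 + lam * d)) > 0:
            lo = lam
        else:
            hi = lam
    return 2 * np.sum(np.log1p(lam * d))
```

Treating the pseudo-values as approximately independent turns the U-statistic problem into an ordinary EL problem, which is what keeps the computational burden modest.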
||
TI_13_2 |
Peng, Hanxiang |
Indiana University-Purdue University
Indianapolis |
Title |
An Empirical
Likelihood Approach to Testing Multivariate Symmetries |
|
We propose
several empirical likelihood tests for testing spherical symmetry, rotational
symmetry, antipodal symmetry, coordinate-wise symmetry, and exchangeability.
We construct the tests by exploiting the characterizations of these
symmetries. The jackknife empirical likelihood for vector U-statistics is
employed to incorporate side information. We show that the tests are
distribution free and asymptotically chi-square distributed. We report
some simulation results about the numerical performance of the tests. |
||
TI_26_0 |
Peng, Jianan |
Acadia University |
Title |
Generalized
and Fiducial Inference with Applications |
|
Generalized
inference, introduced by Weerahandi, has many
applications. Fiducial inference, initiated by Fisher, is
enjoying a revival, due largely to Hannig and
other researchers. In this session we have two talks (including the one
by Weerahandi) on generalized inference and two talks on
(generalized) fiducial inference. |
||
TI_26_4 |
Peng, Jianan |
Acadia University |
Title |
Successive Comparisons
for One-way Layout under Heteroscedasticity |
|
Suppose that
k (k>2) treatments in a one-way layout are ordered
in a certain way. For example, the treatments may be increasing dose levels
of a drug in dose response studies. The experimenters may be interested
in the successive comparisons of the treatments. In this talk, we
consider the simultaneous confidence intervals for the successive comparisons
under heteroscedasticity. We propose several methods, including the maxT method, the minP method,
and the generalized fiducial confidence intervals, among
others. We show that the generalized fiducial confidence
intervals have correct coverage probability asymptotically. A
simulation study and a real data example are given to illustrate the proposed
procedures. |
||
TI_11_1 |
Peng,
Stephen |
Georgetown University |
Title |
A Flexible
Univariate Autoregressive Time-Series Model for Dispersed Count Data |
|
Integer-valued
time series data have an ever-increasing presence in various applications and
need to be analyzed properly. While a Poisson autoregressive (PAR) model
would seem like a natural choice to model such data, it is constrained by
the equi-dispersion assumption. Hence, data
that are over- or under-dispersed are improperly modeled, resulting in
biased estimates and inaccurate forecasts. This work (coauthored by Stephen
Peng and Ali Arab) instead develops a flexible integer-valued autoregressive
(INAR) model for count data that contain over- or under-dispersion. Using the
Conway-Maxwell-Poisson (COM-Poisson or CMP) distribution and related
distributions as motivation, we develop a first-order
sum-of-Conway-Maxwell-Poisson autoregressive (SCMPAR(1)) model that will
instead offer a generalizable construct that captures the PAR, negative
binomial AR (NBAR), and binomial AR (BAR) models respectively as special
cases, and serve as an overarching representation connecting these three
special cases through the dispersion parameter. We illustrate the SCMPAR
model's flexibility through simulated and real data examples. |
||
TI_17_2 |
Phoa,
Frederick |
Academia Sinica |
Title |
A systematic
construction of cost-efficient designs for order-of-addition experiments |
|
An
order-of-addition (OofA) experiment aims at
investigating how the order of factor inputs affects the experimental
response, which is of great interest in clinical trials and industrial
processes. Recent studies on the OofA designs
focused on their properties of algebraic optimality rather than
cost-efficiency. In this talk, we propose a systematic construction of
cost-efficient designs for OofA experiments,
in which each pair of level settings from two different factors appears exactly
once. Furthermore, unlike recent studies on OofA
experiments, our designs can handle experimental factors with more than one
level. Note that the use of a placebo or the choice of different doses reveals
the practicality of our designs in clinical trials, for example. |
||
TI_33_0 |
Pigeon,
Mathieu |
Université du
Québec à Montréal (UQAM), Canada |
Title |
Recent
developments in predictive distribution modelling with applications in
insurance |
|
|
||
TI_23_3 |
Piperigou,
Violetta |
University of Patras, Greece |
Title |
Maximum
Likelihood Estimators for a Class of Bivariate Discrete Distributions |
|
The
method of maximum likelihood (ML) yields estimators which, asymptotically,
are normally distributed, unbiased and with minimum variance. In this method,
computational difficulties are encountered when families of univariate
discrete distributions are considered such as convolutions and compound
distributions. For these types of distributions the
probabilities are given through recurrence relations and consequently the ML
estimators require iterative procedures to be obtained. It has been shown
that in a large class of univariate discrete distributions, the ML equations
can be reduced by one, which is replaced by the first equation of the method
of moments. As examples of two-parameter distributions, the Charlier and the Neyman are
presented, where only a single equation need be solved iteratively to derive
the estimators. The parameterization used, when working with these
distributions, often leads to extremely high correlations of the ML
estimators. A reparameterization that reduces or eliminates such correlation
is desirable. If the MLE's are asymptotically uncorrelated, the parameterization
is orthogonal. Such a reparameterization is discussed for a class of
discrete distributions, where one of the orthogonal parameters is the mean.
This class includes, among others, Delaporte and Hermite univariate distributions. These
results are extended to a class of bivariate discrete distributions and the
properties of MLE's are given. The case of a three-parameter bivariate
Poisson is extensively discussed and some examples of
applications are given. |
||
TI_47_4 |
Pokhrel,
Keshav P. |
University of Michigan-Dearborn |
Title |
Reliability
Models Using the Composite Generalizers of Weibull Distribution |
|
In this
article, we study the composite generalizers of Weibull distribution
using exponentiated, Kumaraswamy, transmuted and beta
distributions. The composite generalizers are constructed using
both forward and reverse order of each of these distributions. The
usefulness and effectiveness of the composite generalizers and their order of
composition is investigated by studying the reliability behavior of the
resulting distributions. Two sets of real-world data are analyzed using
the proposed generalized Weibull distributions. |
||
TI_27_2 |
Pratola,
Matthew |
The Ohio State University |
Title |
Adaptive
Splitting Bayesian Regression Tree Models for Image Analysis |
|
Bayesian
regression tree models are competitive with leading machine learning
algorithms yet retain the ability to capture uncertainties, making them
incredibly useful for many modern statistical applications where one requires
more than point prediction. However, a key limitation is the variable
split rules, which are determined using static candidates. This limits
the ability of the model to capture local sources of variation,
and increasing the number of candidates is computationally
burdensome. We introduce a novel adaptive strategy that replaces static
splits with a dynamic grid that allows the tree bases to adapt, thereby more
efficiently capturing patterns of local variation. Combining this with a clever
dimension-reduction prior enables low-dimensional tree representations of
processes. We demonstrate these advances on an image analysis study
investigating beach visitor counts in San Diego. |
||
TI_34_0 |
Provost,
Serge |
The University of Western Ontario |
Title |
Recent Distributional Advances
Involving Population and Sample Moments |
|
This session
features novel advances in connection with the application
of certain moment-based
methodologies to data modeling, the approximation
of the distribution of quadratic
forms and the estimation of heavy-tailed distributions.
As well, a shrinkage-type estimator of the mean of an elliptically
contoured random vector is introduced. |
||
TI_34_1 |
Provost,
Serge |
The University of Western Ontario |
Title |
On recovering
sample points from their associated moments and certain
moment-based density estimation methodologies |
|
A theorem asserting that, given
the first n moments of a sample of size n, one can retrieve
the original n sample points, will be discussed. In
particular, this result entails that all the
information available in a sample of size n is
contained in its first n moments, which substantiates the
utilization of sample moments in statistical modeling and
inference. Clearly, only a limited number of these n moments are usable in
practice. Certain density
estimation methodologies relying on such sample
moments shall be presented. |
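The recovery result above can be sketched numerically. Assuming the moments are supplied as power sums p_k = sum_i x_i^k (the function name and test values below are illustrative), Newton's identities convert the power sums into elementary symmetric polynomials, and the sample points are the roots of the associated monic polynomial:

```python
import numpy as np

def recover_sample_from_moments(power_sums):
    """Recover n sample points from their first n power sums
    p_k = sum_i x_i^k, via Newton's identities and root finding."""
    n = len(power_sums)
    # Newton's identities: e_k = (1/k) * sum_{j=1}^{k} (-1)^(j-1) e_{k-j} p_j
    e = [1.0]  # e_0 = 1
    for k in range(1, n + 1):
        s = sum((-1) ** (j - 1) * e[k - j] * power_sums[j - 1]
                for j in range(1, k + 1))
        e.append(s / k)
    # The sample points are the roots of x^n - e_1 x^(n-1) + e_2 x^(n-2) - ...
    coeffs = [(-1) ** k * e[k] for k in range(n + 1)]
    return np.sort(np.roots(coeffs).real)

x = np.array([0.5, 1.0, 2.5])
p = [np.sum(x ** k) for k in (1, 2, 3)]
print(recover_sample_from_moments(p))  # recovers [0.5, 1.0, 2.5]
```

As the abstract notes, this exact inversion is numerically delicate for larger n, which is why only the first few sample moments are usable in practice.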
||
TI_35_0 |
Qingcong
Yuan (org: Qian, Lianfen) |
University of Kentucky |
Title |
Recent
Advances in Analyzing Medical Data and Dimension Reduction |
|
The purpose
of this invited session is to disseminate the most recent advances in analyzing
medical data and dimension reduction methods. Specific
interests include modeling semi-competing risks
data, imputation methods for missing data, and dimension
reduction. |
||
TI_17_3 |
Rha, Hyungmin |
Arizona State University |
Title |
A
probabilistic subset search (PSS) algorithm for optimizing functional data
sampling designs |
|
We study
optimal sampling times for functional data. Our main objective is to find the
best sampling schedule on the predictor time axis to precisely recover the
trajectory of predictor function and predict the scalar/functional response
through functional linear regression models. Three optimal designs are
considered: the schedule maximizing the precision of recovering predictor
function, the schedule best for predicting response, and the schedule
optimizing a user-defined mixture of the relative efficiencies of the two
objectives. We propose an algorithm that can efficiently generate nearly
optimal designs, and demonstrate that our approach
outperforms the previously proposed methods. |
||
TI_36_0 |
Richter,
Wolf-Dieter |
University of Rostock |
Title |
Multivariate
distributions |
|
Authors of this session discuss: a new methodology for evaluating probabilities and normalizing constants of probability distributions, in particular extreme value distributions that exhibit heavy tails and controlled directional dependence; the construction and application of models connected with sums and maxima of dependent Pareto components; and the stochastic representation, simulation, and dynamic geometric disintegration of (p_1,…,p_k)-spherical probability laws. |
||
TI_36_4 |
Richter,
Wolf-Dieter |
University of Rostock |
Title |
On (p_1,...,p_k)-spherical distributions |
|
The class of
(p_1, … , p_k)-spherical probability laws
and a method of simulating random vectors following such distributions
are introduced using a new stochastic vector representation. A
dynamic geometric disintegration method and a corresponding geometric measure
representation are used for generalizing the
classical Chi-square-, t- and F- distributions. Combining the principles
of specialization and marginalization gives rise to an effective method of
dependence modeling. |
||
TI_10_3 |
Samanthi,
Ranadeera |
Central Michigan University |
Title |
On bivariate distorted copulas |
|
In this
talk, we propose families of bivariate copulas based on the distortions of
existing copulas. The beta and Kumaraswamy cumulative distribution
functions are employed to construct the proposed distorted copulas.
With the additional two parameters in the distributions, the distorted
copulas permit more flexibility in the dependence behaviors. Two theorems
linking the original tail dependence behaviors and those of the distorted
copula are derived for distortions that are asymptotically proportional to
the power transformation in the lower tail and the dual-power transformation
in the upper tail. Simulation results and an application to financial risk
management are presented. |
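The distortion construction described above can be sketched as follows (an illustrative example, not the proposed families: a Clayton base copula with a Beta(a, 1) distortion, where a <= 1 makes the distortion concave):

```python
import numpy as np
from scipy import stats

def clayton(u, v, theta=2.0):
    # Clayton base copula C(u, v)
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def distorted_copula(u, v, a=0.5, b=1.0):
    # Distorted copula C_T(u, v) = T(C(T^{-1}(u), T^{-1}(v))),
    # with T the Beta(a, b) CDF; concavity of T (here a <= 1, b = 1)
    # is needed for C_T to remain a valid copula.
    T = lambda t: stats.beta.cdf(t, a, b)
    Tinv = lambda t: stats.beta.ppf(t, a, b)
    return T(clayton(Tinv(u), Tinv(v)))

u = np.linspace(0.05, 0.95, 10)
# Uniform margins are preserved: C_T(u, 1) = u.
print(np.max(np.abs(distorted_copula(u, np.ones_like(u)) - u)))
```

The two extra parameters of the distorting CDF control how mass is pushed toward the tails, which is what permits the additional flexibility in tail dependence.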
||
TI_45_4 |
Samanthi,
Ranadeera |
Central Michigan University |
Title |
Methods for
Generating Coherent Distortion Risk Measures |
|
In this
talk, we present methods for generating new distortion functions by utilizing
distribution functions and composite distribution functions. To ensure the
coherency of the corresponding distortion risk measures, the concavity of the
proposed distortion functions is established by restricting the parameter
space of the generating distribution. Closed-form expressions for risk
measures are derived for some cases. Numerical and graphical results are
presented to demonstrate the effects of parameter values on the risk measures
for exponential, Pareto and log-normal losses. In addition, we apply the
proposed distortion functions to derive risk measures for a segregated fund
guarantee. (This is a joint work with Jungsywan Sepanski, Central Michigan
University.) |
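As a worked example of a distortion risk measure (using the classical proportional-hazard distortion g(u) = u^gamma, not the new generators proposed in the talk), the measure rho(X) = integral of g(S(x)) over x >= 0 has the closed form theta/gamma for an exponential loss with mean theta:

```python
import numpy as np
from scipy import integrate

# Distortion risk measure rho(X) = int_0^inf g(S(x)) dx for a nonnegative
# loss X with survival function S; concave g yields a coherent measure.
def distortion_risk(survival, g, upper=200.0):
    val, _ = integrate.quad(lambda x: g(survival(x)), 0.0, upper)
    return val

theta = 2.0                               # exponential loss with mean theta
S = lambda x: np.exp(-x / theta)
g = lambda u: u ** 0.5                    # proportional-hazard distortion
print(distortion_risk(S, g), theta / 0.5) # numerical vs closed form theta/gamma
```

Restricting the parameter of the generating distribution (here, the exponent gamma <= 1) is exactly what keeps g concave and the resulting risk measure coherent.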
||
TI_12_4 |
Sanz-Alonzo,
Daniel |
University of Chicago |
Title |
Scalable
graph-based Bayesian semi-supervised learning |
|
The aim of
this talk is to present some new theoretical and methodological developments
concerning the graph-based, Bayesian approach to semi-supervised learning. I
will show suitable scalings of graph parameters that provably lead
to robust Bayesian solutions in the limit of a large number of unlabeled data points. The analysis relies on a careful choice of topology
and on the study of the spectrum of graph Laplacians. Besides
guaranteeing the consistency of graph-based methods, our theory explains the
robustness of discretized function space MCMC methods in semi-supervised
learning settings. |
||
TI_28_2 |
Sarathy, Rathindra |
Oklahoma State University |
Title |
Statistical
Basis for Data Privacy and Confidentiality |
|
Statistical disclosure
limitation methods are occasionally viewed as ad hoc methods,
providing no strong privacy or confidentiality guarantees. Although this
view is not accurate, it has been the primary motivation for recent standards such as
differential privacy and their associated methods. In this talk, we explore
the statistical basis for data confidentiality and methods that satisfy
privacy and confidentiality requirements. We discuss the concepts underlying
differential privacy to provide a comparison, as well as the potential
utility trade-offs under both these frameworks. |
||
TI_37_0 |
Sarhan, Ammar |
Dalhousie University |
Title |
Generalization
of lifetime distributions |
|
Generalization
of lifetime distribution is one of the important tools in lifetime analysis.
Most of the commonly used lifetime distributions have monotonic hazard rate
functions. In applications, many data sets show non-monotonic shapes of the
hazard rates. In this session, some of the generalizations of lifetime
distributions will be discussed. |
||
TI_37_1 |
Sarhan, Ammar |
Dalhousie University |
Title |
A
new extension of the two-parameter bathtub hazard shaped distribution |
|
This
article proposes a new generalization of the two-parameter bathtub shaped
lifetime distribution, named the odd generalized exponential two-parameter
bathtub shaped. Statistical properties of the proposed distribution are
discussed. The maximum likelihood and Bayesian procedures are used to
estimate the model parameters and some of its reliability measures. To
discuss the applicability of the proposed distribution, two real
data sets are analyzed using different sampling scenarios. A simulation
study is provided to investigate the properties of the methods applied. |
||
TI_25_3 |
Sarkar, Shuchismita |
Bowling Green State University |
Title |
Finite
mixture modeling and model-based clustering for directed weighted networks |
|
A novel
approach relying on the notion of mixture models is proposed for modeling and
clustering directed weighted networks. The developed methodology can be used
in a variety of settings including multilayer networks. Computational issues
associated with the developed procedure are effectively addressed by the use
of MCMC techniques. The utility of the methodology is illustrated on the set
of experiments as well as applications to real-life data containing export
trade amounts for European countries. |
||
TI_24_1 |
Schafer,
Chad |
Carnegie Mellon University |
Title |
Astrostatistics in
the Era of LSST |
|
The Large
Synoptic Survey Telescope (LSST) will yield 15 Terabytes of data each evening
over a ten-year period, revolutionizing
our understanding of the Universe. In this talk I will describe
some of the opportunities, focusing on the recurring challenges
when working with high-dimensional and noisy astronomical data. In
their raw form, these data are difficult to model, and assumptions that
may have been reasonable at small sample sizes could be revealed to
be inadequate by LSST-scale data. Such inference challenges provide
statisticians with opportunities to both contribute to science, and
to advance statistical methodology. |
||
TI_18_4 |
Schissler,
A. Grant |
University of Nevada |
Title |
On
Simulating Ultra High-Dimensional Multivariate Discrete Data |
|
It is
critical to conduct realistic Monte Carlo studies, which is problematic when data
are inherently multivariate and high dimensional. This situation appears
frequently in high-throughput biomedical experiments (e.g., RNA-sequencing).
Researchers, however, often resort to simulation designs that posit
independence --- greatly diminishing insights into the empirical operating
characteristics of any proposed methodology. To address this gap, we propose a
procedure to simulate high-dimensional multivariate discrete distributions
and study its performance. We apply our method to simulate RNA-sequencing
data sets (dimension > 20,000) with negative
binomial marginals. |
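The abstract does not spell out the simulation procedure; one common approach to correlated counts with negative binomial marginals is a Gaussian-copula (NORTA-style) construction, sketched here at low dimension (all parameter values are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_nb_gaussian_copula(n, corr, size, prob):
    """Draw n samples of a correlated count vector with negative
    binomial marginals via a Gaussian copula (NORTA-style)."""
    d = corr.shape[0]
    z = rng.multivariate_normal(np.zeros(d), corr, size=n)  # correlated normals
    u = stats.norm.cdf(z)                                   # uniform marginals
    return stats.nbinom.ppf(u, size, prob).astype(int)      # NB marginals

d = 4
corr = 0.6 * np.ones((d, d)) + 0.4 * np.eye(d)  # exchangeable correlation
counts = simulate_nb_gaussian_copula(2000, corr, size=5.0, prob=0.3)
print(counts.mean(axis=0))  # each column mean near 5 * 0.7 / 0.3
```

Scaling this idea to dimension above 20,000, as in the RNA-sequencing application, is precisely where the computational and correlation-matching challenges studied in the talk arise.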
||
TI_5_2 |
Schmegner,
Claudia |
DePaul University |
Title |
TX Family
and Horseshoe Priors |
|
Consider the
problem of estimating the vector of normal means θ= (θ1,...,θn) in the ultra-sparse
normal means model (yi|θi)∼N(θi,1) for i= 1,...,n. Horseshoe
priors are very effective at handling cases in which many components of θ are
exactly or approximately 0. The name “horseshoe” does not describe the shape
of the density of θi, but rather the shape of
the implied prior for the shrinkage coefficient associated with θi. We use the TX technique for generating
distributions to propose new classes of Horseshoe priors, investigate their
properties and compare their performances to those of the usual ones. |
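The "horseshoe" shape mentioned above can be checked numerically: with a standard half-Cauchy local scale λ_i (unit global scale assumed here), the implied prior on the shrinkage coefficient κ_i = 1/(1 + λ_i²) is Beta(1/2, 1/2), which is U-shaped like a horseshoe:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Horseshoe: lambda_i ~ half-Cauchy(0, 1); shrinkage coefficient
# kappa_i = 1 / (1 + lambda_i^2), with kappa near 0 meaning no shrinkage
# and kappa near 1 meaning total shrinkage to zero.
lam = np.abs(rng.standard_cauchy(100_000))
kappa = 1.0 / (1.0 + lam ** 2)

# The implied density of kappa is Beta(1/2, 1/2): U-shaped, piling mass
# at both endpoints, which is the "horseshoe" of the name.
hist, edges = np.histogram(kappa, bins=20, range=(0, 1), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
beta_pdf = stats.beta.pdf(mids, 0.5, 0.5)
print(np.max(np.abs(hist - beta_pdf)[1:-1]))  # interior bins match closely
```

A TX-generated prior replaces the half-Cauchy with another transformed distribution, changing the shape of this implied density on κ.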
||
TI_8_3 |
Sen, Ananda |
University of Michigan, Ann Arbor |
Title |
Honey I
Shrunk the Intercept |
|
In
logistic regression, separation occurs when a linear combination of
predictors perfectly discriminates the binary outcome. This is the premise of
the current discourse. Because finite valued maximum likelihood parameter
estimates do not exist under separation, Bayesian regressions with
informative shrinkage of the regression coefficients offer a suitable
alternative. Classical studies of separation imply that efficiency in
estimating regression coefficients may also depend upon the choice of
intercept prior, yet relatively little focus has been given on whether and
how to shrink the intercept parameter. Alternative prior distributions for
the intercept are proposed that down-weight implausibly extreme regions of
the parameter space, yielding regression estimates that are less sensitive to
separation. Through extensive simulation, differences across priors are
assessed using statistics measuring the degree of separation. Relative to
diffuse priors, the proposed priors generally yield more efficient estimation
of the regression coefficients themselves when the data are separated or
nearly so. Moreover, they are equally efficient in non-separated datasets,
making them suitable for default use. These numerical studies also highlight
the interplay between priors for the intercept and the regression
coefficients. Finally, the methodology is illustrated through implementation
on a couple of datasets in the biomedical context. |
||
TI_44_3 |
Shahzad,
Mirza Naveed |
University of Gujrat |
Title |
Singh-Maddala Distribution: A new candidate to analyze the
extreme value data by linear moment estimation |
|
Modeling,
accurate inference, and prediction of extreme events by probabilistic models
are very important in every field for minimizing the damage due to
extremes. To this end, the Singh-Maddala
distribution is considered in this article as a new candidate for the
analysis of extreme events. Extreme value datasets are frequently
heavy-tailed; for such datasets, the methods of L-moments
and TL-moments are proposed to estimate the parameters of the
distribution. The results of the simulation study and a real dataset
indicate that the linear-moment estimates have the least bias among the
methods considered. |
||
TI_43_2 |
Shao, Xiaofeng |
University of Illinois at Urbana
Champaign |
Title |
Inference
for change points in high dimensional data |
|
In
this talk, I will present some recent work on change point
testing and estimation for high dimensional data. In the case of
testing for a mean shift, we propose a new test which is based on
U-statistics and utilizes the self-normalization principle. Our test targets
dense alternatives in the high dimensional setting and involves no
tuning parameters. We show the weak convergence of a sequential U-statistic
based process to derive the pivotal limit under the null and also obtain the
asymptotic power under the local alternatives. Time
permitting, we illustrate how our approach can be used in combination
with wild binary segmentation to estimate the number and
location of multiple unknown change points. |
||
TI_42_2 |
Shay,
Garrett Charlie |
Brock University |
Title |
Probabilistic
and non-probabilistic methods of active learning for classification |
|
Active
learning is a useful learning process for classification. With a fixed size
of training data, an active classifier selects the most beneficial data to
learn from and achieves better classification accuracy than a passive
classifier. We discuss the methods of developing optimal active learning
processes, including both probabilistic and non-probabilistic ones. For a
comparison study, we adapt a probabilistic classifier obtained by logistic
regression, as well as a non-probabilistic classifier derived from an
estimated discriminant function. Performance of proposed active classifiers
is investigated under varying conditions and assumptions. Optimal two-stage
and sequential active classification procedures have been developed. Monte Carlo simulations have shown improved
classification accuracy of the proposed active learning process compared to
the passive learning process for all scenarios considered. |
||
TI_33_1 |
Shi, Peng |
University of Wisconsin-Madison |
Title |
Regression
for Copula-linked Compound Distributions with Applications in Modeling
Aggregate Insurance Claims |
|
In actuarial
research, a task of particular interest and importance is to predict the loss
cost for individual risks so that informative decisions are made in
various insurance operations such as underwriting, ratemaking, and
capital management. The loss cost is typically viewed to follow a
compound distribution where the summation of the severity variables is
stopped by the frequency variable. A challenging issue in modeling such
outcome is to accommodate the potential dependence between the number of
claims and the size of each individual claim. In this article, we
introduce a novel regression framework for compound distributions that uses a
copula to accommodate the association between the frequency and the severity
variables, and thus allows for arbitrary dependence between the two
components. We further show that the new model is very flexible and is
easily modified to account for incomplete data due to censoring or
truncation. The flexibility of the proposed model is illustrated using both
simulated and real data sets. In the analysis of granular claims data
from property insurance, we find a substantive negative relationship
between the number and the size of insurance claims. In addition, we
demonstrate that ignoring the frequency-severity association could lead to
biased decision-making in insurance operations. |
||
TI_37_4 |
Sinha,
Sanjoy K. |
Carleton University |
Title |
Joint
modeling of longitudinal and time-to-event data with covariates subject to
detection limits |
|
In many
clinical studies, subjects are measured repeatedly over a fixed period of
time. Longitudinal measurements from a given subject are naturally correlated.
Linear and generalized linear mixed models are widely used for modeling the
dependence among longitudinal outcomes. In addition to the longitudinal data,
we often collect time-to-event data (e.g., recurrence time of a tumor) from
the subjects. When multiple outcomes are observed from a given subject with a
clear dependence among the outcomes, a natural way of analyzing these
outcomes and their associations would be the use of a joint model. I will
discuss a likelihood approach for jointly analyzing the longitudinal and
time-to-event data. The method would be useful for dealing with left-censored
covariates often observed in clinical studies due to limits of detection. The
finite-sample properties of the proposed estimators will be discussed using
results from a Monte Carlo study. An application of the proposed method will
be presented using a large clinical dataset of pneumonia patients obtained
from the Genetic and Inflammatory Markers of Sepsis (GenIMS)
study. |
||
TI_43_4 |
Sriperumbudur,
Bharath |
Penn State University |
Title |
Approximate Kernel PCA: Computational
vs. Statistical Trade-off |
|
Kernel
principal component analysis (KPCA) is a popular non-linear dimensionality
reduction technique, which generalizes classical linear PCA by finding
functions in a reproducing kernel Hilbert space (RKHS) such that the function
evaluation at a random variable X has maximum variance. Despite its
popularity, kernel PCA suffers from poor scalability in big data scenarios as
it involves solving an n x n eigensystem leading to a computational
complexity of O(n^3) with n being the number of samples. To address this
issue, in this work, we consider a random feature approximation to kernel PCA
which requires solving an m x m eigenvalue problem and therefore has a computational
complexity of O(m^3), implying that the approximate method is
computationally efficient if m<n with m being the number of random
features. The goal of this work is to investigate the trade-off between
computational and statistical behaviors of approximate KPCA, i.e., whether
the computational gain is achieved at the cost of statistical efficiency. We
show that the approximate KPCA is both computationally and statistically
efficient compared to KPCA in terms of the error associated with reconstructing
a kernel function based on its projection onto the
corresponding eigenspaces. Depending on the eigenvalue decay behavior of
the covariance operator, we show that only n^{2/3} features (polynomial
decay) or \sqrt{n} features (exponential decay) are needed to match the
statistical performance of KPCA, which means that, without any statistical loss,
approximate KPCA has a computational complexity of O(n^2) or O(n^{3/2})
depending on the eigenvalue decay behavior. We also investigate the
statistical behavior of approximate KPCA in terms of the convergence
of eigenspaces wherein we show that only \sqrt{n} features are
required to match the performance of KPCA and if fewer than
\sqrt{n} features are used, then approximate KPCA has a worse statistical
behavior than that of KPCA. |
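The random-feature idea above can be sketched as follows (using random Fourier features for an RBF kernel; the kernel, dimensions, and constants here are illustrative rather than those of the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def rff_features(X, m, gamma=1.0):
    """Random Fourier features approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, m))
    b = rng.uniform(0, 2 * np.pi, size=m)
    return np.sqrt(2.0 / m) * np.cos(X @ W + b)

# Approximate KPCA: an eigensystem on the m-dim feature covariance costs
# O(m^3) instead of O(n^3) for the exact n x n kernel eigensystem.
n, m = 500, 50
X = rng.normal(size=(n, 3))
Z = rff_features(X, m)
Zc = Z - Z.mean(axis=0)
cov = Zc.T @ Zc / n                      # m x m instead of n x n
eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, ::-1][:, :5]            # leading approximate KPCA directions
scores = Zc @ top                        # projections of the data
print(scores.shape)
```

The trade-off studied in the talk is how large m must be, as a function of n and the eigenvalue decay, for these approximate scores to match exact KPCA statistically.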
||
TI_7_1 |
Su, Jianxi |
Purdue University |
Title |
Full-range
tail dependence copulas for modeling dependent insurance and financial
data |
|
Copulas are
important tools when it comes to formulating models for multivariate data
analysis. An ideal copula should conform to a wide range of
problems at hand by allowing for symmetry and asymmetry as well as
for varying strengths of tail dependence. The copulas I plan to
introduce are exactly such in that they satisfy all the
aforementioned criteria. Specifically, in this talk, I shall introduce a
class of full-range tail dependence copulas, which have proved to
be very useful for modeling dependent financial/insurance data. I shall
discuss the key mechanisms for constructing full-range tail dependence
copulas and some fundamental properties of these structures. Future
research directions will be also discussed. |
||
TI_19_1 |
Subha, R.
Nair |
HHMSPB NSS College for Women |
Title |
A
generalization to the log-Weibull distribution and its applications in cancer
research |
|
Through this
paper we consider a generalization of a log-transformed version of the
inverse Weibull distribution of Keller et al (Reliability Engineering,
1982). The theoretical properties of the distribution are investigated in detail,
including expressions for its cumulative distribution function, reliability
function, hazard rate function, quantile function, characteristic function,
raw moments, percentile measures, entropy measures, median, mode etc. Some
reliability aspects as well as the distribution and moments of order
statistics are also discussed. The maximum likelihood estimation of the
parameters of the proposed distribution is attempted and certain applications
of the distribution in modelling data sets arising from industrial as well as
bio-medical cancer related backgrounds are illustrated using real life
examples. Further, the asymptotic behaviour of the
estimators is examined with the help of simulated data sets. |
||
TI_45_1 |
Sun, Ning |
Western University |
Title |
The Pareto
Optimal Design for Earthquake Index-based Insurance Based on Exponential
Utilities |
|
We obtain a
necessary condition for the Pareto optimal earthquake index-based insurance
design based on the decomposition of catastrophe risks. Moreover, we derive
the explicit form of this Pareto optimal insurance design under the
exponential utility assumption. Besides, minimization of the basis risk for
this index-based insurance design is also discussed. Finally, we illustrate
how a typical design of such an insurance product could be obtained from the
observed data using historical economic losses due to earthquakes in mainland
China. |
||
TI_17_1 |
Sung, Chih-Li(Charlie) |
Michigan State University |
Title |
Exploiting
variance reduction potential in local Gaussian process search for large
computer experiments |
|
Gaussian
process models are commonly used as emulators for computer experiments.
However, developing a Gaussian process emulator can be computationally
prohibitive when the number of experimental samples is even moderately large.
Local Gaussian process approximation (Gramacy and Apley (2015)) was proposed as an accurate and
computationally feasible emulation alternative. Constructing local
sub-designs specific to predictions at a particular location of interest
remains a substantial computational bottleneck to the technique. In this
talk, two computationally efficient neighborhood search limiting techniques
are introduced, and two examples demonstrate that the proposed methods indeed
save substantial computation while retaining emulation accuracy. |
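The local-approximation idea can be illustrated with a basic nearest-neighbor sub-design (this plain k-nearest-neighbor neighborhood is a stand-in for the search-limiting techniques of the talk; the kernel and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(A, B, ls=0.2):
    # Squared-exponential kernel matrix between row sets A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def local_gp_predict(X, y, xstar, k=30, nugget=1e-6):
    """Predict at xstar using only its k nearest neighbors,
    reducing the O(n^3) GP solve to O(k^3)."""
    nn = np.argsort(((X - xstar) ** 2).sum(1))[:k]
    Xn, yn = X[nn], y[nn]
    K = rbf(Xn, Xn) + nugget * np.eye(k)
    ks = rbf(xstar[None, :], Xn)[0]
    return ks @ np.linalg.solve(K, yn)

n = 2000
X = rng.uniform(size=(n, 2))
y = np.sin(4 * X[:, 0]) * np.cos(3 * X[:, 1])     # toy deterministic simulator
xs = np.array([0.5, 0.5])
print(local_gp_predict(X, y, xs), np.sin(2.0) * np.cos(1.5))  # close
```

Choosing the sub-design (here, the brute-force neighbor sort) is the computational bottleneck that the proposed search-limiting techniques address.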
||
TI_13_3 |
Szabo, Aniko |
Medical College of Wisconsin |
Title |
Semi-parametric
Model for Exchangeable Clustered Binary Outcomes |
|
Dependent or
correlated binary data occur in repeated measurement studies, longitudinal experiments,
teratological risk assessment, and other important experimental studies. Both
parametric and non-parametric models have been proposed for dose-response
experiments with such data. In this work we propose semi-parametric models
that combine a non-parametric baseline describing the within-cluster
dependence structure with a parametric between-group effect. We develop an
Expectation-Maximization Minorize-Maximize algorithm to fit the model, apply
it to several datasets, and compare the semi-parametric estimates of joint
probabilities from different dose levels with corresponding GEE and
non-parametric estimates. |
||
TI_36_1 |
Takemura, Akimichi |
Shiga University |
Title |
Holonomic gradient method for evaluation of multivariate probabilities |
|
In 2011 we developed a new methodology, the "holonomic gradient method" (HGM), which is useful for the evaluation of probabilities and normalizing constants of probability distributions. Since then we have applied
HGM to various problems, including the distribution of roots of Wishart matrices, orthant probabilities, and some distributional problems related to wireless communication.
In this talk we give an introduction to HGM
and present applications of the method to the evaluation of multivariate probabilities. |
||
TI_18_1 |
Imoto,
Tomoaki |
University of Shizuoka |
Title |
Bivariate
GIT distribution |
|
In this
talk, we propose a bivariate discrete distribution, which is derived from a
first passage point of the two dimensional random
walk on lattice. This distribution is seen as a convolution of bivariate
binomial and negative binomial distributions. Moreover its
marginal distributions are also seen as a convolution of univariate binomial
and negative binomial distributions and can model both over- and
under-dispersion relative to the Poisson distribution. These
properties make the proposed distribution a flexible model in terms of its dispersion
and correlation. The other stochastic processes and operations derived
for the proposed distribution are also discussed in this talk. |
||
TI_40_4 |
Torkashvand,
Elaheh |
University of Waterloo |
Title |
Spatial
Dynamical Autocorrelation of fMRI Images |
|
The concept
of dynamical correlation is extended to functional time series.
The dynamical autocorrelation is a measure of functional autocorrelation of a
functional time series. The proposed method can be applied to true,
i.e., continuously measured, functional data or possibly to approximated
functional data, for example after applying a smoothing step to observations
measured in discrete time. An estimator of the dynamical autocorrelation is
presented based on the Karhunen-Loève expansion
of the time series. The central limit theorem is applied to obtain the
asymptotic distribution of the proposed estimator of the dynamical
autocorrelation under the assumption of m-dependency. |
||
TI_4_1 |
Vinogradov,
Vladimir |
Ohio University |
Title |
On two
extensions of Feller-Spitzer class of Bessel densities |
|
We introduce
two different extensions of Feller-Spitzer class of Bessel densities. Various
properties of members of these classes are derived and compared. |
||
TI_21_1 |
Wang, Haiying |
University of Connecticut |
Title |
Optimal
Subsampling: Sampling with Replacement vs Poisson Sampling |
|
Faced with
massive data, subsampling is a commonly used technique to improve computational
efficiency, and
using nonuniform subsampling distributions is an effective approach
to improve estimation efficiency. In the context of maximizing a general
target function, this paper derives optimal subsampling distributions for
both subsampling with replacement and Poisson subsampling. The optimal
subsampling distributions minimize functions of the subsampling approximation
variances. Furthermore, they provide deep insights on the theoretical
difference and similarity between subsampling with replacement and Poisson
subsampling. Practically implementable algorithms are proposed based on the
optimal structure results, which are evaluated by both theoretical and
empirical analysis. |
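A toy comparison of the two schemes for estimating a population mean is sketched below (the probabilities are taken proportional to the response purely for illustration; the paper's optimal distributions minimize the asymptotic variance for general target functions):

```python
import numpy as np

rng = np.random.default_rng(7)

N = 100_000
x = rng.lognormal(size=N)
pi = x / x.sum()                 # nonuniform probabilities, here prop. to x
r = 1_000                        # (expected) subsample size

# Subsampling with replacement: fixed size r.
# Hansen-Hurwitz: average x_i / (N * p_i) over the r draws.
idx = rng.choice(N, size=r, replace=True, p=pi)
est_wr = np.mean(x[idx] / (N * pi[idx]))

# Poisson subsampling: independent inclusions, random realized size.
# Horvitz-Thompson: sum x_i / (N * min(r * p_i, 1)) over the kept units.
incl = np.minimum(r * pi, 1.0)
keep = rng.random(N) < incl
est_pois = np.sum(x[keep] / (N * incl[keep]))

print(x.mean(), est_wr, est_pois)  # both estimates close to the true mean
```

Note that with probabilities exactly proportional to x, each with-replacement term equals the true mean, a degenerate case of the variance-minimizing principle behind the optimal subsampling distributions.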
||
TI_40_0 |
Wang, Shan |
University of San Francisco |
Title |
Recent
Development in Nonparametric and Semiparametric Techniques |
|
In recent
years, semiparametric and nonparametric models have become a popular choice
in many areas of statistics since they are more realistic and flexible than
parametric models. This invited session focuses on the recent development in
these methods and their applications. |
||
TI_40_1 |
Wang, Shan |
University of San Francisco |
Title |
Estimation
of SEM with MELE approach |
|
In this work,
we construct improved estimates of linear functionals of a probability
measure with side information using an easy empirical likelihood approach. We
allow the constraint functions, which determine the side information, to grow with the
sample size, and allow the use of estimated constraint functions. This is the case
in applications to structural equation models. In one case the random
errors are modeled as independent of the covariates. In another case, we
estimate the model with side information of known marginal medians for an
observed variable. We report some simulation results on efficiency gains. |
||
TI_41_0 |
Wang, Xia |
University of Cincinnati |
Title |
Bayesian
Modeling of Dependent Non-Gaussian Data |
|
Dependent
non-Gaussian data keep posing new challenges through their rapidly increasing
size and structural complexity. Bayesian perspectives provide feasible
and flexible approaches. The session presents new methodological
developments in Bayesian modeling, computation, and model comparison related
to semi-continuous data, directional data, intensity data, and ordinal data. |
||
TI_41_4 |
Wang, Xia |
University of Cincinnati |
Title |
Power Link
Functions in Ordinal Regression Models with Gaussian Process Priors |
|
Link
functions and random effects structures are the two important components in
building flexible regression models for dependent ordinal data. The power
link functions include the commonly used links as special cases but have an
additional skewness parameter making the probability response
curves adaptive to the data structure. This overcomes the arbitrary
symmetry assumption imposed by the commonly used logistic or probit links as well as the
fixed skewness in the complementary log-log or log-log
links. By employing Gaussian processes, the regression model can
incorporate various dependence structures in the data, such as temporal and
spatial correlations. The full Bayesian estimation of the proposed
model is conveniently implemented through Rstan. Extensive
simulation studies are carried out for discussion in model
computation, parameterization, and evaluation in terms of estimation
bias and overall model performance. The proposed model is applied to
the PM2.5 data in Beijing and the Berberis thunbergii abundance data in New England. The
results suggest the proposed model leads to important improvement in
estimation and prediction in modeling dependent ordinal response data. |
||
TI_46_2 |
Wang, Yueyao |
Virginia Tech |
Title |
Building
Degradation Index Using Multivariate Sensory Data with Variable Selection |
|
The modeling
and analysis of degradation data have been an active research area in
reliability and system health management. Most of the existing research
on degradation modeling assumes that the degradation index is provided.
However, there are situations that a degradation index is not available. For
example, modern sensor technology allows one to collect multi-channel sensor
data that are related to the underlying degradation process, which may not be
sufficiently represented by any single channel. Without a degradation index,
most existing methods cannot be applied. Thus, constructing a degradation index is a
fundamental step in degradation modeling. In this paper, we develop a general
approach for degradation index building based on an additive-nonlinear model
with variable selection. The approach is more flexible than a linear
combination of sensor signals, and it can automatically select the most
informative variables to be used in the degradation index. Maximum likelihood
estimation with an adaptive group penalty is developed based on a training
dataset. We use extensive simulations to validate the performance of the
developed method. The NASA jet engine sensor dataset is then used for
illustrations. The paper is concluded with some discussions and areas for
future research. This is joint work with I-Chen Lee and Yili Hong. |
||
TI_26_1 |
Weerahandi, Samaradasa |
X-Techniques, Inc, New York |
Title |
Generalized Inference
with Application to Business and Clinical Analytics |
|
In
applications, such as the ANOVA under unequal error variances, and Mixed
Models, the classical approach can produce only asymptotic tests and
confidence intervals for parameters of interest. This article reviews
the notions and methods of Generalized Inference and shows how such inferences
can be based on exact probability statements. The approach is illustrated by
an application concerning Variance Components in Mixed Models with
applications in Business and Clinical Analytics. In such problems one
may wish to use the Bayesian approach, but doing so requires a prior. In
the absence of a proper prior, Bayesian inferences are highly sensitive to
the non-informative prior family and the choice of hyper-parameters, and could
take days to run for models involving a large number of parameters, such as
those arising in estimating consumer response to TV ads by county or DMA. The
task is easily accomplished by using the BLUP in Mixed Models with parameters
tackled by the approach of Generalized Inference. It will also be argued that
the generalized approach can reproduce Parametric Bootstrap inferences when
they exist, and works even when the Parametric Bootstrap approach fails.
Moreover, one can reproduce equivalent generalized tests and generalized
confidence intervals for any generalized fiducial inference method without
having to treat fixed parameters as variables. |
||
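As a small illustration of the flavor of generalized inference (a textbook example, not the Mixed Models application of the talk), the sketch below computes a Monte Carlo generalized confidence interval for a difference of two normal means with unequal variances (the Behrens-Fisher problem), using the standard generalized pivotal quantity. The function name and defaults are our own.

```python
import numpy as np

def behrens_fisher_gci(x, y, level=0.95, B=20000, seed=0):
    """Generalized confidence interval for mu_x - mu_y with unequal
    variances, via the generalized pivotal quantity
    R = (xbar - ybar) - (T1 * sx/sqrt(nx) - T2 * sy/sqrt(ny)),
    where T_i is Student-t with n_i - 1 degrees of freedom."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    sx, sy = x.std(ddof=1), y.std(ddof=1)
    t1 = rng.standard_t(nx - 1, size=B)
    t2 = rng.standard_t(ny - 1, size=B)
    # Monte Carlo draws from the distribution of the pivotal quantity
    R = (x.mean() - y.mean()) - (t1 * sx / np.sqrt(nx) - t2 * sy / np.sqrt(ny))
    a = (1 - level) / 2
    return float(np.quantile(R, a)), float(np.quantile(R, 1 - a))
```

Unlike the Welch approximation, the interval rests on an exact probability statement about the pivotal quantity, approximated here only by Monte Carlo sampling.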
TI_12_2 |
Womack,
Andrew |
Indiana University |
Title |
Horseshoes, Shape
Mixing, and Ultra-sparse Locally Adaptive Shrinkage |
|
Locally
adaptive shrinkage in the Bayesian framework provides one method for
continuously relaxing discrete selection problems. We present extensions of
the Horseshoe prior framework that arise from mixing both the scale and shape
parameters from the hierarchical specification of the model. Mixing on the
shape parameter provides both better spike and slab behavior as well as a way
to model ultra-sparse signals. The reduction in risk comes from a better
approximation of the hard thresholding rule that gives rise to discrete
selection. As with other local-global priors, these models have non-convex,
multimodal posterior distributions. This multi-modality, especially from the
infinite spike at the origin, creates issues for fitting the models using
out-of-the-box methods like Gibbs samplers or EM algorithms. To address these
problems, we implement a new MCMC algorithm that includes mode switching
jumps that are akin to doing Stochastic Search Variable Selection for
continuous local-global shrinkage models. |
||
TI_45_2 |
Wu, Jiang |
Central University of Finance and
Economics |
Title |
A Financial
Contagion Measure Based on the Maximal Tail Dependence Coefficient for
Financial Time Series |
|
A novel
financial contagion measure is proposed. It is based on the maximal tail
dependence (MTD) coefficient of the financial time series of returns.
Estimators for this contagion measure are provided for popular families of
copulas, and a simulation study is employed to analyze the performance of
these estimators. Applications are presented to illustrate the use of spatial
contagion measures for determining asymmetric linkages in financial markets,
and for creating clusters of financial time series. The methodology is also
useful for selecting diversified portfolios of asset returns. |
||
TI_43_3 |
Wu, Wenbo |
University of Texas |
Title |
Simultaneous
estimation for semi-parametric multi-index models |
|
Estimation
of a general multi-index model comprises determining the number of linear combinations
of predictors (structural dimension) that are related to the response,
estimating the loadings of each index vector, selecting the active
predictors, and estimating the underlying link function. These objectives are
often achieved sequentially at different stages of the estimation process. In
this study, we propose a unified estimation approach under a semi-parametric
model framework to attain these estimation goals simultaneously. The proposed
estimation method is more efficient and stable than many existing methods
where the estimation error in the structural dimension may propagate to the
estimation of the index vectors and variable selection stages. A detailed
algorithm is provided to implement the proposed method. Comprehensive
simulations and a real data analysis illustrate the effectiveness of the
proposed method. |
||
TI_20_2 |
Wu, Yichao |
UIC |
Title |
Nonparametric
estimation of multivariate mixtures |
|
A
multivariate mixture model is determined by three elements: the number of
components, the mixing proportions and the component distributions. Assuming
that the number of components is given and that each mixture component
has independent marginal distributions, we propose a non-parametric
method to estimate the component distributions. The basic idea is to
convert the estimation of component density functions to a problem of
estimating the coordinates of the component density functions with
respect to a good set of basis functions.
Specifically, we construct a set of basis functions by
using conditional density functions and try to recover the coordinates
of component density functions with respect to this set
of basis functions. Furthermore, we show that our
estimator for the component density functions is consistent. Numerical
studies are used to compare our algorithm with other existing
non-parametric methods of estimating component distributions under the
assumption of conditionally independent marginals. |
||
TI_16_2 |
Xia, Aihua |
University of Melbourne |
Title |
Probability
Density Quantiles: Their Divergence from or Convergence to
Uniformity |
|
For each
continuous distribution with square-integrable density, there is a
probability density quantile (pdQ), which
is an absolutely continuous distribution on the unit interval. The pdQ is representative of a location-scale family and
carries essential information regarding shape and tail behavior of the
family. We demonstrate that questions of convergence and divergence regarding
shapes of distributions can be carried out in a location- and scale-free
environment via their pdQs. We also establish
a map of the Kullback-Leibler divergences
from uniformity of these pdQs. Some numerical
calculations point to a phenomenon that each application of the pdQ mapping seems to lower the Kullback-Leibler divergence from uniformity and
hence we obtain new fixed point theorems for repeated
applications of the pdQ mappings. This is
joint work with Robert G. Staudte. |
||
TI_38_4 |
Xie, Yanmei |
University of Toledo |
Title |
Analysis of nonignorable
missingness in risk factors for hypertension |
|
The
prevention of hypertension is a critical public health challenge across
the world. In the current study, we propose a novel
empirical-likelihood-based method to estimate the effect of potential risk
factors for hypertension. We adopt a semiparametric perspective on regression
analysis with nonignorable missing covariates, which is motivated by the
alcohol consumption and blood pressure data from the US National Health and
Nutrition Examination Survey. The missingness in alcohol consumption is
missing not at random since it is likely to depend largely on alcohol
consumption itself. To overcome the difficulty of handling this nonignorable
covariate-missing data problem, we propose a unified approach to constructing
a system of unbiased estimating equations, which naturally incorporate the
incomplete data into the data analysis, making it possible to gain estimation
efficiency over complete case analysis. Our analyses demonstrate that
increased alcohol consumption per day is significantly associated with
increased systolic blood pressure. In addition, having a higher body mass
index and being of older age are associated with a significantly higher risk
of hypertension. |
||
TI_48_3 |
Xu, Mengyu |
University of Central Florida |
Title |
Simultaneous
Prediction intervals for high-dimensional Vector Autoregressive model |
|
We study the
simultaneous prediction intervals for high-dimensional vector autoregressive
model. We consider a de-biased calibration for the lasso prediction and
propose a Gaussian-multiplier-bootstrap-based method
for one-step-ahead prediction. The asymptotic coverage consistency of the
prediction intervals is obtained. We also present simulation results to
evaluate the finite-sample performance of the procedure. |
||
TI_42_0 |
Xu, Xiaojian |
Brock University |
Title |
Optimal
design, active learning, and efficient statistics for big data |
|
This session
emphasizes the efficient statistical process when dealing with big data. Such
efficiency consideration appears at both stages: the stage of optimal and
robust designs for data selection (in Talks 1, 2, and 4) and the stage of
estimation/prediction after data are obtained (Talks 3 and 4). Our
speakers of this session discuss a variety of statistical methods, including
probability estimation, quantile regression, optimally weighted least
squares, and incomplete U-statistics. |
||
TI_42_4 |
Xu, Xiaojian |
Brock University |
Title |
Robust
active learning for approximate linear models |
|
In this
paper, we point out the common nature of active learning in the machine
learning field and robust experimental design in the statistics field,
and present methods of robust regression design that can be
implemented in a robust active learning process. We consider approximate
linear regression models and weighted least squares estimation. Both optimal
weighting schemes and robust optimal designs of the training data used for
active learning are discussed for various scenarios. An analytical form for
robust design density is derived. The simulation results and comparison study
using practical examples indicate improved efficiencies. |
||
TI_14_2 |
Yanev,
George P. |
The University of Texas |
Title |
On Arnold-Villaseñor conjectures for characterizing the exponential
distribution |
|
Characterizations
of the exponential distribution are abundant. Arnold and Villaseñor [1] obtained a series of new
characterizations based on random samples of size two and conjectured possible
generalizations for larger sample sizes. Extending their techniques, we will
prove Arnold and Villaseñor’s conjectures for an arbitrary but fixed sample
size n. We will discuss results published in [2] as well as more recent
findings. |
||
TI_35_1 |
Yin,
Xiangrong |
University of Kentucky |
Title |
Moment
Kernel for Estimating Central Mean Subspace and Central Subspace |
|
The T-central
subspace, introduced by Luo, Li and Yin (2014), allows one to perform
sufficient dimension reduction for any statistical functional of interest. We
propose a general estimator using (third) moment kernel to estimate
the T-central subspace. In this talk, we particularly focus on central
mean subspace via the regression mean function, and central subspace via
Fourier transform or slicing. Theoretical results are established and simulation studies show the advantages of
our proposed methods. |
||
TI_43_0 |
Yin,
Xiangrong |
University of Kentucky |
Title |
Variable
selection and dimension reduction for high-dimension data problems |
|
Variable
selection and dimension reduction are important research topics,
especially for high-dimensional data analysis. This session
consists of talks in these areas. Dr. Dong’s talk focuses on
variable selection over two sets of variables. Dr. Shao’s topic
is inference for high-dimensional data, while Dr.
Wu presents a semi-parametric method to estimate
multiple dimensions simultaneously, and Dr. Sriperumbudur’s topic
is the study of kernel PCA, a popular dimension reduction method. |
||
TI_38_2 |
Yu, Jihnhee |
University of Buffalo |
Title |
Bayesian
empirical likelihood approach to compare quantiles |
|
Bayes
factors, practical tools of applied statistics, have been dealt
with extensively in the literature in the context of hypothesis testing.
The Bayes factor based on parametric likelihoods can be considered both as a
pure Bayesian approach as well as a standard technique for computing
P-values for hypothesis testing. We employ empirical likelihood methodology
to modify Bayes factor type procedures for the nonparametric setting,
establishing asymptotic approximations to the proposed procedures. These
approximations are shown to be similar to those of the classical parametric
Bayes factor approach. The proposed approach is applied towards developing
testing methods involving quantiles, which are commonly used to characterize
distributions. We present and evaluate one- and two-sample distribution-free
Bayes-factor-type methods for testing quantiles based on indicators and
smooth kernel functions. |
||
TI_44_1 |
Yuan, Qingcong |
Miami University |
Title |
A two-stage
variable selection approach in the analysis of metabolomics and microbiome
data |
|
We propose a
two-stage variable selection approach to analyze a mouse data set. Mice under
different health conditions (obese or not) and different exposure levels to
biodiesel ultrafine particles (UFPs) are considered. Their metabolomics and
microbiome information is also recorded. We first perform sure variable
screening on the metabolite and microbial species data, respectively, and
then use the Bayesian lasso to obtain a selected variable set. Multivariate
analysis methods are then applied to the resulting dataset. The study focuses
on the effects of UFP exposure on gut microbial composition and function, and
then evaluates the impact of UFPs on obese host health. |
||
TI_4_3 |
Zhang,
Yuanqing |
Shanghai University of International
Business and Economics |
Title |
Inference
for Partially Linear Additive Higher Order Spatial Autoregressive Model with
Spatial Autoregressive Error and unknown Heteroskedasticity |
|
This article
extends spatial autoregressive model with spatial autoregressive disturbances
(SARAR(1,1)) which is the most popular spatial econometric model to the case
of an arbitrary finite number of nonparametric additive terms and spatial
autoregressive models with spatial autoregressive disturbances of arbitrary
finite order (SARAR(R,S)). We propose a sieve two stage least squares (S2SLS)
regression and generalized method of moments (GMM) procedure for the high-order
spatial autoregressive parameters of the disturbance process. Under
some sufficient conditions, we show that the proposed estimator for the
finite dimensional parameter is √n consistent and asymptotically
normally distributed. |
||
TI_24_3 |
Zeitler,
David |
Grand Valley State University |
Title |
Rank Based
Estimation With Skew Normal Error
Distributions Using Big Data Sets |
|
Skew normal
distributions are a generalization of the normal distribution adding a
parameter controlling the direction and magnitude of asymmetry. We will
address a rank-based algorithm for fitting
linear models with skew normal errors on big data sets using distributed
computation with limited inter-process communication. Distributed computation
may use multiple cores as well as clustered hardware resources. Both
theoretical development and a simulation demonstration using R will be
discussed. |
||
TI_13_1 |
Zelterman,
Dan |
Yale University |
Title |
Distributions
for Exchangeable p-Values under an unspecified Alternative
Hypothesis |
|
A typical
biomarker study may result in many p-values testing multiple
hypotheses. Several methods have been proposed to adjust for
multiple comparisons without exceeding the false discovery rate (FDR).
Under an unspecified alternative hypothesis, we propose a marginal
distribution for p-values whose joint distribution facilitates the
description of exchangeable p-values. This model is used to describe
the behavior of the number of statistically significant findings under Simes’ (1986, Biometrika) rule
controlling FDR. We apply our model to a published biomarker
study in which no statistically significant findings were observed by the
authors, and provide new power analyses for the study. |
||
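For concreteness, the counting of significant findings under Simes’ step-up rule (as used in Benjamini-Hochberg FDR control) can be sketched as follows; this is a generic illustration of the rule, not the exchangeable-p-value model of the talk, and the function name is ours.

```python
def simes_significant(pvals, alpha=0.05):
    """Step-up rule: with sorted p-values p_(1) <= ... <= p_(m),
    find the largest k with p_(k) <= k * alpha / m; the k smallest
    p-values are declared significant (k = 0 if none qualify)."""
    p = sorted(pvals)
    m = len(p)
    k = 0
    for i, pi in enumerate(p, start=1):
        if pi <= i * alpha / m:
            k = i
    return k
```

The number of declared findings is exactly the quantity whose null behavior the abstract’s model describes.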
TI_28_3 |
Zhang, Cheng |
Medstar Cardiovascular Research Network |
Title |
Novel
Post-randomization Methods for Controlling Identity Disclosure and Preserving
Data Utility |
|
Even when
direct identifiers such as name and social security number are removed,
identity disclosure of a survey unit in a data set is possible via matching
demographic variables whose values are easily known from other sources. So,
data agencies need to release a perturbed version of survey data. Ideally, a
perturbation mechanism should protect individuals’ identities while preserving
inferences about the population. For categorical key variables, we propose a
novel approach to measuring identification risk for setting strict disclosure
control goals. Specifically, we suggest ensuring
that the probability of identifying any survey unit is at most a given value
ξ. We develop an unbiased post-randomization method that achieves this
goal with little data quality loss. |
||
TI_44_0 |
Zhang, Jing |
Miami University |
Title |
New Explorations for High-Dimensional Big Data Analysis |
|
Standard
statistical methods are no longer computationally
efficient, or even feasible, for the analysis of high-dimensional big
data. This session collects ideas on variable
selection, dimension reduction, and predictive modeling, exploring how to
pick out the true "signals" amid the noise and how to handle the
volume of data. |
||
TI_44_2 |
Zhang, Jing |
Miami University |
Title |
A “Split
and Resample” Approach in Big Data Analysis |
|
Big data are
massive in volume, intensity, and complexity. Analysis of big data requires
picking out the true "signals" amid the noise
and handling the volume of data. We introduce a "split
and subsampling" algorithm that handles both variable
selection and prediction for high-dimensional big data.
Simulation studies are conducted to show that the proposed
algorithm is robust to multicollinearity among
the predictors in both linear and generalized linear models, selects
the signal variables with better sensitivity and specificity,
and achieves better prediction with lower MSPE values. |
||
TI_20_4 |
Zhang, Lingsong |
Purdue University |
Title |
On the
analysis of data that lies in the cone |
|
Complex data
arise increasingly often in applications such as imaging and genomics.
Traditionally, data were analyzed under the theoretical assumption that they
lie in Euclidean space. In recent years, many new data types are confined to
restricted spaces or sets, and require new theory and
methodology to analyze them. In this talk, we focus on two types of data
that lie in cones, and propose generalized
principal-component-type tools to reveal underlying structure (or hidden
factors) within such data. The approach naturally forms a nested structure
and thus is suitable for future investigation of the optimal dimension.
Applications of this method, such as to diffusion tensor images, will be
shown in this talk as well. |
||
TI_28_1 |
Zhang, Linjun |
Rutgers University |
Title |
The Cost of
Privacy: Optimal Rates of Convergence for Parameter Estimation with
Differential Privacy |
|
With the
unprecedented availability of datasets containing personal information, there
are increasing concerns that statistical analysis of such datasets may
compromise individual privacy. These concerns give rise to statistical
methods that provide privacy guarantees at the cost of some statistical
accuracy. A fundamental question is: to satisfy a certain desired level of
privacy, what is the best statistical accuracy one can achieve?
Standard statistical methods fail to yield sharp results, and new technical
tools are called for. In this talk, I will present a general lower bound
argument to investigate the tradeoff between statistical accuracy and
privacy, with application to three problems: mean estimation, linear
regression and classification, in both the classical low-dimensional and
modern high-dimensional settings. For these statistical problems, we also
design computationally efficient algorithms that match the minimax lower
bound under the privacy constraints. Finally, I
will show the applications of those privacy-preserving algorithms to real
data such as SNPs containing sensitive information, for which
privacy-preserving statistical methods are necessary. |
||
TI_21_3 |
Zhang, Teng |
University of Central Florida |
Title |
Robust PCA
by Manifold Optimization |
|
Robust PCA
is a widely used statistical procedure to recover an underlying
low-rank matrix with grossly corrupted observations. This work considers the
problem of robust PCA as a nonconvex optimization problem on the manifold of
low-rank matrices, and proposes two algorithms (for
two versions of retractions) based on manifold optimization. It is shown
that, with a properly designed initialization, the proposed algorithms are
guaranteed to converge to the underlying low-rank matrix linearly. Compared
with previous work based on the Burer-Monteiro decomposition
of low-rank matrices, the proposed algorithms theoretically reduce the
dependence on the condition number of the underlying low-rank matrix.
Simulations and real data examples confirm the competitive performance of our
method. |
||
TI_35_2 |
Zhang, Wei |
University of Arkansas at Little Rock |
Title |
Imputation
of Missing Data in the State Inpatient Databases |
|
Eliminating
healthcare disparities so that underserved populations are assured access to
quality medical care remains a national priority. Large, population-based
studies necessary to address healthcare disparities can be costly and
difficult to perform. An efficient alternative that is becoming increasingly
attractive is the use of the State Inpatient Databases (SID). This study aimed at
identifying appropriate imputation methods for SID and applying the imputed
data sets for healthcare disparities research. We compared six imputation
methods for missing data (i.e., complete case analysis, mean
imputation, marginal draw method, hot deck imputation,
joint multiple imputation (MI), conditional MI) through a novel
simulation. |
||
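Two of the simpler methods compared above can be sketched in a few lines; the function names are ours, and a real SID analysis would of course operate on multivariate records rather than a single column.

```python
import random
import statistics

def mean_impute(xs):
    """Replace each missing value (None) with the mean of the observed values."""
    obs = [x for x in xs if x is not None]
    m = statistics.mean(obs)
    return [m if x is None else x for x in xs]

def hot_deck_impute(xs, seed=0):
    """Replace each missing value with a randomly drawn observed value
    (a simple random hot deck)."""
    rng = random.Random(seed)
    obs = [x for x in xs if x is not None]
    return [rng.choice(obs) if x is None else x for x in xs]
```

Mean imputation preserves the observed mean but shrinks the variance, one reason draw-based methods such as the hot deck are often preferred in comparisons like the one above.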
TI_40_3 |
Zhao, Wei |
Indiana University Purdue University
Indianapolis |
Title |
Optimal
Sampling Distributions for Generalized Linear Models |
|
One of the
popular approaches to dealing with large sample data is subsampling, that is,
a small portion of the full data set is subsampled with certain weights and
used as a surrogate for the subsequent computation and simulation. The
crucial part of the method of subsampling is constructing the sampling
weights. In this paper, we propose A-optimal sampling distributions after
investigating the consistency and asymptotic normality of the subsample
estimator to the maximum likelihood estimator in generalized linear models. A
two-step algorithm is proposed to approximate the A-optimal subsampling
estimator. Simulation results show that our subsampling method outperforms
the other subsampling methods with a smaller mean square error of estimation. |
||
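The two-step idea can be sketched for logistic regression as follows. This is a hypothetical rendering of a generic weighted-subsampling scheme: a uniform pilot subsample gives a pilot estimate, sampling probabilities are then formed from an A-optimality-inspired score, and the final fit uses inverse-probability weights. The exact A-optimal weights derived in the paper may differ; all function names here are ours.

```python
import numpy as np

def fit_weighted_logistic(X, y, w, iters=25):
    """Weighted logistic MLE by Newton's method (tiny ridge for stability)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1 - p))[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def two_step_subsample(X, y, r0, r, seed=0):
    """Step 1: uniform pilot subsample of size r0 -> pilot estimate.
    Step 2: subsample r points with probabilities proportional to
    |y - p(x)| * ||x|| (an A-optimality-inspired score), then refit
    with inverse-probability weights."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx0 = rng.choice(n, size=r0, replace=False)
    beta0 = fit_weighted_logistic(X[idx0], y[idx0], np.ones(r0))
    p = 1.0 / (1.0 + np.exp(-X @ beta0))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    probs = score / score.sum()
    idx = rng.choice(n, size=r, replace=True, p=probs)
    return fit_weighted_logistic(X[idx], y[idx], 1.0 / probs[idx])
```

The inverse-probability weights in the final fit are what make the subsample estimator consistent for the full-data MLE.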
TI_42_3 |
Zheng, Wei |
The University of Tennessee |
Title |
Incomplete
U-statistic based on division and orthogonal array |
|
U-statistics
are an important class of statistics. Unfortunately, their computation easily becomes
impractical as the data size $n$ increases. In particular, the number of
combinations, say $m$, that a U-statistic of order $d$ has to evaluate is of
the order $O(n^d)$. Since Blom (1976), who coined the term
incomplete U-statistic, many efforts have been made to approximate the
original U-statistic by a small subset of the combinations. To the best of our knowledge, all existing methods
require $m$ to grow at least faster than $n$, albeit much slower than $n^d$, in order for the corresponding incomplete
U-statistic to be asymptotically efficient in the sense of mean squared
error. In this paper, we introduce a new type of incomplete U-statistics,
which can be asymptotically efficient even when $m$ grows slower than $n$. In
some cases, $m$ is only required to grow faster than $\sqrt{n}$. The results
are also extended to the degenerate case and the multi-sample case. |
||
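To fix ideas, the sketch below contrasts a complete U-statistic of order two with a naive incomplete version that averages the kernel over $m$ randomly drawn pairs, the simplest of the subset constructions going back to Blom (the division/orthogonal-array design of the talk is a more structured choice of subset). With kernel $h(x,y) = (x-y)^2/2$, the complete U-statistic is exactly the unbiased sample variance.

```python
import itertools
import random
import statistics

def complete_U(data, h):
    """Average of h over all C(n, 2) pairs: O(n^2) kernel evaluations."""
    vals = [h(x, y) for x, y in itertools.combinations(data, 2)]
    return sum(vals) / len(vals)

def incomplete_U(data, h, m, seed=0):
    """Average of h over m randomly chosen pairs: only O(m) evaluations."""
    rng = random.Random(seed)
    n = len(data)
    total = 0.0
    for _ in range(m):
        i, j = rng.sample(range(n), 2)
        total += h(data[i], data[j])
    return total / m

# This kernel's U-statistic is the unbiased sample variance.
h = lambda x, y: (x - y) ** 2 / 2
```

The design question the abstract addresses is how slowly $m$ can grow with $n$ while keeping the incomplete version asymptotically as efficient as the complete one.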
TI_38_3 |
Zhong,
Ping-Shou |
The University of Illinois at Chicago |
Title |
Order-restricted
inference for means with missing values |
|
Missing
values appear very often in many applications, but the problem
of missing values has not received much attention in testing
order-restricted alternatives. Under the missing at random (MAR)
assumption, we impute the missing values nonparametrically using kernel
regression. For data with imputation, the classical likelihood ratio
test designed for testing the order-restricted means is no longer
applicable since the likelihood does not exist. This article proposes a
novel method for constructing test statistics for assessing means with
an increasing order or a decreasing order based on jackknife empirical
likelihood (JEL) ratio. It is shown that the JEL ratio statistic
evaluated under the null hypothesis converges to a chi-bar-square
distribution, whose weights depend on missing probabilities
and nonparametric imputation. A simulation study shows that the
proposed test performs well under various missing scenarios and
is robust for normally and non-normally distributed data. The proposed
method is applied to an Alzheimer's disease neuroimaging initiative
data set to find a biomarker for the diagnosis of Alzheimer's
disease. |
||
TI_45_0 |
Zitikis, Ricardas |
Western University |
Title |
Risk
Measures: Theory, Inference, and Applications |
|
|
||
TI_45_3 |
Zitikis, Ricardas |
Western University |
Title |
Gini
Shortfall: A Coherent Risk Measure |
|
For quite
some time, the value-at-risk (VaR) was an appealing
risk measure, and even an industry and regulatory standard
for calculating risk capital in banking and insurance. The VaR is still a standard, though
criticized in many theoretical and empirical works. In this context, the expected shortfall (ES) has been a remarkable
innovation that rewards diversification and captures
the magnitude of tail risk. But what about tail variability? The coherent risk measure, called the
Gini shortfall (GS), takes care of both the magnitude and the variability of
tail risk, thus providing a much-needed missing
piece in the encompassing risk-measurement puzzle. In this
talk, we shall discuss various aspects of the GS,
including its origins, properties, and statistical
inference. |
Abstracts
for General-Invited Speakers (Alphabetic Order)
G_1_1 |
Abujarad,
Mohammed H.A. |
Aligarh
Muslim University |
Title |
Bayesian Survival Analysis of Topp-Leone
Generalized Family with Stan |
|
In this article, we discuss generalizations of three
distributions by means of the exponential, exponentiated
exponential, and exponentiated extension distributions. We set up three- and
four-parameter life models called the Topp-Leone exponential
distribution, the Topp-Leone exponentiated exponential
distribution, and the Topp-Leone exponentiated extension
distribution. We give extensive results for the survival function and
hazard rate function. To fit these models as survival and hazard rate
models, we adopt a Bayesian approach. A real survival data set is
used for illustration. The application is done in R and Stan, and suitable
illustrations are prepared. R and Stan code is given to implement the
censoring mechanism via optimization as well as simulation tools. |
||
G_2_1 |
Ahmed, Bilal Peer |
Islamic
University of Science & Technology, Awantipora,
Pulwama (J&K), India |
Title |
Inflated Size-Biased Modified Power Series Distributions and its
Applications |
|
In this paper, the class of Inflated Size-biased Modified Power Series
Distributions (ISBMPSD), where inflation occurs at any of the support points,
is studied. This class includes, among others, the size-biased generalized
Poisson distribution, size-biased generalized negative binomial distribution
and size-biased generalized logarithmic series distribution as its particular
cases. We obtain the recurrence relations among ordinary, central and
factorial moments. The maximum likelihood and Bayesian estimation of the parameters
of the Inflated Size-biased MPSD is obtained. As special cases, results are
extracted for size-biased generalized Poisson distribution, size-biased
generalized negative binomial distribution and size-biased generalized
logarithmic series distribution. Finally, an example is presented for the
size-biased generalized Poisson distribution to illustrate the results and a
goodness of fit test is done using the maximum likelihood and Bayes
estimators. |
||
G_6_4 |
Bulut,
Murat |
Osmangazi University, Turkey |
Title |
Robust Logistic Regression based on Liu estimator |
|
In this study, we propose a new estimator in logistic regression
to handle multicollinearity and outlier problems simultaneously. There are
some biased estimators proposed for the solution of the multicollinearity
problem. Also, there are some studies that cope with outlier problems. But
there are only a few studies in the
literature for the case where the multicollinearity and outlier problems
exist at the same time in the logistic model. In this study, we introduce a
robust logistic estimator based on the Liu estimator. We compare the proposed
estimator with some other existing estimators by means of a simulation study. |
||
G_5_1 |
Feng, Yaqin |
Ohio
University |
Title |
Stability and instability of steady states for a branching
random walk |
|
We consider the time evolution of a lattice branching random
walk with local perturbations. Under certain conditions, we prove a Carleman-type bound on the moment growth of a particle subpopulation number and show
the existence of a steady state. |
||
G_5_3 |
Lazar, Drew |
Ball State University |
Title |
Robust and scalable optimization on manifolds |
|
In this talk a robust and scalable procedure for estimation on
classes of manifolds that generalizes the classical idea of “median of means”
estimation is proposed. This procedure is motivated by statistical inference
problems in data science which can be cast as optimization problems over
manifolds. A key lemma that characterizes a property of the geometric median
on manifolds is shown. This lemma allows the formulation of bounds on an
estimator which aggregates subset estimators by taking their geometric
median. The robustness and scalability of the procedure are illustrated in
numerical examples on both simulated and real data sets. |
||
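In the Euclidean special case, the “median of means” aggregation can be sketched as below: split the sample into blocks, compute an estimate per block, and combine the block estimates through their geometric median (computed here with Weiszfeld’s fixed-point iteration). On a manifold, the blockwise estimators and the median would instead use intrinsic distances; this flat-space version is only illustrative, and the function names are ours.

```python
import numpy as np

def geometric_median(points, iters=200, eps=1e-9):
    """Weiszfeld's algorithm: fixed-point iteration for the point
    minimizing the sum of Euclidean distances to the rows of `points`."""
    y = points.mean(axis=0)
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(points - y, axis=1), eps)
        w = 1.0 / d
        y = (w[:, None] * points).sum(axis=0) / w.sum()
    return y

def median_of_means(X, k):
    """Split rows of X into k blocks, average each block, then
    aggregate the block means by their geometric median."""
    block_means = np.vstack([b.mean(axis=0) for b in np.array_split(X, k)])
    return geometric_median(block_means)
```

Because the geometric median is insensitive to a minority of wild block estimates, the aggregate stays near the truth even when a whole block is grossly corrupted.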
G_1_3 |
Louzada-Neto, Francisco |
ICMC, University of Sao Paulo |
Title |
Efficient Closed-Form MAP Estimators for Some
Survival Distributions and Their Applications to Embedded
Systems |
|
In this paper, we propose maximum a posteriori (MAP) estimators
for the parameters of some survival distributions, which have a simple
closed-form expression. In particular, we focus on the Nakagami
distribution, which plays an essential role in communication engineering
problems, particularly to model fading of radio signals. Moreover, we show
that the obtained results can be extended to other survival probability
distributions, such as the gamma and generalized gamma ones. Numerical
results reveal that the MAP estimators outperform the existing estimators and
produce almost unbiased estimates even for small sample sizes. Our
applications are driven by embedded systems, which are commonly used in
communication engineering. Particularly, they can consist of an electronic
system inside a microcontroller, which can be programmed to maintain
communication between a transmitting antenna and mobile antennas, which are
operating at the same frequency. In
this context, from the statistical point of view, closed-form estimators are
needed, since they are embedded in mobile devices and need to be sequentially
recalculated in real time. |
||
G_6_1 |
McTague, Jaclyn |
LogEcal Analytics |
Title |
Repeated Significance Testing of Normal Variables with Unknown
Variance |
|
In clinical trials, where data are accumulated over time,
sequential hypothesis testing requires control of the type-1 error. It is
typically assumed that the sample sizes are large so that, even with an
unknown variance, the test statistics are approximately normal. This leads to
the reliance on the multivariate normal distribution to calculate the
critical values. We develop the exact
joint distribution of the test statistics for any sample size and provide
critical values that ensure type-1 error control. We introduce an efficient
numerical method that works for any number of tests commonly encountered in
the so-called group sequential clinical trials. |
||
G_6_3 |
Mesbah, Mounir |
Sorbonne
University |
Title |
Current statistical issues in HRQoL research: Testing local
independence in latent variable models |
|
In this talk, I will give a quick overview of current
research in Health-Related Quality of Life (HRQoL). I will focus on a few important and
challenging statistical issues that occur when latent variable models are used. Local
independence is a strong assumption of such models that needs to be checked.
I will review the psychometric literature on the
subject, which deals mainly with the effect of local dependence on parameter
inference and with its detection.
I will discuss the challenging theoretical and computational issues
and present recent simulation results and applications to real data sets. |
||
G_1_2 |
Mynbaev,
Kairat |
International
School of Economics, Kazakh-British Technical University |
Title |
Nonparametric kernel estimation of unrestricted
distributions |
|
We consider nonparametric estimation of a distribution F that is
unrestricted in the sense that it may, or may not, be absolutely continuous. Three
problems are considered: estimation of F(x) at a continuity point x,
estimation of F(y)-F(x), where x and y are continuity points, and estimation
of jumps of F. Contrary to the extant literature, we make no restriction on
the existence or smoothness of the derivatives of F. The key insight for our
result is the use of Lebesgue-Stieltjes integrals.
The method is also applied to inversion theorems for characteristic
functions, where we provide explicit estimates for convergence rates. |
||
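For orientation, a basic smoothed kernel estimator of a CDF F(x) can be written in a few lines; this is the textbook construction (an average of Gaussian CDFs centered at the data), not the Lebesgue-Stieltjes approach of the abstract.

```python
import numpy as np
from scipy.stats import norm

def kernel_cdf(x, sample, h):
    """Smoothed CDF estimate: average of Gaussian CDFs centered at each datum."""
    return norm.cdf((x - np.asarray(sample)) / h).mean()

rng = np.random.default_rng(0)
data = rng.exponential(size=2000)
est = kernel_cdf(1.0, data, h=0.1)   # true F(1) for Exp(1) is 1 - e^{-1} ≈ 0.632
```

The bandwidth `h` trades bias for variance exactly as in kernel density estimation.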
G_2_2 |
Odhiambo, Collins |
Strathmore
University |
Title |
Extended version of Zero-inflated Negative Binomial Distribution
with Application to HIV Exposed Infant Count Data |
|
Routinely collected HIV-exposed infant (HEI) data show many
zero HIV-positive counts due to the prevention of mother-to-child transmission (PMTCT) policy.
However, implementation of PMTCT differs and results in structural zeros for
HEI-positive numbers (optimum PMTCT) and non-structural zeros (sub-optimum
PMTCT). Hence standard zero-inflated models may not be appropriate. We seek
to extend the zero-inflated Negative Binomial (ZINB) model by incorporating a
variable α. Extensive simulations were conducted by varying α,
the dispersion, and the sample size, and the results were compared using BC. The model was applied to HEI data sampled
from six high-HIV-burden counties in Kenya and
yielded better performance. |
||
G_2_3 |
Ogawa, Mitsunori |
University of Tokyo |
Title |
Parameter estimation for discrete exponential families under the
presence of nuisance parameters |
|
The parameter estimation problem for discrete exponential family
models is discussed under the presence of nuisance parameters. Maximizing the conditional likelihood
usually yields an estimator with statistically nice properties. However, the computation of its
normalization constant often prevents its practical use. In this talk, we derive a class of
computationally tractable estimators for such a situation based on the
framework of composite local Bregman divergence with simultaneous use of
tools from algebraic statistics. |
||
G_2_4 |
Peng, Jie |
St.
Ambrose University |
Title |
Improved Prediction Intervals for Discrete Distributions |
|
The problem of predicting a future outcome based on the past and
currently available samples arises in many applications. Applications of
prediction intervals (PIs) based on continuous distributions are well-known.
Compared to continuous distributions, results on constructing PIs for
discrete distributions are very limited. The problems of constructing
prediction intervals for the binomial, Poisson and negative binomial
distributions are considered here. Available approximate, exact and
conditional methods for these distributions are reviewed and compared. Simple
approximate prediction intervals based on the joint distribution of the past
samples and the future sample are proposed. Exact coverage studies and
expected widths of prediction intervals show that the new prediction
intervals are comparable to or better than the available ones in most cases. |
||
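The joint-distribution idea in the abstract can be illustrated with a standard normal-approximation prediction interval for a single future Poisson count; this is a textbook approximation, not the authors' proposed interval.

```python
import math
from scipy.stats import norm

def poisson_pred_interval(past_counts, conf=0.95):
    """Approximate PI for one future Poisson count.

    Since Var(Y - lam_hat) = lam * (1 + 1/n), plug in lam_hat and
    use the normal quantile, then round outward to integers."""
    n = len(past_counts)
    lam = sum(past_counts) / n
    z = norm.ppf(0.5 + conf / 2)
    half = z * math.sqrt(lam * (1 + 1 / n))
    return max(0, math.floor(lam - half)), math.ceil(lam + half)

lo, hi = poisson_pred_interval([4, 5, 6, 5, 4, 6, 5, 5, 4, 6])
```

Exact and conditional intervals, as compared in the abstract, refine this crude normal approximation.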
G_5_2 |
Sepanski, Jungsywan |
Central
Michigan University |
Title |
Constructing Bivariate Copulas with Distributional Distortions |
|
Distortion of existing copulas provides a way to construct new
copulas. We propose distributional distortions that are distribution
functions with support on the unit interval. Specifically, the distortion
considered in this presentation is the distribution of a unit-Burr random
variable formed by the exponential transformation of a negative Burr random
variable. The induced new copulas include the well-known BB1, BB2 and BB4
copulas as special cases. The dependence properties and relationships between
the base bivariate copula and the induced copula in tail dependence
coefficients and tail orders are studied.
The unit-Burr distortion of existing bivariate copulas may result in
copulas that allow a maximum range of dependence and permit both lower and
upper tail coefficients. Contour plots
and numerical results are also presented.
|
||
G_5_4 |
Smith, Scott |
University
of the Incarnate Word |
Title |
A Generalization of the Farlie-Gumbel-Morgenstern
and Ali-Mikhail-Haq Copulas |
|
An important aspect of modeling bivariate relationships is the
choice of underlying copula. One-parameter copulas may be too restrictive to
provide adequate fit. We present a two-parameter copula which possesses the Farlie-Gumbel-Morgenstern and Ali-Mikhail-Haq copulas as special cases. We then discuss dependence
properties and simulation. Finally, we use the new copula to model two data
sets and compare the fit to that of the FGM and AMH copulas. |
||
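The FGM copula named in the title has a simple closed form, and sampling from it reduces to inverting a quadratic conditional CDF; a minimal sketch (the two-parameter generalization of the abstract is not reproduced here):

```python
import numpy as np

def sample_fgm(n, theta, rng):
    """Sample (U, V) from the FGM copula
    C(u, v) = u v [1 + theta (1 - u)(1 - v)],  |theta| <= 1,
    by inverting the conditional CDF of V given U (a quadratic in v)."""
    u = rng.random(n)
    w = rng.random(n)
    a = theta * (1 - 2 * u)
    # Solve a v^2 - (1 + a) v + w = 0; the root in [0, 1] tends to w as a -> 0.
    a_safe = np.where(np.abs(a) < 1e-10, 1.0, a)
    root = (1 + a - np.sqrt((1 + a) ** 2 - 4 * a * w)) / (2 * a_safe)
    return u, np.where(np.abs(a) < 1e-10, w, root)

rng = np.random.default_rng(1)
u, v = sample_fgm(20_000, 0.8, rng)   # Pearson correlation of (U, V) is theta/3
```

The weak dependence range (correlation at most 1/3) is exactly why two-parameter extensions such as the one in this talk are of interest.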
G_6_2 |
Wang, Dongliang |
SUNY
Upstate Medical University |
Title |
Empirical likelihood inference for Kolmogorov-Smirnov test given
censored data |
|
The Kolmogorov-Smirnov (K-S) test is commonly used for comparing two
distributions and may be particularly valuable for censored data, since the K-S
test statistic can be interpreted as the maximum survival difference. In this
work, the smoothed empirical likelihood (SEL) is developed for the K-S
statistic given censored data with desirable asymptotic properties. The
developed results not only lead to a new test procedure, but also a reliable
interval estimator for maximum survival difference. The SEL method is
evaluated by empirical simulations in terms of the coverage probability of
the interval estimator, and illustrated by application
to a real-life dataset. |
Abstracts
for Student Posters
(Alphabetically
Ordered)
P-01 |
Amponsah, Charles |
University
of Nevada, Reno |
||
Title |
A Bivariate Gamma Mixture Discrete Pareto Distribution |
|||
We propose a new stochastic model describing the joint
distribution of (X, N), where N has a heavy-tail discrete Pareto distribution
while X is the sum of N independent gamma random
variables. We present the main properties of this distribution, including marginal
and conditional distributions, moments, representations, and parameter
estimation. An example from finance illustrates the modeling potential of this
new mixed bivariate distribution. |
||||
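The construction of (X, N) above can be sketched directly, using the additivity of the gamma law (a sum of N i.i.d. gammas with common scale is again gamma). As a stand-in for the discrete Pareto law, this sketch draws N from a Zipf distribution, which is an assumption for illustration only.

```python
import numpy as np

def sample_pair(n_draws, shape, scale, tail, rng):
    """Draw (X, N): N heavy-tailed (Zipf stand-in for discrete Pareto),
    X | N = sum of N iid Gamma(shape, scale) = Gamma(N * shape, scale)."""
    N = rng.zipf(tail, size=n_draws)
    X = rng.gamma(shape * N, scale)
    return X, N

rng = np.random.default_rng(0)
X, N = sample_pair(10_000, shape=2.0, scale=1.0, tail=3.5, rng=rng)
```

The heavy tail of N propagates to X, which is what makes the pair useful for finance applications.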
P-02 |
Ash, Jeremy |
North
Carolina State University |
||
Title |
Confidence band estimation methods for accumulation curves at
extremely small fractions with applications to drug discovery |
|||
Accumulation curves are used to assess the effectiveness of
ranking algorithms. Items are ranked according to the algorithm's belief that
they possess some desired feature, then items are tested according to
relative rank. In a typical virtual screen in drug discovery, millions of
chemicals are screened, while only tens of chemicals are tested. We propose
modifications to previously developed confidence band estimation methods that
have good coverage probabilities and expected widths under these conditions
in simulation. We also perform power
analyses to determine whether accumulation curves or other lift curves are
better for detecting significant differences between ranking algorithms. |
||||
P-03 |
Cho, Min Ho |
The
Ohio State University |
||
Title |
Aggregated Pairwise Classification of Statistical Shapes |
|||
The classification of shapes is of great interest in diverse areas.
Statistical shape data have two main properties: (i) shapes are inherently
infinite dimensional with strong dependence among the position of nearby
points; (ii) shape space is not Euclidean, but is
fundamentally curved. To accommodate these features, we work with the square
root velocity function, pass to tangent spaces of the manifold of shapes at
different projection points, and use principal components within these
tangent spaces. We illustrate the impact of the projection point and choice
of subspace on the misclassification rate with a novel method of combining
pairwise classifiers. |
||||
P-04 |
Damarjian, Hanna |
Purdue
University Northwest |
||
Title |
On the Transmuted Exponential Pareto Distribution |
|||
There has been growing interest in developing statistical
distributions capable of modeling a wide variety of data. The purpose of this research project is to
construct a new model with strong flexibility for various types
of data. This new model will be called
the Transmuted Exponential Pareto (TEP) Distribution. Several lifetime distributions are embedded
in this distribution. We provide various mathematical characteristics
including the parameter estimation methods and simulation. Finally, the importance and flexibility of
the proposed model will be illustrated by means of some real-life data
analysis. |
||||
P-05 |
Das, Manjari |
Carnegie
Mellon University |
||
Title |
Efficient nonparametric estimation of population size from
incomplete lists |
|||
Estimation of total population size using incomplete lists has
long been an important problem across many biological and social sciences.
For example, partial, overlapping lists of casualties in the Syrian war by
multiple organizations, are of great importance to estimate the magnitude of
destruction. Earlier approaches have either used strong parametric
assumptions or suboptimal nonparametric techniques, which can lead to bias via
model misspecification and smoothing. Assuming conditional independence of two
lists, we derive a nonparametric efficiency bound for estimating the capture
probability and construct a bias-corrected estimator. We apply our methods to
estimate HIV prevalence in Alameda County, California. |
||||
P-06 |
Farazi,
Md Manzur Rahman |
Marquette
University |
||
Title |
Feature Selection for a Predictive Model using Machine Learning
Techniques on Mosquito’s Spectral Data |
|||
A mosquito’s age is a key indicator of its capability to
spread diseases and of the effectiveness of mosquito
control interventions. Traditional methods of estimating age via dissection
are expensive and require skilled personnel. Near-Infrared (NIR) spectroscopy,
which measures the amount of light absorbed by a mosquito’s head or thorax, is
used as a non-invasive method to estimate age. Standard methods do not consider
the physiological changes mosquitoes go through as they age. We propose a
change-point model to estimate age from spectra using a partial least squares
regression (PLSR) model. The change-point PLSR model performs better in
estimating the age of mosquitoes. |
||||
P-07 |
Galarza, Christian |
State
University of Campinas |
||
Title |
On moments of folded and truncated multivariate extended
skew-normal distributions |
|||
Following Kan & Robotti
(2017), this paper develops recurrence relations for integrals that involve
the density of multivariate extended skew-normal distributions, which
includes the well-known skew-normal distribution introduced by Azzalini &
Dalla-Valle (1996) and the popular multivariate normal distribution. These
recursions offer fast computation of arbitrary order product moments of
truncated multivariate extended skew-normal and folded multivariate extended
skew-normal distributions with the product moments of the multivariate
truncated skew-normal, folded skew-normal, truncated multivariate normal and
folded normal distributions as a byproduct. Finally, from the application
point of view, these moments open the way to propose analytical expressions
on the E-step of the Expectation-Maximization (EM) algorithm for complex
data, such as asymmetric longitudinal data with censored and/or missing
observations. These new methods are provided to practitioners in the R MomTrunc package, an efficient R library incorporating
C++ and FORTRAN subroutines through Rcpp. |
||||
P-08 |
George, Tyler |
Central
Michigan University |
||
Title |
Lack-of-fit Testing Without Replicates Available |
|||
We develop a new technique for testing lack-of-fit (LOF) in a linear regression
model when replicates are not available. Most applications
yield data without replicates in the predictors. The
classical lack-of-fit test found in most linear regression textbooks is then not
applicable. Many current solutions use close points as "pseudo"
replicates, but "close" is not well defined. Presented in this paper is a more
general and robust methodology for testing LOF using a new grouping
procedure. Power simulations are used to compare the new test against
previous tests for various alternative models. |
||||
P-09 |
Goward, Kenneth |
Central
Michigan University |
||
Title |
A New Generalized Inverse Gaussian Distribution with Bayesian
Estimators |
|||
A four-parameter family of transformed inverse Gaussian (TIG)
distribution is described. A three-parameter family derived from the
four-parameter TIG family is considered, with a specific new distribution
referred to as the Generalized Inverse Gaussian (GIG) distribution being
considered. Two different versions of this distribution are provided and
computational and theoretical advantages of one over the other are discussed.
Maximum likelihood techniques are discussed alongside Bayesian approaches
with Jeffreys-type priors for parameter estimation. A simulation study was
conducted and results from the Bayesian approach and approximations to the
maximum likelihood estimators were analyzed using the Kolmogorov-Smirnov
test. The applicability of this distribution is considered on a real-world data set. |
||||
P-10 |
Ihtisham,
Shumaila |
Islamia
College, Peshawar, Pakistan |
||
Title |
Alpha Power Inverse Pareto Distribution and its Properties |
|||
In this study, a new distribution referred to as Alpha-Power
Inverse Pareto distribution is introduced by including an extra parameter.
Several properties of the proposed distribution are obtained including moment
generating function, quantiles, entropies, order statistics, mean residual
life function and stochastic ordering. Method of maximum likelihood is used
to find estimates of the parameters. Two real datasets are considered to
examine the usefulness of the proposed distribution. |
||||
P-11 |
Ijaz, Muhammad |
University
of Peshawar Pakistan |
||
Title |
A New Family of Distributions with Applications |
|||
In this paper, the main goal is to introduce a new family of
distributions, called the new alpha power transformed (NAPT) family. On the basis of the
proposed family, we fit the CDF of the exponential
distribution and call the result the new alpha power transformed exponential
(NAPTE) distribution. Some of its statistical properties are discussed,
including the mean residual life, quantile function, skewness, and kurtosis. The
hazard rate function and probability density function are also plotted for
various values of the parameters. The
parameters are estimated by maximum likelihood. Furthermore,
the paper presents a simulation study. To illustrate the usefulness
of the new family of distributions, two real-life data sets were used. The
comparison is made on the basis of goodness-of-fit criteria, including the
Akaike Information Criterion, the Consistent Akaike Information Criterion, and
others. The results show that the new alpha power
transformed exponential distribution is more flexible than other
existing distributions for the two data sets under study. |
||||
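For reference, the original alpha power transform (Mahdavi & Kundu, 2017) maps a base CDF F to G(x) = (α^F(x) − 1)/(α − 1) for α ≠ 1; the "new" variant of this abstract is not specified here, so the sketch below shows only the standard construction applied to the Exp(1) base.

```python
import math

def apt_cdf(base_cdf, alpha):
    """Alpha power transform: G(x) = (alpha^F(x) - 1) / (alpha - 1), alpha != 1."""
    def G(x):
        return (alpha ** base_cdf(x) - 1) / (alpha - 1)
    return G

exp_cdf = lambda x: 1 - math.exp(-x)   # base distribution: Exp(1)
G = apt_cdf(exp_cdf, alpha=2.0)        # alpha power transformed exponential CDF
```

Since α^t is monotone in t, G inherits the CDF properties of F while gaining the extra shape parameter α.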
P-12 |
Lee, Joo Chul |
University
of Connecticut |
||
Title |
Online Updating Method to Correct for Measurement Error in Big
Data Streams |
|||
When huge amounts of data arrive in streams, online updating is
an important method to alleviate both computational and data storage issues.
This paper extends the scope of previous research for online updating in the
context of the classical linear measurement error model. In the case where
some covariates are unknowingly measured with error at the beginning of the
stream, but then are measured without error after a particular point along
the data stream, the updated estimators ignoring the measurement error are
biased for the true parameters. We propose a method to correct the bias of
the estimators, as well as correct their variances, once the covariates
measured without error are first observed; after correction, the traditional
online updating method can then proceed as usual. We further derive the
asymptotic distributions for the corrected and updated estimators. We provide
simulation studies and a real data analysis with the Airline on-time data to
illustrate the performance of our proposed method. |
||||
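The classical online updating that this paper extends can be sketched by accumulating the sufficient statistics X'X and X'y block by block; the measurement-error bias correction of the abstract is not reproduced here.

```python
import numpy as np

class OnlineOLS:
    """Classical online updating for linear regression: only the
    accumulated X'X and X'y are stored, never the raw stream."""
    def __init__(self, p):
        self.xtx = np.zeros((p, p))
        self.xty = np.zeros(p)

    def update(self, X, y):
        self.xtx += X.T @ X
        self.xty += X.T @ y

    def coef(self):
        return np.linalg.solve(self.xtx, self.xty)

rng = np.random.default_rng(0)
beta = np.array([1.0, 2.0])
ols = OnlineOLS(2)
for _ in range(5):                      # five arriving data blocks
    X = rng.standard_normal((100, 2))
    ols.update(X, X @ beta)             # noiseless stream for illustration
```

Because only p-by-p summaries are kept, storage does not grow with the stream, which is the point of online updating.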
P-13 |
Lun, Zhixin |
Oakland
University |
||
Title |
Simulating from Skewed Multivariate Distributions: The Cases of
Lomax, Mardia’s Pareto (Type 1), Logistic, Burr and
F Distributions |
|||
Convenient, easy-to-use programs are available to simulate
data from several common multivariate distributions (e.g., normal, t).
However, functions for directly generating data from other, less common
multivariate distributions are not as readily available. We will illustrate
how to generate random numbers from the multivariate Lomax distribution (a flexible family of
skewed multivariate distributions). Further, the multivariate Mardia’s Pareto of type I, Logistic, Burr, and F distributions can
also be handled easily by applying useful properties of the multivariate
Lomax distribution. This work provides a useful tool for practitioners when
they need to simulate skewed multivariate distributions for various studies. |
||||
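One well-known route to multivariate Lomax variates is gamma mixing of independent exponentials; the sketch below uses that representation for illustration (it is a standard construction, not necessarily the algorithm of this poster).

```python
import numpy as np

def rmvlomax(n, a, thetas, rng):
    """Multivariate Lomax via gamma mixing: draw eta ~ Gamma(a, 1), then
    X_i | eta ~ Exponential with rate eta * theta_i, independently over i.
    Marginally, P(X_i > x) = (1 + theta_i * x) ** (-a)."""
    thetas = np.asarray(thetas, dtype=float)
    eta = rng.gamma(a, 1.0, size=(n, 1))
    return rng.exponential(1.0, size=(n, len(thetas))) / (eta * thetas)

rng = np.random.default_rng(0)
x = rmvlomax(100_000, a=3.0, thetas=[1.0, 2.0], rng=rng)
frac = (x[:, 0] > 1.0).mean()   # should be near (1 + 1)^(-3) = 0.125
```

The shared mixing variable eta is what induces the positive dependence among the components.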
P-14 |
Matuk, James |
The
Ohio State University |
||
Title |
Function Estimation through Phase and Amplitude Separation |
|||
An important task in functional data analysis is to estimate
functional observations based on sparse and noisy observations on a time
interval. To address this problem, we
define a Bayesian model that can fit individual functions on a per subject
basis, as well as multiple functions simultaneously by borrowing information
across subjects. A distinguishing
property of this work is that our model considers amplitude and phase
variabilities separately which describe y-axis and x-axis variability,
respectively. We validate the proposed framework using multiple simulated
examples as well as real data including ECG signals and measurements from
Diffusion Tensor Imaging. |
||||
P-15 |
Maxwell, Obubu |
Nnamdi
Azikiwe University Awka |
||
Title |
The Kumaraswamy Inverse Lomax Distribution (K-IL): Properties
and Applications |
|||
For the first time, the Kumaraswamy Inverse Lomax (K-IL) distribution
is introduced and studied. Some of its basic statistical properties are
investigated in detail, including explicit expressions for the
survival function, failure rate, reversed hazard, odds ratio, order
statistics, moments, quantiles, and median. The model parameters are estimated
using the maximum likelihood method. Real-life applications are
provided, and the K-IL distribution offers better
fits. Performance is assessed on the basis of the distributions’
log-likelihoods and the Akaike information criterion (AIC). |
||||
P-16 |
May, Paul |
South
Dakota State University |
||
Title |
Multiresolution Techniques for High Precision Agriculture |
|||
High Precision Agriculture is the use of data to observe and
respond to variations in crop fields on both a macroscopic and granular
level. Remote sensing techniques have created a wealth of data, but the size
of these data sets leads to computational challenges. This has historically
forced the use of less computationally expensive, but also less accurate
methods. Recent development of multiresolution approximations for spatial
covariance structures (Katzfuss 2015; Sang & Huang
2011) allow for the use of GLS and Kriging on very large data sets to make
inferences that farmers can turn into profitable actions. |
||||
P-17 |
Melchert, Bryan |
Purdue
University Fort Wayne |
||
Title |
Forecasting Migration Timing of Sockeye Salmon to Bristol Bay,
AK |
|||
Arrival of Sockeye Salmon (Oncorhynchus nerka)
to the Bristol Bay river system of Alaska is notoriously compact, with about
75% of the annual run arriving within 4 weeks. This research seeks to
leverage increased data access and modern statistical learning methods to
generate an accurate migration timing forecast that can be reproduced
annually, which currently does not exist for the fishery. Included topics
are dimensionality reduction, general additive modeling with time series
data, gradient boosting methods, and model validation. |
||||
P-18 |
Mohammed, Mohanad |
University
of KwaZulu-Natal, Pietermaritzburg, South Africa |
||
Title |
Using stacking ensemble for microarray-based cancer
classification |
|||
Microarray technology has produced a massive amount of gene
expression data. This data can be used efficiently for classification that
facilitates disease diagnosis and prognosis. There are many computational
methods that are utilized for cancer classification using these gene
expression data. Artificial neural networks (ANN), support vector machines
(SVM), and random forests (RF) are among the most successful methods for
classifying tumors. Recent research shows that combining many classifiers can
yield better results than using one classifier. In this paper, we used
stacking ensemble to combine different classifiers, namely, ANN, SVM, RF,
naive Bayes (NB), and k-nearest neighbors (KNN) for microarray-based cancer
classification. Results show that stacking ensemble performed better in terms
of accuracy, kappa coefficient, sensitivity, specificity, area under the
curve (AUC), and receiver operating characteristic (ROC) curve, when applied
to publicly available microarray data. |
||||
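The stacking idea described above can be shown from scratch with two toy base learners whose scores become features for a meta-learner; a minimal sketch, with a least-squares meta-model standing in for the usual logistic one (proper stacking would also use cross-validated base predictions rather than in-sample scores).

```python
import numpy as np

def centroid_score(Xtr, ytr, X):
    """Base learner 1: nearest-centroid score for class 1 (larger = class 1)."""
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    return np.linalg.norm(X - c0, axis=1) - np.linalg.norm(X - c1, axis=1)

def knn_score(Xtr, ytr, X, k=5):
    """Base learner 2: fraction of class-1 labels among the k nearest points."""
    d = ((X[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return ytr[np.argsort(d, axis=1)[:, :k]].mean(1)

def stack_predict(Xtr, ytr, Xte):
    """Stacking: base-learner scores become features for a meta-learner."""
    Z_tr = np.column_stack([centroid_score(Xtr, ytr, Xtr),
                            knn_score(Xtr, ytr, Xtr), np.ones(len(Xtr))])
    w, *_ = np.linalg.lstsq(Z_tr, ytr.astype(float), rcond=None)
    Z_te = np.column_stack([centroid_score(Xtr, ytr, Xte),
                            knn_score(Xtr, ytr, Xte), np.ones(len(Xte))])
    return (Z_te @ w > 0.5).astype(int)

rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(100, 4))     # class 0
X1 = rng.normal(1.5, 1.0, size=(100, 4))     # class 1, shifted
Xtr = np.vstack([X0[:80], X1[:80]]); ytr = np.r_[np.zeros(80), np.ones(80)].astype(int)
Xte = np.vstack([X0[80:], X1[80:]]); yte = np.r_[np.zeros(20), np.ones(20)].astype(int)
acc = (stack_predict(Xtr, ytr, Xte) == yte).mean()
```

The meta-learner weights the base learners by how informative their scores are, which is why stacking can beat any single constituent classifier.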
P-19 |
Ordoñez, José Alejandro |
Campinas
State University |
||
Title |
Objective Bayesian Analysis for the Spatial Student t Regression
model |
|||
We develop an objective Bayesian analysis for the spatial
Student-t regression model with unknown degrees of freedom, based on the
reference prior method. Like the degrees of freedom, the spatial parameter
is typically difficult to elicit: the propriety of the posterior distribution
is not always guaranteed, whereas proper prior distributions may dominate the
analysis. We show that the Bayesian analysis using this method yields
a proper posterior distribution, and we use it to develop model selection and
prediction. Finally, we assess the performance of the method through
simulation and illustrate it using a real data application. |
||||
P-20 |
Saha, Dheeman |
University
of New Mexico |
||
Title |
Sparse Bayesian Envelope |
|||
Due to the complexity of high dimensional datasets, it is
difficult to evaluate them efficiently. However, using a Bayesian framework
for dimension reduction and variable selection techniques can help to
identify the material and immaterial parts. This, in turn, leads to improved
efficiency in the estimation of the regression coefficients. In this work, we
combined the idea of dimension reduction with Spike-and-Slab variable
selection and proposed a Bayesian sparse Envelope method. In addition,
since the true structural dimension of the Envelope is unknown, we used
Reversible Jump Markov Chain Monte Carlo to draw samples from the posterior
distribution. |
||||
P-21 |
Shen, Luyi |
University of Notre Dame |
||
Title |
Bayesian community detection for weighted sparse
networks using mixture of SBM model |
|||
We propose a novel mixture of stochastic block models
for community detection in weighted networks. Our model allows modeling the
sparsity of the network and performing community detection simultaneously by
combining the spike-and-slab prior with a stochastic block model. A
Chinese restaurant process prior is used for modeling the random partition of
the model, which does not require the number of communities
to be known a priori. Another appealing feature of our model is that it
allows the sparsity level of the network to vary across communities. That is,
the sparsity information in the network is incorporated for community
detection. Efficient MCMC algorithms are derived for sampling from the posterior
distribution for inference, and our model and algorithms are demonstrated
using both simulated and real data sets. |
||||
P-22 |
Shubhadeep, Chakraborty |
Texas
A&M University |
||
Title |
A New Framework for Distance and Kernel-based Metrics in High
Dimensions |
|||
The paper presents new metrics to quantify and test for (i) the
equality of distributions and (ii) the independence between two
high-dimensional random vectors. We show that the energy distance based on
the usual Euclidean distance cannot completely characterize the homogeneity
of two high-dimensional distributions in the sense that it only detects the
equality of means and the traces of covariance matrices in the
high-dimensional setup. We propose a new class of metrics which inherit the
desirable properties of the energy distance, maximum mean
discrepancy/(generalized) distance covariance, and the Hilbert-Schmidt
Independence Criterion in the low-dimensional setting and are capable of
detecting the homogeneity of, or completely characterizing independence between,
the low-dimensional marginal distributions in the high-dimensional setup. We
further propose t-tests based on the new metrics to perform high-dimensional
two-sample testing/independence testing and study their asymptotic behavior
under both high dimension low sample size (HDLSS) and high dimension medium
sample size (HDMSS) setups. The computational complexity of the t-tests only
grows linearly with the dimension and thus is scalable to very high
dimensional data. We demonstrate the superior power behavior of the proposed
tests for homogeneity of distributions and independence via both simulated
and real datasets. |
||||
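The Euclidean energy distance whose high-dimensional limitations the abstract describes is simple to compute; a minimal V-statistic sketch (for intuition only, not the new metrics of the paper):

```python
import numpy as np

def energy_distance(X, Y):
    """V-statistic estimate of the energy distance
    2 E|X - Y| - E|X - X'| - E|Y - Y'| (Euclidean norms)."""
    def mean_dist(A, B):
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2).mean()
    return 2 * mean_dist(X, Y) - mean_dist(X, X) - mean_dist(Y, Y)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
Y = rng.standard_normal((200, 5))          # same distribution as X
Z = rng.standard_normal((200, 5)) + 1.0    # mean-shifted alternative
```

In low dimensions this statistic fully characterizes equality of distributions; the abstract's point is that this characterization breaks down as the dimension grows.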
P-23 |
Soale, Abdul-Nasah |
Temple
University |
||
Title |
On expectile-assisted inverse
regression estimation for sufficient dimension reduction |
|||
Sufficient dimension reduction (SDR) has become an important
tool for multivariate analysis. Among the existing SDR methods in the
literature, sliced inverse regression, sliced average variance estimation,
and directional regression are popular due to their estimation accuracy and
easy implementation. However, these estimators all rely on slicing the
response, and may not work well under heteroscedasticity. To improve these
estimators, we propose to first estimate the conditional expectile
of the response given the predictor and then perform inverse regression based
on slicing the expectile. The superior performances
of the new estimators are demonstrated through numerical studies and real
data analysis. |
||||
P-24 |
Wang, Yang |
The
University of Alabama |
||
Title |
On variable selection in matrix mixture modeling |
|||
Finite mixture models are widely used for cluster analysis,
including clustering matrix data. Nowadays, high-dimensional matrix
observations arise in many fields. It is known that irrelevant variables can
severely affect the performance of clustering procedures. Therefore, it is
important to develop algorithms capable of excluding irrelevant variables and
focusing on informative attributes in order to achieve good clustering
results. Several variable selection approaches have been proposed in the
multivariate framework. We introduce and study a variable selection procedure
that can be applied in the matrix-variate context. The methodological
developments are supported by several simulation studies and an application to a
real-life dataset. |
||||
P-25 |
Wang, Runmin |
University
of Illinois at Urbana-Champaign |
||
Title |
Self-Normalization for High Dimensional Time Series |
|||
Self-normalization has attracted considerable attention in the
recent literature on time series analysis, but its
scope of applicability has been limited to low/fixed-dimensional parameters
for low-dimensional time series. In this article, we propose a new
formulation of self-normalization for the inference of the mean of high
dimensional stationary processes. Our original test statistic is a
U-statistic with a trimming parameter to remove the bias caused by weak
dependence. Under the framework of nonlinear causal processes, we show the
asymptotic normality of our U-statistic with the convergence rate dependent
upon the order of the Frobenius norm of the long
run variance matrix. The self-normalized test statistic is then formulated on
the basis of recursive subsampled U-statistic and its limiting null
distribution is shown to be a functional of time-changed Brownian motion,
which differs from the pivotal limit used in the low dimensional setting. An
interesting phenomenon associated with self-normalization is that it works in
the high dimensional context even if the convergence rate is unknown. We also
present applications to testing for bandedness in the
covariance matrix and testing for white noise for high-dimensional stationary
time series and compare the finite sample performance with existing methods
in simulation studies. At the root of our theoretical argument, we extend the
martingale approximation to the high dimensional setting, which could be of
independent theoretical interest. |
||||
P-26 |
Xing, Lin |
University of Notre Dame |
||
Title |
A metric geometry approach to the weight prediction
problem |
|||
Many real data sets can be represented as a hypergraph,
which is a pair consisting of two sets: the set of data
points, and a set of higher-order relations among the data points,
called hyperedges. A standard example of hypergraph data is a
collaboration network in which the data points are mathematicians, and
each hyperedge is formed from a group of mathematicians
having a joint publication. In this work, we propose a geometric approach to
studying problems related to hypergraph data, with emphasis on the weight
prediction problem, one of the main problems in machine learning. We
introduce several classes of metrics on the set of data points, and also on
the set of hyperedges, to make these sets metric spaces. Using the
metric-space structures on such hypergraph data, we propose modified
k-nearest-neighbors methods that apply to weight
prediction on data points or hyperedges of hypergraph data. We illustrate the
techniques in our work with experimental analysis on several data sets. |
||||
P-27 |
Yang, Tiantian |
Clemson
University |
||
Title |
A Comparison of Several Missing Data Imputation Techniques for
Analyzing Different Types of Missingness |
|||
Missing data are common in real-world studies and can create
issues in statistical inference. Discarding cases that have missing values, or
replacing the missing values with inappropriate imputation techniques, can
both result in biased estimates. Many imputation techniques rest on assumptions
that are hard to assess in practice, so the appropriate
imputation technique is often unclear. To address this issue, a factorial
simulation design was developed to measure the impact of certain data set
characteristics on the validity of several popular imputation techniques. The
factors in the study were the missingness mechanism, the percentage of missing
data, and the imputation method. The evaluation included parameter estimates, bias, and
confidence interval coverage and width for the parameters of interest.
Simulation results suggest all three factors have a significant impact on the
quality of the estimation. Additional factors, such as the number of variables,
the types of variables, and the correlations among variables, are being incorporated
into the simulation. Finally, real data examples are discussed to illustrate the
applicability of different missing data imputation methods. |
||||
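As a minimal sketch of one cell of the kind of factorial simulation described above, the following toy example (hypothetical data and parameter values) generates missing-at-random data where y is dropped whenever the observed x is large, and shows that both complete-case analysis and naive mean imputation leave the same bias in the estimated mean:

```python
import random

random.seed(7)

# Sketch: MAR missingness, y dropped whenever x > 0, so missingness
# depends only on the observed x. True E[y] = 0.
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]

observed = [yi for xi, yi in zip(x, y) if xi <= 0]

cc_mean = sum(observed) / len(observed)   # complete-case estimate (biased)

# Naive mean imputation re-inserts cc_mean for every missing value,
# so the imputed-data mean equals the complete-case mean: bias remains.
imputed = observed + [cc_mean] * (n - len(observed))
imp_mean = sum(imputed) / n
```

Under MAR, model-based approaches (e.g., multiple imputation conditioning on x) can remove this bias, which is exactly the kind of contrast the factorial design above quantifies.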
P-28 |
Yao, Yaqiong |
University
of Connecticut |
||
Title |
Optimal two-stage adaptive subsampling design for softmax regression |
|||
For massive datasets, statistical analysis using the full data
can be extremely time-demanding, so subsamples are often taken and analyzed
according to the available computing power. For this purpose, Wang et al. (2018)
developed a novel two-stage subsampling design for logistic regression. We
generalize this method to softmax regression. We derive the asymptotic
distribution of the estimator obtained from subsamples drawn according to
arbitrary subsampling probabilities, and then derive the optimal subsampling
probabilities that minimize the asymptotic variance-covariance matrix under
the A-optimality and the L-optimality criteria. The optimal subsampling
probabilities involve unknown parameters, so we adopt the idea of optimal
adaptive design and use a small subsample to obtain pilot estimates. We also
consider Poisson subsampling for its higher computational and estimation
efficiency. We provide simulation and real data examples to demonstrate the
performance of our algorithm. |
||||
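A minimal sketch of the subsampling mechanism only: Poisson subsampling with non-uniform inclusion probabilities, corrected by inverse-probability (Horvitz-Thompson) weighting. The probabilities below are arbitrary stand-ins for illustration, not the A-/L-optimal probabilities derived in the work above, and the estimand is a simple mean rather than a softmax regression coefficient:

```python
import random

random.seed(1)

# Full data: 1, 2, ..., 1000, true mean 500.5.
data = [float(i) for i in range(1, 1001)]
total = sum(data)

# Poisson subsampling: include each point independently with an
# (arbitrary, size-proportional) probability, capped at 1.
pi = [min(1.0, 2000 * v / total) for v in data]
sample = [(v, p) for v, p in zip(data, pi) if random.random() < p]

# Horvitz-Thompson correction: weight each sampled value by 1/p.
ht_mean = sum(v / p for v, p in sample) / len(data)

# The unweighted sample mean overrepresents the large values.
naive_mean = sum(v for v, _ in sample) / len(sample)
```

The weighted estimate stays near the full-data mean while the unweighted one is pulled upward, which is why the asymptotic variance in the abstract is derived for the weighted estimator.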
P-29 |
Yuu, Elizabeth |
Robert
Koch Institute |
||
Title |
Quantifying microbial dark matter using generalized linear
models and its impact on metagenome analyses |
|||
We previously introduced DiTASiC
(Differential Taxa Abundance including Similarity Correction) to resolve
shared-read ambiguity based on a regularized, generalized linear
model (GLM) framework. This approach, like other similar ones, does not address
the remaining unmapped reads, or “microbial dark matter”. We extend our
approach by analyzing sub-mappings with different error tolerances and
integrating dark matter variables, in an effort to create a more appropriate
GLM. This new idea has the potential to provide more accurate estimates of
taxa abundance and inherent variation, which in turn can lead to improved taxa
quantification and differential testing. |
||||
P-30 |
Zang, Xiao |
The
Ohio State University |
||
Title |
Clustering Functional Data using Fisher-Rao Metric |
|||
Functional data are infinite dimensional,
so histograms are no longer applicable for discovering multimodality. Also,
due to misalignment, pointwise summaries such as cross-sectional means and
standard deviations cannot faithfully describe the typical form and
variability. Therefore, we developed a functional k-means clustering
algorithm that uses the Fisher-Rao metric as the distance measure and
simultaneously aligns the functions within each cluster using a flexible family
of domain warpings, with a BIC criterion to choose the optimal number of clusters.
In simulation studies our method outperformed Sangalli
et al.'s method in terms of clustering accuracy. Real-world applications
will be illustrated on several datasets. |
||||
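A minimal sketch of the amplitude part of the Fisher-Rao framework, via the square-root velocity function (SRVF) q(t) = f'(t)/sqrt(|f'(t)|): for already-aligned functions, the Fisher-Rao distance reduces to the L2 distance between SRVFs. The warping/alignment optimization and the k-means loop from the work above are omitted; the grid and test functions are illustrative:

```python
import math

def srvf(f, dt):
    """Discrete SRVF of samples f on a uniform grid with spacing dt."""
    q = []
    for i in range(len(f) - 1):
        d = (f[i + 1] - f[i]) / dt          # forward-difference derivative
        q.append(math.copysign(math.sqrt(abs(d)), d))
    return q

def l2_dist(q1, q2, dt):
    """Discretized L2 distance between two SRVFs."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)) * dt)

n = 200
dt = 1.0 / n
grid = [i * dt for i in range(n + 1)]
f1 = [math.sin(2 * math.pi * t) for t in grid]
f2 = [math.sin(2 * math.pi * t) for t in grid]        # identical function
f3 = [0.5 * math.sin(2 * math.pi * t) for t in grid]  # damped amplitude

d_same = l2_dist(srvf(f1, dt), srvf(f2, dt), dt)      # 0: same amplitude
d_diff = l2_dist(srvf(f1, dt), srvf(f3, dt), dt)      # > 0: amplitudes differ
```

A k-means step would assign each curve to the cluster whose template minimizes this elastic distance after warping.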
P-31 |
Zhang, Han |
The
University of Alabama |
||
Title |
Aggregate Estimation in Sufficient Dimension Reduction for
Binary Responses |
|||
Many successful inverse-regression-based
sufficient dimension reduction methods have been developed since Sliced
Inverse Regression was introduced. However, most of them target problems
with continuous responses. Although some claim to be applicable to both
categorical and numerical responses, they may work poorly for binary
classification problems, since binary responses provide very limited
information. In this paper, we put forward an aggregate estimation method for
binary responses, which involves a decomposition step and a combination step.
As an ensemble learning approach, aggregate estimation is proved to
effectively decrease the bias and exhaustively estimate the dimension
reduction space. |
||||
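A minimal sketch of why binary responses provide limited information for inverse regression: with a binary y, Sliced Inverse Regression has only two slices, so the slice means span at most one direction (the difference of class means), even when the true central subspace has higher dimension. The toy model below is hypothetical and shows plain SIR, not the aggregate decomposition/combination method of the work above:

```python
import random

random.seed(5)

def slice_mean(rows):
    """Componentwise mean of a list of p-dimensional points."""
    n, p = len(rows), len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(p)]

p = 3
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(5000)]
# True model depends on TWO directions (x0 linearly, x1 through x1^2),
# but y is binary, so SIR sees only two slices.
y = [1 if row[0] + row[1] ** 2 > 1 else 0 for row in X]

m1 = slice_mean([r for r, yi in zip(X, y) if yi == 1])
m0 = slice_mean([r for r, yi in zip(X, y) if yi == 0])
sir_direction = [a - b for a, b in zip(m1, m0)]
# sir_direction picks up the x0 component but, by the symmetry of x1^2,
# misses the x1 direction entirely -- the limitation noted above.
```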
P-32 |
Zhang, Yangfan |
University
of Illinois Urbana-Champaign |
||
Title |
High Dimensional Regression Change Point Detection |
|||
In this article, we propose a method to detect a possible change
point in linear regression. We construct a U-statistic-based test statistic with
self-normalization and derive its null distribution, which turns out to be
pivotal. Our method allows an intercept in the model while detecting a
change point in the slope, which is more general than the existing
literature. Under certain conditions, we also derive an approximation to the
power. The performance is reasonably good in terms of both size and power.
Furthermore, our method can be combined with wild binary segmentation to handle
the multiple-change-point case and estimate the locations. |
||||
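A minimal sketch of the underlying idea only: scan candidate split points and compare the OLS slopes fitted on each side. The actual statistic in the work above is a self-normalized U-statistic with a pivotal limit; this toy version with noise-free data merely illustrates slope-change detection:

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def scan_change_point(x, y, min_seg=10):
    """Return the split index maximizing the absolute slope difference."""
    best_k, best_gap = None, -1.0
    for k in range(min_seg, len(x) - min_seg):
        gap = abs(ols_slope(x[:k], y[:k]) - ols_slope(x[k:], y[k:]))
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k, best_gap

# Noise-free toy data: slope 1 before index 50, slope 3 after
# (the two lines meet at x = 0.5, so the path is continuous).
x = [i / 100 for i in range(100)]
y = [xi if i < 50 else 3 * xi - 1 for i, xi in enumerate(x)]
k_hat, gap = scan_change_point(x, y)
```

Self-normalization replaces the explicit variance estimate this naive scan would need for calibration with noisy data.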
P-33 |
Zhang, Yingying |
The
University of Alabama |
||
Title |
On model-based clustering of time-dependent categorical
sequences |
|||
Clustering categorical sequences is an important problem that
arises in many fields, such as medicine, sociology, and economics. It is a
challenging task because techniques for clustering categorical data are
scarce: the majority of traditional clustering procedures are designed for
handling quantitative observations. Situations in which the categorical data
are related to time are even more troublesome. We employ a mixture of
first-order Markov models, with transition probabilities that are functions of
time, to develop a new approach for clustering categorical
time-related data. The proposed methodology is illustrated on synthetic data
and applied to a real-life data set containing sequences of life events for
respondents participating in the British Household Panel Survey. |
||||
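A minimal sketch of the building block of the approach above: maximum-likelihood estimation of a first-order Markov transition matrix from categorical sequences. The mixture over clusters and the time-dependent transition probabilities are omitted; the life-event states and sequences are hypothetical:

```python
from collections import defaultdict

def fit_transition_matrix(sequences):
    """MLE of a first-order Markov transition matrix: row-normalized
    counts of observed one-step transitions."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    P = {}
    for a, row in counts.items():
        total = sum(row.values())
        P[a] = {b: c / total for b, c in row.items()}
    return P

# Toy life-event sequences over states {'school', 'work', 'retired'}
seqs = [
    ['school', 'work', 'work', 'retired'],
    ['school', 'school', 'work', 'work'],
    ['school', 'work', 'retired'],
]
P = fit_transition_matrix(seqs)
```

In the mixture setting, one such matrix (with time-varying entries) is fitted per cluster inside an EM loop.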
P-34 |
Zhu, Changbo |
University
of Illinois at Urbana-Champaign |
||
Title |
Interpoint Distance Based Two Sample Tests in High Dimension |
|||
In this paper, we study a class of two-sample test statistics
based on inter-point distances in the high dimensional, low sample size
setting. Our test statistics include the well-known energy distance and the
maximum mean discrepancy with Gaussian and Laplacian kernels, and the
critical values are obtained via permutations. We show that all of these tests
are inconsistent when the two high dimensional distributions have
the same marginal distributions but differ in other aspects. The
tests based on energy distance and maximum mean discrepancy mainly target
differences between marginal means and variances, whereas the test based on
the L1-distance can capture differences in the marginal distributions. Our
theory sheds new light on the limitations of inter-point distance based tests,
the impact of different distance metrics, and the behavior of permutation tests
in high dimension. Simulation results and a real data illustration are also
presented to corroborate our theoretical findings. |
||||
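A minimal univariate sketch of the test class studied above: the energy distance statistic with a permutation p-value. The high dimensional phenomena in the work are not visible in this toy example; it shows only the statistic and the permutation calibration, with hypothetical simulated samples:

```python
import random

random.seed(3)

def energy_distance(x, y):
    """Sample energy distance 2*E|X-Y| - E|X-X'| - E|Y-Y'| (1-D)."""
    exy = sum(abs(a - b) for a in x for b in y) / (len(x) * len(y))
    exx = sum(abs(a - b) for a in x for b in x) / len(x) ** 2
    eyy = sum(abs(a - b) for a in y for b in y) / len(y) ** 2
    return 2 * exy - exx - eyy

def permutation_pvalue(x, y, n_perm=200):
    """p-value from randomly re-splitting the pooled sample."""
    obs = energy_distance(x, y)
    pooled = x + y
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        if energy_distance(pooled[:len(x)], pooled[len(x):]) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)

x = [random.gauss(0, 1) for _ in range(30)]
y = [random.gauss(2, 1) for _ in range(30)]   # mean shift: should reject
z = [random.gauss(0, 1) for _ in range(30)]   # same distribution as x

p_shift = permutation_pvalue(x, y)
p_null = permutation_pvalue(x, z)
```

The paper's point is that in high dimension such tests remain sensitive to marginal mean/variance shifts like this one, but can fail against differences beyond the marginals.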