remove rho=NULL in pk.test
add best practice
add cases in which use pk.test
giovsaraceno committed Sep 26, 2024
1 parent fe46e64 commit 29cc5da
Showing 33 changed files with 226 additions and 209 deletions.
12 changes: 6 additions & 6 deletions R/QuadratiK-package.R
@@ -11,15 +11,17 @@
#' - **Goodness-of-Fit Tests**: The software implements one, two, and
#' *k*-sample tests for goodness of fit, offering an efficient and
#' mathematically sound way to assess the fit of probability distributions.
#' Expanded capabilities include supporting tests for uniformity on the
#' *d*-dimensional Sphere based on Poisson kernel densities. Our tests are
#' Our tests are
#' particularly useful for large, high dimensional data sets where the
#' assessment of fit of probability models is of interest. Specifically, we
#' offer tests for normality, as well as two- and *k*-sample tests, where
#' testing equality of two or more distributions is of interest, that is
#' \eqn{H_0: F_1 = F_2} and \eqn{H_0: F_1 = \ldots = F_k} respectively.
#' The proposed tests perform well in terms of level and power for contiguous
#' alternatives, heavy tailed distributions and in higher dimensions.
#' alternatives, heavy tailed distributions and in higher dimensions. \cr
#' Expanded capabilities include supporting tests for uniformity on the
#' *d*-dimensional Sphere based on the Poisson kernel, exhibiting excellent
#' results especially in the case of multimodal distributions.
#' - **Poisson kernel-based distribution (PKBD)**: the package offers
#' functions for computing the density value and for generating random samples
#' from a PKBD. The Poisson kernel-based densities are based on the normalized
@@ -42,8 +44,6 @@
#' algorithm leverages a mixture of Poisson kernel-based densities on the
#' Sphere, enabling effective clustering of spherical data or data that has
#' been spherically transformed.
#' The package also provides the functions for density evaluation and random
#' sampling from the Poisson kernel-based distribution.
#' - **Additional Features**: Alongside these functionalities, the software
#' includes additional graphical functions, aiding users in validating and
#' representing the cluster results as well as enhancing the interpretability
@@ -64,7 +64,7 @@
#' @author
#' Giovanni Saraceno, Marianthi Markatou, Raktim Mukhopadhyay, Mojgan Golzy
#'
#' Mantainer: Giovanni Saraceno \email{gsaracen@buffalo.edu}
#' Maintainer: Giovanni Saraceno \email{gsaracen@buffalo.edu}
#'
#'
#' @references
8 changes: 4 additions & 4 deletions R/kb.test.R
@@ -46,7 +46,7 @@
#' @details
#' The function \code{kb.test} performs the kernel-based quadratic
#' distance tests using the Gaussian kernel with bandwidth parameter \code{h}.
#' Depending on the shape of the input \code{y} the function performs the tests
#' Depending on the shape of the input \code{y} the function performs the tests
#' of multivariate normality, the non-parametric two-sample tests or the
#' k-sample tests.
#'
@@ -214,7 +214,7 @@
#' Goodness-of-Fit Tests.” \cr
#' https://doi.org/10.48550/arXiv.2407.16374
#'
#' Lindsay, B.G., Markatou, M. and Ray, S. (2014) "Kernels, Degrees of Freedom,
#' Lindsay, B.G., Markatou, M. and Ray, S. (2014) "Kernels, Degrees of Freedom,
#' and Power Properties of Quadratic Distance Goodness-of-Fit Tests", Journal
#' of the American Statistical Association, 109:505, 395-410,
#' DOI: 10.1080/01621459.2013.836972
@@ -280,7 +280,7 @@ setMethod("kb.test", signature(x = "ANY"),
'subsampling'")
}
if(b<=0 | b>1){
stop("b indicates the proportion used for the subsamples in the
stop("b indicates the proportion used for the subsamples in the
subsampling algoritm. It must be in (0,1].")
}

@@ -329,7 +329,7 @@ setMethod("kb.test", signature(x = "ANY"),

#stop("A value of the tuning parameter h must be provided to
#perform the kernel-based quadratic distance Normality tests")
h_best <- select_h(x=x, alternative=alternative, method=method,
h_best <- select_h(x=x, alternative=alternative, method=method,
b=b, B=B, power.plot=FALSE)
h <- h_best$h_sel

17 changes: 10 additions & 7 deletions R/pk.test.R
@@ -4,7 +4,10 @@
#' @description
#' This function performs the kernel-based quadratic distance goodness-of-fit
#' tests for Uniformity for spherical data \code{x} using the Poisson kernel
#' with concentration parameter \code{rho}.
#' with concentration parameter \code{rho}. \cr
#' The Poisson kernel-based test for uniformity exhibits excellent results
#' especially in the case of multimodal distributions, as shown in the example
#' of the \href{../doc/uniformity.html}{Uniformity test on the Sphere vignette}.
#'
#' @details
#' Let \eqn{x_1, x_2, ..., x_n} be a random sample with empirical distribution
@@ -79,16 +82,16 @@
#' asymptotic distribution.
#' \item \code{H0_Vn} A logical value indicating whether or not the null
#' hypothesis is rejected according to Vn.
#' \item \code{rho} The value of concentration parameter used for the Poisson
#' \item \code{rho} The value of concentration parameter used for the Poisson
#' kernel function.
#' \item \code{B} Number of replications for the critical value of the
#' U-statistic Un.
#'}
#'
#'
#' @references
#' Ding, Y., Markatou, M. and Saraceno, G. (2023). “Poisson Kernel-Based Tests for
#' Uniformity on the d-Dimensional Sphere.” Statistica Sinica.
#' Ding, Y., Markatou, M. and Saraceno, G. (2023). “Poisson Kernel-Based Tests
#' for Uniformity on the d-Dimensional Sphere.” Statistica Sinica.
#' doi:10.5705/ss.202022.0347
#'
#' @examples
@@ -107,7 +110,7 @@
#' @srrstats {G1.4} roxigen2 is used
#'
#' @export
setGeneric("pk.test",function(x, rho = NULL, B = 300, Quantile = 0.95){
setGeneric("pk.test",function(x, rho, B = 300, Quantile = 0.95){
standardGeneric("pk.test")
})
#' @rdname pk.test
@@ -126,7 +129,7 @@ setMethod("pk.test", signature(x = "ANY"),
} else if(is.data.frame(x)) {
x <- as.matrix(x)
} else if(!is.matrix(x)){
stop("x must be a matrix or a data.frame with dimension greater
stop("x must be a matrix or a data.frame with dimension greater
than 1.")
}
if(any(is.na(x))){
@@ -249,7 +252,7 @@ setMethod("summary", "pk.test", function(object) {
unif_data <- runif(nrow(dat_x),-1,1)
probs <- seq(0, 1, length.out = nrow(dat_x))
# qq_df <- data.frame(
# x = quantile(unif_data, probs = seq(0, 1, length.out = nrow(dat_x))),
# x = quantile(unif_data, probs = seq(0, 1, length.out = nrow(dat_x))),
# sample_quantiles = quantile(dat_x[,i], probs = probs))
x <- quantile(unif_data, probs = seq(0, 1, length.out = nrow(dat_x)))
sample_quantiles <- quantile(dat_x[,i], probs = probs)
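
With `rho = NULL` removed from the generic above, callers have to pass the concentration parameter explicitly. The following is a minimal usage sketch, not part of the commit: it assumes the `QuadratiK` package is installed, the value `rho = 0.8` is an arbitrary illustrative choice, and the sample is drawn uniformly on the sphere by normalizing Gaussian rows.

```r
# Illustrative sketch only: data generation and the choice rho = 0.8 are
# assumptions, not taken from the commit.
set.seed(123)
n <- 200
d <- 3

# Uniform points on the unit sphere: normalize each row of a Gaussian sample.
z <- matrix(rnorm(n * d), nrow = n, ncol = d)
x <- z / sqrt(rowSums(z^2))

# rho no longer has a default, so it is supplied by name.
if (requireNamespace("QuadratiK", quietly = TRUE)) {
  res <- QuadratiK::pk.test(x = x, rho = 0.8)
  print(res)
}
```

The call is guarded by `requireNamespace()` so that the sketch still runs, and simply skips the test, when the package is not available.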
18 changes: 9 additions & 9 deletions R/pkbd_functions.R
@@ -143,12 +143,12 @@ dpkb <- function(x, mu, rho, logdens = FALSE) {
#' message asking the user to install the missing package(s).
#'
#' @references
#' Golzy, M. and Markatou, M. (2020) Poisson Kernel-Based Clustering on the Sphere:
#' Convergence Properties, Identifiability, and a Method of Sampling, Journal of
#' Computational and Graphical Statistics, 29:4, 758-770,
#' Golzy, M. and Markatou, M. (2020) Poisson Kernel-Based Clustering on the
#' Sphere: Convergence Properties, Identifiability, and a Method of Sampling,
#' Journal of Computational and Graphical Statistics, 29:4, 758-770,
#' DOI: 10.1080/10618600.2020.1740713.
#'
#' Sablica L., Hornik K. and Leydold J. (2023) "Efficient sampling from the PKBD
#' Sablica L., Hornik K. and Leydold J. (2023) "Efficient sampling from the PKBD
#' distribution", Electronic Journal of Statistics, 17(2), 2180-2209.
#'
#' @srrstats {G1.0} Reference section reports the related literature.
@@ -239,9 +239,9 @@ rpkb <- function(n, mu, rho, method = 'rejacg',
#' @param p dimension.
#'
#' @references
#' Golzy, M. and Markatou, M. (2020) Poisson Kernel-Based Clustering on the Sphere:
#' Convergence Properties, Identifiability, and a Method of Sampling, Journal of
#' Computational and Graphical Statistics, 29:4, 758-770,
#' Golzy, M. and Markatou, M. (2020) Poisson Kernel-Based Clustering on the
#' Sphere: Convergence Properties, Identifiability, and a Method of Sampling,
#' Journal of Computational and Graphical Statistics, 29:4, 758-770,
#' DOI: 10.1080/10618600.2020.1740713.
#'
#' @importFrom stats runif
@@ -282,7 +282,7 @@ rejvmf <- function(n, rho, mu, p) {
#' @param p dimension.
#'
#' @references
#' Sablica L., Hornik K. and Leydold J. (2023) "Efficient sampling from the PKBD
#' Sablica L., Hornik K. and Leydold J. (2023) "Efficient sampling from the PKBD
#' distribution", Electronic Journal of Statistics, 17(2), 2180-2209.
#'
#' @importFrom stats runif
@@ -350,7 +350,7 @@ rejacg <- function(n, rho, mu, p, tol.eps, max.iter){
#' @param p dimension.
#'
#' @references
#' Sablica L., Hornik K. and Leydold J. (2023) "Efficient sampling from the PKBD
#' Sablica L., Hornik K. and Leydold J. (2023) "Efficient sampling from the PKBD
#' distribution", Electronic Journal of Statistics, 17(2), 2180-2209.
#'
#'
9 changes: 5 additions & 4 deletions README.md
@@ -10,15 +10,16 @@

`QuadratiK` provides the first implementation, in R and Python, of a comprehensive set of goodness-of-fit tests and a clustering technique for $d$-dimensional spherical data $d \ge 2$ using kernel-based quadratic distances. It includes:

- **Goodness-of-Fit Tests**: The software implements one, two, and *k*-sample tests for goodness of fit, offering an efficient and mathematically sound way to assess the fit of probability distributions. Expanded capabilities include supporting tests for uniformity on the *d*-dimensional Sphere based on Poisson kernel densities. Our tests are particularly useful for large, high dimensional data sets where the assessment of fit of probability models is of interest. Specifically, we offer tests for normality, as well as two- and *k*-sample tests, where testing equality of two or more distributions is of interest, that is $H_0: F_1 = F_2$ and $H_0: F_1 = \ldots = F_k$ respectively. The proposed tests perform well in terms of level and power for contiguous alternatives, heavy tailed distributions and in higher dimensions.
- **Goodness-of-Fit Tests**: The software implements one, two, and *k*-sample tests for goodness of fit, offering an efficient and mathematically sound way to assess the fit of probability distributions. Our tests are particularly useful for large, high dimensional data sets where the assessment of fit of probability models is of interest. Specifically, we offer tests for multivariate normality, as well as two- and *k*-sample tests, where testing equality of two or more distributions is of interest, that is $H_0: F_1 = F_2$ and $H_0: F_1 = \ldots = F_k$ respectively. The proposed tests perform well in terms of level and power for contiguous alternatives, heavy tailed distributions and in higher dimensions.\
Expanded capabilities include supporting tests for uniformity on the *d*-dimensional Sphere based on the Poisson kernel, exhibiting excellent results especially in the case of multimodal distributions.

- **Poisson kernel-based distribution (PKBD)**: the package offers functions for computing the density value and for generating random samples from a PKBD. The Poisson kernel-based densities are based on the normalized Poisson kernel and are defined on the $d$-dimensional unit sphere. Given a vector $\mu \in \mathcal{S}^{d-1}$, and a parameter $\rho$ such that $0 < \rho < 1$, the probability density function of a $d$-variate Poisson kernel-based density is defined by: $$f(\mathbf{x}|\rho, \mathbf{\mu}) = \frac{1-\rho^2}{\omega_d ||\mathbf{x} - \rho \mathbf{\mu}||^d},$$ where $\mu$ is a vector orienting the center of the distribution, $\rho$ is a parameter to control the concentration of the distribution around the vector $\mu$ and it is related to the variance of the distribution. Furthermore, $\omega_d = 2\pi^{d/2} [\Gamma(d/2)]^{-1}$ is the surface area of the unit sphere in $\mathbb{R}^d$ (see Golzy and Markatou, 2020).

- **Clustering Algorithm for Spherical Data**: the package incorporates a unique clustering algorithm specifically tailored for $d$-dimensional spherical data and it is especially useful in the presence of noise in the data and the presence of non-negligible overlap between clusters. This algorithm leverages a mixture of Poisson kernel-based densities on the $d$-dimensional Sphere, enabling effective clustering of spherical data or data that has been spherically transformed. The package also provides the functions for density evaluation and random sampling from the Poisson kernel-based distribution.
- **Clustering Algorithm for Spherical Data**: the package incorporates a unique clustering algorithm specifically tailored for $d$-dimensional spherical data and it is especially useful in the presence of noise in the data and the presence of non-negligible overlap between clusters. This algorithm leverages a mixture of Poisson kernel-based densities on the $d$-dimensional Sphere, enabling effective clustering of spherical data or data that has been spherically transformed.

- **Additional Features**: Alongside these functionalities, the software includes additional graphical functions, aiding users in validating and representing the cluster results as well as enhancing the interpretability and usability of the analysis.
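
The PKBD density formula in the list above can be evaluated directly in a few lines of base R. This is a minimal sketch for illustration, not the package's `dpkb()` implementation; the helper name `pkbd_density_sketch` and the example points are hypothetical.

```r
# Hypothetical helper (not the package's dpkb()): evaluates
#   f(x | rho, mu) = (1 - rho^2) / (omega_d * ||x - rho * mu||^d)
# for each row of x, where omega_d = 2 * pi^(d/2) / Gamma(d/2).
pkbd_density_sketch <- function(x, mu, rho) {
  stopifnot(is.matrix(x), length(mu) == ncol(x), rho > 0, rho < 1)
  d <- ncol(x)
  omega_d <- 2 * pi^(d / 2) / gamma(d / 2)  # surface area of the unit sphere
  centered <- sweep(x, 2, rho * mu)         # subtract rho * mu from every row
  dist <- sqrt(rowSums(centered^2))         # Euclidean norm ||x - rho * mu||
  (1 - rho^2) / (omega_d * dist^d)
}

# Three points on the unit sphere in R^3: at mu, opposite to mu, orthogonal to mu.
mu <- c(0, 0, 1)
x <- rbind(c(0, 0, 1), c(0, 0, -1), c(1, 0, 0))
dens <- pkbd_density_sketch(x, mu, rho = 0.5)
print(dens)
```

The density is largest at `mu` and smallest at the antipodal point, which matches the role of `rho` as a concentration parameter around `mu`.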

For an introduction to `QuadratiK` see the vignette [Introduction to the QuadratiK Package](https://giovsaraceno.github.io/QuadratiK-package/articles/Introduction.html).
For an introduction to the usage of `QuadratiK` see the vignette [Introduction to the QuadratiK Package](https://giovsaraceno.github.io/QuadratiK-package/articles/Introduction.html).

## Installation

@@ -40,7 +41,7 @@ The `QuadratiK` package is also available in Python on PyPI <https://pypi.org/pr
## Authors

Giovanni Saraceno, Marianthi Markatou, Raktim Mukhopadhyay, Mojgan Golzy\
Mantainer: Giovanni Saraceno \<[gsaracen\@buffalo.edu](mailto:gsaracen@buffalo.edu)\>
Maintainer: Giovanni Saraceno \<[gsaracen\@buffalo.edu](mailto:gsaracen@buffalo.edu)\>

## Citation
