A synthetic dataset consisting of two clusters (rows 1:500 and 501:1000). X1 values overlap between the clusters, X2 values do not overlap, X3 (categorical) takes a different level in each cluster, and X4 (categorical) has four levels, two per cluster. X2, X3, and X4 are informative of the cluster; X1 is not.

Usage

dataset_5

Format

`dataset_5` A data frame with 1,000 rows and 4 columns:

X1

Continuous variable, mean = 8, sd = 1 in both clusters

X2

Continuous variable, mean = 2, sd = 1 in one cluster; mean = 9, sd = 1 in the other

X3

Categorical variable, levels A and B; each cluster takes a single level

X4

Categorical variable, levels 1, 2, 3, and 4; one cluster takes levels 1 and 2, the other takes levels 3 and 4

Source

Synthetic
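
The structure described above can be illustrated with a minimal simulation sketch. This is not the code used to build `dataset_5`. The object name `sim`, the seed, and the mapping of parameters to clusters (mean 2, level A, and levels 1/2 for rows 1:500) are assumptions made for illustration only.

```r
# Hypothetical reconstruction; not the package's actual generating code.
set.seed(1)  # arbitrary seed (assumption)
n <- 500     # rows per cluster: 1:500 and 501:1000

sim <- data.frame(
  # X1: same distribution in both clusters, so it carries no cluster signal
  X1 = rnorm(2 * n, mean = 8, sd = 1),
  # X2: assumed mean 2 in the first cluster and mean 9 in the second
  X2 = c(rnorm(n, mean = 2, sd = 1), rnorm(n, mean = 9, sd = 1)),
  # X3: assumed "A" for the first cluster and "B" for the second
  X3 = factor(rep(c("A", "B"), each = n)),
  # X4: assumed levels 1/2 in the first cluster and 3/4 in the second
  X4 = factor(
    c(sample(1:2, n, replace = TRUE), sample(3:4, n, replace = TRUE)),
    levels = 1:4
  )
)

# Cluster labels implied by row position
cluster <- rep(1:2, each = n)

# Cross-tabulations show which categorical levels occur in which cluster
table(cluster, sim$X3)
table(cluster, sim$X4)

# Per-cluster means of the continuous variables
aggregate(cbind(X1, X2) ~ cluster,
          data = cbind(sim, cluster = cluster),
          FUN = mean)
```

Running the same cross-tabulations and per-cluster means on `dataset_5` itself (with `cluster` defined by row position as above) is one way to check which variables separate the two clusters.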