Introduction
This blog post proclaims and describes the Raku package Data::Generators, [AAp0], that has functions for generating random strings, words, pet names, vectors, arrays, and (tabular) datasets.
Motivation
The primary motivation for this package is to have simple, intuitively named functions for generating random vectors (lists) and datasets of different objects.
Although Raku has fairly good support for random vector generation, it is assumed that commands like the following are easier to use:
say random-string(6, chars => 4, ranges => [ <y n Y N>, "0".."9" ] ).raku;
The function random-tabular-dataset of this package, together with the package “Data::ExampleDatasets”, [AAp2], made the development and testing of “Data::Reshapers”, [AAp1, AA1], easier and more complete.
Neat example
Here is an example that showcases almost all functions in this package:
use Data::Generators;
use Data::Reshapers;
random-tabular-dataset(12, <String Real Word PetName JobTitle>,
                       generators => {String   => &random-string,
                                      Real     => {random-real(12, $_)},
                                      Word     => &random-word,
                                      PetName  => &random-pet-name,
                                      JobTitle => &random-pretentious-job-title})
==> to-pretty-table
# +-------------+------------------------------------+-----------+--------------+--------------------+
# | Word | JobTitle | Real | PetName | String |
# +-------------+------------------------------------+-----------+--------------+--------------------+
# | patient | Regional Operations Specialist | 8.385052 | Dollly | 23xdiA2mqVohmDHMy |
# | subpoena | Legacy Intranet Technician | 0.958287 | Lotus Morris | cc0Xri |
# | mandala | Interactive Web Executive | 7.985123 | Tuxy | U3E |
# | abreaction | Human Directives Designer | 4.548451 | Guinness | 5GKP6cZG4YEgM |
# | Capra | Principal Security Technician | 3.281746 | Guinness | 0KKshx6gaib9 |
# | lanky | Principal Team Synergist | 10.157400 | Tori | 3NKTjHT |
# | instillment | National Implementation Strategist | 2.745015 | Guinness | NfHbJBGat34GYU1 |
# | atheromatic | Customer Accountability Assistant | 8.007973 | Piper | WW72 |
# | Fennic | Chief Infrastructure Designer | 2.059512 | Tenzin | GN0kf2EErAgKqHn2Y |
# | horniness | Future Configuration Assistant | 6.366264 | Max | Bbn |
# | infuriation | Global Program Coordinator | 10.330985 | Ariel | DpETKDXe3LFJrygx2D |
# | OPV | Relational Usability Developer | 11.454449 | Guinness | i7QlzkS1LNg84nKl6k |
# +-------------+------------------------------------+-----------+--------------+--------------------+
Random strings
The function random-string generates random strings.
Here is a random string:
use Data::Generators;
random-string
# wNK
Here we generate a vector of random strings with length 4 and characters that belong to specified ranges:
say random-string(6, chars => 4, ranges => [ <y n Y N>, "0".."9" ] ).raku;
# ("9712", "y367", "6YY3", "Y5NY", "436n", "5113")
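For comparison with the package call, the same kind of result can be sketched with core Raku alone. This is a minimal illustration, not the package's implementation: the helper name sketch-random-string is hypothetical, and it assumes characters are drawn uniformly and with replacement from the union of the given ranges.

```raku
# Hypothetical helper (not part of Data::Generators): builds $n strings, each of
# $chars characters drawn uniformly, with replacement, from the union of @ranges.
sub sketch-random-string(Int $n, Int :$chars = 4,
                         :@ranges = ('a'..'z', 'A'..'Z', '0'..'9')) {
    # Flatten every list or range into one pool of candidate characters
    my @pool = @ranges.map(*.list).flat;
    # Each string is $chars rolled characters joined together
    (^$n).map({ @pool.roll($chars).join }).List;
}

say sketch-random-string(6, chars => 4, ranges => [ <y n Y N>, "0".."9" ]).raku;
```

Using roll (sampling with replacement) rather than pick lets a character repeat within a string, as in "6YY3" above.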
Random words
The function random-word generates random words.
Here is a random word:
random-word
# dog-tired
Here we generate a list with 12 random words:
random-word(12)
# (reprehensible Bos diatomite humanlike vacuity headdress north sustenance anuran catheterization lambaste estimator)
Here we generate a table of random words of different types:
use Data::Reshapers;
my @dfWords = do for <Any Common Known Stop> -> $wt { $wt => random-word(6, type => $wt) };
say to-pretty-table(@dfWords);
# +--------+----------------+------------+-----------+----------+--------------+-----------+
# | | 2 | 3 | 0 | 5 | 4 | 1 |
# +--------+----------------+------------+-----------+----------+--------------+-----------+
# | Any | jilt | quitter | lockup | helmeted | entrenchment | Acrilan |
# | Common | interpretative | presumable | poisonous | riffle | betwixt | parochial |
# | Known | half-length | where'er | crinion | draped | Pelargonium | tribute |
# | Stop | he | has | last | mostly | she | V |
# +--------+----------------+------------+-----------+----------+--------------+-----------+
Remark: Whatever can be used instead of 'Any'.
Remark: The function to-pretty-table is from the package Data::Reshapers.
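To make the role of the type argument concrete, here is a core-Raku sketch under explicit assumptions: the tiny %lexicon is hypothetical (the package uses its own, much larger word data), words are drawn with replacement, and 'Any' or Whatever means all types combined.

```raku
# Hypothetical mini-lexicon keyed by word type (illustration only)
my %lexicon =
    Common => <house river table garden>,
    Known  => <Pelargonium crinion tribute>,
    Stop   => <he she has the>;

# Hypothetical helper: 'Any' or Whatever draws from every type at once
sub sketch-random-word(Int $n, :$type = 'Any') {
    my @pool = ($type ~~ Whatever || $type eq 'Any')
        ?? %lexicon.values.map(*.list).flat
        !! %lexicon{$type}.list;
    @pool.roll($n).List;
}

my @dfWords = do for <Any Common Known Stop> -> $wt { $wt => sketch-random-word(6, type => $wt) };
.say for @dfWords;
```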
Random pet names
The function random-pet-name generates random pet names.
The pet names are taken from publicly available data of pet license registrations in the years 2015–2020 in Seattle, WA, USA. See [DG1].
Here is a random pet name:
random-pet-name
# Elizabeth Montgomery
The following command generates a list of six random pet names:
srand(32);
random-pet-name(6).raku
# ("Guinness", "Professor Nibblesworth", "Piper", "Guinness", "Hazel", "Finn")
The named argument species can be used to specify the species of the random pet names (according to the species-name relationships in [DG1]).
Here we generate a table of random pet names of different species:
my @dfPetNames = do for <Any Cat Dog Goat Pig> -> $wt { $wt => random-pet-name(6, species => $wt) };
say to-pretty-table(@dfPetNames);
# +------+----------+------------------+----------+---------+--------------------+----------+
# | | 3 | 0 | 5 | 4 | 2 | 1 |
# +------+----------+------------------+----------+---------+--------------------+----------+
# | Any | Buscuit | Kimchi | Guinness | Atticus | Owen VanderMittens | Kimchi |
# | Cat | Data | Scilla | Dax | Sisu | Chai Son | Dinosaur |
# | Dog | Judo | Hai Hai | Fatso | Reso | Zita | Sounder |
# | Goat | Margot | Sister Bertrille | Pepina | Beans | Pegasis | Estelle |
# | Pig | Guinness | Millie | Guinness | Atticus | Guinness | Atticus |
# +------+----------+------------------+----------+---------+--------------------+----------+
Remark: Whatever can be used instead of 'Any'.
The named argument (adverb) weighted can be used to weight the random pet name choice by the known real-life numbers of occurrences:
srand(32);
say random-pet-name(6, :weighted).raku
# ("Arya", "Millie", "Tica", "Professor Nibblesworth", "Olive", "Peanut")
The weights used correspond to the counts from [DG1].
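Weighted choice of this kind can be expressed in core Raku with a Mix, whose roll method draws keys with probability proportional to their weights. The counts below are made up for illustration (they are not the [DG1] counts), and the package may implement the weighting differently.

```raku
# Made-up counts (NOT the Seattle [DG1] data): name => number of licenses
my %counts = Lucy => 40, Charlie => 30, Bella => 20, Guinness => 10;

# A Mix holds keys with real-valued weights; .roll($n) draws $n keys with
# replacement, each with probability weight / (sum of weights)
my $weighted = %counts.Mix;
say $weighted.roll(6).List.raku;

# Unweighted counterpart: every name is equally likely
say %counts.keys.List.roll(6).List.raku;
```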
Remark: The implementation of random-pet-name is based on the Mathematica implementation RandomPetName, [AAf1].
Random pretentious job titles
The function random-pretentious-job-title generates random pretentious job titles.
Here is a random pretentious job title:
random-pretentious-job-title
# Relational Optimization Analyst
The following command generates a list of six random pretentious job titles:
random-pretentious-job-title(6).raku
# ("National Factors Engineer", "Chief Quality Designer", "Dynamic Tactics Architect", "Regional Identity Engineer", "Legacy Mobility Analyst", "Lead Group Facilitator")
The named argument number-of-words can be used to control the number of words in the generated job titles.
The named argument language can be used to control the language of the generated job titles. At this point only Bulgarian and English are supported.
Here we generate pretentious job titles using different languages and numbers of words per title:
my $res = random-pretentious-job-title(12, number-of-words => Whatever, language => Whatever);
say to-pretty-table($res.rotor(3));
# +------------------------------+--------------------------------+------------------------+
# | 0 | 1 | 2 |
# +------------------------------+--------------------------------+------------------------+
# | Directives Director | Customer Impact Orchestrator | Изпълнител |
# | Администратор по Идентичност | Наследствен Продуцент по Данни | Manager |
# | Стратег | Централен Асистент на Програми | Представител по Пазари |
# | Infrastructure Administrator | Директор на Показатели | Продуцент по Качество |
# +------------------------------+--------------------------------+------------------------+
Remark: Whatever can be used as a value for the named arguments number-of-words and language.
Remark: The implementation uses the job title phrases in https://www.bullshitjob.com. It is, more or less, based on the Mathematica implementation RandomPretentiousJobTitle, [AAf2].
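The idea of controlling title length by the number of phrase slots can be sketched as follows. The phrase lists are small hypothetical samples, and the mapping from number-of-words to slots (a trailing role preceded by up to two modifiers) is an assumption, not necessarily how random-pretentious-job-title works.

```raku
# Hypothetical phrase lists (small samples for illustration only)
my @descriptors = <Chief Lead Global Regional Legacy Dynamic>;
my @areas       = <Directives Quality Mobility Identity Tactics>;
my @roles       = <Engineer Architect Analyst Designer Facilitator>;

# Hypothetical helper: 1 word gives a role, 2 words give area + role,
# 3 words give descriptor + area + role; Whatever picks 1, 2, or 3 at random
sub sketch-job-title(:$number-of-words = 3) {
    my $k = $number-of-words ~~ Whatever ?? (1..3).pick !! $number-of-words;
    my @parts = @descriptors.pick, @areas.pick, @roles.pick;
    @parts.tail($k).join(' ');
}

say sketch-job-title() for ^3;
say sketch-job-title(number-of-words => Whatever) for ^3;
```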
Random reals
This module provides the function random-real that can be used to generate lists of real numbers using the uniform distribution.
Here is a random real:
say random-real();
# 0.4802730218140824
Here is a random real between 0 and 20:
say random-real(20);
# 16.551017127142547
Here are six random reals between -2 and 12:
say random-real([-2,12], 6);
# (10.860038992053804 8.870681165460763 1.8612006267010126 6.575438206723781 7.519498832129216 11.977501199028579)
Here is a 4-by-3 array of random reals between -3 and 3:
say random-real([-3,3], [4,3]);
# [[-1.3846938458919782 -0.605344400722295 2.2357479203127255]
# [0.278595195645174 2.758777563339059 -2.9181888559528977]
# [-0.4885047451940778 -1.5054832770870736 2.8380077665645045]
# [1.5546501176218097 -2.5686794378128166 -1.6379792732766252]]
Remark: The signature design follows Mathematica’s function RandomReal.
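Uniform reals come from an affine map of Raku's rand, which returns a number in [0, 1). The following sketch mimics the four call forms above under the assumption that a single number means the interval [0, max) and a two-element list means [min, max); the multi sub sketch-random-real and the helper uniform-real are hypothetical names.

```raku
# Hypothetical helper: a uniform real in [$min, $max) via an affine map of rand
sub uniform-real(Real $min, Real $max) { $min + ($max - $min) * rand }

# Hypothetical multi sub mimicking the call forms above
multi sketch-random-real() { rand }                                   # [0, 1)
multi sketch-random-real(Real $max) { uniform-real(0, $max) }        # [0, max)
multi sketch-random-real(@range, Int $n) {                            # n values
    (^$n).map({ uniform-real(|@range) }).List
}
multi sketch-random-real(@range, @shape) {                            # rows x cols
    my ($rows, $cols) = @shape;
    (^$rows).map({ sketch-random-real(@range, $cols) }).List
}

say sketch-random-real();
say sketch-random-real(20);
say sketch-random-real([-2, 12], 6);
say sketch-random-real([-3, 3], [4, 3]);
```

Multiple dispatch separates the count form from the shape form by the type of the second argument: an Int selects a flat list, a list selects a 2D array.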
Random variates
This module provides the function random-variate that can be used to generate lists of real numbers using distribution specifications.
Here are examples:
say random-variate(NormalDistribution.new(:mean(10), :sd(20)), 5);
# (12.184147503254014 -27.658666484011263 -3.646985931508887 -1.5511925051960098 34.2779407991513)
say random-variate(NormalDistribution.new( µ => 10, σ => 20), 5);
# (59.67722740301401 -4.880821000242969 24.241271309947653 -7.357817156195406 -14.956902533655022)
say random-variate(UniformDistribution.new(:min(2), :max(60)), 5);
# (17.194175133749432 23.15167681651898 56.6173691500257 27.80856840150481 12.017787851638628)
Remark: Only the normal and uniform distributions are implemented at this point.
Remark: The signature design follows Mathematica’s function RandomVariate.
Here is an example of 2D array generation:
say random-variate(NormalDistribution.new, [3,4]);
# [[-0.5885759132871321 1.0718919022565259 0.5680526996535528 0.26989969624621957]
# [1.5020489325130406 -2.5505806920314713 0.08275668090525383 0.4534963073944675]
# [1.6370825945664167 2.2703387469727234 0.7333457734376895 -0.4230187677033482]]
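For the normal distribution, one standard way to turn uniform values into normal variates is the Box-Muller transform. The sketch below uses it purely as an illustration (the package may use a different method); sketch-normal-variates is a hypothetical name, and its mean/sd parameters mirror the NormalDistribution arguments above.

```raku
# Hypothetical helper: $n normal variates with the given mean and standard
# deviation, via the Box-Muller transform (one of several standard methods)
sub sketch-normal-variates(Int $n, Real :$mean = 0, Real :$sd = 1) {
    (^$n).map({
        my $u1 = 1 - rand;                                  # in (0, 1], so log is finite
        my $u2 = rand;                                      # in [0, 1)
        my $z  = sqrt(-2 * log($u1)) * cos(2 * pi * $u2);   # standard normal
        $mean + $sd * $z                                    # shift and scale
    }).List
}

say sketch-normal-variates(5, mean => 10, sd => 20);
```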
Random tabular datasets
The function random-tabular-dataset can be used to generate tabular datasets.
Remark: In this module a dataset is (usually) an array of arrays of pairs. The dataset data structure closely resembles Mathematica’s data structure Dataset, [WRI2].
Remark: The programming languages R and S have a data structure called “data frame” that corresponds to a dataset. (In the Python world the package pandas provides data frames.) Data frames, though, are column-centric, not row-centric like datasets. For example, data frames do not allow a column to have elements of heterogeneous types.
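The row-centric versus column-centric distinction can be made concrete with a small core-Raku sketch. Rows are represented here as an array of hashes, an assumption chosen for illustration, and the hypothetical helper to-columns transposes them into a hash of column arrays, which is the shape a data frame uses. Note that the tag column ends up holding both a string and a number.

```raku
# A row-centric dataset: one hash per row (a column may mix value types)
my @rows =
    { name => 'Kaya',  score => 21.8, tag => 'tutelar' },
    { name => 'Zimba', score => -1.9, tag => 42 };

# Hypothetical helper: transpose rows into a hash of column arrays
sub to-columns(@rows) {
    my %cols;
    for @rows -> %row {
        # .push on an undefined hash value autovivifies an Array
        %cols{$_}.push(%row{$_}) for %row.keys;
    }
    %cols
}

my %cols = to-columns(@rows);
say %cols<score>;   # [21.8 -1.9]
```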
Here are basic calls:
random-tabular-dataset();
random-tabular-dataset(Whatever):row-names;
random-tabular-dataset(Whatever, Whatever);
random-tabular-dataset(12, 4);
random-tabular-dataset(Whatever, 4);
random-tabular-dataset(Whatever, <Col1 Col2 Col3>):!row-names;
Here is an example of a generated tabular dataset with column names that are cat pet names:
my @dfRand = random-tabular-dataset(5, 3, column-names-generator => { random-pet-name($_, species => 'Cat') });
say to-pretty-table(@dfRand);
# +-----------+--------------+---------------------+
# | Kaya | Miss Bangkok | Zimba |
# +-----------+--------------+---------------------+
# | 21.831185 | tutelar | 4rfg |
# | -1.950408 | churchyard | FlkByRk2aDZsF62VH91 |
# | 7.361637 | syllabize | 3QAuxEiYPCdh9IW |
# | 7.999150 | azurite | WIvf4MJ7OFk |
# | 10.155547 | bilimbi | 95IVTlR8VMp2S |
# +-----------+--------------+---------------------+
The display function to-pretty-table is from Data::Reshapers.
Remark: At this point only wide-format datasets are generated. (The long-format implementation is high on my TODO list.)
Remark: The signature design and implementation are based on the Mathematica implementation RandomTabularDataset, [AAf3].
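Assembling a dataset from per-column generators can be sketched in core Raku. In the neat example above each generator appears to be called with the number of values to produce; the sketch assumes the same convention. The helper sketch-tabular-dataset and its argument layout are hypothetical, and the result is an array of hashes, one per row.

```raku
# Hypothetical helper: build $n rows from a map of column name => generator,
# where each generator receives the number of values it must produce
sub sketch-tabular-dataset(Int $n, %generators) {
    # Generate each column with a single generator call
    my %columns = %generators.map(-> $g { $g.key => $g.value.($n).List });
    # Assemble row $i from the $i-th value of every column
    (^$n).map(-> $i { %( %columns.map(-> $col { $col.key => $col.value[$i] }) ) }).Array
}

my @ds = sketch-tabular-dataset(3, %(
    Real   => { (^$_).map({ 20.rand }).List },
    Letter => { ('a'..'z').roll($_).List },
));
.say for @ds;
```

Generating whole columns first and then zipping them into rows keeps each generator call vectorized, which matches how functions such as random-real accept a count.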
References
Articles
[AA1] Anton Antonov, “Introduction to Data Wrangling with Raku”, (2021), RakuForPrediction at WordPress.
[AA2] Anton Antonov, “Pets licensing data analysis”, (2020), MathematicaForPrediction at WordPress.
Data repositories
[DG1] Data.Gov, Seattle Pet Licenses, catalog.data.gov.
Functions
[AAf1] Anton Antonov, RandomPetName, (2021), Wolfram Function Repository.
[AAf2] Anton Antonov, RandomPretentiousJobTitle, (2021), Wolfram Function Repository.
[AAf3] Anton Antonov, RandomTabularDataset, (2021), Wolfram Function Repository.
[SHf1] Sander Huisman, RandomString, (2021), Wolfram Function Repository.
[WRI1] Wolfram Research, RandomVariate, (2010), Wolfram Language function.
[WRI2] Wolfram Research, Dataset, (2014), Wolfram Language function.
Packages
[AAp0] Anton Antonov, Data::Generators Raku package, (2021), GitHub/antononcube.
[AAp1] Anton Antonov, Data::Reshapers Raku package, (2021), GitHub/antononcube.
[AAp2] Anton Antonov, Data::ExampleDatasets Raku package, (2021), GitHub/antononcube.