WWW::OpenAI (for ChatGPT and other statistical gimmicks)

In brief

The Raku package “WWW::OpenAI” provides access to the machine learning service OpenAI, [OAI1]. For more details on using OpenAI’s API see the documentation, [OAI2].

Remark: To use the OpenAI API one has to register and obtain an authorization key.

Remark: This Raku package is much “less ambitious” than the official Python package, [OAIp1], developed by OpenAI’s team. Gradually, over time, I expect to add features to the Raku package that correspond to features of [OAIp1].

The design and implementation of “WWW::OpenAI” are very similar to those of “Lingua::Translation::DeepL”, [AAp1].


Installation

Package installations from both sources use the zef installer (which should be bundled with the “standard” Rakudo installation file).

To install the package from the Zef ecosystem use the shell command:

zef install WWW::OpenAI

To install the package from the GitHub repository use the shell command:

zef install https://github.com/antononcube/Raku-WWW-OpenAI.git


Usage examples

Remark: When the authorization key, auth-key, is Whatever, openai-playground attempts to use the environment variable OPENAI_API_KEY.
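
Here is a minimal sketch of the fallback described in the remark. The helper resolve-auth-key is hypothetical (it is not part of “WWW::OpenAI”) and only illustrates the described behavior under the assumption that an explicit string key takes precedence and that Whatever defers to OPENAI_API_KEY:

```raku
# Hypothetical helper (not part of WWW::OpenAI) that mimics the remark above.
sub resolve-auth-key($auth-key = Whatever) {
    # An explicitly given, non-empty string is used as the key.
    return $auth-key if $auth-key ~~ Str:D && $auth-key.chars > 0;

    # Whatever means "look the key up in the environment".
    if $auth-key.isa(Whatever) {
        return %*ENV<OPENAI_API_KEY>
            // die 'Set OPENAI_API_KEY or pass an explicit auth-key.';
    }

    die 'auth-key is expected to be a string or Whatever.';
}

# Usage: an explicit key takes precedence over the environment variable.
say resolve-auth-key('my-explicit-key');
```

Failing early when neither source provides a key is a design choice of this sketch; the package itself may report the problem differently.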

Basic usage

Here is a simple call:

use WWW::OpenAI;
say openai-playground('Where is Roger Rabbit?');

# [{finish_reason => stop, index => 0, message => {content => 
# 
# As an AI language model, I do not have access to real-time information or location tracking features. Therefore, I cannot provide an accurate answer to the question "Where is Roger Rabbit?" without additional context. However, if you are referring to the fictional character Roger Rabbit, he is a cartoon character created by Disney and is typically found in various media, including films, television shows, and comic books., role => assistant}}]
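
The printed result is a list of hash-maps. Here is a sketch of extracting only the answer text, assuming the result always has the structure shown in the output above. The data below is a hand-written mock with a truncated message, so no API call is made:

```raku
# Mock result with the same shape as the output printed above (no API call is made).
my @result = [
    { finish_reason => 'stop',
      index         => 0,
      message       => { role => 'assistant', content => "\n\nAs an AI language model, ..." } },
];

# Collect the message texts, trimming the leading blank lines.
my @answers = @result.map({ $_<message><content>.trim });

.say for @answers;
```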

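Another one using Bulgarian (the prompt asks “How many groups can be found in this cloud of points?” and the reply says that, without seeing the cloud of points, no exact answer can be given):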

say openai-playground('Колко групи могат да се намерят в този облак от точки.');

# [{finish_reason => stop, index => 0, message => {content => 
# 
# Като асистент на AI, не мога да видя облак от точки, за да мога да дам точен отговор на този въпрос. Моля, предоставете повече информация или конкретен пример, за да мога да ви помогна., role => assistant}}]


Command Line Interface

The package provides a Command Line Interface (CLI) script:

openai-playground --help

# Usage:
#   openai-playground <text> [-m|--model=<Str>] [-r|--role=<Str>] [-t|--temperature[=Real]] [-a|--auth-key=<Str>] [--timeout[=UInt]] [--format=<Str>] -- Text processing using the OpenAI API.
#   
#     <text>                     Text to be processed.
#     -m|--model=<Str>           Model. [default: 'Whatever']
#     -r|--role=<Str>            Role. [default: 'user']
#     -t|--temperature[=Real]    Temperature. [default: 0.7]
#     -a|--auth-key=<Str>        Authorization key (to use OpenAI API.) [default: 'Whatever']
#     --timeout[=UInt]           Timeout. [default: 10]
#     --format=<Str>             Format of the result; one of "json" or "hash". [default: 'json']

Remark: When the authorization key argument “auth-key” is set to “Whatever”, openai-playground attempts to use the environment variable OPENAI_API_KEY.


Mermaid diagram

The following flowchart corresponds to the steps in the package function openai-playground:

[Mermaid flowchart of the openai-playground steps]


References

[AAp1] Anton Antonov, Lingua::Translation::DeepL Raku package, (2022), GitHub/antononcube.

[OAI1] OpenAI Platform, OpenAI platform.

[OAI2] OpenAI Platform, OpenAI documentation.

[OAIp1] OpenAI, OpenAI Python Library, (2020), GitHub/openai.

ML::ROCFunctions

This blog post introduces and describes the Raku package “ML::ROCFunctions”, [AAp0], which facilitates the use of Receiver Operating Characteristic (ROC) functions.

The ROC framework is used for analysis and tuning of binary classifiers, [Wk1]. (The classifiers are assumed to classify into a positive/true label or a negative/false label.)

For a computational introduction to ROC utilization (in Mathematica) see the article “Basic example of using ROC with Linear regression”, [AA1].

This package has counterparts in Mathematica, Python, and R. See [AAp1, AAp2, AAp3].

The examples below use the packages “Data::Generators”, [AAp4], “Data::Reshapers”, [AAp5, AA3], and “Data::Summarizers”, [AAp6], described in the article “Introduction to data wrangling with Raku”, [AA2].


Installation

From the Zef ecosystem:

zef install ML::ROCFunctions

From GitHub:

zef install https://github.com/antononcube/Raku-ML-ROCFunctions

Usage examples

Properties

Here are some retrieval functions:

use ML::ROCFunctions;
say roc-functions('properties');
# (FunctionInterpretations FunctionNames Functions Methods Properties)
say roc-functions('FunctionInterpretations');
# {ACC => accuracy, AUROC => area under the ROC curve, Accuracy => same as ACC, F1 => F1 score, FDR => false discovery rate, FNR => false negative rate, FOR => false omission rate, FPR => false positive rate, MCC => Matthews correlation coefficient, NPV => negative predictive value, PPV => positive predictive value, Precision => same as PPV, Recall => same as TPR, SPC => specificity, Sensitivity => same as TPR, TNR => true negative rate, TPR => true positive rate}
say roc-functions('FPR');
# &FPR

Single ROC record

Definition: A ROC record (ROC-hash or ROC-hash-map) is an object of type Associative that has the keys: “FalseNegative”, “FalsePositive”, “TrueNegative”, “TruePositive”. Here is an example:

{FalseNegative => 50, FalsePositive => 51, TrueNegative => 60, TruePositive => 39}
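
The definition can be checked mechanically. Here is a minimal sketch; the function is-roc-record is hypothetical (it is not claimed to be part of “ML::ROCFunctions”), and it assumes that additional keys are allowed as long as the four required ones are present:

```raku
# Hypothetical checker (not part of ML::ROCFunctions) for the definition above.
sub is-roc-record($obj --> Bool) {
    my @required = <FalseNegative FalsePositive TrueNegative TruePositive>;
    # Must be Associative and contain all four keys; extra keys are tolerated (an assumption).
    return $obj ~~ Associative && @required (<=) $obj.keys;
}

say is-roc-record({FalseNegative => 50, FalsePositive => 51, TrueNegative => 60, TruePositive => 39});
say is-roc-record({FalseNegative => 50, TruePositive => 39});
```

The first call prints True and the second prints False, because two of the required keys are missing.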

Here we generate a random “dataset” with columns “Actual” and “Predicted” that have the values “true” and “false” and show the summary:

use Data::Generators;
use Data::Summarizers;
my @dfRandomLabels = 
        random-tabular-dataset(200, <Actual Predicted>, 
        generators => {Actual => <true false>, 
                       Predicted => <true false>});
records-summary(@dfRandomLabels)
# +--------------+--------------+
# | Predicted    | Actual       |
# +--------------+--------------+
# | true  => 103 | false => 106 |
# | false => 97  | true  => 94  |
# +--------------+--------------+

Here is a sample of the dataset:

use Data::Reshapers;
to-pretty-table(@dfRandomLabels.pick(6))
# +-----------+--------+
# | Predicted | Actual |
# +-----------+--------+
# |   false   | false  |
# |    true   | false  |
# |   false   |  true  |
# |   false   | false  |
# |    true   | false  |
# |   false   |  true  |
# +-----------+--------+

Here we make the corresponding ROC hash-map:

to-roc-hash('true', 'false', @dfRandomLabels.map({$_<Actual>}), @dfRandomLabels.map({$_<Predicted>}))
# {FalseNegative => 49, FalsePositive => 58, TrueNegative => 48, TruePositive => 45}
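
To make the four counts concrete, here is a plain-Raku sketch that tallies them by hand. The function count-roc-hash is a hypothetical stand-in written for illustration; it follows the argument order used with to-roc-hash above (true label, false label, actual labels, predicted labels) but is not the package's implementation:

```raku
# Hypothetical re-implementation for illustration; the package function is to-roc-hash.
sub count-roc-hash(Str $true-label, Str $false-label, @actual, @predicted) {
    my %roc = TruePositive => 0, FalsePositive => 0, TrueNegative => 0, FalseNegative => 0;
    # Walk over the (actual, predicted) pairs and increment the matching cell.
    for @actual Z @predicted -> ($a, $p) {
        if    $a eq $true-label  && $p eq $true-label  { %roc<TruePositive>++  }
        elsif $a eq $false-label && $p eq $true-label  { %roc<FalsePositive>++ }
        elsif $a eq $false-label && $p eq $false-label { %roc<TrueNegative>++  }
        elsif $a eq $true-label  && $p eq $false-label { %roc<FalseNegative>++ }
    }
    return %roc;
}

my @actual    = <true true  false false true>;
my @predicted = <true false true  false true>;

# Two true positives and one of each other kind (the printed key order may vary).
say count-roc-hash('true', 'false', @actual, @predicted);
```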

Multiple ROC records

Here we make a random dataset with entries that are associated with a threshold parameter that has three unique values:

my @dfRandomLabels2 = 
        random-tabular-dataset(200, <Threshold Actual Predicted>, 
                generators => {Threshold => (0.2, 0.4, 0.6), 
                               Actual => <true false>, 
                               Predicted => <true false>});
records-summary(@dfRandomLabels2)
# +--------------+-----------------+--------------+
# | Predicted    | Threshold       | Actual       |
# +--------------+-----------------+--------------+
# | true  => 105 | Min    => 0.2   | false => 107 |
# | false => 95  | 1st-Qu => 0.2   | true  => 93  |
# |              | Mean   => 0.402 |              |
# |              | Median => 0.4   |              |
# |              | 3rd-Qu => 0.6   |              |
# |              | Max    => 0.6   |              |
# +--------------+-----------------+--------------+

Remark: Threshold parameters are typically used while tuning Machine Learning (ML) classifiers.

Here we group the rows of the dataset by the unique threshold values:

my %groups = group-by(@dfRandomLabels2, 'Threshold');
records-summary(%groups)
# summary of 0.4 =>
# +---------------+-------------+-------------+
# | Threshold     | Actual      | Predicted   |
# +---------------+-------------+-------------+
# | Min    => 0.4 | false => 37 | true  => 36 |
# | 1st-Qu => 0.4 | true  => 35 | false => 36 |
# | Mean   => 0.4 |             |             |
# | Median => 0.4 |             |             |
# | 3rd-Qu => 0.4 |             |             |
# | Max    => 0.4 |             |             |
# +---------------+-------------+-------------+
# summary of 0.6 =>
# +-------------+---------------+-------------+
# | Actual      | Threshold     | Predicted   |
# +-------------+---------------+-------------+
# | true  => 33 | Min    => 0.6 | false => 33 |
# | false => 32 | 1st-Qu => 0.6 | true  => 32 |
# |             | Mean   => 0.6 |             |
# |             | Median => 0.6 |             |
# |             | 3rd-Qu => 0.6 |             |
# |             | Max    => 0.6 |             |
# +-------------+---------------+-------------+
# summary of 0.2 =>
# +---------------+-------------+-------------+
# | Threshold     | Actual      | Predicted   |
# +---------------+-------------+-------------+
# | Min    => 0.2 | false => 38 | true  => 37 |
# | 1st-Qu => 0.2 | true  => 25 | false => 26 |
# | Mean   => 0.2 |             |             |
# | Median => 0.2 |             |             |
# | 3rd-Qu => 0.2 |             |             |
# | Max    => 0.2 |             |             |
# +---------------+-------------+-------------+
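
The same kind of grouping can be sketched with Raku's built-in classify method, without any package. The small dataset below is hand-written for illustration, and the result is only assumed to be similar in shape to what group-by returns (a hash-map from threshold to rows):

```raku
# Small hand-written dataset with the same columns as above (values are illustrative).
my @data =
    { Threshold => 0.2, Actual => 'true',  Predicted => 'false' },
    { Threshold => 0.4, Actual => 'false', Predicted => 'false' },
    { Threshold => 0.2, Actual => 'true',  Predicted => 'true'  },
    { Threshold => 0.6, Actual => 'false', Predicted => 'true'  };

# classify returns a hash-map from each distinct threshold to the list of its rows.
my %by-threshold = @data.classify({ $_<Threshold> });

# Print the group sizes in key order.
for %by-threshold.sort(*.key) -> $g {
    say $g.key, ' => ', $g.value.elems, ' row(s)';
}
```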

Here we find and print the ROC records (hash-maps) for each unique threshold value:

my @rocs = do for %groups.kv -> $k, $v { 
  to-roc-hash('true', 'false', $v.map({$_<Actual>}), $v.map({$_<Predicted>})) 
}
.say for @rocs;
# {FalseNegative => 19, FalsePositive => 20, TrueNegative => 17, TruePositive => 16}
# {FalseNegative => 15, FalsePositive => 14, TrueNegative => 18, TruePositive => 18}
# {FalseNegative => 11, FalsePositive => 23, TrueNegative => 15, TruePositive => 14}

Application of ROC functions

Here we define a list of ROC functions:

my @funcs = (&PPV, &NPV, &TPR, &ACC, &SPC, &MCC);
# [&PPV &NPV &TPR &ACC &SPC &MCC]

Here we apply each ROC function to each of the ROC records obtained above:

my @rocRes = @rocs.map( -> $r { @funcs.map({ $_.name => $_($r) }).Hash });
say to-pretty-table(@rocRes);
# +----------+-----------+----------+----------+----------+----------+
# |   ACC    |    MCC    |   NPV    |   PPV    |   TPR    |   SPC    |
# +----------+-----------+----------+----------+----------+----------+
# | 0.458333 | -0.083398 | 0.472222 | 0.444444 | 0.457143 | 0.459459 |
# | 0.553846 |  0.107970 | 0.545455 | 0.562500 | 0.545455 | 0.562500 |
# | 0.460317 | -0.045894 | 0.576923 | 0.378378 | 0.560000 | 0.394737 |
# +----------+-----------+----------+----------+----------+----------+
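
As a cross-check, here is a sketch that computes five of the measures for the first ROC record directly from their textbook definitions, without the package functions. The variable names are chosen for this example only, and MCC is left out of the sketch:

```raku
# First ROC record printed above.
my %r = FalseNegative => 19, FalsePositive => 20, TrueNegative => 17, TruePositive => 16;
my ($fn, $fp, $tn, $tp) = %r<FalseNegative FalsePositive TrueNegative TruePositive>;

# Textbook definitions, written out by hand (not the package's code).
my %res =
    PPV => $tp / ($tp + $fp),                      # positive predictive value
    NPV => $tn / ($tn + $fn),                      # negative predictive value
    TPR => $tp / ($tp + $fn),                      # true positive rate
    SPC => $tn / ($tn + $fp),                      # specificity
    ACC => ($tp + $tn) / ($tp + $fp + $tn + $fn);  # accuracy

for %res.sort(*.key) -> $p {
    say $p.key, ' => ', $p.value.round(0.000001);
}
```

The rounded values agree with the ACC, NPV, PPV, SPC, and TPR entries in the first row of the table above.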

ROC plots

Classifiers are often evaluated using ROC curves, which plot TPR against FPR. Here is a plot made with Mathematica using the Mathematica-to-Raku connection described in [AA4]:

[Figure: ROC curve for a Tries classifier over Titanic data]
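
Here is a sketch of how the points of such a curve can be obtained from ROC records. It uses the three records printed earlier and the textbook definitions FPR = FP / (FP + TN) and TPR = TP / (TP + FN), written out by hand instead of calling the package functions. Since those records come from random labels, the points are expected to lie near the diagonal:

```raku
# The three ROC records printed earlier in this post.
my @rocs =
    { FalseNegative => 19, FalsePositive => 20, TrueNegative => 17, TruePositive => 16 },
    { FalseNegative => 15, FalsePositive => 14, TrueNegative => 18, TruePositive => 18 },
    { FalseNegative => 11, FalsePositive => 23, TrueNegative => 15, TruePositive => 14 };

# One (FPR, TPR) point per record, using the textbook definitions.
my @points = @rocs.map(-> %r {
    my $fpr = %r<FalsePositive> / (%r<FalsePositive> + %r<TrueNegative>);
    my $tpr = %r<TruePositive>  / (%r<TruePositive>  + %r<FalseNegative>);
    ($fpr, $tpr)
});

# Sort by FPR so the points can be joined into a curve from left to right.
for @points.sort(*[0]) -> ($fpr, $tpr) {
    say 'FPR = ', $fpr.round(0.001), ', TPR = ', $tpr.round(0.001);
}
```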


References

Articles

[Wk1] Wikipedia entry, “Receiver operating characteristic”.

[AA1] Anton Antonov, “Basic example of using ROC with Linear regression”, (2016), MathematicaForPrediction at WordPress.

[AA2] Anton Antonov, “Introduction to data wrangling with Raku”, (2021), RakuForPrediction at WordPress.

[AA3] Anton Antonov, “Data::Reshapers”, (2022), RakuForPrediction at WordPress.

[AA4] Anton Antonov, “Connecting Raku to Mathematica”, (2021), RakuForPrediction-book at GitHub.

Packages

[AAp0] Anton Antonov, ML::ROCFunctions Raku package, (2022), GitHub/antononcube.

[AAp1] Anton Antonov, ROCFunctions Mathematica package, (2016-2022), MathematicaForPrediction at GitHub/antononcube.

[AAp2] Anton Antonov, ROCFunctions Python package, (2022), Python-packages at GitHub/antononcube.

[AAp3] Anton Antonov, ROCFunctions R package, (2021), R-packages at GitHub/antononcube.

[AAp4] Anton Antonov, Data::Generators Raku package, (2021), GitHub/antononcube.

[AAp5] Anton Antonov, Data::Reshapers Raku package, (2021), GitHub/antononcube.

[AAp6] Anton Antonov, Data::Summarizers Raku package, (2021), GitHub/antononcube.