WWW::Gemini

This blog post proclaims the Raku package “WWW::Gemini” for connecting with Google’s Large Language Models (LLMs) service Gemini. It is based on Gemini’s API documentation.

The design and implementation of the package closely follows those of “WWW::PaLM”, [AAp1], and “WWW::OpenAI”, [AAp2].

Installation

From Zef ecosystem:

zef install WWW::Gemini

From GitHub:

zef install https://github.com/antononcube/Raku-WWW-Gemini

Usage examples

Show models:

use WWW::Gemini;

gemini-models()

# (models/chat-bison-001 models/text-bison-001 models/embedding-gecko-001 models/gemini-1.0-pro models/gemini-1.0-pro-001 models/gemini-1.0-pro-latest models/gemini-1.0-pro-vision-latest models/gemini-pro models/gemini-pro-vision models/embedding-001 models/aqa)

Show text generation:

.say for gemini-generate-content('what is the population in Brazil?', format => 'values');

# 215,351,056 (2023 est.)

Using a synonym function:

.say for gemini-generation('Who wrote the book "Dune"?');

# {candidates => [{content => {parts => [{text => Frank Herbert}], role => model}, finishReason => STOP, index => 0, safetyRatings => [{category => HARM_CATEGORY_SEXUALLY_EXPLICIT, probability => NEGLIGIBLE} {category => HARM_CATEGORY_HATE_SPEECH, probability => NEGLIGIBLE} {category => HARM_CATEGORY_HARASSMENT, probability => NEGLIGIBLE} {category => HARM_CATEGORY_DANGEROUS_CONTENT, probability => NEGLIGIBLE}]}], promptFeedback => {safetyRatings => [{category => HARM_CATEGORY_SEXUALLY_EXPLICIT, probability => NEGLIGIBLE} {category => HARM_CATEGORY_HATE_SPEECH, probability => NEGLIGIBLE} {category => HARM_CATEGORY_HARASSMENT, probability => NEGLIGIBLE} {category => HARM_CATEGORY_DANGEROUS_CONTENT, probability => NEGLIGIBLE}]}}

Embeddings

Show text embeddings:

use Data::TypeSystem;

my @vecs = gemini-embed-content(["say something nice!",
                            "shout something bad!",
                            "where is the best coffee made?"],
        format => 'values');

say "Shape: ", deduce-type(@vecs);
.say for @vecs;

# Shape: Vector(Vector((Any), 768), 3)
# [-0.031044435 -0.01293638 0.008904989 ...]
# [0.015647776 -0.03143455 -0.040937748  ...]
# [-0.01054143 -0.03587494 -0.013359126  ...]

Counting tokens

Here we show how to find the number of tokens in a text:

my $text = q:to/END/;
AI has made surprising successes but cannot solve all scientific problems due to computational irreducibility.
END

gemini-count-tokens($text, format => 'values');

# 20

Vision

If the function gemini-completion is given a list of images, textual results corresponding to those images is returned. The argument “images” is a list of image URLs, image file names, or image Base64 representations. (Any combination of those element types.)

Consider this this image:

Here is an image recognition invocation:

my $fname = $*CWD ~ '/resources/ThreeHunters.jpg';
my @images = [$fname,];
say gemini-generation("Give concise descriptions of the images.", :@images, format => 'values');

# The image shows a family of raccoons in a tree. The mother raccoon is watching over her two cubs. The cubs are playing with each other. There are butterflies flying around the tree. The leaves on the tree are turning brown and orange.

When a file name is given to the argument “images” of gemini-completion then the function encode-image of “Image::Markup::Utilities”, [AAp4], is applied to it.

Command Line Interface

Maker suite access

The package provides a Command Line Interface (CLI) script:

gemini-prompt --help

# Usage:
#   gemini-prompt [<words> ...] [--path=<Str>] [-n[=UInt]] [--mt|--max-output-tokens[=UInt]] [-m|--model=<Str>] [-t|--temperature[=Real]] [-a|--auth-key=<Str>] [--timeout[=UInt]] [-f|--format=<Str>] [--method=<Str>] -- Command given as a sequence of words.
#   
#     --path=<Str>                       Path, one of 'generateContent', 'embedContent', 'countTokens', or 'models'. [default: 'generateContent']
#     -n[=UInt]                          Number of completions or generations. [default: 1]
#     --mt|--max-output-tokens[=UInt]    The maximum number of tokens to generate in the completion. [default: 100]
#     -m|--model=<Str>                   Model. [default: 'Whatever']
#     -t|--temperature[=Real]            Temperature. [default: 0.7]
#     -a|--auth-key=<Str>                Authorization key (to use Gemini API.) [default: 'Whatever']
#     --timeout[=UInt]                   Timeout. [default: 10]
#     -f|--format=<Str>                  Format of the result; one of "json", "hash", "values", or "Whatever". [default: 'values']
#     --method=<Str>                     Method for the HTTP POST query; one of "tiny" or "curl". [default: 'tiny']

Remark: When the authorization key argument “auth-key” is specified set to “Whatever” then gemini-prompt attempts to use one of the env variables GEMINI_API_KEY or PALM_API_KEY.

Mermaid diagram

The following flowchart corresponds to the steps in the package function gemini-prompt:

References

Articles

[AA1] Anton Antonov, “Workflows with LLM functions”, (2023), RakuForPredictions at WordPress.

[AA2] Anton Antonov, “Number guessing games: Gemini vs ChatGPT” (2023), RakuForPredictions at WordPress.

[ZG1] Zoubin Ghahramani, “Introducing Gemini 2”, (2023), Google Official Blog on AI.

Packages, platforms

[AAp1] Anton Antonov, WWW::PaLM Raku package, (2023-2024), GitHub/antononcube.

[AAp2] Anton Antonov, WWW::OpenAI Raku package, (2023-2024), GitHub/antononcube.

[AAp3] Anton Antonov, LLM::Functions Raku package, (2023-2024), GitHub/antononcube.

[AAp4] Anton Antonov, Image::Markup::Utilities Raku package, (2023-2024), GitHub/antononcube.

[AAp5] Anton Antonov, ML::FindTextualAnswer Raku package, (2023-2024), GitHub/antononcube.

Heatmap plots over LLM scraped data

Introduction

In this document we show the use of Artificial Intelligence (AI) Vision and Large Language Models (LLMs) for data scraping from images and Web pages and we present heatmap plots corresponding to the scraped data.

The LLM utilization and visualization are done in chat-enabled Jupyter notebook with a Raku kernel; “chatbook” for short. See “Jupyter::Chatbook” , [AAp4, AAv3].

The heatmap plots in the Jupyter notebook are done with the package “JavaScript::D3” , [AAp8, AAv1, AAv2]. (Heatmap plots were recently implemented.)

We use data from sites dedicated of tracking Russian and Ukrainian casualties in NATO’s war in Ukraine, (2022-present):

Russian casualties: Mediazona , [MZ1]
Ukrainian casualties: UALosses , [UAL1]

Remark: Note that UALosses is relatively new, hence, it provides too few records of Ukrainian losses. The casualties of Medizona and UALosses should be considered underestimates because of the methodologies they use. (Tracking and verifying online records.) See:

Section “Our methods” of Mediazona
Page “About the project” of UALosses

This document is a complement to the document “Extracting Russian casualties in Ukraine data from Mediazona publications” , [AA4], and it uses AI Vision and LLM functionalities described in [AA1-AA3].

Outline

Here is an outline of the workflow steps shown below:

Setup (packages and JavaScript plotting)
Get a screenshot of Russian casualties heatmap plot from Mediazona , [MZ1]
Using OpenAI’s AI Vision extract data from the screenshot
Verify and manually adjust the obtained data
Make a heatmap plot
Verify the plotted data
Download page with regional Ukrainian casualties from UALosses , [UAL1]
Use LLM to obtain the tabular data for those casualties
Adjust, translate, or match names of regions (via LLMs)
Make the corresponding heatmap plot
Observations and conclusions

Setup

In this section we load the necessary packages and setup chatbook’s environment for Javascript plots.

use JSON::Fast;
use HTTP::Tiny;

use Data::Generators;
use Data::Reshapers;
use Data::Summarizers;

use JavaScript::D3;

%% javascript
require.config({
     paths: {
     d3: 'https://d3js.org/d3.v7.min'
}});

require(['d3'], function(d3) {
     console.log(d3);
});

Heatmap screenshot import

Here we get a screenshot image from [AAr1, MZ1] of the Russian casualties and we import it into the chatbook’s session:

#% markdown
my $url = 'https://raw.githubusercontent.com/antononcube/SystemModeling/master/Projects/War-modeling/Diagrams/Mediazona-Russian-casualties-choropleth-upto-2024-01-19.png';
my $img = image-import($url);

Remark: The function image-import is from the package “Image::Markup::Utilities” , [AAp5], which is automatically loaded in a notebook session of “Jupyter::Chatbook” , [AAp4].

Data extraction via AI Vision

Here we extract the data from imported screenshot using the function llm-vision-synthesize , [AAp1, AAp2, AA3]:

llm-vision-synthesize("Give row-wise the Russian states and numbers in the squares of the choropleth.", $url, max-tokens => 2000)

The image contains a lot of text and numbers, which appear to be related to Russian casualties in Ukraine, categorized by different Russian states. I'll provide the information row-wise as requested:

First row:
- StPete: 480
- Moscow: 482
- Klng: 470

Second row:
- Murman: 256
- Karel: 295
- Len Obl: 428
- Novgrd: 246
- Vlgda: 413
- Arkngl: 520
- Komi: 414
- Hant-Ma: 407
- Tyumen: 259
- Tomsk: 904
- Kuzbas: 259
- Irkut: 844
- Yakut: 387
- Khabar: 196
- Chukot: 44
- Kmchtk: 126

Third row:
- Pskov: 328
- Tver: 446
- Yarslv: 262
- Ivonvo: 335
- Kstrma: 242
- Mari-El: 296
- Kirov: 476
- Perm: 1019
- Yekat: 1449
- Kurgan: 301
- Novsib: 790
- Khakas: 259
- Buryat: 1117
- Amursk: 192
- Mgadan: 111

Fourth row:
- Smlnsk: 186
- Kaluga: 233
- Msc Obl: 1125
- Vldmr: 358
- Nizhny: 733
- Chuvas: 347
- Tatar: 943
- Udmurt: 521
- Chelya: 1191
- Omsk: 623
- Alt Kr: 679
- Tyva: 486
- Zabayk: 829
- Evr AO: 57
- Primor: 731

Fifth row:
- Brnsk: 535
- Orel: 255
- Tula: 253
- Ryazan: 292
- Mordov: 191
- Ulyan: 451
- Samara: 928
- Bashkr: 1353
- Altai: 172
- Foreign: 243
- N/A: 271

Sixth row:
- Kursk: 407
- Liptsk: 324
- Tambov: 284
- Penza: 291
- Saratv: 853
- Orenbg: 830
- Belgrd: 501
- Vrnezh: 531
- Volggr: 1038

Seventh row:
- Crimea: 407
- Adygea: 117
- Kuban: 1640
- Rostov: 1143
- Kalmyk: 121
- Astrkn: 358
- Sevast: 144

Eighth row:
- Kar-Chr: 99
- Stavro: 904
- Chechn: 234
- Dagstn: 808
- Kab-Bal: 132
- Alania: 401
- Ingush: 57

The bottom of the image also states "At least 42,284 confirmed military deaths from 24 Feb 2022 to 15 Jan 2024." It explains that the timeline shows the number of confirmed casualties within a given time frame and notes that the number is not equal to the number of casualties per day. It also mentions that if the report does not mention the date, the date of its publication, that is, the earliest date we know when the person was confirmed to have been killed, is used. There is a note that Crimea and Sevastopol were annexed by Russia in 2014.

The result we get has the data from the screenshot, but in order to make a heatmap plot we would benefit from a more structured data representation that reflects the Geo-inspired structure of the choropleth.

Here is we prepare and use a more detailed and instructive prompt and combine it with:

JSON only prompt from “LLM::Prompts” , [AAp3]
JSON sub-parser from “Text::SubParsers” , [AAp6]

my $p = q:to/END/;
The image has boxes arranged in jagged array. 
The full array would have been with 11 rows and 17 columns. 
Give a JSON dictionary of the content of the boxes in the image. 
The keys should be matrix coordinates, the values are two element lists corresponding to the labels inside the boxes.
END

my $res = llm-vision-synthesize([$p, llm-prompt("NothingElse")("JSON")], $url, max-tokens => 2000, form => sub-parser('JSON'):drop)

[[9,1] => [Alania 401] [3,2] => [Msc Obl 1125] [1,3] => [Vlgda 413] [8,4] => [Dagstn 808] [0,7] => [Chukot 44] [0,1] => [Murm 256] [2,9] => [Kurgan 301] [3,4] => [Nizhny 733] [7,4] => [Kalmyk 121] [5,2] => [Tambov 284] [2,5] => [Mari-El 296] [4,3] => [Ryazan 291] [1,8] => [Tomsk 259] [3,0] => [Smlnsk 186] [3,13] => [Evr AO 57] [2,13] => [Amursk 192] [2,12] => [Buryat 1117] [6,2] => [Volggr 1038] [3,10] => [Alt Kr 679] [2,10] => [Novsib 790] [1,13] => [Skhlin 392] [9,0] => [Kab-Bal 132] [1,2] => [Novgrd 246] [0,9] => [Mgdan 111] [8,1] => [Kar-Chr 99] [4,5] => [Ulyan 451] [0,3] => [Karel 295] [1,9] => [Kuzbas 904] [2,3] => [Ivnovo 335] [7,0] => [Crimea 407] [0,5] => [Yamal 141] [4,0] => [Brnsk 535] [1,0] => [Klngrd 470] [5,0] => [Kursk 407] [3,1] => [Kaluga 233] [4,7] => [Bashkr 1353] [1,4] => [Arkngl 520] [2,0] => [Pskov 328] [5,5] => [Orenbg 830] [1,10] => [Irkut 844] [4,2] => [Tula 253] [8,3] => [Chechn 234] [2,7] => [Perm 1019] [4,8] => [Altai 172] [5,1] => [Lipstk 324] [3,7] => [Udmurt 521] [1,1] => [Len Obl 428] [5,4] => [Sarat 853] [10,1] => [N/A 271] [1,5] => [Komi 428] [1,7] => [Tyumen 407] [3,11] => [Tyva 486] [7,2] => [Kuban 1640] [4,6] => [Samara 928] [4,1] => [Orel 255] [2,6] => [Kirov 476] [0,8] => [Kmcht 126] [2,1] => [Tver 446] [2,8] => [Yekat 1449] [7,3] => [Rostov 1143] [8,2] => [Stavro 904] [8,0] => [Sevast 144] [2,4] => [Kstrma 242] [1,11] => [Yakut 387] [3,14] => [Primor 731] [0,2] => [Moscow 482] [0,0] => [StPete 480] [3,5] => [Chuvas 347] [0,6] => [Krsyar 875] [7,1] => [Adygea 117] [2,11] => [Khakas 259] [1,12] => [Khabar 196] [3,8] => [Chelya 1191] [4,4] => [Mordov 192] [6,0] => [Belgrd 501] [1,6] => [Hant-Ma 414] [9,2] => [Ingush 57] [3,6] => [Tatar 943] [2,2] => [Yarslv 262] [7,5] => [Astrkn 358] [0,4] => [Nenets 44] [10,0] => [Foreign 243] [3,12] => [Zabayk 829] [3,9] => [Omsk 623] [6,1] => [Vrnezh 531] [3,3] => [Vldmr 358] [5,3] => [Penza 291]]

Remark: Again, the packages “LLM::Prompts”, “Text::SubParsers” are automatically loaded in a chatbook session.

In order to explore the obtained data further we inform ourselves about its structure:

deduce-type($res)

Vector(Pair(Atom((Str)), Vector(Atom((Str)), 2)), 87)

Remark: The function deduce-type is from the package “Data::TypeSystem” , [AAp7], which is automatically loaded in a chatbook session. “Data::TypeSystem” is used other data-transformation packages, and, in “JavaScript::D3”, [AAp8], used for the heatmaps below.

From the deduced type result we see that the structure corresponds to what we specified in the prompt — a list of pairs.
But the JSON conversion from the sub-parser gives:

Keys that are not two-element lists of coodinates, but (corresponding) strings
Values that are lists of two strings (instead of a string and a number)

Here the data above is transformed into a dataset (a list of hash-maps):

my @ds3D = $res.map({ <y x z label>.Array Z=> [ |from-json($_.key), $_.value[1].Int, $_.value ].flat })>>.Hash;
say dimensions(@ds3D);
say deduce-type(@ds3D);

(87 4)
Vector(Struct([label, x, y, z], [Array, Int, Int, Int]), 87)

Using that dataset we can overview the extracted data in the corresponding choropleth (or heatmap plot):

#%js
js-d3-heatmap-plot(@ds3D, width => 1200, plot-labels-font-size =>10)

We can see that the vertical orientation is inverted. We got matrix coordinates, i.e. row indexes are ordered top-bottom.
We are given heatmap plot bottom-top.

Manual data adjustment

In this section we adjust extracted data in order to produce a heatmap that corresponds to that of Midiazona.
We also, opportunistically, verify the data results. (Correct AI Vision recognition of text and numbers should be both trusted and verified.)

Here is the manual adjustment of the data (placed into a new data structure):

my %locCas = (
   (1, 1) => ["StPete", 480], (1, 4) => ["Murman", 256], (1, 16) => ["Chukot", 44], (1, 17) => ["Kmchtk", 126],
   (2, 1) => ["Moscow", 482], (2, 3) => ["Karel", 295], (2, 8) => ["Nenets", 44], (2, 9) => ["Yamal", 141], (2, 12) => ["Krsyar", 875], (2, 16) => ["Mgadan", 111],
   
   (3, 2) => ["Len Obl", 428], (3, 3) => ["Novgrd", 246], (3, 4) => ["Vlgda", 413], (3, 8) => ["Arkngl", 520], (3, 9) => ["Komi", 428], (3, 10) => ["Hant-Ma", 414], (3, 11) => ["Tyumen", 407], (3, 12) => ["Tomsk", 259], (3, 13) => ["Kuzbas", 904], (3, 14) => ["Irkut", 844],
   (3, 15) => ["Yakut", 387], (3, 16) => ["Khabar", 196], (3, 17) => ["Skhlin", 392],
   
   (4, 1) => ["Klngd", 470], (4, 2) => ["Pskov", 328], (4, 3) => ["Tver", 446], (4, 4) => ["Yarslv", 262], (4, 5) => ["Ivnovo", 335], (4, 6) => ["Kstrma", 242], (4, 7) => ["Mari-El", 296], (4, 8) => ["Kirov", 476], (4, 9) => ["Perm", 1019], (4, 10) => ["Yekat", 1449], (4, 11) => ["Kurgan", 301], (4, 12) => ["Novsib", 790], (4, 13) => ["Khakas", 259],
   (4, 14) => ["Buryat", 1117], (4, 15) => ["Amursk", 192],
   
   (5, 2) => ["Smlnsk", 186], (5, 3) => ["Kaluga", 233], (5, 4) => ["Msc Obl", 1125], (5, 5) => ["Vldmr", 358], (5, 6) => ["Nzhny", 733], (5, 7) => ["Chuvas", 347], (5, 8) => ["Tatar", 943], (5, 9) => ["Udmurt", 521], (5, 10) => ["Chelya", 1191], (5, 11) => ["Omsk", 623], (5, 12) => ["Alt Kr", 679],
   (5, 13) => ["Tyva", 486], (5, 14) => ["Zabayk", 829], (5, 15) => ["Ev AO", 57], (5, 16) => ["Primor", 731],
   
   (6, 2) => ["Brnsk", 535], (6, 3) => ["Orel", 255], (6, 4) => ["Tula", 253], (6, 5) => ["Ryazan", 291], (6, 6) => ["Mordov", 192], (6, 7) => ["Ulyan", 451],
   (6, 8) => ["Samara", 928], (6, 9) => ["Bashkr", 1353], (6, 12) => ["Altai", 172],
   
   (7, 3) => ["Kursk", 407], (7, 4) => ["Liptsk", 324], (7, 5) => ["Tambov", 284], (7, 6) => ["Penza", 291], (7, 7) => ["Saratv", 853], (7, 8) => ["Orenbg", 830],
   
   (8, 4) => ["Belgrd", 501], (8, 5) => ["Vrnzeh", 531], (8, 6) => ["Volggr", 1038],
   
   (9, 1) => ["Crimea", 407], (9, 3) => ["Adygea", 117], (9, 4) => ["Kuban", 1640], (9, 5) => ["Rostov", 1143], (9, 6) => ["Kalmyk", 121], (9, 7) => ["Astrkn", 358],
   
   (10, 1) => ["Sevast", 144], (10, 4) => ["Kar-Chr", 99], (10, 5) => ["Stavro", 904], (10, 6) => ["Chechn", 234], (10, 7) => ["Dagstn", 808],
   
   (11, 4) => ["Kab-Bal", 132], (11, 5) => ["Alania", 401], (11, 6) => ["Ingush", 57],
   
   (10, 16) => ["Foreign", 243], (10, 17) => ["N/A", 271]
);

say "No of records    : {%locCas.elems}";
say "Total casualties : {%locCas.values.map(*[1]).sum}";
say deduce-type(%locCas);

No of records    : 87
Total casualties : 42284
Assoc(Atom((Str)), Tuple([Atom((Str)), Atom((Int))]), 87)

Remark: The total number of casualties from the data structure is the same as in the screenshot from Mediazona above.

Heatmap plot

Here we transform the data into a dataset and show the data type:

my @ds3D = %locCas.map({ <y x z label>.Array Z=> [ |$_.key.split(/\h/)>>.Int, $_.value[1], $_.value ].flat })>>.Hash;
say @ds3D.elems;
say deduce-type(@ds3D);

87
Vector(Struct([label, x, y, z], [Array, Int, Int, Int]), 87)

Here is the corresponding summary:

sink records-summary(select-columns(@ds3D, <x y z>))

+----------------------+--------------------+--------------------+
| z                    | y                  | x                  |
+----------------------+--------------------+--------------------+
| Min    => 44         | Min    => 1        | Min    => 1        |
| 1st-Qu => 246        | 1st-Qu => 3        | 1st-Qu => 4        |
| Mean   => 486.022989 | Mean   => 5.367816 | Mean   => 7.781609 |
| Median => 401        | Median => 5        | Median => 7        |
| 3rd-Qu => 731        | 3rd-Qu => 7        | 3rd-Qu => 12       |
| Max    => 1640       | Max    => 11       | Max    => 17       |
+----------------------+--------------------+--------------------+

Here we transform the dataset to have:

Two-row labels
Separation gap for “odd” regions (Moscow, Crimea, etc.)
Casualty values that are suitably rescaled for more informative visualization

my @ds3D2 = @ds3D>>.clone.map({ $_<z> = sqrt($_<z>); $_<x> = $_<x> > 1 ?? $_<x> + 1 !! $_<x> ; $_<y> = 12 - $_<y>; $_<label> = "<tspan>{$_<label>.head}</tspan><tspan dx='-{$_<label>.head.chars/2}em', dy='1em'>{$_<label>.tail}</tspan>"; $_ });
@ds3D2.elems

Here is the heatmap plot:

#%js
js-d3-heatmap-plot(@ds3D2, width => 1100, height => 800,
    x-tick-labels => (1..18),
    plot-labels-font-size => 13,
    plot-labels-color => 'white', 
    color-palette => 'Reds', 
    background => "#282828", 
    tick-labels-font-size => 0,
    low-value => 0,
    high-value => sqrt(1800),
    mesh => 0.01
)

Additional verifications

We can use LLMs do to some additional verification of the data.
For example, we asks about certain summary statistics over Russia that might increase our confidence in the extracted data.

#% chat, temperature=0.2, model=gpt-4
What are types of the administrative divisions of Russia? 
Answer concisely, only the types and the integer count of the corresponding entities per type.

The administrative divisions of Russia include the following types and their corresponding integer count:

1. Federal subjects: 85
2. Autonomous okrugs: 4
3. Autonomous oblast: 1
4. Federal cities: 3
5. Republics: 22
6. Krais: 9
7. Oblasts: 46

Scrape Ukrainian losses

Here we import the web page of regional Ukrainian losses from “UALosses” , [UAL1]:

my $response = HTTP::Tiny.new.get( 'https://ualosses.org/regions/' );
my $htmlRes = $response<content>.decode;
say $htmlRes.chars;

Here we show the table of from the imported page (by HTML applying a regex over its HTML code):

#%html
my $htmlTable = do with $htmlRes ~~ / '<table>' (.*) '</table>' / { $/.Str }

## For brevity the output is skipped -- see the table below.

The easiest way — and, maybe, the most reliable way — to transform that HTML table into a Raku data structure is to use an LLM with a specially crafted prompt.

Here is such an LLM invocation:

Uses the HTML table obtained above
Says that only JSON and nothing else should be returned
Post-process the result with a JSON sub-parser

my $uaRes = llm-synthesize([
    "Convert the HTML table into a Raku list of hashmaps. The values of 'Death Count' and 'Population (2022)' are integers.", 
    $htmlTable, 
    llm-prompt('NothingElse')('JSON')
    ], 
    e => llm-configuration('chatgpt', model => 'gpt-3.5-turbo-16k-0613', max-tokens => 2000), 
    form => sub-parser('JSON'):drop
);

say deduce-type($uaRes);

Vector(Struct([Average age at death, Death Count, Death Count per Capita, Name, Population (2022)], [Str, Int, Str, Str, Int]), 25)

Here we display the obtained data structure as an HTML table:

#% html
my @dsUALosses = |$uaRes;  
@dsUALosses ==> data-translation(field-names=>('Name', 'Population (2022)', 'Death Count', 'Death Count per Capita', 'Average age at death'))

Name	Population (2022)	Death Count	Death Count per Capita	Average age at death
Kirovohrad Oblast	903712	1975	2.185 per 1000	37.5 years
Zhytomyr Oblast	1179032	2378	2.017 per 1000	36.4 years
Vinnytsia Oblast	1509515	2871	1.902 per 1000	37.7 years
Volyn Oblast	1021356	1697	1.662 per 1000	36.8 years
Rivne Oblast	1141784	1890	1.655 per 1000	37.6 years
Chernihiv Oblast	959315	1585	1.652 per 1000	37.1 years
Khmelnytskyi Oblast	1228829	1921	1.563 per 1000	36.4 years
Sumy Oblast	1035772	1554	1.500 per 1000	37.4 years
Kyiv Oblast	1795079	2643	1.472 per 1000	37.7 years
Cherkasy Oblast	1160744	1671	1.440 per 1000	38.1 years
Poltava Oblast	1352283	1891	1.398 per 1000	37.5 years
Dnipropetrovsk Oblast	3096485	3930	1.269 per 1000	37.1 years
Mykolaiv Oblast	1091821	1269	1.162 per 1000	36.1 years
Chernivtsi Oblast	890457	1033	1.160 per 1000	37.7 years
Ivano-Frankivsk Oblast	1351822	1564	1.157 per 1000	38.0 years
Lviv Oblast	2478133	2865	1.156 per 1000	37.5 years
Ternopil Oblast	1021713	1123	1.099 per 1000	37.8 years
Odesa Oblast	2351392	1831	0.779 per 1000	36.1 years
Kharkiv Oblast	2598961	1807	0.695 per 1000	36.3 years
Zakarpattia Oblast	1244476	842	0.677 per 1000	37.1 years
Kherson Oblast	1001598	570	0.569 per 1000	35.4 years
Zaporizhzhia Oblast	1638462	792	0.483 per 1000	36.1 years
Kyiv	2952301	977	0.331 per 1000	37.8 years
Donetsk Oblast	4059372	977	0.241 per 1000	36.0 years
Luhansk Oblast	2102921	383	0.182 per 1000	34.8 years

The function data-translation is from the package “Data::Translators” , which is automatically loaded in a chatbook session.

Here is a verification sum:

@dsUALosses.map( *{"Death Count"} ).sum

Here we make a dictionary of region (“oblast”) name to casualties:

my %uaOblastCas = @dsUALosses.map({ $_<Name> => $_{"Death Count"} })

{Cherkasy Oblast => 1671, Chernihiv Oblast => 1585, Chernivtsi Oblast => 1033, Dnipropetrovsk Oblast => 3930, Donetsk Oblast => 977, Ivano-Frankivsk Oblast => 1564, Kharkiv Oblast => 1807, Kherson Oblast => 570, Khmelnytskyi Oblast => 1921, Kirovohrad Oblast => 1975, Kyiv => 977, Kyiv Oblast => 2643, Luhansk Oblast => 383, Lviv Oblast => 2865, Mykolaiv Oblast => 1269, Odesa Oblast => 1831, Poltava Oblast => 1891, Rivne Oblast => 1890, Sumy Oblast => 1554, Ternopil Oblast => 1123, Vinnytsia Oblast => 2871, Volyn Oblast => 1697, Zakarpattia Oblast => 842, Zaporizhzhia Oblast => 792, Zhytomyr Oblast => 2378}

Heatmap Ukraine casualties

Here we prepare choropleth stencil for the Ukrainian losses:

my %uaLoc = (
    (1, 1) => "Volyn", (1, 2) => "Rivne", (1, 5) => "Chernigov", (1, 6) => "Sumy",
   
    (2, 3) => "Zhitomir", (2, 4) => "Kyiv", (2, 4) => "Kyyivska",

    (3, 1) => "Lviv", (3, 2) => "Ternopil", (3, 3) => "Khmelnytskyi", (3, 5) => "Cherkask", (3, 6) => "Poltava", (3, 7) => "Kharkiv",
    
    (4, 1) => "Ivano-Frankivsk", (4, 3) => "Vinnica", (4, 8) => "Luhansk",
 
    (5, 1) => "Zakarpattia", (5, 2) => "Chernivtsi", (5, 5) => "Kirovohrad", (5, 6) => "Dnipropetrovsk", (5, 7) => "Donetsk",
  
    (6, 5) => "Mykolayivsk", (6, 6) => "Zaporizhzhia",

    (7, 4) => "Odesa", (7, 6) => "Kherson"
);

{1 1 => Volyn, 1 2 => Rivne, 1 5 => Chernigov, 1 6 => Sumy, 2 3 => Zhitomir, 2 4 => Kyyivska, 3 1 => Lviv, 3 2 => Ternopil, 3 3 => Khmelnytskyi, 3 5 => Cherkask, 3 6 => Poltava, 3 7 => Kharkiv, 4 1 => Ivano-Frankivsk, 4 3 => Vinnica, 4 8 => Luhansk, 5 1 => Zakarpattia, 5 2 => Chernivtsi, 5 5 => Kirovohrad, 5 6 => Dnipropetrovsk, 5 7 => Donetsk, 6 5 => Mykolayivsk, 6 6 => Zaporizhzhia, 7 4 => Odesa, 7 6 => Kherson}

Since the stencil was prepared using a Geo-data source different from ualosses.org ,
here we formulate and execute an LLM request to make a dictionary that matches the administrative division names
from the casualties table and stencil:

my %uaOblastNames = llm-synthesize([
    "Match the values of the list:\n {to-json(%uaLoc.values)} \n with the list: \n {to-json(@dsUALosses.map(*<Name>))} \n into a JSON dictionary.",
    llm-prompt('NothingElse')('JSON')
    ], 
    e => llm-configuration('chatgpt', model => 'gpt-3.5-turbo-16k-0613', max-tokens => 2000), 
    form => sub-parser('JSON'):drop
)

{Cherkask => Cherkasy Oblast, Chernigov => Chernihiv Oblast, Chernivtsi => Chernivtsi Oblast, Dnipropetrovsk => Dnipropetrovsk Oblast, Donetsk => Donetsk Oblast, Ivano-Frankivsk => Ivano-Frankivsk Oblast, Kharkiv => Kharkiv Oblast, Kherson => Kherson Oblast, Khmelnytskyi => Khmelnytskyi Oblast, Kirovohrad => Kirovohrad Oblast, Kyyivska => Kyiv Oblast, Luhansk => Luhansk Oblast, Lviv => Lviv Oblast, Mykolayivsk => Mykolaiv Oblast, Odesa => Odesa Oblast, Poltava => Poltava Oblast, Rivne => Rivne Oblast, Sumy => Sumy Oblast, Ternopil => Ternopil Oblast, Vinnica => Vinnytsia Oblast, Volyn => Volyn Oblast, Zakarpattia => Zakarpattia Oblast, Zaporizhzhia => Zaporizhzhia Oblast, Zhitomir => Zhytomyr Oblast}

Here we fill-in the stencil with the casualties numbers:

my %uaLocCas = %uaLoc.map({ $_.key => [$_.value, %uaOblastCas{ %uaOblastNames{$_.value} } // 0] })

{1 1 => [Volyn 1697], 1 2 => [Rivne 1890], 1 5 => [Chernigov 1585], 1 6 => [Sumy 1554], 2 3 => [Zhitomir 2378], 2 4 => [Kyyivska 2643], 3 1 => [Lviv 2865], 3 2 => [Ternopil 1123], 3 3 => [Khmelnytskyi 1921], 3 5 => [Cherkask 1671], 3 6 => [Poltava 1891], 3 7 => [Kharkiv 1807], 4 1 => [Ivano-Frankivsk 1564], 4 3 => [Vinnica 2871], 4 8 => [Luhansk 383], 5 1 => [Zakarpattia 842], 5 2 => [Chernivtsi 1033], 5 5 => [Kirovohrad 1975], 5 6 => [Dnipropetrovsk 3930], 5 7 => [Donetsk 977], 6 5 => [Mykolayivsk 1269], 6 6 => [Zaporizhzhia 792], 7 4 => [Odesa 1831], 7 6 => [Kherson 570]}

Here we convert the hash-map into a dataset (suitable for displaying with “JavaScript::D3”):

my @dsUA3D = %uaLocCas.map({ <y x z label>.Array Z=> [ |$_.key.split(/\h/)>>.Int, $_.value[1], $_.value ].flat })>>.Hash;
say @dsUA3D.elems;
say deduce-type(@dsUA3D);

24
Vector(Struct([label, x, y, z], [Array, Int, Int, Int]), 24)

As with the Russian casualties heatmap plot above, here we transform the Ukrainian losses dataset to have:

Two-row labels
Casualty values that are suitably rescaled for more informative visualization

my @dsUA3D2 = @dsUA3D>>.clone.map({ $_<z> = sqrt($_<z>); $_<y> = 12 - $_<y>; $_<label> = "<tspan>{$_<label>.head}</tspan><tspan dx='-{$_<label>.head.chars/2}em', dy='1em'>{$_<label>.tail}</tspan>"; $_ });
@dsUA3D2.elems

Here we make the heatmap plot:

#%js
js-d3-heatmap-plot(@dsUA3D2, width => 1000, height => 600,
    x-tick-labels => (1..8),
    plot-labels-font-size => 12,
    plot-labels-color => 'white', 
    color-palette => 'Reds', 
    background => "#282828", 
    tick-labels-font-size => 0,
    low-value => 0,
    high-value => sqrt(1800),
    mesh => 0.01
)

Observation and conclusions

In this section we give several groups of observation and conclusions that came out doing the presented data scraping and plot making.

Choropleths

Suitable Geo-data for the choropleth stencils have to be obtained and tweaked.
Russia
- The initial values for the choropleth coordinates were derived via AI Vision.
- Review and manual adjustment was (of course) required.
- Since the total number of administration districts is 85, having a set of initial values sped up the final stencil derivation.
Ukraine
- The choropleth stencil coordinates were derived from a Geo plot using Mathematica.
- Compared to UALosses, Mathematica uses slightly different names of the Ukrainian administrative divisions (or regions.)
The choropleth stencils could be automatically derived from the actual geometric centers of the administrative divisions, but it seemed easier and less error-prone to make those stencils manually.

Data scraping

AI Vision can be effectively used to get data from images in Web pages or other types of documents. See [AA1-AA4] for more details.
LLMs can be used to convert data elements — like tables — of Web pages into programming data structures (that suitable in further computations.)
- The data conversions we did using LLMs are done “through JSON”, because:
- JSON is a popular well represented data format in LLMs training data
- Raku has good JSON-to-Raku and Raku-to-JSON converters.
Small discrepancies or errors from data scraping procedures can be smoothed, detected, or eliminated using LLMs.

Heatmap plots

The heatmap plot function js-d3-heatmap-plot of “JavaScript:D3” allows:
- Use of sparse data
- Rectangles with labels
- Different color palettes for the values of the rectangles
- Tunable fonts and colors for axis labels and plot labels
In order to have two-row labels in the rectangles special HTML-spec has to be used.
- That is constraint coming up from the implementation of the underlying Javascript library D3 .

References

Articles

[AA1] Anton Antonov, “AI vision via Raku” , (2023), RakuForPrediction at WordPress .

[AA2] Anton Antonov, “AI vision via Wolfram Language” , (2023), Wolfram Community .

[AA3] Anton Antonov, “Day 24 – Streamlining AI vision workflows” , (2023), RakuAdventCalendar at WordPress .

[AA4] Anton Antonov, “Extracting Russian casualties in Ukraine data from Mediazona publications” , (2023), MathematicaForPrediction at WordPress .

[MZ1] Mediazona, Russian casualties in Ukraine , (2022-2024).

[UAL1] UALosses, Ukrainian losses , (2023-2024).

Packages, repositories

[AAp1] Anton Antonov, WWW::OpenAI Raku package, (2023), GitHub/antononcube .

[AAp2] Anton Antonov, LLM::Functions Raku package, (2023), GitHub/antononcube .

[AAp3] Anton Antonov, LLM::Prompts Raku package, (2023), GitHub/antononcube .

[AAp4] Anton Antonov, Jupyter::Chatbook Raku package, (2023), GitHub/antononcube .

[AAp5] Anton Antonov, Image::Markup::Utilities Raku package, (2023), GitHub/antononcube .

[AAp6] Anton Antonov, Text::SubParsers Raku package, (2023), GitHub/antononcube .

[AAp7] Anton Antonov, Data::TypeSystem Raku package, (2023), GitHub/antononcube .

[AAp8] Anton Antonov, JavaScript::D3 Raku package, (2022-2023), GitHub/antononcube .

[AAp9] Anton Antonov, Data::Translators Raku package, (2023), GitHub/antononcube .

[AAr1] Anton Antonov, SystemModeling at GitHub , (2020-2024), GitHub/antononcube .

Videos

[AAv1] Anton Antonov, “The Raku-ju hijack hack for D3.js” , (2022), YouTube/@AAA4Prediction .

[AAv2] Anton Antonov, “Random mandalas generation (with D3.js via Raku)” , (2022), YouTube/@AAA4Prediction .

[AAv3] Anton Antonov, “Integrating Large Language Models with Raku” , (2023), YouTube/@therakuconference6823 .

Raku for Prediction

Menu

Tag Archives: AI Vision

WWW::Gemini

Installation

Usage examples

Embeddings

Counting tokens

Vision

Command Line Interface

Maker suite access

Mermaid diagram

References

Articles

Packages, platforms

Heatmap plots over LLM scraped data

Introduction

Outline

Setup

Heatmap screenshot import

Data extraction via AI Vision

Manual data adjustment

Heatmap plot

Additional verifications

Scrape Ukrainian losses

Heatmap Ukraine casualties

Observation and conclusions

Choropleths

Data scraping

Heatmap plots

References

Articles

Packages, repositories

Videos