Generating documents via templates and LLMs

Introduction

This blog post proclaims the “simple secret” of making fully fledged documents via Large Language Models (LLMs), [AA2, Wk1], using computational Markdown, Org-mode, or Pod6 template files, [AA1, AAv1], and a few Raku packages, [AAp1÷AAp3].

Of course, instead of using templates we can “just” write Raku scripts that generate such documents, but using scripts does not allow:

  • Convenient interleaving of human-written content with LLM hallucinations
  • (Semi-)interactive refining, editing, and tweaking of both content types
    • With suitable editors, integrated development environments, or notebook “solutions.”

Definition: A computational Markdown, Org-mode, or Pod6 file contains code chunks whose content is evaluated within a single, document-wide context, [AA1, AAv1, AAp1].

Definition: We call the processing of a computational document via the package “Text::CodeProcessing”, [AAp1], document execution.

Remark: The document execution can be done via the Command Line Interface (CLI) of the package “Text::CodeProcessing”.
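For example, assuming “Text::CodeProcessing”, [AAp1], is installed and provides the CLI script file-code-chunks-eval (the script name here is taken from the package documentation, so treat this sketch accordingly), a template can be executed from the command line:

```shell
# Evaluate all code chunks of the template within one document-wide context;
# the result is saved into a new, "woven" file next to the original.
file-code-chunks-eval template.md
```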

Remark: Recent development of “Text::CodeProcessing” added “code chunk” plug-ins for accessing the LLMs ChatGPT (OpenAI), [AAp2], and PaLM, [AAp3].

See the references for more details of the underlying mechanics, [AA1, AA2, AAp1÷AAp3].

Here is an example Markdown template: “Simple guide via LLM template”, [AA3].

Here are documents obtained by executing that template (for example, “12 steps guide to quit Python”, [AA4]):

Remark: I plan to add other LLM-generated documents in the GitHub directory “Articles/LLM-generated” of the project “RakuForPrediction book”, [AAp4].

Here is a flowchart that summarizes the execution steps:


Template structure and content

In this section we describe how to design document-generating, LLM-leveraging templates like “Simple guide via LLM template”, [AA3], and exemplify the corresponding template code chunks.

The main idea of the template is simple:

  1. Have an LLM code chunk that asks for the generation of a list of actions, features, or description elements on a certain subject or theme
  2. Follow up with one or several Raku code chunks that:
    • Process the LLM output
    • Make LLM requests to expand each (or some) of the items of the LLM-derived list
  3. Make the code chunks “self-deleting”
    • I.e. only the results of the code chunk executions are placed in the result document
    • In this way we obtain more “seamless” documents
  4. Place the code chunk results without code block markings, i.e. with “as is” specs
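Putting the steps above together, a minimal template skeleton might look as follows (the prompt and the chunk contents are placeholder sketches, not the actual template, [AA3]):

````markdown
```openai, format=values, results=asis, output-prompt=NONE, echo=FALSE
Generate a numbered list of steps on <certain subject>.
```

```perl6, results=asis, output-prompt=NONE, echo=FALSE
# Process the LLM output (available via _) and make follow-up LLM requests here.
_.trim
```
````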

The following subsections demonstrate the two types of code chunks and provide details.

LLM cells

Here is an example of an LLM access cell using “WWW::OpenAI”, [AAp2]:

```openai, format=values, temperature=1.4, max-tokens=800, results=asis, output-prompt=NONE, echo=FALSE
Generate a 12 steps outline for quitting addiction to programming in Python (and replacing it with Raku).
```

Note that:

  • All named parameters of openai-completion can be used as code chunk parameters.
  • The code chunk specs results=asis, output-prompt=NONE make the output be “seamlessly” included in the result document.
  • The spec echo=FALSE “removes” the code chunk from the result document.

Raku cells with LLM access

Here is an example of a Raku cell with an “LLM generation script” that expands each of the items from the list obtained above.

```perl6, results=asis, output-prompt=NONE, echo=FALSE
# Assumes "WWW::PaLM", [AAp3], is installed; loading it here is harmless if already loaded.
use WWW::PaLM;

my $txt = _.trim;
my $txtExpanded = do for $txt.split(/ ^^ [ \d+ | <[IVXLC]>+ ] /, :v, :skip-empty)>>.Str.rotor(2) -> $p {
    my $res = "-" x 6;
    $res ~= "\n";
    $res ~= "\n## {$p[0]}.";
    my $start = '';
    if $p[1] ~~ / '**' (.*?) '**' | '<b>' (.*?) '</b>' / {
        $start = $0.Str;
        $res ~= ' ' ~ $start.subst( / <punct>+ $$ /, '');
    }
    $res ~= "\n\n>... {$p[1].subst(/ '**' $start '**' /, '').subst( / ^^ <punct>+ /, '')}";
    # Alternatively, using "WWW::OpenAI", [AAp2]:
    #$res ~= "\n\n" ~ openai-completion( "Expand upon: {$p[1]}", temperature => 1.45, max-tokens => 400, format => 'values' );
    $res ~= "\n\n" ~ palm-generate-text( "Expand upon: {$p[1]}", temperature => 0.75, max-tokens => 400, format => 'values' );
    $res
}

$txtExpanded.join("\n\n")
```

Note that:

  • With _ we can access the last evaluated result in “Text::CodeProcessing”, [AAp1].
    • By the way, using _ for last result access is:
      • “Inspired” by Python’s console
      • “Inherited” from Brian Duggan’s package “Jupyter::Kernel”, [BDp1]
  • Again, the results are placed in the result notebook “as is.”

The script relies on the assumptions that:

  • List items are numbered (because of the formulation of the LLM request).
    • See the splitting regex.
  • List items very likely have an opening phrase in Markdown bold font spec.
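To illustrate the first assumption, here is a small, self-contained sketch of what the splitting regex does, applied to a hypothetical two-item numbered list that imitates LLM output:

```raku
# Hypothetical numbered list imitating LLM output (not actual LLM text)
my $txt = "1. **Admit the problem**. Acknowledge the Python habit.\n2. **Seek support**. Find fellow Raku users.";

# Split on leading Arabic or Roman numerals at line starts, keeping the separators (:v)
my @pairs = $txt.split(/ ^^ [ \d+ | <[IVXLC]>+ ] /, :v, :skip-empty)>>.Str.rotor(2);

say @pairs.elems;   # 2  -- one (item-number, item-text) pair per list item
say @pairs[0][0];   # 1  -- the first item number
```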

Variations

Other templates

Obviously, we can use Markdown templates with other structures and interactions between the LLM cells and the Raku cells.

For example, we can have a sequence of LLM cells that shape or direct the result document in a certain way.
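For instance, here is a sketch of such a sequence, in which an LLM cell produces an outline and a subsequent Raku cell directs the style of the final document (the prompts are hypothetical placeholders):

````markdown
```openai, format=values, results=asis, output-prompt=NONE, echo=FALSE
Write a short outline on <certain subject>.
```

```perl6, results=asis, output-prompt=NONE, echo=FALSE
# Re-shape the previous result (_) with a follow-up LLM request.
openai-completion( "Rewrite in a more formal tone:\n" ~ _, format => 'values' )
```
````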

Multiple documents

Multiple documents can be obtained using:

  • Different LLMs
  • Different LLM parameter values
  • The same LLM parameter values, relying on the inherent randomness of LLM generation at (higher) “temperature” settings

References

Articles

[AA1] Anton Antonov, “Conversion and evaluation of Raku files”, (2022), RakuForPrediction at WordPress.

[AA2] Anton Antonov, “Racoons playing with pearls and onions”, (2023), RakuForPrediction at WordPress.

[AA3] Anton Antonov, “Simple guide via LLM template”, (2023), RakuForPrediction-book at GitHub/antononcube.

[AA4] Anton Antonov, “12 steps guide to quit Python”, (2023), RakuForPrediction-book at GitHub/antononcube.

[Wk1] Wikipedia entry, “Large language model”.

Packages, repositories

[AAp1] Anton Antonov, Text::CodeProcessing Raku package, (2021-2023), GitHub/antononcube.

[AAp2] Anton Antonov, WWW::OpenAI Raku package, (2023), GitHub/antononcube.

[AAp3] Anton Antonov, WWW::PaLM Raku package, (2023), GitHub/antononcube.

[AAp4] Anton Antonov, RakuForPrediction-book, (2021-2023), GitHub/antononcube.

[BDp1] Brian Duggan, Jupyter::Kernel Raku package, (2017-2023), GitHub/bduggan.

Videos

[AAv1] Anton Antonov, “Conversion and evaluation of Raku files”, (2022), YouTube/@AAA4Prediction.

[AAv2] Anton Antonov, “Racoons playing with pearls and onions”, (2023), YouTube/@AAA4Prediction.
