Getting Started

In [4]:

import pandas as pd
from actantial import OpenAIBackend, AnthropicBackend, HuggingFaceBackend
from actantial import run_extract, load_annotations, compare_annotations

In [5]:

# create some test data
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["Alice wants Bob", "Bob wants Alice", "Alice wants Bob, but Bob does not want Alice"],
})

df.head(1)

Out[5]:

	id	text
0	1	Alice wants Bob

Annotations GPT

First, create a backend for generation. The OpenAIBackend connects to the OpenAI API and gives you access to its models. To use this backend, you need an OPENAI_API_KEY saved in a .env file in your directory. You can get a key from your OpenAI account.
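Before creating a backend, it can help to confirm that the key is actually present in the .env file. The following is a minimal sketch using only the standard library. Note that read_env_file and has_key are hypothetical helpers, not part of actantial, and that the sketch assumes plain KEY=VALUE lines; the package may load the file differently.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_key(name, path=".env"):
    """Return True if `name` is defined with a non-empty value in the file."""
    return bool(read_env_file(path).get(name))


print("OPENAI_API_KEY found:", has_key("OPENAI_API_KEY"))
```

The check only reports whether a value exists; it does not verify that the key is valid for the API.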

In [ ]:

# Initialise a backend. 
# You can choose different models from the OpenAI API like gpt-5-mini. 
# Simply replace the model name in the following line
backend_gpt = OpenAIBackend("gpt-4o-mini")

# call .generate to send a request to the API
backend_gpt.generate("What is the capital of France?")

Out[ ]:

'The capital of France is Paris.'

To extract the actantial model from our data, we need the data, the backend, and a prompt template. The template is the heart of the extraction process. You can inspect the available templates for a specific backend:

In [16]:

backend_gpt.list_templates()

Out[16]:

{'model_specific': ['prompt_closed', 'prompt_open_variables'],
 'default': ['base_prompt']}

The default for each initial extraction is the base_prompt:

In [17]:

backend_gpt.show_template('base_prompt')

Template 'base_prompt.txt' for model 'gpt-4o-mini' (source: default):

According to the Actantial Model by Greimas with the actant label set ["Subject", "Object", "Sender", "Receiver", "Helper", "Opponent"], the actants are defined as follows:

* Subject: The character who carries out the action and desires the Object.
* Object: The character or thing that is desired and transfered.
* Sender: The character who initiates the action, controls the Object, and transfers it to the Receiver.
* Receiver: The character who receives the Object.
* Helper: The character who assists the Subject in achieving its goal.
* Opponent: The character who opposes the Subject in achieving its goal.

Based on the Actantial Model, please recognize the actants in the following text.

Text: {{ text }}

Question: What are the main actants in the text?
1. Identify the main Object in the text.
2. Identify the respective Subject.
3. Identify the Sender who transfers the Object to the Receiver. Make sure that this matches the Object from the previous step.
4. Identify the Helper and Opponent that try to influence the Subject in achieving the Object.

Final Consistency Check:
- Ensure that all chosen actants align logically with the relationships described in the Actantial Model.
- The Object must remain consistent between desire and transfer.
- If necessary, update the actants to maintain coherence.

Response Format (JSON Dictionary)
- Each actant label should be a key, and the corresponding actor the value.
- If no actor is present for a label, return an empty string ("").

Example Format: {"Subject": ["Actant Name"], "Object": ["Actant Name"], "Sender": ["Actant Name"], "Receiver": ["Actant Name"], "Helper": ["Actant Name"], "Opponent": ["Actant Name"]}

Answer:

The base template frames the extraction task, defines the actantial model, and structures both the extraction process and the output format. It can be used for basic open extraction, that is, to label the text freely with the actantial model.
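To make the template mechanics concrete, here is a minimal sketch of how a {{ text }} placeholder can be filled and how a reply in the requested JSON format can be read. It is an illustration under assumptions: fill_template and parse_reply are hypothetical helpers, and the package may render templates and parse replies differently (for instance, with a templating engine).

```python
import json
import re

LABELS = ["Subject", "Object", "Sender", "Receiver", "Helper", "Opponent"]


def fill_template(template, **variables):
    """Replace each {{ name }} placeholder with the matching variable."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value given for placeholder '{name}'")
        return str(variables[name])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def parse_reply(reply):
    """Read a JSON reply and return one list of actors per actant label."""
    data = json.loads(reply)
    parsed = {}
    for label in LABELS:
        value = data.get(label) or []
        # Accept both a single string and a list; drop empty entries
        if isinstance(value, str):
            value = [value]
        parsed[label] = [actor for actor in value if actor]
    return parsed


prompt = fill_template("Text: {{ text }}\n\nAnswer:", text="Alice wants Bob")
reply = ('{"Subject": ["Alice"], "Object": ["Bob"], "Sender": "", '
         '"Receiver": "", "Helper": "", "Opponent": ""}')
print(prompt)
print(parse_reply(reply))
```

Accepting both strings and lists is a deliberate choice here, because the template asks for an empty string when no actor is present while its example format uses lists.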

The package also provides two example templates: prompt_closed showcases closed extraction with a predefined label set, and prompt_open_variables showcases the use of additional variables besides the text, such as author or publishing date. These templates are model-specific: unlike templates in the default folder, which are available for every model, they are only available for gpt-4o-mini. While the default template is well suited to initial exploration, systematic application of the model will likely require a specialised, model-specific template. You can create your own custom templates by following the steps described below.
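As a rough illustration of what a closed label set implies, the sketch below checks an annotation against a fixed list of permitted actors after extraction. This is an assumption-laden example: ALLOWED_ACTORS and split_by_label_set are hypothetical, and how prompt_closed itself constrains the model is not shown in this guide.

```python
ALLOWED_ACTORS = frozenset({"alice", "bob"})  # hypothetical closed label set


def split_by_label_set(annotation, allowed=ALLOWED_ACTORS):
    """Separate actors inside the closed set from those outside it."""
    inside, outside = {}, {}
    for label, actors in annotation.items():
        # Compare on a normalised form so casing and spacing do not matter
        normalised = [actor.strip().lower() for actor in actors]
        inside[label] = [a for a in normalised if a in allowed]
        outside[label] = [a for a in normalised if a not in allowed]
    return inside, outside


annotation = {"Subject": ["Alice"], "Object": ["Bob "], "Opponent": ["the rival"]}
inside, outside = split_by_label_set(annotation)
print("inside the label set: ", inside)
print("outside the label set:", outside)
```

Actors that end up in the second dictionary point to replies that left the predefined set and may need review.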

For now, we will use the base_prompt:

In [19]:

run_extract(
    data=df,
    backend=backend_gpt,
    output_dir="output",    # an output directory to save the extracted data and logs
    template="base_prompt",
)

Timestamp: 	20260611_134003
Log: 		output/logs/gpt-4o-mini_base_prompt_20260611_134003.log
Files: 		output/actantial_models/gpt-4o-mini/base_prompt/20260611_134003

100%|██████████| 3/3 [00:14<00:00,  4.91s/it]

You can then load the annotations from the printed file path, matching them back to the original dataframe:

In [40]:

annot_gpt = load_annotations(
    data=df, 
    label_folder="output/actantial_models/gpt-4o-mini/base_prompt/20260611_134003"  # update this path to match your own extraction
)

annot_gpt

Out[40]:

	id	text	file_name	Subject	Object	Sender	Receiver	Helper	Opponent
0	1	Alice wants Bob	output/actantial_models/gpt-4o-mini/base_promp...	alice	bob	alice	bob	None	None
1	2	Bob wants Alice	output/actantial_models/gpt-4o-mini/base_promp...	bob	alice	None	bob	None	None
2	3	Alice wants Bob, but Bob does not want Alice	output/actantial_models/gpt-4o-mini/base_promp...	alice	bob	alice	bob	None	bob
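Because every run is stored in its own timestamped folder, the path has to be updated after each extraction. A small sketch of how the newest run could be located instead of copying the timestamp by hand is given below. It assumes the folder layout printed by run_extract above (output_dir/actantial_models/model/template/timestamp); latest_run is a hypothetical helper, not a function of the package.

```python
from pathlib import Path


def latest_run(output_dir, model, template):
    """Return the newest timestamped run folder, or None if there is none."""
    base = Path(output_dir) / "actantial_models" / model / template
    if not base.is_dir():
        return None
    # Names like 20260611_134003 sort chronologically as plain strings
    runs = sorted((p for p in base.iterdir() if p.is_dir()), key=lambda p: p.name)
    return runs[-1] if runs else None


print(latest_run("output", "gpt-4o-mini", "base_prompt"))
```

The returned path can then be passed on as the label_folder argument, converted with str() if a plain string is expected.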

Annotations Claude

You can use the Anthropic API in the same way as described above. For this backend, you need an ANTHROPIC_API_KEY in your .env file.

In [42]:

backend_claude = AnthropicBackend("claude-haiku-4-5")
backend_claude.list_templates()

Out[42]:

{'model_specific': [], 'default': ['base_prompt']}

In [26]:

run_extract(
    data=df,
    backend=backend_claude,
    output_dir="output",
    template="base_prompt",
)

Timestamp: 	20260611_134331
Log: 		output/logs/claude-haiku-4-5_base_prompt_20260611_134331.log
Files: 		output/actantial_models/claude-haiku-4-5/base_prompt/20260611_134331

100%|██████████| 3/3 [00:09<00:00,  3.18s/it]

Annotations HuggingFace

Lastly, you can use the HuggingFace platform, which provides a wide range of models. Unlike the API backends, this backend runs the models on your local GPU. While this significantly limits the size of the models you can run, it is free and independent of API changes. There is also an option to run the model in a quantised mode, but this requires a CUDA GPU (MPS is not supported).
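To find out which kind of accelerator is available, and therefore whether the quantised mode is an option, a check along the following lines may help. This sketch assumes that the backend relies on PyTorch, which this guide does not state explicitly; available_device is a hypothetical helper and falls back to "cpu" when PyTorch is not installed.

```python
def available_device():
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed, so there is nothing to probe
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch versions have no MPS backend, hence the guarded lookup
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = available_device()
print("Detected device:", device)
print("Quantised mode is an option:", device == "cuda")
```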

In [59]:

backend_hf = HuggingFaceBackend(repository="google", model_name="gemma-3-4b-it")

Loading model google/gemma-3-4b-it...

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Model loaded successfully

In [48]:

run_extract(
    data=df,
    backend=backend_hf,
    output_dir="output",
    template="base_prompt",
)

Timestamp: 	20260611_142247
Log: 		output/logs/gemma-3-4b-it_base_prompt_20260611_142247.log
Files: 		output/actantial_models/gemma-3-4b-it/base_prompt/20260611_142247

100%|██████████| 3/3 [00:11<00:00,  4.00s/it]

Compare annotations

You can check whether the different backends and models agree in their annotations. Simply replace the file paths with your own.

In [53]:

annot_gpt = load_annotations(df, "output/actantial_models/gpt-4o-mini/base_prompt/20260611_134003")
annot_claude = load_annotations(df, "output/actantial_models/claude-haiku-4-5/base_prompt/20260611_134331")
annot_hf = load_annotations(df, "output/actantial_models/gemma-3-4b-it/base_prompt/20260611_142247")

In [54]:

compare_annotations(
    dfs=[annot_gpt, annot_claude, annot_hf],
    names=['gpt', 'claude', 'hf'],
    metric="f1_micro"
)

Out[54]:

	gpt_claude	gpt_hf	claude_hf	avg	N
Subject	1.0	1.0	1.0	1.0	9
Object	1.0	1.0	1.0	1.0	9
Sender	NaN	1.0	NaN	1.0	1
Receiver	NaN	0.5	NaN	0.5	2
Helper	NaN	NaN	NaN	NaN	0
Opponent	1.0	1.0	1.0	1.0	3
avg	1.0	0.9	1.0	0.9	<NA>

The agreement scores are relatively high, which is to be expected for the toy example used here. For real texts, however, annotations will most likely diverge significantly. This is mostly because open annotation places no guardrails on the types of actors being labelled. Even if models agree on the actantial model, they may phrase the actors differently, which makes it impossible to assess agreement properly. Hence, compare_annotations is mostly useful for closed-set annotations, as demonstrated in the Case Study.
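The effect of differing phrasings can be illustrated with a small, self-contained micro-F1 computation over sets of actors. This is a sketch of the general metric, not the implementation behind compare_annotations; micro_f1 and the example annotations are made up for illustration.

```python
def micro_f1(annotations_a, annotations_b):
    """Micro-averaged F1 between two lists of actor sets (one set per text)."""
    tp = fp = fn = 0
    for a, b in zip(annotations_a, annotations_b):
        tp += len(a & b)  # actors named by both annotators
        fp += len(b - a)  # actors named only by the second annotator
        fn += len(a - b)  # actors named only by the first annotator
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else float("nan")


first = [{"alice"}, {"bob"}, {"alice"}]
second = [{"alice"}, {"bob"}, {"alice, the heroine"}]

print(micro_f1(first, first))   # identical annotations
print(micro_f1(first, second))  # same actor, different phrasing in text 3
```

In the second comparison, the third text counts as a disagreement although both annotations refer to the same character, which is exactly the problem with exact matching on open annotations.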

Create your own templates

You can copy the base and example templates from the package to a local folder and build your own model-specific templates from there.

In [23]:

!actantial-init-templates "./test/"

Templates copied to 'test/templates'. Pass '--templates_dir test/templates' to use templates from this directory.

In [ ]:

run_extract(
    data=df,
    backend=backend_gpt,
    output_dir="output",
    templates_dir="./test/templates",   # new template directory, this is where you can add your own templates!
    template="your_own_template",       # replace with the name of your custom template in the gpt-4o-mini folder
)
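A custom template is a plain text file, so it can also be written from Python. The sketch below saves a template into a model folder inside the templates directory. The layout (templates_dir/model_name/template_name.txt) is an assumption inferred from the comments above and from the 'base_prompt.txt' file name; write_template is a hypothetical helper, and the template body is only a placeholder.

```python
from pathlib import Path


def write_template(templates_dir, model, name, body):
    """Save a prompt template as <templates_dir>/<model>/<name>.txt."""
    if "{{ text }}" not in body:
        raise ValueError("template should contain the {{ text }} placeholder")
    folder = Path(templates_dir) / model
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}.txt"
    path.write_text(body, encoding="utf-8")
    return path


body = "Recognize the actants in the following text.\n\nText: {{ text }}\n\nAnswer:"
print(write_template("./test/templates", "gpt-4o-mini", "your_own_template", body))
```

Checking for the placeholder up front avoids sending prompts that never include the text to be annotated.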

Fin