HuggingFace backend

`actantial.backends.huggingface.HuggingFaceBackend`

Bases: LLMBackend

Backend for locally loaded HuggingFace models.

Loads the model and tokenizer from the HuggingFace Hub at initialisation. Quantisation via bitsandbytes (4-bit) is supported, but requires a CUDA GPU.

`__init__(repository, model_name, quantisation=False, torch_dtype='auto', temperature=None, do_sample=False, top_p=None, top_k=None, **kwargs)`

Load the model and tokenizer from the HuggingFace Hub.

Parameters:

Name	Type	Description	Default
`repository`	`str`	HuggingFace Hub namespace (organisation or user) that hosts the model (e.g., `deepseek-ai`).	required
`model_name`	`str`	Model name within that namespace (e.g., `DeepSeek-R1-Distill-Qwen-32B`).	required
`quantisation`	`bool`	If `True`, load the model in 4-bit precision using bitsandbytes. Requires a CUDA GPU.	`False`
`torch_dtype`	`str`	Floating-point precision passed to `from_pretrained`. Accepts `"auto"` (default), `"float16"`, or `"bfloat16"`.	`'auto'`
`temperature`	`Optional[float]`	Sampling temperature; higher values increase randomness.	`None`
`do_sample`	`bool`	If `True`, sample from the token distribution; if `False`, decode greedily for deterministic output.	`False`
`top_p`	`Optional[float]`	Nucleus sampling probability threshold.	`None`
`top_k`	`Optional[int]`	Top-k sampling parameter.	`None`
`**kwargs`	`Any`	Additional arguments passed to `AutoModelForCausalLM.from_pretrained`.	`{}`
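
The sketch below illustrates one way the constructor arguments could translate into a Hub model ID and keyword arguments for `AutoModelForCausalLM.from_pretrained`. It is not the backend's actual code: the helper `build_load_args` is hypothetical, and joining the two names as `<repository>/<model_name>` and mapping `quantisation=True` to `BitsAndBytesConfig(load_in_4bit=True)` are assumptions based on the parameter descriptions above.

```python
from typing import Any, Dict, Tuple


def build_load_args(
    repository: str,
    model_name: str,
    quantisation: bool = False,
    torch_dtype: str = "auto",
    **kwargs: Any,
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: derive a Hub model ID and from_pretrained kwargs."""
    # Assumption: the Hub ID is "<repository>/<model_name>".
    model_id = f"{repository}/{model_name}"

    # Extra keyword arguments are forwarded unchanged.
    load_kwargs: Dict[str, Any] = {"torch_dtype": torch_dtype, **kwargs}

    if quantisation:
        # Imported lazily so the non-quantised path does not need
        # transformers or bitsandbytes to be installed.
        from transformers import BitsAndBytesConfig

        # Assumption: 4-bit loading is requested through a quantization config.
        load_kwargs["quantization_config"] = BitsAndBytesConfig(load_in_4bit=True)

    return model_id, load_kwargs


model_id, load_kwargs = build_load_args(
    "deepseek-ai", "DeepSeek-R1-Distill-Qwen-32B", torch_dtype="bfloat16"
)
print(model_id)      # deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
print(load_kwargs)   # {'torch_dtype': 'bfloat16'}
```

With real dependencies installed, the resulting pair would typically be used as `AutoModelForCausalLM.from_pretrained(model_id, **load_kwargs)`.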

`generate(prompt, max_new_tokens=2048, **kwargs)`

Generate text from a prompt.

Parameters:

Name	Type	Description	Default
`prompt`	`str`	The input prompt string.	required
`max_new_tokens`	`int`	Maximum number of tokens to generate.	`2048`
`**kwargs`	`Any`	Additional parameters passed to the model's `generate` method.	`{}`

Returns:

Type	Description
`str`	The generated text string, excluding the input prompt.
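
As a rough illustration of the behaviour described above, the following sketch shows how sampling settings might be assembled into generation arguments and how the prompt could be excluded from the returned text. Both helpers (`build_generation_kwargs`, `strip_prompt`) are hypothetical, and the rules that unset (`None`) values are omitted and that per-call `**kwargs` take precedence are assumptions, not documented behaviour. Decoder-only models return the prompt tokens followed by the new tokens, so dropping the first `prompt_length` IDs leaves only the generated part.

```python
from typing import Any, Dict, List, Optional


def build_generation_kwargs(
    max_new_tokens: int = 2048,
    do_sample: bool = False,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    **overrides: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: collect arguments for the model's generate call."""
    gen_kwargs: Dict[str, Any] = {
        "max_new_tokens": max_new_tokens,
        "do_sample": do_sample,
    }
    # Assumption: sampling settings left as None are not forwarded at all.
    for name, value in (
        ("temperature", temperature),
        ("top_p", top_p),
        ("top_k", top_k),
    ):
        if value is not None:
            gen_kwargs[name] = value
    # Assumption: per-call keyword arguments override the stored defaults.
    gen_kwargs.update(overrides)
    return gen_kwargs


def strip_prompt(output_ids: List[int], prompt_length: int) -> List[int]:
    """Keep only the token IDs generated after the prompt."""
    return output_ids[prompt_length:]


sampling = build_generation_kwargs(max_new_tokens=256, do_sample=True, temperature=0.7)
print(sampling)  # {'max_new_tokens': 256, 'do_sample': True, 'temperature': 0.7}

# Three prompt tokens followed by two newly generated tokens.
new_ids = strip_prompt([101, 7592, 2088, 42, 43], prompt_length=3)
print(new_ids)   # [42, 43]
```

In a real backend the remaining IDs would then be passed to the tokenizer's `decode` method to obtain the returned string.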

`cleanup()`

Unload the model and free GPU memory.
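
A common pattern for this kind of cleanup in PyTorch-based code is sketched below. The class `LoadedModelHolder` is a hypothetical stand-in, not the backend itself, and the exact steps the real `cleanup()` performs are an assumption: drop the references, run the garbage collector, then ask PyTorch to release its cached CUDA memory if a GPU is present.

```python
import gc
from typing import Any, Optional


class LoadedModelHolder:
    """Hypothetical stand-in for an object that owns a model and tokenizer."""

    def __init__(self, model: Any, tokenizer: Any) -> None:
        self.model: Optional[Any] = model
        self.tokenizer: Optional[Any] = tokenizer

    def cleanup(self) -> None:
        # Drop the references so the weights become garbage-collectable.
        self.model = None
        self.tokenizer = None
        gc.collect()

        try:
            import torch
        except ImportError:
            return  # Without PyTorch there is no GPU cache to release.

        if torch.cuda.is_available():
            # Release unoccupied cached blocks held by the CUDA allocator.
            torch.cuda.empty_cache()


holder = LoadedModelHolder(model=object(), tokenizer=object())
holder.cleanup()
print(holder.model is None)  # True
```

Memory is only returned if no other variable still references the model, so callers should avoid keeping their own handles to it after cleanup.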
