HuggingFace backend
actantial.backends.huggingface.HuggingFaceBackend
Bases: LLMBackend
Backend for locally loaded HuggingFace models.
Loads the model and tokenizer from the HuggingFace Hub at initialisation. Quantisation via bitsandbytes (4-bit) is supported, but requires a CUDA GPU.
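Because 4-bit quantisation depends on a CUDA device, a caller may want to decide the `quantisation` flag at runtime. The following is a minimal sketch, not part of the documented API: `cuda_available` is a hypothetical helper, and it assumes PyTorch's `torch.cuda.is_available()` as the device check, falling back to `False` when PyTorch is not installed.

```python
def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a CUDA device.

    Hypothetical helper for illustration; not part of HuggingFaceBackend.
    """
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


# Request quantisation only when a CUDA GPU is actually present.
quantisation = cuda_available()
print(f"quantisation={quantisation}")
```

The resulting boolean could then be passed as the `quantisation` argument of `__init__`, so the same script runs on both GPU and CPU-only machines.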
__init__(repository, model_name, quantisation=False, torch_dtype='auto', temperature=None, do_sample=False, top_p=None, top_k=None, **kwargs)
Load the model and tokenizer from the HuggingFace Hub.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| repository | str | HuggingFace repository name (e.g., …). | required |
| model_name | str | Model identifier within the repository (e.g., …). | required |
| quantisation | bool | If True, load the model with 4-bit bitsandbytes quantisation; requires a CUDA GPU. | False |
| torch_dtype | str | Floating-point precision passed to … | 'auto' |
| temperature | Optional[float] | Sampling temperature; higher values increase randomness. | None |
| do_sample | bool | If True, use sampling instead of greedy decoding. | False |
| top_p | Optional[float] | Nucleus sampling probability threshold. | None |
| top_k | Optional[int] | Top-k sampling parameter. | None |
| **kwargs | Any | Additional arguments passed to … | {} |
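The sampling parameters default to `None`, which suggests that unset values are left to the model's own generation defaults. As an illustration only, the sketch below shows one way such options could be collected into keyword arguments. `build_sampling_kwargs` is a hypothetical helper, and the omit-if-`None` behaviour is an assumption, not the backend's confirmed implementation.

```python
from typing import Any, Dict, Optional


def build_sampling_kwargs(
    temperature: Optional[float] = None,
    do_sample: bool = False,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect sampling options, omitting any that were left as None.

    Hypothetical helper; assumes unset options fall back to model defaults.
    """
    candidates = {"temperature": temperature, "top_p": top_p, "top_k": top_k}
    kwargs: Dict[str, Any] = {"do_sample": do_sample}
    # Keep only the options the caller actually set.
    kwargs.update({k: v for k, v in candidates.items() if v is not None})
    return kwargs


greedy = build_sampling_kwargs()
sampled = build_sampling_kwargs(temperature=0.7, do_sample=True, top_p=0.9)
print(greedy)
print(sampled)
```

Omitting unset values, instead of forwarding explicit `None`s, keeps the resulting dictionary safe to unpack into a generation call without overriding the model's configuration.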
generate(prompt, max_new_tokens=2048, **kwargs)
Generate text from a prompt.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| prompt | str | The input prompt string. | required |
| max_new_tokens | int | Maximum number of tokens to generate. | 2048 |
| **kwargs | Any | Additional parameters passed to the model's … | {} |
Returns:
| Type | Description |
|---|---|
| str | The generated text string, excluding the input prompt. |
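Causal language models usually return the prompt tokens followed by the newly generated ones, so excluding the prompt generally means slicing off the first `len(prompt_ids)` tokens before decoding. The sketch below demonstrates that idea on plain lists. `strip_prompt_tokens` and the toy vocabulary are hypothetical, and whether the backend uses this exact approach is an assumption.

```python
from typing import List, Sequence


def strip_prompt_tokens(prompt_ids: Sequence[int], output_ids: Sequence[int]) -> List[int]:
    """Return only the tokens that follow the prompt in output_ids.

    Hypothetical helper; raises ValueError if the output does not begin
    with the prompt tokens.
    """
    if list(output_ids[: len(prompt_ids)]) != list(prompt_ids):
        raise ValueError("output does not start with the prompt tokens")
    return list(output_ids[len(prompt_ids):])


# Toy vocabulary standing in for a real tokenizer.
vocab = {0: "Once", 1: "upon", 2: "a", 3: "time"}
prompt_ids = [0, 1]
output_ids = [0, 1, 2, 3]  # prompt tokens followed by generated tokens

new_ids = strip_prompt_tokens(prompt_ids, output_ids)
print(" ".join(vocab[i] for i in new_ids))  # prints "a time"
```

Slicing at the token level, before decoding, avoids the mismatches that can occur when the decoded text does not reproduce the prompt string character for character.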
cleanup()
Unload the model and free GPU memory.
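Freeing GPU memory in PyTorch generally involves dropping every reference to the model, running the garbage collector, and then asking the CUDA caching allocator to release unused blocks. The sketch below shows that pattern under stated assumptions: `ModelHolder` is a hypothetical stand-in for the backend, and the actual `cleanup()` implementation may differ.

```python
import gc
from typing import Any, Optional


class ModelHolder:
    """Hypothetical stand-in for a backend that owns a model and tokenizer."""

    def __init__(self, model: Any, tokenizer: Any) -> None:
        self.model: Optional[Any] = model
        self.tokenizer: Optional[Any] = tokenizer

    def cleanup(self) -> None:
        # Drop the references so the objects become collectable.
        self.model = None
        self.tokenizer = None
        gc.collect()
        try:
            import torch  # optional: only needed for the CUDA cache step
        except ImportError:
            return  # nothing more to do without PyTorch
        if torch.cuda.is_available():
            # Release cached, unused GPU memory held by the allocator.
            torch.cuda.empty_cache()


holder = ModelHolder(model=object(), tokenizer=object())
holder.cleanup()
print(holder.model is None)  # prints "True"
```

Any other variable that still references the model outside the holder keeps it alive, so callers should avoid holding their own references if they expect the memory to be reclaimed.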