Function calls are automatically mapped to grammars, which are currently supported only by llama.cpp. However, it is possible to turn off the use of grammars and instead extract the tool arguments from the LLM response by setting no_grammar and a response regex in the model's YAML file:
```yaml
name: model_name
parameters:
  # Model file name
  model: model/name
function:
  # set to true to not use grammars
  no_grammar: true
  # set one or more regexes used to extract the function tool arguments from the LLM response
  response_regex:
  - '(?P<function>\w+)\s*\((?P<arguments>.*)\)'
```
The response regex must use named capture groups so that the function name and the arguments can be extracted. For instance, the regex

```
(?P<function>\w+)\s*\((?P<arguments>.*)\)
```

will catch a response such as

```
function_name({ "foo": "bar"})
```
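To make the extraction concrete, here is a minimal Python sketch (not LocalAI code; the model output shown is hypothetical) of how a pattern with named groups recovers the function name and its JSON arguments:

```python
import json
import re

# Same pattern as configured under response_regex above
pattern = re.compile(r"(?P<function>\w+)\s*\((?P<arguments>.*)\)")

# Hypothetical raw LLM output
raw_response = 'get_weather({ "city": "Berlin" })'

match = pattern.search(raw_response)
if match:
    name = match.group("function")               # -> "get_weather"
    args = json.loads(match.group("arguments"))  # -> {"city": "Berlin"}
    print(name, args)
```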
Parallel tool calls
This feature is experimental and must be enabled in the model's YAML configuration by setting function.parallel_calls:
```yaml
name: gpt-3.5-turbo
parameters:
  # Model file name
  model: ggml-openllama.bin
  top_p: 0.9
  top_k: 80
  temperature: 0.1
function:
  # set to true to allow the model to call multiple functions in parallel
  parallel_calls: true
```
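Because LocalAI exposes an OpenAI-compatible API, parallel tool calls can be exercised with the standard OpenAI Python client. The sketch below is only illustrative: it assumes a LocalAI instance listening on localhost:8080 and a hypothetical get_weather tool.

```python
from openai import OpenAI

# Point the client at the assumed local LocalAI instance
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # the model name from the YAML above
    messages=[{"role": "user", "content": "What is the weather in Rome and in Paris?"}],
    tools=tools,
)

# With parallel_calls enabled, the model may emit several tool calls in one reply
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```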
Use functions with grammar
It is also possible to specify the full function signature (for debugging, or to use with other clients).
The chat endpoint accepts an additional grammar_json_functions parameter, which takes a JSON schema object.
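As a rough sketch of what such a request could look like, the following sends grammar_json_functions with an illustrative schema (the endpoint address and the schema contents are assumptions, not a normative format):

```python
import requests

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "What is the weather in Rome?"}],
    # Illustrative JSON schema constraining the model output to one function shape
    "grammar_json_functions": {
        "oneOf": [{
            "type": "object",
            "properties": {
                "function": {"const": "get_weather"},  # hypothetical function
                "arguments": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
            "required": ["function", "arguments"],
        }]
    },
}

# Assumed local LocalAI chat endpoint
r = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
print(r.json())
```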