vLLM
VLLMInference
Initializes a vLLM-based inference engine.
Parameters:
- model_name_or_path (str) – The name or path of the model.
- instruction (Path) – Path to the instruction file.
- instruct_in_prompt (bool, default: False) – Whether to include the instruction in the prompt, for models without a system role.
- template (Path, default: None) – Optional path to a prompt template file, used if the tokenizer does not include a chat template.
- num_gpus (int, default: 1) – The number of GPUs to use.
- llm_params (Dict, default: {}) – Additional parameters for the LLM model. Supports all parameters defined by the vLLM LLM engine.
- generation (Dict, default: {}) – Additional parameters for text generation. Supports all the keywords of vLLM's SamplingParams.
Source code in ragfit/models/vllm.py
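For illustration, a minimal sketch of constructing the engine with the parameters listed above. The keyword names follow the parameter list; the concrete values (model ID, file path, vLLM settings) are hypothetical placeholders, not defaults of the library.

```python
from pathlib import Path

from ragfit.models.vllm import VLLMInference

# Hypothetical values; llm_params and generation are forwarded to
# vLLM's LLM engine and SamplingParams, respectively.
engine = VLLMInference(
    model_name_or_path="meta-llama/Llama-3.1-8B-Instruct",
    instruction=Path("configs/instruction.txt"),
    instruct_in_prompt=False,
    num_gpus=1,
    llm_params={"dtype": "bfloat16", "max_model_len": 4096},
    generation={"temperature": 0.0, "max_tokens": 256},
)
```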
generate(prompt: str) -> str
Generates text based on the given prompt.
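A short usage sketch, assuming an engine constructed as in the example above; the prompt string is illustrative.

```python
answer = engine.generate("What is retrieval-augmented generation?")
print(answer)
```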