vLLM
VLLMInference
Initializes a vLLM-based inference engine.
Parameters:
- `model_name_or_path` (`str`) – The name or path of the model.
- `instruction` (`Path`) – Path to the instruction file.
- `instruct_in_prompt` (`bool`, default: `False`) – Whether to include the instruction in the prompt, for models without a system role.
- `template` (`Path`, default: `None`) – Optional path to a prompt template file, used if the tokenizer does not include a chat template.
- `num_gpus` (`int`, default: `1`) – The number of GPUs to use.
- `llm_params` (`Dict`, default: `{}`) – Additional parameters for the LLM model. Supports all parameters defined by the vLLM LLM engine.
- `generation` (`Dict`, default: `{}`) – Additional parameters for text generation. Supports all the keywords of vLLM's `SamplingParams`.
Source code in ragfit/models/vllm.py
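The `instruct_in_prompt` flag controls where the instruction text ends up when a model has no system role. A minimal sketch of that message-assembly logic, using a hypothetical helper name and an OpenAI-style message format for illustration (the actual implementation in `ragfit/models/vllm.py` may differ):

```python
# Illustrative sketch only: the function name and message format are
# assumptions, not the actual ragfit/models/vllm.py implementation.
def build_messages(instruction: str, prompt: str, instruct_in_prompt: bool) -> list:
    """Return chat messages, inlining the instruction into the user
    prompt when the model has no system role (instruct_in_prompt=True)."""
    if instruct_in_prompt:
        # No system role available: prepend the instruction to the user prompt.
        return [{"role": "user", "content": f"{instruction}\n\n{prompt}"}]
    # Model supports a system role: send the instruction separately.
    return [
        {"role": "system", "content": instruction},
        {"role": "user", "content": prompt},
    ]
```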
generate(prompt: str) -> str
Generates text based on the given prompt and returns it as a string.