# Python binding for llama-cpp

## build

CUDA and CUDA_TOOLKIT are required to enable CUDA acceleration.

```bash
# Install with CUDA acceleration and all optional dependencies
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install -e .[all]
```

## llava-v1.5

To run llava-v1.5, first start the server:
```bash
# Disable mlock
export use_mlock=False

# Run the llava server
python -m llama_cpp.server --model {llava_model} --clip_model_path {clip_model} --chat_format llava-1-5 --n_gpu_layers -1 --port {port} --host "0.0.0.0"
```
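Once the server is up, it exposes an OpenAI-compatible HTTP API, so you can send an image plus a prompt to `/v1/chat/completions`. A minimal sketch using only the standard library — the helper names, the default port `8000`, and the example image URL are illustrative assumptions (use whatever `{port}` you started the server with):

```python
import json
import urllib.request


def build_llava_request(prompt, image_url, max_tokens=256):
    # Build an OpenAI-style chat completion payload; the llava-1-5 chat
    # format accepts vision-style messages with image_url content parts.
    return {
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


def send_request(payload, host="localhost", port=8000):
    # POST the payload to the server's OpenAI-compatible endpoint.
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `send_request(build_llava_request("Describe this image.", "https://example.com/cat.png"))` returns the usual chat-completion JSON, with the model's answer under `choices[0]["message"]["content"]`.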

<!-- make sure you have git lfs installed to download the model
```bash
git lfs install
```
llava model path: ./llava-v1.5-13b-gguf/ggml-model-q4_k.gguf
clip model path: ./llava-v1.5-13b-gguf/mmproj-model-f16.gguf -->