# Python binding for llama-cpp

## build

CUDA and CUDA_TOOLKIT are required to enable CUDA acceleration.

```bash
# Install with CUDA acceleration and all optional dependencies
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install -e .[all]
```

## llava-v1.5

To run llava-v1.5, first start the server:
```bash
# Disable mlock
export use_mlock=False

# Run the llava server
python -m llama_cpp.server --model {llava_model} --clip_model_path {clip_model} --chat_format llava-1-5 --n_gpu_layers -1 --port {port} --host "0.0.0.0"
```
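Once the server is up, it exposes an OpenAI-compatible HTTP API, so you can send an image plus a prompt to `/v1/chat/completions`. A minimal sketch using only the standard library — the helper names, the default port `8000`, and the example image URL are illustrative assumptions (use whatever `{port}` you started the server with):

```python
import json
import urllib.request


def build_llava_request(prompt, image_url, max_tokens=256):
    # Build an OpenAI-style chat completion payload; the llava-1-5 chat
    # format accepts vision-style messages with image_url content parts.
    return {
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


def send_request(payload, host="localhost", port=8000):
    # POST the payload to the server's OpenAI-compatible endpoint.
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `send_request(build_llava_request("Describe this image.", "https://example.com/cat.png"))` returns the usual chat-completion JSON, with the model's answer under `choices[0]["message"]["content"]`.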

<!-- make sure you have git lfs installed to download the model
```bash
git lfs install
```
llava model path: ./llava-v1.5-13b-gguf/ggml-model-q4_k.gguf
clip model path: ./llava-v1.5-13b-gguf/mmproj-model-f16.gguf -->