Local install
- CUDA (local build)
- This forces a reinstall and rebuilds llama-cpp-python with CUDA support so the GPU can be used:
CMAKE_ARGS="-DGGML_CUDA=on -DGGML_CUDA_FORCE_CUBLAS=on -DLLAVA_BUILD=off -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade
- The only thing that worked outright was creating a fresh conda environment and then installing llama-cpp-python inside it. Every other option either failed to install or failed to use the GPU after installing; see the sanity check below to confirm which case you hit.
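A minimal way to confirm the wheel really offloads to the GPU (a sketch, assuming the GGUF file from the CLI example below sits in the working directory): load with n_gpu_layers=-1 and watch the verbose load log for CUDA offload lines.

```python
# Minimal sketch: confirm the CUDA build of llama-cpp-python uses the GPU.
# Model filename matches the CLI example below; adjust the path as needed.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-v0.1.Q2_K.gguf",
    n_gpu_layers=-1,  # offload all layers; the load log should mention CUDA buffers
    verbose=True,
)
print(llm("Q: What is the capital of France? A:", max_tokens=16)["choices"][0]["text"])
```

If the load log assigns every layer to the CPU, the wheel was built without CUDA and the forced reinstall above needs to be rerun inside the active environment.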
llama.cpp CLI
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release
./build/bin/llama-cli -m "mistral-7b-instruct-v0.1.Q2_K.gguf" -cnv
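The build above is CPU-only. Since the rest of these notes target CUDA, the same steps with the GPU backend enabled (a sketch assuming the CUDA toolkit is installed; -ngl sets how many layers to offload, so lower it if VRAM runs out):
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
./build/bin/llama-cli -m "mistral-7b-instruct-v0.1.Q2_K.gguf" -ngl 99 -cnv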