* feature: use uv to set up the Python environment
* TrainProcessService: add singleton method get_instance
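A minimal sketch of what such a get_instance method typically looks like, assuming a lazy, thread-safe singleton; the lock and constructor details are illustrative, not the actual implementation:

```python
import threading

class TrainProcessService:
    _instance = None
    _lock = threading.Lock()  # guards first-time construction

    @classmethod
    def get_instance(cls) -> "TrainProcessService":
        # Double-checked locking: the common path skips the lock,
        # and only the first caller actually constructs the instance.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance
```

Every caller then shares one training-process manager instead of creating a new one per request.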
* feat: fix code
* Added CUDA support (#228)
* Add CUDA support
- CUDA detection
- Memory handling
- Ollama model release after training (see the sketch below)
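A rough sketch of the detection and release steps: torch.cuda.is_available() is the standard CUDA check, and Ollama unloads a model when a request sets keep_alive to 0. The host URL and function names are illustrative:

```python
import torch
import requests

def cuda_available() -> bool:
    # Standard torch-level check for a usable CUDA device.
    return torch.cuda.is_available()

def release_ollama_model(model: str, host: str = "http://localhost:11434") -> None:
    # A generate request with keep_alive=0 tells Ollama to unload the
    # model immediately, freeing VRAM once training is done with it.
    requests.post(f"{host}/api/generate",
                  json={"model": model, "keep_alive": 0},
                  timeout=30)
```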
* Fix logging issue
Added a CUDA support flag so the log accurately reflects the CUDA toggle
* Update llama.cpp rebuild
Changed the llama.cpp build step to check whether CUDA support is enabled and, if so, rebuild only during the first build rather than on every run
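The gating could look roughly like this; the CUDA_SUPPORT variable and the sentinel file are hypothetical names, and GGML_CUDA=1 is shown as one common way to build llama.cpp with CUDA, not necessarily the project's actual build code:

```python
import os
import subprocess
from pathlib import Path

SENTINEL = Path("llama.cpp/.cuda_build_done")  # hypothetical marker file

def maybe_rebuild_llama_cpp() -> None:
    # Skip entirely when CUDA is off, and skip repeat runs once the
    # marker from the first successful build exists.
    if os.environ.get("CUDA_SUPPORT") != "1" or SENTINEL.exists():
        return
    subprocess.run(["make", "-C", "llama.cpp", "GGML_CUDA=1"], check=True)
    SENTINEL.touch()
```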
* Improved VRAM management
Enabled memory pinning and optimizer state offload
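In torch terms the two settings map to something like the following; pin_memory is the standard DataLoader option, while the DeepSpeed-style offload config is one plausible mechanism for optimizer state offload, not a confirmed detail of this project:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(64, 16))  # stand-in dataset

# pin_memory=True keeps batches in page-locked host memory, which
# speeds up host-to-GPU transfers.
loader = DataLoader(dataset, batch_size=8, pin_memory=True)

# Optimizer state offload is commonly expressed as a DeepSpeed ZeRO
# config like this, moving optimizer tensors to CPU RAM to save VRAM.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    }
}
```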
* Fix CUDA check
Rewrote the llama.cpp rebuild logic and added a manual y/n prompt asking whether to enable CUDA support
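The toggle amounts to a prompt like this, shown in Python for illustration; the real toggle likely lives in the build scripts:

```python
def prompt_cuda_toggle() -> bool:
    # Ask once, defaulting to "no" so machines without an NVIDIA GPU
    # never opt in by accident.
    answer = input("Enable CUDA support? [y/N] ").strip().lower()
    return answer in ("y", "yes")
```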
* Added fast restart and fixed CUDA check command
Added make docker-restart-backend-fast to restart the backend and pick up code changes without triggering a full llama.cpp rebuild
Fixed the make docker-check-cuda command to correctly report CUDA support
* Added docker-compose.gpu.yml
Added docker-compose.gpu.yml to fix an error on machines without an NVIDIA GPU, and made sure a "\n" is added before modifying .env
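The newline guard amounts to logic like this (a Python sketch; the actual change may live in the Makefile, and the variable name is hypothetical):

```python
from pathlib import Path

def append_env_line(env_path: Path, line: str) -> None:
    # Make sure the file ends with "\n" before appending, so the new
    # entry never gets glued onto the last existing line.
    text = env_path.read_text() if env_path.exists() else ""
    if text and not text.endswith("\n"):
        text += "\n"
    env_path.write_text(text + line + "\n")

append_env_line(Path(".env"), "CUDA_SUPPORT=1")  # hypothetical variable
```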
* Fixed cuda toggle
The last push accidentally broke the CUDA toggle
* Code review fixes
Fixed errors resulting from removed code:
- Added a return save_path statement at the end of the save_hf_model function (sketched below)
- Rolled back the download_file_with_progress function
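The first fix restores the function's return value; the body below is a guessed shape around that return, not the project's actual code:

```python
from pathlib import Path

def save_hf_model(model, tokenizer, save_path: str) -> str:
    # Persist the model and tokenizer, then hand the path back so
    # callers that expect a location keep working.
    Path(save_path).mkdir(parents=True, exist_ok=True)
    model.save_pretrained(save_path)
    tokenizer.save_pretrained(save_path)
    return save_path  # the statement this fix restores
```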
* Update Makefile
Use CUDA by default when running docker-restart-backend-fast
* Minor cleanup
Removed an unnecessary Makefile command and fixed GPU logging
* Delete .gpu_selected
* Simplified cuda training code
- Removed the dtype setting to let torch handle it automatically
- Removed VRAM logging
- Removed unnecessary/old comments
* Fixed gpu/cpu selection
Made "make docker-use-gpu/cpu" command work with .gpu_selected flag and changed "make docker-restart-backend-fast" command to respect flag instead of always using gpu
* Fix Ollama embedding error
Added a custom exception class for Ollama embeddings, which appeared to pass keyword arguments to an exception whose base class only accepts positional ones
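A minimal sketch of such an exception class; the name and stored fields are illustrative. The point is that the base Exception only takes positional arguments, so passing keywords to a bare subclass raises a TypeError:

```python
class OllamaEmbeddingError(Exception):
    # Unlike a bare Exception subclass, this accepts arbitrary
    # keyword arguments and keeps them around for logging.
    def __init__(self, message: str = "", **kwargs):
        super().__init__(message)
        self.details = kwargs

try:
    raise OllamaEmbeddingError("embedding request failed", status_code=500)
except OllamaEmbeddingError as err:
    print(err, err.details)  # embedding request failed {'status_code': 500}
```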
* Fixed model selection & memory error
Fixed training defaulting to the 0.5B model regardless of selection, and fixed the "free(): double free detected in tcache 2" error caused by the CUDA flag being passed incorrectly
* fix: train service singleton
---------
Co-authored-by: Zachary Pitroda <30330004+zpitroda@users.noreply.github.com>