Renting Out DGX Spark
Posted by jsfour@reddit | LocalLLaMA | View on Reddit | 4 comments
I plan on building a DGX Spark cluster. I will be using it a lot, but I'm trying to figure out whether there are marketplaces where I could rent out compute time on it while I'm not using it.
Has anybody come across something like this?
Obviously this would be for people looking to do training, but I think I could price it below cloud clusters, given that my only cost is energy.
Dontdoitagain69@reddit
DGX Spark is a dev/PoC box. It was meant for developing, testing, and deploying solutions before scaling them out on GPU clusters. The reason they only let you cluster two of them is so you can test concurrency, not so you can run inference on some model.
Here is what DGX was designed for:

- CUDA kernel prototyping and tuning
- end-to-end model training and fine-tuning
- large-batch hyperparameter search
- distributed training experiments with NCCL
- data preprocessing and ETL acceleration with GPU-aware Spark
- feature engineering at scale on GPU dataframes
- building and testing custom PyTorch/TensorFlow ops
- GPU-accelerated graph analytics (GraphX/GraphFrames/RAPIDS cuGraph)
- simulation workloads (Monte Carlo, physics, finance, risk)
- reinforcement learning training loops
- synthetic data generation and augmentation
- GPU-accelerated SQL / lakehouse query benchmarking
- experimentation with new parallelization strategies (tensor/model/pipeline parallel)
- multi-GPU performance profiling and bottleneck analysis
- I/O and data-pipeline stress testing for future production clusters
- validating mixed-precision and quantization strategies
- developing and testing custom CUDA libraries for internal use
- benchmarking different storage formats and compression schemes on GPU
- building internal GPU-accelerated analytics dashboards and reports
- prototyping real-time streaming pipelines with GPU operators
- running large-scale unsupervised learning (clustering, PCA, topic models) on GPUs
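To make the "distributed training experiments with NCCL" item concrete, here is a minimal sketch (not from the thread) of the kind of two-node smoke test you might run across a pair of Sparks with PyTorch DDP. The script name, master address, and launch flags are assumptions for illustration, not anything NVIDIA ships with the box.

```python
# Minimal two-node DistributedDataParallel smoke test over NCCL.
# Hypothetical launch on each node (master_addr is an assumption):
#   torchrun --nnodes=2 --nproc_per_node=1 --node_rank=<0|1> \
#            --master_addr=<spark-0-ip> --master_port=29500 ddp_smoke.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Tiny stand-in model; the point is exercising NCCL all-reduce, not accuracy.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

    for step in range(10):
        x = torch.randn(32, 1024, device=local_rank)
        loss = ddp_model(x).square().mean()
        optimizer.zero_grad()
        loss.backward()  # gradients are all-reduced across both nodes here
        optimizer.step()
        if dist.get_rank() == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

If the all-reduce in `backward()` completes across both boxes and the loss prints from rank 0, the NCCL/concurrency path the comment describes is working.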
PachoPena@reddit
I think that may have been its original purpose, but now I see Spark being sold as more of a workstation. Nvidia's downstream partners repackage it (for example, Gigabyte calls it an AI TOP ATOM personal supercomputer: www.gigabyte.com/AI-TOP-PC/GIGABYTE-AI-TOP-ATOM?lan=en) and promise you can do all sorts of local AI development with it, so it makes sense OP would think they could rent out compute on it like any other server.
Automatic-Bar8264@reddit
Cluster meaning 2?
Smooth-Cow9084@reddit
Vast.ai and RunPod