Starting GPU Adoption from Scratch

April 4, 2025
Labs

For AI startups, GPUs are essential, and each team adopts them according to its own criteria. Deployments can take various forms: cloud-based, on-premise, or hybrid. By sharing our decision-making process and the selection criteria behind our GPU adoption, we hope to provide useful insights for engineers considering a similar implementation.

Please note that this content is based on data available at the time of adoption.

Background

Since the second half of 2023, the expansion of our in-house services and features has steadily increased our use of LLMs, and usage of commercial models surged as a result. Given the growing service demand at the time, several factors made the need for our own GPUs increasingly apparent.

  • While commercial AI models like GPT-3 and GPT-4 offer excellent performance, they are general-purpose models rather than models specialized for a specific field or domain.
    • Consequently, additional processing is required to tailor them to our domain and objectives.
    • Moreover, since these models operate as commercial services, cost is always a crucial factor: we had to evaluate both training costs and the operational expenses of providing services on top of these models.

Fine-tuning costs for GPT-3.5 Turbo in 2023
(Ref: OpenAI - GPT-3.5 Turbo Fine-Tuning and API Updates)

  • Relying entirely on commercial models can lead to a critical vulnerability—becoming dependent on a specific service provider.

Due to these factors, the necessity for GPUs became even more evident, not only for researching and training alternative LLM models but also for evaluating and implementing them.

Requirements

To determine the necessary hardware specifications, we first assessed the level of model training required and the anticipated trajectory of model development. This evaluation was based on a list of available models at the time.

The performance gap between LLaMA2 7B and LLaMA2 70B speaks for itself.
(Ref: Encord - LLaMA2 Explained)

Based on these benchmark results, we established the following specifications for the required hardware:

  • Rather than prioritizing FP32 FLOPS for raw computational speed, we focused on VRAM capacity to ensure smooth model parameter loading.
  • We set a baseline requirement of at least 140GB of VRAM to accommodate fine-tuning for LLMs of 70B+ parameters.
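The 140GB baseline follows directly from the parameter count. Here is the arithmetic as a minimal sketch (our own illustration, assuming 16-bit weights at 2 bytes per parameter, and deliberately ignoring gradients, optimizer state, and activations, which push the real requirement higher):

```python
def min_weight_vram_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """VRAM (GB) needed just to hold the model weights in 16-bit precision.
    Actual fine-tuning needs considerably more for gradients, optimizer
    state, and activations."""
    return float(params_billions * bytes_per_param)

print(min_weight_vram_gb(70))  # 140.0 -> the 140GB baseline for a 70B model
print(min_weight_vram_gb(7))   # 14.0  -> a 7B model fits in far less
```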


Basis for Requirement Analysis

The minimum adoption criteria were determined based on model performance evaluations.

For the LLaMA2 70B model, fine-tuning using LoRA (Low-Rank Adaptation) has been well-documented with relevant benchmarks:

(Ref: Anyscale - Fine-Tuning LLMs: LoRA or Full-Parameter?)

The study shows that while LLaMA2 70B is sensitive to input token size, full-parameter training is feasible, and LoRA can fine-tune it efficiently for specific context lengths, making it a viable candidate for our requirements.
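To see why LoRA makes 70B-scale fine-tuning tractable, here is a rough trainable-parameter count. This is our own illustration, not a figure from the study: LLaMA2 70B's hidden size of 8192 and 80 layers are public model specs, but the rank and the set of adapted matrices below are assumed for the sketch.

```python
def lora_trainable_params(d_model: int, num_layers: int, rank: int,
                          adapted_matrices_per_layer: int = 2) -> int:
    """Parameters added by LoRA: each adapted d_model x d_model weight
    gains two low-rank factors, A (d_model x rank) and B (rank x d_model).
    This treats every adapted matrix as square, which is a simplification."""
    per_matrix = 2 * d_model * rank
    return num_layers * adapted_matrices_per_layer * per_matrix

# LLaMA2 70B: hidden size 8192, 80 layers; rank 16 on two attention
# projections per layer is an assumed, commonly seen configuration.
trainable = lora_trainable_params(d_model=8192, num_layers=80, rank=16)
print(trainable)               # 41943040 -> roughly 42M trainable parameters
print(trainable / 70e9 * 100)  # well under 0.1% of the full 70B
```

Training well under 0.1% of the weights is what pulls the memory and compute budget down to something a small GPU cluster can handle.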

Based on these criteria, we proceeded with a feasibility study on separating GPU machines for training and inference to optimize resource allocation.

Deployment Consideration

At the time, we had two main options for GPU deployment: cloud-based GPU services or on-premise GPU infrastructure. Below is an overview of the cost analysis and considerations for each option.

Cloud / Provider A

VRAM 320GB, A100 8 GPUs

We identified an instance with a total VRAM of 320GB using eight A100 GPUs, which met the recommended specifications. The target instance was p4d.24xlarge.

The estimated operational costs based on a one-year commitment were as follows:

  • On-demand pricing: $32.77 per hour (~KRW 45,000)
    • Daily: KRW 1.08M
    • Monthly: KRW 32.4M
    • Yearly: KRW 394M
  • One-year reserved pricing: $19.22 per hour (~KRW 27,000)
    • Daily: KRW 650K
    • Monthly: KRW 19.5M
    • Yearly: KRW 237M

VRAM 64GB, V100 8 GPUs

If we opted for the LLaMA-2 7B model instead of LLaMA-2 70B, we could consider a lower-spec instance like p3dn.24xlarge.

  • On-demand pricing: $12.24 per hour (~KRW 16,000)
    • Daily: KRW 384K
    • Monthly: KRW 11.52M
    • Yearly: KRW 140M
  • One-year reserved pricing: $7.96 per hour (~KRW 10,000)
    • Daily: KRW 240K
    • Monthly: KRW 7.02M
    • Yearly: KRW 84M
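The projections above can be reproduced with a small helper. This is our own sketch; the KRW figures in this post imply an exchange rate of roughly 1,350-1,400 KRW/USD, which we hard-code below as an assumption, so the outputs land slightly under the quoted numbers.

```python
def cloud_gpu_cost_krw(usd_per_hour: float, fx_krw_per_usd: float = 1350.0) -> dict:
    """Project daily / monthly / yearly cost in KRW of a GPU instance
    running 24/7, at an assumed fixed exchange rate."""
    daily = usd_per_hour * 24 * fx_krw_per_usd
    return {"daily": daily, "monthly": daily * 30, "yearly": daily * 365}

# p4d.24xlarge: on-demand ($32.77/h) vs one-year reserved ($19.22/h)
for label, rate in [("on-demand", 32.77), ("reserved", 19.22)]:
    cost = cloud_gpu_cost_krw(rate)
    print(f"{label}: ~KRW {cost['yearly'] / 1e6:.0f}M per year")
```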

While CPU-based inference serving was also considered, it was ruled out due to significant response time degradation, which would negatively impact service quality.

Cloud / Provider B

We also compared costs with another cloud provider offering similar performance.

| Instance Type | GPUs | GPU Memory | vCPU | Hourly Cost |
|---|---|---|---|---|
| a2-highgpu-8g | 8× A100 | 320GB | 96 | ~$30 |
| a2-ultragpu-4g | 4× A100 | 320GB | 96 | ~$20 |

On-Premise / AMD

AMD's MI-210 training GPU was also considered.

At the time of evaluation, the AMD MI-210 was priced at approximately KRW 32M per unit (now priced at KRW 55M).

However, AMD-based GPUs posed several challenges:

  • Limited software and framework support
  • Lack of references and community resources
  • Additional learning curve for engineers

Due to these concerns, AMD was deprioritized as an option.

On-Premise / Nvidia

Most cloud services primarily use Nvidia GPUs. Given this industry standard, if we were to build an on-premise setup, Nvidia would be the preferred choice.

To balance training and inference workloads, we determined that at least four A100 80GB GPUs would be necessary.

The SXM form-factor model with NVLink support was priced at approximately KRW 23M per unit, meaning the total cost for four GPUs alone would be KRW 100M.

Final Decision: On-Premise Nvidia GPUs

After a comprehensive cost-benefit analysis, on-premise deployment was found to be more favorable than cloud-based solutions, offering:

  1. Lower long-term costs
    • Cloud services would cost over KRW 200M per year for GPU instances.
    • On-premise deployment required a one-time investment of KRW 100M for GPUs, plus additional infrastructure costs.
  2. Better operational control
    • Avoid dependency on cloud GPU availability.
    • More flexibility in managing workloads without unexpected cloud costs.
  3. Cloud GPU availability issues
    • Even with cloud credits, GPU instances were often unavailable due to high demand.
    • This could negatively impact training and inference operations.

Given these factors, we concluded that on-premise Nvidia GPUs were the most viable option for our needs.

Operational Cost Analysis

Electricity Costs

If we proceeded with an on-premise deployment, we needed to estimate the ongoing operational costs, particularly electricity consumption.

Key Assumptions:

  • Peak power consumption was used for estimation, assuming one month of continuous model training.
  • Current electricity contract was used as the reference (General Use (Type I), Low Voltage, Contracted Power 10kW).
  • Only GPU power consumption was considered (excluding server/CPU operation costs).
  • Usage scenario:
    • LLM model training assumed to run 18 hours per day, 20 days per month.
    • Costs could vary depending on training duration and methodology.
    • Consideration was given to the possibility of pretraining a 70B model.

|  | A100 80GB | AMD MI210 64GB | AMD MI250 128GB |
|---|---|---|---|
| Minimum GPU quantity (based on VRAM) | 4 units | 4 units (slightly insufficient) | 2 units |
| Power consumption (per unit, peak) | 400W | 300W | 560W |
| Total power consumption (GPU only) | 1.6 kW | 1.2 kW | 1.12 kW |
| Daily power consumption (GPU only, 18H usage) | ~29 kWh | ~22 kWh | ~20 kWh |
| Monthly power consumption (22 days usage) | ~640 kWh | ~480 kWh | ~440 kWh |
| Estimated monthly electricity cost | ~KRW 150,000 | ~KRW 130,000 | ~KRW 120,000 |
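The table's energy figures can be reproduced as follows. This is our own sketch; the ~KRW 230/kWh tariff below is an assumption inferred from the table's cost column, not the actual contract rate.

```python
def monthly_gpu_energy_kwh(watts_per_gpu: int, num_gpus: int,
                           hours_per_day: int = 18, days_per_month: int = 22) -> float:
    """GPU-only energy use (kWh/month) under the training schedule above."""
    return watts_per_gpu * num_gpus / 1000 * hours_per_day * days_per_month

# A100 80GB x 4 at 400W peak each
kwh = monthly_gpu_energy_kwh(400, 4)
print(round(kwh))        # ~634 kWh; the table rounds this to ~640 kWh
print(round(kwh * 230))  # estimated monthly cost at an assumed ~KRW 230/kWh
```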

Even when including air conditioning and humidity control, the projected electricity costs remained below KRW 300,000 per month.

Given these factors, we proceeded with further cost analysis on IDC (Internet Data Center) hosting vs. an in-house micro data center.

IDC vs. In-House Micro Data Center

IDC Hosting

GPU servers consume significantly more power than general-purpose servers, so hosting ours at an IDC would require two full racks.

  • Power requirements:
    • General servers typically operate at 3kW per rack, but GPU servers require 4kW to 10kW.
    • Additional networking costs also contribute to higher operational expenses.
  • Estimated IDC Monthly Cost: KRW 3M

In-House Micro Data Center

Operating an internal server room incurs an upfront infrastructure cost but provides long-term savings in maintenance and operations.

  • Advantages:
    • Lower maintenance costs in the long run.
    • Direct integration with the internal network, reducing networking expenses.
  • Estimated In-House Monthly Cost: KRW 400K

Final Decision: On-Premise GPU Deployment with an In-House Micro Data Center

Based on these findings, we proceeded with an in-house GPU deployment and established a micro data center. This decision was driven by:

  1. Lower ongoing costs:
    • IDC hosting required KRW 3M per month, while an in-house setup cost just KRW 400K per month.
    • Electricity costs remained manageable, even with additional cooling.
  2. Operational Control & Scalability:
    • In-house infrastructure provided better control over hardware resources.
    • Eliminated dependency on external GPU availability issues in cloud services.
  3. Long-term Cost Efficiency:
    • Despite higher initial investment, maintenance and operational expenses were significantly lower.

This self-hosted approach ultimately provided the most cost-effective and operationally sustainable solution for our GPU infrastructure.

In-House GPU Machine Deployment

GPU Machine Selection

The GPUs were selected based on the following requirements:

Specifications for GPU Procurement Request

### GPU Procurement Specification

Requirements:
- **Training-focused server**
  - Not intended for high-throughput or large-scale inference.
  - Used for training and internal PoC (Proof of Concept) inference.

- **Minimal data storage needs**
  - **CPU**: Intel Xeon, 24-core (Gold 6248R level)
  - **Memory**: 512GB RAM (2x GPU VRAM capacity)
  - **Storage**: NVMe SSD, ~8TB

- **Support for 4 to 8 GPUs**
  - If 8 GPUs are included, vendor must provide real-world usage references.

- **Power consumption data**
  - Vendors must provide technical documents detailing server power usage.

Vendor Selection & Procurement

### Vendor Selection Criteria

1. **Top-tier Vendor?**
   - Does the vendor have experience with diverse server configurations and high-density GPU servers?
   - Can the vendor deliver the system within **4 months**?
   - Does the vendor have real-world references for the requested server configurations?

2. **Technical Service Support?**
   - Can the vendor provide a **2-year or longer warranty**?
   - Is technical service available upon request?

3. **Customized System Design for Internal Needs?**
   - Can the vendor accurately assess internal needs and recommend the best system?
   - Can the vendor optimize the system within the requested specifications?

We held meetings with four vendors, and during this process we learned that the A100 was being discontinued, while the H100 had a lead time of over four months. Two vendors instead proposed the L40S GPU as a viable alternative.

L40S GPU Overview

The L40S GPU was introduced as a general-purpose alternative to the A100, offering broadly comparable capability while being available within two months.

  • Memory Bandwidth: Approximately half of A100
  • FP32 Compute Performance: ~3x faster than A100
  • Power Consumption: ~80% of A100, leading to lower operational costs

However, L40S uses PCIe, which may introduce bottlenecks compared to NVLink. Despite this, its cost-effectiveness, availability, and versatility made it a strong alternative to A100, leading us to proceed with L40S-based machines.

GPU Machine Procurement & Cost Comparison

We requested cost estimates from vendors for machines supporting up to 8 L40S GPUs.

|  | Vendor A | Vendor B | Vendor C | Vendor D |
|---|---|---|---|---|
| GPU | L40S 48GB | L40S 48GB | L40S 48GB | L40S 48GB |
| Number of GPUs | 4 | 4 | 8 | 8 |
| Total VRAM | 192GB | 192GB | 384GB | 384GB |
| CPU Model | Xeon Gold 5418Y | Xeon Gold 6526Y | Xeon Gold 6342 | Xeon Gold 6330 |
| CPU Cores | 24 | 16 | 24 | 28 |
| CPU Clock | 2.0 GHz | 2.8 GHz | 2.8 GHz | 2.0 GHz |
| RAM (per module) | 32GB | 64GB | 16GB | 64GB |
| RAM Type / Clock | DDR5 4400 | DDR5 5600 | DDR4 3200 | DDR4 3200 |
| Total RAM | 384GB | 512GB | 384GB | 512GB |
| Storage (NVMe SSD) | 7.68TB | 7.68TB | 3.84TB | 7.68TB |
| Max Power Consumption | - | - | 3.3kW | - |
| Price (per server) | *** | *** | *** | *** |
| Additional Notes | 2 servers + networking | 2 servers + networking | - | - |

After careful evaluation, we selected Vendor C's machine, which best met our performance and cost criteria.

In-House Micro Data Center Setup

Cost Savings vs. IDC Hosting

Compared to IDC hosting, setting up an in-house micro data center resulted in monthly cost savings of approximately KRW 1.5M.

  • IDC Hosting Cost Estimate: KRW 3M per month
  • In-House Micro Data Center Estimate: KRW 1.5M per month

Micro Data Center Setup Details

  • Rack Capacity: Up to 2 GPU racks
  • Server Capacity:
    • 1 high-end machine OR
    • 2 mid-range machines
  • Power Configuration:
    • Support for up to 20kW power consumption
    • Custom power distribution board for UPS compatibility

The estimated setup cost was KRW 15M, meaning it paid for itself within 10 months.
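The payback period is straightforward arithmetic on the figures above:

```python
setup_cost_krw = 15_000_000                    # one-time micro data center build-out
monthly_savings_krw = 3_000_000 - 1_500_000    # IDC hosting vs. in-house estimate
payback_months = setup_cost_krw / monthly_savings_krw
print(payback_months)  # 10.0
```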

Implementation Details

  • Dedicated office space was allocated.
  • SGP panel was used to construct a 3-pyeong (≈10m²) micro data center.
  • 20kW power capacity was installed via a custom power distribution board.
  • Ceiling-mounted power trays were integrated into existing office lighting rails.
  • An air conditioning unit rated for a 40-pyeong space (14.5kW cooling capacity) was installed.
    • This accommodates current server heat dissipation (3.8kW).
    • Future scalability for high-performance DGX-1-class (10kW) machines was ensured.

Finalized In-House Micro Data Center Setup

Post-Implementation Impact

The most significant benefit of our in-house GPU deployment was the reduction in research-related costs.

When fine-tuning a LLaMA-2 13B model on cloud GPUs, training took 7 days, incurring a cost of $1,000 per day (~KRW 1.3M per day). While most cloud GPU users only consider pure training time, hidden costs—such as initial environment setup and data integration—also contribute to expenses.

In contrast, with 8 L40S GPUs, the operational cost was significantly lower. Based on the power consumption of 3.2kW, the estimated daily electricity cost was around KRW 8,000.

While air conditioning costs were initially high for room cooling, the sealed nature of the data center maintained stable temperature and humidity levels, minimizing long-term cooling expenses.

Inference Cost Reduction

By deploying internal inference models instead of relying on commercial APIs, we reduced per-query costs from $0.18 to $0.01, a reduction of more than 90% in inference expenses.
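In per-query terms, the saving works out as:

```python
api_cost_per_query = 0.18    # commercial API, USD per query
local_cost_per_query = 0.01  # self-hosted inference, USD per query
reduction = 1 - local_cost_per_query / api_cost_per_query
print(f"{reduction:.1%}")  # 94.4% -- i.e., a reduction of more than 90%
```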

Research & Training Considerations

From a research perspective, lower cost barriers allowed for continuous internal research on domain-specific model optimization.

  • A trade-off existed in terms of training speed:
    • H100 completed training in 0.5 days.
    • L40S required 2 days for the same workload.
    • Despite this, optimization techniques helped mitigate the impact.

If you're considering whether to adopt in-house GPUs, I hope this blog post provides valuable insights.

Thank you!
