@hungnphan

Summary

This PR adds support for the newly released Google Cloud Compute Engine machine types A4 and A4X, along with name mappings for their NVIDIA GPU models (B200 and GB200) and for the H200 used in A3 Ultra.

Motivation

Google Cloud recently announced the A4 and A4X machine series, built on the NVIDIA Blackwell GPU architecture. These accelerator-optimized machine types are designed for foundation model training and serving.

Reference: https://cloud.google.com/compute/docs/gpus/

Changes Made

New Machine Type Configurations

1. A4 Machine Series (instances/series/a4.sql)

  • Family: Accelerator-optimized
  • GPU: NVIDIA B200 Blackwell GPUs
  • CPU Platform: Sapphire Rapids
  • Local SSD: 12,000 GiB
  • Network Bandwidth: 3,600 Gbps
  • Spot VM Support: Enabled
  • Machine Type: a4-highgpu-8g
    • 224 vCPUs
    • 3,968 GB memory
    • 8x NVIDIA B200 GPUs (1,440 GB total GPU memory)

2. A4X Machine Series (instances/series/a4x.sql)

  • Family: Accelerator-optimized
  • GPU: NVIDIA GB200 Grace Blackwell Superchips
  • CPU Platform: ARM Neoverse V2
  • Local SSD: 12,000 GiB
  • Network Bandwidth: 2,000 Gbps
  • ARM Architecture: Supported
  • Spot VM Support: Enabled
  • Machine Type: a4x-highgpu-4g
    • 140 vCPUs
    • 884 GB memory
    • 4x NVIDIA GB200 GPUs (720 GB total GPU memory)
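
As a quick sanity check on the figures above, the following minimal Python sketch derives the per-GPU memory, the network bandwidth in Tbps, and the local SSD size in decimal terabytes for both machine types. The `MachineSpec` class and its field names are hypothetical and exist only for this illustration; they are not the project's SQL schema.

```python
from dataclasses import dataclass

GIB = 2**30  # bytes per gibibyte


@dataclass(frozen=True)
class MachineSpec:
    # Hypothetical field names for illustration; not the project's SQL schema.
    name: str
    vcpus: int
    memory_gb: int
    gpu_count: int
    gpu_memory_total_gb: int
    network_gbps: int
    local_ssd_gib: int

    @property
    def gpu_memory_per_gpu_gb(self) -> float:
        # Total GPU memory divided evenly across the attached GPUs.
        return self.gpu_memory_total_gb / self.gpu_count

    @property
    def network_tbps(self) -> float:
        return self.network_gbps / 1000

    @property
    def local_ssd_tb(self) -> float:
        # Decimal terabytes (10**12 bytes), rounded to two places.
        return round(self.local_ssd_gib * GIB / 10**12, 2)


# Values copied from the two machine type listings above.
A4 = MachineSpec("a4-highgpu-8g", 224, 3968, 8, 1440, 3600, 12000)
A4X = MachineSpec("a4x-highgpu-4g", 140, 884, 4, 720, 2000, 12000)

for spec in (A4, A4X):
    print(
        f"{spec.name}: {spec.gpu_memory_per_gpu_gb:g} GB per GPU, "
        f"{spec.network_tbps:g} Tbps, {spec.local_ssd_tb} TB local SSD"
    )
```

Both machine types come out at 180 GB per GPU, and 12,000 GiB corresponds to roughly 12.88 decimal TB, which the commit messages abbreviate to 12TB.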

GPU Model Support

Added support for the following NVIDIA GPU models in instances/series/gpu/gpu_names.sql:

  • NVIDIA H200 141GB (nvidia-h200-141gb) - Used in A3 Ultra
  • NVIDIA B200 (nvidia-b200) - Used in A4
  • NVIDIA GB200 (nvidia-gb200) - Used in A4X
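
To illustrate what such a name mapping does, here is a minimal sketch that applies the three identifier-to-name pairs above to an in-memory SQLite table. The table name `instances` and the columns `accelerator_type` and `gpu_name` are assumptions made for this example; the actual statements in gpu_names.sql may be structured differently.

```python
import sqlite3

# Identifier -> display name pairs, taken from the list above.
GPU_NAMES = [
    ("nvidia-h200-141gb", "NVIDIA H200 141GB"),
    ("nvidia-b200", "NVIDIA B200"),
    ("nvidia-gb200", "NVIDIA GB200"),
]

conn = sqlite3.connect(":memory:")

# Hypothetical table and column names, for illustration only.
conn.execute(
    "CREATE TABLE instances (name TEXT, accelerator_type TEXT, gpu_name TEXT)"
)
conn.executemany(
    "INSERT INTO instances (name, accelerator_type) VALUES (?, ?)",
    [
        ("a4-highgpu-8g", "nvidia-b200"),
        ("a4x-highgpu-4g", "nvidia-gb200"),
    ],
)

# Fill in the display name for every row whose accelerator type matches.
conn.executemany(
    "UPDATE instances SET gpu_name = ? WHERE accelerator_type = ?",
    [(display, ident) for ident, display in GPU_NAMES],
)

rows = conn.execute(
    "SELECT name, gpu_name FROM instances ORDER BY name"
).fetchall()
for name, gpu_name in rows:
    print(name, "->", gpu_name)
```

The H200 entry matches no rows here because the example table only holds the two machine types added in this PR.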

Documentation Updates

Updated instances/README.md to:

  • Add A4 and A4X to the machine types list
  • Fix A3 link (was incorrectly pointing to a2.sql)
  • Update resources section to reference A3, A4, and A4X accelerator-optimized machines

Testing

All SQL files follow the existing project patterns and schema:

  • Consistent formatting with existing machine type configurations
  • Proper series and family classification
  • Accurate specifications from official Google Cloud documentation

References

Checklist

  • Created new SQL configuration files for A4 and A4X machine types
  • Updated GPU names mapping for new NVIDIA models
  • Updated documentation to reflect new machine types
  • Followed existing code style and patterns
  • All changes are based on official Google Cloud documentation
  • Clear and descriptive commit messages

Additional Notes

These machine types represent Google Cloud's latest offerings for AI/ML workloads:

  • A4 is optimized for foundation model training and serving with NVIDIA B200 GPUs
  • A4X features GB200 Grace Blackwell Superchips combining ARM CPUs with B200 GPUs for exascale AI computing

Both machine types require capacity reservation or specific provisioning methods as outlined in the Google Cloud documentation.

Hung Phan added 4 commits October 20, 2025 00:47

Add SQL configuration for Google Cloud A4 machine series:
- Accelerator-optimized family with NVIDIA B200 Blackwell GPUs
- Sapphire Rapids CPU platform
- 12TB Local SSD storage
- 3.6 Tbps network bandwidth
- Spot VM support enabled

Reference: https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms

Add SQL configuration for Google Cloud A4X machine series:
- Accelerator-optimized family with NVIDIA GB200 Grace Blackwell Superchips
- ARM Neoverse V2 CPU platform
- 12TB Local SSD storage
- 2 Tbps network bandwidth
- ARM architecture support
- Spot VM support enabled

Reference: https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms

Add GPU type mappings for latest NVIDIA accelerators:
- NVIDIA H200 141GB (nvidia-h200-141gb)
- NVIDIA B200 Blackwell (nvidia-b200)
- NVIDIA GB200 Grace Blackwell Superchip (nvidia-gb200)

These GPUs are used in A3 Ultra, A4, and A4X machine series respectively.

Reference: https://cloud.google.com/compute/docs/gpus

Update instances documentation:
- Add A4 and A4X to machine types list
- Fix A3 link (was incorrectly pointing to a2.sql)
- Update resources section to reference A3, A4, and A4X accelerator-optimized machines

This brings the documentation in sync with the newly added machine type configurations.

@Cyclenerd
Owner

Thanks for the pull request. As I understand it, you can only get A4 and A3 if your account has been specially enabled and you have a separate contract. Am I right? How can we calculate the list price?

Please see: Cyclenerd/google-cloud-pricing-cost-calculator#279 and Cyclenerd/google-cloud-pricing-cost-calculator#309

@Cyclenerd
Owner

Note: The machine type a4x-highgpu-4g is not published via the Google Compute API at the moment.
