venturebeat
Google Cloud takes aim at CoreWeave and AWS with managed Slurm for enterprise-scale AI training

Some enterprises are best served by fine-tuning large models to their needs, but a number of companies plan to build their own models, a project that requires access to GPUs.

Google Cloud wants to play a bigger role in enterprises’ model-making journey with its new service, Vertex AI Training. The service gives enterprises looking to train their own models access to a managed Slurm environment, data science tooling and any chips capable of large-scale model training.

With this new service, Google Cloud hopes to turn more enterprises away from other providers and encourage the building of more company-specific AI models.

While Google Cloud has always offered the ability to customize its Gemini models, the new service allows customers to bring in their own models or customize any open-source model Google Cloud hosts.

Vertex AI Training positions Google Cloud directly against companies like CoreWeave and Lambda Labs, as well as its cloud competitors AWS and Microsoft Azure.

Jaime de Guerre, senior director of product management at Google Cloud, told VentureBeat that the company has been hearing from organizations of varying sizes that they need a way to better optimize compute, but in a more reliable environment.

“What we're seeing is that there's an increasing number of companies that are building or customizing large gen AI models to introduce a product offering built around those models, or to help power their business in some way,” de Guerre said. “This includes AI startups, technology companies, sovereign organizations building a model for a particular region or culture or language and some large enterprises that might be building it into internal processes.”

De Guerre noted that while anyone can technically use the service, Google is targeting companies planning large-scale model training rather than those doing simple fine-tuning or adopting LoRA. Vertex AI Training will focus on longer-running training jobs spanning hundreds or even thousands of chips.
Pricing will depend on the amount of compute the enterprise needs.

“Vertex AI Training is not for adding more information to the context or using RAG; this is to train a model where you might start from completely random weights,” he said.

Model customization on the rise
Enterprises are recognizing the value of building customized models that go beyond fine-tuning an LLM or grounding it with retrieval-augmented generation (RAG). Custom models would know more in-depth company information and respond with answers specific to the organization.

Companies like Arcee.ai have begun offering their models for customization to clients. Adobe recently announced a new service that allows enterprises to retrain Firefly for their specific needs. Organizations like FICO, which creates small language models specific to the finance industry, often buy GPUs to train them at significant cost.

Google Cloud said Vertex AI Training differentiates itself by giving access to a larger set of chips, services to monitor and manage training and the expertise it gained from training the Gemini models.

Some early customers of Vertex AI Training include AI Singapore, a consortium of Singaporean research institutes and startups that built the 27-billion-parameter SEA-LION v4, and Salesforce’s AI research team.

Enterprises often have to choose between taking an already-built LLM and fine-tuning it or building their own model. But creating an LLM from scratch is usually unattainable for smaller companies, or it simply doesn’t make sense for some use cases. However, for organizations where a fully custom or from-scratch model makes sense, the issue is gaining access to the GPUs needed to run training.

Model training can be expensive

Training a model, de Guerre said, can be difficult and expensive, especially when organizations compete with several others for GPU space.

Hyperscalers like AWS and Microsoft (and, yes, Google) have pitched that their massive data centers and racks and racks of high-end chips deliver the most value to enterprises.
Not only will they have access to expensive GPUs, but cloud providers often offer full-stack services to help enterprises move to production.

Services like CoreWeave gained prominence for offering on-demand access to Nvidia H100s, giving customers flexibility in compute power when building models or applications. This has also given rise to a business model in which companies with GPUs rent out server space.

De Guerre said Vertex AI Training isn’t just about offering bare compute, where the enterprise rents a GPU server but also has to bring its own training software and manage scheduling and failures.

“This is a managed Slurm environment that will help with all the job scheduling and automatic recovery of jobs failing,” de Guerre said. “So if a training job slows down or stops due to a hardware failure, the training will automatically restart very quickly, based on automatic checkpointing that we do in management of the checkpoints to continue with very little downtime.”

He added that this provides higher throughput and more efficient training on larger compute clusters.

Services like Vertex AI Training could make it easier for enterprises to build niche models or completely customize existing models. Still, just because the option exists doesn’t mean it's the right fit for every enterprise.
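
For readers who want a concrete picture of the checkpoint-and-restart behavior de Guerre describes, the following is a minimal sketch of the general pattern, not Google's implementation and not a Slurm API. Every name in it (`save_checkpoint`, `load_checkpoint`, `train`, the `fail_at` parameter and the JSON checkpoint file) is a hypothetical stand-in chosen for illustration: a toy loop saves its progress every few steps, a simulated hardware failure interrupts it, and a restarted run picks up from the last saved step instead of from zero.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the target so a
    crash mid-write never leaves a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(path, total_steps, checkpoint_every, fail_at=None):
    """Toy training loop (hypothetical). `fail_at` simulates a hardware
    failure at that step; a real job would run model updates instead."""
    state = load_checkpoint(path)
    for step in range(state["step"] + 1, total_steps + 1):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated hardware failure at step {step}")
        state = {"step": step, "loss": 1.0 / step}  # stand-in for a real update
        if step % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "ckpt.json")
        try:
            train(ckpt, total_steps=100, checkpoint_every=10, fail_at=57)
        except RuntimeError:
            pass  # a managed scheduler would requeue the job here
        resumed_from = load_checkpoint(ckpt)["step"]
        final = train(ckpt, total_steps=100, checkpoint_every=10)
        print(resumed_from, final["step"])  # prints: 50 100
```

In this toy run the failure at step 57 costs only the seven steps since the last checkpoint at step 50. The checkpoint interval sets that trade-off: saving more often loses less work per failure but spends more time writing state, which is the kind of tuning a managed service takes on for the customer.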
