Oracle announces MySQL HeatWave ML
Oracle announced that Oracle MySQL HeatWave now supports in-database machine learning (ML) in addition to the previously available transaction processing and analytics—the only MySQL cloud database service to do so. MySQL HeatWave ML fully automates the ML lifecycle and stores all trained models inside the MySQL database, eliminating the need to move data or the model to a machine learning tool or service. Eliminating ETL reduces application complexity, lowers cost, and improves security of both the data and the model. HeatWave ML is included with the MySQL HeatWave database cloud service in all 37 Oracle Cloud Infrastructure (OCI) regions.
Until now, adding machine learning capabilities to MySQL applications has been prohibitively difficult and time-consuming for many developers. First, data must be extracted from the database and moved into another system to create and deploy ML models. This approach creates multiple silos for applying machine learning to application data and introduces latency as data moves around. It also spreads data outside the database, making it more vulnerable to security threats, and forces developers to program in multiple environments. Second, existing services expect developers to be experts in guiding the ML model training process; otherwise, the model is sub-optimal, which degrades the accuracy of predictions. Finally, most existing ML solutions don’t explain why the models that developers build deliver specific predictions.
MySQL HeatWave ML solves these problems by natively integrating machine learning capabilities inside the MySQL database, eliminating the need to ETL the data to another service. HeatWave ML fully automates the training process and creates a model with the best algorithm, optimal features, and the optimal hyper-parameters for a given data set and a specified task. All models generated by HeatWave ML can provide model and prediction explanations.
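As a rough illustration of what keeping training and inference inside the database looks like from an application, the sketch below builds the SQL statements a client might send. The routine names (sys.ML_TRAIN, sys.ML_MODEL_LOAD, sys.ML_PREDICT_TABLE) come from the HeatWave ML documentation rather than from this announcement, and their exact signatures should be checked against the current release. The schema, table, column, and session-variable names are hypothetical.

```python
# Sketch only: builds SQL text and prints it; it does not connect to a database.
# All schema, table, and column names below are hypothetical placeholders and are
# treated as trusted constants (never build SQL this way from user input).


def train_statement(table: str, target: str, task: str, handle_var: str = "@model") -> str:
    """Return a CALL that trains a model in the database and stores its handle."""
    return (
        f"CALL sys.ML_TRAIN('{table}', '{target}', "
        f"JSON_OBJECT('task', '{task}'), {handle_var});"
    )


def load_statement(handle_var: str = "@model") -> str:
    """Return a CALL that loads the trained model so it can serve predictions."""
    return f"CALL sys.ML_MODEL_LOAD({handle_var}, NULL);"


def predict_table_statement(table: str, output_table: str, handle_var: str = "@model") -> str:
    """Return a CALL that writes predictions for every row of `table` to `output_table`."""
    return f"CALL sys.ML_PREDICT_TABLE('{table}', {handle_var}, '{output_table}');"


statements = [
    train_statement("bank.marketing_train", "y", "classification"),
    load_statement(),
    predict_table_statement("bank.marketing_test", "bank.marketing_predictions"),
]

for statement in statements:
    print(statement)
```

Every step is a statement executed by the database itself, so neither the training data nor the trained model leaves MySQL.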
No other cloud database vendor provides such advanced ML capabilities directly inside their database service. Oracle published ML benchmarks performed across a large number of publicly available machine learning classification and regression datasets such as Numerai, Namao, and Bank Marketing, among others. On average, on the smallest cluster, HeatWave ML trains machine learning models 25 times faster than Redshift ML at one percent of the cost. Additionally, the performance advantage over Redshift ML increases when training is done on a larger HeatWave cluster. Training is a time-consuming process, and because MySQL HeatWave performs it efficiently and rapidly, customers can retrain their models more often and keep up with changes to data. This keeps the models up to date and improves the accuracy of predictions.
Edward Screven, chief corporate architect, Oracle, said, “Just as we integrated analytics and transaction processing within a single database, we are now bringing machine learning inside MySQL HeatWave. MySQL HeatWave is one of the fastest growing cloud services at Oracle. An increasing number of customers have migrated from Amazon and other cloud database services to MySQL HeatWave and have gained significant performance improvements and lower costs. Today, we are also announcing a number of other innovations which enrich HeatWave’s capabilities, improve availability, and lower the cost. Our new and fully transparent benchmark results again demonstrate that Snowflake, AWS, Microsoft, and Google are slower and more expensive than MySQL HeatWave by a large margin.”
Palanivel Saravanan, Cloud Engineering Leader, Oracle India, said, “Oracle maintains its leadership in the database market, while MySQL is the world’s most popular open-source database. With continued investment in improving our products and solutions, today we are adding machine learning capabilities to MySQL HeatWave, in addition to the previously available transaction processing and analytics features. This way customers have the benefits of keeping their data most secure, reducing their operations overheads and most importantly, making it significantly less expensive than buying hardware or software licenses. This is a game changer.”
HeatWave ML offers the following capabilities compared to other cloud database services:
Fully Automated Model Training: All of the different stages in creating a model with HeatWave ML are fully automated and do not require any intervention from developers. The result is a tuned model that is more accurate and requires no manual work, and the training process always runs to completion. Other cloud database services such as Amazon Redshift provide integration with machine learning capabilities in external services, which require extensive manual inputs from developers during the ML training process.
Model and Inference Explanations: Model explainability helps developers understand the behavior of a machine learning model. For example, if a bank denies a client a loan, the bank needs to be able to determine which parameters of the model were taken into account and whether the model contains any bias. Prediction explainability is a set of techniques that help answer the question of why a machine learning model made a specific prediction. Prediction explanations are becoming increasingly important as companies must be able to explain the decisions made by their machine learning models. HeatWave ML integrates both model explanations and prediction explanations as part of its model training process. As a result, all models created by HeatWave ML can offer model explanations as well as inference explanations, without needing the training data when an inference is explained. Oracle has augmented existing explanation techniques to improve performance, interpretability, and quality. Other cloud database services do not offer such rich explainability for all of their machine learning models.
Hyper-Parameter Tuning: HeatWave ML implements a new gradient search-based reduction algorithm for hyper-parameter tuning. This enables the hyper-parameter search to be executed in parallel without compromising the model accuracy. Hyper-parameter tuning is the most time-consuming stage of ML model training, and this unique capability provides HeatWave ML with a significant performance advantage over other cloud services for building machine learning models.
Algorithm Selection: HeatWave ML uses the notion of proxy models—which are simple models exhibiting the properties of a full complex model—to determine the best ML algorithm for training. Using a simple proxy model, algorithm selection is done very efficiently without loss of accuracy. No other database services for building machine learning models have this proxy modeling capability.
Intelligent Data Sampling: During model training, HeatWave ML samples a small percentage of the data in order to improve performance. This sampling is done in such a manner that all representative data points are captured in the sample data set. Other cloud services for building machine learning models take a less efficient approach—using random data sampling—which samples a small percentage of data without considering the data distribution characteristics.
Feature Selection: Feature selection helps determine the attributes of the training data which influence the machine learning model behavior for making predictions. The techniques in HeatWave ML for feature selection have been trained over a broad swath of data sets across multiple domains and applications. From these gathered statistics and meta information, HeatWave ML is able to efficiently identify the relevant features in a new data set.
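The difference between distribution-aware sampling and plain random sampling, described under Intelligent Data Sampling above, can be sketched with a small stratified sampler. This is a generic textbook technique used here for illustration and is not a description of HeatWave ML's actual sampling algorithm. The function name, the toy data, and the 10 percent fraction are all assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Sample about `fraction` of the rows from each label group.

    At least one row is kept per group, so a rare class cannot vanish
    from the sample the way it can under plain random sampling.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)

    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Toy data set (hypothetical): 980 rows labeled "no" and 20 rows labeled "yes".
rows = [("no", i) for i in range(980)] + [("yes", i) for i in range(20)]

sample = stratified_sample(rows, label_of=lambda row: row[0], fraction=0.1)
counts = Counter(label for label, _ in sample)
print(dict(counts))  # {'no': 98, 'yes': 2}
```

A plain random draw of 100 rows from the same data would contain two "yes" rows only on average and could contain none, which is the weakness of ignoring the data distribution.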
In addition to machine learning capabilities, Oracle released more innovations to the MySQL HeatWave service. Real-time elasticity enables customers to upsize and downsize their HeatWave cluster to any number of nodes, without any downtime or read-only time, and without the need to manually rebalance the cluster. Also included is data compression, which enables customers to process twice the amount of data per node and lowers costs by nearly 50 percent, while maintaining the same price-performance ratio. Finally, a new pause-and-resume function enables customers to pause HeatWave to save costs. Upon resuming, both the data and the statistics needed for MySQL Autopilot are automatically reloaded into HeatWave.
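The arithmetic behind the compression claim can be sketched as follows: doubling the data each node can hold roughly halves the node count, and the saving is "nearly" rather than exactly 50 percent because node counts are whole numbers. The data size and per-node capacity are hypothetical figures chosen for illustration, and the sketch assumes cost is proportional to the number of nodes.

```python
def nodes_needed(data_gb: int, capacity_gb_per_node: int) -> int:
    """Smallest whole number of nodes that can hold data_gb (integer ceiling division)."""
    return -(-data_gb // capacity_gb_per_node)


DATA_GB = 4400      # hypothetical data set size
CAPACITY_GB = 400   # hypothetical per-node capacity before compression

before = nodes_needed(DATA_GB, CAPACITY_GB)      # without compression
after = nodes_needed(DATA_GB, 2 * CAPACITY_GB)   # twice the data per node
saving = 1 - after / before                      # assumes cost scales with node count

print(f"{before} nodes -> {after} nodes, saving {saving:.0%}")
# prints "11 nodes -> 6 nodes, saving 45%"
```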