Databricks donates Delta Lake framework and MLflow operations platform entirely to open source – SiliconANGLE News

UPDATED 11:30 EDT / JUNE 28 2022
by Paul Gillin
Databricks Inc. opens its Data + AI Summit today with the announcement that it will release the entirety of its Delta Lake storage framework to open source under the oversight of the Linux Foundation.
That means there will no longer be any functional differences between the Databricks-branded Delta Lake and the open-sourced version. The company said it will similarly release its recent enhancements to the MLflow machine learning operations platform and Apache Spark analytics framework to open source. Databricks also rolled out several new features for its core Lakehouse data lake.
Delta Lake, which was introduced three years ago and donated to open source in June 2020, improves the efficiency of the hybrid structured and unstructured analytical stores called data lakes to make information more reliable. It does that by managing transactions across batch and streaming data, coordinating multiple simultaneous writes and doing away with the need to build complicated data pipelines.
“Before Delta Lake, technologies like Spark would process large amounts of data; Delta Lake lets you process small deltas with all changes stored in history so you can go back and forward,” said Ali Ghodsi, Databricks’ co-founder and chief executive. “This is important for audit trails and compliance so you can go back and find decisions you made a year ago.”
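The idea Ghodsi describes, storing small deltas in a history that can be replayed to any earlier point, can be illustrated with a toy sketch in plain Python. This is not Delta Lake's implementation or API; the `VersionedTable` class, its methods and the order data are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedTable:
    """Toy append-only change log. Hypothetical; not Delta Lake's design."""

    _log: list = field(default_factory=list)  # one dict of changes per commit

    def commit(self, changes: dict) -> int:
        """Record a small delta (key -> new value; None deletes the key).

        Returns the version number assigned to this commit.
        """
        self._log.append(dict(changes))
        return len(self._log) - 1

    def snapshot(self, version: Optional[int] = None) -> dict:
        """Rebuild the table as of `version` by replaying deltas 0..version."""
        if version is None:
            version = len(self._log) - 1  # latest version
        if not -1 <= version < len(self._log):
            raise ValueError(f"unknown version {version}")
        state: dict = {}
        for delta in self._log[: version + 1]:
            for key, value in delta.items():
                if value is None:
                    state.pop(key, None)
                else:
                    state[key] = value
        return state


table = VersionedTable()
v0 = table.commit({"order-1": "pending"})
v1 = table.commit({"order-1": "shipped", "order-2": "pending"})
v2 = table.commit({"order-2": None})  # delete order-2

print(table.snapshot(v0))  # state as of the first commit
print(table.snapshot())    # latest state
```

Because every commit is kept, an auditor can ask for the table as it stood at any earlier version, which is the "go back and forward" property mentioned in the quote.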
A new 2.0 release of Delta Lake features better query performance and a foundation based on open standards. The release candidate is now available and is expected to go into a general release later this year. Databricks said the update reflects contributions from more than 6,400 developers and noted that total commits have grown 95% with the average number of lines of code per commit surging 900% over the past year.
The company is also announcing version 2.0 of MLflow, a platform for managing machine learning projects. The release includes Pipelines, a new feature to speed and simplify machine learning model deployments. Pipelines give data scientists pre-defined, production-ready templates based on the model type they’re building to allow faster and more reliable model development without requiring intervention by production engineers.
Users can define the elements of the pipeline in a configuration file and MLflow Pipelines manages execution automatically, the company said. Databricks has also added serverless model endpoints to directly support production model hosting, as well as built-in model monitoring dashboards to help teams analyze real-world model performance.
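The general pattern of a configuration-driven pipeline can be sketched in a few lines of Python. This is an illustration only and does not use or reproduce the MLflow Pipelines API; the JSON layout, the step names and the `run_pipeline` helper are all assumptions made for the example.

```python
import json

# Hypothetical configuration: an ordered list of steps plus per-step settings.
CONFIG = """
{
  "steps": ["ingest", "split", "train"],
  "split": {"train_fraction": 0.75}
}
"""


def ingest(state, params):
    # Stand-in for loading data: eight numbered rows.
    state["rows"] = list(range(8))


def split(state, params):
    # Divide rows into train and test sets using the configured fraction.
    cut = int(len(state["rows"]) * params.get("train_fraction", 0.8))
    state["train"], state["test"] = state["rows"][:cut], state["rows"][cut:]


def train(state, params):
    # Stand-in for model fitting: the "model" is the mean of the training rows.
    state["model"] = sum(state["train"]) / len(state["train"])


# Registry mapping step names in the configuration to Python functions.
STEPS = {"ingest": ingest, "split": split, "train": train}


def run_pipeline(config_text):
    """Run each configured step in order, passing shared state between them."""
    config = json.loads(config_text)
    state = {}
    for name in config["steps"]:
        STEPS[name](state, config.get(name, {}))
    return state


result = run_pipeline(CONFIG)
print(result["train"], result["test"], result["model"])
```

The point of the design is that the order of steps and their settings live in the configuration rather than in code, so the same runner can execute different pipelines without modification.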
Ghodsi said the decision to donate the latest enhancements to MLflow — which was open-sourced two years ago to the Linux Foundation — is consistent with the company’s roots. “For us, the whole business model is to keep open-sourcing and keep innovating,” he said. Claiming 1 million downloads for MLflow, he said giving the software away has downstream benefits to the company.
“Imagine an enterprise software company with a million downloads,” he said. “Those people are not our customers but they are using our technology. These projects become standards; people teach classes and write books about them.”
Enhancements to Spark, the wildly successful analytics framework that launched Databricks in 2013, include Spark Connect, which allows Spark to run on nearly any device, and Project Lightspeed, a Structured Streaming engine for data streaming on the lakehouse. Spark Connect is a client/server interface for Spark based on Databricks’ DataFrame API that decouples the client and server for better stability while allowing for built-in remote connectivity.
Project Lightspeed is described as the next generation of the current Spark Structured Streaming engine that is aimed at improving performance, building a support ecosystem for connectors, adding new operators and simplifying deployment and operations.
The new streaming engine will also be more accessible from popular analytics programming languages such as Python, Ghodsi said. “Every year we’ve been excited for real-time streaming to take off and this year it’s taking off, I think, because of machine learning,” he said.
Databricks is also using the event to roll out a series of enhancements to its flagship Lakehouse platform. They include a serverless version that is now available in preview on the Amazon Web Services Inc. cloud, general availability of the company’s Photon query engine, open-source connectors for Go, Node.js and Python, and the ability to federate queries across multiple remote data sources without first extracting and loading the data.