Databricks Acquires Okera to Strengthen Its AI-Based Data Governance Platform

Databricks today announced the acquisition of Okera, a privately held data governance platform provider. The plan is to integrate Okera’s technology into Databricks’ existing data governance solution, Unity Catalog, adding more AI-powered features.

“By bringing in the talented Okera team and leveraging their domain expertise, we will accelerate the Unity Catalog roadmap and deliver best-in-class governance for the lakehouse,” Reynold Xin, co-founder and chief architect of Databricks, told VentureBeat.

Financial terms of the deal were not made public.

San Francisco-based Okera was founded in 2016 and raised $29.6 million in funding before being acquired. Okera has focused in recent years on the use of artificial intelligence for data governance and security.

Databricks, on the other hand, has raised a staggering $3.5 billion in venture capital to expand its data lake and AI technologies. Databricks recently made headlines for entering the generative AI space with the launch of Dolly, its ChatGPT clone.

Databricks and Okera were hardly strangers before the acquisition was announced. Xin noted that Nong Li, co-founder and CEO of Okera, is widely known for creating Apache Parquet, the open-source standard storage format that Databricks and the rest of the industry rely on. Li also previously worked at Databricks, where he led the vectorized Parquet and codegen efforts that improved the performance of Apache Spark 2.0 by a factor of 10.

What Okera brings to Databricks

Whether for analytics or machine learning (ML), data is fundamental. Being able to properly manage this data is essential for accuracy, security and compliance.

Xin said that with Okera, customers will be able to use AI to discover, classify and govern all their data, analytics and AI assets with attribute-based and intent-based access policies. Governance also involves observability, another area where Okera’s technology will help. Xin noted that Okera will help support Databricks data observability on the lakehouse, allowing organizations to centrally audit and report on the use of sensitive data in analytics and AI applications.
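
To make the idea of attribute-based access policies concrete, here is a minimal Python sketch. It does not use any Okera or Unity Catalog API; the `Column`, `User` and `POLICY` names, the tags and the attributes are all hypothetical and chosen only for illustration. The assumption is that each column carries classification tags and that a user may read a column only if the user holds every attribute the policy requires for those tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    tags: frozenset = frozenset()  # classification tags, e.g. {"pii"}


@dataclass(frozen=True)
class User:
    name: str
    attributes: frozenset = frozenset()  # attributes granted to the user


# Hypothetical policy: tag -> attribute a user must hold to read tagged columns.
POLICY = {"pii": "pii_reader", "financial": "finance_team"}


def can_read(user, column, policy=POLICY):
    """Allow access only if the user holds every attribute the column's tags require."""
    required = {policy[tag] for tag in column.tags if tag in policy}
    return required <= user.attributes


def visible_columns(user, columns, policy=POLICY):
    """Return the names of the columns this user is allowed to read."""
    return [c.name for c in columns if can_read(user, c, policy)]


columns = [
    Column("order_id"),
    Column("email", frozenset({"pii"})),
    Column("card_number", frozenset({"pii", "financial"})),
]
analyst = User("analyst", frozenset({"pii_reader"}))

print(visible_columns(analyst, columns))  # ['order_id', 'email']
```

In this sketch the decision depends on tags and attributes rather than on per-table grants, so newly classified columns are covered by the existing policy without further changes.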

Going one step further, the combination of Okera and Databricks will allow users to automatically trace data lineage down to the column level.

“The idea is that customers will get a holistic view of their data estate across clouds,” Xin said.
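
As a rough illustration of what tracing lineage down to the column level involves, the Python sketch below records which source columns each derived column was computed from and then walks that graph upstream. It is a simplified model under stated assumptions, not how Databricks or Okera implement lineage; the `ColumnLineage` class and the column names are hypothetical.

```python
from collections import defaultdict


class ColumnLineage:
    """Toy lineage graph: each derived column maps to its direct source columns."""

    def __init__(self):
        self._parents = defaultdict(set)

    def record(self, target, sources):
        """Register that `target` was computed from the given source columns."""
        self._parents[target].update(sources)

    def upstream(self, column):
        """Return every column that `column` depends on, directly or indirectly."""
        seen = set()
        stack = [column]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


lineage = ColumnLineage()
lineage.record("sales.revenue", ["orders.price", "orders.quantity"])
lineage.record("report.total_revenue", ["sales.revenue"])

print(sorted(lineage.upstream("report.total_revenue")))
# ['orders.price', 'orders.quantity', 'sales.revenue']
```

With a graph like this, an auditor could ask which reports ultimately derive from a sensitive column by running the same traversal in the opposite direction.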

New security controls are on the way

Part of governance is also being able to provide the necessary controls to allow only authorized access. This is an area where Okera’s technology will also be useful to the Databricks platform in the future.

“Okera has also developed a new isolation technology that can support arbitrary workloads while enforcing governance control without sacrificing performance,” Xin said. “It will help companies effectively cover the full spectrum of applications in the new world.”

The isolation technology is currently in private preview and has already been tested by a number of joint Databricks and Okera customers on their AI workloads.

Guardrails or governance? What does AI need?

As AI becomes more powerful and versatile, the question of how to ensure its safe and ethical use has become urgent. One of the leading companies in the field, Nvidia, last month unveiled a new initiative called NeMo Guardrails, which aims to help developers monitor and regulate the output of generative AI models capable of creating realistic text, images and speech.

Xin and Databricks also see the need for guardrails, as well as governance for AI.

“In this new world of AI, managing safeguards on the underlying data on which AI models, like LLMs, are trained is critical to mitigating bias and maintaining compliance if they are trained on private data,” Xin said. “For transparency, it is also essential to be able to trace the data lineage to ensure that these models are relevant, up-to-date and trustworthy.”

Xin commented that Okera’s AI-based tagging and classification of all data and AI assets provides a holistic view of sensitive data, such as personally identifiable information (PII). He added that this will help customers apply these safeguards not only to the underlying data, but also to ML models and features.

“AI can bring extreme value to organizations looking to harness their data, but as many AI pioneers have pointed out, it can also be misused, so thoughtful guidance is needed,” Xin said. “From our perspective, the principles of governance – accountability, standardization, compliance, quality and transparency – apply as much to AI as they do to data.”
