MarkLogic applies machine learning to complex data problems through Embedded Machine Learning, a new capability that runs at the core of MarkLogic.

MarkLogic Embedded Machine Learning helps you achieve the best results because your machine learning models have direct access to high-quality, curated, governed data. If you’re not a data scientist, you benefit too: we also use this capability to improve how MarkLogic operates and how data is curated, and all of this is completely transparent to users of the MarkLogic Data Hub.

What Is Machine Learning?

Machine learning can be thought of as pattern recognition in data. The challenge is that voluminous, complex data makes it difficult to detect relationships between attributes without advanced tools. A machine learning model is a mathematical representation of those relationships that allows you to:

  • Predict a future state based on how attributes of the data might change. For example, a person becomes high risk for a health condition because of non-obvious changes in their lifestyle or condition
  • Classify new data based on patterns learned from history. For example, a new customer has attributes, extracted from text-based health records, that put them in a certain category
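
As a toy illustration of the "classify" case above, the sketch below learns one centroid per category from labeled historical records and assigns a new record to the nearest centroid. The feature names, the sample values, and the nearest-centroid method itself are assumptions chosen for illustration; this is plain Python, not the MarkLogic Embedded Machine Learning API.

```python
from collections import defaultdict
from math import dist


def train_centroids(records):
    """Average the feature vectors of each label to get one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in records:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: tuple(s / counts[label] for s in total)
        for label, total in sums.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical history: (exercise hours per week, resting heart rate) -> category
history = [
    ((5.0, 60.0), "low_risk"),
    ((4.0, 65.0), "low_risk"),
    ((0.5, 85.0), "high_risk"),
    ((1.0, 90.0), "high_risk"),
]

centroids = train_centroids(history)
new_customer = (0.8, 88.0)
print(classify(centroids, new_customer))
```

A production model would use far more attributes and a more capable algorithm, but the shape is the same: learn from curated history, then apply the result to new data.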

Above all, machine learning delivers a level of accuracy, and insights from data, that were not previously possible.

Challenges with Machine Learning

Lack of quality and governance: Effective machine learning depends on data you can trust, and proper governance is also what lets people trust machine learning outputs. You need to be able to answer questions such as: What data should be used? Where did it come from, and what has been done to it? Does it contain PII? Is it the same data we used last time? Good data is critical because machine learning relies on data twice, first to train the model and then to execute it, so any data quality problem gets amplified.

Wild west ecosystem: The machine learning and AI tools ecosystem is incredibly complex, and as security and governance become priorities, it is tough to find people with the right skill sets to build and maintain the systems. According to an article in The New York Times, data scientists spend up to 80% of their time just wrangling data.

Low business ROI: The business often doesn’t trust the ‘black box’ outputs of machine learning models, even when they are accurate. For most companies, AI investments look more like science projects than core infrastructure because the business doesn’t understand or trust model outputs enough to base decisions on them. Data scientists and the hardware infrastructure they need aren’t cheap, either. High costs combined with outputs the business won’t act on add up to a low overall ROI.

The MarkLogic Solution

We think the best place to do machine learning is in a data hub where data can be secured, governed and curated. That’s why we built MarkLogic Embedded Machine Learning into the core of MarkLogic. Machine learning routines can run close to the data, in parallel across a MarkLogic cluster, under the umbrella of a secure environment.

Benefits

Improving Database Operations

With Embedded Machine Learning, MarkLogic will run queries more efficiently and scale autonomously based on workload patterns. With autonomous elasticity, for example, MarkLogic can use models of infrastructure workload patterns to automatically adjust the rules that govern data and index rebalancing.

Improving Data Curation

Embedded Machine Learning reduces complexity and increases automation of various steps in the data curation process. For example, with MarkLogic’s Smart Mastering feature, machine learning will augment the rules-based mastering process so that records are mastered with more accuracy, and models continue to improve as more data is processed—all with less human involvement.

Improving Data Science Workflows

For data scientists, it’s now simpler to train and execute models right inside MarkLogic, which can handle almost every part of the architecture and process. This includes data processing and curation, as well as the model engineering needed to build, train, execute and deploy the model.

How It Works

MarkLogic’s Embedded Machine Learning is a full deep learning toolkit that operates as a run-time library installed right at the core of MarkLogic, in the database kernel. It exposes its functions as built-ins callable from JavaScript and XQuery, which means these functions run close to the data and are completely integrated.

Embedded Machine Learning was designed for peak performance on both CPUs and GPUs, and it scales to multi-machine, multi-GPU systems. It also uses a compression technique that dramatically reduces inter-node communication costs, enabling highly scalable parallel training across multiple machines.
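
The page does not name the compression technique. One published approach associated with CNTK is 1-bit SGD, which transmits only the sign of each gradient value plus a scale, and carries the quantization error forward into the next step. The sketch below illustrates that general idea in plain Python. It is an assumption about the kind of technique meant, not MarkLogic's or CNTK's actual implementation, and all function names are hypothetical.

```python
def quantize_one_bit(gradient, residual):
    """Compress a gradient to one sign bit per value plus a single scale.

    The residual from the previous step is added first (error feedback),
    so information dropped by quantization is delayed rather than lost.
    """
    corrected = [g + r for g, r in zip(gradient, residual)]
    # One shared scale: the mean absolute value of the corrected gradient.
    scale = sum(abs(c) for c in corrected) / len(corrected)
    signs = [c >= 0 for c in corrected]  # one bit per value
    decoded = [scale if s else -scale for s in signs]
    new_residual = [c - d for c, d in zip(corrected, decoded)]
    return signs, scale, new_residual


def dequantize(signs, scale):
    """Rebuild the approximate gradient on the receiving node."""
    return [scale if s else -scale for s in signs]


# Hypothetical gradient values; the residual starts at zero.
gradient = [0.5, -1.5, 1.0, -1.0]
residual = [0.0] * len(gradient)
signs, scale, residual = quantize_one_bit(gradient, residual)
print(signs, scale, residual)
```

Each node then sends one bit per value and one float instead of a full float per value, which is where the reduction in inter-node traffic comes from.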

Embedded Machine Learning also supports the Open Neural Network Exchange (ONNX) format, an open-source shared model representation that allows for framework interoperability and shared optimization. ONNX allows developers to move models between popular frameworks such as CNTK, MXNet, PyTorch, and others.

The toolkit used to build MarkLogic Embedded Machine Learning was originally developed by Microsoft and released under the name Cognitive Toolkit, or CNTK. Microsoft used CNTK to develop keystone products like Skype, HoloLens, Cortana, and Bing.

Architecture

Client Server Interface

The Best Database for Machine Learning and AI

Watch a talk introducing MarkLogic’s new machine learning algorithms and GPU acceleration capabilities. Learn more about data curation and take a deep dive into one company’s machine learning implementation.

Additional Resources

Documentation
Take a look at our machine learning docs

Blog post
Read our machine learning announcement

Webinar
Watch our webinar with a demo on Embedded Machine Learning
