Michael J. Sullivan is Principal Cloud Solutions Architect at Oracle and one of the SEMANTiCS 2019 keynote speakers. Michael has more than twenty years of experience in professional services as a senior architect and tech lead responsible for designing and implementing custom integrated Customer Experience (CX) and Digital Experience (DX) solutions for Fortune 1000 companies using the Oracle stack. In this interview, Michael talks about making the most of enterprise data and the typical constraints along the way.
“Hybrid Knowledge Management Architectures” is the title of your keynote at SEMANTiCS 2019. Such an architecture should address upcoming questions in data management such as size, heterogeneity and third-party data orchestration. What practical implications are you suggesting there?
First things first. What I mean by “hybrid” architectures: neither top-down nor bottom-up. Neither homogeneous nor heterogeneous. On-prem + cloud. Decentralized business logic (e.g. having the database manage the model instead of the application, microservices, serverless, etc.). Structured + unstructured. Polyglot is a given. Multiple data silos. Multiple use-cases. Multiple technologies. Multiple vendors. In other words, the typical enterprise today.
So now that that is out of the way, how do we work with the above mess? In particular, how do we make sense of and maximize our IP? Enterprises have been struggling with this for decades with no end in sight. One thing we have learned over the years is that you cannot simply force everything into a single solution. J2EE and .NET are now relatively ancient ideas which, perhaps inadvertently, encouraged a monolithic approach to IT. Recent developments in our industry have moved far beyond these patterns, facilitated by the move to the cloud.
That being said, I have a fondness for viewing RDF, particularly cloud-based RDF (and linked data), as a viable, non-disruptive, universal solution for global metadata management across the entire enterprise. In this way it could be thought of as a sort of knowledge registry for all applications used by the enterprise.
Additionally, being able to combine patterns such as a virtual model wrapped around multiple domain-specific models, RDF views of relational data, and materialized entailments into a single view seems to me to be a real game changer. This is in keeping with “try not to move the data”, which is one of the key pillars of hybrid knowledge architectures.
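To make the "one view" idea concrete, here is a minimal sketch in plain Python, not a description of any particular product. It assumes a hypothetical customer table, a hypothetical `http://example.org/` namespace, and only one RDFS rule (propagating `rdf:type` along `rdfs:subClassOf`). The relational rows are exposed as triples on demand rather than copied, and the inferred triples are materialized on top.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# A relational table that stays where it is (hypothetical rows).
customer_rows = [
    {"id": 1, "name": "Acme", "tier": "Gold"},
    {"id": 2, "name": "Globex", "tier": "Silver"},
]

def relational_view(rows):
    """Expose rows as triples on demand, without copying them anywhere."""
    for row in rows:
        subject = f"{EX}customer/{row['id']}"
        yield (subject, RDF_TYPE, EX + "Customer")
        yield (subject, EX + "name", row["name"])
        yield (subject, EX + "tier", row["tier"])

# A small domain-specific model held as native triples.
domain_model = {
    (EX + "Customer", SUBCLASS_OF, EX + "Organization"),
    (EX + "Organization", SUBCLASS_OF, EX + "Agent"),
}

def materialize_types(triples):
    """Return the rdf:type triples entailed via rdfs:subClassOf (fixpoint)."""
    inferred = set(triples)
    subclass_pairs = {(s, o) for s, p, o in inferred if p == SUBCLASS_OF}
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                if sub == o and (s, RDF_TYPE, sup) not in inferred:
                    inferred.add((s, RDF_TYPE, sup))
                    changed = True
    return inferred - set(triples)

def unified_view():
    """Union of the relational view, the domain model, and the entailments."""
    base = set(relational_view(customer_rows)) | domain_model
    return base | materialize_types(base)
```

A consumer of `unified_view()` can ask for every `Agent` without knowing that the customers live in a relational table or that their `Agent` type was inferred rather than asserted.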
But another key thing about RDF is that it should be thought of and promoted as “schema last” rather than just “schema less”. When thought of this way, developers and architects show less concern as they realize they won’t be tasked with maintaining a virtual schema in their application.
It seems as if Hybrid Knowledge Management Architectures are a remedy for issues that are both crucial and typical. What are the main processes behind this approach, and what are its main advantages compared to the other technologies and approaches out there?
The key advantage of embracing a hybrid approach is that you can then always use the best tool for the job and not let someone’s idealized concept of a master architecture dictate what end-users and developers use.
Simple example: RDF is ideal for semantics. Property Graphs are ideal for analytics. Why should we have to choose one over the other? I suggest that we should be using them both, as both excel at what they do. The long-standing arguments against RDF ring hollow to my ear. Heck, exporting RDF for graph analytics is not rocket science. And neither is re-importing the analytics back into RDF. Ditto for machine learning: send data out for classification/ranking and write the resulting enrichments back to the source. Iterate and repeat.
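The export, analyze, re-import loop can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the URIs, the `cites` and `inDegree` predicates, and the choice of in-degree as the "analytics" are all hypothetical, and a real setup would use a triple store and a graph engine rather than Python sets.

```python
from collections import defaultdict

# Hypothetical URIs and predicate names, for illustration only.
EX = "http://example.org/"
CITES = EX + "cites"
IN_DEGREE = EX + "inDegree"

triples = {
    (EX + "docA", CITES, EX + "docB"),
    (EX + "docC", CITES, EX + "docB"),
    (EX + "docB", CITES, EX + "docD"),
    (EX + "docA", EX + "title", "Hybrid architectures"),
}

def export_edges(triples, predicate):
    """Export step: keep only the edges the analytics needs."""
    return [(s, o) for s, p, o in triples if p == predicate]

def in_degree(edges):
    """Analytics step: count incoming edges per node."""
    counts = defaultdict(int)
    for source, target in edges:
        counts[source] += 0  # make sure source-only nodes appear with 0
        counts[target] += 1
    return dict(counts)

def reimport(triples, scores, predicate):
    """Re-import step: write each score back as a new triple."""
    return triples | {(node, predicate, score) for node, score in scores.items()}

enriched = reimport(triples, in_degree(export_edges(triples, CITES)), IN_DEGREE)
```

The original triples are untouched; the scores arrive as additional statements about the same URIs, so the loop can be repeated with a different metric or a classifier without restructuring anything.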
The key blocker is orchestration across all these silos. And yes, that is a big, mostly-unresolved issue.
The glue that makes a hybrid approach even possible is:
master metadata management,
robust cloud/on-prem integration services,
autonomous Disaster Recovery (DR) & High Availability (HA), and
the trend to move more modeling and analytics logic into the database and out of the application layers.
Tell us about recent cases, challenges, and constraints in applying Hybrid Knowledge Management Architectures.
RDF has features (URIs, formal semantics, W3C standards) that make it work better for combining separate datasets, so one obvious use-case is to use RDF as your model for building a warehouse of data. Then you can load subsets of this data into a Property Graph model for running graph analytics in-memory. This is exactly what Oracle Adaptive Intelligent Apps does.
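As a rough illustration of why URIs help here (not a description of how any Oracle product works internally), the sketch below merges two hypothetical datasets that share a product URI and then loads only the relationship edges into an in-memory adjacency map. All names and URIs are made up for the example.

```python
EX = "http://example.org/"  # hypothetical namespace

# Two hypothetical datasets from separate silos that share a product URI.
crm = {
    (EX + "product/17", EX + "boughtBy", EX + "customer/1"),
    (EX + "customer/1", EX + "name", "Acme"),
}
support = {
    (EX + "ticket/9", EX + "about", EX + "product/17"),
    (EX + "ticket/9", EX + "severity", "high"),
}

# Merging is a set union: shared URIs line up without a mapping table.
warehouse = crm | support

def load_subgraph(triples, predicates):
    """Load only the chosen edge types into an in-memory adjacency map."""
    graph = {}
    for s, p, o in triples:
        if p in predicates:
            graph.setdefault(s, []).append((p, o))
            graph.setdefault(o, [])  # target nodes appear even without out-edges
    return graph

graph = load_subgraph(warehouse, {EX + "boughtBy", EX + "about"})
```

After the merge, the ticket from the support silo reaches the customer from the CRM silo through the shared product URI, which is the path an in-memory graph algorithm would then traverse.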
What are the main issues that you see CIOs and CDOs confronted with in their digital transformation journeys at the moment?
Widespread lack of familiarity with knowledge/semantics in general — usually because it is outside the scope of typical computer science.
“Resuméware” is an ongoing problem in our industry. Developers are always chasing the latest technology to add to their resumés. Developers are simply responding to “the market”. Result: Churn.
Tendency to think in terms of monoliths, even though many problems are better suited to microservices. Chief among them would be data integration across silos.
Data migration, storage, and ETL are taking more and more bandwidth.
As stated previously, orchestration is a missing piece of the puzzle.
Sadly, irrespective of how you discover the information, the quality of content is often inconsistent at best. Poor quality content found with much effort is one of the main drivers of escalating customer service costs across all industries. I think most enterprises would benefit greatly by hiring content quality experts to review and suggest copywriting/editorial improvements to be implemented across the board. (A minor example: technical white papers are great for their intended audience but are useless to a broader audience without an editorial "wrapper" around the technical jargon.)
The holy grail I see everywhere is to “throw AI at the problem” — basically admitting: “we’ve made a mess, now figure it all out for us”. This approach is guaranteed to fail in my opinion.
Ditto for search engines.
What are the five most important steps for CIOs to make knowledge management work and sustainable in their organizations?
Commitment to semantics and quality content across all knowledge areas within the enterprise — this will in turn require a commitment to hiring and/or training of content creators/curators to become familiar with semantics
Commitment to linked data across the enterprise — we want to encourage serendipity as no “top down” solution ever can. Put another way, we don’t know what we don’t know, so let’s encourage unforeseen connections to happen spontaneously and build on top of that. Existing apps will need to be able to consume/contribute to this effort of course.
Using focus groups, identify the top five improvements to provide better service to your customers (all of which will involve knowledge, one way or another)
Be sure to add “people” entities to your knowledge management data, i.e. the various authors, reviewers, approvers, commenters, subject-matter experts (SMEs), and content consumers. Serendipitous relationships will be much richer using the web of relationships provided by the connections of “People 2 Content 2 People” in your knowledge base, e.g. “Which other SMEs are ‘close to’ this white paper I just downloaded?”
Stop thinking of “search” as the primary technology solution to discover information. Instead think of it as icing, not the cake. As a thought experiment, what might your knowledge solution look like without any search at all? Not easy, is it?
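The “People 2 Content 2 People” step above can be sketched as a simple traversal that needs no search index at all. The people, roles, and content items below are hypothetical, and "closeness" is reduced to counting connecting paths, which is only one of many possible measures.

```python
from collections import Counter

# Hypothetical (person, role, content) links, for illustration only.
links = [
    ("alice", "author", "whitepaper-42"),
    ("bob", "reviewer", "whitepaper-42"),
    ("bob", "author", "blog-7"),
    ("carol", "commenter", "blog-7"),
    ("dave", "approver", "whitepaper-42"),
    ("dave", "author", "faq-3"),
    ("alice", "reviewer", "faq-3"),
]

def people_for(content):
    """Everyone linked to a piece of content, in any role."""
    return {person for person, _, item in links if item == content}

def content_for(person):
    """Every piece of content a person is linked to, in any role."""
    return {item for who, _, item in links if who == person}

def close_people(content):
    """Return people directly on the content, plus people one more hop away.

    The second value counts how many Content -> People -> Content -> People
    paths lead to each person who is not already directly involved.
    """
    direct = people_for(content)
    indirect = Counter()
    for person in direct:
        for other in content_for(person) - {content}:
            for neighbour in people_for(other) - direct:
                indirect[neighbour] += 1
    return direct, indirect

direct, indirect = close_people("whitepaper-42")
```

Here `direct` holds the author, reviewer, and approver of the white paper, while `indirect` surfaces someone who never touched it but commented on related work by one of its reviewers, which is the kind of unplanned connection the list above argues for.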
About SEMANTiCS
The annual SEMANTiCS conference is the meeting place for professionals who make semantic computing work, understand its benefits, and know its limitations. Every year, SEMANTiCS attracts information managers, IT architects, software engineers, and researchers from organisations ranging from NPOs, universities, and public administrations to the largest companies in the world. http://www.semantics.cc