Growing role of background knowledge: Semantic
Web researchers and early entrepreneurs knew (as exemplified by this first patent on Semantic Web technologies, filed in
2000) that, with moderate effort, it is possible to create background knowledge
and populated ontologies by aggregating and disambiguating high-quality
information and facts from multiple sources. It has also long been known
that by using such knowledge bases, we can substantially improve information
extraction and develop a variety of semantic tools and applications including
semantic search, browsing, personalization, advertisement, etc. Over the past
three to five years, several efforts to create such knowledge bases have taken place, of which
Freebase is a showcase. What has drawn everyone's attention to this aspect of
the semantic approach is Google's acquisition of the company that created Freebase,
followed by its significant extension of largely known techniques, scaled to the
next level, to create the Google Knowledge Graph (GKG). Applying GKG to enhance search
(and, I am sure, other applications in the future) has forever changed the
importance of creating and using background or domain models for semantic
applications. I believe this form of semantic application building will
see the fastest growth in the near future. I have discussed related
thoughts in my article titled “Semantics Scales Up”.
Growing pains for Linked Open Data (LOD): Publication of
over 300 large data sets with 30+ billion triples certainly draws the attention
of many. Data holders will continue to
find LOD an attractive vehicle to publish and share their data, so it will
continue to grow at a rapid pace. Some data sets, more than others, will
find additional use for data reference, interlinking, and transformation. But
in the near term, broader or aggregate usage of LOD will be a slog because we are
running into some of the harder technical challenges: questionable quality of
data and provenance; unconstrained and uneven use of semantics (e.g., same-as
used inconsistently); limited use of richer relationship types (e.g., part-of
and causality); and poor interlinking (lack of high-quality
alignment). We will need a better
handle on these issues, along with a better ability to identify the most
relevant and high-quality data sets (a semantic search for LOD) and better
alignment tools (not limited to just same-as), before we can start realizing the
true promise of LOD. So, I would give it another five years to fully develop.
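To make the same-as problem concrete, here is a minimal sketch in plain Python (no RDF library) of one way to spot suspicious same-as links. It groups URIs connected by same-as and flags any group whose members disagree on a property that should have a single value. All URIs, the `ex:country` predicate, and the sample triples are hypothetical, and this is only one simple heuristic, not a description of how any particular LOD tool works.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); every URI here is made up.
TRIPLES = [
    ("ex:Paris_FR", "owl:sameAs", "geo:Paris"),
    ("geo:Paris", "owl:sameAs", "ex:Paris_TX"),  # questionable link
    ("ex:Paris_FR", "ex:country", "France"),
    ("ex:Paris_TX", "ex:country", "United States"),
    ("ex:Berlin", "owl:sameAs", "geo:Berlin"),
    ("ex:Berlin", "ex:country", "Germany"),
    ("geo:Berlin", "ex:country", "Germany"),
]


def same_as_clusters(triples):
    """Group URIs connected (directly or transitively) by owl:sameAs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


def conflicting_clusters(triples, single_valued_predicate):
    """Return clusters whose members disagree on a predicate assumed single-valued."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == single_valued_predicate:
            values[s].add(o)

    flagged = []
    for cluster in same_as_clusters(triples):
        seen = set()
        for uri in cluster:
            seen |= values.get(uri, set())
        if len(seen) > 1:
            flagged.append((sorted(cluster), sorted(seen)))
    return flagged


if __name__ == "__main__":
    for members, countries in conflicting_clusters(TRIPLES, "ex:country"):
        print("Suspicious same-as cluster:", members, "->", countries)
```

In this toy data, the two Paris resources end up in one cluster with two different countries, which is the kind of inconsistency that makes aggregate use of LOD hard; the Berlin cluster passes the check.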
Democratization of Semantics: So far, we have paid the majority of
our attention to knowledge representation, languages, and reasoning. Furthermore,
most of the work focuses on documents in enterprises and on the Web, or
uses structured data transformed into triples. But what is even more exciting
is how semantics and Semantic Web technology (primarily through annotating data
with respect to background knowledge or ontologies) are being used to improve
the interoperability and analysis of different types of textual and non-textual data,
especially social data and data generated by sensors, devices, or the Internet of Things. These types of data have long since overtaken traditional
document-centric data and structured databases in terms of volume, velocity,
and variety. The semantics one needs to deal with for such (relatively)
nontraditional data is amazingly varied.
For example, in the Twitris system,
besides semantic annotation for spatial, temporal, and thematic elements
associated with the tweets, semantics (aka meaning) also includes understanding
people (the poster and receiver), network (interactions and message
flow), sentiment, emotion, and intent. For
a more in-depth treatment, see our just-published book on semantics-empowered Web 3.0. This is
probably the most important development in my view, and it is likely to garner a
much larger share of the attention paid to the application of semantics and
Semantic Web technologies.
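As a rough illustration of annotating social data with respect to background knowledge, the sketch below tags a tweet along a few of the dimensions mentioned above: spatial, temporal, thematic, people, and network. It is not the Twitris implementation. The `BACKGROUND_KNOWLEDGE` lookup table, the concept URIs, and the `Annotation` structure are all assumptions made for illustration, and sentiment, emotion, and intent are left out because they would need trained classifiers rather than a lookup.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Toy background knowledge: surface form -> (hypothetical concept URI, dimension).
BACKGROUND_KNOWLEDGE = {
    "chicago": ("ex:Chicago", "spatial"),
    "flood": ("ex:Flood", "thematic"),
    "shelter": ("ex:EmergencyShelter", "thematic"),
}

PUNCTUATION = "#.,!?:;"


@dataclass
class Annotation:
    text: str
    posted_at: datetime                            # temporal dimension
    poster: str                                    # people dimension
    mentions: list = field(default_factory=list)   # network dimension
    spatial: list = field(default_factory=list)
    thematic: list = field(default_factory=list)


def annotate(text, poster, posted_at):
    """Attach concept URIs to a tweet using simple dictionary lookup."""
    ann = Annotation(text=text, posted_at=posted_at, poster=poster)
    for token in text.split():
        if token.startswith("@"):
            # An @-mention records an interaction between poster and receiver.
            ann.mentions.append(token.strip("@" + PUNCTUATION))
            continue
        entry = BACKGROUND_KNOWLEDGE.get(token.strip(PUNCTUATION).lower())
        if entry:
            uri, dimension = entry
            getattr(ann, dimension).append(uri)
    return ann


if __name__ == "__main__":
    tweet = annotate(
        "Flood near #Chicago, need shelter! @redcross",
        poster="alice",
        posted_at=datetime(2013, 1, 1, tzinfo=timezone.utc),
    )
    print(tweet.spatial, tweet.thematic, tweet.mentions)
```

A real system would replace the dictionary lookup with entity disambiguation against a populated ontology, but the shape of the output (data linked to concepts along several dimensions) is the point here.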