Wednesday, December 19, 2012

Data Semantics and Semantic Web - 2012 year-end reflections and prognosis


Growing role of background knowledge: Semantic Web researchers and early entrepreneurs knew (as exemplified by this first patent on Semantic Web technologies filed in 2000) that with moderate effort, it is possible to create background knowledge and populated ontologies by aggregating and disambiguating high quality information and facts from multiple sources. It has also long been known that by using such knowledge bases, we can substantially improve information extraction and develop a variety of semantic tools and applications including semantic search, browsing, personalization, advertisement, etc. Over the past 3-5 years, several efforts to create such knowledge bases took place, of which Freebase is a showcase. What has drawn everyone’s attention to this aspect of the semantic approach is Google's acquisition of the company that created Freebase, and its significant extension of largely known techniques, scaled to the next level, to create the Google Knowledge Base (GKB). Further on, applying GKB to enhance search (and, I am sure, other applications in the future) has forever changed the importance of creating and using background or domain models for semantic applications. I believe this form of semantic application building will see the fastest growth in the near future. I have discussed related thoughts in my article titled “Semantics Scales Up”.

Growing pains for Linked Open Data (LOD): Publication of over 300 large data sets with 30+ billion triples certainly draws the attention of many. Data holders will continue to find LOD an attractive vehicle to publish and share their data, so it will continue to grow at a rapid pace. Some of the data sets, more than others, will find additional usage in data reference, interlinking, and transformation. But in the near term, broader or aggregate usage of LOD will be a slog because we are running into some of the harder technical challenges: questionable quality of data and provenance, unconstrained and uneven use of semantics (e.g. same-as used inconsistently), limited use of richer relationship types (part-of relationship, causality), and poor interlinking (lack of high quality alignment). We will need to have a better handle on these issues, along with a better ability to identify the most relevant and high quality data sets (a semantic search for LOD) and better alignment tools (not limited to just same-as), before we can start realizing the true promise of LOD. So, I would give it another five years to fully develop.

Democratization of Semantics: So far, we have paid the majority of our attention to knowledge representation, languages, and reasoning. Furthermore, a majority of the work focuses on documents in enterprises and on the Web, or uses structured data transformed into triples. But what is even more exciting is how semantics and Semantic Web technology (primarily through annotating data with respect to background knowledge or ontologies) are being used for improving interoperability and analysis of different types of textual and non-textual data, esp. social data and data generated by sensors, devices, or the Internet of Things. These types of data have long overtaken traditional document-centric data and structured databases in terms of volume, velocity, and variety. The type of semantics one needs to deal with for such (relatively) nontraditional data is of amazing variety. For example, in the Twitris system, besides semantic annotation for spatial, temporal, and thematic elements associated with the tweets, semantics (aka meaning) also includes understanding people (about the poster and receiver), network (about interactions and flow of message), sentiment, emotion, and intent. For more in-depth treatment, see our just published book on semantics empowered Web 3.0. This is probably the most important development in my view and is likely to garner a much larger share of attention related to the application of semantics and Semantic Web technologies.



December 19, 2012

ps: parts of this appear in: Semantic Tech Outlook: 2013


Saturday, August 25, 2012

Semantics empowered Physical-Cyber-Social systems


This is a piece I had written for the ISWC workshop on: What will the Semantic Web look like in 10 years from now?
Associated presentation: Physical-Cyber-Social Computing 
Postscript: I will give the WIMS keynote on this topic in May 2013, where I will present more detailed thoughts.

The role of semantics for interoperability, integration, and improved querying has been investigated for a few decades. Coining of the term ‘Semantic Web’ brought focus to using semantics and metadata, initially for Web documents. As the Web provided useful mechanisms to access and use new types of resources--richly represented data, services, user generated content and other social data, sensor and device (WoT) data--techniques increasingly moved from syntactic and structural to semantic ones. Compared to the semantic systems built using Semantic Web languages, standards and mainstream Semantic Web technologies, however, more systems are being built using informal and implicit forms of semantics rather than formal representations of semantics.[1] One reason is that the role of the Web is increasingly becoming diffused and incidental (e.g., more people access content through applications than through Web browsers). The second reason is that lighter-weight approaches have led to better developer and user engagement, and have become a lot more scalable. Apple Siri, IBM Watson, and Google Knowledge Graph are excellent examples of using semantics at scale, but ones where formal semantic representation or RDF/SPARQL have not found a place. All these lead me to think that 10 years from now, the Semantic Web will be thought of as something that popularized the core value proposition of semantics -- better search, interoperability/integration and analysis -- to deal with and exploit a vast variety of things that the Web (and its ongoing transformations) interconnects. An analogy that comes to mind is that of Object Oriented Databases, which generated huge excitement in the 1980s and indeed had a number of secondary impacts, but remained only a niche technology, product class and market.
Simultaneously, Semantic Web is increasingly merging with other powerful technologies that support semantics, including Machine Learning, NLP, and Knowledge-based systems where background knowledge is applied. Consequently, what we think of as rather distinct Computer Science areas today will not retain strong distinctions, but will broadly incorporate semantics. 

While making a 10 year forecast is foolhardy, a vision that has been forming in my head since I first broached it in 2008 is Computing for Human Experience.[2]  It has a long lineage, starting in part with Vannevar Bush’s Memex through Mark Weiser’s “Computing in the 21st Century” and others. But the essence of the vision incorporates technology that serves human needs without explicit human effort to use the technology. The coming decade will see unique opportunities with the evolution of physical-cyber-social systems that will involve the following (among several other) areas of significant progress:
      make human interaction with technology very natural (e.g., gesture computing); blur the differences between human’s physical, cyber and social presence
      incorporate powerful ways the human brain works into the fabric of computing and communication (significant progress from neuroscience to cognitive science, and the resulting ability to model and mimic the human brain or thinking processes)
      bring the physical world, cyberspace and human closer with the help of devices around, on and inside human body
      continuously and dynamically create collective intelligence and background knowledge, combine that with historical and common sense knowledge, and contextually apply relevant knowledge and experiences to enhance technological support  at all levels of physical-cyber-social systems (i.e., continuous semantics).



[1] A. Sheth, C. Ramakrishnan, and C. Thomas, 'Semantics for The Semantic Web: the Implicit, the Formal and the Powerful', International Journal on Semantic Web & Information Systems, 1 (1), 2005, pp. 1-18.
[2] Keynote talks on “Computing for Human Experience: Semantics Empowered Cyber-Physical, Social and Ubiquitous Computing Beyond the Web”. First given at ASWC 2008; last given at On-the-Move Federated Conferences 2011. Also, vision article in IEEE Internet Computing: http://knoesis.org/index.php/Computing_For_Human_Experience

ps: A more detailed related discussion of this topic is now at: Physical-Cyber-Social Computing: An Early 21st Century Approach to Computing for Human Experience  and an abridged citable version is published as: Amit Sheth, Pramod Anantharam, Cory Henson, "Physical-Cyber-Social Computing: An Early 21st Century Approach," IEEE Intelligent Systems, vol. 28, no. 1, pp. 78-82, Jan.-Feb. 2013, doi:10.1109/MIS.2013.20

Wednesday, December 21, 2011

Interdisciplinary research - learning from the success of Steve Jobs

This is a companion piece to the previous ones that shares some thoughts about the choices students seeking to go for graduate studies and research careers face.


I recently heard Fareed Zakaria's interview on CNN GPS with Steve Jobs' biographer. It once again brought out the key to Jobs' success: the confluence of arts/music/design with technology. My own students who came in with a non-CS/IT background (Mathematics, Statistics, Management, Cognitive Science) ended up doing significantly better than their colleagues whose background was in CS (my area is CS). Meena Nagarajan, a BITS Management student, got invited to give a keynote at an international workshop even before she got her PhD (something totally unheard of), got a prestigious NSF CI-Fellowship, and was hired as a researcher at the prestigious IBM-Almaden (where a Stanford post-doc settled for a post-doc position in that department). Satya Sahoo, with a BS in Statistics, competed with 300-400 PhD applicants (including those from so-called top 10-20 schools) and got hired as a tenure-track assistant professor in a School of Medicine; his Stats background, CS degree and interdisciplinary research in bio-medicine had something to do with his success. Even changing a subarea within a field could be useful. Ramakanth Kavuluru, who did his dissertation research in security and a post-doc with me working on the application of Semantic Web to Biomedicine and Health Care, got hired as a tenure-track faculty member in the School of Public Health at the University of Kentucky (all faculty recruiting is extremely competitive). This experience has led me to advise my straight tech/CS students to cultivate their hidden talent in non-CS and nontechnical subjects. Combining hard science skills with soft science expertise is usually very rewarding and opens up a lot more exciting career opportunities. Leaders like Steve Jobs succeeded because they identified the unique need for and importance of combining technology (a hard science) with design (a soft science), creating something fairly distinct and new.

So, as you consider your options, do not be limited by your current degree-- also consider your interests. Let me give you an example: If you are a Computer Science student with some exposure to biology, you can consider a program like the Biomedical Sciences PhD at Wright State University. Don't just blindly run after so-called highly ranked universities. This is a highly selective program (very few are admitted) with 4 concentrations (one of them is computing focused, see attached brochure). All graduates have excellent career choices and, importantly for many, those admitted are fully funded for the duration of their studies of 4 or 5 years-- and that is a rarity these days!

Some more thoughts: http://knoesis.org/aboutus/joiningus/joiningfaq

Sunday, December 4, 2011

21st century approach to seeking graduate studies

Preface: I monitor some lists where undergrads discuss their quests and questions for graduate studies and applications. Here are some thoughts from someone on the other side of the equation. For those who are interested in a quick degree and a well paying job, this post is not too relevant. For those interested in building a career in which they have a lot more control, those interested in research - whether through an MS thesis or a PhD - this may offer a few useful pointers.


We are already a decade into the 21st century. Looking at the posts of undergraduates seeking to go to the best school they can, it still seems that perceived ranking -- typically of the university or department -- plays a key role in deciding those 5-10 places most applicants focus on. This approach is very outdated.



Here is how I went about it (when I applied in 1980 for 1981 admissions). I had already decided to do a PhD (my father was a professor and research attracted me-- I already had 2-3 publications, and most importantly, I had a chance to work at ISRO, which gave me a good understanding of what it meant to do a PhD, esp. in those pre-Internet days). I had also decided to do it in the area of Databases. So I went to the IIT-D library and found the two most influential professors (at that time, the most published authors in ACM/IEEE journals) in the most active subarea (it happened to be "database machines") of the area that interested me (of course I could not fully understand most publications I stumbled upon). They happened to be at Ohio State and U Wisconsin, so I applied there with the intent of working with either Prof. DeWitt (if I got a chance at Wisconsin) or Prof. Hsiao (Ohio State). I think my two internships at ISRO and a letter by the then Director of ISRO, Dr. Yash Pal, gave me a competitive edge when compared with my peers with somewhat higher GPAs.



Much has changed: information is plentiful if you know where to look, technology and research areas evolve at a much faster pace, there are too many exciting things happening even in a subarea (eg, World Wide Web) of a discipline (Computer Science), most of the exciting things are happening at the intersection of disciplines, there are more disciplines and interdisciplinary programs to choose from, a top high-tech company of 10 years ago is no longer the most exciting place to be, there is a lot better insight into quality and quantity, you as a student have a lot more say in defining your own graduate study program …. and I could go on. Here is an excellent piece by Thomas Friedman: time has come to invent your own job, not apply for one.

 
So what does the above mean for your investigation and decision making process? First, one cannot just focus on a discipline as a whole, a university as a whole, or a department as a whole, and one cannot focus only on a skill or technology. Broad rankings such as those in US News are practically useless except to make your family feel good if you get admission to a program they rank highly. But for improving your own outcome at the end of your graduate studies, you would need to focus more on personalities: your own interests and strengths (eg, do you like hardcore science/math/stat/algorithms, or do you like softer things such as modeling, knowledge, semantics, human computer interactions?). When you take a subarea of a field such as computer games or social media, you will find the potential to do either type of work. And you will need to ask who can guide you and provide an environment to grow as an independent researcher, innovator, architect or designer. When you think of a guide (advisor), think about (and investigate) the guide both as a technical expert and as a person. Here are some of the things to do: ask if this person has reinvented (and continues to reinvent) himself/herself during his/her career (eg, is the person doing the same thing s/he did during his/her PhD, or the same thing for more than any 5 year period?), how much impact his/her work has (not just number of publications), what type of funding s/he has (is there a healthy dose of highly competitive NIH/NSF funds?), and how well s/he guides and sponsors (do students do a variety of things? do they have freedom? can they travel anywhere if they have a paper? can they buy anything they need for research? is the faculty's group international or are all students from the same national/ethnic background? If the latter, I would be cautious). Be sure to contact current advisees, and pay particular attention to how successful his/her advisees are (perhaps the most important factor of all!). Understand that the outcomes vary widely.
While salary is not the only or the main way to describe this, it is the easiest. A typical MS graduate gets $65K-90K in the first year. But my last two MS advisees who graduated in 2011 are making $110K and $120K in the first year, working in companies in exactly the area they wanted to work in (Semantic Web and Cloud Computing, respectively). If you ask them, more than the money (which tells how much the company values them) they will point out the quality of people they work with, the ability to work in the area they wanted to work in, the ability to work on innovative products and services, and the ability to architect or design instead of just implementing what a supervisor defines. The same wide gap exists in the outcome for PhDs (it typically takes more than 5 years to break even on an income basis vis-a-vis an MS, but there are many other reasons one may want to do a PhD).

So how do you do finer investigations about choosing places to apply and faculty to work with?
  • Look at the web sites of the professor, his/her Google Scholar/citation index (go to http://scholar.google.com or just Google his/her name), the lab/center’s web site including its social media presence (eg, here is one of Kno.e.sis' social media presences-- notice the level and variety of activity on such sites), senior students’ web pages, and funding information (especially in the last few years)
  • Do not focus on the field as a whole (eg Computer Sc); look at one or two of the subareas that attract you most. Example: Suppose your interest is in World Wide Web: then look at influence in World Wide Web for University (it will usually cover that university’s department’s outcome in that area) and who are the most influential authors in WWW. But say your interest is in Data Mining or Human-Computer Interaction. You will see an entirely different cast of organizations and researchers! And if you look up last five year versus “all year” statistics, you will see huge changes! What if your selection was based on a department or professor who was well known but has shown little activity and energy in recent years? If you choose any one of the traditional “top twenty Computer Sc departments”, it is quite possible that it is not even top 50 in your subfield of interest. And you will miss out on gems! I can give you an example close to my heart: Wright State University (WSU) is at 4th position among universities (8th including all organizations) in North America based on 5-yr impact in WWW, and it is one of the smallest organizations in the top 50 (Kno.e.sis, where practically all its work in WWW is done, was established only in 2007). So while you will not find WSU among the top in many subfields (it is ranked 60th in CS research spending now, which is not bad at all, in fact quite good for its size, and might make it top 30 on a per capita basis), in Semantic Web and in several Web 3.0 topics it is probably the largest academic group in the USA, with funding from Google, Microsoft Research, IBM Research and so on! Knowing such data could give you what people in industry call an unfair (unique) advantage: something that you know before others find out!

  • Even looking at just the subfield of WWW might not be good enough-- your results for infrastructure aspects of WWW would be very different from those for Web 3.0 (social, sensor, and semantic Web) topics!
  • Even the above strategy will not be foolproof. Publication impact is just one of many factors of importance and success- how about industry relationships/collaborations, or the ability to offer vision and define new subfields? In the brave new world, grass root or field work, development of new tools and systems that are used by others, technology transfer and commercialization, writing influential articles in non-academic venues or significant impact through social media, etc. can be extremely important (for example, MIT Media Lab makes huge waves through non-academic papers).
 
Now that you have some more ammunition for investigating your choices, it is equally important to know how you will sell yourself (but not oversell) once you find your top choices. Unfortunately, this post is getting too long already, so let me point to some links at the end of this FAQ for graduate applicants. And here is the link I intend to add to that FAQ: The 9 things that matter more than GPA. When I evaluate prospective PhD students, I look for passion, desire to be at the top (ambition), work ethic (ability to work towards that ambition), communication skills, and experience with or understanding of what research and innovation mean (a taste of it demonstrated with research centric projects, publications and/or internships). Some of these come through in the way a prospective student communicates with me, the personal statement (that says why you want to work with me or my group) and letters. And typically this involves one or more Skype or Google hangout calls. Of course, scrutiny for an MS applicant is a lot less stringent, but then most MS students are not funded initially (many get funded while they are doing the research and/or thesis component of their studies-- eg the above mentioned MS graduates did not have funding during the time they were doing core courses, but then they joined a research project and got funded. And don't you think the final outcome was worth it?).

Well-- good luck, and my best wishes for success in these increasingly complex and interesting times!


p.s.: [Jan 24, 2011] : The education system is changing very rapidly, see for example: Udacity and the future of online universities. The whole purpose of going to a university is changing rapidly, since you do not have to go to a traditional university to take standard courses.  So you would increasingly go to a graduate program for  activities that are more personalized, more interactive, and focused on learning that occurs beyond the course work.

Saturday, September 17, 2011

Semantics Scales Up: Beyond Search in Web 3.0

 Note: The Internet Computing article is open access (no-cost access).
Following is pre-print material.

Abstract: Concern for scalability- both in computational terms and in terms of human effort needed to develop semantic models and background knowledge- has hampered adoption of semantic techniques and the Semantic Web. This concern is now misplaced, as we have seen extensive progress in the last decade on standards, methods and technologies for developing semantic models or ontologies, semantic annotations, and techniques for semantic integration, analysis and reasoning. This is complemented by plenty of recent success stories- not all well publicized- that use semantics in broad based applications like Web search, as well as in a growing number of vertical domains. As the future of computing expands beyond cyberspace to cyber-physical-social computing, with extensive growth in social and sensor data, semantics will play an even larger and more pervasive role in exploiting larger amounts of increasingly heterogeneous and multimodal data.


Keywords: scaling semantics, semantic search, computing for human experience, semantics in Web3.0, semantics empowered physical-virtual systems, semantics empowered cyber-physical systems

Semantics can enhance a broad variety of information processing: search, integration, analysis, pattern extraction and mining, discovery, situational awareness, and question-answering. Consider search: a search system that could distinguish between “Merry Christmas” as a greeting and one of the 60 or so songs with “Merry Christmas” in the song title as cataloged in MusicBrainz (a community-created music encyclopedia) would have a powerful semantic search capability. Practical solutions utilizing semantics involve using a conceptual or domain model to organize information (MusicBrainz in our example), creating metadata (or annotations) with respect to the model (indicating as an attribute/label/facet of “Merry Christmas” whether it’s a greeting or a song), and then utilizing the metadata and model for enhanced computation.
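
To make the three steps just described concrete (a domain model, metadata created with respect to it, and computation over that metadata), here is a minimal sketch. It is only an illustration under stated assumptions, not how MusicBrainz or any search engine actually works: the catalog entries, the `annotate` and `search` helpers, and the context-word heuristic are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a domain model: a tiny catalog of known song titles.
# A real system would draw these from a resource such as MusicBrainz.
SONG_CATALOG = {
    "merry christmas": ["Song A (hypothetical artist)", "Song B (hypothetical artist)"],
}

# Assumed context cues suggesting that the text is about a musical work.
MUSIC_CUES = {"song", "lyrics", "album", "listen", "artist"}


@dataclass
class Annotation:
    phrase: str
    sense: str        # "song" or "greeting"
    candidates: list  # matching catalog entries, if any


def annotate(text: str, phrase: str) -> Annotation:
    """Label an occurrence of `phrase` in `text` with respect to the catalog."""
    words = set(text.lower().replace(",", " ").replace("!", " ").split())
    in_catalog = phrase.lower() in SONG_CATALOG
    if in_catalog and words & MUSIC_CUES:
        return Annotation(phrase, "song", SONG_CATALOG[phrase.lower()])
    return Annotation(phrase, "greeting", [])


def search(docs: list, phrase: str, sense: str) -> list:
    """Return only the documents whose annotation matches the requested sense."""
    return [d for d in docs if annotate(d, phrase).sense == sense]


docs = [
    "Merry Christmas to you and your family!",
    "Looking for the lyrics of the song Merry Christmas",
]
print(search(docs, "Merry Christmas", "song"))
```

A keyword search would return both documents; the annotated search returns only the second. The disambiguation rule here is deliberately naive, and the point is only that the label is assigned relative to a model and then used as a facet.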


Even in the Web’s early days, simple schemas and metadata were used for faceted or attribute-based search of Web-based documents and data. An example is the InfoHarness system at Bellcore, commercialized in 1995. It supported extracting metadata from heterogeneous data and provided Mozilla-browser-based faceted search. Tom Gruber introduced the concept of ontologies in an information systems context in the early 1990s. This term has since become increasingly used for conceptual or domain models that also capture shared vocabulary and agreement, often with relevant factual knowledge. Several efforts in the mid-to-late 1990s such as SIMS, Observer, InfoSleuth, and InfoQuilt, demonstrated ontology-based Web data integration and querying.


In the late 1990s, I realized that it was possible to design conceptual models or ontologies (not too different from schema.org descriptions today) for many domains of practical interest for Web search (politics, business, finance, sports, entertainment, and so on). Taalee’s MediaAnywhere, Voquette’s SCORE and Semagix’s Freedom (products/services from Taalee, which I founded in 1999, and its follow-on mergers) could then extract, integrate, and repurpose high-quality datasets to populate these ontologies with factual information and background knowledge. For example, these systems extracted data from the official site for Major League Baseball or a database with equivalent content to populate part of a baseball ontology, and used sources similar to MusicBrainz to populate the music component of an entertainment ontology.


These ontologies, represented in a Resource Description Framework (RDF)-like language, supported a semantic and faceted Web search engine, MediaAnywhere, developed at Taalee (see http://slidesha.re/sw-ib, http://bit.ly/sw-p, and http://bit.ly/sw-ic). Although the system scaled to a few hundred websites and enterprise semantic applications, it was ahead of its time; the concept had to wait for technology and market acceptability to catch up. Would this approach have scaled to the Web? I believe yes, but I couldn’t have convincingly argued or demonstrated this until now, when Yahoo, Bing, and Google are all exploiting minimal ontologies, metadata provided by content developers, and large subject- (domain-) specific object and knowledge bases. Before discussing this, however, let’s first review why some subscribed to the perception during the past 10 years that semantic solutions can’t scale.

 

Ontologies and Web Search

The Web has continued to see explosive growth in the number of documents and amount of data accessible through it. We can view earlier directory-based approaches by Yahoo and DMOZ as one type of semantic approach, given that they used human-created taxonomies to manually catalog information; these approaches soon failed to scale. Search became the primary way for people to find the information they needed. Its success led many (beginning with Googlers like Larry Page and Peter Norvig) to believe that all you need is enough data and you can adequately, or even exclusively, extract semantics in a bottom up manner — or that, given the Web’s broad coverage, ontologies, models, and background knowledge simply aren’t relevant nor would they scale.

Some in the semantics camp, including myself, felt that only a limited form of implicit semantics is embedded in most data on the Web, and that semantics can and will scale. I’ve argued that we can readily apply background knowledge in developing a semantic solution (as we did with MediaAnywhere). The reader can find a link to Norvig’s post and my views on this matter in a 2005 blog at http://bit.ly/s-search.


Early, albeit unnecessary, emphasis on an AI focus in defining a Semantic Web approach also hurt, given the lack of success in scaling AI solutions during the 1980s and ’90s. If you were to interpret ontologies narrowly as models crafted in formal logic languages, or looked only at those ontologies exquisitely crafted with care by experts, the idea that they don’t scale might be valid. Fortunately, we can develop ontologies and associate background knowledge in a variety of ways, especially when we’re dealing with a Web search and browsing application or an information processing task not requiring complete consistency in the knowledge base. Examples include (a) using ad hoc specifications such as Schema.org and grounding concepts in the Linked Open Data cloud, (b) using domain-specific community-maintained resources (e.g., MusicBrainz for music, IMDB for movies), and (c) using domain models dynamically generated by tools such as Doozer++.

Semantics at the Web scale is gaining acceptance, and our ability to support it is increasing. In the case of Web search, several significant factors are helping semantics to improve it. First, each of the major Web search engines is creating domain-specific structured knowledge. Search engines can and are exploiting semantics via at least three methods.


The first is creating a concept base or object base of facts (entities and relationships) one domain at a time. For example, you can use MusicBrainz, which captures comprehensive knowledge about more than 550,000 artists and their creative works, as background knowledge for the music domain. Bing’s support for specific domains such as entertainment, sports, or travel is in part powered by domain-specific models and relevant background knowledge in a way reminiscent of MediaAnywhere. This use of domain-specific knowledge will continue to expand now that Google has acquired Freebase and as the adoption of linked open data (a method of publishing structured data so that it can be interlinked) increases from the current tens of billions of triples (facts) and tens of domains by an additional one or two orders of magnitude.


Second is the recent collaboration between the three major search engine players in defining schema.org, which provides schemas or conceptual models for several common domains. The third method is content developers’ increasing use and support of microdata (a simple way to embed semantic markup into HTML documents) and RDFa (for embedding rich metadata within Web documents using RDF) to improve search results. This in turn entices content developers to provide more metadata or annotations and use relevant models to add semantics. Quick on search engines’ heels, social media (such as Facebook’s Open Graph protocol), e-commerce (BestBuy’s use of the GoodRelations ontology), and a wide variety of Web businesses and services are building a growing and synergistic Semantic Web ecosystem.
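
As an illustration of what such embedded markup gives a consumer, the sketch below reads schema.org-style microdata attributes (`itemtype`, `itemprop`) out of a page fragment using only Python's standard `html.parser`. The page fragment and the `MicrodataReader` class are made up for this example, and the reader handles only flat, text-valued properties; it is not a conforming microdata processor and says nothing about how any search engine actually consumes markup.

```python
from html.parser import HTMLParser

# Hypothetical page fragment annotated with schema.org microdata attributes.
PAGE = """
<div itemscope itemtype="http://schema.org/MusicRecording">
  <span itemprop="name">Merry Christmas</span>
  by <span itemprop="byArtist">Example Artist</span>
</div>
"""


class MicrodataReader(HTMLParser):
    """Collects the itemtype and flat itemprop text values (no nesting support)."""

    def __init__(self):
        super().__init__()
        self.item = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.item["@type"] = attrs["itemtype"]
        if "itemprop" in attrs:
            self._current_prop = attrs["itemprop"]

    def handle_data(self, data):
        # Text is recorded only while an itemprop element is open.
        if self._current_prop and data.strip():
            self.item[self._current_prop] = data.strip()
            self._current_prop = None

    def handle_endtag(self, tag):
        # A property element with no text content is simply dropped.
        self._current_prop = None


reader = MicrodataReader()
reader.feed(PAGE)
print(reader.item)
```

With the markup in place, the phrase “Merry Christmas” arrives already labeled as the name of a music recording, so the consumer does not have to guess its sense from surrounding text.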

Search is No Longer the King of Web Applications

Prior to the era when semantics didn’t scale, search was the king of all Web applications. But the importance of search is highly overrated. Its best days were in the past. We’re in an era with significant growth in heterogeneity (social data, mobile-device-generated data, data from sensors inside, on, and around humans, and so on) and quantity (the rate of data creation has already surpassed our ability to store it). Simply needing access to data (which a search engine can index and return as a document) no longer serves our needs. We need knowledge and insights for decision-making as well as answers to our questions. Semantics plays a pivotal role in helping us build solutions to meet these requirements. Relationships are at the heart of semantics and semantic web, and, consequently, we can transition from focusing on keywords and objects, as we did with search, to focusing on relationships and richer abstractions, including events and experiences.

Pervasive Role of Semantics in Computing's Future


Semantics is being adopted on a wide scale in various scientific and some business domains that use W3C-defined Semantic Web languages and standards. For example, in the life sciences domain, we can find nearly 300 ontologies at the BioPortal. Even more impressive is the growth of structured data on the Web, called the Web of Data and best showcased by the Linked Open Data initiative, which surpassed 25 billion triples (facts expressed as subject-predicate-object in RDF) last year and has been tripling year over year. BioRDF, a collection of facts and knowledge in the life sciences domain from multiple sources, exceeded 5 billion triples last year. Note that the Web of Data isn’t simply data, but is structured and reusable information that we can utilize to consistently annotate or tag data, enabling better data analysis, which would be difficult to achieve via bottom-up processing of unstructured data on the Web.
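For readers unfamiliar with triples, the following Python sketch shows the subject-predicate-object idea in miniature. It uses plain tuples rather than any RDF library, and the "ex:" identifiers and the match function are assumptions made up for this illustration, not names from any real data set.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical facts; the "ex:" names are illustrative only.
TRIPLES = [
    ("ex:Aspirin", "rdf:type", "ex:Drug"),
    ("ex:Aspirin", "ex:treats", "ex:Headache"),
    ("ex:Ibuprofen", "rdf:type", "ex:Drug"),
    ("ex:Ibuprofen", "ex:treats", "ex:Headache"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in triples:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


if __name__ == "__main__":
    # Which subjects are asserted to treat a headache?
    for subject, _, _ in match(TRIPLES, p="ex:treats", o="ex:Headache"):
        print(subject)
```

Because every fact has the same three-part shape, facts from different sources can be pooled and queried with the same pattern, which is what makes this form of data reusable.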

Semantics plays a central role in Web 3.0 and beyond, and is becoming the driving force behind the future of computing for several reasons.

Semantics for Integration

Semantics, in the sense of achieving shared understanding and meaning, comes from agreement. Consequently, it has long had a role in integrating data with heterogeneous syntax and structure. Increasingly, it also plays a role in integrating information about the same concept or object in different modalities and media: for example, to relate a person’s images with his or her descriptive information, or to correlate information about an event on social media with corresponding sensor observations. In coming years, semantics will be crucial to integrating objects that straddle the cyber–physical-social or physical–virtual divide.

Semantics for Intelligent Processing and Reasoning

Much attention in the past has focused on data and information search and browsing, in which processing complexity is reduced because of significant human involvement in interpreting the results. As we move up the information processing value chain from search and browsing to integration, analysis, situational awareness, and question-answering, information processing’s complexity increases significantly. Looked at from another perspective, information processing is moving from keyword-based to object-based processing and on to relationship and event-centric processing. As mentioned, relationships are at the heart of semantics and, fundamentally, computations will need to focus on modeling, processing, and exploiting them (http://bit.ly/rel-at-heart). In the case of formal languages, this will involve richer forms of integrated reasoning, incorporating inductive, deductive, abductive and fuzzy reasoning. Future advanced information processing will also not be limited to silicon-based processing; rather it will increasingly involve collaboration between humans and machines, with semantics-aware sensors as intermediates.

Semantics for Knowledge-Enabled Computing

The power of human reasoning comes not only from the sophisticated computing abilities our brains support but also from background knowledge and past experiences. Similarly, the application of background knowledge to improve information processing is rapidly growing, from the improvement of information extraction, natural language processing, and machine learning to better understanding and processing of social and sensor data. We can now apply domain-independent (related to time, space, and geographic concepts, for example) as well as domain-specific models of various complexities and comprehensiveness, such as nomenclatures, taxonomies, and ontologies, to improve information processing. The ability to utilize user- and community-created dictionaries (such as urbandictionary.com) and knowledge repositories (MusicBrainz, for example), to extract structured information from unstructured data (DBpedia from Wikipedia), and to reuse such knowledge in improving computation has added significant strength to semantic processing.
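As a rough illustration of how background knowledge can help information extraction, the Python sketch below spots mentions of known entities in free text by looking them up in a tiny dictionary. The KNOWLEDGE_BASE contents and the spot_entities function are assumptions for this example; a real system would draw on a far larger repository and would need to handle ambiguity.

```python
import re

# Hypothetical background knowledge: surface form -> (canonical entity, type).
KNOWLEDGE_BASE = {
    "miles davis": ("Miles Davis", "Artist"),
    "kind of blue": ("Kind of Blue", "Album"),
}


def spot_entities(text, kb=KNOWLEDGE_BASE):
    """Return (entity, type) pairs for known surface forms, in text order."""
    found = []
    lowered = text.lower()
    for surface, (entity, entity_type) in kb.items():
        # Word boundaries avoid matching inside longer words.
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, lowered):
            found.append((m.start(), entity, entity_type))
    return [(entity, entity_type) for _, entity, entity_type in sorted(found)]


if __name__ == "__main__":
    print(spot_entities("I love Kind of Blue by Miles Davis."))
```

Without the dictionary, "Kind of Blue" is just three common words; with it, the phrase is recognized as an album, which is the kind of gain from background knowledge described above.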

Semantics for Abstractions and Human Experience

An increasing amount of data, generated by 5 billion mobile phone users (the mobile phone is arguably the most important tool in human history, and many phones now have data connections), millions of social media users, and more than 40 billion mobile sensors, is finding its way to the Web. A single four-hour flight might generate 240 terabytes of data. How much of it is useful for a given human need? The ability to search this much data, however good, simply isn’t scalable in terms of the search results humans can review and absorb. What we want is a few nuggets of information or insights that we can act on. We care about broader and aggregate understanding of events, improved decision making, and getting answers to our questions. And we care about enhancing our human experience.


Semantics is a core component of developing abstraction mechanisms so that we can use computing to support perception and cognition. Semantic approaches support abstractions that convert low-level data and observations into the high-level symbolic representations that constitute our human perception and cognition. Semantics-empowered solutions can now analyze constantly streaming sensor or social data to surface abstractions and events of human interest (such as icy roads, blizzard conditions, the need for intervention to save crops, chances that a movie will succeed, or the progress of a mass protest). My earlier article “Computing for Human Experience: Semantics empowered Sensors, Services, and Social Computing on ubiquitous Web” (IEEE Internet Computing, Jan/Feb 2010) explores this topic a bit further. You can find examples of such approaches at http://knoesis.org/showcase.
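A toy version of this kind of abstraction can be sketched in a few lines of Python. The Observation fields, the abstract_events function, and especially the numeric thresholds are assumptions chosen for illustration; they are not taken from any deployed system or meteorological standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """One low-level weather sensor reading (hypothetical schema)."""
    temperature_c: float
    precipitation_mm_per_h: float
    wind_speed_kmh: float
    visibility_km: float


def abstract_events(obs: Observation) -> List[str]:
    """Map a raw reading to high-level events using illustrative rules."""
    events = []
    freezing = obs.temperature_c <= 0
    precipitating = obs.precipitation_mm_per_h > 0
    if freezing and precipitating:
        events.append("icy roads")
    # Thresholds below are assumed values, for demonstration only.
    if (
        freezing
        and precipitating
        and obs.wind_speed_kmh >= 56
        and obs.visibility_km <= 0.4
    ):
        events.append("blizzard conditions")
    return events


if __name__ == "__main__":
    print(abstract_events(Observation(-3.0, 2.0, 60.0, 0.3)))
```

The person receiving the output never sees the four numbers, only the event label they can act on. In practice the rules would come from a domain model rather than being hard-coded.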

We have entered an exciting time for semantic computing. Semantics is changing contemporary Web applications, such as search, and will play a pivotal role in future computing that will span cyber-physical-social systems.

Citation info.: Amit Sheth, "Semantics Scales Up: Beyond Search in Web 3.0," IEEE Internet Computing, pp. 3-6, November/December, 2011 

A companion presentation on Semantic Computing in the Real World: Vertical and Horizontal Applications, within the Enterprise and on the Web

Thursday, January 13, 2011

New Tools for the rest of the 75% of the War


“After losing 4400 American lives and spending 700 billion directly on Iraq war, we are now losing influence to Turkey and Iran… the real big picture is that this is the sign that US effort in Iraq was too focused on hard power, on military power, and not enough on soft power, on political and economic measures...” [Fareed Zakaria GPS, Jan. 09, 2011.]
Consider these statements from our military and defense leaders: (now ousted) General Stanley McChrystal: “It’s not the number of people you kill, it’s the number of people you convince” [P10]; and Secretary of Defense Robert Gates: our “ultimate success or failure will increasingly depend more on shaping the behavior of others – friends and adversaries, and most importantly, the people in between.” These point to a new imperative for determining the success of our wars: using cyberspace to enhance soft power and achieve what Henry Crumpton, a veteran CIA covert officer, described in an interview: "what we're seeing now in the tribal areas is a classic insurgency. And the way you counter that is working with indigenous forces. You use hard power in the beginning, the first 10, 20 percent, and then the next 80, 90 percent is what you might refer to as soft power..." [C08]. In fact, the Pentagon now believes that understanding cultural dynamics is at least as important as weapons [W10].
 
Joseph Nye is credited with coining the term soft power [N90], and an increasing number of world leaders (including those from China, India and Taiwan) have advocated the use of soft power as a preferred and more effective mechanism to resolve international conflict.
Mr. Robert Gates spoke of the need to enhance American soft power by "a dramatic increase in spending on the civilian instruments of national security -- diplomacy, strategic communications, foreign assistance, civic action and economic reconstruction and development."
 
Our ability to project soft power using information and cyberspace will involve:

(a)   understanding the reaction of the affected population to the US’s efforts to fight the war
(b)  winning the hearts and minds of people who must support our war on terror
(c)   identifying  and influencing terrorists and bad actors who (plan to) conduct irregular warfare using cyberspace and sensors (including social media); and understanding the use of false narratives disseminated over the internet by jihadists
(d)  interdicting to prevent the use of cyberspace to recruit jihadists and terrorists; interdicting any undesirable uses in order to prevent the spread of vile messages and influence; modeling and executing campaigns to convince terrorists and bad actors to change
(e)   combining the above with hard power when persuasion fails
Arianna Huffington wrote an interesting post, arguing “we can now change the conversation to the impact of technology and social media on peace, not just on terror.” [H10]  An interesting question is—which of these help deal with terror or promote peace? Education? Poverty alleviation? Better interactions between people and cultures? Democracy, civil and political rights, and open communications? [E10]
Addressing the above objectives points to a multidisciplinary research agenda involving social scientists, computer scientists, cyberspace technologists, policymakers and strategists.  A small subset of the related items that the multidisciplinary team of researchers at the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) is interested in includes:
·       Socio-cultural-behavioral modeling and its use to study cultural dynamics
o   personal identity, social behavior, logic, forms of reasoning, persuasion, etc as well as values and social roles
o   study of national and group differences in cognition
·       multidimensional, multifaceted situational awareness with support for
o   aggregation, extraction, and integration of massive amounts of content, including an increasing amount of dynamic and real-time content from the Web, social media and networks (including human-in-the-loop sensing), multimodal and multi-level sensors (A/V, signal, ...), and, importantly, SMS, which is available to 5 billion mobile connections and widely used in developing countries
o   spatio-temporal-thematic analysis and pattern mining leveraging high performance, high throughput infrastructure 
o   people-content-network-event analysis, understanding not just traditional enemies, but populations, developing a "Social Radar" [M09] that can detect social signals [S09], public perceptions, social sensing and tracking
I present some related ideas in this talk: Enhancing Soft Power (ESP)

[B09] John W. Bellflower, "The Soft Side of Airpower," Small Wars Journal, January 2009.

[C08] Interview with Henry Crumpton, Frontline, September 8, 2008.
[H10] Arianna Huffington, Facebook, Twitter and the Search for Peace in the Middle East, Huffington Post, November 24, 2010.
[M09] Mark Maybury, Social Radar for Smart Power, MITRE Corporation, April 2010.
[My09] Gene Myers, Projecting power: QDR’s aerospace imperatives, Armed Forces Journal, July 2010.
[P10] Michael Phillips, "U.S. Steps Up Missions Targeting Taliban Leaders," Wall Street Journal, February 2, 2010.
[S09] Amit Sheth, Citizen Sensing, Social Signals, and Enriching Human Experience, IEEE Internet Computing, July/August 2009.
[W10] Sharon Weinberger, Pentagon Turns to 'Softer' Sciences - US defence research to focus more on biology, cybersecurity and social sciences to help win conflicts. Scientific American,  April 14, 2010.