Monday, February 24, 2014

Can we handle two challenges: (a) avoid being just "academic" and (b) carry out interdisciplinary research?

Today, I had a chance to read two excellent pieces:

The first is the NYT article by Nicholas Kristof: Professors, We Need You! It discusses how we professors have marginalized ourselves and made ourselves largely irrelevant to the public and to issues at large.

The second, in the Guardian, is by PhD student Sarah Byrne, titled Interdisciplinary research: why it's seen as a risky route. It argues, fairly accurately, that while funding agencies and academic leaders pay lip service to interdisciplinary research or even encourage it, at an individual level it can be a very risky, and at times very difficult, choice for PhD students and young faculty.

As I review what we increasingly do at the Kno.e.sis center, I wonder whether we can do things differently. For one, a large percentage of our research portfolio targets significant human, social, and/or economic development challenges, where our advances in computing (especially Big and Smart Data Sciences involving physical, cyber, and social data) help inform policy, make better decisions, and take timely actions. We take care to maintain significant rigor in the computing research that is part of our interdisciplinary work, as exemplified by our graduates' dissertations (e.g., see Cory Henson's work on Semantic Perception, which is used in our personalized digital health projects mentioned below). If we use the outcomes for our PhDs as evidence, perhaps we have avoided the traps discussed in the two articles: our graduates have been getting top-tier jobs in academia or in industry research and R&D labs, and some have felt confident enough to jump into high-tech entrepreneurship fairly early in their careers.

Here is a list of what we do at Kno.e.sis with a focus on social and cyber data:
Our list of projects with a focus on sensor/IoT data includes personalized digital health projects to
  • reduce hospital readmission of Acute Decompensated Heart Failure patients
  • understand/predict asthma episodes in children
with more projects in other areas, such as Smart City, in early stages.

All of the above projects involve real-world data, real participants (e.g., patients), tools/applications usable by end users or domain scientists, and collaborations with domain experts and influential institutions such as the UN (UNDP/UNF), QCRI, crisis response organizations/NGOs, clinicians (cardiologists and asthma specialists at Ohio State U & Wright State U), epidemiologists at CITAR, ER physicians and toxicologists, cognitive scientists, material scientists, and more.

Friday, December 13, 2013

Data Semantics and Semantic Web - 2013 year-end reflections and prognosis

The view of Bottlenose’s CEO that Semantic Web has failed may catch your eye, but depending on how you look at it, I would call it misguided or just plain wrong. Yes, it is true that Semantic Web is not the hottest technology on the globe, and yes, it has not enjoyed the success of Machine Learning or the hype of Big Data lately. Before I present my views on its continued broad-based progress, let me identify two main reasons its progress has not been as fast as it could be. The first is scalability: while I have argued that semantics can be scaled up, we need to convince more people that we can indeed create domain models and background knowledge as fast as they can train machine learning algorithms, that semantic techniques (e.g., for finding meaningful patterns, paths, and subgraphs) can be applied to deal with the volume and velocity of Big Data, and that no other approach can deal with variety as well. The second is a lack of trained personnel with expertise in the right part of Semantic Web, the part that focuses on the Web of Data or Linked Data. The number of applications at the logic end of Semantic Web is still very small.

Now let me present why and how we are making smart progress at multiple levels – from small scale to large scale applications and impact.

  • First, there are plenty of small, real wins, such as this effort to improve eCommerce.
  • Second, there is a growing number of products that improve upon or address the deficiencies of widely used techniques in IR, NLP, and ML (the importance of background knowledge has been widely recognized; e.g., see the discussion on Data Alone is Not Enough). This product, which requires understanding clinical notes, is one of a rapidly growing number of examples.
  • Finally, systemic changes are coming to Web-scale systems where semantics is the primary differentiator. What better examples than the applications where lots of money is made: search and advertisement? The role of semantics for search/personalization/targeting/advertisement has been known and demonstrated in a commercial setting since around 2000 (see interview, patent, talk, paper), but in line with a rule of thumb from my experience, that technology maturation and scale-out often take 15 years, semantic search and advertisement are only now receiving a full-court press and indeed reaching the average consumer.

For the new year, here are my top three predictions.  

Growing pains for Linked Data: I expect growing pains for Linked Data. The easy part, putting out (and sometimes dumping) data and simplistic (albeit not very useful) forms of alignment of linked data, has been widely practiced. More datasets, especially open government data, will continue to be put out. But now comes the hard part: a better understanding of the quality of data, and alignments or mappings involving the richer relationships needed for real-world applications.

Continued slow progress for OWL, and even RDF: I expect continued slow-paced advances for OWL-reliant approaches to the semantic web as a few more people learn the tools of the trade. To my surprise, the growth of RDF is also underwhelming. I think several factors are at play: the first is a lack of skilled people, the second is fear that RDF-based solutions will not scale, and the third is the growing popularity of graph databases, which many may feel adequately provide the needed functionality with perceived ease of use and scalability. This is likely to persist in the coming year.

Breakout year for Smart Data: But I expect this to be a breakout year for Smart Data (2004-2005 view, 2013 retake), where recent progress in semantic annotation and knowledge-based tagging will enable the enrichment of a wide variety of data (especially traditional unstructured text, semi-structured data, and social and sensor data), affording semantics-enhanced search, integration, personalization, analysis, and advertisement. While many are aware of Google's effort to enhance search, 

Wednesday, November 27, 2013

Kno.e.sis - from 2007 to 2013

Agriculture, manufacturing, and services were the basis for economies in earlier generations. However, we have moved into what is called the “knowledge economy”. Starting in 2007, when I moved here, I saw an opportunity to work with my Computer Science colleagues to develop a world-class organization covering a number of fields, such as semantic, social, and sensor webs, knowledge representation, advanced databases and data mining, information retrieval, NLP and machine learning, bioinformatics, cloud computing, and visualization. We have grown into what may, in fact, be the largest academic group in the US in Semantic Web, a key component of Web 3.0.

When the Ohio Board of Regents announced a statewide competition for Ohio Centers of Excellence, we decided to pursue a highly multidisciplinary approach with faculty from four colleges, especially involving researchers from biomedical and clinical research and cognitive sciences. We wanted to focus on solving complex, real-world problems rather than just training our students and researchers in a single technical area such as machine learning or databases. We were indeed selected as one of the Ohio Centers of Excellence in BioHealth innovation, effective January 2010. We have also become one of the strongest, if not the strongest, academic research centers in Big Data and Data Sciences in the state.

Today, we do very exciting multidisciplinary projects such as using sensors and smartphones for predicting and managing asthma, analyzing social media for prescription drug abuse surveillance and epidemiology, social media based coordination during major crises and emergencies such as Hurricane Sandy or Typhoon Yolanda, analyzing CTA images for diffuse coronary artery detection, analyzing clinical notes to predict health outcomes, and developing semantic solutions to meet the objectives of the President’s Materials Genome Initiative.

What makes Kno.e.sis successful is its unique ecosystem, which many visitors have described as entrepreneurial. Our driving education philosophy is to learn how to learn. The most important measure of success is undoubtedly the exceptional outcomes of our students and postdocs. They have successfully competed against candidates from top-20 universities, receiving offers for tenure-track faculty positions at Case Western or the University of Kentucky, or for positions at IBM Research Watson or Almaden, where four of my recent PhDs chose to go. In recent years, we have also seen our graduates receive the very best offers their employers have ever given out, such as $120K first-year compensation for MS students who joined EMC, CISCO, or Bloomberg. Of course, it helps that our faculty members are world class and well funded by NIH, NSF, AFRL, AFOSR, and industry partners. We have a truly world-class computing infrastructure, with a new resource consisting of 816 CPU cores, 425TB of raw disk capacity, and 17TB of main memory. Now that we rank among the top 10 in the world in five-year publication impact in the World Wide Web area, top international applicants want to come here. The excitement of working on big real-world problems with real-world data, leading to real-world impact, is something you have to experience. Last, but not least, we are proud of our local industry collaborations, regional economic impact, and successful technology transfers and commercializations.

Explore: Overview, Vision, Projects, People, Showcase, Financial Aid, etc. at Kno.e.sis on the Web.

Monday, July 22, 2013

Some thoughts on this Gurupurnima

On the occasion of #gurupurnima [1], I pay my respects to all who have taught and guided me.

When I was a child, I was always attracted to the stories of the #gurukul (or #gurukulam), a system in which a child leaves his birth parents at a very early age to go to a Rishi's ashram for an all-encompassing education and becomes a member of the extended family (#gurukulparivar) of the guru and his wife (guru-ma). In some fields, such as Indian classical music, the Guru-Shishya parampara continues even today [2]. My classical music teacher was a strong believer in the Guru-Shishya parampara, but the guru who influenced me most was one I never met in person (Dr. Vikram Sarabhai, father of India's space science). No wonder I find Eklavya's story believable. Dr. Sarabhai was so busy that he would take his research students with him on his train rides (from Ahmedabad to Surat or Mumbai) so that he could spend quality time with them. Such dedication from a guru made a deep impression on me. So on the occasion of Guru Purnima, I pay my respects to such Gurus.

On or about each gurupurnima day, I also hear from a few past students (and occasionally a current student or two). In particular, I find it hard to describe the emotion (proud? deeply satisfied?) when I receive gurupurnima greetings/respects from one "student" who was never formally my student. Perhaps this is the power of doing something without seeking anything in return! To this "student": You know who you are, and you make me very proud, not because you have respected me as your guru, but because you have respected the power of this important relationship that educates us and makes us ready for the rest of our lives.

Saturday, June 1, 2013

Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

Big Data has captured much interest in research and industry, with anticipation of better decisions, efficient organizations, and many new jobs. Much of the emphasis is on technology that handles volume, including storage and computational techniques to support analysis (Hadoop, NoSQL, MapReduce, etc.), and on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity. However, the most important feature of data, its raison d'être, is neither volume, variety, velocity, nor veracity, but value. In this talk, I will emphasize the significance of Smart Data and discuss how it can be realized by extracting value from Big Data.

Here is how I would define Smart Data:

Smart data makes sense out of Big data

It provides value from harnessing the challenges posed by
volume, velocity, variety and veracity of big data,
in turn providing actionable information and improving decision making.

Another way to look at Smart Data is:

“OF human, BY human and FOR human”

Smart data is focused on the actionable value achieved by human involvement in data creation, processing and consumption phases for improving

Creating Smart Data requires organized ways to harness and overcome the original four V-challenges; and while the technologies currently touted may provide some necessary infrastructure, they are far from sufficient. In particular, we will need to utilize metadata, employ semantics and intelligent processing, and leverage some of the extensive work that predates Big Data.

For Volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies to support semantic interoperability and integration, and discuss how this cannot simply be wished away using NoSQL. Lastly, for Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to dynamically create models of new objects, concepts, and relationships and use them to better understand new cues in the data that capture rapidly evolving events and situations.
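To make the Semantic Perception idea a little more concrete, here is a minimal Python sketch of one step it involves: explaining a set of observations by the features in background knowledge that could account for all of them, and then listing which further observations would discriminate between the candidate explanations. The background knowledge (`BACKGROUND`) and the function names `explain` and `discriminating_properties` are hypothetical, and the simple subset test is a simplification for illustration, not the implementation used in our projects.

```python
"""Toy sketch of explanation and discrimination in semantic perception.

Assumption: background knowledge is a hypothetical mapping from
features (e.g., a condition) to the observable properties they cause.
"""

# Hypothetical background knowledge: feature -> properties it explains.
BACKGROUND = {
    "flu": {"fever", "cough", "fatigue"},
    "cold": {"cough", "runny_nose"},
    "heat_stroke": {"fever", "dizziness"},
}


def explain(observed, background):
    """Return features whose known properties cover every observation."""
    return sorted(
        feature
        for feature, props in background.items()
        if observed <= props  # subset test: all observations explained
    )


def discriminating_properties(candidates, background):
    """Properties predicted by some, but not all, candidate features.

    Observing one of these next would help narrow the explanations.
    """
    if not candidates:
        return set()
    prop_sets = [background[c] for c in candidates]
    union = set().union(*prop_sets)
    common = set.intersection(*prop_sets)
    return union - common


if __name__ == "__main__":
    cands = explain({"fever"}, BACKGROUND)
    print(cands)  # ['flu', 'heat_stroke']
    print(sorted(discriminating_properties(cands, BACKGROUND)))
    # ['cough', 'dizziness', 'fatigue']
```

Observing only a fever leaves two candidate explanations, and the discriminating properties suggest what to observe next. A real deployment would draw on ontologies and a reasoner rather than a Python dictionary.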

The above, except for the definitions of Smart Data (shown in color), was written on Feb 26, 2013, and posted at SEBD2013. Here is a copy of my keynote:

Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web from Amit Sheth

p.s. (Nov 23, 2013): I stumbled across my use of the term Smart Data in 2004, in the context of a commercial application of Semantic Web from Semagix, which had acquired the Taalee (founded 1999)/Voquette technology, and I find the idea still resonates:

Wednesday, December 19, 2012

Data Semantics and Semantic Web - 2012 year-end reflections and prognosis

Growing role of background knowledge: Semantic Web researchers and early entrepreneurs knew (as exemplified by this first patent on Semantic Web technologies, filed in 2000) that with moderate effort, it is possible to create background knowledge and populated ontologies by aggregating and disambiguating high-quality information and facts from multiple sources. It has also long been known that by using such knowledge bases, we can substantially improve information extraction and develop a variety of semantic tools and applications, including semantic search, browsing, personalization, and advertisement. Over the past 3-5 years, several efforts to create such knowledge bases took place, of which Freebase is a showcase. What has drawn everyone’s attention to this aspect of the semantic approach is Google's acquisition of the company that created Freebase and its significant extension of largely known techniques, chiefly by scaling them to the next level, to create the Google Knowledge Base (GKB). Furthermore, applying GKB to enhance search (and, I am sure, other applications in the future) has forever changed the importance of creating and using background knowledge or domain models for semantic applications. I believe this form of semantic application building will see the fastest growth in the near future. I have discussed related thoughts in my article titled “Semantics Scales Up”.
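The point that background knowledge improves information extraction can be illustrated with a toy disambiguation step: when a mention is ambiguous, prefer the knowledge-base entity whose related terms best overlap the surrounding text. The knowledge base (`KB`), alias table (`ALIASES`), and overlap score below are hypothetical and chosen only to keep the sketch short; knowledge bases of the kind described above aggregate far richer facts from multiple sources.

```python
import re

# Hypothetical background knowledge: entity -> terms associated with it.
KB = {
    "Jaguar (animal)": {"cat", "rainforest", "predator", "species"},
    "Jaguar (car)": {"vehicle", "engine", "luxury", "sedan"},
}

# Hypothetical alias table: surface form -> candidate entities.
ALIASES = {"jaguar": ["Jaguar (animal)", "Jaguar (car)"]}


def disambiguate(mention, context, kb=KB, aliases=ALIASES):
    """Return the candidate entity whose KB terms best match the context.

    Returns None for mentions with no known candidates. Ties go to the
    first candidate listed in the alias table.
    """
    tokens = set(re.findall(r"[a-z]+", context.lower()))
    candidates = aliases.get(mention.lower(), [])
    if not candidates:
        return None
    # Score each candidate by how many of its KB terms appear in context.
    return max(candidates, key=lambda entity: len(kb[entity] & tokens))


if __name__ == "__main__":
    print(disambiguate("Jaguar", "The jaguar is a predator of the rainforest"))
    # Jaguar (animal)
    print(disambiguate("Jaguar", "A jaguar sedan with a quiet engine"))
    # Jaguar (car)
```

Term overlap is the simplest possible scoring rule; it is used here only to show how even a small amount of background knowledge changes the extraction result.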

Growing pains for Linked Open Data (LOD): The publication of over 300 large data sets with 30+ billion triples certainly draws the attention of many. Data holders will continue to find LOD an attractive vehicle to publish and share their data, so it will continue to grow at a rapid pace. Some data sets, more than others, will find additional usage for data reference, interlinking, and transformation. But in the near term, broader or aggregate usage of LOD will be a slog because we are running into some of the harder technical challenges: questionable quality of data and provenance, unconstrained and uneven use of semantics (e.g., same-as used inconsistently), limited use of richer relationship types (part-of, causality), and poor interlinking (lack of high-quality alignment). We will need a better handle on these issues, along with a better ability to identify the most relevant, high-quality data sets (a semantic search for LOD) and better alignment tools (not limited to just same-as), before we can start realizing the true promise of LOD. So, I would give it another five years to fully develop.
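The same-as problem is easy to underestimate because owl:sameAs is symmetric and transitive: a single careless link can chain together resources that should stay distinct. Here is a minimal Python sketch of one possible quality check, assuming a simplified setting in which each URI has a single type and a hypothetical list of type pairs is treated as disjoint. It groups resources into sameAs clusters with union-find and flags clusters that mix disjoint types. The URIs and types are invented examples.

```python
# Union-find over URIs: each sameAs cluster ends up with one root.
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def suspicious_clusters(same_as, types, disjoint):
    """Return sorted member lists of sameAs clusters mixing disjoint types."""
    uf = UnionFind()
    for a, b in same_as:
        uf.union(a, b)  # symmetric + transitive closure via union-find
    clusters = {}
    for node in list(uf.parent):
        clusters.setdefault(uf.find(node), set()).add(node)
    flagged = []
    for members in clusters.values():
        member_types = {types[m] for m in members if m in types}
        if any(pair <= member_types for pair in disjoint):
            flagged.append(sorted(members))
    return flagged


# Hypothetical example data.
SAME_AS = [
    ("ex:Paris", "dbp:Paris"),
    ("dbp:Paris", "ex:ParisHilton"),  # careless link merges a person
    ("ex:Rome", "geo:Rome"),
]
TYPES = {
    "ex:Paris": "Place", "dbp:Paris": "Place", "ex:ParisHilton": "Person",
    "ex:Rome": "Place", "geo:Rome": "Place",
}
DISJOINT = [frozenset({"Person", "Place"})]

if __name__ == "__main__":
    print(suspicious_clusters(SAME_AS, TYPES, DISJOINT))
    # [['dbp:Paris', 'ex:Paris', 'ex:ParisHilton']]
```

A check like this only catches the most blatant inconsistencies; judging alignment quality in general requires richer relationships than same-as, which is exactly the harder challenge described above.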

Democratization of Semantics: So far, we have paid most of our attention to knowledge representation, languages, and reasoning. Furthermore, most of the work focuses on documents in enterprises and on the Web, or uses structured data transformed into triples. But what is even more exciting is how semantics and Semantic Web technology (primarily through annotating data with respect to background knowledge or ontologies) is being used to improve the interoperability and analysis of different types of textual and non-textual data, especially social data and data generated by sensors, devices, or the Internet of Things. These types of data have long overtaken traditional document-centric data and structured databases in terms of volume, velocity, and variety. The semantics one needs to deal with for such (relatively) nontraditional data is amazingly varied. For example, in the Twitris system, besides semantic annotation of the spatial, temporal, and thematic elements associated with tweets, semantics (aka meaning) also includes understanding people (the poster and receiver), network (interactions and message flow), sentiment, emotion, and intent. For a more in-depth treatment, see our just-published book on semantics-empowered Web 3.0. In my view, this is probably the most important development, and it is likely to garner a much larger share of attention related to the application of semantics and semantic web technologies.
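As a rough illustration of spatial, temporal, and thematic annotation of social data, here is a Python sketch that attaches a timestamp, gazetteer matches, and topic concepts to a tweet. The gazetteer, topic vocabulary, and `Annotation` structure are hypothetical stand-ins for background knowledge, and naive substring and keyword matching is used only for brevity. This is not how Twitris is implemented, and it leaves out the people, network, sentiment, emotion, and intent dimensions entirely.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical background knowledge: place name -> (lat, lon).
GAZETTEER = {"new york": (40.71, -74.01), "manila": (14.60, 120.98)}

# Hypothetical topic vocabulary: keyword -> thematic concept.
TOPICS = {
    "flood": "Disaster/Flooding",
    "shelter": "Relief/Shelter",
    "power": "Infrastructure/Power",
}


@dataclass
class Annotation:
    text: str
    created_at: datetime  # temporal: when the tweet was posted
    places: dict = field(default_factory=dict)  # spatial: name -> (lat, lon)
    themes: set = field(default_factory=set)    # thematic: topic concepts


def annotate(text, created_at):
    """Annotate a tweet using simple gazetteer and keyword lookups."""
    lowered = text.lower()
    ann = Annotation(text=text, created_at=created_at)
    for name, coords in GAZETTEER.items():
        if name in lowered:  # naive substring match on place names
            ann.places[name] = coords
    for word in re.findall(r"[a-z]+", lowered):
        if word in TOPICS:
            ann.themes.add(TOPICS[word])
    return ann


if __name__ == "__main__":
    ann = annotate(
        "Flood water rising in New York, looking for a shelter; power is out",
        datetime(2012, 10, 30, 9, 0),
    )
    print(ann.places)          # {'new york': (40.71, -74.01)}
    print(sorted(ann.themes))  # ['Disaster/Flooding', 'Infrastructure/Power', 'Relief/Shelter']
```

Even this crude version shows why annotation against background knowledge matters: once a tweet carries coordinates, a time, and topic concepts, it can be aggregated and analyzed alongside data from very different sources.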


p.s.: Parts of this appear in Semantic Tech Outlook: 2013.