
Semantic Analysis and Big Data to understand Social TV

25 November 2013

We recently participated in the Big Data Spain conference with a talk entitled “Real time semantic search engine for social TV streams”. The talk describes our ongoing experiments on Social TV and brings together our most recent work on applying semantic analysis to social networks and on handling real-time streams of data.

Social TV, which exploded with the use of social networks while watching TV programs, is a growing and exciting phenomenon. Twitter reported that, at least in the UK, more than a third of its prime-time firehose is discussing TV, while Facebook claimed five times more comments behind its private wall. Facebook has also recently started to offer hashtags and the Keywords Insight API to selected partners as a means to provide aggregated statistics on Social TV conversations inside the wall.

As more users turn to social networks to comment with friends and other viewers, broadcasters have looked for ways to be part of the conversation. They use official hashtags, let actors and anchors tweet live, and have even started to offer companion apps with social sharing functionality.

While the concept of socializing around TV is not new, the possibility of measuring and distilling the information around these interactions opens up brand new possibilities for users, broadcasters and brands alike. User interest has already fueled Social TV, as it fulfills the need to start conversations with friends, other viewers and the aired program. Chatter around TV programs may help to recommend other programs or to serve contextually relevant information about actors, characters or whatever appears on screen. Moreover, better ways to access and organize public conversations will drive new users to a TV program and engage current ones.

On the other hand, understanding the global conversation about a program is definitely useful for broadcasters and brands seeking insights. Broadcasters and TV producers can measure their viewers' preferences and reactions, monitor their competition, and acquire complementary information beyond plain audience numbers. Brands are also interested in finding the most appropriate programs to reach their target users, as well as in understanding the impact and acceptance of their ads. Finally, new TV and ad formats are already being created based on interaction and participation, which again bolsters engagement.

In our talk, we describe a system that combines natural language processing components from our Textalytics API with a scalable semi-structured database/search engine, SenseiDB, to provide semantic and faceted search, real-time analytics and visualization support for this kind of application.

Using the Textalytics API we are able to include interesting features for Social TV, like analyzing the sentiment around an entity (a program, actor or sportsperson). Besides, entity recognition and topic extraction allow us to produce trending topics for a program that correlate well with whatever happens on screen. Combined with the online facets provided by SenseiDB, they work as an effective way to organize the conversation in real time. Other functionalities, like language identification and text classification, help us to clean the noisy streams of comments.
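
As an illustration, the sketch below shows how a single comment from the stream could be enriched before indexing. The endpoint URL, parameter names and response fields are assumptions made for the example, not the actual Textalytics interface; consult the API documentation for the real contract.

```python
import json
import urllib.request

# Hypothetical endpoint and field names, for illustration only; the real
# Textalytics API contract may differ.
API_URL = "https://api.example.com/textalytics/analyze"
API_KEY = "YOUR_API_KEY"

def enrich_comment(text, lang="en"):
    """Annotate one Social TV comment with sentiment, entities and topics."""
    payload = json.dumps({"key": API_KEY, "lang": lang, "txt": text})
    request = urllib.request.Request(
        API_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # Keep only the fields used downstream as facets in SenseiDB.
    return {
        "sentiment": result.get("sentiment"),  # e.g. "positive"
        "entities": [e["form"] for e in result.get("entities", [])],
        "topics": [t["form"] for t in result.get("topics", [])],
    }

print(enrich_comment("Loved tonight's episode, the finale was amazing!"))
```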

SenseiDB is the second pillar of our system: a semi-structured distributed database that helps us ingest streams and make them available for search in real time, with low query and indexing latencies. It includes a large number of facet types that let us offer navigation over a range of semantic information, and with the help of histogram and range facets it can even be used for simple analytics tasks. It is rounded out by a simple and elegant query language, BQL, which speeds up the development of visualizations on top.
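
To give a flavor of BQL, here is a minimal sketch of a faceted query as we might issue it from Python. The index name, field names and REST endpoint are hypothetical, and the exact BQL syntax accepted by a given SenseiDB deployment may differ.

```python
import json
import urllib.request

# Illustrative BQL: full-text search over the comment stream, filtered by a
# semantic facet and browsing the "topic" facet to surface trending topics.
# Index and field names (tweets, sentiment, topic) are hypothetical.
BQL = """
SELECT *
FROM tweets
WHERE QUERY IS "breaking bad" AND sentiment = "positive"
BROWSE BY topic(true, 1, 10, hits)
"""

# Assumed REST endpoint of a local SenseiDB broker; check your deployment.
request = urllib.request.Request(
    "http://localhost:8080/sensei",
    data=json.dumps({"bql": BQL}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    result = json.load(response)
print(result.get("numhits"), "matching comments")
```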

If you find it interesting, check out our presentation for more detail or even the video of the event.

Trends in data analysis from Big Data Spain 2013

19 November 2013


The second edition of Big Data Spain took place in Madrid on November 7 and 8 and proved to be a landmark event on technologies and applications of Big Data processing. The event attracted more than 400 participants, doubling last year's number, and reflected the growing interest in these technologies in Spain and across Europe. Daedalus participated with a talk that illustrated the use of natural language processing and Big Data technologies to analyze the buzz around Social TV in real time.

Big Data technology has matured as we approach its 10th birthday, marked by the publication of the MapReduce computing abstraction that later gave rise to the field.
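
For readers unfamiliar with the abstraction, the toy word count below captures its essence in plain Python: a map function emits key-value pairs, the pairs are grouped by key, and a reduce function aggregates each group. Frameworks like Hadoop run the same two functions distributed over a cluster.

```python
from collections import defaultdict
from itertools import chain

# Map phase: emit (word, 1) pairs for every word in every document.
def map_phase(document):
    return [(word, 1) for word in document.lower().split()]

# Shuffle: group intermediate pairs by key.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: aggregate the values of each key.
def reduce_phase(key, values):
    return key, sum(values)

documents = ["big data spain", "big data processing"]
pairs = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 2, 'spain': 1, 'processing': 1}
```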

Rubén Casado, in one of the most useful talks for making sense of the vast number of Big Data and NoSQL projects, outlined the recent history of the technology in three eras:

  • Batch processing (2003 – ), with examples like Hadoop or Cassandra.
  • Real-time processing (2010 – ), represented by recent projects like Storm, Kafka or Samza.
  • Hybrid processing (2013 – ), which attempts to combine both worlds in a unified programming model, like Summingbird or Lambdoop (see the sketch after this list).
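
The idea behind the hybrid model is often described as the "lambda architecture": the same computation feeds a batch view (complete but stale) and a speed view (incremental, covering only recent events), and queries merge the two. The toy sketch below illustrates the principle; it is not the actual API of Summingbird or Lambdoop, and all names in it are made up.

```python
# Batch view: counts precomputed over the full history (complete but stale).
batch_view = {"#programX": 1200}
# Speed view: counts over events that arrived after the last batch run.
speed_view = {}

def on_new_event(hashtag):
    """Speed layer: incrementally update counts for recent events."""
    speed_view[hashtag] = speed_view.get(hashtag, 0) + 1

def query(hashtag):
    """Serving layer: merge the batch and real-time views."""
    return batch_view.get(hashtag, 0) + speed_view.get(hashtag, 0)

for tag in ["#programX", "#programX", "#programY"]:
    on_new_event(tag)

print(query("#programX"))  # 1202
```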

Without any doubt, the first era of solutions is enterprise-ready, with several Hadoop-based distributions like Cloudera, MapR or HortonWorks. Likewise, the number of companies that integrate them or provide consultancy in this field is expanding, reaching every sector from finance and banking to telecommunications and marketing.

Some other technological trends clearly emerged from talk topics and panels:

  • a growing number of alternatives for online analysis of large data volumes (Spark, Impala, SploutSQL or SenseiDB)
  • the comeback of SQL, or at least of SQL dialects on top of existing systems, which make applications easier to develop and maintain (see the sketch after this list)
  • the importance of visualization as a tool to communicate Big Data results effectively.
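
To see why SQL dialects are attractive, compare the MapReduce word count above with its declarative equivalent below. Systems like Hive, Impala or SploutSQL bring this style of query to data stored on Big Data platforms; sqlite3 is used here only to keep the example self-contained.

```python
import sqlite3

# The declarative counterpart of the word-count MapReduce job shown earlier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("big",), ("data",), ("spain",), ("big",), ("data",), ("processing",)],
)
for word, count in conn.execute(
    "SELECT word, COUNT(*) FROM words GROUP BY word ORDER BY COUNT(*) DESC"
):
    print(word, count)
```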

However, adopting Big Data as a philosophy inside your company is not merely a matter of technology. It requires a clear vision of the benefits that grounding all your processes in data may bring, and of the value and knowledge you may obtain by integrating internal and external data. Another important factor is being able to find the right people to bridge the chasm between the technical and business sides. In this sense, the role of the data scientist is very important, and Sean Owen from Cloudera defined it as “a person who is better at statistics than any software engineer and better at software engineering than any statistician”. We may add to the wish list a deep knowledge of your business domain and the ability to ask the right questions.

While not everybody agreed, it seems that the best way to start “doing Big Data” is one step at a time, with a project with clear business goals. If you want to test the technology, good candidates are those business processes that have already become a bottleneck on standard databases. On the other hand, innovation may also be an important driver, whether by using external open data or when designing data-centric products. A good example of the latter is the Open Innovation challenge from Centro de Innovacion BBVA, providing aggregated information on credit card transactions.


Finally, going back to the theme of our talk, one of the external sources generating the most value is social network data. Due to their heterogeneity, social networks are intrinsically difficult to analyze but, fortunately, text analytics tools like the Textalytics API enable you to make sense of unstructured data. Integrated into your Big Data toolset, they open the door to the intelligent integration of quantitative and qualitative data, with all the valuable insights you would obtain.

If you want to dive into the Big Data world, videos of the talks and the experts panel are available on the Big Data Spain site.
