Trends in data analysis from Big Data Spain 2013

19 November, 2013

The second edition of Big Data Spain took place in Madrid on November 7 and 8 and proved to be a landmark event on the technologies and applications of big data processing. The event attracted more than 400 participants, doubling last year’s number, and reflected the growing interest in these technologies in Spain and across Europe. Daedalus participated with a talk that illustrated the use of natural language processing and Big Data technologies to analyze the buzz around Social TV in real time.

Big Data technology has matured as we are about to celebrate its 10th birthday, dated from the publication of the MapReduce computing abstraction that later gave rise to the field.

Rubén Casado, in one of the most useful talks for understanding the vast number of Big Data and NoSQL projects, outlined the recent history of the technology in three eras:

  • Batch processing (2003 – ), with examples like Hadoop or Cassandra.
  • Real time processing (2010 – ), represented by recent projects like Storm, Kafka or Samza.
  • Hybrid processing (2013 – ), which attempts to combine both worlds in a unified programming model like Summingbird or Lambdoop.
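
To make the hybrid idea more concrete, here is a minimal sketch in plain Python. It does not use the Summingbird or Lambdoop APIs; the function names and sample data are hypothetical and only illustrate the general pattern of writing the aggregation logic once and applying it to both a batch view and a real-time view.

```python
from collections import Counter
from typing import Iterable


def count_words(lines: Iterable[str]) -> Counter:
    """Shared aggregation logic: count word occurrences in text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


# Batch side: recomputed periodically over the full historical data set.
# (Hypothetical sample data.)
historical = ["big data spain", "big data madrid"]
batch_view = count_words(historical)

# Real-time side: updated incrementally as new events arrive.
# The list below stands in for a live stream.
realtime_view = Counter()
for event in ["data spain"]:
    realtime_view += count_words([event])


def query(word: str) -> int:
    """Hybrid query: merge the batch view and the real-time view."""
    return batch_view[word] + realtime_view[word]


print(query("data"))
```

The point of the sketch is that `count_words` is defined once and reused on both sides, so a query sees historical and recent data through the same logic.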

Without any doubt, the first era of solutions is enterprise-ready, with several Hadoop-based distributions like Cloudera, MapR or Hortonworks. Likewise, the number of companies that are integrating them or providing consultancy in this field is expanding and reaching every sector, from finance and banking to telecommunications or marketing.

Some other technological trends clearly emerged from talk topics and panels:

  • a growing number of alternatives for handling large-volume data analysis tasks online (Spark, Impala, SploutSQL or SenseiDB)
  • the comeback of SQL, at least in the form of dialects on top of current systems, which make it easier to develop and maintain applications
  • the importance of visualization as a tool to communicate Big Data results effectively.

However, adopting Big Data as a philosophy inside your company is not merely a matter of technology. It requires a clear vision of the benefits that grounding all your processes in data may bring, and of the value and knowledge that you may obtain by integrating internal and external data. Another important factor is finding the right people to bridge the chasm between the technical and business sides. In this sense, the role of the data scientist is very important, and Sean Owen from Cloudera defined it as “a person who is better at statistics than any software engineer and better at software engineering than any statistician”. We may add to the wish list a deep knowledge of your business domain and the ability to ask the right questions.

While not everybody agreed, it seems that the best way to start “doing Big Data” is one step at a time, with a project that has clear business goals. If you want to test the technology, good candidates are those business processes that have already become a bottleneck using standard databases. On the other hand, innovation may also be an important driver, for example when you use external open data or need to design data-centric products. A good example of that sort is the Open Innovation challenge from Centro de Innovacion BBVA, which provides aggregate information on credit card transactions.

Finally, going back to the theme of our talk, one of the external sources generating the most value is social network data. Due to their heterogeneity, social networks are intrinsically difficult to analyze but, fortunately, text analytics tools like the Textalytics API enable you to make sense of unstructured data. Integrated into your Big Data toolset, they open the door to the intelligent integration of quantitative and qualitative data, with all the valuable insights that this brings.

If you want to dive into the Big Data world, videos of the talks and expert panels are available at the Big Data Spain site.