The buzz around the HadoopSphere influencer list

The HadoopSphere Top Big Data Influencers list for 2015 has drawn a generous dose of reactions, ranging from cheers and congratulations to disappointment from some at not making the list this year. Listed below are some of the reactions on social media.

Merv Adrian, analyst at Gartner, humbly appreciated the recognition.

The pioneering online television show The Cube makes its debut on this list, and its tweets brimmed with as much energy as the show itself.

Gregory Piatetsky, the big social data influencer who runs the KDnuggets site, also acknowledged the listing. He has been on this list consistently despite stiff competition from other social magnets.

ODP and Pivotal were quick to acknowledge the recognition for their tech superstar, Roman Shaposhnik.

But not everyone was pleased with this Hadoop-focused list. Chris Volinsky, AVP at AT&T Research, was quick to rub it in with a neat jibe.

And yes, there was some more light banter, especially for Roman Shaposhnik and Milind Bhandarkar, as their colleagues vouched for their cult status after this recognition.

Once again, congratulations to all those who made it to the HadoopSphere Top Big Data Influencers of 2015 list. Well deserved.

Apache Flink is grown up now, announces version 1.0

The hot open source streaming and batch processing system Apache Flink has reached a major release milestone with version 1.0, promising more mature features and capabilities. Apache Flink, ranked by HadoopSphere as one of the most influential products of 2015, is today among the most actively contributed open source projects.

While Apache Flink is more commonly associated with streaming, it has wider capabilities to support relational, machine learning and graph processing as well. It can run on a variety of cluster managers including Hadoop YARN. 

Flink started in 2009 as the Stratosphere research project at the Technical University of Berlin, along with several other European universities. The project entered the Apache Incubator in April 2014 and became a Top-Level Project in December 2014. In a release note, Ted Dunning, Vice President of the Apache Incubator and Chief Application Architect at MapR, commented that "The two things that have always struck me about Flink has been the excellence of the code and the excellence of the team ... This pattern is continuing with this release." As with its competitor Spark, the founders of Apache Flink have gone ahead and formed a commercial company of their own, named data Artisans. By the looks of what they are doing in Apache Flink, there seems to be a lot of art beside the science under the hood.

Some key features that make Apache Flink so talked about include:
- High Performance & Low Latency
- Support for Event Time and Out-of-Order Events
- Exactly-once Semantics for Stateful Computations
- Highly flexible Streaming Windows
- Continuous Streaming Model with Backpressure
- Fault-tolerance via Lightweight Distributed Snapshots
- One Runtime for Streaming and Batch Processing
- Efficient Memory Management
- An ecosystem of libraries
- Broad Integration with Hadoop, HBase, Tachyon, MRQL, Google Cloud Platform and others.
We will keep watching the product to see its adoption within the industry.
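
To make the "Event Time and Out-of-Order Events" item in the list above concrete, here is a minimal sketch in plain Python. It is not Flink's API: the function name `tumbling_window_counts`, the sample events, and the window size are all hypothetical, and it leaves out watermarks and late-data handling, which a real streaming engine needs. It only shows the core idea that an event is assigned to a window by the timestamp it carries, not by the order in which it arrives.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per (window_start, key), using each event's own timestamp.

    events: iterable of (event_time, key) pairs, in arrival order.
    window_size: length of each fixed, non-overlapping window.
    """
    counts = defaultdict(int)
    for event_time, key in events:
        # The window is derived from the event's timestamp, so arrival
        # order has no effect on which window the event lands in.
        window_start = (event_time // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical stream: the event stamped 4 arrives after the one stamped 12.
arrivals = [(1, "click"), (12, "click"), (7, "click"), (4, "click")]
result = tumbling_window_counts(arrivals, window_size=10)
```

In this sketch the late-arriving event stamped 4 is still counted in the window starting at 0, which is the behavior that event-time processing is meant to guarantee.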


MRQL with Spark and Flink SQL announces new release

Apache MRQL is one of the new breed of SQL-on-Hadoop tools that can run a SQL query using the Apache Flink engine and conveniently switch over to Apache Spark, Apache Hama or plain old MapReduce as required.

Apache MRQL recently announced the release of new version 0.6, which adds new features for incremental query processing. Further, it can now run with newer versions of Apache Flink and, more importantly, in YARN mode. MRQL is created by the same people who are behind the Apache Hama BSP engine and the Apache Horn deep learning engine.

...MRQL (pronounced miracle) is claimed to be powerful enough to express most common data analysis tasks over many forms of raw in-situ data, such as XML and JSON documents, binary files, and CSV documents. MRQL is described as more powerful than other current high-level MapReduce languages, such as Hive and PigLatin, since it can operate on more complex data and supports more powerful query constructs, thus eliminating the need for explicit MapReduce code. With MRQL, users are able to express complex data analysis tasks, such as PageRank, k-means clustering and matrix factorization, using SQL-like queries exclusively, while the MRQL query processing system is able to compile these queries to efficient Java code.


Horton hears a hoot from Wall Street

Within the first month of 2016, Hortonworks shares have seen a steep fall, tanking below $10 for the first time since the company was listed on the stock exchange. More noteworthy is the fact that its share value eroded by around 37% in a single day. However, despite the aberration, the stock is still a recommended “buy” according to most stock analyst firms.

Hortonworks is currently the only fully open source Hadoop distribution and has an impressive market share. Relying mainly on services and support for its revenue, the company may not have been notching up billion-dollar revenues, but it managed a billion-dollar IPO debut in November 2014. Being the first to race to the stock exchange, ahead of fellow Hadoop distribution vendors Cloudera and MapR, it has been raising hopes of a 'Red Hat' in the data management space.

Tuesday’s (19th Jan 2016) crash may have dented market confidence in this Hadoop unicorn. However, Hortonworks may be down for the moment but is definitely not out. For the full year 2015, Hortonworks estimates total revenue of around $120 million, a growth of 129% on a year-on-year basis. While some eyebrows have been raised at the selling of shares by prominent VPs like Shaun Connolly and Greg Pavlik, they each still hold more than 200K shares of the company.

From this point onwards, there seem to be two paths forward for the company if it is to hold its own against competition from the likes of Cloudera and IBM, which have bigger war chests and cash reserves. One is to innovate aggressively to make Hadoop a viable alternative to the database, and not just the data warehouse, so as to attract 100% of enterprise customers. The second is to get merged or acquired, and there are many suitors for that. Options could range from a merger with Pivotal to acquisition by the likes of Teradata or HP, which already have a decent stake in the company. In either case, the open source movement and Hadoop are going to remain intact and will benefit from either move.

Which way Horton sits, however, should be clear only by Q3. And we will keep watching this space.

Databricks has a new CEO and an Executive Chairman

Databricks, one of the main commercial companies behind Apache Spark, has appointed Ali Ghodsi as CEO, while Ion Stoica has moved up to the position of Executive Chairman. This is a significant move considering the meteoric rise of Apache Spark in the last two years.

Databricks as a company remains focused on offering Apache Spark based cloud clusters and an interactive workspace for engineers and data scientists. It is also one of the main promoters of Spark Summit, held twice a year. The majority of the committers in Apache Spark continue to be from Databricks.

Ion Stoica is also a professor in the EECS Department at UC Berkeley and a co-director of the AMPLab. At AMPLab, he is leading two other high-profile projects in the open source community, namely Apache Mesos and Tachyon. Ion is also the CTO of Conviva, a company he co-founded in 2006 for large scale video distribution, and serves on the advisory board of BlueData, another niche big data infrastructure software firm.

With this leadership movement, speculation about an IPO for Databricks has started rising in industry circles. However, besides the IPO possibility, it is likely that Databricks has gained some serious customer traction and the CEO position has become all the more critical. In the new role, it is expected that Ion will not only lead Databricks and the Spark ecosystem but may also be able to spend more time on the wider big data ecosystem projects he is nurturing. Cloudera made a similar move a few years back when Mike Olson moved on to the Chief Strategy Officer position, and we all know what a great influence and leader he has been in that position so far. It will be interesting to watch how Databricks builds on the new leadership paradigm while taking along its vast pool of talent like Zaharia, Xin, Wendell et al.