Taking aim at Cloudera for selectively rewriting queries when it published a SQL on Hadoop benchmark comparison, Emma McGrattan, SVP of Engineering at Actian, said she does not trust benchmarks unless they come from a body like the TPC and are audited.
“In this particular
case that we call as the Impala subset of TPC-DS they identified the queries
where they were going to out-perform Hive, they rewrote them because they have
limited SQL support and then they added the partition keys. So it's a game -
they cheated and I call them on their cheating when I use this presentation
publicly because I think it's wrong.
Everybody is playing
games with these benchmarks and we can use them to demonstrate whatever you
chose. But for us we're using standard SQL. We have clean hands in this and we
also plan something for the first half of next year to publish some audited
benchmark results.”
In a detailed interview published on HadoopSphere, Emma provides a complete picture of the SQL on Hadoop landscape and what the major
players are up to. Claiming that Vector, Actian’s SQL ‘in’ Hadoop offering, is 30
times faster than Cloudera Impala, Emma gives a detailed architectural view of
the product. She goes as far as saying that Cloudera has borrowed quite a
few ideas from Actian’s solutions. “Now I
can’t go into too much detail as to exactly how we've done this because I say a
patent is pending and we do believe when we look at what Cloudera is doing in
Kudu, they have borrowed a number of the ideas - because up until now we have
been talking about this publicly and the Cloudera guys have learned a lot from
development work and research work that we have done at Actian.”
Read the full no-holds-barred interview, published in two parts,
at the following links. You may want to bookmark it and reread it
whenever SQL on Hadoop comes up.