- What's the difference between Splout SQL and Dremel-like solutions such as BigQuery or Impala?
- Splout SQL is not a "fast analytics" Dremel-like engine. It is designed for serving datasets to
high-throughput, low-latency web and mobile applications that perform many individual lookups. In
that sense Splout SQL is closer to a NoSQL database: it is built to answer queries with sub-second
latencies. It is meant for queries that touch a very small subset of the data, not queries that
analyze the whole dataset at once.
- Is it really so fast?
- Splout SQL is as fast as SQLite can be. Because it is a read-only store whose data is replaced
entirely on each deploy, the data is always optimally indexed, there is zero fragmentation, and data
colocation on disk can be controlled by sorting in the Hadoop indexer process (insertionOrderBy). As
an example, we have used data colocation techniques within Splout SQL to obtain < 50ms average query
time with 10 threads on dynamic GROUP BY's that hit an average of 2000 records each, in a
multi-gigabyte database that exceeded available RAM by orders of magnitude, on an m1.small EC2
machine.
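Since each Splout SQL partition is a plain SQLite file, the effect of indexing plus sorted insertion can be sketched with SQLite alone. This is a minimal illustration of the idea, not Splout SQL's indexer; the table and column names are made up:

```python
import sqlite3

# In-memory SQLite database standing in for one Splout SQL partition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (client TEXT, day TEXT, hits INTEGER)")

# Inserting rows pre-sorted by the lookup key mimics insertionOrderBy:
# all rows for one client end up physically contiguous on disk.
rows = sorted(
    ("c%03d" % c, "2013-01-%02d" % d, c * d)
    for c in range(100) for d in range(1, 31)
)
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)

# An index on the key lets a dynamic GROUP BY read only one client's rows.
conn.execute("CREATE INDEX idx_client ON visits(client)")

result = conn.execute(
    "SELECT day, SUM(hits) FROM visits WHERE client = ? GROUP BY day",
    ("c042",),
).fetchall()
print(len(result))  # 30 (one row per day for that client)
```

The query scans roughly 30 of 3000 rows, and because those rows are contiguous, it costs few disk seeks even when the database does not fit in RAM.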
- Can I import data directly from Hive into Splout SQL?
- Yes. Since release 0.2.2 it is possible to integrate Hive directly with Splout SQL, and the same
can be done with Cascading or Pig. Please read the user guide, section "Integration with other
tools".
- I am experiencing slow queries. Why?
- Splout SQL is optimized for indexing data according to custom needs: you can create arbitrary indexes and
colocate data at insertion time to minimize disk seeks. Please read the Troubleshooting section of the user guide.
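Because each partition is a standard SQLite database, you can check whether a query would actually use your indexes with SQLite's `EXPLAIN QUERY PLAN`. A generic SQLite sketch (the table and index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, ts INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_user ON events(user_id)")

# EXPLAIN QUERY PLAN reports whether SQLite scans the whole table
# or searches it through an index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = ?", ("u1",)
).fetchall()
detail = " ".join(str(row) for row in plan)
print(detail)  # mentions idx_user rather than a full table scan
```

If the plan shows a full scan ("SCAN" instead of "SEARCH ... USING INDEX"), the query is not hitting an index, which is the usual cause of slow lookups.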
- Can a DNode use more than one disk for storing data?
- Currently a DNode's working directory is fixed to a single disk location. However, nothing
prevents you from installing two DNode services on the same machine, as long as you configure
everything properly so that, for example, they bind to different ports.
- Can I execute INSERTs / UPDATEs on Splout SQL?
- Nothing will block or prevent you from executing INSERT / UPDATE statements through Splout SQL's
interface, but that is not how it is meant to be used. Splout SQL is conceived as a read-only store
whose data is only updated entirely, by an atomic deploy mechanism: the whole dataset is replaced by
a new version of it. This fits batch processing well, since each run of your batch process (for
example in Hadoop) usually produces a new copy of the whole dataset. It does not fit well if you
want to incrementally update your dataset in real time.
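The atomic-deploy idea can be sketched in a few lines of Python. This is not Splout SQL's actual deploy code, only an illustration of the concept (the file names and helpers are made up): a brand-new SQLite file is built offline and then a pointer is switched atomically, so readers always see either the complete old version or the complete new one.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
pointer = os.path.join(workdir, "CURRENT")  # illustrative version pointer

def deploy(version, rows):
    """Build a brand-new SQLite file, then switch to it atomically."""
    db_path = os.path.join(workdir, "data.v%d.db" % version)
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.executemany("INSERT INTO kv VALUES (?, ?)", rows)
    conn.commit()
    conn.close()
    # os.replace is atomic on POSIX: readers see the old or the new
    # version in full, never a half-written state.
    tmp = pointer + ".tmp"
    with open(tmp, "w") as f:
        f.write(db_path)
    os.replace(tmp, pointer)

def lookup(key):
    with open(pointer) as f:
        conn = sqlite3.connect(f.read())
    return conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()

deploy(1, [("a", "old")])
deploy(2, [("a", "new")])  # whole dataset replaced, never mutated in place
print(lookup("a"))  # ('new',)
```

Each deploy produces a complete new dataset, which is exactly the shape of output a Hadoop batch job gives you; incremental real-time writes have no place in this model.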
- But really, what happens if I INSERT / UPDATE?
- Your statements will be executed on only one of the replicas of the partition that was hit, so you
will end up with inconsistent data across the replicas of that partition. This would only be fine if
you didn't use replication, but in that case you wouldn't have failover, so it's not something you
really want to get into.
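A minimal sketch of why this goes wrong, using two plain SQLite databases in place of real DNode replicas (names are illustrative):

```python
import sqlite3

# Two SQLite databases standing in for two replicas of one partition.
replicas = [sqlite3.connect(":memory:") for _ in range(2)]
for conn in replicas:
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.execute("INSERT INTO kv VALUES ('a', 'deployed')")

# A write sent through the query interface reaches only one replica...
replicas[0].execute("UPDATE kv SET v = 'mutated' WHERE k = 'a'")

# ...so reads, which are load-balanced across replicas, now return
# different answers depending on which replica served the request.
answers = {conn.execute("SELECT v FROM kv WHERE k = 'a'").fetchone()[0]
           for conn in replicas}
print(sorted(answers))  # ['deployed', 'mutated']
```

With a single replica the write would at least be consistent, but the next atomic deploy would silently discard it anyway.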