- Big Data Serving
- Many Big Data problems produce output that is itself Big Data. Splout can serve an
arbitrarily large dataset by partitioning it across machines.
- Many databases, such as NoSQL solutions, can serve Big Data, but they lack a
rich query language like SQL: you generally can't aggregate data in real time the way
you would with a GROUP BY clause. Because not everything can be precomputed, SQL is a very
convenient feature for a Big Data serving solution.
- For Hadoop
- Hadoop is nowadays the de facto open-source solution for Big Data batch processing. When the output
of a Hadoop job is large, there is no satisfying way to serve it.
Think of precomputed recommendations, for example, where the whole dataset may change from one day to
the next. Splout decouples database creation from database serving, making it efficient and
safe to deploy Hadoop-generated datasets. It also integrates seamlessly with common tools
in the ecosystem such as Cascading, Hive and Pig.
- Splout is not a "fast analytics" engine. It is built for demanding web and mobile
applications where query performance is critical: arbitrary real-time aggregations should
complete in under 200 milliseconds, even under high traffic load.
- Splout scales horizontally: adding more machines increases throughput
linearly. Splout coordinates a cluster of machines to provide failover in case of network
partitions or hardware failures.
- Even though Splout is relational, it is very flexible. Because data is deployed
atomically, you can change your data model from one day to the next without pain.
- Splout serves tablespaces. Each tablespace contains one or more
tables, and each table is either partitioned or replicated in every
partition. Using the command-line tools, you can index and deploy any dataset stored in
HDFS, on the local file system or in S3. For fine-tuning the whole process, you can also
use the advanced Java API.
- Splout provides a REST interface that returns JSON results for any SQL query.
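The tablespace model described above (tables either partitioned or replicated in every partition) can be illustrated with a minimal sketch of query routing. It assumes range-based partitioning on a string key; the `Tablespace` class, its fields, the boundary values and the table names are hypothetical illustrations, not Splout's Java API. A query on a partitioned table needs a key to select its single partition, while a replicated table can be answered by any partition.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field


@dataclass
class Tablespace:
    """Hypothetical tablespace model (not Splout's API).

    Partitioned tables are split into key ranges; replicated tables are
    copied in full into every partition.
    """

    # Sorted boundaries between partitions. ["g", "p"] gives three
    # partitions: keys < "g", "g" <= keys < "p", and keys >= "p".
    boundaries: list[str]
    partitioned_tables: set[str] = field(default_factory=set)
    replicated_tables: set[str] = field(default_factory=set)

    @property
    def num_partitions(self) -> int:
        return len(self.boundaries) + 1

    def partition_for(self, key: str) -> int:
        # bisect_right places a key equal to a boundary in the partition
        # that starts at that boundary.
        return bisect.bisect_right(self.boundaries, key)

    def route(self, table: str, key: str | None = None) -> list[int]:
        """Return the partitions that can answer a query on `table`."""
        if table in self.replicated_tables:
            # Every partition holds a full copy, so any of them will do.
            return list(range(self.num_partitions))
        if table in self.partitioned_tables:
            if key is None:
                raise ValueError(f"partitioned table {table!r} needs a key")
            return [self.partition_for(key)]
        raise KeyError(f"unknown table {table!r}")


ts = Tablespace(
    boundaries=["g", "p"],
    partitioned_tables={"pageviews"},
    replicated_tables={"countries"},
)
print(ts.route("pageviews", "mike"))  # [1]
print(ts.route("countries"))          # [0, 1, 2]
```

Range partitioning keeps neighbouring keys in the same partition; a hash of the key would work equally well for this sketch, at the cost of scattering adjacent keys.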
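The real-time GROUP BY aggregation and the JSON output mentioned above can be sketched against a single partition. This assumes the partition is an ordinary SQL database, modelled here as an in-memory SQLite database through Python's standard `sqlite3` module. The `pageviews` table, its rows, the `query_as_json` helper and the JSON shape (a list of column-to-value objects) are hypothetical and do not describe Splout's actual response format.

```python
import json
import sqlite3

# One partition of a hypothetical "pageviews" table, kept in memory here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (user_id TEXT, url TEXT, day TEXT)")
conn.execute("CREATE INDEX idx_pageviews_user ON pageviews (user_id)")
conn.executemany(
    "INSERT INTO pageviews VALUES (?, ?, ?)",
    [
        ("alice", "/home", "2013-01-01"),
        ("alice", "/home", "2013-01-02"),
        ("alice", "/cart", "2013-01-02"),
        ("bob", "/home", "2013-01-01"),
    ],
)

# Aggregation computed at query time instead of being precomputed.
VIEWS_PER_URL = """
    SELECT url, COUNT(*) AS views
    FROM pageviews
    WHERE user_id = ?
    GROUP BY url
    ORDER BY views DESC, url
"""


def query_as_json(conn, sql, params=()):
    """Run `sql` and encode the rows as a JSON list of column->value objects."""
    cur = conn.execute(sql, params)
    columns = [col[0] for col in cur.description]
    return json.dumps([dict(zip(columns, row)) for row in cur.fetchall()])


print(query_as_json(conn, VIEWS_PER_URL, ("alice",)))
# [{"url": "/home", "views": 2}, {"url": "/cart", "views": 1}]
```

The index on `user_id` lets the partition answer the per-user aggregation without scanning every row.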