Next Generation Indico DB
Action Plan
- Technology Survey - Getting information on suitable DBMSs
- Evaluation - Classifying options in terms of pre-defined criteria
- Prototyping
- Creation of small prototypes that implement isolated parts of the Indico schema, using a limited number of DB technologies
- Benchmarking/profiling
- Decision - Evaluating the results of the two previous phases and choosing one of the options (or several, if that's the case)
- Boilerplate creation
  - Creating a set of base classes/modules for use in future versions of Indico
- Migration
  - Gradual (dual-DB)?
  - Immediate?
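One way to picture the gradual (dual-DB) option is a thin wrapper that mirrors every write to both back ends while reads keep coming from the old one until the new one is trusted. The sketch below is only an illustration under that assumption: `DualWriteStore`, its methods and the dict-like stores are hypothetical and not part of Indico or ZODB.

```python
class DualWriteStore:
    """Hypothetical wrapper: writes go to both back ends, reads come from one."""

    def __init__(self, old, new, read_from_new=False):
        self.old = old                  # current store (dict-like stand-in)
        self.new = new                  # candidate store (dict-like stand-in)
        self.read_from_new = read_from_new

    def put(self, key, value):
        # Write to the old (still authoritative) store first, then mirror it.
        self.old[key] = value
        self.new[key] = value

    def get(self, key):
        # Reads are served by whichever store is currently trusted.
        primary = self.new if self.read_from_new else self.old
        return primary[key]

    def backfill(self):
        # Copy records that exist only in the old store; return how many.
        copied = 0
        for key, value in self.old.items():
            if key not in self.new:
                self.new[key] = value
                copied += 1
        return copied
```

Once `backfill()` has run and both stores agree, flipping `read_from_new` switches reads over without touching callers. An immediate migration would skip the wrapper and convert everything in one step.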
Selection Criteria
Technical
- Software Availability - Unless we want to jeopardize Indico's position as a widely-available Open Source solution, the chosen DB product should be equally accessible to the general public. This does, of course, discourage the use of a commercial product, unless compatibility with a similar free/open alternative is ensured.
- Scalability - It shouldn't be hard to scale the infrastructure, be it vertically (better machines) or horizontally (more machines).
- Replication - Data should be replicated across several machines, for fault-tolerance and/or load-balancing purposes (scalability).
- Ease of use/development
  - Is the DB in question easy to develop with?
  - How good are the supporting tools?
Data-related
- Transactions - This is going to be a tough point. Most NoSQL solutions choose not to implement transactions, something we currently heavily rely on.
- Complex Schema - Indico has a quite complex schema. The selected platform should have:
- Indexing
- Support for complex queries
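To make the transaction requirement concrete, here is a minimal sketch of the kind of all-or-nothing, multi-record update that is hard to reproduce on a store without transactions. SQLite (from the Python standard library) is used purely as a stand-in, and the `sessions`/`contributions` tables and the function names are hypothetical, not Indico's actual schema.

```python
import sqlite3


def make_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sessions (id INTEGER PRIMARY KEY, n_contribs INTEGER NOT NULL);
        CREATE TABLE contributions (id INTEGER PRIMARY KEY, session_id INTEGER NOT NULL);
        INSERT INTO sessions VALUES (1, 1);
        INSERT INTO sessions VALUES (2, 0);
        INSERT INTO contributions VALUES (10, 1);
    """)
    return conn


def move_contribution(conn, contrib_id, src, dst):
    """Move a contribution between sessions; either every update applies or none."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE sessions SET n_contribs = n_contribs - 1 WHERE id = ?",
                (src,))
            cur = conn.execute(
                "UPDATE contributions SET session_id = ? "
                "WHERE id = ? AND session_id = ?",
                (dst, contrib_id, src))
            if cur.rowcount != 1:
                raise LookupError("contribution not found in source session")
            conn.execute(
                "UPDATE sessions SET n_contribs = n_contribs + 1 WHERE id = ?",
                (dst,))
        return True
    except LookupError:
        return False
```

If the second update fails, the counter decrement from the first one is rolled back as well. With only single-document or row-level atomicity, the application would have to detect and repair such half-applied changes itself.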
Project-related
- Project/Community Momentum
  - Is the user community large?
  - What about the ecosystem? Is it growing?
  - Is it probable that it will stay around?
  - Where in the hype curve are we?
- Cloud-friendliness - This is not a mandatory requirement, but it would be good to ensure that the system can be deployed across a broad range of cloud providers.
Cost-related
- (Possible) Future costs - Is it probable that we will need paid support in the future?
- Migration costs
  - Will it be easy to move from ZODB to this technology?
    - No, it won't. But will it be very hard or extremely hard?
- Way out - Can we easily change to something else in case we realize we made the wrong choice?
- CERN - How would the solution fit CERN's infrastructure and/or existing services?
Possibilities
Name | Type | License | Community Size |
---|---|---|---|
ZODB | OODBMS | Zope Public License | Small |
Wakanda | OODBMS | GPLv3 / AGPLv3 / Commercial | |
MariaDB | RDBMS | GPLv2 | Large |
Microsoft SQL Server | RDBMS | Commercial | Large |
MySQL | RDBMS | GPLv2 / EULA | Large |
Oracle Database | RDBMS | Commercial | Large |
PostgreSQL | ORDBMS | PostgreSQL License | Large |
VoltDB | RDBMS | AGPLv3 | Small |
Amazon DynamoDB | NoSQL / Key-value | Commercial | |
FoundationDB | NoSQL / Key-value | Commercial | |
LevelDB | NoSQL / Key-value | BSD New | Small |
MemcacheDB | NoSQL / Key-value | BSD | |
Oracle NoSQL | NoSQL / Key-value | AGPLv3 / Commercial | |
Redis | NoSQL / Key-value | BSD | Medium |
Riak | NoSQL / Key-value | Apache License 2.0 | |
Tarantool | NoSQL / Key-value | BSD | Small |
Voldemort | NoSQL / Key-value | Apache License 2.0 | Small |
CouchBase | NoSQL / Document | Commercial / Community edition | Small |
CouchDB | NoSQL / Document | Apache License 2.0 | Medium |
MongoDB | NoSQL / Document | AGPLv3 | Large |
OrientDB | NoSQL / Document | Apache License 2.0 | |
RavenDB | NoSQL / Document | AGPLv3 | |
RethinkDB | NoSQL / Document | AGPLv3 | Small |
Terrastore | NoSQL / Document | Apache License 2.0 | |
Accumulo | NoSQL / Column | Apache License 2.0 | |
Amazon SimpleDB | NoSQL / Column | Commercial | |
Cassandra | NoSQL / Column | Apache License 2.0 | |
HBase | NoSQL / Column | Apache License 2.0 | |
Hypertable | NoSQL / Column | GPLv2 | |
FlockDB | NoSQL / Graph | Apache License 2.0 | |
Infinite Graph | NoSQL / Graph | EULA / Commercial | |
InfoGrid | NoSQL / Graph | AGPLv1 | Small |
Neo4J | NoSQL / Graph | GPLv3 / AGPLv3 / commercial | |
OrientDB | NoSQL / Graph | Apache License 2.0 | |
Technical details
Name | Consistency | Atomicity | Scalability |
---|---|---|---|
ZODB | Strong | Transactions | RelStorage, NEOPPOD, ZRS (commercial) |
MySQL | Strong | Transactions | Auto-sharding, Master-Slave replication |
PostgreSQL | Strong | Transactions | Master-Slave replication, App-level sharding |
MongoDB | Strong | Single doc modification, see below | Auto-sharding, Master-Slave, Replica sets (Primary-Secondary) |
Cassandra | Eventual | Atomic batches (since 1.2), otherwise row-level | "Auto-sharding", Peer-to-peer replication |
Redis | Strong | Transactions, no rollback | Master-slave replication, App-level sharding |
Neo4J | Strong | Transactions | Master-slave replication |
Riak | Eventual | None | "Auto-sharding", Peer-to-peer replication |
CouchDB / CouchBase | Eventual | Single doc modification, see below | Master-slave replication, App-level sharding (CouchDB), Auto-sharding (CouchBase) |
HBase | Strong | Row-level | Master-slave replication, Auto-sharding |
Voldemort | Eventual | Key-level | "Auto-sharding", Peer-to-peer replication |
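"App-level sharding" in the table above means that the application itself, rather than the database, decides which node holds a given record. A minimal sketch of that idea follows, assuming simple hash-based routing; `ShardRouter` and the shard names are hypothetical and do not correspond to any of the listed products' APIs.

```python
import hashlib


class ShardRouter:
    """Hypothetical router: maps a record key to one of several DB nodes."""

    def __init__(self, shard_names):
        if not shard_names:
            raise ValueError("at least one shard is required")
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # Use a stable hash (not Python's per-process hash()) so that every
        # application server routes the same key to the same shard.
        digest = hashlib.sha1(str(key).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shard_names)
        return self.shard_names[index]


router = ShardRouter(["db0", "db1", "db2"])
target = router.shard_for("event:42")  # name of the node that stores this key
```

Plain modulo routing reassigns most keys when a shard is added or removed; consistent hashing is the usual way to limit that, and "auto-sharding" products take care of the rebalancing themselves.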
CouchDB
Disadvantages
- Requires packing ("compaction");
Redis
Advantages
- Overall very good as non-authoritative storage
- Includes several useful data structures (ordered sets, hash tables)
- Easy to use, simple API
- The data set is kept in memory at all times, and can optionally be persisted to disk
Disadvantages
- The data set has to fit in memory;
- Despite providing some nice data structures, it is still a key-value storage;
Links
- Official docs
- Others
ZODB
Replication
- ZRS - Zope Corporation has recently released it as open source software. It provides primary/secondary replication.
Riak
Advantages
- Highly distributable
- Supports several backends (Bitcask, LevelDB, Memory, ...)
- The "links" feature allows many-to-many relationships to be easily established and queried
Disadvantages
- Ideal for large-scale distributed clusters, maybe not what we are looking for (overkill?)
- It's still a key-value storage, and simpler solutions exist
Things to be considered
- Erlang
Links
- Official docs
- Others
PostgreSQL
Advantages
Links
Technical Information
- Eric Brewer - "Towards Robust Distributed Systems"
- Great explanation of Eventual Consistency and CAP
- Developing polyglot persistence applications
- Your Coffee Shop Doesn’t Use Two-Phase Commit
- Polyglot Persistence (Martin Fowler)
MongoDB
- StackOverflow questions:
CouchDB
PostgreSQL
- Sharding IDs at Instagram
- Scaling out PostgreSQL
- Database tools by Skype
- Sharding your Database
Cassandra
Comparisons
- Riak vs. Others (Cassandra, CouchBase, CouchDB, HBase, MongoDB, Neo4j, DynamoDB)
- Cassandra vs. Redis
- NoSQL use cases
- RDBMS vs Neo4j in PageRank and Shortest Path
- Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Couchbase vs Neo4j vs Hypertable vs ElasticSearch vs Accumulo vs VoltDB vs Scalaris comparison
Love/Hate?
Last modified on 10/07/13 10:47:05