
Next Generation Indico DB

Action Plan

  1. Technology Survey - Getting information on suitable DBMSs
  2. Evaluation - Classifying options in terms of pre-defined criteria
  3. Prototyping
    • Creation of small prototypes that implement isolated parts of the Indico schema, using a limited number of DB technologies
    • Benchmarking/profiling
  4. Decision - evaluating results from the previous two phases and choosing one of the options (or several, if appropriate)
  5. Boilerplate creation
    • Creating a set of base classes/modules for use in future versions of Indico
  6. Migration
    • Gradual (dual-DB)?
    • Immediate?
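
The gradual (dual-DB) option could be sketched roughly as below. This is only an illustration under stated assumptions, not Indico code: `DualWriteStore` is a hypothetical name, and plain dicts stand in for the current ZODB storage and the candidate DB. Writes go to both stores; reads prefer the new store and fall back to the legacy one, back-filling as they go.

```python
class DualWriteStore:
    """Hypothetical wrapper for a gradual migration: write to both
    backends, read from the new one and fall back to the legacy one."""

    def __init__(self, legacy, new):
        self.legacy = legacy  # stands in for the current ZODB-backed storage
        self.new = new        # stands in for the candidate DB

    def put(self, key, value):
        # Write to the legacy store first, so it stays authoritative
        self.legacy[key] = value
        self.new[key] = value

    def get(self, key):
        if key in self.new:
            return self.new[key]
        # Not migrated yet: read from legacy and back-fill the new store
        value = self.legacy[key]
        self.new[key] = value
        return value


legacy_db = {"event:1": "Old conference"}
new_db = {}
store = DualWriteStore(legacy_db, new_db)
store.put("event:2", "New conference")
first = store.get("event:1")  # served from legacy, then copied to new_db
```

An immediate migration would skip the wrapper entirely and convert all data in one pass, at the cost of a longer downtime window.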

Selection Criteria

Technical

  • Software Availability - Unless we want to jeopardize Indico's position as a widely-available Open Source solution, the chosen DB product should be equally accessible to the general public. This does, of course, discourage the use of a commercial product, unless compatibility with a similar free/open alternative is ensured.
  • Scalability - It shouldn't be hard to scale the infrastructure, be it vertically (better machines) or horizontally (more machines/nodes).
  • Replication - Data should be replicated across several machines, for fault-tolerance and/or load-balancing purposes (scalability).
  • Ease of use/development
    • Is the DB in question easy to develop with?
    • How are the support tools?
  • Transactions - This is going to be a tough point. Most NoSQL solutions choose not to implement transactions, something we currently rely on heavily.
  • Complex Schema - Indico has quite a complex schema. The selected platform should have:
    • Indexing
    • Support for complex queries
  • Project/Community Momentum
    • Is the user community large?
    • What about the ecosystem? Is it growing?
    • Is it probable that it will stay around?
    • Where in the hype curve are we?
  • Cloud-friendliness - This is not a mandatory requirement, but it would be good to ensure that the system can be deployed across a broad range of cloud providers.
  • (Possible) Future costs - Is it probable that we will need paid support in the future?
  • Migration costs
    • Will it be easy to move from ZODB to this technology?
    • No, it won't. But will it be very hard or extremely hard?
  • Way out - Can we easily change to something else in case we realize we made the wrong choice?
  • CERN - How would the solution fit CERN's infrastructure and/or existing services?
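
One possible way to classify options against the criteria above is a weighted scoring matrix. The sketch below is an assumption about how the evaluation phase might be carried out, not something the plan prescribes: the criterion keys, the weights, the 0-5 scale and the two candidates are all hypothetical placeholders rather than evaluation results.

```python
# Hypothetical weights (higher = more important); not taken from the plan.
WEIGHTS = {
    "availability": 3,
    "scalability": 2,
    "transactions": 3,
    "migration_cost": 2,
}


def weighted_score(scores, weights=WEIGHTS):
    """Return the weighted average of per-criterion scores (0-5 scale)."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores.get(c, 0) for c in weights) / total_weight


def rank(candidates):
    """Sort candidate names by descending weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name]),
        reverse=True,
    )


# Placeholder scores for two made-up options
candidates = {
    "option_a": {"availability": 5, "scalability": 3,
                 "transactions": 5, "migration_cost": 2},
    "option_b": {"availability": 5, "scalability": 5,
                 "transactions": 1, "migration_cost": 3},
}
ranking = rank(candidates)
```

Criteria that are hard requirements (e.g. software availability) may be better handled as a filter applied before scoring than as a weight.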

Possibilities

Name                 | Type              | License                        | Community Size
---------------------|-------------------|--------------------------------|---------------
ZODB                 | OODBMS            | Zope Public License            | Small
Wakanda              | OODBMS            | GPLv3 / AGPLv3 / Commercial    |
MariaDB              | RDBMS             | GPLv2                          | Large
Microsoft SQL Server | RDBMS             | Commercial                     | Large
MySQL                | RDBMS             | GPLv2 / EULA                   | Large
Oracle Database      | RDBMS             | Commercial                     | Large
PostgreSQL           | ORDBMS            | PostgreSQL License             | Large
VoltDB               | RDBMS             | AGPLv3                         | Small
Amazon DynamoDB      | NoSQL / Key-value | Commercial                     |
FoundationDB         | NoSQL / Key-value | Commercial                     |
LevelDB              | NoSQL / Key-value | New BSD                        | Small
MemcacheDB           | NoSQL / Key-value | BSD                            |
Oracle NoSQL         | NoSQL / Key-value | AGPLv3 / Commercial            |
Redis                | NoSQL / Key-value | BSD                            | Medium
Riak                 | NoSQL / Key-value | Apache License 2.0             |
Tarantool            | NoSQL / Key-value | BSD                            | Small
Voldemort            | NoSQL / Key-value | Apache License 2.0             | Small
CouchBase            | NoSQL / Document  | Commercial / Community edition | Small
CouchDB              | NoSQL / Document  | Apache License 2.0             | Medium
MongoDB              | NoSQL / Document  | AGPLv3                         | Large
OrientDB             | NoSQL / Document  | Apache License 2.0             |
RavenDB              | NoSQL / Document  | AGPLv3                         |
RethinkDB            | NoSQL / Document  | AGPLv3                         | Small
Terrastore           | NoSQL / Document  | Apache License 2.0             |
Accumulo             | NoSQL / Column    | Apache License 2.0             |
Amazon SimpleDB      | NoSQL / Column    | Commercial                     |
Cassandra            | NoSQL / Column    | Apache License 2.0             |
HBase                | NoSQL / Column    | Apache License 2.0             |
Hypertable           | NoSQL / Column    | GPLv2                          |
FlockDB              | NoSQL / Graph     | Apache License 2.0             |
Infinite Graph       | NoSQL / Graph     | EULA / Commercial              |
InfoGrid             | NoSQL / Graph     | AGPLv1                         | Small
Neo4J                | NoSQL / Graph     | GPLv3 / AGPLv3 / Commercial    |
OrientDB             | NoSQL / Graph     | Apache License 2.0             |

Technical details

Name                | Consistency | Atomicity                                       | Scalability
--------------------|-------------|-------------------------------------------------|------------
ZODB                | Strong      | Transactions                                    | RelStorage, NEOPPOD, ZRS (commercial)
MySQL               | Strong      | Transactions                                    | Auto-sharding, Master-slave replication
PostgreSQL          | Strong      | Transactions                                    | Master-slave replication, App-level sharding
MongoDB             | Strong      | Single doc modification, see below              | Auto-sharding, Master-slave, Replica sets (Primary-Secondary)
Cassandra           | Eventual    | Atomic batches (since 1.2), otherwise row-level | "Auto-sharding", Peer-to-peer replication
Redis               | Strong      | Transactions, no rollback                       | Master-slave replication, App-level sharding
Neo4J               | Strong      | Transactions                                    | Master-slave replication
Riak                | Eventual    | None                                            | "Auto-sharding", Peer-to-peer replication
CouchDB / CouchBase | Eventual    | Single doc modification, see below              | Master-slave replication, App-level sharding (CouchDB), Auto-sharding (CouchBase)
HBase               | Strong      | Row-level                                       | Master-slave replication, Auto-sharding
Voldemort           | Eventual    | Key-level                                       | "Auto-sharding", Peer-to-peer replication
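
For the entries listing "App-level sharding", the application itself has to decide which node holds a given key. A minimal sketch of what that might look like, assuming simple hash-modulo routing; `ShardRouter` and the node names are hypothetical and not part of any of the products listed:

```python
import hashlib


class ShardRouter:
    """Hypothetical app-level sharding: the application picks the node."""

    def __init__(self, shards):
        self.shards = list(shards)

    def shard_for(self, key):
        # Use a stable hash (not Python's per-process hash()) so that
        # routing is identical across processes and restarts
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]


router = ShardRouter(["db-node-0", "db-node-1", "db-node-2"])
node = router.shard_for("event:1234")  # always the same node for this key
```

With modulo routing, adding or removing a node remaps most keys, so data has to be moved around. Systems that advertise auto-sharding take care of this rebalancing themselves, which is the main thing app-level sharding leaves to the developer.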

CouchDB

Disadvantages

  • Requires packing ("compaction")

Redis

Advantages

  • Overall very good as non-authoritative storage
  • Includes several useful data structures (ordered sets, hash tables)
  • Easy to use, simple API
  • The data set is kept in memory at all times, and can optionally be persisted to disk

Disadvantages

  • The data set has to fit in memory
  • Despite providing some nice data structures, it is still a key-value store
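
Using Redis as non-authoritative storage usually means a cache-aside arrangement: the main DB stays the source of truth, and the cache can be emptied at any time without data loss. The sketch below illustrates the idea only; `CachedReader` is a hypothetical name, and a plain dict stands in for the Redis client so that the example is self-contained.

```python
class CachedReader:
    """Cache-aside sketch: the cache is non-authoritative and may be
    emptied at any time without losing data."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # a dict here; a Redis client in practice
        self.load_from_db = load_from_db  # callable that reads the authoritative DB
        self.db_hits = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: go to the authoritative store and remember the result
        self.db_hits += 1
        value = self.load_from_db(key)
        self.cache[key] = value
        return value

    def invalidate(self, key):
        # Call after the authoritative copy changes
        self.cache.pop(key, None)


authoritative = {"event:1": "Some title"}
reader = CachedReader({}, authoritative.__getitem__)
a = reader.get("event:1")  # miss: loaded from the authoritative store
b = reader.get("event:1")  # hit: served from the cache
```

Since everything kept in Redis has to fit in memory, only data that is cheap to rebuild and frequently read is a good fit for this role.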

ZODB

Replication

  • ZRS - Zope Corporation has recently released it as open source software. It provides primary/secondary replication.

Riak

Advantages

  • Highly distributable
  • Supports several backends (Bitcask, LevelDB, Memory, ...)
  • The "links" feature allows many-to-many relationships to be easily established and queried

Disadvantages

  • Ideal for large-scale distributed clusters, maybe not what we are looking for (overkill?)
  • It's still a key-value store, and simpler solutions exist

Things to be considered

  • Erlang

Links

PostgreSQL

Advantages

Links

Technical Information

MongoDB

CouchDB

PostgreSQL

Cassandra

Comparisons

Love/Hate?

Last modified on 10/07/13 10:47:05