= Next Generation Indico DB = == Action Plan == 1. Tecnology Survey - Getting information on suitable DBMSs 2. Evaluation - Classifying options in terms of pre-defined criteria 3. Prototyping - Creation of small prototypes that implement isolated parts of the Indico schema, using a limited number of DB technologies - Benchmarking/profiling 4. Decision - evaluating results from previous 2 phases and choosing one of the options (or several, if that's the case) 5. Boilerplate creation - Creating a set of base classes/modules for use in future versions of Indico 6. Migration - Gradual (dual-DB)? - Immediate? == Selection Criteria == === Technical === * **Software Availability** - Unless we want to jeopardize Indico's position as a widely-available Open Source solution, the chosen DB product should be equally accessible to the general public. This does, of course, discourage the use of a commercial product, unless compatibility with a similar free/open alternative is ensured. * **Scalability** - It shouldn't be hard to scale up the infrastructure (more nodes), be it vertically (better machines) or horizontally (more machines). * **Replication** - Data should be replicated across several machines, for fault-tolerance and/or load-balancing purposes (scalability); * **Ease of use/development** - Is the DB in question easy to develop with? - How are the support tools? ==== Data-related ==== * **Transactions** - This is going to be a tough point. Most NoSQL solutions choose not to implement transactions, something we currently heavily rely on. * **Complex Schema** - Indico has a quite complex schema. The selected platform should have: - **Indexing** - **Support for complex queries** === Project-related === * **!Project/Community Momentum** - Is the user community large? - What about the ecosystem? Is it growing? - Is it probable that it will stay around? - Where in the hype curve are we? * **Cloud-friendliness** - This is not a mandatory requirement, but ensuring that the system can be deployed across a broad range of cloud providers would be good; === Cost-related === * **(Possible) Future costs** - Is it probable that we will need paid support in the future? * **[[Dev/FutureDB/MigrationCosts|Migration costs]]** - Will it be easy to move from ZODB to this technology? - No, it won't. But will it be very hard or extremely hard? * **Way out** - Can we easily change to something else in case we realize we made the wrong choice? * **CERN** - How would the solution fit CERN's infrastructure and/or existing services? == Possibilities == ||= Name =||= Type =||= License =||= Community Size =|| || ZODB || OODBMS || Zope Public License || Small || || Wakanda || OODBMS || GPLv3 / AGPLv3 / Commercial || || || || || || || || MariaDB || RDBMS || GPLv2 || Large || || Microsoft SQL Server || RDBMS || Commercial || Large || || MySQL || RDBMS || GPLv2 / EULA || Large || || Oracle Database || RDBMS || Commercial || Large || || PostgresSQL || ORDBMS || PostgresSQL License || Large || || VoltDB || RDBMS || AGPLv3 || Small || || || || || || || Amazon DynamoDB || NoSQL / Key-value || Commercial || || || FoundationDB || NoSQL / Key-value || Commercial || || || LevelDB || NoSQL / Key-value || BSD New || Small || || MemcacheDB || NoSQL / Key-value || BSD || || || Oracle NoSQL || NoSQL / Key-value || AGPLv3 / Commercial || || || Redis || NoSQL / Key-value || BSD || Medium || || Riak || NoSQL / Key-value || Apache License 2.0 || || || Tarantool || NoSQL / Key-value || BSD || Small || || Voldemort || NoSQL / Key-value || Apache License 2.0 || Small || || || || || || || CouchBase || NoSQL / Document || Commercial / Community edition || Small || || CouchDB || NoSQL / Document || Apache License 2.0 || Medium || || MongoDB || NoSQL / Document || AGPLv3 || Large || || OrientDB || NoSQL / Document || Apache License 2.0 || || || RavenDB || NoSQL / Document || AGPLv3 || || || RethinkDB || NoSQL / Document || AGPLv3 || Small || || Terrastore || NoSQL / Document || Apache License 2.0 || || || || || || || || Accumulo || NoSQL / Column || Apache License 2.0 || || || Amazon SimpleDB || NoSQL / Column || Commercial || || || Cassandra || NoSQL / Column || Apache License 2.0 || || || HBase || NoSQL / Column || Apache License 2.0 || || || Hypertable || NoSQL / Column || GPLv2 || || || || || || || || FlockDB || NoSQL / Graph || Apache License 2.0 || || || Infinite Graph || NoSQL / Graph || EULA / Commercial || || || InfoGrid || NoSQL / Graph || AGPLv1 || Small || || Neo4J || NoSQL / Graph || GPLv3 / AGPLv3 / commercial || || || OrientDB || NoSQL / Graph || Apache License 2.0 || || == Technical details == ||= Name =||= Consistency =||= Atomicity =||= Scalability =|| || ZODB || Strong || Transactions || RelStorage, NEOPPOD, ZRS (commercial) || || MySQL || Strong || Transactions || Auto-sharding, Master-Slave replication || || PostgresSQL || Strong || Transactions || Master-Slave replication, App-level sharding || || MongoDB || Strong || Single doc modification, see below || Auto-sharding, Master-Slave, Replica sets (Primary-Secondary) || || Cassandra || Eventual || Atomic batches (since 1.2), otherwise row-level || "Auto-sharding", Peer-to-peer replication || || Redis || Strong || Transactions, no rollback || Master-slave replication, App-level sharding || || Neo4J || Strong || Transactions || Master-slave replication || || Riak || Eventual || None || "Auto-sharding", Peer-to-peer replication || || CouchDB / !CouchBase || Eventual || Single doc modification, see below || Master-slave replication, App-level sharding (CouchDB), Auto-sharding (!CouchBase) || || HBase || Strong || Row-level || Master-slave replication, Auto-sharding || || Voldemort || Eventual || Key-level || "Auto-sharding", Peer-to-peer replication || == CouchDB == ==== Disadvantages ==== * Requires packing ("compaction"); == Redis == === Advantages === * Overall very good as non-authoritative storage * Includes several useful data structures (ordered sets, hash tables) * Easy to use, simple API * The data set is all the time in memory, and can optionally be made persistent === Disadvantages === * The data set has to fit in memory; * Despite providing some nice data structures, it is still a key-value storage; === Links === * Official docs - [[http://redis.io/topics/persistence | Redis Persistence]] - [[http://redis.io/topics/replication | Replication]] * Others - [[http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value-pairs#_=_ | Storing hundreds of millions of simple key-value pairs in Redis]] - [[http://blog.gomiso.com/2011/05/24/how-redis-can-ruin-your-day-and-what-you-can-do-to-fix-it/ | How redis can ruin your day, and what you can do to fix it]] == ZODB == === Replication === * [[https://pypi.python.org/pypi/zc.zrs/|ZRS]] - The Zeo Foundation has recently released it as open source software. It provides primary/secondary replication. == Riak == === Advantages === * Highly distributable * Supports several backends (Bitcask, LevelDB, Memory, ...) * The "links" feature allows many-to-many relationships to be easily established an queried === Disadvantages === * Ideal for large-scale distributed clusters, maybe not what we are looking for (overkill?) * It's still a key-value storage, and simpler solutions exist === Things to be considered === * Erlang === Links === * Official docs - [[http://basho.com/link-walking-by-example/ | Riak link walking by example]] * Others - [[http://www.slideshare.net/seancribbs/introduction-to-riak-red-dirt-ruby-conf-training | Introduction to Riak]] - [[http://vimeo.com/21598799 | Riak and Scala at Yammer]] - [[http://www.slideshare.net/jmuellerleile/scaling-with-riak-at-showyou | Scaling with Riak at Showyou]] == PostgreSQL == === Advantages === * [[http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.2#JSON_datatype | JSON support]] == Links == === Technical Information === * [[http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf|Eric Brewer - "Towards Robust Distributed Systems"]] * Great explanation of [[http://guide.couchdb.org/draft/consistency.html | Eventual Consistency and CAP]] * [[http://www.slideshare.net/chris.e.richardson/developing-polyglot-persistence-applications-devnexus-2013 | Developing polyglot persistence applications]] * [[http://www.eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf | Your Coffee Shop Doesn’t Use Two-Phase Commit]] * [[http://martinfowler.com/bliki/PolyglotPersistence.html | Polyglot Persistence]] (Martin Fowler) ==== MongoDB ==== * [[http://docs.mongodb.org/manual/tutorial/isolate-sequence-of-operations/ | Isolate Sequence of Operations]] * [[http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/ | Perform Two-phase Commits]] * [[http://www.infoq.com/presentations/MongoDB-at-SourceForge | MongoDB at Sourceforge]] * !StackOverflow questions: - [[http://stackoverflow.com/questions/9588207/mongodb-atomicity-concerns-modifying-a-document-in-memory | MongoDB Atomicity Concerns — Modifying a document in memory]] - [[http://stackoverflow.com/questions/7149890/what-does-mongodb-not-being-acid-compliant-really-mean | What does MongoDB not being ACID compliant really mean?]] ==== CouchDB ==== * [[http://stackoverflow.com/questions/299723/can-i-do-transactions-and-locks-in-couchdb | Transactions and Locks in CouchDB]] ==== PostgreSQL ==== * [[http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram | Sharding IDs at Instagram]] * [[http://petrohi.me/post/30848036722/scaling-out-postgres-partitioning| Scaling out PostgreSQL]] * [[http://www.slideshare.net/adorepump/database-tools-by-skype | Database tools by Skype]] * [[http://www.craigkerstiens.com/2012/11/30/sharding-your-database/ | Sharding your Database]] ==== Cassandra ==== * [[http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency | Introduction to Cassandra Replication and Consistency]] === Comparisons === * [[http://docs.basho.com/riak/1.2.0/references/appendices/comparisons/ | Riak vs. Others]] (Cassandra, !CouchBase, CouchDB, HBase, MongoDB, Neo4j, DynamoDB) * [[http://www.slideshare.net/tim.lossen.de/cassandra-vs-redis | Cassandra vs. Redis]] * [[http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html | NoSQL use cases]] * [[http://istc-bigdata.org/index.php/benchmarking-graph-databases/ | RDBMS vs Neo4j in PageRank and Shortest Path]] * [[http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis | Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Couchbase vs Neo4j vs Hypertable vs ElasticSearch vs Accumulo vs VoltDB vs Scalaris comparison]] === Love/Hate === * [[http://www.peterbe.com/plog/zodb-and-couchdb | CouchDB and ZODB]] * [[http://www.plope.com/Members/chrism/why_i_like_zodb | Why I like ZODB]] * http://www.slideshare.net/thobe/choosing-the-right-nosql-database * http://www.slideshare.net/VeryFatBoy/nosql-roadshow-amsterdam-2012-15414012