SSTable problem
We had a problem with DataStax Cassandra sstables.
Short description: Cassandra sstables broke at random (mostly due to heavy load). Once an sstable broke, we could not repair it with the provided commands.
Environment:
- VMware Workstation 10 / VMware Player 6
- Guest: Ubuntu 12.04 64-bit
- CPU: 2 virtual CPUs (host: Intel i5-2500K, 3.3 GHz)
- Memory: 6 GB
- Disk: 10 GB virtual disk on a solid-state drive (usage never exceeded 7 GB)
- Networking: VMware NAT
- No additional (unnecessary) virtual hardware
- DataStax Cassandra 2.0.6 and 2.0.8 (installed from the package repository)
- JDK/JRE: OpenJDK 1.7 64-bit (the default)
Reproduction: write to Cassandra from the host using 128-256 threads (from a Java program using the cassandra-driver-core driver). This drives the CPU to full load while the SSD load stays small (<10%). With 16 threads the CPU load was 70-80%, but the problem also occurred with a single thread, or even when inserting a single row. If you cannot reproduce the problem, try 16 threads and kill the client program during the import.
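The write pattern described above can be sketched as a fixed thread pool in which each worker inserts its own range of rows. This is a minimal sketch, not the program we used: `LoadSketch`, `RowWriter` and `run` are hypothetical names, and `RowWriter.write` stands in for the actual insert the client issues through cassandra-driver-core. It needs Java 8 or later and only the standard library.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class LoadSketch {

    /** Hypothetical stand-in for the real insert issued via cassandra-driver-core. */
    interface RowWriter {
        void write(long rowId) throws Exception;
    }

    /**
     * Starts `threads` workers, each writing `rowsPerThread` distinct row ids,
     * and returns how many writes finished without throwing.
     */
    static long run(int threads, int rowsPerThread, RowWriter writer) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong succeeded = new AtomicLong();
        for (int t = 0; t < threads; t++) {
            final long firstId = (long) t * rowsPerThread; // each worker gets its own id range
            pool.submit(() -> {
                for (int i = 0; i < rowsPerThread; i++) {
                    long id = firstId + i;
                    try {
                        writer.write(id);
                        succeeded.incrementAndGet();
                    } catch (Exception e) {
                        // A real client would log the driver error (e.g. the consistency error)
                        System.err.println("write " + id + " failed: " + e.getMessage());
                    }
                }
            });
        }
        pool.shutdown(); // accept no new tasks; already submitted ones keep running
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return succeeded.get();
    }

    public static void main(String[] args) {
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : 16;
        // Placeholder writer that does nothing; swap in a real Cassandra insert.
        long done = run(threads, 1000, id -> { });
        System.out.println(done + " of " + (threads * 1000L) + " writes succeeded");
    }
}
```

To mimic the "kill the client during the import" case, plug in a real writer, start it with 16 threads and terminate the JVM (for example with Ctrl+C) while it is still writing.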
Behaviour: certain rows break, and the driver returns a consistency error (0 hosts replied but 1 was required). The error appeared only when we queried the broken records. The server's error log contained a message similar to the following, always including "unable to seek to position":
ERROR 15:49:54,789 Error in ThreadPoolExecutor
java.lang.IllegalArgumentException: unable to seek to position 6774 in /root/import/exportdir/2/KEYSPACE/ColFam1/KEYSPACE-ColFam1-ic-3-Data.db (6523 bytes) in read-only mode
at org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:306)
...
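The numbers in the message are worth reading closely: the reader was asked to seek to offset 6774 in a Data.db file that is only 6523 bytes long, 251 bytes past its end. One plausible reading (an assumption, the log does not say so) is that the position recorded for the requested row points beyond the data actually present in the file. The hypothetical Java helper below (`SeekErrorCheck`, `parse`, `SeekFailure` are illustrative names) extracts the requested offset, the file path and the file length from such log lines, so every affected sstable can be listed from the server log. The regular expression assumes the exact message wording shown in the excerpt; other Cassandra versions may phrase it differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeekErrorCheck {

    /** One failed seek: requested offset, sstable path, and actual file length. */
    static final class SeekFailure {
        final long position;
        final String path;
        final long fileLength;

        SeekFailure(long position, String path, long fileLength) {
            this.position = position;
            this.path = path;
            this.fileLength = fileLength;
        }

        /** How many bytes past the end of the file the requested offset lies. */
        long overrun() {
            return position - fileLength;
        }
    }

    // Assumes the message format seen in the log excerpt.
    private static final Pattern SEEK = Pattern.compile(
            "unable to seek to position (\\d+) in (\\S+) \\((\\d+) bytes\\)");

    /** Returns the parsed failure, or empty if the line is not a seek error. */
    static Optional<SeekFailure> parse(String logLine) {
        Matcher m = SEEK.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new SeekFailure(
                Long.parseLong(m.group(1)), m.group(2), Long.parseLong(m.group(3))));
    }

    /** Reads log lines from stdin and reports every failed seek. */
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            parse(line).ifPresent(f -> System.out.printf(
                    "%s: offset %d is %d bytes past the end of a %d-byte file%n",
                    f.path, f.position, f.overrun(), f.fileLength));
        }
    }
}
```

Piping the server log into it (`java SeekErrorCheck < path/to/system.log`) prints one line per failed seek, which makes it easy to see whether the same sstable keeps failing.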
Resolution: the only fix we found was to reinstall Cassandra (uninstall, remove the PID file and other leftover files, reboot, reinstall, reconfigure).
The problem also occurred after suspending and resuming the VM (though it occurred without suspending as well). Once a record broke, we could not repair or delete it.
Ágnes Barta, Lajos Cseppentő, Noémi Szilvásy