
Understanding JVM Garbage Collection – Part 4

JVM GC Jargon: The jargon used to describe GC algorithms can get confusing. This section describes commonly used terms and their meanings. Live set: the number of live objects in a heap. The live set determines the amount of work and the GC pause times for the garbage collector. Modern GCs are able to de-link […]


Understanding JVM Garbage Collection – Part 3

Weak Generational Hypothesis: The weak generational hypothesis has had a profound impact on the layout of the JVM’s heap. Understanding it is essential to understanding the various GC algorithms and approaches. The weak generational hypothesis states that most objects die young. In other words, most objects created will be garbage collected very quickly. […]


Understanding JVM Garbage Collection – Part 2

GC Mark, Sweep, and Compact Basics: The mark and sweep algorithm is the basis for garbage collection in Java. Although the actual algorithms used by the JVM are considerably more complex, they build on mark and sweep, so it must be understood first. As you might have guessed, there […]


Understanding JVM Garbage Collection – Part 1

Java is a popular language for business and enterprise computing. Java and the JVM have enjoyed enormous success over the last two decades. Java was initially considered a slow language in the late 1990s. That changed with the introduction of the HotSpot virtual machine in April 1999. HotSpot improved performance via “just in time” compilation and adaptive optimisation. Since […]


CAP Theorem

The CAP theorem is a tool used to make system designers aware of trade-offs while designing networked shared-data systems. CAP has influenced the design of many distributed data systems. It made designers aware of a wide range of trade-offs to consider while designing distributed data systems. Over the years the CAP theorem has been widely […]


Reasons for an Unbalanced Cassandra Cluster

Sometimes an Apache Cassandra cluster can end up in an unbalanced state. An unbalanced state is one where data is unevenly distributed across a cluster or across locally configured data directories. There are a number of reasons this can happen. In this blog post, I will cover two basic reasons this might happen. A cluster can end up […]


Understanding an Apache Cassandra Memtable Flush

A recent question on the Apache Cassandra mailing list triggered this blog post. The question revolved around the events that trigger a memtable flush. Understanding the root cause of a memtable flush is essential to understanding Apache Cassandra better. Another question that frequently crops up is the size of an SSTable as a result of […]


Cassandra Query Language (CQL) Tutorial

Apache Cassandra and the Cassandra Query Language (CQL) have evolved over the past couple of years. Key improvements include: significant storage engine improvements; the introduction of SSTable Attached Secondary Indexes (SASI); materialized views; and simple role-based authentication. This post is an update to “A Practical Introduction to Cassandra Query Language”. The tutorial will concentrate […]


Apache Cassandra Data Modelling Principles

Designing an effective data model is imperative to scale any application. This is true in both the relational and non-relational worlds. Data modelling in Apache Cassandra is no different from data modelling in a relational database in this respect. It requires a good understanding of the target domain and usage patterns within the domain. A deep understanding of your target […]
