by Sarang Nagmote
Category - Databases
More Information & Updates Available at: http://http://insightanalytics.co.in
A bitmap (also called bit array, bit vector, etc.) is the data structure that immediately pops in your head when theres a need to map boolean information for a huge domain into a compact representation. It is a very popular data structure whenever memory space is at a premium: OS kernels (memory pages, inodes, etc.), digital imaging, etc.
Redis, being an in-memory data structure server, provides support for bit manipulation operations. However, there isn’t a special data structure for Bitmaps in Redis. Rather, bit level operations are supported on the basic Redis structure: Strings. Now, the maximum length for Redis strings is 512 MB. Thus, the largest domain that Redis can map as a Bitmap is 232 (512 MB = 229 bytes = 232 bits).
Bit-related operations in Redis are of two kinds: Constant time (O(1)) e.g. operations to get/set of a particular bit and operations that are O(N) which basically operate on a group of bits. N in these cases is the length of bits that the operation will need to work on. Let’s look at some examples.
Commands
# SETBIT key offset value: Store bit value value at offset offset for key. O(1)# Returns the original bit value stored at that offset.127.0.0.1:6379> setbit k 10 1(integer) 0# GETBIT key offset: Fetch value of offset in key. O(1)127.0.0.1:6379> getbit k 10(integer) 1127.0.0.1:6379> getbit k 11(integer) 0127.0.0.1:6379> getbit k 0(integer) 0127.0.0.1:6379> setbit k 9 1(integer) 0127.0.0.1:6379> setbit k 8 1(integer) 0# And since it is still a generic String, heres a get.127.0.0.1:6379> get k"à"# "à" -> "0000 0000 111"# BITCOUNT key [start end]: Number of set bits in the range. O(N)# IMPORTANT: start & end are bytes not bits127.0.0.1:6379> bitcount k(integer) 3127.0.0.1:6379> set m "meow"OK# meow -> 01101101 01100101 01101111 01110111 127.0.0.1:6379> bitcount m(integer) 21# BITPOS key bit [start] [end]: 1st position of 1 or 0 in the key in range. O(N)127.0.0.1:6379> SET mykey "ÿð"OK127.0.0.1:6379> BITPOS mykey 0(integer) 12
Besides their operators that work on the key itself, the BITOPoperator is used for bit-wise logical operations between multiple keys.
# BITOP operation destkey key [key ...]. O(N)# operation can be AND, OR, XOR and NOT127.0.0.1:6379> set a "ÿÿ"OK127.0.0.1:6379> bitop not nota a(integer) 2127.0.0.1:6379> get nota""127.0.0.1:6379> set b ""OK127.0.0.1:6379> set c "ðð"OK127.0.0.1:6379> BITOP OR orbc b c(integer) 2127.0.0.1:6379> get orbc"ÿÿ"127.0.0.1:6379> BITOP AND andbc b c(integer) 2127.0.0.1:6379> get andbc""127.0.0.1:6379> BITOP XOR xorbc b c(integer) 2127.0.0.1:6379> get xorbc"ÿÿ"
Internals
Since bitmap operations don’t have a data structure of their own, there isn’t a special data structure to describe. The Redis strings themselves are implemented as a binary safe string. Redis string data structure is internally called Simple Dynamic String (SDS). It is essentially a native char [] with some additional book keeping information. Implementation details can be found here.
The implementation of the bitmap functions is in the file bitops.c.
P.S. Given the importance of bit manipulation algorithms for critical OS and graphics functionality, most architectures provide special instructions for such operations. A good place to read up on various interesting computer arithmetic algorithms is the timeless classic Hacker’s Delight.
Applications
This popular GetSpool blog is a great example of the use of bitmap for real-time analytics over a large data set. It is also an example of the classic use case of a bitmap: to store boolean information of an extremely large domain into a (relatively) small space while retaining decent performance.
Size is usually a concern for really large bitmaps since most useful operations over it are O(N). In order to avoid working with huge keys, Redis documentation recommends splitting huge keys into multiple smaller ones. BITCOUNT performance remains acceptable until the key gets big. At that point, the recommendation is to either split the keys or use the range arguments to query incrementally. The recommendation to deal with slow BITOP operations is to run it on slaves. So, in general, it makes sense to deal with moderately sized keys and plan for future potential growth by splitting keys into multiple keys.
Redis Sets vs. Redis Bitmap
The nature of functionality provided by Redis Sets and the bitmap operations is similar. So it’s often confusing which of the two approaches are better. Well, it really depends on the exact nature of the use case. Obviously, this discussion is valid only for the kind of operations that both sets and bitmap can achieve.
Redis Sets are in general efficient and scale well and should be the data structure of choice until it’s size becomes untenable. Redis Sets are also easier to manage, program and debug would work well for most applications. The ease of use of Sets should not be underestimated: Code that manipulates bitmaps is usually hard to read, understand, debug and maintain. Even when the domain is very large, sets might still be appropriate. For e.g. if an application is meant to track daily visits to popular an e-commerce site, the results might still fit in well into a set as usually just 5-10% of the entire user base will visit the site daily. This obviously changes for a site where 60% of the entire user base is expected to log in daily. Then bitmaps become more relevant given the size and performance of logical bit-wise operations over a large number of keys. Redis Sets also have the distinct advantage of not having to map IDs to bit offsets. Similarly, if your values are from a domain larger than 232 then Redis Sets are must easier to use than figuring out mechanisms to split the domain for bitmap.
Analytics for an MOOC
Here’s a concocted (but real enough!) example for a place where one might apply Redis bitmap operations. Say, you are running a very popular online MOOC to which hundreds of thousands of students have enrolled. The academic team facilitating the course wants a dashboard where they can see real-time status of the progress of students as well as the general background of enrolled students. You decide to implement this via Redis bitmap operations. Here’s a step-by-step approach to it.
- Create a plan to map between student ID to bitmap offset. It could be as simple as the ID being the offset in the bitmap.
- Create and populate various demographic related bitmaps once the course begins. For e.g. students enrolled in other MOOCs from the same university, education level, gender etc.
- Now as the course progresses, you can create new bitmaps to record course progress. For e.g. students who completed watching all lectures of week 1, students who completed all assignments of week 1. etc.
- Now, creating real-time analytics based on these keys would be a very simple exercise and can be done on a drag and drop UI. For e.g
- A professor who wants to see how many students who watched the lectures for week 1 (A) but did not complete the assignment for week 1 (B): Operator: BITOP. Operation: A AND (NOT B).
- A student who completed all the assignments for week 1 (A), week 2 (B), week 3 (C) and week 4 (D): Operator: BITOP. Operation A AND B AND C AND D. Say, these are the people who passed the course.
- All male students (M) who passed the course (as calculated above, say, P): Operator: BITOP. Operation: M AND P.
- A number of students who passed the course: BITCOUNT P.
Similarly, any number of interesting cohorts can be setup as bitmaps and such permutations run on them.
Footnote
Other posts in our Redis data structures series:
No comments:
Post a Comment