<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

        <title>Babar K. Zafar</title>
        <link href="http://zafar.se/atom.xml" rel="self" />
        <link href="http://zafar.se/" />
        <id>zafar.se</id>
        <updated>Sun, 06 Jan 2013 23:20:16 GMT</updated>
        <author>
                <name>Babar Zafar</name>
                <email>babar.zafar@gmail.com</email>
        </author>

                <entry>
                <title>Think Different</title>
                <link href="http://zafar.se/think different.html" />
                <updated>Thu, 06 Oct 2011 01:01:00 GMT</updated>
                <id>http://zafar.se/think different.html</id>
                <content type="html">&lt;blockquote&gt;
  &lt;p&gt;Here&apos;s to the crazy ones. &lt;/p&gt;
  
  &lt;p&gt;The misfits. The rebels. The troublemakers. &lt;/p&gt;
  
  &lt;p&gt;The round pegs in the square holes. &lt;/p&gt;
  
  &lt;p&gt;The ones who see things differently. &lt;/p&gt;
  
  &lt;p&gt;They&apos;re not fond of rules. &lt;/p&gt;
  
  &lt;p&gt;And they have no respect for the status quo. &lt;/p&gt;
  
  &lt;p&gt;You can quote them, disagree with them, glorify or vilify them. &lt;/p&gt;
  
  &lt;p&gt;About the only thing you can&apos;t do is ignore them. &lt;/p&gt;
  
  &lt;p&gt;Because they change things. &lt;/p&gt;
  
  &lt;p&gt;They push the human race forward. &lt;/p&gt;
  
  &lt;p&gt;And while some may see them as the crazy ones, we see genius. &lt;/p&gt;
  
  &lt;p&gt;Because the people who are crazy enough to think they can change the world, are the ones who do. &lt;/p&gt;
&lt;/blockquote&gt;
</content>
        </entry>
        <entry>
                <title>Product Development</title>
                <link href="http://zafar.se/product-development.html" />
                <updated>Tue, 20 Sep 2011 01:01:00 GMT</updated>
                <id>http://zafar.se/product-development.html</id>
                <content type="html">&lt;blockquote&gt;
  &lt;p&gt;&quot;There are two ways to build a product. The first: a company starts with their strengths and builds to the needs of the consumer. The second: a company starts with the needs of the consumer and builds (into) the strengths of the company.&quot; -- Jeff Bezos&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We did neither and so we failed: &lt;a href=&quot;http://curictus.se&quot;&gt;http://curictus.se&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;R.I.P Curictus AB.&lt;/p&gt;
</content>
        </entry>
        <entry>
                <title>Share Dropbox</title>
                <link href="http://zafar.se/share-dropbox.html" />
                <updated>Sat, 10 Sep 2011 01:01:00 GMT</updated>
                <id>http://zafar.se/share-dropbox.html</id>
                <content type="html">&lt;p&gt;So, I finally got sick and tired of manually typing in paths to files in shared
Dropbox folders. Even worse is when you have to manually click your way through
a deep hierarchy to find a specific document.&lt;/p&gt;

&lt;p&gt;Why can&apos;t you securely share a link with people who have access to the shared
folder?&lt;/p&gt;

&lt;p&gt;Well, now you can.&lt;/p&gt;

&lt;p&gt;Visit &lt;a href=&quot;http://www.sharedropbox.com&quot;&gt;http://www.sharedropbox.com&lt;/a&gt; for more information and to download
Windows/Mac extensions for Dropbox which provide this functionality.&lt;/p&gt;

&lt;p&gt;The project is &lt;a href=&quot;http://github.com/bkz/dropbox-uri&quot;&gt;open-source&lt;/a&gt; and feedback
is always welcome.&lt;/p&gt;

&lt;p&gt;BTW, please let people at Dropbox know that this is a highly useful feature,
especially for people using their Dropbox for Teams product.&lt;/p&gt;
</content>
        </entry>
        <entry>
                <title>EC2 vs S3 for storing compressed data</title>
                <link href="http://zafar.se/ec2-vs-s3-cost.html" />
                <updated>Thu, 25 Aug 2011 01:01:00 GMT</updated>
                <id>http://zafar.se/ec2-vs-s3-cost.html</id>
                <content type="html">&lt;p&gt;&lt;em&gt;Problem&lt;/em&gt;: Compress logfiles on EC2 and backup them to S3.&lt;/p&gt;

&lt;p&gt;Should you trade CPU time (high compression rate) for S3 storage costs?&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# 720 GB stored for 6 months
#
# cmdline      cpu (s/MB)  size (%)    time (h)    ec2 ($)     s3 ($)      total ($)
# -----------------------------------------------------------------------------------
# raw          0.000013    100.000000  0.002560    0.000973    604.800000  604.800973
# gzip -1      0.015814    14.600000   3.238656    1.230689    88.300800   89.531489
# gzip -5      0.023426    12.100000   4.797696    1.823124    73.180800   75.003924
# gzip -9      0.032528    11.300000   6.661632    2.531420    68.342400   70.873820
# bzip2 -1     0.191470    11.200000   39.213056   14.900961   67.737600   82.638561
# bzip2 -5     0.246373    8.700000    50.457088   19.173693   52.617600   71.791293
# bzip2 -9     0.303029    7.900000    62.060288   23.582909   47.779200   71.362109
# pbzip2 -1    0.105028    11.200000   21.509632   8.173660    67.737600   75.911260
# pbzip2 -5    0.113369    8.900000    23.217920   8.822810    53.827200   62.650010
# pbzip2 -9    0.141710    7.900000    29.022208   11.028439   47.779200   58.807639
# lzma -1      0.060559    11.100000   12.402432   4.712924    67.132800   71.845724
# lzma -5      0.759580    5.600000    155.561984  59.113554   33.868800   92.982354
# lzma -9      1.692910    3.500000    346.707968  131.749028  21.168000   152.917028
# lzop -1      0.002903    20.900000   0.594432    0.225884    126.403200  126.629084
# lzop -5      0.003011    20.700000   0.616704    0.234348    125.193600  125.427948
# lzop -9      0.086239    14.400000   17.661696   6.711444    87.091200   93.802644
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Surprisingly, &lt;code&gt;gzip -9&lt;/code&gt; is about as cost-effective as &lt;code&gt;bzip2 -9&lt;/code&gt;, even
though &lt;code&gt;bzip2 -9&lt;/code&gt; takes almost 10x as long for the same operation.&lt;/p&gt;

&lt;p&gt;The table above was calculated for a standard &lt;code&gt;large&lt;/code&gt; EC2 instance ($0.38/h)
storing to S3 in the first non-reduced tier ($0.14/GB/month).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# ----
# S3
# ----
#                      Normal          Reduced
# First  1 TB / month  $0.140 per GB   $0.093 per GB
# Next  49 TB / month  $0.125 per GB   $0.083 per GB
# Next 450 TB / month  $0.110 per GB   $0.073 per GB
# Next 500 TB / month  $0.095 per GB   $0.063 per GB
#
# ----
# EC2
# ----
#
# Large                 $0.38 per hour
# Extra Large           $0.76 per hour
#
# Hi-Memory On-Demand Instances
#
# Extra Large           $0.57 per hour
# Double Extra Large    $1.14 per hour
# Quadruple Extra Large $2.28 per hour
#
# Hi-CPU On-Demand Instances
#
# Medium                $0.19 per hour
# Extra Large           $0.76 per hour
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The code below was used to calculate the results. It&apos;s pretty interesting to plug
in values to simulate various scenarios (thanks
&lt;a href=&quot;https://twitter.com/#!/iconara&quot;&gt;@theo&lt;/a&gt; for the initial measurements):&lt;/p&gt;

&lt;pre class=&quot;prettyprint python&quot;&gt;
&lt;code&gt;
MEASUREMENT_SAMPLE_SIZE_MB = 800
MEASUREMENTS = [
    # setup        cpu s     ratio %
    # ------------------------------
    (&quot;raw&quot;   ,     0.01,     100.0),
    (&quot;gzip -1&quot;   , 12.651,   14.6),
    (&quot;gzip -5&quot;   , 18.741,   12.1),
    (&quot;gzip -9&quot;   , 26.022,   11.3),
    (&quot;bzip2 -1&quot;  , 153.176,  11.2),
    (&quot;bzip2 -5&quot;  , 197.098,  8.7),
    (&quot;bzip2 -9&quot;  , 242.423,  7.9),
    (&quot;pbzip2 -1&quot; , 84.022,   11.2),
    (&quot;pbzip2 -5&quot; , 90.695,   8.9),
    (&quot;pbzip2 -9&quot; , 113.368,  7.9),
    (&quot;lzma -1&quot;   , 48.447,   11.1),
    (&quot;lzma -5&quot;   , 607.664,  5.6),
    (&quot;lzma -9&quot;   , 1354.328, 3.5),
    (&quot;lzop -1&quot;   , 2.322,    20.9),
    (&quot;lzop -5&quot;   , 2.409,    20.7),
    (&quot;lzop -9&quot;   , 68.991,   14.4)
]

EC2_COST_SEC = 0.38  / 3600.0

S3_MB_COST_PER_MONTH = 0.140 / 1024.0

DATA_SIZE_MB = 30*24*1024 # 1 month of data @ 1 GB/h

STORED_FOR_MONTHS = 6

# Single-string print() calls produce the same output on Python 2 and Python 3.
print(&quot;%d GB stored for %d months&quot; % (DATA_SIZE_MB // 1024, STORED_FOR_MONTHS))
print(&quot;cmdline&quot;.ljust(12) + &quot; &quot; + &quot;&quot;.join([
        h.ljust(12) for h in [&quot;cpu (s/MB)&quot;,
                              &quot;size (%)&quot;,
                              &quot;time (h)&quot;,
                              &quot;ec2 ($)&quot;,
                              &quot;s3 ($)&quot;,
                              &quot;total ($)&quot;]]))

for (cmdline, cpu_time, compress_rate) in MEASUREMENTS:
    # CPU seconds per MB, scaled down from the 800 MB measurement sample.
    cpu_time = cpu_time / float(MEASUREMENT_SAMPLE_SIZE_MB)
    compress_time = (DATA_SIZE_MB * cpu_time)
    # EC2 is paid once for compressing, S3 is paid every month for the compressed size.
    ec2_cost = EC2_COST_SEC * compress_time
    s3_cost = (DATA_SIZE_MB * compress_rate / 100.0) * S3_MB_COST_PER_MONTH * STORED_FOR_MONTHS
    print(cmdline.ljust(12) + &quot; &quot; + &quot;&quot;.join([
            (&quot;%.6f&quot; % v).ljust(12)
            for v in [cpu_time,
                      compress_rate,
                      compress_time / 3600.0,
                      ec2_cost,
                      s3_cost,
                      ec2_cost+s3_cost]]))
&lt;/code&gt;
&lt;/pre&gt;
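
&lt;p&gt;One scenario worth simulating is how long the data has to stay in S3 before a
slower, denser compressor pays for its extra CPU time. The sketch below reuses the
constants and measurements from the script above. The helper names
&lt;code&gt;costs&lt;/code&gt; and &lt;code&gt;break_even_months&lt;/code&gt; are made up for this illustration, and it
assumes the same flat first-tier S3 price for the whole period:&lt;/p&gt;

&lt;pre class=&quot;prettyprint python&quot;&gt;
&lt;code&gt;
MEASUREMENT_SAMPLE_SIZE_MB = 800
EC2_COST_SEC = 0.38 / 3600.0
S3_MB_COST_PER_MONTH = 0.140 / 1024.0
DATA_SIZE_MB = 30*24*1024 # 1 month of data @ 1 GB/h

# (cpu s, ratio %) pairs copied from MEASUREMENTS.
GZIP_9 = (26.022, 11.3)
BZIP2_9 = (242.423, 7.9)
LZMA_5 = (607.664, 5.6)


def costs(sample_cpu_seconds, ratio_percent):
    # Returns (one-off EC2 cost, S3 cost per month) in dollars.
    compress_time = DATA_SIZE_MB * sample_cpu_seconds / float(MEASUREMENT_SAMPLE_SIZE_MB)
    ec2_cost = EC2_COST_SEC * compress_time
    s3_monthly = (DATA_SIZE_MB * ratio_percent / 100.0) * S3_MB_COST_PER_MONTH
    return ec2_cost, s3_monthly


def break_even_months(fast, slow):
    # Months of storage after which `slow` becomes the cheaper choice.
    # Assumes `slow` costs more CPU time but compresses better than `fast`.
    fast_ec2, fast_s3 = costs(*fast)
    slow_ec2, slow_s3 = costs(*slow)
    return (slow_ec2 - fast_ec2) / (fast_s3 - slow_s3)
&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;With the numbers above, &lt;code&gt;break_even_months(GZIP_9, BZIP2_9)&lt;/code&gt; comes out at
roughly 6.1 months, which is why the two end up so close in the 6 month table, and
&lt;code&gt;lzma -5&lt;/code&gt; only overtakes &lt;code&gt;gzip -9&lt;/code&gt; after roughly 9.8 months.&lt;/p&gt;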
</content>
        </entry>
        <entry>
                <title>Git Notes</title>
                <link href="http://zafar.se/git-notes.html" />
                <updated>Mon, 22 Aug 2011 01:01:00 GMT</updated>
                <id>http://zafar.se/git-notes.html</id>
                <content type="html">&lt;p&gt;Don’t be another one of those &lt;em&gt;lame programmers&lt;/em&gt; who’ll do anything but read
the documentation. You don’t need books, tutorials, screencasts or conference
tickets.&lt;/p&gt;

&lt;p&gt;Here are the magic steps required to learn Git:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Git.&lt;/li&gt;
&lt;li&gt;Read the online Git &lt;a href=&quot;http://www.kernel.org/pub/software/scm/git/docs/user-manual.html&quot;&gt;manual&lt;/a&gt; and/or the &lt;code&gt;man&lt;/code&gt; pages.&lt;/li&gt;
&lt;li&gt;Experiment.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let me condense it some more: &lt;strong&gt;read the manual&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s it. You don’t need anything else.&lt;/p&gt;

&lt;p&gt;Start with the included &lt;a href=&quot;http://www.kernel.org/pub/software/scm/git/docs/gittutorial.html&quot;&gt;tutorial&lt;/a&gt;, move on to the &lt;a href=&quot;http://www.kernel.org/pub/software/scm/git/docs/everyday.html&quot;&gt;everyday usage&lt;/a&gt; guide,
peek &lt;a href=&quot;http://www.kernel.org/pub/software/scm/git/docs/gittutorial-2.html&quot;&gt;under the hood&lt;/a&gt; and finally use the &lt;a href=&quot;http://www.kernel.org/pub/software/scm/git/docs/user-manual.html&quot;&gt;manual&lt;/a&gt; on a day-to-day basis.&lt;/p&gt;

&lt;p&gt;Avoid the urge to have opinions and be productive instead.&lt;/p&gt;
</content>
        </entry>
        <entry>
                <title>Social Jukebox Experiment</title>
                <link href="http://zafar.se/social-jukebox.html" />
                <updated>Sun, 21 Aug 2011 01:01:00 GMT</updated>
                <id>http://zafar.se/social-jukebox.html</id>
                <content type="html">&lt;p&gt;&lt;strong&gt;Note: work in progress (offline)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Try it out: &lt;a href=&quot;http://bkzserver.dyndns.org:8000/&quot;&gt;link&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Download: &lt;a href=&quot;https://github.com/bkz/social-jukebox&quot;&gt;https://github.com/bkz/social-jukebox&lt;/a&gt;&lt;/p&gt;

&lt;div&gt;
&lt;a href=&quot;jukebox.jpg&quot;&gt;&lt;img src=&quot;jukebox.jpg&quot; width=&quot;420&quot;&gt;&lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;A shared musical experience, everybody is the DJ, adding and
skipping songs on the go.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The jukebox will resort to playing classic Prince hits should it ever run
out of songs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Jukebox automatically pauses playback when there&apos;s no one listening.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Music is streamed as 192 kbit/s MP3, quality should be fine.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Technical&lt;/h3&gt;

&lt;p&gt;The backend currently compresses the audio stream as MP3 which limits browser
support to Chrome/IE9/Safari. Firefox only supports Ogg/Vorbis which isn&apos;t
suitable for our needs (requires us to transmit metadata frames manually since
the format isn&apos;t designed for streaming).&lt;/p&gt;

&lt;p&gt;Ogg/Vorbis is also not supported by Flash which in practice is the best audio
streaming technology available across all browsers and platforms.&lt;/p&gt;

&lt;p&gt;HTML5 audio support is completely broken for our needs, don&apos;t even bother. For
the adventurous amongst you who want to give it a go anyway, the MP3 stream
is only a right click away :)&lt;/p&gt;
</content>
        </entry>
        <entry>
                <title>Bloom Filter Notes</title>
                <link href="http://zafar.se/bloom-filter-notes.html" />
                <updated>Mon, 08 Aug 2011 01:01:00 GMT</updated>
                <id>http://zafar.se/bloom-filter-notes.html</id>
                <content type="html">&lt;p&gt;For the basics read up on the &lt;a href=&quot;http://en.wikipedia.org/wiki/Bloom_filter&quot;&gt;Wikipedia article&lt;/a&gt; and the &lt;a href=&quot;http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf&quot;&gt;Scalable Bloom
Filters&lt;/a&gt; paper. Make sure you understand the false positive rate &lt;code&gt;p&lt;/code&gt; and the
relationship to &lt;code&gt;m&lt;/code&gt; (bits) and &lt;code&gt;k&lt;/code&gt; (hashes/key).&lt;/p&gt;

&lt;p&gt;Hints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The capacity &lt;code&gt;n&lt;/code&gt; is reached when around 50% of bits are set.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Computing the &lt;code&gt;k&lt;/code&gt; hashes can be optimized to near constant time, so it&apos;s better to try to
minimize &lt;code&gt;m/n&lt;/code&gt; - especially if you are sending filters over the network or dealing with
huge datasets. Don&apos;t blindly set an expected &lt;code&gt;p&lt;/code&gt;, you might be able to shave
off a bit or so per key by simply relaxing &lt;code&gt;p&lt;/code&gt; by a small amount.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If using a third-party bitvector implementation as storage for your own
custom solution, verify that it actually clears out the bits on allocation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use separate slices of the bitvector (i.e. &lt;code&gt;m&lt;/code&gt; is divided into &lt;code&gt;k&lt;/code&gt; slices)
to get better distribution and predictable &quot;linear scan&quot; lookup patterns.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build a growable bloom filter by combining Q filters: start out with Q1 and
add keys until you reach full capacity, then add Q2 and use it until you need
to add Q3 and so on. To check for a key you simply look up the key in Q1..QN.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
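
&lt;p&gt;The last hint can be sketched in a few lines. This is only an illustration:
the class name &lt;code&gt;GrowableFilter&lt;/code&gt; and its &lt;code&gt;make_filter&lt;/code&gt; argument are made up
here, and any object with &lt;code&gt;add()&lt;/code&gt; and &lt;code&gt;in&lt;/code&gt; support can serve as the
underlying filter:&lt;/p&gt;

&lt;div&gt;&lt;pre class=&quot;prettyprint&quot;&gt;
class GrowableFilter(object):
    # make_filter() must return a fresh, empty filter sized for `capacity`
    # keys. Both names are assumptions made for this sketch.

    def __init__(self, make_filter, capacity):
        self.make_filter = make_filter
        self.capacity = capacity
        self.filters = [make_filter()]  # Q1
        self.count = 0                  # keys added to the newest filter

    def add(self, key):
        if self.count == self.capacity:
            # The newest filter is full: leave it alone and start the next one.
            self.filters.append(self.make_filter())
            self.count = 0
        self.filters[-1].add(key)
        self.count += 1

    def __contains__(self, key):
        # A key may live in any of Q1..QN, so all of them are checked.
        return any(key in f for f in self.filters)
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Using a plain &lt;code&gt;set&lt;/code&gt; as a stand-in for a real filter, &lt;code&gt;GrowableFilter(set, 3)&lt;/code&gt;
holds three filters after seven keys have been added. Keep in mind that every extra
filter adds its own false positives to the total, which is why the Scalable Bloom
Filters paper tightens &lt;code&gt;p&lt;/code&gt; for each new filter.&lt;/p&gt;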

&lt;p&gt;Common calculations:&lt;/p&gt;

&lt;div&gt;&lt;pre class=&quot;prettyprint&quot;&gt;
from math import ceil, log


def optimal_bloomfilter(n, p):
    &quot;&quot;&quot;
    Calculate optimal m (vector bits) and k (hash count) for a bloom filter
    with capacity `n` and an expected false positive rate of `p`. Returns a
    tuple (m, k).
    &quot;&quot;&quot;
    n, p = float(n), float(p)
    m = -1 * (n * log(p)) / pow(log(2), 2)
    k = (m / n) * log(2)
    return int(ceil(m)), int(ceil(k))


def bloomfilter_capacity(m, p):
    &quot;&quot;&quot;
    Calculate max capacity n for a bloom filter with size `m` bits and expected
    `p` false positive rate. Returns a tuple (n, k).
    &quot;&quot;&quot;
    n = m * (pow(log(2), 2) / abs(log(p)))
    k = log(1/p, 2)
    return (int(round(n)), int(ceil(k)))
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For a fixed &lt;code&gt;p&lt;/code&gt; the size &lt;code&gt;m&lt;/code&gt; scales linearly with &lt;code&gt;n&lt;/code&gt; (i.e. &lt;code&gt;m/n&lt;/code&gt; depends only on &lt;code&gt;p&lt;/code&gt;) so you can
use the following table as a quick reference guide:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;|--------------|--------------|--------------|--------------|--------------|
|   capacity   |    max p     |   mem (kB)   |   bits/key   |   hash/key   |
|--------------|--------------|--------------|--------------|--------------|
|        1000  |         0.1  |        0.59  |        4.79  |           4  |
|        1000  |        0.05  |        0.76  |        6.24  |           5  |
|        1000  |        0.01  |        1.17  |        9.59  |           7  |
|        1000  |       0.005  |        1.35  |       11.03  |           8  |
|        1000  |       0.001  |        1.76  |       14.38  |          10  |
|        1000  |      0.0005  |        1.93  |       15.82  |          11  |
|        1000  |      0.0001  |        2.34  |       19.17  |          14  |
|--------------|--------------|--------------|--------------|--------------|
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here is another table which optimizes (i.e. minimizes) &lt;code&gt;m/n&lt;/code&gt; (bits/key) and &lt;code&gt;k&lt;/code&gt;
hashes since we have to deal with whole bits in the real world:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;|--------------|--------------|--------------|--------------|
|    max p     |  expected p  |   bits/key   |   hash/key   |
|--------------|--------------|--------------|--------------|
|       0.100  |      0.0918  |           5  |           3  |
|       0.050  |      0.0423  |           7  |           3  |
|       0.010  |      0.0094  |          10  |           5  |
|       0.005  |      0.0046  |          12  |           5  |
|       0.001  |      0.0009  |          15  |           8  |
|      0.0005  |      0.0005  |          16  |          10  |
|      0.0001  |      0.0001  |          20  |          10  |
|--------------|--------------|--------------|--------------|
&lt;/code&gt;&lt;/pre&gt;
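
&lt;p&gt;The rows above can be reproduced with a small search, assuming the usual
approximation &lt;code&gt;p = (1 - e^(-k*n/m))^k&lt;/code&gt; from the Wikipedia article. The function
names are made up for this sketch, and &lt;code&gt;max_p&lt;/code&gt; is expected to lie strictly
between 0 and 1:&lt;/p&gt;

&lt;div&gt;&lt;pre class=&quot;prettyprint&quot;&gt;
from math import exp


def expected_p(bits_per_key, k):
    # False positive rate for a whole number of bits/key (m/n) and k hashes.
    return pow(1 - exp(-float(k) / bits_per_key), k)


def min_bits_and_hashes(max_p):
    # Find the smallest whole bits/key that can reach max_p, then the fewest
    # hashes that still stay at or below it.
    # Returns a tuple (bits/key, hash/key, expected p).
    bits_per_key = 1
    while True:
        for k in range(1, bits_per_key + 1):
            p = expected_p(bits_per_key, k)
            if p &lt;= max_p:
                return bits_per_key, k, p
        bits_per_key += 1
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;min_bits_and_hashes(0.01)&lt;/code&gt; gives 10 bits/key and 5 hashes with an
expected &lt;code&gt;p&lt;/code&gt; of about 0.0094, matching the third row.&lt;/p&gt;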

&lt;p&gt;There are a lot of implementations floating around on the net, some perfectly
fine but most seem to break down on two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Hash algorithm &lt;code&gt;h&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Not verifying the false positive rate &lt;code&gt;p&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Dealing with hashing is pretty straightforward: don&apos;t bother with
cryptographic hashing (SHA/MD5) or CRC32 - simply find a &lt;a href=&quot;http://code.google.com/p/smhasher/&quot;&gt;Murmur&lt;/a&gt; hash
implementation for your platform/language combination and you&apos;re set. Next step
is to read the &lt;a href=&quot;http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/esa06.pdf&quot;&gt;Less Hashing, Same Performance: Building a Better Bloom
Filter&lt;/a&gt; paper and understand how you can turn computing the &lt;code&gt;k&lt;/code&gt; hashes into
an almost constant time operation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;g_i(x) = (h1(x) + i * h2(x)) mod m

h1 = murmur_hash(key, 0)
h2 = murmur_hash(key, h1)
for i in 1..k:
    g(i) = (h1 + i * h2) % bits_per_slice
&lt;/code&gt;&lt;/pre&gt;
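
&lt;p&gt;Here is a minimal runnable version of the same idea combined with the slicing
hint from earlier. Murmur isn&apos;t part of the Python standard library, so this sketch
splits a single MD5 digest into two 64-bit halves purely as a stand-in for
&lt;code&gt;h1&lt;/code&gt; and &lt;code&gt;h2&lt;/code&gt;. The advice above still stands: use Murmur in real code. The
function name &lt;code&gt;slice_indexes&lt;/code&gt; is made up for this example:&lt;/p&gt;

&lt;div&gt;&lt;pre class=&quot;prettyprint&quot;&gt;
import hashlib


def slice_indexes(key, k, bits_per_slice):
    # `key` must be a byte string. MD5 is only a stand-in for Murmur here:
    # the low 64 bits of the digest act as h1 and the high 64 bits as h2.
    digest = int(hashlib.md5(key).hexdigest(), 16)
    h1 = digest % 2**64
    h2 = digest // 2**64
    # Slice i (1..k) owns the bits starting at (i - 1) * bits_per_slice, so
    # every key sets exactly one bit in each slice.
    return [(i - 1) * bits_per_slice + (h1 + i * h2) % bits_per_slice
            for i in range(1, k + 1)]
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Only one digest is computed per key no matter how large &lt;code&gt;k&lt;/code&gt; gets; the remaining
work is a handful of additions and modulo operations.&lt;/p&gt;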

&lt;p&gt;The second issue, verifying that the implementation you&apos;re intending to use
actually holds up to the expected &lt;code&gt;p&lt;/code&gt;, is &lt;strong&gt;extremely important&lt;/strong&gt;. If you&apos;re
using a library which doesn&apos;t contain a test suite for checking the false
positive rate you &lt;em&gt;absolutely should&lt;/em&gt; write something like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Retrieve or calculate the filter capacity &lt;code&gt;n&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fill up the filter with &lt;code&gt;n&lt;/code&gt; keys.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Now run a large series of lookups against the filter with random keys. Make
sure none of these keys were actually added, so that every hit is a true false positive, and keep
track of the hits so that you can calculate the observed rate &lt;code&gt;fp&lt;/code&gt; (false positives
divided by the number of lookups).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
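
&lt;p&gt;The steps above could look something like the sketch below. &lt;code&gt;TinyBloom&lt;/code&gt; is a
throwaway stand-in (MD5-based, one byte per bit) so that the example runs on its
own; swap in the implementation you actually want to verify. Instead of random keys
it uses sequential keys from a range that was never inserted, which guarantees that
every hit is a false positive:&lt;/p&gt;

&lt;div&gt;&lt;pre class=&quot;prettyprint&quot;&gt;
import hashlib
from math import ceil, log


class TinyBloom(object):
    # Hypothetical minimal filter, only here to have something to measure.

    def __init__(self, n, p):
        m = -1 * (n * log(p)) / pow(log(2), 2)
        self.k = int(ceil((m / n) * log(2)))
        self.m = int(ceil(m))
        self.bits = bytearray(self.m)  # one byte per bit, zeroed on allocation

    def _indexes(self, key):
        digest = int(hashlib.md5(key).hexdigest(), 16)
        h1, h2 = digest % 2**64, digest // 2**64
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for index in self._indexes(key):
            self.bits[index] = 1

    def __contains__(self, key):
        return all(self.bits[index] for index in self._indexes(key))


def measure_fp_rate(bloom, n, lookups):
    # Steps 1-2: fill the filter up to its capacity n.
    for i in range(n):
        bloom.add(str(i).encode())
    # Step 3: look up keys that were never added and count the hits.
    hits = sum(1 for i in range(n, n + lookups) if str(i).encode() in bloom)
    return hits / float(lookups)
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Something like &lt;code&gt;measure_fp_rate(TinyBloom(1000, 0.01), 1000, 20000)&lt;/code&gt; should
then land close to 0.01.&lt;/p&gt;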

&lt;p&gt;We expect &lt;code&gt;fp&lt;/code&gt; to hover around the expected &lt;code&gt;p&lt;/code&gt; value since we&apos;re dealing with
a probabilistic data-structure but large deviations need to be further
investigated. Also look out for implementations where the &lt;code&gt;fp&lt;/code&gt; is &lt;strong&gt;very low&lt;/strong&gt;
as this is a good sign that the filter is not correctly tuned: &lt;code&gt;m&lt;/code&gt; is probably way
higher than it needs to be.&lt;/p&gt;

&lt;p&gt;I&apos;ve written a small Python package to document most of my findings and to
serve as a future reference for doing bloom filter calculations. Let me know if
you find any bugs or if it&apos;s helpful.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://github.com/bkz/bloom&quot;&gt;https://github.com/bkz/bloom&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Resources:&lt;/h3&gt;

&lt;p&gt;Almeida, P. S., Baquero, C., Preguiça, N., and Hutchison, D. 2007. Scalable
Bloom Filters. Inf. Process. Lett. 101, 6 (Mar. 2007), 255-261. DOI=
http://dx.doi.org/10.1016/j.ipl.2006.10.007 &lt;a href=&quot;http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf&quot;&gt;link&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kirsch, A., and Mitzenmacher, M. 2006. Less Hashing, Same Performance: Building
a Better Bloom Filter. In Proceedings of the 14th Annual European Symposium on
Algorithms (ESA), pp. 456-467. &lt;a href=&quot;http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/esa06.pdf&quot;&gt;link&lt;/a&gt;&lt;/p&gt;
</content>
        </entry>
        <entry>
                <title>Hybrid Web Applications</title>
                <link href="http://zafar.se/html5-future.html" />
                <updated>Mon, 25 Jul 2011 01:01:00 GMT</updated>
                <id>http://zafar.se/html5-future.html</id>
                <content type="html">&lt;p&gt;I&apos;m really hoping HTML5 makes it big on the client side. Working with
CSS+DOM+JS is an incredibly broken and frustrating experience but it&apos;s still
the best cross-platform solution we&apos;ve had in computing. Even mobile devices
are catching up nicely, my Samsung Galaxy S II offers a browsing experience
almost equal to the one you get on a laptop or a workstation. Native
applications will always be king but for most application needs the lower
implementation and support costs of a set of web applications adapted to work
on a wide range of platforms are infinitely more attractive.&lt;/p&gt;

&lt;p&gt;One interesting approach which I&apos;ve experimented with using Flash/Flex is a
hybrid approach where the application is split into three parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Front-end client UI (HTML+JS or Flash/Flex).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Localhost HTTP backend for client-side processing and access to local resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Web HTTP backend for credentials, DB, sharing data, social components, etc.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you write &lt;strong&gt;(2)&lt;/strong&gt; in a cross-platform language (Python, C++, etc) which can
be packaged into a single executable which runs a tiny HTTP server in the
background, your application is almost guaranteed to work on Linux, OSX,
Windows and mobile devices while exposing platform-specific APIs to the
client. The &lt;strong&gt;(2)&lt;/strong&gt; component also allows you to design your application to
work offline - connecting to &lt;strong&gt;(3)&lt;/strong&gt; as needed and when possible allows for a
smooth experience even on mobile devices.&lt;/p&gt;
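
&lt;p&gt;To make &lt;strong&gt;(2)&lt;/strong&gt; a bit more concrete, here is a rough Python 3 sketch of such a
localhost backend using only the standard library. The &lt;code&gt;/platform&lt;/code&gt; endpoint, the
class name and the wide-open CORS header are assumptions made for this illustration;
a real application would expose its own API and lock down which origins may call it:&lt;/p&gt;

&lt;pre class=&quot;prettyprint python&quot;&gt;
&lt;code&gt;
import json
import platform
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class LocalBackend(BaseHTTPRequestHandler):
    # Hypothetical endpoint which exposes a bit of platform-specific
    # information to the front-end client (1).

    def do_GET(self):
        if self.path == &quot;/platform&quot;:
            body = json.dumps({&quot;system&quot;: platform.system()}).encode(&quot;utf-8&quot;)
            self.send_response(200)
            self.send_header(&quot;Content-Type&quot;, &quot;application/json&quot;)
            # Lets a UI served from another origin call the local backend.
            self.send_header(&quot;Access-Control-Allow-Origin&quot;, &quot;*&quot;)
            self.send_header(&quot;Content-Length&quot;, str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep the background process quiet


def start_backend(port=0):
    # Bind to loopback only; port 0 lets the OS pick a free port.
    server = HTTPServer((&quot;127.0.0.1&quot;, port), LocalBackend)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    return server
&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;The client UI then talks to &lt;code&gt;http://127.0.0.1:PORT/platform&lt;/code&gt; like it would talk
to any other web backend, with the port taken from &lt;code&gt;server.server_address&lt;/code&gt;.&lt;/p&gt;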

&lt;p&gt;The problem so far has been tools. Firefox and Firebug have been a nice combo
but the lack of deep integration with the browser does sometimes make for a
shallow experience. Luckily the Chrome team has acknowledged this and offers
its own variation of developer tools which are deeply integrated with both
the V8 JS VM and the network stack of the browser.&lt;/p&gt;

&lt;p&gt;Great introduction to Chrome Dev Tools from Google I/O 2011:&lt;/p&gt;

&lt;iframe width=&quot;640&quot; height=&quot;390&quot; src=&quot;http://www.youtube.com/embed/N8SS-rUEZPg?hd=1&quot; frameborder=&quot;0&quot; allowfullscreen&gt;&lt;/iframe&gt;
</content>
        </entry>
        <entry>
                <title>Tech Bubble</title>
                <link href="http://zafar.se/tech-bubble.html" />
                <updated>Wed, 20 Jul 2011 01:01:00 GMT</updated>
                <id>http://zafar.se/tech-bubble.html</id>
                <content type="html">&lt;p&gt;Notes from &lt;a href=&quot;http://www.economist.com/debate/overview/206&quot;&gt;Economist debate&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;Definition of a tech bubble: rapid valuation inflation which exceeds company
fundamental values (profit expectations aren&apos;t in sync with the financial reality
of the security) followed by a crash in value.&lt;/p&gt;

&lt;p&gt;Four phases of bubbles: Stealth, awareness, mania, blow-off.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stealth&lt;/strong&gt;: New industry segments are invested in which generate real
profit and value. Early investors who &quot;understand&quot; how the new markets could
result in &quot;off the chart&quot; revenues and profits increase their investments.
These people help build the high-profile companies of the bubble but always
cash out before the world enters the next phase since they understand the
game being played and move on to find the next thing instead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Awareness&lt;/strong&gt;: Later investors and different kinds of middle men
(brokers, salespeople) start to notice and bring investment money resulting
in higher prices. When valuations start going through the roof (e.g. Facebook
going from $10 billion to $50 billion in the same year) the mantra &quot;this
time - everything is different&quot; is inevitably preached. Everything in this
phase is about keeping the bubble alive and making it bigger.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mania&lt;/strong&gt;: The public jumps in with their money for the &quot;investment
opportunity of a lifetime&quot;. More money leads to even higher prices pushing
IPO valuations to insane levels. The goal is to maximize the bubble using
investments from people who aren&apos;t domain experts by showing them how early
investors in this phase are making extraordinary amounts of money as the
bubble grows. The process repeats until the bubble pops and the result is
basically a wealth redistribution, transferring value from &quot;marks&quot; to &quot;smart
money&quot; and &quot;promoters&quot; (see the chart above).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Warren Buffett on the housing bubble:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The only way you get a bubble is when a very high percentage of the
  population buys into some originally sound premise...that becomes distorted
  as time passes and people forget the original sound premise and start
  focusing solely on the price action. People overwhelmingly came to believe
  that house prices could not fall significantly. And since [property] was the
  biggest asset class in the country and it was the easiest class to borrow
  against it created, you know, probably the biggest bubble in our history.&lt;/p&gt;
&lt;/blockquote&gt;
</content>
        </entry>
        <entry>
                <title>Spotify Notes</title>
                <link href="http://zafar.se/spotify-notes.html" />
                <updated>Sat, 16 Apr 2011 01:01:00 GMT</updated>
                <id>http://zafar.se/spotify-notes.html</id>
                <content type="html">&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: If you haven&apos;t used Spotify most of this won&apos;t make any sense
meaning that those of you in the US should probably find ways to play around
with the product instead of reading this. Sorry for any factual errors, I
prepared this for a different context...&lt;/p&gt;

&lt;p&gt;As a startup person myself and fellow Swede I&apos;ve been following the company
closely since their early days. Even though there are a lot of players in the
music streaming market I strongly believe Spotify has the best offering: great
taste combined with solid technology makes for a very responsive and
well-thought-out user experience.&lt;/p&gt;

&lt;h2&gt;Business Model&lt;/h2&gt;

&lt;p&gt;First of all, if you&apos;re even slightly interested in the music industry and how
modern music streaming ventures are trying to &quot;solve&quot; the music problem you
need to research the story of &lt;a href=&quot;http://en.wikipedia.org/wiki/Imeem&quot;&gt;imeem&lt;/a&gt;. The easiest and probably most
interesting way is to watch one of the &lt;a href=&quot;http://jacquesmattheij.com/Dalton+Caldwell,+Imeem&quot;&gt;founders talk&lt;/a&gt; at &lt;a href=&quot;http://www.justin.tv/startupschool/b/272178844&quot;&gt;YC Startup
School&lt;/a&gt;. So why is imeem interesting in the context of Spotify? Well both of
them tried to solve a hard problem only to realize that ad-supported business
models don&apos;t scale and that the music industry is highly territorial and
fragmented. Dealing with people who see their revenues falling something like
8.4% just last year ($1.5 billion globally) is probably not the easiest thing
in the world.&lt;/p&gt;

&lt;p&gt;So why isn&apos;t Spotify dead like imeem? In 2009 it seems that Spotify and others
finally understood the power of the iTunes experience. You buy music, play it
on your computer and then bring it along on your iPod/iPhone. What most people
fail to realize is that the problem with streaming is not renting versus
owning. It&apos;s that you can&apos;t consume music in the same way: the streaming
solution locks your music to computers (and requires a working internet
connection). To make matters worse the music industry seems to insist on a
model of $10/month per user which requires a lot of added value if you&apos;re going
to get someone to pay when they&apos;re used to free.&lt;/p&gt;

&lt;p&gt;Timing is everything and with Apple allowing native music apps on the iPhone
they kind of gave away the keys to the iTunes kingdom to companies like
Spotify. The entire iPod ecosystem is controlled by Apple and you can&apos;t easily
put music on them but with iOS a new music platform opened up. Spotify put out
their mobile app which synced music over WiFi/3G and allowed for offline
playing. And the customers responded. I believe the Spotify mobile app is the
single most important reason for driving sales and securing a profitable
future. In retrospect it&apos;s almost scary how much of being a successful
startup depends on timing and sheer luck. Had Spotify not been able to provide
a mobile offering we&apos;d instead be analyzing another imeem and how their big bet
on ads and a business model where people would pay to skip them didn&apos;t scale.&lt;/p&gt;

&lt;p&gt;So mobile seems to be key for making the streaming business models
work. What&apos;s even more fun is that it looks like the mobile carriers are the
ones who are going to scale the business model even further. The Swedish
version of AT&amp;amp;T is called Telia and since last year most of their phone
offerings have had 6 months of Spotify Premium included. The carrier basically
subsidizes Spotify just like they do with the phones. This setup shovels
customers into the Spotify funnel (with zero cost) and gives them plenty of
time to let the product and the social aspects of it (Facebook sharing,
collaborative playlists) get people hooked. User retention is a great problem
to have when your user acquisition cost is zero.&lt;/p&gt;

&lt;p&gt;OK, so the model seems to be working out. But they&apos;re still stuck in
Europe. Rumors about launching in the US have been floating around for almost a
year now. My bet is that Spotify probably won&apos;t launch in the US without a
partner since Apple is dominating the online music market and can leverage that
against the music labels. Remember when Apple bought Lala? Just to kill it off?
Understand that Apple makes money selling hardware. Not music or software,
iTunes is high revenue but very low profit-margin. So if you in any way
threaten sales of the Apple ecosystem which you indirectly do when you compete
with iTunes (since it drives iPod sales) you end up with Apple working against
you. You might think that with the iPad/iPhone Apple shouldn&apos;t care since they
still make money off the hardware. Sure, but then you&apos;re forgetting the 20
million iPods they sold in Q1 2011 alone...&lt;/p&gt;

&lt;p&gt;(As a side note, a US launch does seem pretty close though, Spotify recently
opened offices in New York which unlike other offices outside of Sweden seem to
include an engineering staff.)&lt;/p&gt;

&lt;h2&gt;Technology&lt;/h2&gt;

&lt;p&gt;Spotify uses a modified (proprietary) version of Ogg Vorbis which supports
seeking in streams. Music is normally streamed at 160 kbps but paying customers
can select higher quality streams at 320 kbps. Music is streamed to clients using
a hybrid P2P model which is tuned to &lt;a href=&quot;http://www.ietf.org/meeting/75/p2p-presentations/documents/spotify-gunnar-kreitz.pdf&quot;&gt;minimize latency&lt;/a&gt;, in most cases playing
a song is instant. The focus on speed is not merely a gimmick, it is well known
as shown by Google and others that speed and low latency change usage
behavior. People will use your product more if it&apos;s responsive since they feel
more in control.&lt;/p&gt;

&lt;p&gt;The Spotify client maintains a local cache which minimizes network
traffic. Together with the P2P components Spotify probably has the best
distribution model of all the players. HTML/Flash solutions are much harder to
scale and the costs are drastically higher when you can&apos;t cache or use P2P.&lt;/p&gt;

&lt;p&gt;One of the hardest technical problems, which is overlooked by almost everyone,
is the playlists component. Since playlists can be shared and collaborated on
with others in real-time and need to be synced across different devices you end
up with a form of distributed revision control. The real-time aspect makes it
even harder since you have to use a push-model and maintain persistent
connections to all clients.&lt;/p&gt;

&lt;p&gt;A cute trick to make it easy to share songs and playlists is to give them
a unique URI. Spotify uses both normal HTTP links and a shorter form like
spotify:track:2cJz1loJp5EZM6shmQpLZN which are handled by OS protocol handlers
(Windows: via HKEY_CLASSES_ROOT, OSX: CFBundleURLTypes).&lt;/p&gt;
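
&lt;p&gt;To make the two link forms concrete, here is a minimal Python sketch that normalizes either one into a (kind, id) pair. The open.spotify.com host for the HTTP form, the set of kinds and the helper names are my assumptions for illustration, not something taken from Spotify documentation.&lt;/p&gt;

```python
import re

# Assumption: IDs are plain alphanumeric strings, as in the example URI.
_ID = r"[0-9A-Za-z]+"
_KINDS = r"track|album|artist"

# Short form, e.g. spotify:track:2cJz1loJp5EZM6shmQpLZN
_URI_RE = re.compile(r"^spotify:(" + _KINDS + r"):(" + _ID + r")$")

# HTTP form. The open.spotify.com host is an assumption of this sketch.
_HTTP_RE = re.compile(
    r"^https?://open\.spotify\.com/(" + _KINDS + r")/(" + _ID + r")$"
)


def parse_link(link):
    """Return (kind, id) for either link form, or None if unrecognized."""
    for pattern in (_URI_RE, _HTTP_RE):
        match = pattern.match(link)
        if match:
            return match.group(1), match.group(2)
    return None


def to_uri(kind, ident):
    """Build the short form that an OS protocol handler would receive."""
    return "spotify:{}:{}".format(kind, ident)


def to_http(kind, ident):
    """Build the HTTP form that works in any browser."""
    return "http://open.spotify.com/{}/{}".format(kind, ident)
```

&lt;p&gt;A client registered as the protocol handler would receive the short form as an argument and could run it through something like parse_link before deciding what to play.&lt;/p&gt;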

&lt;p&gt;Spotify provides a &lt;a href=&quot;http://developer.spotify.com/en/libspotify/overview/&quot;&gt;binary-only library&lt;/a&gt; for Windows, OSX, Linux and ARM
which can be used to write your own player. The SDK is a bit quirky but you can
make it work with something like SDL pretty easily, see my &lt;a href=&quot;https://github.com/bkz/libspotify-sdl&quot;&gt;hackish Win32 demo&lt;/a&gt;. For an interesting code reading session I also recommend
&lt;a href=&quot;http://despotify.svn.sourceforge.net/viewvc/despotify/src/lib/&quot;&gt;despotify&lt;/a&gt; which is the result of reverse engineering the Spotify client
protocol. Be sure to check out the authentication trick using &lt;a href=&quot;http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.129.4982&amp;amp;rep=rep1&amp;amp;type=pdf&quot;&gt;computational
puzzles&lt;/a&gt; which makes the Spotify servers DoS-resistant.&lt;/p&gt;
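
&lt;p&gt;The general idea behind computational puzzles can be sketched in a few lines of Python. This is a generic hashcash-style puzzle and not the actual Spotify or despotify scheme. The choice of SHA-256, the nonce size and all function names are assumptions made for illustration.&lt;/p&gt;

```python
import hashlib
import os


def make_challenge():
    """Server side: a fresh random nonce per connection attempt."""
    return os.urandom(16)


def leading_zero_bits(digest):
    """Count the zero bits at the start of a byte string."""
    count = 0
    for byte in digest:
        if byte == 0:
            count += 8
            continue
        count += 8 - byte.bit_length()
        break
    return count


def verify(challenge, solution, difficulty):
    """Server side: a single hash, cheap regardless of difficulty."""
    digest = hashlib.sha256(challenge + solution).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge, difficulty):
    """Client side: brute force, roughly 2**difficulty hashes on average."""
    counter = 0
    while True:
        solution = counter.to_bytes(8, "big")
        if verify(challenge, solution, difficulty):
            return solution
        counter += 1
```

&lt;p&gt;The asymmetry is the point: the server spends one hash to check an answer while every client has to burn CPU time before it is allowed to log in, so flooding the server with connection attempts gets expensive for the attacker.&lt;/p&gt;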
</content>
        </entry>
        <entry>
                <title>Thoughts on software design</title>
                <link href="http://zafar.se/thougths-on-software.html" />
                <updated>Mon, 10 Jan 2011 01:01:00 GMT</updated>
                <id>http://zafar.se/thougths-on-software.html</id>
                <content type="html">&lt;p&gt;As we are in the middle of crunch on our big project at work I thought I&apos;d write down what I (currently) believe is the correct way of designing and implementing software. My views are heavily influenced by Emacs and dynamic languages, although both of the major projects I&apos;ve worked on so far have been implemented using a statically typed language (C++). As I&apos;m mostly interested in writing software that serves a purpose, all of the professional projects I&apos;ve worked on so far have been consumer-oriented products. &lt;/p&gt;

&lt;h2&gt;People&lt;/h2&gt;

&lt;p&gt;This factor alone will make or break a project. If a majority of the team hasn&apos;t worked on a project together things get dangerous. My advice to management in this case would be to let the team free-roam for a month or so and let them work on writing tools and libraries needed for the project. It might look like a foolish waste of time but for the team this period is critical. First of all, senior people get the chance to see new hires in action and secondly they can easily spot people who are not a fit for the team for this specific project. &lt;/p&gt;

&lt;p&gt;Why is this important? I believe that the state of your software is determined by the least competent developer on the team. This is the key if you want to run a successful software development project and in general if you want to make money in this business. Good programmers are well worth paying for as the value you can extract from their work far exceeds what you have to spend to make them stay happy. &lt;/p&gt;

&lt;h2&gt;Team dynamics&lt;/h2&gt;

&lt;p&gt;The team should consist of people with one or more of the following traits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-energy, creative&lt;/li&gt;
&lt;li&gt;Detail-oriented, guru&lt;/li&gt;
&lt;li&gt;Customer and usability-oriented&lt;/li&gt;
&lt;li&gt;Gets-anything-done&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you don&apos;t have any team members with the high-energy trait, suspend the project and wait until you get the right people instead of wasting time and money. The team should not consist of more than 3 or 4 people; larger projects should be done with multiple teams. Well-motivated smart people will do the right thing and manage themselves. Project managers should manage projects, not people. &lt;/p&gt;

&lt;h2&gt;Software&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;If you don&apos;t have a clear picture of the user, don&apos;t bother unless you&apos;re doing R&amp;amp;D work.&lt;/li&gt;
&lt;li&gt;Unit-test or die.&lt;/li&gt;
&lt;li&gt;Analyze and design top-down. Buzzwords like Agile and Scrum don&apos;t remove the need for thinking and knowing what you&apos;re doing.&lt;/li&gt;
&lt;li&gt;Implement bottom-up, in short deliverable iterations.&lt;/li&gt;
&lt;li&gt;Design composable software and libraries, because you&apos;ll end up doing X when Y isn&apos;t Z if Q is P.&lt;/li&gt;
&lt;li&gt;Make the software configurable at runtime.&lt;/li&gt;
&lt;li&gt;Make the internal data structures of the software inspectable at runtime.&lt;/li&gt;
&lt;li&gt;Make the system composable (pluggable) at runtime.&lt;/li&gt;
&lt;li&gt;Always start by attacking high-risk problems with prototypes. Plan to throw the first version away.&lt;/li&gt;
&lt;li&gt;Experiment, observe, adapt...&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Lessons learned in 2010: (2), (4), (9)&lt;/p&gt;
</content>
        </entry>

</feed>
