Yes, You Absolutely Might Possibly Need an EIN to Sell Software to the US
Posted on: 2013-11-13 04:45:08

Warning

For qualified advice, ask an organization such as the chamber of commerce. Here is some information from BDO for Canadians on selling to the USA. If you can pay for advice, you should set up an appointment with an adviser.


How Asana Breaks the Rules About Per-Seat Pricing
Posted on: 2013-11-06 02:21:20

If one apple costs $1, how much would five apples cost? How about 500?

In everyday life, when you buy more of something, you get more bananas for your buck. The cost per item drops, because the fixed costs are spread over more items. If you sell a lot of apples to one person, you don't have to wrap each one, you don't have to pay fixed transaction fees on each sale, and you don't have to worry about finding someone to buy the other 499 apples. The savings are passed on to the consumer. Software is often priced this way too. That's why I love Asana's pricing page. It breaks the rules.

Asana prices their product based on its value. It lets teams coordinate about projects and tasks they are working on. Asana is very clear about the value they give. In fact, the pricing page tells you that the only difference between the paid and free versions is that "premium plans allow you to coordinate with more team members, as well as the features listed in the table above. All other user features are exactly the same."

It's a mathematical law that as the number of people in a team grows, the number of communication paths grows quadratically. A company with 100 people using it is therefore getting much more value out of it than a company of 15 people, so they pay higher per-seat costs.
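To put numbers on it, count one communication path per pair of people. That reading of "communication paths" is an assumption, since the post doesn't define the term:

```latex
P(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad
P(15) = 105, \quad P(100) = 4950, \qquad
\frac{P(n)}{n} = \frac{n-1}{2}: \quad \frac{P(15)}{15} = 7, \quad \frac{P(100)}{100} = 49.5
```

Under that assumption the 100-person company has about 47 times as many paths (4950 / 105 ≈ 47.1), and each seat participates in about 7 times as many of them.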

Homework

What is the one thing that gives your software value? Are you directly charging for that thing, or something else? How can you take advantage of team effects to provide more value when more people use it?

Get tips on improving your software business, right in your inbox

Enter your email and I will send you tips on selling software right in your inbox. I have been selling software since 1998, and whether it's consulting products, adsense, or software as a service, I have done it all, and I want to tell you what I wish I knew when I started.


5 Ways PowToon Made Me Want to Buy Their Software
Posted on: 2013-10-31 21:28:19

Powtoon is online software that lets you create animated PowerPoint presentations without the steep learning curve of Adobe Flash. The selling techniques they use are simple and powerful. Even though I saw through their tricks at every step along the way, I am now a customer and proud of it. It is worth looking at what they did, because these are simple things that you can do to improve your software business.

Help users remember your app with email

As soon as I signed up with Powtoon, the emails started. By the fifth day of the constant barrage, I started marking them as spam. But for those five days, my inbox was a constant reminder that Powtoon was there and waiting for me. Since it was on my mind, I mentioned it to three other people in conversations I had. If the emails hadn't been there, I might still be struggling to remember the name of that cartoon site I signed up with, instead of being a paying user. What was that name again? Powertoon.io? Powrtoon.com?

Homework

If you offer software as a service, you should have an email campaign that offers helpful tips, as a way of reminding people of where they've been. Emailing every day is excessive, however.

Go read the user manual (I mean, the $23 ebook we're giving you FREE!)

Sadly, for users coming from Google Docs or PowerPoint, Powtoon's first-use experience is still confusing, and it is enormously helpful to read the manual. Many people read the manual without realizing it, because it is delivered in the form of an ebook with a value attached to it. "You could buy this book from Amazon for $23!" says the copy, "or download it here FREE".

When I get a free book I usually toss it in the recycling. But if it's worth money, I'll probably read it.

Homework

Sometimes users need to read the manual to get the most out of your product. If reading the manual is part of your sales funnel, you should do everything you can to get people to read it. Package it as an ebook to get more people through this stage.

Price for value, not competition

Powtoon could have looked at Adobe Creative Suite and priced their product cheaper. But Adobe is not the competition. Instead, the much larger, richer market of skilled PowerPoint folks is looking at Powtoon and comparing it with professional video design. From that perspective, it looks like a bargain.

Homework

Unless your goal is to be acquihired, it is difficult to build a business on $9/month. If your software has no plan that costs at least $100/month, try to think of some feature that would be a must-have for businesses. For big companies, cost doesn't matter, as long as your highest plan has something they need.

A clear reason to buy

It is easy to make software that hides the premium features away. They are grayed out and pushed to the end of the list. I've done it myself. For users, it's like living in a cosy room. If you ignore the locked door behind the couch, you can forget that it is the foyer of a mansion.

At every step, Powtoon reminds you that you are missing the majority of its capabilities. While choosing a background tune, I had to scroll through hundreds of songs that I could play but couldn't use. Only about five songs were available, but they were strewn through the list. Every other item of clipart is only available in the highest-cost plan, but again, you have to scroll through them. You can make a great video without paying anything. You are free to use any song or image that you can upload. But you leave with the deep sense that the software is crippled.

Homework

Do users on your free plan say that they don't see any reason to buy? If the premium features are hidden away, it's time to make them more visible.

Price segmentation

The Powtoon Agency plan serves two purposes. First, it lets users who can pay more do so. Some users don't care how much something costs and don't want to have to think about it. These users have an option to pay more. But I am not one of them.

The Agency plan is listed first on the pricing page, and psychologically anchors the value of the rest of the plans. When I first used Powtoon, I was struck by the high cost. I couldn't justify it, so I resolved to use the free plan. But a few days later I saw this:

Powtoon sent me this offer a few days after I first signed in, and it made me rethink things in a hurry. With this sale, and the buy-now-before-it's-too-late "savings" of over $400 plastered on my monitor every time I sign in, they successfully reframed the middle tier as a bargain.

Homework

On your pricing page, what techniques do you use to get people from the free tier into the middle-tier plan? How can you use the high-cost tier to reframe the value of the other plans?

So what did I end up making?

I whipped together a video for my consulting product, Zwibbler. Zwibbler is a drop-in solution that lets users draw in your web app.


How I run my business selling software to Americans
Posted on: 2013-04-30 14:09:51

I first realized I had overpaid when I received my articles of incorporation from the law firm. Was it because they were in a leather-bound binder? Was it because they had been shipped overnight from Toronto to Waterloo, a distance of 83 km, so short that I could have driven there, picked them up, and returned for less than the cost of shipping? Instead I had to wait two business days for the FedEx truck to drive back to Cambridge, since I wasn't home, and then I had to call to arrange to pick them up at a "conveniently located" FedEx office a week later.

No, I was miffed because I had paid $1500 to incorporate, when I could have done it over the web for far less. Still, it is a very nice binder.

Since that time I have slowly been finding ways to optimize my business, which consists of selling software to Americans. I sell it in all kinds of ways.

A typical month:

WebSequenceDiagrams subscriptions: $1600
WebSequenceDiagrams server sales: $1600
Rhymebrain.com AdSense revenue: $1700
Zwibbler.com licensing and consulting: $2000

*My annual reports are on Google Plus, where nobody reads them.

According to the Canada Revenue Agency, I'm a profitable small business. The only thing preventing me from spending it all on iPads, Google Glasses and Surface Tablets is the fact that I have to feed my lovely family.


I keep costs down by feeding them chocolate flavoured soylent in a bucket

Here are some tips on running a business in Canada that sells software to Americans.

Incorporation

Incorporation is a good choice. While it gives a valuable sense of security (albeit false) against lawsuits, the most useful benefit is income deferral. I started this company while I was working full time. If it were a sole proprietorship, I would have had to pay the top tax rate of 40% on everything I earned. This would have been a huge disincentive to growing my business.

Evil Tip for starting a company while working somewhere else

Whenever someone asks if it's legal, point them to the corporate policy and claim that there's a simple form that you fill out, and loudly complain that it takes the legal department eight months to answer any emails. With luck, the person that asked will launch into his own stories about the slow legal department, thus deflecting the conversation to a more useful topic.

With all of the profits inside a corporation, I had to pay only the 16% corporate tax on them. But I can keep the profits there until I feel like withdrawing them. It's like having an extra RRSP.
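A worked example makes the deferral concrete. It uses a hypothetical $10,000 of profit and the two rates quoted above, and it ignores the personal tax that is still owed when the money is eventually withdrawn, so it shows deferral rather than a permanent saving (and is not tax advice):

```latex
\begin{aligned}
\text{Sole proprietorship (40\%):} \quad & 0.40 \times \$10{,}000 = \$4{,}000 \\
\text{Corporation (16\%):} \quad & 0.16 \times \$10{,}000 = \$1{,}600 \\
\text{Left in the company to grow:} \quad & \$4{,}000 - \$1{,}600 = \$2{,}400
\end{aligned}
```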

However, incorporation does come with some added responsibilities. First, I have to pay Intuit $200 every year for TurboTax to file my taxes. And that software only does about 10% of the work: I have to maintain a balance sheet and income statement for the year so I can get the numbers to enter into TurboTax. Still, I figure we are about even, because Intuit also bought the server edition of WebSequenceDiagrams.

Evil Tip for doing your own taxes

It is easy to make a lot of mistakes the first time. But hiring an accountant costs $2000, while government penalties for a mistake are maybe $50, tops. I'll re-evaluate this when I'm making significantly more profit.

After you incorporate, you can't do very much until you get:

A business bank account

Canada has a cartel of five major banks. Stay away from them. I was explaining banking to my 3-year-old daughter (her Twitter account):

Me: Banks are a place where you keep your money.

Lillian: WHY?

Me: Because they give you interest... (thinking) but then they take it away and charge you more money.

Lillian: WHY?

Me: I guess you put your money in a bank to keep it safe, and every month they take some away.

Lillian: WHY?

Me: I don't know. If you keep your money in the bank they will slowly take it away from you.

Lillian: I WILL KEEP MY MONIES BESIDE MY POTTY.

Me: Good. Now it's time to watch Dora. Daddy's got to go buy some bitcoin.

Instead, I use a local credit union, which has a pay-as-you-go account. For $5 a month I can keep all of my profits there and write cheques. They wanted to sell me a business chequebook. What is it with all these leather-bound things? Does my business have to have everything wrapped in cow skin to appear successful? I imagine it might be useful in a narrow range of situations:

Me in line at the grocery store: Will you take a cheque for these Ruffles™ brand potato chips?

Attractive cashier: Um, noooo. What do you think this is, like 1985? Don't you have a Paypass chip?

Me: What about... from THIS chequebook? (whips out the corinthian leather-bound Execu-Check 5000 with dual-signature, day-planner, and matching gold pens.)

Attractive cashier: Oooh, no problem, Mr. Hanov. What are you doing later?

My wife: He'll be sleeping in the basement. Let's go.

I managed to get them to give me personal cheques with my business name written on them by asking very nicely. Credit unions are nice that way.

Unfortunately you have to deal with big banks sometimes. I needed to get:

A credit card

After a lot of research, I selected the Bank of Montreal credit card for businesses, because there is no fee and every December I get some cash back for using it. I filled out the application with my personal information, and since I was working at the time, there was no problem getting it.

Evil tip for paying for things

Currency exchange is expensive. As a rule, I pay for US things with US dollars, and Canadian things with Canadian dollars. This was only a problem with Microsoft Office 365, which insisted on charging my Paypal account in CAD. I had to tell Microsoft that I live in Beverly Hills to use my USD Paypal account. Because that's the only zip code I know.

I've only talked about the Canadian side so far. But many Canadian software companies get all their revenue in US dollars. There is an important trick for dealing with this, which I will get to shortly. But first:

Paypal

Paypal is utterly horrible to use and develop for. For example, to cancel a subscription for a user from last August, I have to page through them all, 25 at a time, waiting 5-10 seconds for each page load, until I get to August. If I didn't know when the subscription started, I would be there for much longer reading all of the names. For developers, Paypal offers a special sandbox area which hasn't worked in months, and special Paypal IPN (Instant Payment Notification) messages, which are broken for several months out of the year.

But once it's finally working, Paypal works everywhere. It lets me enter in tax rates for all of the Canadian provinces and territories (Why aren't they there already?). It lets me accept orders from Israel, France, the UK, Germany, Australia, New Zealand, Finland, Norway. When companies that I sell to refuse to use Paypal, it lets me pay $35 to enter the credit card details manually. The fee is outrageous, but it's a steal compared to anywhere else I can use. (Update: Use stripe.com.)

I use Paypal for most of my sales, but I have a small, nagging fear that the US Government (which to us Canadians appears quite insane, so they would do this kind of thing) will one day take all of my money to fight terror by arming day cares. There is also a cost problem: when I transfer money out of Paypal into Canada, Paypal converts it at over-the-counter exchange rates. If you are using Paypal for currency conversion, you must stop immediately, because there is a better way, which I will explain shortly. But first, to get the money out of Paypal without currency conversion, you need:

A US dollar bank account

My credit union couldn't offer all the services I needed to run my global company. I searched around and I found a reasonable deal with the Royal Bank of Canada USD Checking account. It is regularly $9/month, but with a minimum balance of $2500 the price drops to $2.

I need the USD bank account so I can accept incoming bank transfers with no currency losses. Outside of North America, it is common to pay for large items by exchanging bank account information, so the buyer can transfer the cash directly into the seller's account. North American banks discourage this behaviour by levying huge fees. When I invoice a customer, I include the bank details, and in a few weeks I receive the full amount, minus the $15 fee for RBC and $25 for some mysterious "intermediary bank". Still, a flat $40 charge looks pretty good on amounts greater than $1000 when compared to Paypal's fees for the same.

Evil tip for well-connected, wealthy European financiers

An intermediary bank is a good business to get into. Also, drugs.

So I have a Canadian Dollar account in a credit union, US dollars in Paypal, and US dollars in RBC. How do I get my money into good old Canadian loonies and toonies? That's where my favourite part comes in.

XE.com

If you are a Canadian company whose revenue comes in USD, you should immediately get an account at a currency broker. Once set up, it is a simple matter to transfer money between a US and Canadian bank account, at crazy-low conversion fees.

For example, I just went to Paypal and XE and priced out transferring $5000 USD into Canada. Today, the difference is not huge, but the spread has been much higher in the past.

Paypal: $4,929.33 CAD
XE.com: $4,980.00 CAD
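The gap in that quote works out as follows, using only the two amounts above:

```latex
\$4{,}980.00 - \$4{,}929.33 = \$50.67 \text{ CAD}, \qquad \frac{50.67}{4980.00} \approx 1.02\%
```

So even on a day when the difference is "not huge", about 1% of the transfer disappears through Paypal's conversion.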

I do not want 1 to 3% of my revenue disappearing off the top, so I use XE. When I signed up, I registered my US account and the Canadian one with them by copying the numbers off some cheques, and now I can initiate a transfer in seconds.

How do you do it?

Do you have a different way for optimizing your cash flow? Did I miss anything? Please share your tips in the comments.


0, 1, Many, a Zillion
Posted on: 2013-04-05 17:17:13

There are only four numbers in computer programs:

0, 1, many, "a zillion"

If you have 2 or more of anything, you are, in general, better off using loops to process many of them.

But what is "a zillion?"

Zillion is a made-up number. Your system cannot hold a zillion items in memory. It cannot show a zillion items on the screen.


Doesn't work for "a zillion": [image: a "Select employee name:" list that displays every employee]



Doesn't work for "a zillion":
def handleFiles( filenames: Array[String] ) = {
    // Every file is opened before any of them is processed.
    val results = openFiles(filenames).readAll().processAll()
    results
}
* The program first opens all the files, and only then processes them. With a zillion files, the OS will run out of file handles.
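One way to make that example survive "a zillion" is to stream: open one file, process it, let it close, then move on. This is a sketch in C++ under my own assumptions: `countLines` is a hypothetical stand-in for whatever `processAll` does, and the file names are read one per line from a stream so that even the list of names never has to fit in memory.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <string>

// Hypothetical per-file work: count the lines in one open file.
std::size_t countLines(std::istream& in) {
    std::size_t n = 0;
    std::string line;
    while (std::getline(in, line)) ++n;
    return n;
}

// Reads file names one per line from `nameList` and processes each file in
// turn. At most one data file is open at any moment, and only a running total
// is kept in memory.
std::size_t handleFiles(std::istream& nameList) {
    std::size_t total = 0;
    std::string name;
    while (std::getline(nameList, name)) {
        std::ifstream in(name);   // open one file...
        if (!in) continue;        // skip names that can't be opened
        total += countLines(in);
    }                             // ...the ifstream closes it at the end of each iteration
    return total;
}
```

The design choice is that memory and file-handle use stay constant no matter how many names arrive, which is exactly the property the "many" version lacks.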


Changing software from handling "many" to "a zillion" is hard if the program is already written.

Decide when you need to handle a zillion.


Give your Commodore 64 new life with an SD card reader
Posted on: 2012-08-01 21:22:38

This August marks the 30th anniversary of the most successful computer model in history. One company put personal computers into people's homes and launched an entire industry overnight. For an entire decade, despite attempts to market improved models, the original platform stood the test of time, virtually unchanged. Even today, the Commodore 64 is celebrated by a community of hobbyists.

In honour of the anniversary month, here are some up-to-date instructions on how to read old data from the disks. Floppy disks only have a usable lifespan of about 10 years, and most of the ones in my collection are already over 25. At this point, the chemical binder between the magnetic particles and the plastic substrate is degrading, and the data can literally fall out of the disks. Fortunately, current software from the opencbm project tries very hard to read the data, and it can often succeed even when a disk appears unusable on a C64.

Here's what you need to read your floppy disk into your PC:

  1. A Commodore 1541 disk drive, in good condition, and its power supply.
  2. The drive cable.
  3. A ZoomFloppy adapter.
  4. A USB cable.

In addition, if you want to read disk images from an SD card hooked into the real Commodore 64, you will need the uIEC adapter, which can be substituted for a real floppy drive.

Transferring floppy disks to the PC

I want to scan in my copy of Jumpman so its digital bits are preserved.


This copy is totally genuine EPYX product. At least that's what the guy at the swap meet told me, behind the K-mart.

You have to download the special build of the opencbm software from here. I am using the Windows version. Read the manual very carefully and install the driver before you try to use the command-line tools.

Then, plug the ZoomFloppy into the Commodore drive using the Commodore cable (make sure the drive is off!), and into your PC using a USB cable.


Confusingly, there are some empty ports. If you want, you can hook other things into them for decoration, I guess.

Now we turn on the drive, put in the disk and cross our fingers. From a command line, I type:
d64copy -r 16 8 "jumpman.d64"

This means:

  • Retry bad blocks up to 16 times before giving up.
  • Copy from device #8. Remember, on Commodore, the first floppy drive was always device 8.
  • Copy the contents of the disk to the file called jumpman.d64

A minute later, it is done. Unfortunately, we have a bad sector. Hopefully it is in an unused part of the disk, or in some graphics that won't cause a crash.
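To judge whether a bad sector matters, it helps to know where it sits in the image. A standard 35-track .d64 file is every 256-byte sector in order, track 1 first, and the tracks hold 21, 19, 18 or 17 sectors depending on how far out they are. This sketch uses hypothetical helper names and assumes a plain 174,848-byte image; it ignores 40-track images and images with appended error bytes.

```cpp
constexpr long kSectorSize = 256;

// Sectors on a track of a standard 35-track 1541 disk, or 0 if out of range.
int sectorsPerTrack(int track) {
    if (track < 1 || track > 35) return 0;
    if (track <= 17) return 21;
    if (track <= 24) return 19;
    if (track <= 30) return 18;
    return 17;
}

// Byte offset of (track, sector) inside a 35-track .d64 image, or -1 if invalid.
long d64Offset(int track, int sector) {
    const int spt = sectorsPerTrack(track);
    if (spt == 0 || sector < 0 || sector >= spt) return -1;
    long blocks = 0;
    for (int t = 1; t < track; ++t) blocks += sectorsPerTrack(t);
    return (blocks + sector) * kSectorSize;
}

// Size of the whole image: 683 sectors of 256 bytes.
long d64Size() {
    long blocks = 0;
    for (int t = 1; t <= 35; ++t) blocks += sectorsPerTrack(t);
    return blocks * kSectorSize;
}
```

For example, track 18 sector 0 starts at byte 91,392. Track 18 holds the disk's directory, so a bad sector there is worth checking first.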

At this point, we can run the game in the VICE emulator (on Windows, this distribution seems to work).

Running games from SD cards

No serious Commodore enthusiast is without his or her SD card reader. I happen to have the uIEC, which Jim Brain soldered together right in front of me at the World of Commodore 2011 in Toronto.

It has to plug into the back of the Commodore, presumably for power, and then the drive cable goes into it as well. Then, it is ready to act exactly like a floppy drive.

But how do folders work?

You can just about fit all the Commodore software in existence onto one card, so how do you access it? You can switch disk images by sending special drive commands to it in BASIC. I've placed the jumpman.d64 file on the SD card. Here is the command to switch to it:

OPEN1,8,15:PRINT#1,"CD:JUMPMAN.D64":CLOSE1

There are many other commands for the uIEC. It has some buttons on the circuit board that let you swap disks on the fly without typing commands, but these have to be set up by listing them in a special file. All the other commands are described here.

A major drawback of the uIEC is that it only emulates the standard Commodore drive operations, but the real 1541 drive was actually another computer that could load and run programs. Some programs did this for copy protection or to implement custom fast-loaders. More recently, the retro demoscene has used it for extra storage and computing power. I was disappointed when the Second Reality demo didn't work.

Well, I have Jumpman loaded anyway, so off to an evening of retro fun.

The cheater method

Of course, if you don't want to go to all this trouble, you could obtain just about any software you can think of using pokefinder. Just Bing it.


20 lines of code that will beat A/B testing every time
Posted on: 2012-05-28 21:10:15

A/B testing is used far too often for something that performs so badly. It is defective by design: Segment users into two groups. Show the A group the old, tried-and-true stuff. Show the B group the new whiz-bang design with the bigger buttons and slightly different copy. After a while, look at the stats and figure out which group presses the button more often. Sounds good, right? The problem is staring you in the face. It is the same dilemma faced by researchers running drug studies. During drug trials, you can only give half the patients the life-saving treatment. The others get sugar water. If the treatment works, the sugar-water group lost out. This sacrifice is made to get good data. But it doesn't have to be this way.
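One well-known technique that avoids locking half of your visitors into the worse option is an epsilon-greedy multi-armed bandit: most visitors see the variant with the best click rate so far, and a small fraction (epsilon) see a random variant so the estimates keep improving. This is a sketch of that technique, not necessarily the code behind this post's title. The class name and the optimistic starting counts (every variant begins at 1 show and 1 click, so untried variants look good) are my own assumptions.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Epsilon-greedy bandit over page variants ("arms"). Sketch only: the name and
// the 1-show/1-click starting counts are assumptions.
class EpsilonGreedy {
public:
    EpsilonGreedy(std::size_t arms, double epsilon, unsigned seed = 42)
        : shows_(arms, 1), clicks_(arms, 1), epsilon_(epsilon), rng_(seed) {}

    // Which variant should the next visitor see?
    std::size_t choose() {
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        if (coin(rng_) < epsilon_) {
            std::uniform_int_distribution<std::size_t> any(0, shows_.size() - 1);
            return any(rng_);                 // explore: a random variant
        }
        return best();                        // exploit: best click rate so far
    }

    void recordShow(std::size_t arm) { ++shows_[arm]; }
    void recordClick(std::size_t arm) { ++clicks_[arm]; }

    double rate(std::size_t arm) const {
        return static_cast<double>(clicks_[arm]) / static_cast<double>(shows_[arm]);
    }

    // Index of the variant with the highest observed click rate.
    std::size_t best() const {
        std::size_t b = 0;
        for (std::size_t i = 1; i < shows_.size(); ++i)
            if (rate(i) > rate(b)) b = i;
        return b;
    }

private:
    std::vector<unsigned long> shows_, clicks_;
    double epsilon_;
    std::mt19937 rng_;
};
```

In use, you would call `choose()` for each visitor, `recordShow()` for the variant shown, and `recordClick()` when they press the button. With an epsilon of 0.1, roughly 90% of visitors get the current winner instead of 50%.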


VP trees: A data structure for finding stuff fast
Posted on: 2011-12-02 08:00:00

Let's say you have millions of pictures of faces tagged with names. Given a new photo, how do you find the name of the person that the photo most resembles?

Suppose you have scanned short sections of millions of songs, and for each five second period you have a rough list of the frequencies and beat patterns contained in them. Given a new audio snippet, can you find the song to which it belongs?

What if you have data from thousands of web site users, including usage frequency, when they signed up, what actions they took, etc. Given a new user's actions, can you find other users like them and predict whether they will upgrade or stop using your product?

In the cases I mentioned, each record has hundreds or thousands of elements: the pixels in a photo, the patterns in a sound snippet, or web usage data. These records can be regarded as points in high-dimensional space. When you look at points in space, they tend to form clusters, and you can infer a lot by looking at the ones nearby.

In this blog entry, I will half-heartedly describe some data structures for spatial search. Then I will launch into a detailed explanation of VP-Trees (Vantage Point Trees), which are simple, fast, and can easily handle low or high dimensional data.

Data structures for spatial search

When a programmer wants to search for points in space, perhaps the first data structure that springs to mind is the K-D tree. In this structure, we repeatedly subdivide the points along one dimension at a time to form a tree structure.
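For comparison with the VP tree later on, here is a minimal sketch of K-D tree construction: split at the median along one dimension, then recurse on each half using the next dimension. The 2-D `Point` type and the function names are assumptions for illustration, and there is no search function.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

constexpr std::size_t K = 2;                 // number of dimensions (assumed 2 here)
using Point = std::array<double, K>;

struct KdNode {
    Point point;                             // the median point stored at this node
    std::size_t axis;                        // dimension this node splits on
    std::unique_ptr<KdNode> left, right;     // lower / higher along `axis`
};

// Builds a K-D tree over [first, last), cycling through the dimensions level
// by level and splitting at the median each time. Reorders the input.
std::unique_ptr<KdNode> buildKdTree(std::vector<Point>::iterator first,
                                    std::vector<Point>::iterator last,
                                    std::size_t depth = 0) {
    if (first == last) return nullptr;
    const std::size_t axis = depth % K;
    auto mid = first + (last - first) / 2;
    // Partially sort so that *mid is the median along `axis`.
    std::nth_element(first, mid, last, [axis](const Point& a, const Point& b) {
        return a[axis] < b[axis];
    });
    auto node = std::make_unique<KdNode>();
    node->point = *mid;
    node->axis = axis;
    node->left = buildKdTree(first, mid, depth + 1);
    node->right = buildKdTree(mid + 1, last, depth + 1);
    return node;
}
```

Note that each split needs the individual coordinates of every point, which is the dependency the metric-space structures below avoid.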

With high-dimensional data, the benefits of the K-D tree are soon lost. As the number of dimensions increases, the points tend to scatter and it becomes difficult to pick a good splitting dimension. Hundreds of students have earned their master's degrees by coding up K-D trees and comparing them with an alphabet soup of other trees. (In particular, I like this one.)

The authors of Data Mining: Practical Machine Learning Tools and Techniques suggest using Ball Trees. Each node of a Ball Tree describes a bounding sphere, using a centre and a radius. To make the search efficient, each node should use the minimal sphere that completely contains all of its children and overlaps the least with its sibling spheres in the tree.

Ball trees work, but they are difficult to construct. It is hard to figure out the optimal placement of spheres to minimize the overlap. For high dimensional data, the structure can be huge. The nodes must store their centre, and if a point has thousands of coordinates, it occupies a lot of storage. Moreover, you need to be able to calculate these fake sphere centres from the other points. What, exactly, does it mean to calculate a point between two sets of users' web usage history?

Fortunately, there are methods of building tree structures which do not require manipulation of the individual coordinates. The things that you put in them do not need to resemble points. You only need a way to figure out how far apart they are.

Entering metric space

Imagine you are blindfolded and placed in a gymnasium filled with other blindfolded people. Even worse, you have also lost all sense of direction. When others talk, you can sense how far away they are, but not where they are in the room. Eventually, some basic laws become clear.

  1. If there is no distance between you and the other person, you are standing in the same spot.
  2. When you talk to another person, they perceive you as being the same distance away as you perceive them.
  3. When you talk to person A and person B, the distance to A is never more than the distance to B plus the distance from A to B. In other words, the shortest distance between two people is a straight line. Also, distance is never negative.

This is a metric space. The great thing about metric spaces is that the things you put in them do not need to do a lot. All you need is a way of calculating the distances between them. You do not need to be able to add them together, find bounding shapes, or find points midway between them. The data structure that I want to talk about is the Vantage Point Tree, a generalization of the BK-tree, which is eloquently reviewed in Damn Cool Algorithms.

Each node of the tree contains one of the input points, and a radius. Under the left child are all points which are closer to the node's point than the radius. The other child contains all of the points which are farther away. The tree requires no other knowledge about the items in it. All you need is a distance function that satisfies the properties of a metric space.
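Because the tree relies entirely on those laws, it can be worth testing a distance function against them on a small sample before building anything. This is a brute-force sketch with hypothetical names (`looksLikeMetric`, `Point2`); it is O(n³), assumes the sample contains no duplicate points, and can only catch violations, never prove the laws hold.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2 { double x, y; };

double euclidean(const Point2& a, const Point2& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// The "forgot the square root" version, which is not a metric.
double squaredEuclidean(const Point2& a, const Point2& b) {
    const double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Checks the metric-space laws on every pair and triple of sample points.
// `eps` absorbs floating-point rounding. Assumes `pts` has no duplicates.
template <typename T, typename Dist>
bool looksLikeMetric(const std::vector<T>& pts, Dist d, double eps = 1e-9) {
    for (std::size_t i = 0; i < pts.size(); ++i) {
        if (std::fabs(d(pts[i], pts[i])) > eps) return false;          // d(a, a) = 0
        for (std::size_t j = 0; j < pts.size(); ++j) {
            const double dij = d(pts[i], pts[j]);
            if (dij < -eps) return false;                               // never negative
            if (i != j && dij <= eps) return false;                     // distinct points are apart
            if (std::fabs(dij - d(pts[j], pts[i])) > eps) return false; // symmetric
            for (std::size_t k = 0; k < pts.size(); ++k)                // triangle inequality
                if (dij > d(pts[i], pts[k]) + d(pts[k], pts[j]) + eps) return false;
        }
    }
    return true;
}
```

On three evenly spaced points on a line, plain Euclidean distance passes, while the squared version fails the triangle inequality.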

How searching a VP-Tree works

Let us examine one of these nodes in detail, and what happens during a recursive search for the nearest neighbours to a target.

Suppose we want to find the two nearest neighbours to the target, marked with the red X. Since we have no points yet, the node's center p is the closest candidate, and we add it to the list of results. (It might be bumped out later.) At the same time, we update our variable tau, which tracks the distance of the farthest point that we have in our results.

Then, we have to decide whether to search the left or right child first. We may end up having to search them both, but we would like to avoid that most of the time.

Since the target is closer to the node's center than its outer shell, we search the left child first, which contains all of the points closer than the radius. We find the blue point. Since it is farther away than tau we update the tau value.

Do we need to continue the search? We know that we have considered all the points that are within distance radius of p. However, the target is closer to the outer shell than it is to the farthest point that we have found, so there could be closer points just outside of the shell. We do need to descend into the right child, where we find the green point.

If, however, we had already collected the n nearest points, and the target were farther from the outer shell than the farthest point that we have collected, then we could have stopped looking. This results in significant savings.
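The decision described above boils down to two inequalities that follow from the triangle inequality. In this sketch (hypothetical function names, not the program's exact code), `dist` is the distance from the target to p, `mu` is the node's radius, and `tau` is as above.

```cpp
// Pruning tests for one VP-tree node.
//   dist: distance from the target to the node's vantage point p
//   mu:   the node's radius; the inside child holds points closer than mu to p
//   tau:  distance to the farthest result collected so far
//         (effectively infinity until n results have been found)

// The inside child can only hold a closer result if a ball of radius tau
// around the target reaches inside the shell.
bool mayHaveInside(double dist, double mu, double tau) { return dist - tau <= mu; }

// The outside child can only hold a closer result if that ball reaches past the shell.
bool mayHaveOutside(double dist, double mu, double tau) { return dist + tau >= mu; }

// Search the side the target is on first; it is the more likely to shrink tau.
bool searchInsideFirst(double dist, double mu) { return dist < mu; }
```

In a recursive search you would descend into the first side, then re-test the other side with the updated tau before descending, since tau may have shrunk in the meantime.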

Implementation

Here is an implementation of the VP Tree in C++. The recursive search() function decides whether to follow the left, right, or both children. To efficiently maintain the list of results, we use a priority queue. (See my article, Finding the top k items in a list efficiently, for why.)
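The priority-queue idea can be shown on its own, independent of the tree. This is a hypothetical `TopK` helper that stores distances only; the real search also keeps each item's index next to its distance, as `HeapItem` does below. A max-heap capped at k entries always has the farthest kept result on top, which is exactly tau.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <queue>
#include <vector>

// Keeps the k smallest distances seen so far in a max-heap of size k.
class TopK {
public:
    explicit TopK(std::size_t k) : k_(k) {}

    void offer(double dist) {
        if (k_ == 0) return;
        if (heap_.size() < k_) {
            heap_.push(dist);                 // still filling up
        } else if (dist < heap_.top()) {
            heap_.pop();                      // evict the current farthest
            heap_.push(dist);
        }
    }

    // Infinity until k results are held, so nothing is pruned too early.
    double tau() const {
        if (k_ == 0 || heap_.size() < k_) return std::numeric_limits<double>::infinity();
        return heap_.top();
    }

    // The kept distances, nearest first.
    std::vector<double> sorted() const {
        std::priority_queue<double> copy = heap_;
        std::vector<double> out;
        while (!copy.empty()) { out.push_back(copy.top()); copy.pop(); }
        std::reverse(out.begin(), out.end());
        return out;
    }

private:
    std::size_t k_;
    std::priority_queue<double> heap_;        // std::priority_queue is a max-heap by default
};
```

Each offer costs O(log k), so scanning N candidates costs O(N log k) rather than the O(N log N) of sorting everything.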

I tried it out on a database of all the cities in the world, and the VP tree search was 3978 times faster than a linear search through all the points. You can download the C++ program that uses the VP tree for this purpose here.

It is worth repeating that you must use a distance metric that satisfies the triangle inequality. I spent a lot of time wondering why my VP tree was not working. It turns out that I had not bothered to take the square root in the distance calculation. This step is important to satisfy the requirements of a metric space, because if straight-line distances satisfy a <= b + c, it does not necessarily follow that a² <= b² + c². For example, three points spaced 1 apart on a line give 2 <= 1 + 1, but 4 > 1 + 1.

Here is the output of the program when you search for cities by latitude and longitude.

Create took 15484122
Search took 36
ca,waterloo,Waterloo,08,43.4666667,-80.5333333
 0.0141501
ca,kitchener,Kitchener,08,43.45,-80.5
 0.025264
ca,bridgeport,Bridgeport,08,43.4833333,-80.4833333
 0.0396333
ca,elmira,Elmira,08,43.6,-80.55
 0.137071
ca,baden,Baden,08,43.4,-80.6666667
 0.161756
ca,floradale,Floradale,08,43.6166667,-80.5833333
 0.163351
ca,preston,Preston,08,43.4,-80.35
 0.181762
ca,ayr,Ayr,08,43.2833333,-80.45
 0.195739
---
Linear search took 143212
ca,waterloo,Waterloo,08,43.4666667,-80.5333333
 0.0141501
ca,kitchener,Kitchener,08,43.45,-80.5
 0.025264
ca,bridgeport,Bridgeport,08,43.4833333,-80.4833333
 0.0396333
ca,elmira,Elmira,08,43.6,-80.55
 0.137071
ca,baden,Baden,08,43.4,-80.6666667
 0.161756
ca,floradale,Floradale,08,43.6166667,-80.5833333
 0.163351
ca,preston,Preston,08,43.4,-80.35
 0.181762
ca,ayr,Ayr,08,43.2833333,-80.45
 0.195739

Construction

I'm too lazy to implement a delete or insert function. It is most efficient to simply build the tree by repeatedly partitioning the data. We build the tree from the top down from an array of items. For each node, we first choose a point at random, and then partition the list into two sets: the left children contain the points closer than the median distance, and the right children contain the points that are farther away. Then we recursively repeat this until we have run out of points.
// A VP-Tree implementation, by Steve Hanov. (steve.hanov@gmail.com)
// Released to the Public Domain
// Based on "Data Structures and Algorithms for Nearest Neighbor Search" by Peter N. Yianilos
#include <stdlib.h>
#include <algorithm>
#include <vector>
#include <stdio.h>
#include <queue>
#include <limits>

template<typename T, double (*distance)( const T&, const T& )>
class VpTree
{
public:
    VpTree() : _root(0) {}

    ~VpTree() {
        delete _root;
    }

    void create( const std::vector<T>& items ) {
        delete _root;
        _items = items;
        _root = buildFromPoints(0, items.size());
    }

    void search( const T& target, int k, std::vector<T>* results,
        std::vector<double>* distances) 
    {
        std::priority_queue<HeapItem> heap;

        _tau = std::numeric_limits<double>::max();
        search( _root, target, k, heap );

        results->clear(); distances->clear();

        while( !heap.empty() ) {
            results->push_back( _items[heap.top().index] );
            distances->push_back( heap.top().dist );
            heap.pop();
        }

        std::reverse( results->begin(), results->end() );
        std::reverse( distances->begin(), distances->end() );
    }

private:
    std::vector<T> _items;
    double _tau;

    struct Node 
    {
        int index;
        double threshold;
        Node* left;
        Node* right;

        Node() :
            index(0), threshold(0.), left(0), right(0) {}

        ~Node() {
            delete left;
            delete right;
        }
    }* _root;

    struct HeapItem {
        HeapItem( int index, double dist) :
            index(index), dist(dist) {}
        int index;
        double dist;
        bool operator<( const HeapItem& o ) const {
            return dist < o.dist;   
        }
    };

    struct DistanceComparator
    {
        const T& item;
        DistanceComparator( const T& item ) : item(item) {}
        bool operator()(const T& a, const T& b) {
            return distance( item, a ) < distance( item, b );
        }
    };

    Node* buildFromPoints( int lower, int upper )
    {
        if ( upper == lower ) {
            return NULL;
        }

        Node* node = new Node();
        node->index = lower;

        if ( upper - lower > 1 ) {

            // choose an arbitrary point and move it to the start
            int i = (int)((double)rand() / RAND_MAX * (upper - lower - 1) ) + lower;
            std::swap( _items[lower], _items[i] );

            int median = ( upper + lower ) / 2;

            // partition around the median distance
            std::nth_element( 
                _items.begin() + lower + 1, 
                _items.begin() + median,
                _items.begin() + upper,
                DistanceComparator( _items[lower] ));

            // what was the median?
            node->threshold = distance( _items[lower], _items[median] );

            node->index = lower;
            node->left = buildFromPoints( lower + 1, median );
            node->right = buildFromPoints( median, upper );
        }

        return node;
    }

    void search( Node* node, const T& target, int k,
                 std::priority_queue<HeapItem>& heap )
    {
        if ( node == NULL ) return;

        double dist = distance( _items[node->index], target );
        //printf("dist=%g tau=%g\n", dist, _tau );

        if ( dist < _tau ) {
            if ( heap.size() == k ) heap.pop();
            heap.push( HeapItem(node->index, dist) );
            if ( heap.size() == k ) _tau = heap.top().dist;
        }

        if ( node->left == NULL && node->right == NULL ) {
            return;
        }

        if ( dist < node->threshold ) {
            if ( dist - _tau <= node->threshold ) {
                search( node->left, target, k, heap );
            }

            if ( dist + _tau >= node->threshold ) {
                search( node->right, target, k, heap );
            }

        } else {
            if ( dist + _tau >= node->threshold ) {
                search( node->right, target, k, heap );
            }

            if ( dist - _tau <= node->threshold ) {
                search( node->left, target, k, heap );
            }
        }
    }
};
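The two pairs of `if` tests in the private `search` method are where the triangle inequality earns its keep. Here is a sketch that isolates them. `Visit` and `subtreesToVisit` are hypothetical names, not part of the class above, and the comments spell out the bound each test relies on.

```cpp
// The pruning test from VpTree::search, pulled out on its own.
// A node covers a ball of radius `threshold` around its vantage point,
// `dist` is the query's distance to that vantage point, and `tau` is the
// distance of the current k-th best match.
#include <cassert>

struct Visit {
    bool inside;   // may the left (closer) subtree hold a better match?
    bool outside;  // may the right (farther) subtree hold a better match?
};

Visit subtreesToVisit(double dist, double tau, double threshold) {
    Visit v;
    // By the triangle inequality, every point x inside the ball has
    // d(q, x) >= dist - threshold, so it can beat tau only if dist - tau <= threshold.
    v.inside = dist - tau <= threshold;
    // Every point x outside the ball has d(q, x) >= threshold - dist,
    // so it can beat tau only if dist + tau >= threshold.
    v.outside = dist + tau >= threshold;
    return v;
}
```

While the heap holds fewer than k items, `tau` is the largest `double`, so both subtrees are visited. As good matches accumulate, `tau` shrinks and whole subtrees start to fail these tests. Visiting the side containing the query first, as the class does, tends to shrink `tau` sooner.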


Why you should go to the Business of Software Conference Next Year
Posted on: 2011-10-29 00:19:36

Most people, having already paid $2000.00 of their hard earned money, and then having flown, driven, or otherwise travelled to Boston to attend a conference, and then having paid an additional $250/night plus $33/night parking and "tourism taxes" to the Seaport Hotel -- most people, after all this, are unlikely to say that it was a waste of time and they should have stayed home watching the remaining salvaged episodes of Doctor Who on Netflix.

In fact, I found it quite useful.

The talks by Clayton Christensen (author of The Innovator's Dilemma), Rory Sutherland (expert on behavioural economics) and the dozens of entrepreneurs (both serial and parallel) were all fascinating and useful, and they were all broadcast for free, and they will soon be up for streaming, for free.

So why go through all of this effort to physically go to the conference?


One of the conference rooms at Business of Software 2011.

What the World Trade Center in Boston lacks in number of bathrooms, it more than makes up for in hallways. It has roughly 1000 miles of hallways in which you can bump into successful business people. And every one of them is trying to meet you and get your take on important, urgent business-related matters like, "Have you seen an empty bathroom?"

Seriously, when I'm not at the conference and people ask what I do, I have learned to say something like "I do computers". People here understand when I talk about NoSQL databases, SaaS models, and programmer development tools. The amount of time until their eyes glaze over is well over the 60-second mark.

You also get some inside info. People aren't shy talking about their pricing. How much does the super-mega-ultra corporate option cost? The one where instead of a price, it says "Call"? These people will tell you, because they don't get to talk about it much, and they are honestly trying to help.

I talked to CEOs and CTOs of 3- to 30-person companies. I talked to VPs, cloud engineers, and intrapreneurs at big companies. For many, this is the first opportunity to talk to an outsider about their businesses. It is like psychotherapy. Often they would come to a sudden realization. "Hey," a micro-ISV would say, "I just have a fear of releasing the next version because it's missing some difficult features. I should just do it anyway!" If you go to this conference, you probably already know what you should do to improve your business. But having Jason Cohen or some seasoned CEO tell you in person moves it up onto the to-do list.

General trends

  • Disruption - Disruption is big. If you're not disruptive, you might as well be selling mainframes and typewriters. Companies are disrupting each other at an astounding rate. Sometimes, while one company is busy disrupting an industry, another one will sneak up behind it and try to disrupt it when it is not looking. That is why companies need to be agile and pivot frequently.
  • Metrics - The info-geeks have taken over. Founders are demanding dashboards for their business, updated in real time. But not only for themselves -- every click of the web site, and every cancellation is streamed to every employee to give an accurate picture of the health of the company. A special version containing only the "Customer Happiness Index" and a huge happy face is streamed to the investors.
  • Crowd-sourced employee recognition - At least three companies are working on this. It can be hard for bosses to identify their best contributors to allocate bonuses. The idea is to crowd-source this from their workforce. "So we'll give them a button -- so whenever anybody does something nice, other people will just push it and they get a -- a pony point --- yeah! And then I just have to add them all up to find the best contributors!" If you've worked at a large company for more than a year, you already know what an awesome idea this is. Just rename "pony" to "stab" and invert the score.
  • Skype - Ask anybody, in tiny or large companies. Odds are that they bypass their Enterprise Collabosoft GrouperWare system and secretly use Skype to communicate. Just a minute while I go privately Skype to people about why Microsoft should acquire my startup.
  • Dishonesty - Jason Cohen gave a talk about how honesty in business can differentiate you. If you are a small company, he says, you should not try to hide it. Companies will be refreshed by your truthfulness, and it sets the correct expectations at the outset. Most of the attendees believe honesty is a great idea. Companies should all be honest! Because Jason Cohen says it pays! But if you are in a uniquely special business, such as storing data securely in the cloud, or selling software as a service, or selling licensed software, or you offer a limited or very diverse product line, or you have competition -- in these very special situations, honesty definitely will never work. At least, that's the going opinion.

I hope you're convinced of the value that Business of Software has to offer, and I hope to see you there next year. I should be finished Doctor Who by then.


Email
steve.hanov@gmail.com
