Tag: data

Dec 26 2014

The Great Boxing Day Data Cleanse of 2014

I just spent around an hour shrinking my Facebook friends list from 1,500+ to 535. I ignored another 2,000 friend requests. I made my entire Facebook feed from the beginning of time private, which eliminated 33,000+ followers (dear Facebook followers – you really meant to follow me on Twitter, that’s where all the public fun is.) I turned off all my email notifications.

Hint – if you want to do stuff like this, use the iOS app instead of the web app – it’s so, so, so much faster. Last night I tried to do this on the Facebook web app in front of the TV. It was a total fail – every few unfriends caused the page to refresh and I had to start scrolling all over again. This morning I was pleasantly surprised with how much better / cleaner / faster it was with the iOS app.

I cleared out all my outstanding LinkedIn friend requests. I’m much more promiscuous there and will accept anyone whom I recognize, who writes me a personal note, or who seems interesting. I turned off all my email notifications and re-inserted LinkedIn in my Daily browser folder.

I spent some time fixing up all the friend requests in Goodreads. I don’t care who follows me, but I got rid of the folks I follow who I don’t know and focused that list a lot better to see if the feed would be useful going forward.

I just deleted everything off my iPhone that I never use and put the infrequently used stuff in various folders. That took things from eight screens to two. Charm King – how the fuck did you end up on my iPhone?

It will continue. Feedly – clean up feeds and add ones from companies in our portfolio that I haven’t been following. Consolidate all photos and music in one place and make sure they are accessible from all computers. And whatever else I run into.

There’s something very satisfying about the winter cleaning that I seem to do every year.

Jun 9 2013

Does The Government Already Have All Of Our Data?

Near the end of the week last week, the latest “the US government is spying on US citizens” scandal broke. For 24 hours I tried to ignore it but once big tech companies, specifically Facebook, Google, and Yahoo, started coming out with their denials about being involved in PRISM, I got sucked into all the chatter. I was able to ignore it yesterday because I took a digital sabbath but ended up reading a bunch of stuff about it this morning.

While I’m a strong believer in civil liberties and am opposed to the Patriot Act, I long ago gave up the notion that we have any real data privacy. I’ve regularly fought against attempts at outrageous new laws like SOPA/PIPA but I’m not naive and realize that I’m vastly outgunned by the people who want this kind of stuff. Whenever I get asked if I’ll write huge checks to play big money politics against this stuff, I say no. And recently, I’ve started quoting Elon Musk’s great line at the All Things Digital Conference, “If we give in to that, we’ll get the political system we deserve.”

I read around 50 articles on the subject this morning. I’m no clearer on what is actually going on, as the amount of vagueness, statements covered in legal gunk, illogical statements, and misdirection is extraordinary, even for an issue like this one.

Following are some of the more interesting things I read today.

And I always thought PRISM was about teleportation.

And finally, the Wikipedia article, like all Wikipedia articles, is the definitive source of all PRISM information at this point, at least to the extent that anything around PRISM is accurate.

Aug 21 2010

Using Open Source to Bootstrap Your Data Service

Last week SimpleGeo and their partner Stamen Design jointly released a project they have been working on together called Polymaps.  It’s absolutely beautiful and a stunning example of what you can do with the SimpleGeo API.  They’ve released the Polymaps source code on GitHub so any developer can quickly see how the API is used, play around with real production code, and modify the base examples for their own use.

When I first started programming, it was 1979.  I started on an Apple II – I learned BASIC, Pascal, and 6502 Assembler.  I studied every page and example in the Apple II Reference Manual (the “Red Book”).  Whenever I got source code for any application at a user group meeting, I stared at it, played with it, and tried to understand what it was doing.

When I started programming on an IBM PC in 1983, I did exactly the same thing.  I spent a lot of time with Btrieve and there were endless source code examples to build on.  I had a few friends that were also using BASIC + the IBM BASIC Compiler + Btrieve so we shared code (by handing each other floppy disks).  We built libraries that did specific things and as each of us improved them, we shared them back with each other.

In my first company, we were heavy users of Clarion.  While Clarion was compiled, it still came with a solid library of example code, although we quickly built our own libraries that we used throughout the company as we grew.  When I started investing in companies that were building Web apps in 1994, it was once again all HTML / source code and examples everywhere.  My friends at NetGenesis (mostly Raj Bhargava and Eric Richard) wrote one of the first Web programming books – Build a Web Site: The Programmer’s Guide to Creating, Building and Maintaining a Web Presence – I vaguely remember NetGenesis getting paid something like $25,000 (which was a ton of money to them at the time) to write it.

In the last few months, the phrase “data as a service” has started to become popular.  I’m not totally sure I understand what people mean by it, even though I’ve been involved in several larger discussions about it, and today I noticed an article in the New York Times titled “Data on Demand Is an Opportunity.”  I’ve invested in several companies that seem to fit within this categorization, including SimpleGeo, Gnip, and BigDoor, but we don’t really think about them as “data as a service” companies (SimpleGeo and Gnip are in our Glue theme; BigDoor is in our Distribution theme).

When I reflect on all of this, it seems painfully obvious to me (and maybe to you also) that the best way to popularize “data as a service” is to start with an API (which creates the revenue model dynamic) and build a bunch of open source examples on top of it.  Your goal should be to make it as simple as possible for a developer to immediately start using your API in ways relevant to them.  By open sourcing the starting point, you both save an enormous amount of time and give the developers a much more interactive way to learn rather than forcing them to start from scratch and figure out how the API works.

I like how SimpleGeo has done this and realize that this can apply to a bunch of companies we are both investing in and looking at.  I’m not sure that it has anything to do with the construct of “data as a service” (which I expect will quickly turn into DaaS) but it does follow from the long legacy of how people have learned from each other around the creation of software, especially around new platforms.

While we are using SFLA (silly four-letter acronyms – we’ve got PaaS and IaaS, along with our old friend SaaS), any ideas what ZaaS is going to stand for?

Jan 31 2009

Entering Data

I weigh 209.4 this morning.  That’s down from 220 when I Declared A Jihad on My Weight on 10/27/08 although it doesn’t look like I’ll make my Anti-Charity goal of 200 by 1/31/09 (more on that in a post on 2/1/09).

I was thinking about my weight this morning as I entered it into the online system at GoWear.  I thought about it again when I entered it into Gyminee.  And then into Daytum.  I’m going for a run in a little while so I’ll enter it again into TrainingPeaks.

Here’s what I’m doing:

  1. Go to the appropriate web site.
  2. Choose the appropriate place to enter the data.
  3. Type 209.4 and press return.

Four times.  Every day.  Pretty ridiculous.  If you reduce the data to its core elements, they are:

  1. Web site id [GoWear, Gyminee, Daytum, TrainingPeaks]
  2. User Id (almost always bfeld)
  3. Timestamp (or two fields – date, time) – automatically generated by my computer
  4. Weight
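To make the redundancy concrete, here is a minimal sketch of the record described in the list above, assuming a hypothetical `WeightEntry` type and `manual_entries` helper (neither comes from any of these sites; they are illustration only). It models today’s workflow: one reading from the scale turns into four nearly identical records, one per site.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Hypothetical record type mirroring the four core elements listed above.
@dataclass(frozen=True)
class WeightEntry:
    site_id: str         # e.g. "GoWear", "Gyminee", "Daytum", "TrainingPeaks"
    user_id: str         # almost always "bfeld"
    timestamp: datetime  # generated automatically by the computer
    weight: float        # the only value actually measured

SITES = ["GoWear", "Gyminee", "Daytum", "TrainingPeaks"]

def manual_entries(user_id: str, weight: float, when: datetime) -> List[WeightEntry]:
    """Hypothetical helper: the same reading typed in once per site."""
    return [WeightEntry(site, user_id, when, weight) for site in SITES]

entries = manual_entries("bfeld", 209.4, datetime(2009, 1, 31, 7, 0))
print(len(entries))                 # 4 records...
print({e.weight for e in entries})  # ...but only one distinct measurement
```

Everything except `weight` is either constant or generated by a machine, which is the point: a human should only ever supply the one number.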

The only actual piece of data that I need to measure is weight.  I measure this by standing on a scale each morning.  The scale is a fancy one – it cost about $100, looks pretty, and has a bunch of extra things I don’t look at such as BMI.  I have an identical scale in my house in Keystone (although its battery is dead and needs to be replaced).

Some day, in the future, I’ll be able to step on the scale.  And that will be it.  My weight will automatically go into whatever online systems I want it to.  I won’t have to do anything else. 

Of course, one of the assumptions is that my scale(s) are “network compatible”.  While you may joke that this is the age old “connect my refrigerator to the Internet problem” (and it is), I think it’s finally time for this to happen.  As broadband and wifi become increasingly ubiquitous and inexpensive, there is no reason any electronic device shouldn’t be IP connected, in the same way that microprocessors are now everywhere and pretty much everything has a clock in it (even if some of them still blink 12:00.)

So, accept this assumption.  Then, I’m really only talking about a “Brad-centric” data payload.  While I’ll have a lot more data than simply weight that I might want in my payload, let’s start with the simple case (weight).  Right now, we are living in a system-centric world where data is linked first to a system and then a user.  Specifically, you have to operate in the context of the system to create data for a user.

Why not flip this?  Make things user-centric.  I enter my data (or a machine / device collects my data.)  I can configure my data inputs to feed data into “my data store” (which should live on the web / in the cloud).  Then, systems can grab data from my data store automatically.  All I have to do is “wire them up” which is a UI problem that – if someone is forward thinking enough – could also be solved with a single horizontal system that everyone adopts.
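Here is a rough sketch of that flipped, user-centric model, assuming a hypothetical `PersonalDataStore` (no real service or API is implied). Data is keyed to the user first; systems are “wired up” to the metrics they care about. In this sketch each new reading is pushed to the wired-up systems, and `latest` also lets a system grab the most recent value itself, so both a push and a pull style are shown.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List, Optional

@dataclass
class Reading:
    user_id: str
    metric: str        # e.g. "weight"
    value: float
    timestamp: datetime

# A "system" is modeled as any callable that accepts a Reading.
Subscriber = Callable[[Reading], None]

@dataclass
class PersonalDataStore:
    """Hypothetical user-owned store: the user comes first, systems second."""
    user_id: str
    readings: List[Reading] = field(default_factory=list)
    subscribers: Dict[str, List[Subscriber]] = field(
        default_factory=lambda: defaultdict(list))

    def wire_up(self, metric: str, subscriber: Subscriber) -> None:
        """Connect a system to one of the user's metrics."""
        self.subscribers[metric].append(subscriber)

    def record(self, metric: str, value: float, when: datetime) -> None:
        """Store a reading once, then push it to every wired-up system."""
        reading = Reading(self.user_id, metric, value, when)
        self.readings.append(reading)
        for subscriber in self.subscribers[metric]:
            subscriber(reading)

    def latest(self, metric: str) -> Optional[Reading]:
        """Pull style: a system grabs the most recent reading for a metric."""
        matches = [r for r in self.readings if r.metric == metric]
        return max(matches, key=lambda r: r.timestamp) if matches else None

# Stand-ins for the four sites; each just collects the values it receives.
received: Dict[str, List[float]] = {
    site: [] for site in ["GoWear", "Gyminee", "Daytum", "TrainingPeaks"]}

def make_site_inbox(site: str) -> Subscriber:
    return lambda reading: received[site].append(reading.value)

store = PersonalDataStore("bfeld")
for site in received:
    store.wire_up("weight", make_site_inbox(site))

# One step on a networked scale, and every site gets the number.
store.record("weight", 209.4, datetime(2009, 1, 31, 7, 0))
print(received)
```

The reading is entered once and fans out to every system, instead of being typed four times. A real version would need authentication, permissions per system, and a network transport, all of which are left out here.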

Right now there is a huge amount of activity around the inverse of this problem – taking widely diffuse data and re-aggregating it around a single user id.  This deals with today’s current reality of how data is generated (system-centric) but doesn’t feel sustainable to me as user-generated data continues to proliferate geometrically.

Enough.  As I said in my tweet earlier today, “thinking about data.  thinking about running.  thinking about thinking.”  Time to go run and generate some more data.