As part of YC Hacks this past weekend, I worked on a magic remote app thing with Damon Doucet, David Wetterau, and Predrag Gruevski.
The basic idea is that your phone should be aware of what you’re doing on your computer, and try to help you out through context-sensitive, application-specific controls.
For example, if you’re watching a movie on Netflix, your phone should automatically bring up a huge play/pause button - but as soon as you switch to Keynote, it should instantly transform into a “next slide” button.
To show our concept, we built these sorts of custom controls for Spotify, Netflix, Reddit, Gmail, Facebook, YouTube, and Keynote.
I love the vision behind Apple’s Handoff - basically that I should be able to do whatever I want on my desktop, walk away, and then use my phone to pick up right where I left off. Unfortunately this only works in a world where you’re only using native apps built for Apple’s ecosystem.
As part of Dropbox Hack Week, I started tinkering with building a similar system using Dropbox’s sync. It’s entirely platform agnostic, so anything I do on my Mac (native, web, etc.) can sync to almost any other device - not just iPhones and iPads, but also any Android device, as well as other computers.
To achieve this, I wrote a super hacky script that runs continuously on my laptop and saves information about active applications and their state to a Dropbox datastore. This all syncs to a special app on my phone, which can then instantaneously kick me into the appropriate mobile equivalent.
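The laptop side of that loop can be sketched roughly as follows. This is only an illustration under loud assumptions: it writes a JSON file (which could live in a synced folder) instead of using a Dropbox datastore, the names `make_record` and `publish_if_changed` are hypothetical, and detecting the active application and its state is left to the caller.

```python
import json
import time
from pathlib import Path


def make_record(app, state):
    """Bundle the active app and its state with a timestamp."""
    return {"app": app, "state": state, "updated_at": time.time()}


def publish_if_changed(path, app, state):
    """Write the record to `path` only when the app or state changed.

    Returns True if a new record was written, False otherwise.
    The file is a stand-in for a synced store (an assumption).
    """
    path = Path(path)
    if path.exists():
        previous = json.loads(path.read_text())
        if previous.get("app") == app and previous.get("state") == state:
            return False  # nothing new to sync
    path.write_text(json.dumps(make_record(app, state)))
    return True
```

A polling script would call `publish_if_changed` every second or so; skipping unchanged states keeps the synced record from churning while nothing is happening on the laptop.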
In the video above I demonstrate syncing from Native Mac <-> Native iOS, Web <-> iOS Safari, as well as Web <-> Native iOS.
Came across Chris Harrison’s pseudo-3d video conferencing paper, and decided to recreate the effect just for fun. The basic idea is that you can separate the subject and background into distinct layers and then transform them accordingly in 3D space to create artificial depth. Of course it’s also pretty straightforward to apply any other transformation - for example, here I resize myself as the camera pans.
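To see why separate layers produce a sense of depth, here is a small sketch using a plain pinhole projection. It is not the method from the paper or from my recreation, just the underlying geometry; `project_x`, `layer_shift`, and the depth values are made up for illustration.

```python
def project_x(world_x, depth, camera_x=0.0, focal_length=1.0):
    """Pinhole projection of a point's horizontal position onto the screen."""
    return focal_length * (world_x - camera_x) / depth


def layer_shift(depth, camera_dx, focal_length=1.0):
    """How far a layer at `depth` slides on screen when the camera moves by camera_dx."""
    moved = project_x(0.0, depth, camera_dx, focal_length)
    rest = project_x(0.0, depth, 0.0, focal_length)
    return moved - rest


# Hypothetical depths: the subject layer is near, the background layer is far.
subject_shift = layer_shift(depth=1.0, camera_dx=0.1)
background_shift = layer_shift(depth=4.0, camera_dx=0.1)
```

The nearer layer moves four times as far as the layer four times deeper, and that difference in motion is what reads as depth.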
Rather than incorporating voice control through conversational agents like Siri, I’ve been tinkering with building a system that can passively listen to conversations and generate reminders. Simply by talking to a friend and asking “Want to grab coffee tomorrow?”, my phone can note my intent and give me a friendly reminder when I wake up the next day. Similarly, I can tell someone “I’ll pick up milk from Walgreens on my way home”, and have a reminder to “pick up milk” associated with a location called “Walgreens”.
Currently my prototype is able to generate reminders out of (admittedly contrived) conversations - post transcription, it identifies sentence candidates, isolates relevant time and/or location triggers, and then trims the transcription into an appropriate reminder. For now the parsing task is entirely running on my laptop, streaming reminders to the phone interface as they are generated. (Note - in the video I incorrectly call voice-transcription “voice recognition”.)
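The three steps above (find candidate sentences, isolate triggers, trim to a reminder) can be sketched with a few regular expressions. This is a toy illustration rather than the prototype's actual parser: the patterns, the `extract_reminders` name, and the assumption that a location follows the word "from" are all made up for the example.

```python
import re

# Hypothetical cue phrases that mark a sentence as a reminder candidate.
INTENT = re.compile(
    r"\b(?:want to|i['’]ll|i will|let['’]s|remind me to)\s+(?P<task>.+)",
    re.IGNORECASE,
)
# Hypothetical time triggers.
TIME = re.compile(r"\b(?:tomorrow|tonight|today|next week)\b", re.IGNORECASE)
# Assumption: a location is a capitalized word following "from".
LOCATION = re.compile(r"\bfrom\s+(?P<place>[A-Z]\w+)")


def extract_reminders(transcript):
    """Turn a transcript into a list of {task, time, location} reminders."""
    reminders = []
    for sentence in re.split(r"(?<=[.?!])\s+", transcript.strip()):
        intent = INTENT.search(sentence)
        if not intent:
            continue  # not a candidate sentence
        task = intent.group("task").rstrip(".?! ")
        time_match = TIME.search(task)
        place_match = LOCATION.search(task)
        # Trim the task at the earliest trigger so only the action remains.
        starts = [m.start() for m in (time_match, place_match) if m]
        cut = min(starts) if starts else len(task)
        reminders.append({
            "task": task[:cut].strip(),
            "time": time_match.group(0).lower() if time_match else None,
            "location": place_match.group("place") if place_match else None,
        })
    return reminders
```

Run on the two example sentences from above, this yields a "grab coffee" reminder with the time trigger "tomorrow" and a "pick up milk" reminder with the location "Walgreens". Anything less contrived would need real language parsing rather than fixed patterns.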
As usual there are numerous problems with this sort of tool (not to mention my implementation), but I’m excited about a future where computers become a little more helpful by simply observing (and not changing) my existing behavior.
Currently working on a prototype that needs to be able to identify object boundaries. While initially I assumed clustering by color would be sufficient, this quickly fell apart as soon as I introduced a complex multicolored object (ex: one with highly reflective surfaces, as shown above).
Instead of trying to distill object boundaries from a single frame, a relatively inexpensive way to identify distinct objects is to track clusters that always move together, and then merge them. For my prototype, I’m merging clusters with moving centroids that are always within a certain distance of each other (of course with thresholds). Measuring distances between centroids - instead of determining some 2D/3D transformation - is relatively inexpensive, and seems to work well enough when objects are simply moving in 2D.
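A minimal sketch of that merge step, assuming each color cluster has already been tracked into a list of per-frame centroids. The function names and both thresholds (`max_distance`, `min_motion`) are hypothetical, and the union-find grouping is just one convenient way to merge linked clusters.

```python
from itertools import combinations
from math import dist


def path_length(track):
    """Total distance a centroid travels across consecutive frames."""
    return sum(dist(a, b) for a, b in zip(track, track[1:]))


def merge_moving_clusters(tracks, max_distance=50.0, min_motion=10.0):
    """Group clusters whose moving centroids stay close in every frame.

    tracks maps a cluster id to a list of (x, y) centroids, one per frame.
    Returns a list of sets of cluster ids, one set per merged object.
    """
    parent = {cid: cid for cid in tracks}

    def find(cid):
        # Follow parent links to the group's representative (with path halving).
        while parent[cid] != cid:
            parent[cid] = parent[parent[cid]]
            cid = parent[cid]
        return cid

    # Only clusters that actually move are candidates for merging.
    moving = [cid for cid, t in tracks.items() if path_length(t) >= min_motion]
    for a, b in combinations(moving, 2):
        if all(dist(p, q) <= max_distance for p, q in zip(tracks[a], tracks[b])):
            parent[find(a)] = find(b)

    groups = {}
    for cid in tracks:
        groups.setdefault(find(cid), set()).add(cid)
    return list(groups.values())
```

For example, a "red" cluster and a "shiny" cluster that slide across the desk side by side end up in one group, while a stationary "desk" cluster stays on its own.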
I aimed my webcam at my desk and tried to see if there was anything interesting I could do given a live video of my sketchbook.
While this demonstration frankly isn’t particularly useful or practical, it was really refreshing to try making something that requires physical manipulation (over a mouse or touchscreen). Tangible interfaces are fascinating, and I’m excited to see what the future holds.
Some other ideas I considered included automatically digitizing my sketchbook (presumably for backup and search) and versioning (to see how a doodle evolved).
Lately more and more of my prototypes have involved some sort of user specific machine learning, so I’ve been tinkering with ways to make this a little easier for myself to implement.
I ended up building a very rough Naive Bayes classifier on top of the Dropbox Datastore API, making it fairly straightforward to give any user a classifier that’s specific to them. Check it out at https://github.com/ryhan/automator.
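For a sense of what such a classifier involves, here is a minimal word-count Naive Bayes with add-one smoothing. It is not the code or API from the repository above: the `NaiveBayes` class and its method names are made up for this sketch, and the counts live in memory, whereas a per-user version would persist them in that user's datastore.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.label_counts = Counter()            # training examples per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()

    def train(self, text, label):
        words = text.lower().split()
        self.label_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most likely label, or None if nothing was trained."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            n_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # Add-one smoothing so unseen words do not zero out a label.
                seen = self.word_counts[label][word] + 1
                score += math.log(seen / (n_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because the model is just a handful of counters, giving each user their own classifier amounts to storing one small set of counts per user.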
Down the road, it would be really neat if a classifier could be shared between applications (for example, so that your thermostat’s knowledge about your temperature preferences could be carried over to your car).
To ensure that it’s somewhat reasonable to use, I also put together a sample news reading app that pulls articles from a bunch of feeds, lets the user specify which articles they like, and then lets my code automatically group and highlight the best articles.