This past summer, TrueCar undertook a bold experiment. We brought in around 40 bright, motivated interns from top California universities and let them loose on our data and codebases. The hypothesis was that, given unfettered access, a dedicated group of interns could build the foundation for new products at TrueCar. The experiment worked.
I was the lead advisor for a group of six interns interested in computer vision and machine learning: TrueVision. The group brought together diverse technical backgrounds, from Ph.D. candidates in medical imaging to undergrads just finishing their general coursework. By the time the summer drew to a close, each of them had developed a computer vision product that will directly impact the TrueCar experience. For some extra fun along the way, everyone in the group, including me, learned how to train a convolutional neural network (CNN).
Prior to the intern program, I worked with several other TrueCar engineers to plan the future of image processing at TrueCar. Given an image of a car, we would like to be able to determine the following:
- Whether the image is an interior or exterior shot
- The location of the vehicle in the image
- The position or angle of the car in the image (e.g., driver side or passenger side)
- The body style (e.g., sedan, coupe, SUV, or truck)
- The color
- And ultimately the make and model
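Taken together, these checks form a natural cascade: an interior/exterior gate first, then the view-specific classifiers. Here is a minimal sketch of that orchestration with stub classifiers standing in for trained CNNs; all of the function names and labels are illustrative assumptions, not TrueCar's actual code.

```python
# Illustrative cascade of vehicle-image classifiers. Each stage is a stub
# that a real system would back with a trained CNN; names and labels here
# are assumptions for the sketch. Images are represented as plain dicts.

def classify_scene(image):
    """Stub: is the shot an 'exterior' or 'interior' view?"""
    return image.get("scene", "exterior")

def classify_angle(image):
    """Stub: driver side, passenger side, front, rear, ..."""
    return image.get("angle", "driver_side")

def classify_body_style(image):
    """Stub: sedan, coupe, SUV, truck, ..."""
    return image.get("body_style", "sedan")

def classify_color(image):
    """Stub: dominant exterior color."""
    return image.get("color", "white")

def classify_make_model(image):
    """Stub: the ultimate goal, make and model."""
    return image.get("make_model", ("Honda", "Civic"))

def describe(image):
    """Run the cascade, skipping exterior-only stages for interior shots."""
    result = {"scene": classify_scene(image)}
    if result["scene"] == "exterior":
        result["angle"] = classify_angle(image)
        result["body_style"] = classify_body_style(image)
        result["color"] = classify_color(image)
    result["make_model"] = classify_make_model(image)
    return result
```

Gating on the interior/exterior decision means the downstream classifiers only ever see the kind of image they were trained on.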
When the summer started, we expected that my group of six interns would be able to accomplish one or two of these items. By the time the summer ended, they had accomplished all of these goals, and more.
These projects brought real intelligence to the TrueCar Image Processor and will have a direct impact on the customer experience. For instance, used-vehicle images are currently displayed in whatever order the dealer supplies. With the new experience, we will be able to classify and sort the images by type, such as exterior versus interior, or driver side versus passenger side. The result is a page that gives the customer a much more uniform and seamless experience: the first view each customer sees for a vehicle will be the driver-side “triumph” view. Currently, customers search for vehicles on our site by clicking through make/model tabs found on various pages. With these new features, a customer can simply take a picture of any car on a dealer lot or on the street and see the TrueCar price curve for that make and model.
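Once every photo carries a predicted view label, reordering a dealer's gallery reduces to a sort by preference. A small sketch, assuming a hypothetical set of labels with the driver-side exterior ("triumph") view ranked first:

```python
# Sketch: order a vehicle's photos so the driver-side exterior ("triumph")
# shot leads the gallery. The label vocabulary is illustrative, not
# TrueCar's actual schema.

VIEW_PRIORITY = {
    "exterior_driver_side": 0,
    "exterior_front": 1,
    "exterior_passenger_side": 2,
    "exterior_rear": 3,
    "interior": 4,
}

def sort_gallery(photos):
    """photos: list of (filename, predicted_view) pairs.

    Unknown labels sort to the end rather than raising.
    """
    return sorted(
        photos,
        key=lambda p: VIEW_PRIORITY.get(p[1], len(VIEW_PRIORITY)),
    )
```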
There were two keys to the success of the computer vision program. First, we had a group of smart, motivated interns. Second, we had a massive imaging dataset (more than 100 million vehicle images), most of which was already tagged (classified by color, body style, make, and model). Where we were missing tags (e.g., interior/exterior), we posted large batches of images to Mechanical Turk for classification by humans. The Turkers classified over 100,000 images in only a couple of hours, at minimal cost to TrueCar. We also found that we could manufacture large tagged datasets from smaller sets of already tagged data. For instance, to train the angle classifier, we pasted cutouts of vehicles whose position we already knew onto a wide variety of backgrounds, multiplying the data available for training.
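The cutout trick is worth spelling out: one labeled cutout pasted onto N backgrounds at M offsets yields N × M labeled training samples. A toy sketch of the idea, using plain 2-D pixel grids with `None` as the transparent value; a real pipeline would use an imaging library such as Pillow or OpenCV rather than nested lists.

```python
# Sketch of the data-manufacturing trick: composite a vehicle cutout
# (whose angle label is already known) onto many backgrounds to multiply
# the tagged training examples. Toy representation: images are 2-D lists
# of pixel values, and None marks a transparent cutout pixel.

def composite(background, cutout, top, left):
    """Return a copy of `background` with `cutout` pasted at (top, left).

    Transparent (None) pixels in the cutout leave the background visible.
    """
    out = [row[:] for row in background]  # deep-ish copy, rows duplicated
    for i, row in enumerate(cutout):
        for j, pixel in enumerate(row):
            if pixel is not None:
                out[top + i][left + j] = pixel
    return out

def augment(cutout, label, backgrounds, offsets):
    """Manufacture one labeled sample per (background, offset) pair."""
    return [
        (composite(bg, cutout, top, left), label)
        for bg in backgrounds
        for top, left in offsets
    ]
```

Because the angle label travels with the cutout, every manufactured sample is tagged for free.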
Because the datasets we were working with were so large, it was easy to build massive training sets and still hold out equivalently large testing sets. In fact, our make/model classifier outperformed the state-of-the-art academic CNN, owing almost entirely to the fact that our training set was more than 100 times larger.
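At this scale, even a modest held-out fraction is enormous in absolute terms. A minimal shuffled split, sketched with Python's standard library only (the function name and defaults are illustrative):

```python
import random

def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a labeled dataset and hold out a test set.

    With tens of millions of images, even test_fraction=0.2 leaves a
    test set larger than many academic training sets.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```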
This summer we demonstrated the potential of empowered interns let loose on big data to change a company. We believe that this formula will continue to pay off in future intern classes over the coming years.
— Dr. Jason Melbourne (TrueCar Data Developer)