Project Title: Image Captioning

Project Idea

We want to train a model that detects salient details in an image (such as a dog and a toy) and then outputs a sentence describing the image.

For example, given an image of a brown dog playing with a blue and yellow toy, the model should detect both objects and output a sentence like “A dog is trying to catch a blue and yellow toy.”
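As a rough sketch of the approach, the encoder-decoder model below (in the style of the Show and Tell paper listed under Papers) pairs a pretrained CNN image encoder with an LSTM caption decoder in TensorFlow. The layer sizes, vocabulary size, and maximum caption length are placeholder assumptions, not final design choices.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical sizes; the real values depend on the vocabulary and caption lengths.
VOCAB_SIZE = 10000
MAX_LEN = 20
EMBED_DIM = 256

# Encoder: a pretrained CNN (InceptionV3 here) maps an image to a feature vector.
# Images should be preprocessed with tf.keras.applications.inception_v3.preprocess_input.
cnn = tf.keras.applications.InceptionV3(
    include_top=False, pooling="avg", weights="imagenet")
cnn.trainable = False

image_in = layers.Input(shape=(299, 299, 3), name="image")
img_feat = layers.Dense(EMBED_DIM, activation="relu")(cnn(image_in))

# Decoder: an LSTM reads the caption generated so far.
caption_in = layers.Input(shape=(MAX_LEN,), name="caption_prefix")
embedded = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_in)
lstm_out = layers.LSTM(EMBED_DIM)(embedded)

# Merge image and text features and predict the next word of the caption.
merged = layers.add([img_feat, lstm_out])
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(
    layers.Dense(EMBED_DIM, activation="relu")(merged))

model = Model(inputs=[image_in, caption_in], outputs=next_word)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()
```

During training the decoder is fed ground-truth caption prefixes (teacher forcing); at inference time words are generated one at a time and fed back into the caption input.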

Dataset Details

We will use Flickr8K, Flickr30K, and COCO.
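For data preparation, the helper below sketches how the Flickr8K caption annotations could be loaded, assuming the usual Flickr8k.token.txt layout in which each line pairs an image filename (with a #0–#4 caption index) and a tab-separated caption; the file path in the usage example is hypothetical.

```python
from collections import defaultdict

def load_flickr8k_captions(token_path):
    """Parse the Flickr8K token file into {image_filename: [captions]}.

    Assumes lines of the form:
        1000268201_693b08cb0e.jpg#0<TAB>A child in a pink dress is climbing ...
    """
    captions = defaultdict(list)
    with open(token_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            image_id, caption = line.split("\t", 1)
            image_name = image_id.split("#")[0]  # drop the "#0".."#4" suffix
            captions[image_name].append(caption.lower())
    return captions

# Example usage (path is hypothetical):
# caps = load_flickr8k_captions("data/Flickr8k.token.txt")
# print(len(caps), "images with captions")
```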

Software

Amazon Web Services

FloydHub

TensorFlow

Papers

Deep Visual-Semantic Alignments for Generating Image Descriptions

Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge

Team Members

Yizhen Chen

Tiancheng Luo

Progress Milestones

March 8th: Project Proposal Due

March 11th - March 17th: Read papers and prepare the data.

March 18th - March 24th: Design the model.

March 25th - March 31st: Train the model to detect details from the images.

April 1st - April 7th: Improve the model. Prepare the progress report.

April 5th: Progress Report Due

April 8th - April 14th: Design and train the model that combines the detected details into a sentence.

April 15th - April 21st: Improve the model.

April 22nd - April 28th: Improve the model.

April 29th - May 3rd: Finalize the project and prepare the final report.

May 3rd: Final Report Due

Progress Report

Click here to view our progress report.