<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Bitsy AI Labs]]></title><description><![CDATA[👋 Hi, my name is Leigh. I'm building PrintNanny, an automation assistant for 3D printers. I write about embedded AI/ML products and my journey from a software Tech Lead to a hardware Founder.]]></description><link>https://bitsy.ai/</link><image><url>https://bitsy.ai/favicon.png</url><title>Bitsy AI Labs</title><link>https://bitsy.ai/</link></image><generator>Ghost 5.22</generator><lastBuildDate>Fri, 10 Apr 2026 19:38:25 GMT</lastBuildDate><atom:link href="https://bitsy.ai/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[3 Key Ingredients for Embedded Computer Vision Apps]]></title><description><![CDATA[I'll cover the key takeaways I learned from building PrintNanny.ai, a monitoring/automation system for 3D printer farms.]]></description><link>https://bitsy.ai/3-key-ingredients-for-embedded-computer-vision-apps/</link><guid isPermaLink="false">63cf155fef6e2100016e69e9</guid><category><![CDATA[Computer Vision]]></category><category><![CDATA[PrintNanny]]></category><category><![CDATA[TensorFlow Lite]]></category><category><![CDATA[Object Detection]]></category><category><![CDATA[Tech Lead -> Founder]]></category><dc:creator><![CDATA[Leigh Johnson]]></dc:creator><pubDate>Tue, 24 Jan 2023 02:13:25 GMT</pubDate><content:encoded><![CDATA[<p>I&apos;ll cover the key takeaways I learned building PrintNanny.ai, a monitoring/automation system for 3D printer farms.</p><h1 id="introduction">Introduction</h1><p>&#x1F44B; Hi, my name is Leigh and I&apos;m the founder of <a href="https://printnanny.ai/">PrintNanny.ai</a>. 
I&apos;m currently building PrintNanny OS, a Linux distribution focused on the automation and monitoring of 3D printer farms.</p><p>I&apos;m going to walk you through the architecture of PrintNanny&apos;s <strong>failure detection system</strong>, which uses <strong>computer vision</strong> to monitor the health of a print job. The entire system runs <strong>offline </strong>on a <strong>Raspberry Pi</strong> with a camera because I designed PrintNanny with <strong>privacy</strong> and <strong>reliability</strong> in mind. </p><p>Running a modern computer vision stack on a device as tiny as a Raspberry Pi requires <strong>careful optimizations</strong>. </p><p>Here&apos;s a run-down of my secrets, so you can go out and build cool CV applications.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2023/01/Screenshot-2023-01-23-at-4.28.07-PM.png" class="kg-image" alt loading="lazy" width="2000" height="1124" srcset="https://bitsy.ai/content/images/size/w600/2023/01/Screenshot-2023-01-23-at-4.28.07-PM.png 600w, https://bitsy.ai/content/images/size/w1000/2023/01/Screenshot-2023-01-23-at-4.28.07-PM.png 1000w, https://bitsy.ai/content/images/size/w1600/2023/01/Screenshot-2023-01-23-at-4.28.07-PM.png 1600w, https://bitsy.ai/content/images/size/w2400/2023/01/Screenshot-2023-01-23-at-4.28.07-PM.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>PrintNanny.ai continuously monitors 3D print jobs and uses time-series data to calculate a health score.</figcaption></figure><h1 id="1-stick-with-frameworks">1. Stick with Frameworks</h1><p>I&apos;ve been building embedded computer vision and machine learning applications for the past <strong>eight years.</strong> Around 2015, wrangling a state-of-the-art model often required writing an application harness in <strong>C/C++ </strong>to load data into your model and make decisions based on the model&apos;s predictions. 
</p><p>Frameworks like TensorFlow Lite (2017) provide an easier path, allowing you to train a neural network using TensorFlow&apos;s GPU/TPU-accelerated operations and export the model to a format (.tflite) that can be used on a mobile/edge device.</p><p>In 2023, PrintNanny&apos;s failure detection stack depends on a few frameworks. These frameworks allow me to stand on the shoulders of giants and are essential to accomplishing so much as a solo developer/entrepreneur. </p><h2 id="printnannys-cvml-tech-stack">PrintNanny&apos;s CV/ML tech stack</h2><h3 id="tensorflow-lite"><a href="https://www.tensorflow.org/lite/guide">TensorFlow Lite</a></h3><p><a href="https://www.tensorflow.org/lite/guide">TensorFlow Lite</a> is a set of tools that enables <strong>on-device machine learning</strong> on mobile, embedded, and edge devices (like Raspberry Pi). </p><h3 id="gstreamer"><a href="https://gstreamer.freedesktop.org/">Gstreamer</a></h3><p><a href="https://gstreamer.freedesktop.org/">Gstreamer</a> is an open-source multimedia framework with a robust plugin ecosystem. Gstreamer is written in C, but provides bindings for Go, Python, Rust, C++, and C#.</p><p>PrintNanny depends on Gstreamer&apos;s ecosystem to implement:</p><ol><li>Sliding window aggregate operations, like calculating a histogram and standard deviation of observations in a 60-second lookback window.
Implemented as a Gstreamer plugin element.</li><li>Interprocess communication between pipelines, using <a href="https://github.com/RidgeRun/gst-interpipe">gst-interpipe</a>.</li><li>Control of audio/video streaming via TCP messages, with <a href="https://developer.ridgerun.com/wiki/index.php/GStreamer_Daemon">Gstreamer Daemon</a> (GstD).</li></ol><h3 id="nnstreamer"><a href="https://nnstreamer.ai/">NNStreamer</a></h3><p><a href="https://nnstreamer.ai/">NNStreamer</a> provides a set of Gstreamer plugins compatible with neural network frameworks like TensorFlow, TensorFlow Lite, Caffe2, PyTorch, OpenVINO, ARMNN, and NEURUN.</p><hr><h1 id="2-build-single-purpose-pipelines">2. Build Single-Purpose Pipelines</h1><p>Early iterations of PrintNanny&apos;s detection system were implemented as a <strong>single large Gstreamer pipeline</strong>, which performed the following tasks:</p><ol><li>Buffer raw camera data.</li><li>Encode camera data with an H264 codec, divide the encoded data into Real-time Transport Protocol (RTP) packets, and write the packets to a UDP socket.</li><li>Re-encode camera data to an RGB pixel format, quantize each frame into 3 channels of 8-bit integers, then feed the normalized frames into an object detection model (TensorFlow Lite).</li><li>Aggregate TensorFlow Lite inference results over a sliding 30-60 second window.</li><li>Draw a bounding box overlay from TensorFlow Lite inference results, encode it with the H264 codec, and synchronize RTP packetization with the real-time camera stream.</li><li>Write a JPEG-encoded snapshot to disk every n seconds.</li></ol><p><code>tee</code> elements were used to split the pipeline stream into branches.
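</p><p>To make the monolith concrete, here&apos;s a sketch of a tee-based, do-everything pipeline as a gst-launch-style description. The element names (x264enc, tensor_filter, and friends) are common Gstreamer/NNStreamer elements chosen for illustration, not PrintNanny&apos;s exact pipeline:</p>

```python
# Illustrative sketch of a monolithic, tee-based Gstreamer pipeline,
# expressed as a gst-launch-style description string. Element names are
# common Gstreamer/NNStreamer elements; the real pipeline differs.
SOURCE = "libcamerasrc ! video/x-raw,width=640,height=480 ! tee name=t"

BRANCHES = [
    # Real-time stream: H264-encode, packetize as RTP, write to a UDP socket.
    "t. ! queue ! x264enc tune=zerolatency ! rtph264pay ! udpsink host=127.0.0.1 port=5000",
    # Inference: convert to RGB, quantize frames to uint8 tensors, run the
    # TensorFlow Lite object detection model via NNStreamer elements.
    "t. ! queue ! videoconvert ! video/x-raw,format=RGB ! tensor_converter ! "
    "tensor_filter framework=tensorflow-lite model=detect.tflite ! tensor_sink",
    # Snapshots: write one JPEG frame to disk every 10 seconds.
    "t. ! queue ! videorate ! video/x-raw,framerate=1/10 ! jpegenc ! multifilesink location=snapshot-%05d.jpg",
]

def monolithic_pipeline() -> str:
    """Join every branch onto one tee: all branches share a single graph."""
    return " ".join([SOURCE] + BRANCHES)

print(monolithic_pipeline())
```

<p>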
Unfortunately, this meant that a <strong>performance issue ANYWHERE in the pipeline</strong> could cause bottlenecks and blockages for the entire system.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2023/01/one-pipeline-to-rule-them-all-is-a-terrible-idea.gif" class="kg-image" alt loading="lazy" width="1467" height="756" srcset="https://bitsy.ai/content/images/size/w600/2023/01/one-pipeline-to-rule-them-all-is-a-terrible-idea.gif 600w, https://bitsy.ai/content/images/size/w1000/2023/01/one-pipeline-to-rule-them-all-is-a-terrible-idea.gif 1000w, https://bitsy.ai/content/images/2023/01/one-pipeline-to-rule-them-all-is-a-terrible-idea.gif 1467w" sizes="(min-width: 720px) 720px"><figcaption>PrintNanny.ai&apos;s early implementations crammed all functionality into one large Gstreamer pipeline.</figcaption></figure><p>Today, PrintNanny&apos;s pipelines have been broken into single-purpose segments using the <a href="https://github.com/RidgeRun/gst-interpipe">gst-interpipe</a> plugin. <a href="https://github.com/RidgeRun/gst-interpipe">Gst-interpipe</a> provides inter-process communication between Gstreamer pipelines, allowing buffers and events to flow between two or more independent pipelines.</p><p>PrintNanny runs up to 8 single-purpose pipelines, which can be debugged and tuned independently of each other. </p><hr><h1 id="3-put-commoditized-ml-into-production-asap">3. Put Commoditized ML into Production ASAP</h1><p><strong>Before I started training bespoke neural networks</strong> for PrintNanny.ai, I spent $200 to train a Google AutoML Vision model using data I scraped from YouTube. </p><p>This model was used in production for 1.5 years, while I built and validated other parts of PrintNanny (like the core operating system, PrintNanny OS).
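</p><p>Whether detections come from a commodity AutoML model or a bespoke network, the downstream signal is the same: aggregate inference results over a sliding lookback window. Here&apos;s a toy pure-Python sketch of the 60-second histogram and standard-deviation window described in the Gstreamer section above. PrintNanny implements this as a Gstreamer plugin element, so treat this as an illustration of the idea only:</p>

```python
# Toy sliding-window aggregation: keep the last 60 seconds of detection
# confidences, then compute a histogram and standard deviation over them.
# Illustrative only; PrintNanny implements this as a Gstreamer plugin.
import statistics
from collections import deque

class SlidingWindow:
    def __init__(self, lookback_s: float = 60.0, bins: int = 10):
        self.lookback_s = lookback_s
        self.bins = bins
        self.samples = deque()  # (timestamp, confidence), timestamps monotonic

    def push(self, ts: float, confidence: float) -> None:
        self.samples.append((ts, confidence))
        # Evict observations older than the lookback window.
        while self.samples and ts - self.samples[0][0] > self.lookback_s:
            self.samples.popleft()

    def stdev(self) -> float:
        values = [c for _, c in self.samples]
        return statistics.stdev(values) if len(values) > 1 else 0.0

    def histogram(self) -> list:
        """Bucket confidences (assumed in [0, 1]) into equal-width bins."""
        counts = [0] * self.bins
        for _, c in self.samples:
            counts[min(int(c * self.bins), self.bins - 1)] += 1
        return counts
```

<p>A health score can then be derived from how these windowed statistics trend over time.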
</p><p>Check out my talk at <a href="https://youtu.be/eJqV5nPjOZQ?t=8602">TensorFlow Everywhere North America (2021)</a> for a deep dive into the prototype process for PrintNanny&apos;s detection system. The tl;dr is:</p><ol><li>Scrape existing timelapse data from YouTube</li></ol><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2023/01/Screenshot-2023-01-23-at-5.12.46-PM.png" class="kg-image" alt loading="lazy" width="2000" height="1119" srcset="https://bitsy.ai/content/images/size/w600/2023/01/Screenshot-2023-01-23-at-5.12.46-PM.png 600w, https://bitsy.ai/content/images/size/w1000/2023/01/Screenshot-2023-01-23-at-5.12.46-PM.png 1000w, https://bitsy.ai/content/images/size/w1600/2023/01/Screenshot-2023-01-23-at-5.12.46-PM.png 1600w, https://bitsy.ai/content/images/size/w2400/2023/01/Screenshot-2023-01-23-at-5.12.46-PM.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Before PrintNanny had hundreds of users, I had to scrape YouTube for training data.</figcaption></figure><p>2. 
Train a baseline model using <a href="https://cloud.google.com/vision/automl/docs/label-images-edge-model">AutoML Vision Edge</a>.</p><p>I hand-labeled the first few hundred frames using <a href="https://github.com/microsoft/VoTT">Microsoft&apos;s Visual Object Tagging Tool</a>, then <a href="https://bitsy.ai/automate-bounding-box-annotation-with-tensorflow-and-automl/">trained a guidance model to automate labeling the next few thousand frames.</a></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2023/01/Screenshot-2023-01-23-at-5.12.37-PM.png" class="kg-image" alt loading="lazy" width="2000" height="1130" srcset="https://bitsy.ai/content/images/size/w600/2023/01/Screenshot-2023-01-23-at-5.12.37-PM.png 600w, https://bitsy.ai/content/images/size/w1000/2023/01/Screenshot-2023-01-23-at-5.12.37-PM.png 1000w, https://bitsy.ai/content/images/size/w1600/2023/01/Screenshot-2023-01-23-at-5.12.37-PM.png 1600w, https://bitsy.ai/content/images/size/w2400/2023/01/Screenshot-2023-01-23-at-5.12.37-PM.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>I used Microsoft&apos;s Visual Object Tagging Tool to hand-label frames.</figcaption></figure><p>3. Evaluate the baseline model&apos;s training performance vs.
real-world performance.</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2023/01/Screenshot-2023-01-23-at-5.12.24-PM.png" width="2000" height="1123" loading="lazy" alt srcset="https://bitsy.ai/content/images/size/w600/2023/01/Screenshot-2023-01-23-at-5.12.24-PM.png 600w, https://bitsy.ai/content/images/size/w1000/2023/01/Screenshot-2023-01-23-at-5.12.24-PM.png 1000w, https://bitsy.ai/content/images/size/w1600/2023/01/Screenshot-2023-01-23-at-5.12.24-PM.png 1600w, https://bitsy.ai/content/images/size/w2400/2023/01/Screenshot-2023-01-23-at-5.12.24-PM.png 2400w" sizes="(min-width: 1200px) 1200px"></div></div></div><figcaption>For just over $200 USD, I was able to train a baseline model for PrintNanny, which was used in production for more than a year.</figcaption></figure><p>In 2023 and beyond, I <strong>strongly recommend</strong> validating your idea with an off-the-shelf model or commodity machine learning service before dedicating resources to customization/optimization. </p><h1 id="thank-you-for-reading">Thank you for Reading!</h1><h2 id="work-printnannyai">Work @ PrintNanny.ai</h2><p>If you enjoyed this post and want to work with me,<a href="https://bitsy-ai.notion.site/Founding-Software-Engineer-PrintNanny-ai-2837979bf2c54ffba54d8bc51ff5a9f6"> I&apos;m hiring a product engineer comfortable with Rust/Python/Typescript</a> (or eager to learn). No machine learning or embedded application experience is required. 
Remote ok.</p><hr><p><strong>Google supported this work by providing Google Cloud credit.</strong></p><p><br></p><p></p>]]></content:encoded></item><item><title><![CDATA[PrintNanny Newsletter [01.2023]]]></title><description><![CDATA[v0.5.x is live, reflecting on 2022, and looking forward to 2023.]]></description><link>https://bitsy.ai/printnanny-newsletter-jan-2023/</link><guid isPermaLink="false">63b8ac580141e60001673215</guid><category><![CDATA[Tech Lead -> Founder]]></category><category><![CDATA[PrintNanny]]></category><category><![CDATA[entrepreneur]]></category><dc:creator><![CDATA[Leigh Johnson]]></dc:creator><pubDate>Sat, 07 Jan 2023 01:31:57 GMT</pubDate><content:encoded><![CDATA[<p>v0.5.x is live, reflecting on 2022, and looking forward to 2023.</p><p>tl;dr:</p><ul><li><a href="https://printnanny.ai/docs/release-history/0.5.x-emerald-langdale/">PrintNanny OS v0.5.x is live (read the patch notes).</a></li><li><a href="https://printnanny.ai/shop/founding-membership">15 PrintNanny Founding Member spots still available (January cohort)</a></li><li><a href="https://www.eventbrite.com/e/women-in-3d-printing-silicon-valley-happy-hour-tickets-471462215177">I organized a happy hour to launch the Silicon Valley chapter of Women in 3D Printing (RSVP if you&apos;re in the Bay area)</a>.</li><li>2023 is going to be &#x1F525;</li></ul><p>Full post moved to <a href="https://printnanny.ai/blog/printnanny-newsletter-jan-2023/">https://printnanny.ai/blog/printnanny-newsletter-jan-2023/</a></p>]]></content:encoded></item><item><title><![CDATA[I needed a sold-out SDWire board, so I learned how to fab PCBs]]></title><description><![CDATA[I needed an SDWire board but couldn't buy one due to imported component shortages. So I learned how to fab PCBs with components made in the USA. 
]]></description><link>https://bitsy.ai/preorder-printnanny-sdwire/</link><guid isPermaLink="false">6313cba487c91a0001179a47</guid><category><![CDATA[PrintNanny]]></category><category><![CDATA[PCB]]></category><category><![CDATA[KiCAD]]></category><category><![CDATA[Open-source Hardware]]></category><category><![CDATA[entrepreneur]]></category><category><![CDATA[Tech Lead -> Founder]]></category><dc:creator><![CDATA[Leigh Johnson]]></dc:creator><pubDate>Thu, 17 Nov 2022 05:43:18 GMT</pubDate><media:content url="https://bitsy.ai/content/images/2022/08/SDWire-3D-front-v1.4-r1-2.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://bitsy.ai/content/images/2022/08/SDWire-3D-front-v1.4-r1-2.jpg" alt="I needed a sold-out SDWire board, so I learned how to fab PCBs"><p>&#x1F44B; My name is Leigh, and I&apos;m the founder of <a href="https://printnanny.ai/">PrintNanny.ai</a>. </p><p>I desperately needed a <strong>Tizen SDWire</strong> board to automate smoke tests for <strong>PrintNanny OS</strong>, a Linux distribution I created to manage 3D printers using a Raspberry Pi. The SDWire board would allow me to re-image the Raspberry Pi&apos;s SD card (without physically removing the SD card from the Pi). </p><p>Unfortunately, SDWire boards were <strong>completely sold-out. </strong>Everywhere! <strong>I was willing to pay a huge markup</strong> for just one of these gosh-darn boards ... but no one could sell me one. </p><p>Full post moved to <a href="https://printnanny.ai/blog/i-needed-a-sold-out-sdwire-board-so-i-learned-how-to-fab-pcbs/">https://printnanny.ai/blog/i-needed-a-sold-out-sdwire-board-so-i-learned-how-to-fab-pcbs/</a></p>]]></content:encoded></item><item><title><![CDATA[Open-Sourcing my Solo Founder Application to YC Winter 23 (PrintNanny.ai)]]></title><description><![CDATA[<p>Have you ever stayed up all night, revising a college application/essay right before the deadline? I didn&apos;t: I dropped out as a freshman (a story for another time). 
Writing an application loaded with this much upside is a new experience for me! </p><p>Maybe that&apos;s why</p>]]></description><link>https://bitsy.ai/open-sourcing-my-wc-23/</link><guid isPermaLink="false">631e98845c0c250001a922b1</guid><category><![CDATA[PrintNanny]]></category><category><![CDATA[News]]></category><dc:creator><![CDATA[Leigh Johnson]]></dc:creator><pubDate>Mon, 12 Sep 2022 02:51:40 GMT</pubDate><content:encoded><![CDATA[<p>Have you ever stayed up all night, revising a college application/essay right before the deadline? I didn&apos;t: I dropped out as a freshman (a story for another time). Writing an application loaded with this much upside is a new experience for me! </p><p>Maybe that&apos;s why I&apos;m obsessing over my Ycombinator application this weekend. I read that YC gets 15,000 - 20,000 applications per batch. The first round of reviewers might read 200 applications/day (oof). Most applications are discarded in 30 seconds (double-oof).</p><p><strong>What should I change to keep my YC app from being thrown out?</strong></p><p>I would appreciate your feedback, no matter who you are! I&apos;ve <a href="https://docs.google.com/document/d/1CW8k4rpiqJlLVOwv_JIK_ObLTBtaLX_qLUPRUTC6BmA/edit?usp=sharing">enabled comments on my draft</a> - if you have 5 minutes to skim. If you prefer email, send your thoughts to leigh+yc23@printnanny.ai</p><p></p><p></p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://docs.google.com/document/d/1CW8k4rpiqJlLVOwv_JIK_ObLTBtaLX_qLUPRUTC6BmA/edit?usp=sharing"><div class="kg-bookmark-content"><div class="kg-bookmark-title">[Leigh Johnson, PrintNanny.ai] YC Application Review</div><div class="kg-bookmark-description">Company Company name: PrintNanny Company URL: https://printnanny.ai/ What is your company going to make? Please describe your product and what it does or will do. PrintNanny is like having a personal assistant for a 3D printing business. 
There are 600,000 small manufacturers in the United States...</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://ssl.gstatic.com/docs/documents/images/kix-favicon7.ico" alt><span class="kg-bookmark-author">Google Docs</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://lh3.googleusercontent.com/8I3BFN5kMB-5I2gPitWpJPF1YpeBcDI_VXPhOiQ077jIrGz3rV-gm25JKiZMskZ9mz1HVdtf6nfVVA=w1200-h630-p" alt></div></a></figure>]]></content:encoded></item><item><title><![CDATA[Soft-launching an AI/ML Product as a Solo Founder]]></title><description><![CDATA[Technical deep dive into how I built Print Nanny, which uses computer vision to automatically detect 3D printing failures. ]]></description><link>https://bitsy.ai/launching-machine-learning-ai-startup-solo-founder/</link><guid isPermaLink="false">6313cba487c91a0001179a14</guid><category><![CDATA[AutoML]]></category><category><![CDATA[Computer Vision]]></category><category><![CDATA[PrintNanny]]></category><category><![CDATA[Object Detection]]></category><category><![CDATA[Prototypes]]></category><category><![CDATA[Raspberry Pi]]></category><category><![CDATA[TensorFlow Lite]]></category><category><![CDATA[TensorFlow]]></category><dc:creator><![CDATA[Leigh Johnson]]></dc:creator><pubDate>Mon, 19 Apr 2021 06:09:46 GMT</pubDate><media:content url="https://bitsy.ai/content/images/2021/05/PXL_20210213_195122145.MP--2-.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://bitsy.ai/content/images/2021/05/PXL_20210213_195122145.MP--2-.jpg" alt="Soft-launching an AI/ML Product as a Solo Founder"><p><strong>Technical deep dive</strong> into how I built <a href="https://www.print-nanny.com/" rel="noopener">Print Nanny</a>,<strong> which uses computer vision to automatically detect 3D printing failures. 
</strong>I&#x2019;ll cover each development phase: from <strong>minimum viable prototype</strong> to <strong>scaling up</strong> to meet customer demand.</p><p>Launching an AI/ML-powered product as a solo founder is a risky bet. Here&#x2019;s how I made the most of my limited time by defining a winning ML strategy and leveraging the right <strong>Google Cloud Platform</strong> services at each stage.</p><hr><p>For my birthday last spring, I bought myself what every gal needs: <strong>a fused filament fabrication system</strong> (AKA a 3D printer). I assembled the printer, carefully ran the calibration routines, and broke out my calipers to inspect the test prints.</p><p>Most of my prints were flawless! Occasionally, though, a print would fail spectacularly. I set up <a href="https://octoprint.org/" rel="noopener">OctoPrint</a>, which is an open-source web interface for 3D printers. I often peeked at the live camera feed, wondering if this was a small taste of how new parents feel about high-tech baby monitors.</p><p>Surely there must be a better way to automatically monitor print health?</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/IMG_20200521_095442_MP-1.jpg" width="1000" height="1227" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/IMG_20200521_095442_MP-1.jpg 600w, https://bitsy.ai/content/images/2021/05/IMG_20200521_095442_MP-1.jpg 1000w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/00100lrPORTRAIT_00100_BURST20200523012619818_COVER--1-.jpg" width="1000" height="681" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder"
srcset="https://bitsy.ai/content/images/size/w600/2021/05/00100lrPORTRAIT_00100_BURST20200523012619818_COVER--1-.jpg 600w, https://bitsy.ai/content/images/2021/05/00100lrPORTRAIT_00100_BURST20200523012619818_COVER--1-.jpg 1000w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/MVIMG_20200624_121041--1-.jpg" width="1000" height="1333" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/MVIMG_20200624_121041--1-.jpg 600w, https://bitsy.ai/content/images/2021/05/MVIMG_20200624_121041--1-.jpg 1000w" sizes="(min-width: 720px) 720px"></div></div><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/00000IMG_00000_BURST20200523151806158_COVER--1-.jpg" width="1000" height="935" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/00000IMG_00000_BURST20200523151806158_COVER--1-.jpg 600w, https://bitsy.ai/content/images/2021/05/00000IMG_00000_BURST20200523151806158_COVER--1-.jpg 1000w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/00000IMG_00000_BURST20200527124810495_COVER--1-.jpg" width="1000" height="1333" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/00000IMG_00000_BURST20200527124810495_COVER--1-.jpg 600w, https://bitsy.ai/content/images/2021/05/00000IMG_00000_BURST20200527124810495_COVER--1-.jpg 1000w" sizes="(min-width: 720px) 720px"></div></div></div><figcaption>Images by Author. Every gal needs: good calipers and a fused filament deposition system. 
Most of my first prints came out flawless, with a few obvious exceptions.</figcaption></figure><h1 id="machine-learning-strategy">Machine Learning Strategy</h1><hr><p>Developing an AI/ML-backed product is an <strong>expensive</strong> proposition&#x200A;&#x2014;&#x200A;with a high risk of failure before realizing any return on investment.</p><p><a href="https://www.rackspace.com/sites/default/files/pdf-uploads/Rackspace-White-Paper-AI-Machine-Learning.pdf" rel="noopener">According to a global study conducted by Rackspace in January 2021,</a> 87% of data science projects <strong>never make it into production.</strong></p><p>In their efforts to overcome the odds, surveyed companies spent on average <strong>$1.06M on their machine learning initiatives. </strong>As a solo founder, I wasn&#x2019;t prepared to spend more than a million dollars on my idea!</p><p>Luckily, I&#x2019;m part of <a href="https://developers.google.com/community/experts" rel="noopener">Google&#x2019;s Developer Expert</a> program&#x200A;&#x2014;&#x200A;which is full of folks who <strong>love sharing their knowledge with the world</strong>. Going into this project, I had a good idea of how I could build a prototype quickly and inexpensively by<strong> </strong>outsourcing the &#x201C;undifferentiated heavy lifting&#x201D; to the right cloud services.</p><hr><h2 id="what-makes-machine-learning-a-risky-bet">What makes Machine Learning a risky bet?</h2><p>Among the 1,870 Rackspace study participants, here&#x2019;s a summary of the current use, future plans, and reasons for failure reported among AI/ML projects.</p><p>A few common themes are apparent among the challenges and barriers:</p><ul><li><strong>All things data:</strong> poor data quality, inaccessible data, lacking data stewardship, and inability to structure/integrate data in a meaningful way.</li><li><strong>Expertise and skilled talent are in short supply. </strong>The good news is: you can develop skills and intuition as you go. 
Everything I cover in this article can be achieved without an advanced degree!</li><li><strong>Lack of infrastructure to support AI/ML. </strong>I&#x2019;ll show you how to progressively build data and ML infrastructure from scratch.</li><li><strong>Challenges in measuring the business value of AI/ML.</strong></li></ul><p>I&#x2019;ll show you how I beat the odds (less than 1:10 chance of success) with a rapid iteration plan, made possible by leveraging the <strong>right technology and product choices</strong> at each maturity stage of an ML-powered product.</p><p>I&#x2019;ll also demonstrate how I did this <strong>without sacrificing the</strong> level of scientific rigor and real-world result quality most companies <strong>spend millions</strong> attempting to achieve&#x200A;&#x2014;&#x200A;all at a fraction of the cost!</p><p>Let&#x2019;s dive right in. &#x1F913;</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://cdn-images-1.medium.com/max/2160/0*7SneIqWCyRntU2tC.png" width="1106" height="474" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-16-at-7.55.48-PM.png" width="916" height="480" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-16-at-7.55.48-PM.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-16-at-7.55.48-PM.png 916w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-16-at-8.03.45-PM.png" width="1096" height="844" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-16-at-8.03.45-PM.png 600w, 
https://bitsy.ai/content/images/size/w1000/2021/05/Screen-Shot-2021-04-16-at-8.03.45-PM.png 1000w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-16-at-8.03.45-PM.png 1096w" sizes="(min-width: 720px) 720px"></div></div><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-16-at-8.13.51-PM.png" width="914" height="780" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-16-at-8.13.51-PM.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-16-at-8.13.51-PM.png 914w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-16-at-8.15.01-PM--1-.png" width="1106" height="474" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-16-at-8.15.01-PM--1-.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screen-Shot-2021-04-16-at-8.15.01-PM--1-.png 1000w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-16-at-8.15.01-PM--1-.png 1106w" sizes="(min-width: 720px) 720px"></div></div><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-02-26-11-54-03.png" width="1567" height="1449" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2021-02-26-11-54-03.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screenshot-from-2021-02-26-11-54-03.png 1000w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-02-26-11-54-03.png 1567w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-16-at-8.19.26-PM.png" width="756" 
height="1270" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-16-at-8.19.26-PM.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-16-at-8.19.26-PM.png 756w" sizes="(min-width: 720px) 720px"></div></div></div><figcaption>Image credit: <a href="https://www.rackspace.com/sites/default/files/pdf-uploads/Rackspace-White-Paper-AI-Machine-Learning.pdf">Are Organizations Succeeding at AI and Machine Learning? </a>87% of data science projects never make it into production.</figcaption></figure><h1 id="define-problems-opportunities">Define Problems &amp; Opportunities</h1><p>What&#x2019;s the problem? 3D printers are so <strong>gosh-darn unreliable.</strong></p><p><strong>That&#x2019;s because 3D printer technology is not yet mature! </strong>A few years ago, this technology was the domain of hobbyists. Industrial usage was limited to quick prototyping before committing to a more reliable manufacturing process.</p><p>Today, 3D printing is seeing more use in small-scale manufacturing.
This trend was accelerated when existing manufacturing supply lines evaporated because of the COVID-19 pandemic.</p><p>An <strong>unreliable tool </strong>is a nuisance for a hobbyist but a <strong>potential liability</strong> for a small-scale manufacturing business!</p><p>In the following section, I&#x2019;ll outline the problems in this space and underscore the opportunities to solve these with an AI/ML product.</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://lh5.googleusercontent.com/Ru_mK9mmkdq5yF7Whnl2AJ_BtlpvceGZsaQ9xhP9b7PKKeChZP0WU91yZx2UTqoCnlc9D-RAlvf9XyGmcna4lq3sKHCzlYzvfKNXub9Eq6kwCjBT6J3ymcvzG1oz_DyA5Zzmoj5dwKE" width="1024" height="683" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/original-prusa-i3-mk3s-3d-printer--1-.jpg" width="800" height="800" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/original-prusa-i3-mk3s-3d-printer--1-.jpg 600w, https://bitsy.ai/content/images/2021/05/original-prusa-i3-mk3s-3d-printer--1-.jpg 800w" sizes="(min-width: 720px) 720px"></div></div></div><figcaption>Image Credit: <a href="https://outofdarts.com/blogs/news/prusa-3d-print-farm-for-nerf-mods">Prusa 3D Print Farm for Nerf Mods</a> (left), <a href="https://www.3dnatives.com/en/creality-3dprintmill-conveyer-belt-231120204/#!">P</a>rusa MK3S Printer (right)</figcaption></figure><h2 id="problem-3d-printers-are-unreliable-">Problem: 3D Printers are Unreliable &#x1F928;</h2><p>What is it that makes 3D printers so unreliable?</p><ol><li>Print jobs take <strong>hours </strong>(sometimes<strong> days</strong>) to complete and can fail at any moment. 
They require near-constant human monitoring.</li><li>They lack the <strong>closed-loop control </strong>of reliable industrial fabrication processes.</li><li>The most common form of 3D printing involves <strong>heating material</strong> <strong>until molten</strong> at 190&#xB0;C&#x200A;&#x2014;&#x200A;220&#xB0;C. This is a fire hazard if left unattended!</li></ol><figure class="kg-card kg-image-card"><img src="https://bitsy.ai/content/images/2021/05/Fused-filament-fabrication-FFF-process-reproduced-with-permission-from-1.png" class="kg-image" alt="Soft-launching an AI/ML Product as a Solo Founder" loading="lazy" width="358" height="269"></figure><p><br></p><p><strong>Human error</strong> is a factor as well. </p><p>The instructions read by 3D printers are created from hand-configured settings, using an application known as a &#x201C;slicer.&#x201D; Developing an intuition for the right settings takes time and patience.</p><p>Even after I decided to prototype a failure detector, the strategic line of thinking didn&#x2019;t end there. I identified <strong>additional value propositions</strong> to test with quick mock-ups, descriptive copywriting, and surveys.</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://lh4.googleusercontent.com/-UgQd882C4wt046YhGMjqWhgnPujuVofJ3H8McReQBcTnD_nD6ftnHqgDCSmX7z84IT0l04aWupMrsQrehJ-sHfl6zDNT8bVGDYoqrG3tPWQEdSOym7UFUvvoCEdkWPSumC4cCZhnEo" width="1334" height="1047" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder"></div></div></div><figcaption>The dizzying number of hand-configured settings used by &quot;slicer&quot; software, which converts a 3D model to g-code. G-code instructions are widely used by CNC and other computer-aided manufacturing tools.</figcaption></figure><h2 id="problem-the-internet-is-unreliable-">Problem: the Internet is Unreliable! 
&#x1F631;</h2><p>Most small-scale manufacturers are operating from:</p><ol><li>Home (basement, garage, storage shed)</li><li>Warehouse or industrial space</li></ol><p>In many parts of the world, a <strong>constant upload stream</strong> from multiple cameras can saturate an internet connection&#x200A;&#x2014;&#x200A;it might not be possible or economical to maintain this.</p><p>To ensure Print Nanny would still function offline, I prioritized <strong>on-device inference</strong> in the prototype. This allowed me to test the idea with zero model-serving infrastructure and leverage my research in computer vision for small devices.</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-02-26-14-57-26.png" width="952" height="295" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2021-02-26-14-57-26.png 600w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-02-26-14-57-26.png 952w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-02-26-15-00-27.png" width="969" height="219" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2021-02-26-15-00-27.png 600w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-02-26-15-00-27.png 969w" sizes="(min-width: 720px) 720px"></div></div></div><figcaption>I didn&apos;t want Print Nanny to join this club.</figcaption></figure><h2 id="problem-even-the-failures-are-unreliable-">Problem: Even the Failures are Unreliable &#x1F92A;</h2><p>No two 3D printing failures look alike! 
There are a number of reasons for this, including:</p><ol><li>Printer hardware is assembled by hand. The same printer model can produce wildly different results! Expert calibration is required to print uniform batches.</li><li>Most 3D printing materials are <strong>hygroscopic </strong>(water-absorbing), resulting in batch variation and defects as volatile as the weather! &#x1F327;&#xFE0F;</li><li>There are many <strong>settings </strong>for slicing a <strong>3D model</strong> into X-Y-Z movement instructions. Choosing the right settings requires trial and error to achieve the best results.</li></ol><p>My machine learning strategy would need to allow for <strong>fast iteration</strong> and <strong>continuous improvement </strong>to build customer trust&#x200A;&#x2014;&#x200A;it didn&#x2019;t matter if the first prototype was not great, as long as <strong>subsequent ones showed improvement.</strong></p><p>At all costs, to <strong>stay nimble,</strong> I&#x2019;d need to avoid tactics that saddled me with a specific category of machine learning technical debt:</p><h4 id="changing-anything-changes-everything-cace-">Changing Anything Changes Everything (CACE)</h4><p>CACE is a principle proposed in <a href="https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf" rel="noopener">Hidden Technical Debt in Machine Learning Systems</a>, referring to the <strong>entanglement of machine learning outcomes in a system.</strong></p><!--kg-card-begin: markdown--><blockquote>
<p>For instance, consider a system that uses features x1, ...xn in a model. If we change the input distribution of values in x1, the importance, weights, or use of the remaining n &#x2212; 1 features may all change. This is true whether the model is retrained fully in a batch style or allowed to adapt in an online fashion. Adding a new feature xn+1 can cause similar changes, as can removing any feature xj. No inputs are ever really independent.</p>
</blockquote>
<blockquote>
<p>Zheng recently made a compelling comparison of the state of ML abstractions to the state of database technology [17], making the point that nothing in the machine learning literature comes close to the success of the relational database as a basic abstraction.</p>
</blockquote>
<p><a href="https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf">Hidden Technical Debt in Machine Learning Systems</a></p>
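<p>To make the entanglement concrete, here is a toy sketch of my own (not from the paper), assuming only numpy: the target truly depends on two correlated features, but the model only sees x2. Changing nothing except the input distribution of x1 shifts the weight the model learns for x2.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def weight_of_x2(x1_scale, n=100_000):
    # True data-generating process: y depends on x1 AND x2,
    # and x2 is correlated with x1.
    x1 = rng.normal(0.0, x1_scale, n)
    x2 = 0.5 * x1 + rng.normal(0.0, 1.0, n)
    y = 2.0 * x1 + 3.0 * x2
    # Least-squares weight for a model that is only given x2:
    return float(np.dot(x2, y) / np.dot(x2, x2))

print(weight_of_x2(x1_scale=1.0))  # ~3.8
print(weight_of_x2(x1_scale=3.0))  # ~5.8
```

<p>Nothing about x2 or the target changed, yet the learned weight for x2 moved: no inputs are ever really independent.</p>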
<!--kg-card-end: markdown--><p>I&#x2019;ll explain how I side-stepped CACE and a few other common pitfalls like <strong>unstable data dependencies</strong> in the next section. I&#x2019;ll also reflect on the abstractions that saved me time and the ones which were a waste of time.</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/wXqzVok_large--1-.jpg" width="480" height="394" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-02-26-15-57-40.png" width="1000" height="804" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2021-02-26-15-57-40.png 600w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-02-26-15-57-40.png 1000w" sizes="(min-width: 720px) 720px"></div></div><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Srtinging-and-temperature--1-.jpg" width="628" height="472" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Srtinging-and-temperature--1-.jpg 600w, https://bitsy.ai/content/images/2021/05/Srtinging-and-temperature--1-.jpg 628w"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/14325472593_6f89654c19_b--1---1-.jpg" width="1000" height="563" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/14325472593_6f89654c19_b--1---1-.jpg 600w, https://bitsy.ai/content/images/2021/05/14325472593_6f89654c19_b--1---1-.jpg 1000w" sizes="(min-width: 720px) 720px"></div></div></div><figcaption>Image credit: <a 
href="https://www.vice.com/en/article/78xe9x/3d-printed-mistakes-are-inspiring-a-new-kind-of-glitch-art">3D-Printed Mistakes Are Inspiring a New Kind of Glitch Art</a></figcaption></figure><h1 id="prototype-the-path">Prototype the Path</h1><p>As part of developing a solid product and machine learning strategy, I &#x201C;scoped the prototype down to a skateboard.&#x201D; I use this saying to describe the simplest form of transportation from Point A (the problem) to Point B (where the customer wants to go).</p><p>I&#x2019;ve seen machine learning projects fail because teams bought into or built fully formed solutions, like the car in the picture below. Not only is the feedback received on early iterations useless, but cancellation is also a risk <strong>before the full production run.</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/9a18e341145e4a2d99ab1daa6e28448eba4ed270--1-.png" class="kg-image" alt="Soft-launching an AI/ML Product as a Solo Founder" loading="lazy" width="1024" height="701" srcset="https://bitsy.ai/content/images/size/w600/2021/05/9a18e341145e4a2d99ab1daa6e28448eba4ed270--1-.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/9a18e341145e4a2d99ab1daa6e28448eba4ed270--1-.png 1000w, https://bitsy.ai/content/images/2021/05/9a18e341145e4a2d99ab1daa6e28448eba4ed270--1-.png 1024w" sizes="(min-width: 720px) 720px"><figcaption>Image credit: <a href="https://mlsdev.com/blog/minimum-viable-product-examples">mlsdev.com</a></figcaption></figure><h2 id="minimum-awesome-product">Minimum Awesome Product</h2><p>Instead of building a fully-featured web app, I developed the prototype as a <strong>plugin</strong> for <a href="https://octoprint.org/" rel="noopener">OctoPrint</a>. 
OctoPrint provides a web interface for 3D printers, web camera controls, and a thriving community.</p><p>I distilled the <strong>minimum awesome product</strong> down to the following:</p><ol><li><strong>Train</strong> model to detect the following object labels:<br>{print, raft, spaghetti, adhesion, nozzle}</li><li><strong>Deploy</strong> predictor code to Raspberry Pi via OctoPrint plugin</li><li><strong>Calculate</strong> health score trends</li><li><strong>Automatically stop</strong> unhealthy print jobs.</li><li><strong>Provide feedback </strong>about Print Nanny&#x2019;s decisions &#x1F44D;&#x1F44E;</li></ol><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://lh6.googleusercontent.com/AF9X4kAVfGmmnTlMi20dT2_3rGnXJz8MT0YYVBzItqfM7HXojD3hETsQsWfbyfA2PtQ1s3Y8M1zLiTc1NFmBeccpqWKom9tanEFhSs7UkqgRkyA0yd4F0_llBW3IlsRuyJSXZnHGvXw" width="617" height="467" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-02-23-22-21-56.png" width="611" height="674" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2021-02-23-22-21-56.png 600w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-02-23-22-21-56.png 611w"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-1.21.31-PM.png" width="1000" height="880" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-17-at-1.21.31-PM.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-1.21.31-PM.png 1000w" sizes="(min-width: 720px) 720px"></div></div></div><figcaption>Images by Author. 
Click to view full-size.</figcaption></figure><h2 id="raw-dataset">Raw Dataset</h2><p>Acquiring quality labeled data is the hardest startup cost of any machine learning project! In my case, enthusiasts often upload timelapse videos to YouTube.</p><ul><li><a href="https://github.com/ytdl-org/youtube-dl" rel="noopener">Youtube-dl</a> to download 3D print timelapse videos (pay attention to the license!)</li><li><a href="https://github.com/microsoft/VoTT" rel="noopener">Visual Object Tagging Tool</a> to annotate images with bounding boxes.</li></ul><p><strong>Bonus efficiency unlocked:</strong> I figured out how to <strong>automatically suggest bounding boxes</strong> in the Visual Object Tagging Tool using a <strong>TensorFlow.js </strong>model (exported from AutoML Vision).</p><p>I explain how to do this in <a href="https://bitsy.ai/automate-bounding-box-annotation-with-tensorflow-and-automl/" rel="noopener">Automate Image Annotation on a Small Budget. </a>&#x1F9E0;</p><p>Adjusting the guidance model&#x2019;s suggestions increased the number of images labeled per hour <strong>by a factor of 10</strong>, compared to drawing each box by hand.</p><h2 id="prepare-data-for-cloud-automl">Prepare Data for Cloud AutoML</h2><p>I often use Google AutoML products during the prototype phase.</p><p>These products are marketed towards folks with<strong> limited knowledge of machine learning</strong>, so <strong>experienced data scientists</strong> might not be familiar with this product line.</p><p><strong>Why would anyone pay for AutoML </strong>if they&#x2019;re perfectly capable of training a machine learning model on their own?</p><p>Here&#x2019;s why I tend to start with AutoML for every prototype:</p><ul><li><strong>Upfront and fixed cost</strong>, which is way less expensive than &quot;hiring myself&quot; </li></ul><p>Here&apos;s how much I paid to train Print Nanny&apos;s baseline model. 
</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-11.20.24-AM.png" class="kg-image" alt="Soft-launching an AI/ML Product as a Solo Founder" loading="lazy" width="1000" height="76" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-17-at-11.20.24-AM.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-11.20.24-AM.png 1000w" sizes="(min-width: 720px) 720px"><figcaption>I spent more on a two-year .ai domain lease ($149), to give you a cost anchor.&#xA0;</figcaption></figure><ul><li><strong>Return on investment </strong>is easy to realize! In my experience, algorithm / model performance is <strong>just one of many contributing factors</strong> to data product success. <strong>Discovering the other factors</strong> before committing expert ML resources is a <strong>critical part of picking winning projects.</strong></li><li><strong>Fast results</strong> - I had a production-ready model in under 24 hours, optimized and quantized for performance on an edge or mobile device. 
</li></ul><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-2.29.22-PM-1.png" width="1000" height="403" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-17-at-2.29.22-PM-1.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-2.29.22-PM-1.png 1000w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-2.29.56-PM-2.png" width="1000" height="461" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-17-at-2.29.56-PM-2.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-2.29.56-PM-2.png 1000w" sizes="(min-width: 720px) 720px"></div></div></div><figcaption>Images by Author. The label distribution in my initial dataset (left). Google AutoML provides a GUI for exploring labeled data (right).&#xA0;</figcaption></figure><p>Besides Cloud AutoML Vision, Google provides AutoML services for:</p><p><strong><a href="https://cloud.google.com/automl-tables/docs">Tables</a> - </strong>a battery of modeling techniques (linear, gradient boosted trees, neural networks, ensembles) with automated feature engineering. 
</p><p><strong>Translation - </strong>train custom translation models.</p><p><strong>Video Intelligence - </strong>classify video frames and segment by labels.</p><p><strong><a href="https://cloud.google.com/automl/docs#automl-natural-language">Natural Language</a></strong></p><ul><li>Classification - predict a category/label </li><li>Entity Extraction - extract data from invoices, restaurant menus, tax documents, business cards, resumes, and other structured documents.</li><li>Sentiment Analysis - identify prevailing emotional opinion </li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/grant-auto-ml-example.gif" class="kg-image" alt="Soft-launching an AI/ML Product as a Solo Founder" loading="lazy" width="611" height="377" srcset="https://bitsy.ai/content/images/size/w600/2021/05/grant-auto-ml-example.gif 600w, https://bitsy.ai/content/images/2021/05/grant-auto-ml-example.gif 611w"><figcaption>Leveraging AutoML Natural Language Classification for scholarship discovery (acquired 2018). Image by Author.&#xA0;</figcaption></figure><h2 id="train-baseline-model">Train Baseline Model</h2><p>Cloud AutoML Vision Edge trains a TensorFlow model optimized for edge / mobile devices. Under the hood, AutoML&apos;s architecture &amp; parameter search uses reinforcement learning to find the ideal trade-off between speed and accuracy. <br><br>Check out <a href="http://ai.googleblog.com/2018/08/mnasnet-towards-automating-design-of.html">MnasNet: Towards Automating the Design of Mobile Machine Learning Models</a> and <a href="https://arxiv.org/abs/1807.11626">MnasNet: Platform-Aware Neural Architecture Search for Mobile</a> if you&apos;d like to learn more about the inner workings of Cloud AutoML Vision! 
<br></p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://lh6.googleusercontent.com/fNjVsnstDdDJdUCAgLa5qwkagVL4LnkIopJtyZnh3Knks-yMCYRP1-uV2U0GS9Fh7-Jve0_jkERrUtBOl8iyUxcYzDeHjyaxH6lGl36YyjdN3-4piha4uqgNTMc7EBSyeelYFbPvUjA" width="833" height="918" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-4.01.51-PM--1-.png" width="1000" height="561" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-17-at-4.01.51-PM--1-.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-4.01.51-PM--1-.png 1000w" sizes="(min-width: 720px) 720px"></div></div></div><figcaption>Images by Author. Google AutoML Vision Edge incorporates speed/accuracy preferences in the reinforcement learning policy for neural network architecture search (left). Evaluating the baseline model (right).</figcaption></figure><h2 id="baseline-model-metrics">Baseline Model Metrics</h2><p>You can fetch AutoML model evaluation metrics via API, which is a handy way to compare candidate models against the baseline. Check out <a href="https://gist.github.com/leigh-johnson/ca8e7748b46dc2420369ec5de702d9be">this gist</a> to see my full example.</p><pre><code class="language-python">from google.cloud import automl
from google.protobuf.json_format import MessageToDict
import pandas as pd

project_id = &quot;your-project-id&quot;
model_id = &quot;your-automl-model-id&quot;

# Initialize the AutoML API client
client = automl.AutoMlClient()

# Get the full path of the model
model_full_id = client.model_path(project_id, &quot;us-central1&quot;, model_id)

# Get all evaluation metrics for model
eval_metrics = client.list_model_evaluations(parent=model_full_id, filter=&quot;&quot;)

# Deserialize from protobuf to dict
eval_metrics = [MessageToDict(e._pb) for e in eval_metrics]

# Initialize a Pandas DataFrame
df = pd.DataFrame(eval_metrics)</code></pre><h2 id="engineer-improvable-outcomes">Engineer Improvable Outcomes</h2><p>You might remember that my <strong>baseline model </strong>had a recall rate of 75% at a 0.5 confidence and IoU threshold. In other words, my model failed to identify roughly one in four objects in the test set. The real-world performance was even worse! &#x1F62C; </p><p>Luckily, offloading baseline model training to AutoML gave me time to think deeply about a <strong>winning strategy</strong> for <strong>continuous model improvement. </strong>After an initial brainstorm, I reduced my options to just two (very different) strategies. </p><h3 id="option-1-binary-classifier">Option #1 - Binary Classifier</h3><p>The first option is to train a binary classifier to predict whether or not a print is failing at any single point in time: predict { fail, not fail }. </p><p>The decision to alert is based on the confidence score of the prediction. &#x1F514;</p><h3 id="option-2-time-series-ensemble">Option #2 - Time Series Ensemble</h3><p>The second option trains a multi-label object detector on a mix of positive, negative, and neutral labels like { print, nozzle, blister }. </p><p>Then, a weighted health score is calculated from the confidence of each detected object.</p><p>Finally, a polynomial (trend line) is fit on a time-series view of health scores. </p><p>The decision to alert is based on the direction of the polynomial&apos;s slope and distance from the intercepts. &#x1F514; </p><hr><p>Binary classification is considered the &quot;hello world&quot; of computer vision, with ample examples using the MNIST dataset (classifying hand-written digits). 
My personal favorite is <a href="https://www.tensorflow.org/tutorials/keras/classification">fashion mnist</a>, which you can <a href="https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/keras/classification.ipynb">noodle around with in a Colab notebook.</a></p><p>Unable to resist some good science, I hypothesized most data scientists and machine learning engineers would choose to implement a binary classifier first.</p><blockquote>I&apos;m developing a brand-new data product. To get over the initial production hump, I want to deploy a model right away and begin measuring / improving performance against real-world data.<br><br>If I&apos;m optimizing for rapid iteration and improvement, which approach should I take?</blockquote><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-10.32.51-AM-1.png" width="1000" height="562" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-17-at-10.32.51-AM-1.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-10.32.51-AM-1.png 1000w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-10.33.01-AM.png" width="1000" height="564" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-17-at-10.33.01-AM.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-10.33.01-AM.png 1000w" sizes="(min-width: 720px) 720px"></div></div></div><figcaption>Images by Author. Binary classifiers with complex outcomes can be difficult to explain and interpret (left). 
Decision-making ensembles or stacked models enable introspection of learned/encoded information per component. (right)</figcaption></figure><p>Most** data scientists and machine learning engineers voted for option #1! <em>**Study awaiting peer review and pending decision </em>&#x1F923;</p><figure class="kg-card kg-image-card"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-18-at-9.22.57-PM.png" class="kg-image" alt="Soft-launching an AI/ML Product as a Solo Founder" loading="lazy" width="654" height="210" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-18-at-9.22.57-PM.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-18-at-9.22.57-PM.png 654w"></figure><p><strong>Against the wisdom of the crowd</strong>: why did I choose to implement <strong>option 2 </strong>as part of my winning machine learning strategy?</p><h3 id="option-1-changing-anything-changes-everything-cace-">Option 1: Changing Anything Changes Everything (CACE)</h3><p>To recap, a binary classifier predicts whether or not a print is failing at any single point in time: predict { fail, not fail }. </p><p>The model learns to encode a complex outcome, which has many determinants in the training set (and <strong>many more not yet seen by the model</strong>). </p><p>To improve this model, <strong>there are only a few levers I can pull</strong>:</p><ul><li>Sample / label weight</li><li>Optimizer hyperparameters, like learning rate</li><li>Add synthetic and/or augmented data</li></ul><p>Changing any of the above <strong>changes all decision outcomes!</strong></p><h3 id="option-2-holistic-understanding-of-the-data">Option 2 - Holistic understanding of the data</h3><p>The second option has more moving pieces, but also more opportunities to introspect the data and each modeled outcome. 
</p><p>The components are:</p><ol><li>Multi-label object detector (MnasNet or MobileNet + Single-shot Detector).</li><li>Health score, weighted sum of detector confidence.</li><li>Fit polynomial (trend line) over a health score time-series.</li></ol><p>Instead of answering <strong>one complex question</strong>, this approach builds an algorithm from a series of <strong>simple questions:</strong></p><ul><li>What objects are in this image frame?</li><li>Where are the objects located?</li><li>How does confidence for &quot;defect&quot; labels compare to neutral or positive labels? This metric is a proxy for print health.</li><li>Where is the <strong>point of no return</strong> for a failing print job?</li><li>Is health score holding constant? Increasing? Decreasing? How fast (slope) and when did this change start occurring? (y-intercept).</li><li>How does sampling frequency impact the accuracy of the ensemble? Can I get away with sending fewer image frames over the wire?</li></ul><p>Each component can be interpreted, studied, and improved independently - and doing so helps me gain a <strong>holistic understanding</strong> of the data <strong>and draw additional conclusions about problems relevant to my business.</strong></p><p>The additional information is invaluable on my mission to <strong>continuously improve</strong> and <strong>build trust</strong> in the outcomes of my model. 
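</p><p>The three components above can be sketched in a few lines. This is my own simplification, not Print Nanny&apos;s production code: the label weights and alert threshold are illustrative, and this sketch alerts on the trend line&apos;s slope only (not the distance from the intercepts). Assuming numpy:</p>

```python
import numpy as np

# One weight per detector label: (print, raft, spaghetti, adhesion, nozzle).
# Positive labels raise the health score, defect labels lower it.
# These weight values are illustrative only.
LABEL_WEIGHTS = np.array([1.0, 0.5, -2.0, -1.0, 0.0])

def health_score(frame_confidences):
    # Component 2: weighted sum of per-label detection confidences.
    return float(np.dot(frame_confidences, LABEL_WEIGHTS))

def should_alert(scores, slope_threshold=-0.05):
    # Component 3: fit a trend line over the health-score time series
    # and alert when the trend is falling faster than the threshold.
    t = np.arange(len(scores))
    slope = np.polyfit(t, scores, 1)[0]  # highest-degree coefficient first
    return slope < slope_threshold

# Frames from a print that degrades as a defect label gains confidence:
frames = np.array([
    [0.9, 0.2, 0.0, 0.0, 0.5],
    [0.9, 0.2, 0.1, 0.0, 0.5],
    [0.8, 0.2, 0.4, 0.1, 0.5],
    [0.7, 0.1, 0.7, 0.2, 0.5],
])
scores = [health_score(f) for f in frames]
print(should_alert(scores))  # True: the trend line slopes downward
```

<p>Because each piece is a plain function, I can tune the weights, the polynomial degree, or the alert threshold independently, without retraining the detector.</p><p>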
</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/newplot--3-.png" width="1000" height="488" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/newplot--3-.png 600w, https://bitsy.ai/content/images/2021/05/newplot--3-.png 1000w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://lh4.googleusercontent.com/Gn69g01_-D_sln2pJ96kuazigST_OWvXlGRaDJla_BTeX4Y0HeTeu_i6pOinPjYWm8hRmdv8CATf1NiMWJS5E4UHByWawvWjyMSQ1Y5eKeGMn9bvNF5K5zykMzaUwOTCo2hp4Se3KUA" width="1600" height="676" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder"></div><div class="kg-gallery-image"><img src="https://lh3.googleusercontent.com/2_QXGh7bUAHrb8gk5T7irDDS9PgSEasgS0qp-ICDnsSR4AgKWe6FIpOMr0rFsmcmZoOUDF2v8ahnDyBfqduMWAYem0Ir9PxLOh-otlH0QEmHKYNZGd2K7J-ldH0tbg1E--3aGjw7XiA" width="1600" height="678" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder"></div></div></div><figcaption>Images by Author. Detection confidence broken out by label (left). Health score time series (middle). Polynomial fit over cumulative health score series (right).&#xA0;</figcaption></figure><h3 id="deploy-the-prototype">Deploy the Prototype</h3><p>The first prototype of Print Nanny launched after less than <strong>2 weeks</strong> of development and saw <strong>worldwide adoption</strong> within days. 
&#x1F92F;</p><figure class="kg-card kg-image-card"><img src="https://lh4.googleusercontent.com/IAfm9vHxVLd80N8j-t_kzSIIzSzW5FB7oC0Vx0m2k2xWnGWXUxqGJeVmbFlw-sOXNRsnqQ8kt_zLoXa4UTYqAvZqh4RLK2lGL84ZiJhIieGV5sysBPv5hAch-Fpb9NraZosZWDNrFlo" class="kg-image" alt="Soft-launching an AI/ML Product as a Solo Founder" loading="lazy"></figure><p>I used the following tools, tech stack, and services to deploy a proof of concept:</p><ul><li><a href="https://www.djangoproject.com/">Django</a>, <a href="https://www.django-rest-framework.org/">Django Rest Framework</a>, and <a href="https://github.com/pydanny/cookiecutter-django">Cookiecutter Django</a> to create a web application managing closed Beta signups and invitations.</li><li><a href="https://themes.getbootstrap.com/product/hyper-responsive-admin-dashboard-template/">Hyper Bootstrap theme</a> for a landing page and UI elements.</li><li><a href="https://cloud.google.com/kubernetes-engine">Google Kubernetes Engine</a> to host the webapp.</li><li><a href="https://cloud.google.com/sql">Cloud SQL</a> for PostgreSQL for a database with automatic backups.</li><li><a href="https://cloud.google.com/memorystore">Google Memorystore</a> (Redis) for Django&apos;s built-in cache.</li><li><a href="https://cloud.google.com/vision/automl/docs/edge-quickstart">Google Cloud AutoML Vision for Edge</a> to train a low-cost and low-effort computer vision model optimized for mobile devices.</li><li><a href="https://www.tensorflow.org/lite">TensorFlow Lite</a> model deployed to a <a href="https://www.raspberrypi.org/">Raspberry Pi</a>, packaged as a plugin for <a href="https://octoprint.org/">OctoPrint</a>. </li></ul><p><strong>Tip:</strong> learning a web application framework will enable you to test your ideas in front of your target audience. <a href="https://www.feldroy.com/products/two-scoops-of-django-3-x">Two Scoops of Django</a> by Daniel Feldroy &amp; Audrey Feldroy is a no-nonsense guide to the Django framework. 
&#x1F49C;</p><h3 id="green-lighting-the-next-phase">Green-lighting the Next Phase</h3><p>After two weeks (and for a few hundred bucks), I was able to put my prototype in front of an audience and begin collecting feedback. A few vanity metrics:</p><ul><li>3.7k landing page hits</li><li>2k closed Beta signups</li><li>200 invitations sent (first cohort)</li><li>100 avg. daily active users - wow!</li></ul><p>Besides the core failure detection system, I tested <strong>extremely </strong>rough mockups to learn more about the features that resonated with my audience.</p><figure class="kg-card kg-gallery-card kg-width-wide"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-4.51.00-PM.png" width="1000" height="612" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-17-at-4.51.00-PM.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-4.51.00-PM.png 1000w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-4.47.11-PM.png" width="1000" height="489" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-17-at-4.47.11-PM.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-4.47.11-PM.png 1000w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-4.47.22-PM.png" width="1000" height="345" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-17-at-4.47.22-PM.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-4.47.22-PM.png 1000w" 
sizes="(min-width: 720px) 720px"></div></div></div></figure><h1 id="refine-results-add-value">Refine Results, Add Value</h1><p>The next section focuses on <strong>building trust</strong> in machine learning outcomes by refining the model&apos;s results and handling edge cases.</p><h2 id="build-for-continuous-improvement">Build for Continuous Improvement</h2><p>Two feedback mechanisms are used to flag opportunities for learning:</p><ol><li>When a failure is detected: <strong>&quot;Did Print Nanny make a good call?&quot;</strong> &#x1F44D; &#x1F44E;</li><li>Any video can be <strong>flagged for additional review </strong>&#x1F6A9;</li></ol><p>At this phase, I&apos;m sending flagged videos off to a queue to be manually annotated and incorporated into a <strong>versioned training dataset. </strong></p><p>I keep track of aggregate statistics about the dataset overall <strong>and</strong> flagged examples. </p><ul><li>Mean, median, mode, standard deviation of RGB channels</li><li>Subjective brightness (also called <a href="https://en.wikipedia.org/wiki/Relative_luminance">relative luminance</a>)</li><li><strong>Mean average precision</strong> with respect to <strong>intersection over union</strong>, with breakouts per label and between small boxes (1/10 total area) vs. large boxes (1/4 total area). <br>mAP IoU = 0.5, mAP IoU = 0.75, mAP small/large</li></ul><p>This lets me understand if my model is underperforming for certain lighting conditions, filament colors, and bounding box sizes - broken down per label. 
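As a rough illustration (the helper name and array shapes here are my assumptions, not PrintNanny internals), per-channel statistics and relative luminance for a single frame can be computed with NumPy:

```python
import numpy as np

def frame_stats(rgb):
    """Summary statistics for one RGB frame (uint8 array of shape (H, W, 3))."""
    channels = rgb.reshape(-1, 3).astype(np.float64)
    # Relative luminance per pixel, using ITU-R BT.709 coefficients.
    luminance = channels @ np.array([0.2126, 0.7152, 0.0722])
    return {
        "mean": channels.mean(axis=0),          # per-channel mean
        "median": np.median(channels, axis=0),  # per-channel median
        "std": channels.std(axis=0),            # per-channel standard deviation
        "luminance": float(luminance.mean() / 255.0),  # 0.0 (black) to 1.0 (white)
    }
```

Tracking statistics like these per dataset slice makes it easy to spot, for example, that flagged videos skew darker than the rest of the training set.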
</p><figure class="kg-card kg-gallery-card kg-width-wide"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-02-23-22-21-56-2.png" width="611" height="674" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2021-02-23-22-21-56-2.png 600w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-02-23-22-21-56-2.png 611w"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-04-11-14-48-53--1-.png" width="1187" height="1053" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2021-04-11-14-48-53--1-.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screenshot-from-2021-04-11-14-48-53--1-.png 1000w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-04-11-14-48-53--1-.png 1187w" sizes="(min-width: 720px) 720px"></div></div></div></figure><h2 id="restrict-area-of-interest">Restrict Area of Interest</h2><p>Soon after I deployed the prototype, the flexibility of my failure detection ensemble paid off. When I developed the model, I hadn&apos;t considered that most 3D printers are built with 3D-printed components. &#x1F926;&#x200D;&#x2640;&#xFE0F;</p><p>I added the ability to select an <strong>area of interest </strong>and exclude objects outside of it from health score calculations. 
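A minimal sketch of that exclusion logic (the coordinate convention and function names are illustrative, not PrintNanny&apos;s actual code):

```python
def fraction_inside(box, region):
    """Fraction of a detection box's area that overlaps a rectangular region.

    Boxes and regions are (xmin, ymin, xmax, ymax) in the same coordinate system.
    """
    overlap_w = min(box[2], region[2]) - max(box[0], region[0])
    overlap_h = min(box[3], region[3]) - max(box[1], region[1])
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    return (overlap_w * overlap_h) / box_area

def in_area_of_interest(detections, region, threshold=0.5):
    # Keep detections that mostly fall inside the user-selected area of interest;
    # everything else is excluded from the health score.
    return [d for d in detections if fraction_inside(d, region) >= threshold]
```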
</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/ignore_mask_example--1-.gif" width="912" height="663" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/ignore_mask_example--1-.gif 600w, https://bitsy.ai/content/images/2021/05/ignore_mask_example--1-.gif 912w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-18-at-9.38.24-PM.png" width="1000" height="821" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-18-at-9.38.24-PM.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-18-at-9.38.24-PM.png 1000w" sizes="(min-width: 720px) 720px"></div></div></div><figcaption>&quot;You&apos;re not wrong, but you&apos;re definitely not right either.&quot; Image by Author.&#xA0;</figcaption></figure><h2 id="ingest-telemetry-data">Ingest Telemetry Data</h2><p>How did I go from <strong>on-device predictions</strong> to <strong>versioned datasets </strong>organized <strong>neatly in Google Cloud Storage?</strong></p><p>At first, I handled telemetry events with a REST API. It turns out REST is not a great fit for a continuous stream of events over a very unreliable connection. Luckily, there is a protocol designed for exactly this IoT use case: MQTT.</p><p>I used <a href="https://cloud.google.com/iot-core">Cloud IoT Core</a> to manage <strong>Raspberry Pi device identity</strong> and establish <strong>two-way</strong> <strong>device communication</strong> using the MQTT message protocol. 
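To make this concrete: Cloud IoT Core&apos;s MQTT bridge expects telemetry on a well-known topic, and the payload format is up to you. The event shape below is a simplified assumption, not the real PrintNanny schema:

```python
import json
import time

def telemetry_topic(device_id):
    # Cloud IoT Core's MQTT bridge accepts device telemetry on this well-known
    # topic; messages published here are republished to a Pub/Sub topic.
    return f"/devices/{device_id}/events"

def health_event(device_id, scores):
    # Hypothetical payload shape: per-label detection confidences plus a timestamp.
    return json.dumps({
        "device_id": device_id,
        "ts": time.time(),
        "scores": scores,
    })
```

A client such as paho-mqtt would publish this payload with QoS 1 (at-least-once delivery).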
</p><p>MQTT describes three Quality of Service (QoS) levels:</p><ol><li>Message delivered at most once - QoS 0</li><li>Message delivered at least once - QoS 1</li><li>Message delivered exactly once - QoS 2</li></ol><p><a href="https://cloud.google.com/iot/docs/how-tos/mqtt-bridge#quality_of_service_qos"><strong>Note</strong>: Cloud IoT does not support QoS 2</a>! </p><p>Besides whisking away the complexity of managing MQTT infrastructure (like load-balancing and horizontal scaling), Cloud IoT Core provides even more value:</p><ul><li>Device Registry (database of device metadata and fingerprints).</li><li>JSON Web Token (JWT) authentication with HMAC signing.</li><li>MQTT messages automatically republished to Pub/Sub topics.</li></ul><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-6.58.38-PM-1.png" width="1000" height="501" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-17-at-6.58.38-PM-1.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-6.58.38-PM-1.png 1000w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-6.51.31-PM-1.png" width="1000" height="559" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-17-at-6.51.31-PM-1.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-17-at-6.51.31-PM-1.png 1000w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-02-24-01-00-55.png" width="1032" height="754" loading="lazy" alt="Soft-launching an AI/ML Product as a 
Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2021-02-24-01-00-55.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screenshot-from-2021-02-24-01-00-55.png 1000w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2021-02-24-01-00-55.png 1032w" sizes="(min-width: 720px) 720px"></div></div></div><figcaption>Images by Author. Registering a Raspberry Pi running OctoPrint (left). Creating a key pair and CloudIoT Device (middle). CloudIoT&apos;s Device Registry (right).</figcaption></figure><p>After device telemetry messages reach Cloud Pub/Sub, integration into many other cloud components and services is possible.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cloud.google.com/pubsub/images/cps_integration.svg" class="kg-image" alt="Soft-launching an AI/ML Product as a Solo Founder" loading="lazy"><figcaption>Image Credit: <a href="https://cloud.google.com/pubsub/docs/overview">What is Pub/Sub?</a></figcaption></figure><h2 id="data-pipelines-data-lake">Data Pipelines &amp; Data Lake</h2><p>Print Nanny&apos;s data pipelines are written with <a href="https://beam.apache.org/">Apache Beam</a>, which supports writing both streaming and batch processing jobs. Beam is a high-level <strong>programming model</strong>, with SDKs implemented in Java (best), Python (getting there), and Go (alpha). The Beam API is used to build a portable graph of parallel tasks.</p><p>There are many <strong>execution engines</strong> for Beam (called <strong>runners</strong>). Cloud Dataflow is a managed runner with automatic horizontal scaling. I develop pipelines with Beam&apos;s bundled DirectRunner locally, then push a container image for use with Cloud Dataflow.</p><p>I could easily go on for a whole blog post (or even series) about getting started with Apache Beam! 
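As a pure-Python sketch of the core idea behind the windowed aggregates (fixed windows plus a per-key combiner; the event shape and window size below are illustrative, not my production pipeline):

```python
from collections import defaultdict

def window_start(ts, size_s=60):
    """Assign an event timestamp (seconds) to a fixed, tumbling window."""
    return ts - (ts % size_s)

def windowed_mean(events, size_s=60):
    """Average detection scores per (device, window).

    events: iterable of (device_id, unix_ts, score) tuples.
    This mirrors what Beam's FixedWindows + CombinePerKey computes, minus
    streaming triggers and late-data handling.
    """
    buckets = defaultdict(list)
    for device_id, ts, score in events:
        buckets[(device_id, window_start(ts, size_s))].append(score)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}
```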
Comment below if you&apos;d be interested in reading more about writing machine learning pipelines with <strong>TensorFlow Extended (TFX)</strong> and <strong>Apache Beam.</strong></p><hr><h3 id="pipelines-flow-into-a-data-lake">Pipelines flow into a Data Lake</h3><p>My first pipeline (left) is quite large - that&apos;s ok! I&apos;ll end up breaking it apart into smaller, shared components as needed. </p><p>The key functionality of this pipeline:</p><ul><li>Reading device telemetry data from Pub/Sub</li><li>Windowing and enriching the data stream with device calibration and other metadata</li><li>Packing TFRecords and Parquet tables</li><li>Writing raw inputs and windowed views to Google Cloud Storage </li><li>Maintaining aggregate metrics across <strong>windowed views</strong> of the data stream.</li></ul><p>On the right is a simpler pipeline, which renders .jpg files into an .mp4 video.</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/print-nanny-pipeline-1.gif" width="1245" height="864" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/print-nanny-pipeline-1.gif 600w, https://bitsy.ai/content/images/size/w1000/2021/05/print-nanny-pipeline-1.gif 1000w, https://bitsy.ai/content/images/2021/05/print-nanny-pipeline-1.gif 1245w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-18-at-8.30.27-AM.png" width="508" height="1092" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder"></div></div></div><figcaption>Image by Author. A large Apache Beam application sinks windowed data views into Google Cloud Storage, viewed in Google Cloud DataFlow&apos;s job graph (left). A smaller application renders a video from JPEG frames. 
(right).&#xA0;</figcaption></figure><h2 id="additional-automl-model-training">Additional AutoML Model Training</h2><p>When scaling up a prototype, I try to get a sense for the <strong>qualitative impact</strong> of performance improvement from the perspective of my customers. <strong>Without this anchor</strong>, it&apos;s easy to fall into the trap of <strong>early optimization.</strong></p><p>In my experience, improving a performance indicator is the easy part - the hard part is understanding:</p><ul><li>Is the performance indicator / metric a good proxy for actual or perceived value?</li><li>If not... why does this metric not reflect real-world value? Can I formulate and study a more expressive metric?</li><li>When will I see diminishing returns for the time invested?</li></ul><p>I leveraged Cloud AutoML Vision again here, this time training on a blended dataset from the beta cohort and YouTube.</p><p>At this stage, I manually analyzed <strong>slices of the data</strong> to understand if the system underperforms on<strong> certain printer models </strong>or <strong>material types.</strong></p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-18-at-11.20.32-AM.png" width="1000" height="553" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-18-at-11.20.32-AM.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-18-at-11.20.32-AM.png 1000w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-18-at-11.16.30-AM-2.png" width="1000" height="486" loading="lazy" alt="Soft-launching an AI/ML Product as a Solo Founder" 
srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-18-at-11.16.30-AM-2.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-18-at-11.16.30-AM-2.png 1000w" sizes="(min-width: 720px) 720px"></div></div></div><figcaption>Image by Author. Preferred types of filament (left). Preferred 3D printer brands (right).&#xA0;</figcaption></figure><h2 id="training-an-object-detector">Training an Object Detector</h2><p>Finally! This is where the <strong>real</strong> machine learning begins, right?</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-18-at-9.52.39-PM.png" class="kg-image" alt="Soft-launching an AI/ML Product as a Solo Founder" loading="lazy" width="1000" height="520" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screen-Shot-2021-04-18-at-9.52.39-PM.png 600w, https://bitsy.ai/content/images/2021/05/Screen-Shot-2021-04-18-at-9.52.39-PM.png 1000w" sizes="(min-width: 720px) 720px"><figcaption>Source: <a href="https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf">Hidden Technical Debt in Machine Learning Systems</a></figcaption></figure><h3 id="tensorflow-model-garden">TensorFlow Model Garden</h3><p>I started from TensorFlow&apos;s <a href="https://github.com/tensorflow/models">Model Garden</a>. This repo contains reference implementations for many state-of-the-art architectures, as well as a few best practices for running training jobs.</p><p>The repo is divided into collections, based on the level of stability and support:</p><ul><li><a href="https://github.com/tensorflow/models/blob/master/official">Official.</a> Officially maintained, supported, and kept up to date with the latest TensorFlow 2 APIs by TensorFlow, optimized for readability and performance</li><li><a href="https://github.com/tensorflow/models/blob/master/research">Research</a>. 
Implemented and supported by researchers, TensorFlow 1 and 2.</li><li><a href="https://github.com/tensorflow/models/blob/master/community">Community</a>. Curated list of external GitHub repositories. </li></ul><p><strong>FYI: </strong>TensorFlow&apos;s <strong>official </strong>framework for vision is <a href="https://github.com/tensorflow/models/tree/master/official/vision/beta">undergoing an upgrade!</a> </p><p>The examples below refer to the Object Detection API, which is a framework in the <strong>research</strong> collection.</p><hr><h3 id="object-detection-api">Object Detection API</h3><p><strong><a href="https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md">TensorFlow 2 Detection Model Zoo</a></strong> is a collection of pre-trained models (trained on the COCO 2017 dataset). The weights are a helpful initialization point, even if your problem is outside of the domain covered by COCO.</p><p>My preferred architecture for <strong>on-device inference </strong>with <strong>Raspberry Pi</strong> is a <strong>MobileNet</strong> feature extractor / backbone with a Single-Shot Detector head. The ops used in these models are compatible with the TensorFlow Lite runtime.</p><p>Check out the <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2.md#quick-start">Quick Start guide </a>if you&apos;d like to see examples in action. </p><p>The Object Detection API uses protobuf files to configure training and evaluation (pictured below). Besides hyper-parameters, the config allows you to specify data augmentation operations (highlighted below). TFRecords are expected as input. 
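For a sense of the config format, here is a minimal fragment in the Object Detection API&apos;s protobuf text syntax. All values below are illustrative, not my production settings:

```
model {
  ssd {
    num_classes: 4
    image_resizer {
      fixed_shape_resizer { height: 320 width: 320 }
    }
  }
}
train_config {
  batch_size: 32
  fine_tune_checkpoint: "checkpoint/ckpt-0"
  # Data augmentation operations are declared alongside hyper-parameters.
  data_augmentation_options { random_horizontal_flip {} }
  data_augmentation_options { random_rgb_to_gray { probability: 0.1 } }
}
train_input_reader {
  label_map_path: "label_map.pbtxt"
  tf_record_input_reader { input_path: "train.tfrecord" }
}
```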
</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/object-detection-api-2.gif" class="kg-image" alt="Soft-launching an AI/ML Product as a Solo Founder" loading="lazy" width="1416" height="1687" srcset="https://bitsy.ai/content/images/size/w600/2021/05/object-detection-api-2.gif 600w, https://bitsy.ai/content/images/size/w1000/2021/05/object-detection-api-2.gif 1000w, https://bitsy.ai/content/images/2021/05/object-detection-api-2.gif 1416w" sizes="(min-width: 720px) 720px"><figcaption>Image by Author. Example Object Detection API training pipeline config</figcaption></figure><h3 id="logging-parameters-metrics-artifacts-with-mlflow">Logging parameters, metrics, artifacts with MLFlow</h3><p>Do your experiments and lab notes look like this hot mess of a notebook? This is what my experiment tracking looked like when I ported AnyNet from <a href="https://github.com/facebookresearch/pycls/tree/master/pycls/models">Facebook Research&apos;s model zoo</a> to <a href="https://github.com/leigh-johnson/tf-anynet">TensorFlow 2 / Keras</a>. &#xA0;</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/evernote-oh-god-1.gif" class="kg-image" alt="Soft-launching an AI/ML Product as a Solo Founder" loading="lazy" width="929" height="830" srcset="https://bitsy.ai/content/images/size/w600/2021/05/evernote-oh-god-1.gif 600w, https://bitsy.ai/content/images/2021/05/evernote-oh-god-1.gif 929w" sizes="(min-width: 720px) 720px"><figcaption>Image by Author. Notebook full of screenshots, metrics, manual experiment tracking.</figcaption></figure><p>I started using <a href="https://www.mlflow.org/docs/latest/index.html">MLFlow</a> to log model hyper-parameters, code versions, and metrics, and to store training artifacts in a central repository. It changed my life! 
</p><p><strong>Comment below if you&apos;d like to read a deep dive into my workflow for experiment tracking and training computer vision models!</strong></p><h1 id="thank-you-for-reading-">Thank you for reading! &#x1F33B;</h1><p><strong>There is no one-size-fits-all guide</strong> to building a successful machine learning system - not everything that worked for me is guaranteed to work for you. </p><p>My hope is that by explaining my decision-making process, I&apos;ve demonstrated how <strong>these foundational skills </strong>support a successful AI/ML product strategy.</p><ul><li>Data architecture - everything related to the movement, transformation, storage, and availability of data </li><li>Software engineering</li><li>Infrastructure and cloud engineering</li><li>Statistical modeling</li></ul><hr><p>Are you interested in becoming a Print Nanny beta tester?</p><h2 id="click-here-to-request-a-beta-invite"><a href="https://www.print-nanny.com/request-invite/">Click here to request a beta invite</a></h2><p> <br>Looking for more hands-on examples of Machine Learning for Raspberry Pi and other small devices? 
<a href="https://bitsy.ai/" rel="noopener nofollow">Sign up for my newsletter</a> to receive new tutorials and deep-dives straight to your inbox.</p><p><strong>Google supported this work by providing Google Cloud credit.</strong></p>]]></content:encoded></item><item><title><![CDATA[3 Ways to Install TensorFlow 2 on Raspberry Pi]]></title><description><![CDATA[TensorFlow Lite on Raspberry Pi 4 can achieve performance comparable to NVIDIA's Jetson Nano at a fraction of the cost.]]></description><link>https://bitsy.ai/3-ways-to-install-tensorflow-on-raspberry-pi/</link><guid isPermaLink="false">6313cba487c91a0001179a1b</guid><category><![CDATA[TensorFlow Lite]]></category><category><![CDATA[Raspberry Pi]]></category><category><![CDATA[TensorFlow]]></category><dc:creator><![CDATA[Leigh Johnson]]></dc:creator><pubDate>Sun, 22 Nov 2020 01:36:31 GMT</pubDate><media:content url="https://bitsy.ai/content/images/2021/05/neat-lg@2x-ab285664a2cdd23b239e345cdb3821ee.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://bitsy.ai/content/images/2021/05/neat-lg@2x-ab285664a2cdd23b239e345cdb3821ee.jpg" alt="3 Ways to Install TensorFlow 2 on Raspberry Pi"><p>With the new Raspberry Pi 400 (image credit: <a href="https://www.raspberrypi.org/products/raspberry-pi-400">raspberrypi.org</a>) shipping worldwide, you might be wondering: can this little powerhouse board be used for Machine Learning?</p><p>The answer is, <strong>yes! </strong>TensorFlow Lite models running on Raspberry Pi 4 boards can achieve performance comparable to NVIDIA&apos;s Jetson Nano board. If you add a <a href="https://coral.ai/products/accelerator/">Coral Edge TPU USB Accelerator</a>, you can achieve <em>real-time</em> performance with state-of-the-art neural network architectures like <a href="https://arxiv.org/abs/1801.04381">MobileNetV2</a>. 
</p><p>This performance boost unlocks interesting offline TensorFlow applications, like <a href="https://bitsy.ai/real-time-object-tracking-with-tensorflow--raspberry-pi--and-pan-tilt-hat/">detecting and tracking a moving object.</a> Offline inference is done entirely on the Raspberry Pi. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/1_eS8nkjQHmLUSvTy1tMTAPA.jpeg" class="kg-image" alt="3 Ways to Install TensorFlow 2 on Raspberry Pi" loading="lazy" width="740" height="528" srcset="https://bitsy.ai/content/images/size/w600/2021/05/1_eS8nkjQHmLUSvTy1tMTAPA.jpeg 600w, https://bitsy.ai/content/images/2021/05/1_eS8nkjQHmLUSvTy1tMTAPA.jpeg 740w" sizes="(min-width: 720px) 720px"><figcaption>Image Credit: <a href="https://www.hackster.io/news/benchmarking-tensorflow-lite-on-the-new-raspberry-pi-4-model-b-3fd859d05b98">Benchmarking TensorFlow Lite on the New Raspberry Pi 4, Model B</a> by Alasdair Allan</figcaption></figure><h2 id="installation-is-half-the-battle-">Installation is Half the Battle &#x1F620;</h2><p>Cool! So you&apos;ve decided to build a TensorFlow application for your Raspberry Pi. The first thing you might try is...</p><pre><code class="language-shell">$ pip install --no-cache-dir tensorflow
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting tensorflow
  Downloading https://www.piwheels.org/simple/tensorflow/tensorflow-1.14.0-cp37-none-linux_armv7l.whl </code></pre><p><strong>Oh no</strong>: the version of TensorFlow installed is <strong>1.14. </strong>If you try to specify a higher version, you&apos;ll see an error like:</p><pre><code>$ pip install tensorflow==2.3.1
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting tensorflow==2.3.1
  Could not find a version that satisfies the requirement tensorflow==2.3.1 (from versions: 0.11.0, 1.12.0, 1.13.1, 1.14.0)
No matching distribution found for tensorflow==2.3.1
</code></pre><h2 id="can-i-use-tensorflow-2-">Can I use TensorFlow 2? &#x1F631; </h2><p><strong>Yes! </strong>You&apos;ll have to do a bit of extra work, but here are 3 different ways to use TensorFlow 2.x in your next Raspberry Pi project.</p><h3 id="official-tensorflow-wheels">Official TensorFlow Wheels</h3><p>The TensorFlow team builds and tests binaries for a variety of platform and Python interpreter combinations, listed at <a href="https://www.tensorflow.org/install/pip#package-location">tensorflow.org/install/pip#package-location</a>. Unfortunately, the Raspberry Pi wheels drift a bit behind the latest releases (2.3.1 and 2.4.0-rc2 at the time of this writing). Binaries for aarch64 are not officially supported yet.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-21-15-53-11.png" class="kg-image" alt="3 Ways to Install TensorFlow 2 on Raspberry Pi" loading="lazy" width="2000" height="265" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2020-11-21-15-53-11.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screenshot-from-2020-11-21-15-53-11.png 1000w, https://bitsy.ai/content/images/size/w1600/2021/05/Screenshot-from-2020-11-21-15-53-11.png 1600w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-21-15-53-11.png 2144w" sizes="(min-width: 720px) 720px"><figcaption><a href="https://www.tensorflow.org/install/pip#system-requirements">Official TensorFlow wheels for Raspberry Pi</a></figcaption></figure><p>Install <code>tensorflow==2.3.0</code> with pip (requires Python 3.5, Raspberry Pi 3)</p><pre><code class="language-shell">$ pip install https://storage.googleapis.com/tensorflow/raspberrypi/tensorflow-2.3.0-cp35-none-linux_armv7l.whl</code></pre><h3 id="community-built-wheels">Community-built Wheels</h3><p>I maintain TensorFlow binaries that I build from source at <a href="https://github.com/bitsy-ai/tensorflow-arm-bin">
github.com/bitsy-ai/tensorflow-arm-bin</a>. Where possible, I recommend using the official wheels because they&apos;re more thoroughly tested.</p><p>Install <code>tensorflow==2.4.0-rc2</code> with pip (requires Python 3.7, Raspberry Pi 4)</p><pre><code class="language-shell">$ pip install https://github.com/bitsy-ai/tensorflow-arm-bin/releases/download/v2.4.0-rc2/tensorflow-2.4.0rc2-cp37-none-linux_armv7l.whl</code></pre><p>Install <code>tensorflow==2.4.0-rc2</code> with pip (requires Python 3.7, Raspberry Pi 4, 64-bit OS)</p><pre><code class="language-shell">$ pip install https://github.com/bitsy-ai/tensorflow-arm-bin/releases/download/v2.4.0-rc2/tensorflow-2.4.0rc2-cp37-none-linux_aarch64.whl</code></pre><h3 id="build-from-source">Build from Source</h3><p>Packaging a code-base is a great way to learn more about it (especially when things do not go as planned). I highly recommend this option! Building TensorFlow has taught me more about the framework&apos;s complex internals than any other ML exercise.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-21-15-53-11--1-.png" class="kg-image" alt="3 Ways to Install TensorFlow 2 on Raspberry Pi" loading="lazy" width="2000" height="265" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2020-11-21-15-53-11--1-.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screenshot-from-2020-11-21-15-53-11--1-.png 1000w, https://bitsy.ai/content/images/size/w1600/2021/05/Screenshot-from-2020-11-21-15-53-11--1-.png 1600w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-21-15-53-11--1-.png 2144w" sizes="(min-width: 720px) 720px"><figcaption>Image Credit: <a href="https://blog.tensorflow.org/2019/04/mlir-new-intermediate-representation.html">MLIR: A new intermediate representation and compiler framework</a></figcaption></figure><p>The TensorFlow team recommends <strong>cross-compiling</strong> a Python 
wheel (a type of binary Python package) for Raspberry Pi [1]. For example, you can build a TensorFlow wheel for a 32-bit or 64-bit ARM processor on a computer running an x86 CPU instruction set.</p><p>Before you get started, install the following prerequisites on your build machine:</p><ol><li><a href="https://docs.docker.com/get-docker/">Docker</a></li><li><a href="https://github.com/bazelbuild/bazelisk">bazelisk</a> (bazel version manager, like nvm for Node.js or rvm for Ruby)</li></ol><p>Next, pull down TensorFlow&apos;s source code from git.</p><pre><code class="language-shell">$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow</code></pre><p>Check out the branch you want to build using git, then run the following to build a wheel for a Raspberry Pi 4 running a 32-bit OS and Python3.7:</p><pre><code class="language-shell">$ git checkout v2.4.0-rc2
$ tensorflow/tools/ci_build/ci_build.sh PI-PYTHON37 \
    tensorflow/tools/ci_build/pi/build_raspberry_pi.sh</code></pre><p>For 64-bit support, add <code>AARCH64</code> as an argument to the <code>build_raspberry_pi.sh</code> script.</p><pre><code>$ tensorflow/tools/ci_build/ci_build.sh PI-PYTHON37 \
    tensorflow/tools/ci_build/pi/build_raspberry_pi.sh AARCH64</code></pre><p>The official documentation can be found at <a href="https://www.tensorflow.org/install/source_rpi">tensorflow.org/install/source_rpi</a>.</p><p>Grab a snack and water while you wait for the build to finish! On my Threadripper 3990X (64 cores, 128 threads), compilation takes roughly <strong>20 minutes. </strong></p><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet" data-width="550"><p lang="en" dir="ltr">threadripper go brrrr <a href="https://t.co/27NFDbLdhn">pic.twitter.com/27NFDbLdhn</a></p>&#x2014; Leigh (@grepLeigh) <a href="https://twitter.com/grepLeigh/status/1330309785678442503?ref_src=twsrc%5Etfw">November 22, 2020</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</figure><h2 id="depend-on-tensorflow-2-x-in-a-python-package-">Depend on TensorFlow 2.x in a Python Package&#x1F40D;</h2><p>What if you want to distribute your TensorFlow application for other Raspberry Pi fans?</p><p>Here&apos;s just one way to specify which TensorFlow binary your package needs in <code>setup.py</code>.</p><pre><code class="language-python"># setup.py
import os
from setuptools import setup

arch = os.uname().machine

if arch == &apos;armv7l&apos;:
    tensorflow = &apos;tensorflow @ https://github.com/bitsy-ai/tensorflow-arm-bin/releases/download/v2.4.0-rc2/tensorflow-2.4.0rc2-cp37-none-linux_armv7l.whl&apos;
elif arch == &apos;aarch64&apos;:
    tensorflow = &apos;tensorflow @ https://github.com/bitsy-ai/tensorflow-arm-bin/releases/download/v2.4.0-rc2/tensorflow-2.4.0rc2-cp37-none-linux_aarch64.whl&apos;
elif arch == &apos;x86_64&apos;:
    tensorflow = &apos;tensorflow==2.4.0rc2&apos;
else:
    raise Exception(f&apos;Could not find TensorFlow binary for target {arch}. Please open a GitHub issue.&apos;)

requirements = [
    tensorflow,
    # specify additional package requirements here
]

setup(
    install_requires=requirements,
    # specify additional setup parameters here
)
 </code></pre><h2 id="your-adventure-is-just-beginning-">Your adventure is just beginning &#x2728;</h2><p>I&apos;m excited to hear about your new TensorFlow applications for Raspberry Pi! Drop your project link or tell us about your idea for offline Machine Learning at the edge in the comments below.</p><p>If you&apos;re looking to start a computer vision project, check out my example applications in <a href="https://towardsdatascience.com/real-time-object-tracking-with-tensorflow-raspberry-pi-and-pan-tilt-hat-2aeaef47e134">Real-time Object Tracking with TensorFlow</a>. A few folks have used these examples to jump-start their own research and applied ML projects.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-21-17-16-00.png" class="kg-image" alt="3 Ways to Install TensorFlow 2 on Raspberry Pi" loading="lazy" width="2000" height="1120" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2020-11-21-17-16-00.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screenshot-from-2020-11-21-17-16-00.png 1000w, https://bitsy.ai/content/images/size/w1600/2021/05/Screenshot-from-2020-11-21-17-16-00.png 1600w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-21-17-16-00.png 2000w" sizes="(min-width: 720px) 720px"><figcaption><strong>@martin2kid</strong>&apos;s prototype detects face masks and tracks temperature with a thermal array.</figcaption></figure><figure class="kg-card kg-embed-card kg-card-hascaption"><blockquote class="twitter-tweet" data-width="550"><p lang="en" dir="ltr">Our AI-enabled camera for wildlife surveys is IN for the <a 
href="https://twitter.com/conservationx?ref_src=twsrc%5Etfw">@ConservationX</a> prize. &#x1F389;&#x1F64C;&#x1F973;<br><br>It detects, tracks, and photographs leopards using <a href="https://twitter.com/Raspberry_Pi?ref_src=twsrc%5Etfw">@Raspberry_pi</a>, <a href="https://twitter.com/hashtag/Python?src=hash&amp;ref_src=twsrc%5Etfw">#Python</a> , <a href="https://twitter.com/hashtag/machinelearning?src=hash&amp;ref_src=twsrc%5Etfw">#machinelearning</a>, &amp; some &#x1FA84;<br><br>Next steps... move beyond proof-of-concept &amp; upgrade hardware for field-testing &#x1F4F8;&#x1F406; 1/n <a href="https://t.co/zBgpMpF8R1">pic.twitter.com/zBgpMpF8R1</a></p>&#x2014; Kasim Rafiq (@explorerkas) <a href="https://twitter.com/explorerkas/status/1318706260792004609?ref_src=twsrc%5Etfw">October 21, 2020</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<figcaption><strong>@explorerkas</strong> submission to @Conservationx detects and tracks leopards in the field.</figcaption></figure><figure class="kg-card kg-embed-card kg-card-hascaption"><blockquote class="twitter-tweet" data-width="550"><p lang="en" dir="ltr">Finally,I can measure the Length &amp; Biomass of the Fish. Trained and customised the Fish detection model <a href="https://twitter.com/EdjeElectronics?ref_src=twsrc%5Etfw">@EdjeElectronics</a>. A BIG MILEStONE for me. BiG Thanks to <a href="https://twitter.com/EdjeElectronics?ref_src=twsrc%5Etfw">@EdjeElectronics</a> <a href="https://twitter.com/reesber?ref_src=twsrc%5Etfw">@reesber</a> <a href="https://twitter.com/grepLeigh?ref_src=twsrc%5Etfw">@grepLeigh</a> for your support. <a href="https://twitter.com/hashtag/oceans?src=hash&amp;ref_src=twsrc%5Etfw">#oceans</a> <a href="https://twitter.com/hashtag/fish?src=hash&amp;ref_src=twsrc%5Etfw">#fish</a> <a href="https://twitter.com/hashtag/MachineLearning?src=hash&amp;ref_src=twsrc%5Etfw">#MachineLearning</a> <a href="https://twitter.com/hashtag/rpi?src=hash&amp;ref_src=twsrc%5Etfw">#rpi</a> <a href="https://twitter.com/hashtag/newtech?src=hash&amp;ref_src=twsrc%5Etfw">#newtech</a> <a href="https://t.co/zhpTwuS5VX">pic.twitter.com/zhpTwuS5VX</a></p>&#x2014; Nitesh Verma (@niteshverma21i1) <a href="https://twitter.com/niteshverma21i1/status/1258282206557958149?ref_src=twsrc%5Etfw">May 7, 2020</a></blockquote>
<figcaption><a href="https://twitter.com/niteshverma21i1">@niteshverma21i1</a> measures fish length and biomass for aquaculture automation.</figcaption></figure><p>Looking for more hands-on examples of Machine Learning for Raspberry Pi and other small devices? <a href="https://bitsy.ai/" rel="noopener nofollow">Sign up for my newsletter</a>!</p><p>I publish new examples of real-world ML applications (with full source code) and other nifty tricks like <a href="https://bitsy.ai/automate-bounding-box-annotation-with-tensorflow-and-automl/" rel="noopener nofollow">automating away the pains of bounding-box annotation</a>.</p>]]></content:encoded></item><item><title><![CDATA[zsh: no matches found]]></title><description><![CDATA[I've been using zsh and ohmyz.sh for years, but I still occasionally forget this shell interprets square brackets as a pattern on the command line. Two improvements on the default behavior.]]></description><link>https://bitsy.ai/zsh-no-matches-found/</link><guid isPermaLink="false">6313cba487c91a0001179a17</guid><category><![CDATA[zsh]]></category><dc:creator><![CDATA[Leigh Johnson]]></dc:creator><pubDate>Mon, 02 Nov 2020 22:11:50 GMT</pubDate><content:encoded><![CDATA[<p>I&apos;ve been using zsh and <a href="https://ohmyz.sh/">ohmyz.sh</a> for years, but I still occasionally forget this shell interprets square brackets as a pattern on the command line.</p><p>Here&apos;s an example:</p><pre><code class="language-bash">$ which $SHELL
/usr/bin/zsh
$ pip install -e .[develop,plugins]
zsh: no matches found: [develop,plugins]</code></pre><p>Instead of installing the <code>develop</code> and <code>plugins</code> extras of this Python package, zsh tried to expand the square brackets as a filename glob pattern. To work around this, I need to escape the square brackets:</p><pre><code>$ pip install -e .\[develop,plugins\] 
Obtaining file:///home/leigh/projects/OctoPrint</code></pre><p>If I need a more permanent fix, I can add an alias to my <code>.zshrc</code> file that prefixes the command with zsh&apos;s <code>noglob</code> modifier, which disables filename globbing for that command:</p><pre><code>alias pip=&apos;noglob pip&apos;</code></pre>]]></content:encoded></item><item><title><![CDATA[Automate Image Annotation on a Small Budget]]></title><description><![CDATA[Learn how to use TensorFlow.js and Automated Machine Learning (AutoML) to prototype a computer vision model, plus increase the efficiency of manual data labeling. ]]></description><link>https://bitsy.ai/automate-bounding-box-annotation-with-tensorflow-and-automl/</link><guid isPermaLink="false">6313cba487c91a0001179a15</guid><category><![CDATA[Prototypes]]></category><category><![CDATA[TensorFlow.js]]></category><category><![CDATA[Computer Vision]]></category><category><![CDATA[Object Detection]]></category><category><![CDATA[Data Annotation]]></category><category><![CDATA[AutoML]]></category><category><![CDATA[PrintNanny]]></category><dc:creator><![CDATA[Leigh Johnson]]></dc:creator><pubDate>Mon, 02 Nov 2020 08:27:04 GMT</pubDate><media:content url="https://bitsy.ai/content/images/2021/05/microsoft-vott-custom-tensorflow-js-model.resized-1--1--1.gif" medium="image"/><content:encoded><![CDATA[<h3 id="learn-how-to-use-tensorflow-js-and-automated-machine-learning-automl-to-prototype-a-computer-vision-model-plus-increase-the-efficiency-of-manual-data-labeling-">Learn how to use TensorFlow.js and Automated Machine Learning (AutoML) to prototype a computer vision model, plus increase the efficiency of manual data labeling. </h3><img src="https://bitsy.ai/content/images/2021/05/microsoft-vott-custom-tensorflow-js-model.resized-1--1--1.gif" alt="Automate Image Annotation on a Small Budget"><p></p><h2 id="introduction-">Introduction &#x1F44B;</h2><p>Data collection and preparation are the foundation of every machine learning application. 
You&apos;ve heard it before: &quot;Garbage in, garbage out&quot; in reference to an algorithm&apos;s limited capability to correct for inaccurate, poor-quality, or biased input data. </p><p>The cost of quality annotated data prompted a cottage industry of tools/platforms for speeding up the data labeling process. Besides the SaaS/on-prem startup ecosystem, each of the major cloud providers (AWS, Microsoft, Google) launched an automated data labeling product in the last two years. Understandably, these services are often developed with Premium/Enterprise users, features, and price points in mind. </p><h3 id="on-a-limited-budget-am-i-stuck-labeling-everything-by-hand">On a limited budget, am I stuck labeling <em>everything</em> by hand?</h3><p>Good news! With a bit of elbow grease, you can automate bounding box annotation for yourself or a small team. In this blog post, I will show you the automation technique I used to quickly prototype a 3D print failure detection model. </p><p>You&apos;ll learn how to:</p><ol><li>Create detailed instructions for human labelers </li><li>Train a guidance model </li><li>Automate bounding box annotation with <a href="https://github.com/microsoft/VoTT">Microsoft VoTT</a> (Visual Object Tagging Tool) and TensorFlow.js</li></ol><p>Pictured: Custom TensorFlow model automatically annotates a video frame in Microsoft VoTT (Visual Object Tagging Tool).</p><figure class="kg-card kg-image-card kg-width-wide"><img src="https://bitsy.ai/content/images/2021/05/microsoft-vott-custom-tensorflow-js-model.resized-1--1-.gif" class="kg-image" alt="Automate Image Annotation on a Small Budget" loading="lazy" width="1200" height="732" srcset="https://bitsy.ai/content/images/size/w600/2021/05/microsoft-vott-custom-tensorflow-js-model.resized-1--1-.gif 600w, https://bitsy.ai/content/images/size/w1000/2021/05/microsoft-vott-custom-tensorflow-js-model.resized-1--1-.gif 1000w, 
https://bitsy.ai/content/images/2021/05/microsoft-vott-custom-tensorflow-js-model.resized-1--1-.gif 1200w" sizes="(min-width: 1200px) 1200px"></figure><h2 id="first-annotation-pass-">First Annotation Pass &#x1F3F7;</h2><p>If you&apos;re starting from scratch without labels, you do need to bite the bullet and annotate some data manually. Labeling at least a few hundred examples by hand is required to create written guidelines and evaluation criteria for annotation decisions. </p><h3 id="install-vott-visual-object-tracking-tool-from-source">Install VoTT (Visual Object Tagging Tool) from Source</h3><p>Microsoft VoTT is an Open Source tool for annotating images and videos with bounding boxes (object detection) and polygons (segmentation). I use VoTT because it supports a variety of export formats, can be hosted as a web app, and lets me load a custom TensorFlow.js model to provide bounding box suggestions. </p><p>Prerequisites:</p><ul><li>NodeJS (&gt;= 10.x) and NPM. </li></ul><p>I recommend <a href="https://github.com/nvm-sh/nvm">NVM </a>(Node Version Manager) to manage NodeJS installation and environments.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">$ git clone https://github.com/microsoft/VoTT
$ cd VoTT
$ npm ci
$ npm i @tensorflow/tfjs@2.7.0
$ npm start</code></pre><figcaption>Upgrading the TensorFlow.js package is necessary to use newer ops.</figcaption></figure><p>Refer to <a href="https://github.com/microsoft/VoTT#using-vott">Using VoTT</a> to create a new project and set up data connections.</p><p>Looking for a dataset? <a href="https://datasetsearch.research.google.com">Try Google&apos;s Dataset Search</a>. </p><h3 id="manually-label-some-examples">Manually Label Some Examples</h3><p>The number of images you&apos;ll need to label by hand depends on the problem domain, ranging from a few dozen images to thousands. I obtained reasonable results for my problem (detecting 3D print defects) with:</p><ul><li>5 labels (distribution below)</li><li>67 print time lapse videos, sampled at 3 frames per second</li><li>6,248 images viewed</li><li>9,004 bounding boxes drawn on 3,215 images, average 3 boxes per image. </li><li>8 hours, split over a few days. I caught up on podcasts I&apos;ve been neglecting without a morning commute. </li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/3d-print-fail-dataset-v1-stats.resized--2-.png" class="kg-image" alt="Automate Image Annotation on a Small Budget" loading="lazy" width="252" height="622"><figcaption>Dataset statistics provided by VoTT (Visual Object Tagging Tool)</figcaption></figure><h3 id="write-annotation-guidelines">Write Annotation Guidelines</h3><p>While the task is still fresh in your mind, take the extra time to write clear guidelines (example below) to keep labels consistent and cover common edge cases. Feel free to use my instructions as a template for your own. </p><hr><!--kg-card-begin: markdown--><h3 id="boundingboxannotationguidelines">Bounding Box Annotation Guidelines</h3>
<h3 id="concept">Concept</h3>
<p>2-3 sentence introduction to the task/concept</p>
<p><em>This dataset contains timelapse videos of failed 3D print jobs. Images are in chronological order. Your task is to draw tight boxes around all recognizable pixels matching a print defect or object.</em></p>
<h4 id="labels">Labels</h4>
<p>id, text name, written description of labels and edge cases, positive and negative examples</p>
<ul>
<li>Should labels start at 0 or 1?</li>
<li>Is 0 reserved for background / unknown class?</li>
</ul>
<h4 id="0background">0 background</h4>
<h4 id="1nozzle">1 nozzle</h4>
<p>The print nozzle is a metal object that extrudes hot filament.</p>
<p>If the nozzle is partially occluded, draw a bounding box around the entirety of the object (including occluded pixels).</p>
<p>If the nozzle is entirely occluded, do not label.</p>
<h4 id="2raft">2 raft</h4>
<p>A &quot;raft&quot; is a thin outline surrounding the first few layers of a print.</p>
<p>If the raft is partially occluded (usually by the print), draw a bounding box around the entire raft.</p>
<h4 id="3print">3 print</h4>
<p>The object(s) being printed. If multiple objects or pieces are being printed, draw a bounding box around each distinct object.</p>
<h4 id="4adhesion">4 adhesion</h4>
<p>...</p>
<h4 id="5spaghetti">5 spaghetti</h4>
<p>...</p>
<h4 id="guidelines">Guidelines</h4>
<ul>
<li>Input format</li>
<li>Label format</li>
<li>1 or multiple objects per frame?</li>
<li>1 or multiple boxes per object &quot;instance&quot;?</li>
<li>Tight or loose boxes?</li>
<li>Label reflections (mirrors, water)?</li>
</ul>
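<p>To make the label scheme above concrete, here is a minimal Python sketch of a label map. The ids and names come from the guidelines above; the <code>label_name</code> helper is illustrative, not part of the original tooling:</p><pre><code class="language-python"># label ids follow the guidelines above; 0 is reserved for background/unknown
LABELS = {
    0: "background",
    1: "nozzle",
    2: "raft",
    3: "print",
    4: "adhesion",
    5: "spaghetti",
}

def label_name(label_id):
    # unknown ids fall back to the background class
    return LABELS.get(label_id, LABELS[0])</code></pre>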
<!--kg-card-end: markdown--><hr><h2 id="train-a-guidance-model-with-automl-">Train a Guidance Model with AutoML &#x1F916;</h2><p>AutoML (Automated Machine Learning) is a technique that falls under the &quot;brute force&quot; category of algorithms. Cloud-based AutoML platforms are an excellent tool for validating that your problem <em>can</em> and <em>should </em>be solved with Machine Learning. </p><p>Even if you plan to train a custom model, consider putting the AutoML model in front of customers first. Collect customer feedback early and incorporate this information into the custom model&apos;s development. A few of the insights I collected:</p><ul><li>Customers were not sensitive to false positives (defect reported via SMS, but print was ok). </li><li>Most customers appreciated visual updates on the print&apos;s progress, even if the detector was incorrect. </li><li>Surprisingly, a few customers reported that false positives elicited a sense of security.</li></ul><p>I used <a href="https://cloud.google.com/vision/automl/object-detection/docs">Google Cloud AutoML Vision</a> Edge (Object Detection), which I selected because:</p><ul><li><a href="https://cloud.google.com/vision/automl/object-detection/docs/edge-quickstart">Supports model export</a> to TensorFlow Lite, TensorFlow.js, and ops compatible with Edge TPU, ARM, and NVIDIA hardware acceleration.</li><li>Disclosure: I&apos;m a <a href="https://developers.google.com/community/experts">Google Developer Expert</a> &#x1F913;</li></ul><h3 id="export-dataset-from-vott">Export Dataset from VoTT</h3><p>In your project&apos;s export settings, choose the CSV provider. Check &quot;include images,&quot; save the configuration, and then export the data. If you&apos;re exporting thousands of images, this will take a few minutes. 
Take a break!</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-01-12-02-44.png" class="kg-image" alt="Automate Image Annotation on a Small Budget" loading="lazy" width="2000" height="785" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2020-11-01-12-02-44.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screenshot-from-2020-11-01-12-02-44.png 1000w, https://bitsy.ai/content/images/size/w1600/2021/05/Screenshot-from-2020-11-01-12-02-44.png 1600w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-01-12-02-44.png 2384w" sizes="(min-width: 720px) 720px"><figcaption>VoTT Export Settings</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-01-12-53-10.png" class="kg-image" alt="Automate Image Annotation on a Small Budget" loading="lazy" width="1830" height="554" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2020-11-01-12-53-10.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screenshot-from-2020-11-01-12-53-10.png 1000w, https://bitsy.ai/content/images/size/w1600/2021/05/Screenshot-from-2020-11-01-12-53-10.png 1600w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-01-12-53-10.png 1830w" sizes="(min-width: 720px) 720px"><figcaption>Files exported from VoTT</figcaption></figure><h3 id="inspect-and-preprocess-data">Inspect and Preprocess Data</h3><p>AutoML Vision requires CSV data in the following format if two vertices are provided:</p><p>&#x2003;<code>SET,gs://path/to/img,label,x_min,y_min,,,x_max,y_max</code></p><p>The coordinates must be <strong>relative to the image&apos;s size</strong>, falling in range [0, 1]. 
</p><p>Code available in <a href="https://gist.github.com/leigh-johnson/293f3380f15c496934e2846ec7f9ad16">Github Gist</a></p><pre><code class="language-python">import pandas as pd

# load VoTT CSV export
# notice: coordinates are absolute
df = pd.read_csv(&apos;/path/to/vott-csv-export/{project name}-export.csv&apos;)
df.head()</code></pre><figure class="kg-card kg-image-card"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-01-13-25-17.png" class="kg-image" alt="Automate Image Annotation on a Small Budget" loading="lazy" width="1504" height="402" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2020-11-01-13-25-17.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screenshot-from-2020-11-01-13-25-17.png 1000w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-01-13-25-17.png 1504w" sizes="(min-width: 720px) 720px"></figure><pre><code class="language-python">import cv2

base_path = &apos;/path/to/vott-csv-export/&apos;

LOG_INTERVAL=2000

# convert absolute coordinates to relative coordinates in [0, 1] range
for index, row in df.iterrows():
    if index % LOG_INTERVAL == 0:
        print(f&apos;finished {index} / {len(df)}&apos;)
    filename = row[&apos;image_path&apos;].split(&apos;/&apos;)[-1]
    img = cv2.imread(f&apos;{base_path}{filename}&apos;)
    height, width, channels = img.shape
    df.at[index, &apos;x1_n&apos;] = row[&apos;x1&apos;] / width
    df.at[index, &apos;x2_n&apos;]= row[&apos;x2&apos;] / width  
    df.at[index, &apos;y1_n&apos;] = row[&apos;y1&apos;] / height
    df.at[index, &apos;y2_n&apos;] = row[&apos;y2&apos;] / height
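
# optional sanity check (illustrative helper, not required by AutoML):
# every normalized coordinate should fall in the [0, 1] range before export
def coords_normalized(frame, cols=("x1_n", "y1_n", "x2_n", "y2_n")):
    vals = frame[list(cols)]
    return bool(vals.ge(0).all().all() and vals.le(1).all().all())

# e.g. assert coords_normalized(df) before writing the CSV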
 
 
# replace relative image paths with a Google Storage bucket path
df[&apos;set&apos;] = &apos;UNASSIGNED&apos;
df[&apos;gs_path&apos;] = &apos;gs://bucket-name/path/to/upload/&apos; + df[&apos;image&apos;]

# write CSV with columns expected by AutoML Vision
# the &quot;none&quot; columns are required for boxes defined by 2 vertices
df[&apos;none&apos;] = &apos;&apos;
df.to_csv(&apos;/home/leigh/datasets/spaghetti/labeled/vott-csv-export/spaghetti_v1-normalized-export.csv&apos;, 
    columns=[&apos;set&apos;, &apos;gs_path&apos;, &apos;label&apos;, &apos;x1_n&apos;, &apos;y1_n&apos;, &apos;none&apos;, &apos;none&apos;, &apos;x2_n&apos;, &apos;y2_n&apos;, &apos;none&apos;, &apos;none&apos;],
    index=False
    )</code></pre><p>For additional information, refer to <a href="https://cloud.google.com/vision/automl/object-detection/docs/prepare">preparing your training data</a>.</p><h3 id="upload-data">Upload Data</h3><p>Upload the data to a Google Cloud Storage bucket. <strong>Note:</strong> If you&apos;re creating a new bucket, AutoML Vision exports in later steps require the destination bucket to be in the us-central-1 region.</p><ul><li>New to GCP? Follow the steps in <a href="https://cloud.google.com/vision/automl/docs/edge-quickstart#before_you_begin">Before you Begin</a> to setup a project and authenticate.</li><li><a href="https://cloud.google.com/storage/docs/gsutil">Install gsutil</a></li><li><code>gsutil rsync -r /path/to/vott-csv-export gs://your-bucket-name/vott-csv-export/</code></li></ul><h3 id="import-data-to-automl-vision">Import Data to AutoML Vision</h3><ul><li>Open the <a href="https://console.cloud.google.com/vision/datasets">AutoML Vision Datasets browser</a> in GCP&apos;s console. </li><li>Create a new dataset. In the import tab, select your CSV file from the Storage bucket.</li></ul><figure class="kg-card kg-image-card"><img src="https://bitsy.ai/content/images/2021/05/google-cloud-automl-import-data.resized.png" class="kg-image" alt="Automate Image Annotation on a Small Budget" loading="lazy" width="648" height="568" srcset="https://bitsy.ai/content/images/size/w600/2021/05/google-cloud-automl-import-data.resized.png 600w, https://bitsy.ai/content/images/2021/05/google-cloud-automl-import-data.resized.png 648w"></figure><p>Take a break while the data imports! 
&#x1F44F;</p><p>Before you train, sanity-check the imported data and verify labels are correct.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-01-15-01-46.png" class="kg-image" alt="Automate Image Annotation on a Small Budget" loading="lazy" width="2000" height="1012" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2020-11-01-15-01-46.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screenshot-from-2020-11-01-15-01-46.png 1000w, https://bitsy.ai/content/images/size/w1600/2021/05/Screenshot-from-2020-11-01-15-01-46.png 1600w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-01-15-01-46.png 2000w" sizes="(min-width: 1200px) 1200px"><figcaption>The &quot;raft&quot; of a 3D print is a throwaway supportive structure.</figcaption></figure><h3 id="train-model">Train Model</h3><p>AutoML&apos;s pricing is based on <strong>node hours</strong>, which is not the same as &quot;wall clock&quot; or elapsed time. One critique: estimating the cost of a training job in advance requires some additional effort.</p><p><a href="https://cloud.google.com/vision/automl/pricing?_ga=2.231171871.-1460275601.1603318024#free-trial">AutoML Vision pricing</a> (USD prices reflected below) varies by feature, with different pricing schedules for:</p><ul><li>Cloud-hosted classification &amp; object detection - $3.15 / node hour, $75.60 / 24 hours</li><li>Edge (classification) - $4.95 / node hour, $118.80 / 24 hours</li><li>Edge (object detection) - $18.00 / node hour, $432 / 24 hours</li></ul><p>If these prices are outside of your project&apos;s budget, I will cover how I train models with <a href="https://github.com/tensorflow/models/tree/master/research/object_detection">TensorFlow&apos;s Object Detection API </a>in a future post. 
Follow or <a href="https://bitsy.ai/">subscribe to my newsletter</a> to be notified on publication.</p><p>For this particular problem (detecting 3D print defects), I saw reasonable results using the recommended training time (24 node hours) for my dataset size. Drilling down into individual labels, &quot;nozzle&quot; detection performed significantly worse compared to the rest.</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-01-15-06-36-1.png" width="2000" height="660" loading="lazy" alt="Automate Image Annotation on a Small Budget" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2020-11-01-15-06-36-1.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screenshot-from-2020-11-01-15-06-36-1.png 1000w, https://bitsy.ai/content/images/size/w1600/2021/05/Screenshot-from-2020-11-01-15-06-36-1.png 1600w, https://bitsy.ai/content/images/size/w2400/2021/05/Screenshot-from-2020-11-01-15-06-36-1.png 2400w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-01-15-43-00-1.png" width="2000" height="768" loading="lazy" alt="Automate Image Annotation on a Small Budget" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2020-11-01-15-43-00-1.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screenshot-from-2020-11-01-15-43-00-1.png 1000w, https://bitsy.ai/content/images/size/w1600/2021/05/Screenshot-from-2020-11-01-15-43-00-1.png 1600w, https://bitsy.ai/content/images/size/w2400/2021/05/Screenshot-from-2020-11-01-15-43-00-1.png 2400w" sizes="(min-width: 720px) 720px"></div></div></div><figcaption>Left: precision/recall curve for all labels. Right: precision/recall curve for &quot;nozzle&quot;. 
Click to view full-size images.</figcaption></figure><p>On closer inspection, I could see that a high false positive rate significantly impacted the model&apos;s precision score. Precision is the ratio of <em>true positives / (true positives + false positives)</em>. I was thrilled to discover a number of examples where <strong>I </strong>had failed to label nozzles in ground truth data. </p><p>&#x1F92F; Even though I had gotten sloppy during the first pass of labeling, the guidance model was already good enough to catch these mistakes. Wow! If I had manually labeled the entirety of this data, it would be riddled with errors.</p><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://bitsy.ai/content/images/2020/11/Screenshot-from-2020-11-01-15-47-25.png" class="kg-image" alt="Automate Image Annotation on a Small Budget" loading="lazy" width="2000" height="745" srcset="https://bitsy.ai/content/images/size/w600/2020/11/Screenshot-from-2020-11-01-15-47-25.png 600w, https://bitsy.ai/content/images/size/w1000/2020/11/Screenshot-from-2020-11-01-15-47-25.png 1000w, https://bitsy.ai/content/images/size/w1600/2020/11/Screenshot-from-2020-11-01-15-47-25.png 1600w, https://bitsy.ai/content/images/2020/11/Screenshot-from-2020-11-01-15-47-25.png 2304w"><figcaption>My model confidently detected &quot;nozzle&quot; objects in these images, which were scored as &quot;false positives&quot; due to human error in labeling the ground truth data.</figcaption></figure><h2 id="automate-vott-labeling-with-tensorflow-js-">Automate VoTT Labeling with TensorFlow.js &#x1F916;</h2><p>The following section will show you how to use a custom TensorFlow.js model to suggest bounding boxes with VoTT&apos;s &quot;Active Learning&quot; feature. </p><p>The &quot;Active Learning&quot; feature uses a TensorFlow.js model to perform an inference pass over a frame, apply non-max suppression, and draw the best box proposal per detected object. 
</p><h3 id="export-tensorflow-js-model">Export TensorFlow.js Model</h3><p>After model training completes, you can export a TensorFlow.js package in the &quot;Test &amp; Use&quot; tab. The model will export to a Storage bucket (the destination bucket must be in the us-central-1 region). </p><figure class="kg-card kg-gallery-card kg-width-wide"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-01-16-50-05.png" width="1868" height="960" loading="lazy" alt="Automate Image Annotation on a Small Budget" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2020-11-01-16-50-05.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screenshot-from-2020-11-01-16-50-05.png 1000w, https://bitsy.ai/content/images/size/w1600/2021/05/Screenshot-from-2020-11-01-16-50-05.png 1600w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-01-16-50-05.png 1868w" sizes="(min-width: 1200px) 1200px"></div></div></div></figure><h3 id="create-classes-json-file">Create classes.json file</h3><p>AutoML Vision Edge exports a newline-delimited label file, which I converted by hand into the format required by VoTT (below). Label indices MUST start at 1! </p><pre><code class="language-json">[{&quot;id&quot;:1,&quot;displayName&quot;:&quot;nozzle&quot;}, ... ]</code></pre><h3 id="patch-vott-to-fix-tensorflow-1-x-2-x-bugs">Patch VoTT to fix TensorFlow 1.x -&gt; 2.x bugs</h3><p>VoTT ships with v1 of @tensorflow/tfjs. The AutoML Vision Edge model uses ops (e.g. AddV2) that require a more recent version. I fixed a few minor issues with the following patch:</p><ul><li>Model expects float32 input</li><li>Use the newer tf.image.nonMaxSuppressionAsync() fn </li></ul><pre><code class="language-git">diff --git a/src/providers/activeLearning/objectDetection.ts b/src/providers/activeLearning/objectDetection.ts
index 196db45..a8dff06 100755
--- a/src/providers/activeLearning/objectDetection.ts
+++ b/src/providers/activeLearning/objectDetection.ts
@@ -151,6 +151,8 @@ export class ObjectDetection {
         const batched = tf.tidy(() =&gt; {
             if (!(img instanceof tf.Tensor)) {
                 img = tf.browser.fromPixels(img);
+                // model requires float32 input
+                img = tf.cast(img, &apos;float32&apos;);
             }
             // Reshape to a single-element batch so we can pass it to executeAsync.
             return img.expandDims(0);
@@ -166,7 +168,8 @@ export class ObjectDetection {
         const result = await this.model.executeAsync(batched) as tf.Tensor[];
 
         const scores = result[0].dataSync() as Float32Array;
-        const boxes = result[1].dataSync() as Float32Array;
+        // tf.image.nonMaxSuppressionAsync() expects a tf.Tensor as input
+        const boxes = result[1].dataSync()
 
         // clean the webgl tensors
         batched.dispose();
@@ -177,10 +180,8 @@ export class ObjectDetection {
         const prevBackend = tf.getBackend();
         // run post process in cpu
         tf.setBackend(&quot;cpu&quot;);
-        const indexTensor = tf.tidy(() =&gt; {
-            const boxes2 = tf.tensor2d(boxes, [result[1].shape[1], result[1].shape[3]]);
-            return tf.image.nonMaxSuppression(boxes2, maxScores, maxNumBoxes, 0.5, 0.5);
-        });
+        const boxes2d = tf.tensor2d(boxes, [result[1].shape[0], result[1].shape[1]]);
+        const indexTensor = await tf.image.nonMaxSuppressionAsync(boxes2d, maxScores, maxNumBoxes, 0.5, 0.5);
 
         const indexes = indexTensor.dataSync() as Float32Array;
         indexTensor.dispose();
@@ -188,7 +189,9 @@ export class ObjectDetection {
         // restore previous backend
         tf.setBackend(prevBackend);
 
-        return this.buildDetectedObjects(width, height, boxes, maxScores, indexes, classes);
+        // _.buildDetectedObjects expects Float32Array input
+        const fboxes = boxes as Float32Array
+        return this.buildDetectedObjects(width, height, fboxes, maxScores, indexes, classes);
     }</code></pre><h2 id="automatic-bounding-box-suggestions-">Automatic Bounding Box Suggestions &#x2728;</h2><ul><li>Run <code>npm start</code> after patching VoTT</li><li>In the &quot;Active Learning&quot; tab, configure &quot;Model Path&quot; to point to your TensorFlow.js export. I recommend enabling the &quot;auto detect&quot; feature, otherwise you&apos;ll have to manually press ctrl/cmd+d to perform a detection pass on each frame. </li></ul><figure class="kg-card kg-image-card kg-width-wide"><img src="https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-01-17-10-55.png" class="kg-image" alt="Automate Image Annotation on a Small Budget" loading="lazy" width="1600" height="671" srcset="https://bitsy.ai/content/images/size/w600/2021/05/Screenshot-from-2020-11-01-17-10-55.png 600w, https://bitsy.ai/content/images/size/w1000/2021/05/Screenshot-from-2020-11-01-17-10-55.png 1000w, https://bitsy.ai/content/images/2021/05/Screenshot-from-2020-11-01-17-10-55.png 1600w" sizes="(min-width: 1200px) 1200px"></figure><h2 id="wrapping-up">Wrapping Up</h2><p>You just learned how to turn a few hours and a few hundred dollars into an automated workflow for bounding box annotation. As a bonus, the guidance model can be adequate for a prototype or even an initial production run. I hope this saves you a bit of time/energy in your next object detection project!</p><p>AutoML products are expensive compared to other cloud offerings - but they&apos;re <em>nowhere</em> near as resource intensive as developing a comparable model from scratch, or even with weight transfer learning.</p><p>Are you currently solving problems using an object detector? Tell me more about the concept and the approach you&apos;re taking in the comments below. </p><p>Subscribe to my newsletter @ <a href="https://bitsy.ai/">bitsy.ai</a> if you want to receive more tips and detailed write-ups of ML applications for Raspberry Pi, Arduino, and other small devices. 
I&apos;m currently building a privacy-first 3D print monitoring plugin for <a href="https://octoprint.org/">Octoprint</a>. </p>]]></content:encoded></item><item><title><![CDATA[Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT]]></title><description><![CDATA[Detect and track an object in real-time using a Raspberry Pi, Pan-Tilt HAT, and TensorFlow. Perfect for hobbyists curious about computer vision & machine learning.]]></description><link>https://bitsy.ai/real-time-object-tracking-with-tensorflow--raspberry-pi--and-pan-tilt-hat/</link><guid isPermaLink="false">6313cba487c91a0001179a0b</guid><category><![CDATA[Computer Vision]]></category><category><![CDATA[Object Detection]]></category><category><![CDATA[Raspberry Pi]]></category><category><![CDATA[Python]]></category><category><![CDATA[Prototypes]]></category><category><![CDATA[Getting Started]]></category><category><![CDATA[TensorFlow]]></category><category><![CDATA[TensorFlow Lite]]></category><dc:creator><![CDATA[Leigh Johnson]]></dc:creator><pubDate>Mon, 09 Dec 2019 00:00:00 GMT</pubDate><media:content url="https://bitsy.ai/content/images/2021/05/1-xAs0SJR4gvvgcuuySdfmAw-1.jpeg" medium="image"/><content:encoded><![CDATA[<h3 id="portable-computer-vision-and-motion-tracking-on-a-budget-">Portable computer vision and motion tracking on a&#xA0;budget.</h3><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/1-xAs0SJR4gvvgcuuySdfmAw.jpeg" class="kg-image" alt="Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT" loading="lazy" width="800" height="636" srcset="https://bitsy.ai/content/images/size/w600/2021/05/1-xAs0SJR4gvvgcuuySdfmAw.jpeg 600w, https://bitsy.ai/content/images/2021/05/1-xAs0SJR4gvvgcuuySdfmAw.jpeg 800w" sizes="(min-width: 720px) 720px"><figcaption>Pictured: Raspberry Pi 4GB, Pi Camera v2.1, Pimoroni Pan-Tilt HAT, Coral Edge TPU USB Accelerator</figcaption></figure><h2 id="part-1-introduction-">Part 
1&#x200A;&#x2014;&#x200A;Introduction &#x1F44B;</h2><img src="https://bitsy.ai/content/images/2021/05/1-xAs0SJR4gvvgcuuySdfmAw-1.jpeg" alt="Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT"><p>Are you just getting started with machine/deep learning, TensorFlow, or Raspberry Pi? Perfect, this blog post is for you! I created <a href="https://github.com/leigh-johnson/rpi-deep-pantilt"><strong>rpi-deep-pantilt</strong></a><strong> </strong>as an interactive demo of object detection in the wild. &#x1F981;</p><p><strong>UPDATE (February 9th, 2020)&#x200A;&#x2014;</strong>&#x200A;Face detection and tracking added!</p><p>I&#x2019;ll show you how to reproduce the video below, which depicts a camera panning and tilting to track my movement across a room.</p><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet" data-width="550"><p lang="en" dir="ltr">I&apos;m just a girl, standing in front of a tiny computer, reminding you most computing problems can be solved by sheer force of will. &#x1F4AA;<br><br>MobileNetv3 + SSD <a href="https://twitter.com/TensorFlow?ref_src=twsrc%5Etfw">@TensorFlow</a> model I converted <a href="https://twitter.com/hashtag/TFLite?src=hash&amp;ref_src=twsrc%5Etfw">#TFLite</a><a href="https://twitter.com/hashtag/RaspberryPi?src=hash&amp;ref_src=twsrc%5Etfw">#RaspberryPi</a> + <a href="https://twitter.com/pimoroni?ref_src=twsrc%5Etfw">@pimoroni</a> pantilt hat, PID controller.<br><br>Write-up soon! &#x2728; <a href="https://t.co/v63KSJtJHO">https://t.co/v63KSJtJHO</a> <a href="https://t.co/dmyAlWCnWk">pic.twitter.com/dmyAlWCnWk</a></p>&#x2014; Leigh (@grepLeigh) <a href="https://twitter.com/grepLeigh/status/1199910608474263552?ref_src=twsrc%5Etfw">November 28, 2019</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</figure><p>I will cover the following:</p><ol><li>Build materials and hardware assembly instructions.</li><li>Deploy a <strong>TensorFlow Lite </strong>object detection model <strong>(MobileNetV3-SSD) </strong>to a <strong>Raspberry Pi.</strong></li><li>Send tracking instructions to pan / tilt servo motors using a <strong>proportional&#x2013;integral&#x2013;derivative controller (PID) controller.</strong></li><li>Accelerate inferences of any <strong>TensorFlow Lite</strong> model with <strong>Coral&#x2019;s USB Edge TPU Accelerator</strong> and <strong>Edge TPU Compiler</strong>.</li></ol><hr><h2 id="terms-references-">Terms &amp; References &#x1F4DA;</h2><p><a href="https://www.raspberrypi.org/"><strong>Raspberry Pi</strong></a>&#x200A;&#x2014;&#x200A;a small, affordable computer popular with educators, hardware hobbyists and robot enthusiasts. &#x1F916;</p><p><a href="https://www.raspberrypi.org/downloads/raspbian/"><strong>Raspbian</strong></a><strong>&#x200A;&#x2014;&#x200A;</strong>the Raspberry Pi Foundation&#x2019;s official operating system for the Pi. 
Raspbian is derived from Debian Linux.</p><p><a href="https://www.tensorflow.org/"><strong>TensorFlow</strong>&#x200A;</a>&#x2014;&#x200A;an open-source framework for <a href="https://en.wikipedia.org/wiki/Dataflow_programming">dataflow</a> programming, used for machine learning and deep neural learning.</p><p><a href="https://www.tensorflow.org/lite"><strong>TensorFlow Lite&#x200A;</strong></a>&#x2014;&#x200A;an open-source framework for deploying <strong>TensorFlow </strong>models on mobile and embedded devices.</p><p><a href="https://towardsdatascience.com/portable-computer-vision-tensorflow-2-0-on-a-raspberry-pi-part-1-of-2-84e318798ce9#e8cc"><strong>Convolutional Neural Network</strong></a><strong>&#x200A;</strong>&#x2014;&#x200A;a type of neural network architecture that is well-suited for image classification and object detection tasks.</p><p><a href="https://towardsdatascience.com/review-ssd-single-shot-detector-object-detection-851a94607d11"><strong>Single Shot Detector (SSD)</strong></a>&#x200A;&#x2014;&#x200A;a type of <strong>convolutional neural network</strong> (CNN) architecture, specialized for real-time object detection, classification, and bounding box localization.</p><p><a href="https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html"><strong>MobileNetV3&#x200A;</strong></a><strong>&#x2014;&#x200A;</strong>a state-of-the-art computer vision model optimized for performance on modest mobile phone processors.</p><p><a href="https://github.com/tensorflow/models/tree/master/research/object_detection#nov-13th-2019"><strong>MobileNetV3-SSD</strong>&#x200A;</a>&#x2014;&#x200A;a <strong>single-shot detector </strong>based on<strong> MobileNet </strong>architecture. 
This tutorial will be using <strong>MobileNetV3-SSD </strong>models available through <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md">TensorFlow&#x2019;s object detection model zoo.</a></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/1-4g3BquE6qJQ7BhjhbBGxaQ.png" class="kg-image" alt="Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT" loading="lazy" width="800" height="438" srcset="https://bitsy.ai/content/images/size/w600/2021/05/1-4g3BquE6qJQ7BhjhbBGxaQ.png 600w, https://bitsy.ai/content/images/2021/05/1-4g3BquE6qJQ7BhjhbBGxaQ.png 800w" sizes="(min-width: 720px) 720px"><figcaption>Searching for MobileNetV3</figcaption></figure><p><a href="https://cloud.google.com/edge-tpu/"><strong>Edge TPU</strong></a>&#x200A;&#x2014;&#x200A;a tensor processing unit (TPU) is an integrated circuit for accelerating computations performed by <strong>TensorFlow.</strong> The <strong>Edge TPU </strong>was developed with a small footprint, for mobile and embedded devices &#x201C;at the edge&#x201D;</p><p></p><figure class="kg-card kg-gallery-card kg-width-wide"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/0-qITzpDpM_O7osma6.jpeg" width="400" height="325" loading="lazy" alt="Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/0-rw3z1B2ka-wncLy4.png" width="600" height="382" loading="lazy" alt="Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT" srcset="https://bitsy.ai/content/images/2021/05/0-rw3z1B2ka-wncLy4.png 600w"></div></div></div></figure><hr><h2 id="part-2-build-list-">Part 2&#x2014; Build List&#xA0;&#x1F6E0;</h2><h3 id="essential"><strong>Essential</strong></h3><ul><li><a 
href="https://www.raspberrypi.org/products/raspberry-pi-4-model-b/">Raspberry Pi 4 (4GB recommended)</a></li><li><a href="https://www.raspberrypi.org/products/camera-module-v2/">Raspberry Pi Camera V2</a></li><li><a href="https://shop.pimoroni.com/products/pan-tilt-hat?variant=22408353287">Pimoroni Pan-tilt HAT Kit</a></li><li>Micro SD card 16+ GB</li><li>Micro HDMI Cable</li></ul><h3 id="optional"><strong>Optional</strong></h3><ul><li><a href="https://www.adafruit.com/product/1648">12&quot; CSI/DSI ribbon for Raspberry Pi Camera</a>.&#xA0;<br>The Pi Camera&#x2019;s stock cable is too short for the Pan-tilt HAT&#x2019;s full range of motion.</li><li><a href="https://www.adafruit.com/product/1426">RGB NeoPixel Stick</a><br>This component adds a consistent light source to your project.</li><li><a href="https://coral.withgoogle.com/products/accelerator">Coral Edge TPU USB Accelerator</a><br>Accelerates inference (prediction) speed on the Raspberry Pi. You don&#x2019;t need this to reproduce the demo.</li></ul><blockquote>&#x1F44B; <strong>Looking for a project with fewer moving pieces</strong>? Check out <a href="https://towardsdatascience.com/portable-computer-vision-tensorflow-2-0-on-a-raspberry-pi-part-1-of-2-84e318798ce9">Portable Computer Vision: TensorFlow 2.0 on a Raspberry Pi</a> to create a hand-held image classifier. &#x2728;<br></blockquote><hr><h2 id="part-3-raspberry-pi-setup">Part 3&#x200A;&#x2014;&#x200A;Raspberry Pi&#xA0;Setup</h2><p>There are two ways you can install <strong>Raspbian</strong> to your Micro SD card:</p><ol><li><a href="https://www.raspberrypi.org/documentation/installation/noobs.md">NOOBS</a> (New Out Of the Box Software) is a GUI operating system installation manager. 
If this is your first Raspberry Pi project, I&#x2019;d recommend starting here.</li><li><a href="https://www.raspberrypi.org/documentation/installation/installing-images/README.md">Write Raspbian Image to SD Card</a>.</li></ol><p>This tutorial and supporting software were written using <a href="https://www.raspberrypi.org/documentation/installation/"><strong>Raspbian (Buster)</strong></a>. If you&#x2019;re using a different version of Raspbian or another platform, you&#x2019;ll probably run into some incompatibilities.</p><p><strong>Before proceeding</strong>, you&#x2019;ll need to:</p><ul><li>Connect your Pi to the internet (<a href="https://projects.raspberrypi.org/en/projects/raspberry-pi-using/4">doc</a>)</li><li>SSH into your Raspberry Pi (<a href="https://www.raspberrypi.org/documentation/remote-access/ssh/">doc</a>)</li></ul><hr><h2 id="part-4-software-installation">Part 4&#x2014; Software Installation</h2><ol><li>Install system dependencies</li></ol><pre><code>$ sudo apt-get update &amp;&amp; sudo apt-get install -y python3-dev libjpeg-dev libatlas-base-dev raspi-gpio libhdf5-dev python3-smbus</code></pre><p>2. Create a new project directory</p><pre><code>$ mkdir rpi-deep-pantilt &amp;&amp; cd rpi-deep-pantilt</code></pre><p>3. Create a new virtual environment</p><pre><code>$ python3 -m venv .venv</code></pre><p>4. Activate the virtual environment</p><pre><code>$ source .venv/bin/activate &amp;&amp; python3 -m pip install --upgrade pip</code></pre><p>5. Install TensorFlow 2.0 from a community-built wheel.</p><pre><code>$ pip install https://github.com/leigh-johnson/Tensorflow-bin/blob/master/tensorflow-2.0.0-cp37-cp37m-linux_armv7l.whl?raw=true</code></pre><p>6. 
Install the <strong>rpi-deep-pantilt</strong> Python package</p><pre><code>$ python3 -m pip install rpi-deep-pantilt</code></pre><hr><h2 id="part-5-pan-tilt-hat-hardware-assembly">Part 5 &#x2014;Pan Tilt HAT Hardware&#xA0;Assembly</h2><p>If you purchased a <strong>pre-assembled </strong>Pan-Tilt HAT kit, you can skip to the next section.</p><p>Otherwise, follow the steps in <a href="https://learn.pimoroni.com/tutorial/sandyj/assembling-pan-tilt-hat">Assembling Pan-Tilt HAT</a> before proceeding.</p><hr><h2 id="part-6-connect-the-pi-camera">Part 6&#x200A;&#x2014;&#x200A;Connect the Pi&#xA0;Camera</h2><ol><li>Turn off the Raspberry Pi</li><li>Locate the Camera Module, between the USB Module and HDMI modules.</li><li>Unlock the black plastic clip by (gently) pulling upwards</li><li>Insert the Camera Module ribbon cable (metal connectors<strong> facing away</strong> from the ethernet / USB ports on a Raspberry Pi 4)</li><li>Lock the black plastic clip</li></ol><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/1-cdWRY2ldCqz-8hYnuiTfBw.gif" class="kg-image" alt="Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT" loading="lazy" width="842" height="534" srcset="https://bitsy.ai/content/images/size/w600/2021/05/1-cdWRY2ldCqz-8hYnuiTfBw.gif 600w, https://bitsy.ai/content/images/2021/05/1-cdWRY2ldCqz-8hYnuiTfBw.gif 842w" sizes="(min-width: 720px) 720px"><figcaption>Image Credit: <a href="https://projects.raspberrypi.org/en/projects/getting-started-with-picamera">Getting Started with the Pi Camera</a></figcaption></figure><hr><h2 id="part-7-enable-the-pi-camera">Part 7&#x200A;&#x2014;&#x200A;Enable the Pi&#xA0;Camera</h2><ol><li>Turn the Raspberry Pi on</li><li>Run sudo raspi-config and select Interfacing Options from the Raspberry Pi Software Configuration Tool&#x2019;s main menu. 
Press ENTER.</li></ol><figure class="kg-card kg-image-card"><img src="https://bitsy.ai/content/images/2021/05/0-7JGOPvfv4JwYEIm-.png" class="kg-image" alt="Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT" loading="lazy" width="800" height="260" srcset="https://bitsy.ai/content/images/size/w600/2021/05/0-7JGOPvfv4JwYEIm-.png 600w, https://bitsy.ai/content/images/2021/05/0-7JGOPvfv4JwYEIm-.png 800w" sizes="(min-width: 720px) 720px"></figure><p>3. Select the Enable Camera menu option and press ENTER.</p><figure class="kg-card kg-image-card"><img src="https://bitsy.ai/content/images/2021/05/1-dFmOlqyBkE98o6JiU0kp6Q.png" class="kg-image" alt="Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT" loading="lazy" width="800" height="234" srcset="https://bitsy.ai/content/images/size/w600/2021/05/1-dFmOlqyBkE98o6JiU0kp6Q.png 600w, https://bitsy.ai/content/images/2021/05/1-dFmOlqyBkE98o6JiU0kp6Q.png 800w" sizes="(min-width: 720px) 720px"></figure><p>4. 
In the next menu, use the right arrow key to highlight ENABLE and press ENTER.</p><figure class="kg-card kg-image-card"><img src="https://bitsy.ai/content/images/2021/05/1-09oxH02CQXrAUp7qmar6Lg.png" class="kg-image" alt="Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT" loading="lazy" width="800" height="546" srcset="https://bitsy.ai/content/images/size/w600/2021/05/1-09oxH02CQXrAUp7qmar6Lg.png 600w, https://bitsy.ai/content/images/2021/05/1-09oxH02CQXrAUp7qmar6Lg.png 800w" sizes="(min-width: 720px) 720px"></figure><hr><h2 id="part-8-test-pan-tilt-hat">Part 8&#x200A;&#x2014;&#x200A;Test Pan Tilt&#xA0;HAT</h2><p>Next, test the installation and setup of your Pan-Tilt HAT module.</p><ol><li>SSH into your Raspberry Pi</li><li>Activate your Virtual Environment: source&#xA0;.venv/bin/activate</li><li>Run the following command: rpi-deep-pantilt test pantilt</li><li>Exit the test with Ctrl+C</li></ol><p>If you installed the HAT correctly, you should see both servos moving in a smooth sinusoidal motion while the test is running.</p><figure class="kg-card kg-image-card"><img src="https://bitsy.ai/content/images/2021/05/1-SOV1U1PAojGui2RohhhqPA.gif" class="kg-image" alt="Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT" loading="lazy" width="600" height="450" srcset="https://bitsy.ai/content/images/2021/05/1-SOV1U1PAojGui2RohhhqPA.gif 600w"></figure><hr><h2 id="part-9-test-pi-camera">Part 9&#x200A;&#x2014;&#x200A;Test Pi&#xA0;Camera</h2><p>Next, verify the Pi Camera is installed correctly by starting the camera&#x2019;s preview overlay. 
The overlay will render on the Pi&#x2019;s primary display (HDMI).</p><ol><li>Plug your Raspberry Pi into an HDMI screen</li><li>SSH into your Raspberry Pi</li><li>Activate your Virtual Environment: $ source&#xA0;.venv/bin/activate</li><li>Run the following command: $ rpi-deep-pantilt test camera</li><li>Exit the test with Ctrl+C</li></ol><p>If you installed the Pi Camera correctly, you should see footage from the camera rendered to your HDMI or composite display.</p><hr><h2 id="part-10-test-object-detection">Part 10&#x2014; Test object detection</h2><p>Next, verify you can run an object detection model (<strong>MobileNetV3-SSD</strong>) on your Raspberry Pi.</p><ol><li>SSH into your Raspberry Pi</li><li>Activate your Virtual Environment: $ source&#xA0;.venv/bin/activate</li><li>Run the following command:</li></ol><pre><code>$ rpi-deep-pantilt detect</code></pre><p>Your Raspberry Pi should detect objects, attempt to classify the object, and draw a bounding box around it.</p><p>To detect faces instead, run:</p><pre><code>$ rpi-deep-pantilt face-detect</code></pre><h3 id="note-only-the-following-objects-can-be-detected-and-tracked-using-the-default-mobilenetv3-ssd-model-"><strong>Note: </strong>Only the following objects can be detected and tracked using the default <strong>MobileNetV3-SSD model.</strong></h3><pre><code>$ rpi-deep-pantilt list-labels
[&#x2018;person&#x2019;, &#x2018;bicycle&#x2019;, &#x2018;car&#x2019;, &#x2018;motorcycle&#x2019;, &#x2018;airplane&#x2019;, &#x2018;bus&#x2019;, &#x2018;train&#x2019;, &#x2018;truck&#x2019;, &#x2018;boat&#x2019;, &#x2018;traffic light&#x2019;, &#x2018;fire hydrant&#x2019;, &#x2018;stop sign&#x2019;, &#x2018;parking meter&#x2019;, &#x2018;bench&#x2019;, &#x2018;bird&#x2019;, &#x2018;cat&#x2019;, &#x2018;dog&#x2019;, &#x2018;horse&#x2019;, &#x2018;sheep&#x2019;, &#x2018;cow&#x2019;, &#x2018;elephant&#x2019;, &#x2018;bear&#x2019;, &#x2018;zebra&#x2019;, &#x2018;giraffe&#x2019;, &#x2018;backpack&#x2019;, &#x2018;umbrella&#x2019;, &#x2018;handbag&#x2019;, &#x2018;tie&#x2019;, &#x2018;suitcase&#x2019;, &#x2018;frisbee&#x2019;, &#x2018;skis&#x2019;, &#x2018;snowboard&#x2019;, &#x2018;sports ball&#x2019;, &#x2018;kite&#x2019;, &#x2018;baseball bat&#x2019;, &#x2018;baseball glove&#x2019;, &#x2018;skateboard&#x2019;, &#x2018;surfboard&#x2019;, &#x2018;tennis racket&#x2019;, &#x2018;bottle&#x2019;, &#x2018;wine glass&#x2019;, &#x2018;cup&#x2019;, &#x2018;fork&#x2019;, &#x2018;knife&#x2019;, &#x2018;spoon&#x2019;, &#x2018;bowl&#x2019;, &#x2018;banana&#x2019;, &#x2018;apple&#x2019;, &#x2018;sandwich&#x2019;, &#x2018;orange&#x2019;, &#x2018;broccoli&#x2019;, &#x2018;carrot&#x2019;, &#x2018;hot dog&#x2019;, &#x2018;pizza&#x2019;, &#x2018;donut&#x2019;, &#x2018;cake&#x2019;, &#x2018;chair&#x2019;, &#x2018;couch&#x2019;, &#x2018;potted plant&#x2019;, &#x2018;bed&#x2019;, &#x2018;dining table&#x2019;, &#x2018;toilet&#x2019;, &#x2018;tv&#x2019;, &#x2018;laptop&#x2019;, &#x2018;mouse&#x2019;, &#x2018;remote&#x2019;, &#x2018;keyboard&#x2019;, &#x2018;cell phone&#x2019;, &#x2018;microwave&#x2019;, &#x2018;oven&#x2019;, &#x2018;toaster&#x2019;, &#x2018;sink&#x2019;, &#x2018;refrigerator&#x2019;, &#x2018;book&#x2019;, &#x2018;clock&#x2019;, &#x2018;vase&#x2019;, &#x2018;scissors&#x2019;, &#x2018;teddy bear&#x2019;, &#x2018;hair drier&#x2019;, 
&#x2018;toothbrush&#x2019;]</code></pre><hr><h2 id="part-11-track-objects-at-8-fps">Part 11&#x2014; Track Objects at ~8&#xA0;FPS</h2><p>This is the moment we&#x2019;ve all been waiting for! Take the following steps to track an object at roughly 8 frames / second using the Pan-Tilt HAT.</p><ol><li>SSH into your Raspberry Pi</li><li>Activate your Virtual Environment: $ source&#xA0;.venv/bin/activate</li><li>Run the following command: $ rpi-deep-pantilt track</li></ol><p>By default, this will track objects with the label person. You can track a different type of object using the --label parameter.</p><p>For example, to track a banana you would run:</p><pre><code>$ rpi-deep-pantilt track --label=banana</code></pre><p>On a <strong>Raspberry Pi 4 (4 GB)</strong>, I benchmarked my model at roughly <strong>8 frames per second.</strong></p><pre><code>INFO:root:FPS: 8.100870481091935
INFO:root:FPS: 8.130448201926173
INFO:root:FPS: 7.6518234817241355
INFO:root:FPS: 7.657477766009717
INFO:root:FPS: 7.861758172395542
INFO:root:FPS: 7.8549541944597
INFO:root:FPS: 7.907857699044301</code></pre><hr><h2 id="part-12-track-objects-in-real-time-with-edge-tpu">Part 12&#x2014; Track Objects in Real-time with Edge&#xA0;TPU</h2><p>We can accelerate <strong>model inference speed</strong> with <a href="https://coral.ai/products/accelerator/">Coral&#x2019;s USB Accelerator.</a> The USB Accelerator contains an Edge TPU, which is an <a href="https://en.wikipedia.org/wiki/Application-specific_integrated_circuit">ASIC</a> chip specialized for TensorFlow Lite operations. For more info, check out <a href="https://coral.ai/docs/accelerator/get-started/">Getting Started with the USB Accelerator.</a></p><ol><li>SSH into your Raspberry Pi</li><li>Install the Edge TPU runtime</li></ol><pre><code>$ echo &quot;deb https://packages.cloud.google.com/apt coral-edgetpu-stable main&quot; | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list

$ curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

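# libedgetpu1-std runs the Edge TPU at a reduced clock rate; libedgetpu1-max runs at the maximum clock rate, but the device gets noticeably hotter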
$ sudo apt-get update &amp;&amp; sudo apt-get install libedgetpu1-std</code></pre><p>3. Plug in the Edge TPU (prefer a <strong>USB 3.0 port</strong>).<strong> </strong>If your Edge TPU was already plugged in, <strong>remove and re-plug it</strong> so the udev device manager can detect it.</p><p>4. Try the <strong>detect command</strong> with the --edge-tpu option. You should be able to detect objects in real-time! &#x1F389;</p><pre><code>$ rpi-deep-pantilt detect --edge-tpu --loglevel=INFO</code></pre><p><strong>Note: loglevel=INFO </strong>will show you the FPS at which objects are detected and bounding boxes are rendered to the Raspberry Pi Camera&#x2019;s overlay.</p><p>You should see around 24 FPS, which is the rate at which frames are sampled from the Pi Camera into a frame buffer.</p><pre><code>INFO:root:FPS: 24.716493958392558
INFO:root:FPS: 24.836166606505206
INFO:root:FPS: 23.031063233367547
INFO:root:FPS: 25.467177106703623
INFO:root:FPS: 27.480438524486594
INFO:root:FPS: 25.41399952505432</code></pre><p>5. Try the track command with --edge-tpu option.</p><pre><code>$ rpi-deep-pantilt track --edge-tpu</code></pre><hr><h2 id="part-13-detect-track-faces-new-in-v1-1-x-">Part 13&#x200A;&#x2014;&#x200A;Detect &amp; Track Faces (NEW in&#xA0;v1.1.x)</h2><p>I&#x2019;ve added a <strong>brand new</strong> face detection model in version <strong>v1.1.x </strong>of rpi-deep-pantilt &#x1F389;</p><p>The model is derived from <strong>facessd_mobilenet_v2_quantized_320x320_open_image_v4 </strong>in TensorFlow&#x2019;s <a href="https://github.com/tensorflow/models">research model zoo</a>.</p><p>The new commands are rpi-deep-pantilt face-detect (detect all faces) and rpi-deep-pantilt face-track (track faces with Pantilt HAT). Both commands support the --edge-tpu option, which will accelerate inferences if using the Edge TPU USB Accelerator.</p><pre><code>rpi-deep-pantilt face-detect --help
Usage: cli.py face-detect [OPTIONS]

Options:
  --loglevel TEXT  Run object detection without pan-tilt controls. Pass
                   --loglevel=DEBUG to inspect FPS.
  --edge-tpu       Accelerate inferences using Coral USB Edge TPU
  --help           Show this message and exit.</code></pre><hr><pre><code>rpi-deep-pantilt face-track --help
Usage: cli.py face-track [OPTIONS]

Options:
  --loglevel TEXT  Run object detection without pan-tilt controls. Pass
                   --loglevel=DEBUG to inspect FPS.
  --edge-tpu       Accelerate inferences using Coral USB Edge TPU
--help           Show this message and exit.</code></pre><hr><h2 id="wrapping-up-">Wrapping Up&#xA0;&#x1F33B;</h2><p>Congratulations! You&#x2019;re now the proud owner of a DIY object tracking system, which uses a <strong>single-shot-detector </strong>(a type of <strong>convolutional neural network</strong>) to classify and localize objects.</p><h3 id="pid-controller">PID Controller</h3><p>The pan / tilt tracking system uses a <a href="https://en.wikipedia.org/wiki/PID_controller"><strong>proportional&#x2013;integral&#x2013;derivative (PID) controller</strong></a><strong> </strong>to smoothly track the centroid of a bounding box.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/1-9sQpP7SqMHrhAwySJBJffQ.png" class="kg-image" alt="Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT" loading="lazy" width="800" height="599" srcset="https://bitsy.ai/content/images/size/w600/2021/05/1-9sQpP7SqMHrhAwySJBJffQ.png 600w, https://bitsy.ai/content/images/2021/05/1-9sQpP7SqMHrhAwySJBJffQ.png 800w" sizes="(min-width: 720px) 720px"><figcaption>PID Controller Architecture, Leigh Johnson&#xA0;2019</figcaption></figure><h3 id="tensorflow-model-zoo">TensorFlow Model&#xA0;Zoo</h3><p>The models in this tutorial are derived from <a href="http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v3_small_coco_2019_08_14.tar.gz"><strong>ssd_mobilenet_v3_small_coco</strong></a> and <a href="https://storage.cloud.google.com/mobilenet_edgetpu/checkpoints/ssdlite_mobilenet_edgetpu_coco_quant.tar.gz"><strong>ssd_mobilenet_edgetpu_coco</strong></a><strong> </strong>in the <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md"><strong>TensorFlow Detection Model Zoo</strong></a><strong>. 
&#x1F981;&#x1F984;&#x1F43C;</strong></p><p>My models are available for download via <a href="https://github.com/leigh-johnson/rpi-deep-pantilt/releases/tag/v1.0.1">GitHub release notes</a> @ <a href="https://github.com/leigh-johnson/rpi-deep-pantilt">leigh-johnson/rpi-deep-pantilt</a>.</p><p>I added the custom <strong>TFLite_Detection_PostProcess</strong> operation, which implements a variation of <strong>Non-maximum Suppression (NMS) </strong>on model output. <strong>Non-maximum Suppression </strong>is a technique that filters many bounding box proposals using <a href="https://www.probabilitycourse.com/chapter1/1_2_2_set_operations.php"><strong>set operations</strong>.</a></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/0-weFV951XGeWqehMc.png" class="kg-image" alt="Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT" loading="lazy" width="800" height="368" srcset="https://bitsy.ai/content/images/size/w600/2021/05/0-weFV951XGeWqehMc.png 600w, https://bitsy.ai/content/images/2021/05/0-weFV951XGeWqehMc.png 800w" sizes="(min-width: 720px) 720px"><figcaption>Image Credit: <a rel="noopener" href="https://towardsdatascience.com/non-maximum-suppression-nms-93ce178e177c">Non-maximum Suppression (NMS)</a></figcaption></figure><h2 id="special-thanks-acknowledgements-">Special Thanks &amp; Acknowledgements &#x1F917;</h2><p><strong>MobileNetEdgeTPU SSDLite</strong> contributors: Yunyang Xiong, Bo Chen, Suyog Gupta, Hanxiao Liu, Gabriel Bender, Mingxing Tan, Berkin Akin, Zhichao Lu, Quoc Le.</p><p><strong>MobileNetV3 SSDLite </strong>contributors: Bo Chen, Zhichao Lu, Vivek Rathod, Jonathan Huang.</p><p>Special thanks to <strong>Adrian Rosebrock </strong>for writing <a href="https://www.pyimagesearch.com/2019/04/01/pan-tilt-face-tracking-with-a-raspberry-pi-and-opencv/"><strong>Pan/tilt face tracking with a Raspberry Pi and OpenCV</strong></a><strong>, </strong>which was the inspiration for this 
whole project!</p><p>Special thanks to <strong>Jason Zaman </strong>for reviewing this article and early release candidates. &#x1F4AA;
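</p><p>To make the <strong>PID Controller</strong> section above concrete, here is a minimal Python sketch of the update step. This is an illustration of the general technique, not the exact rpi-deep-pantilt implementation; the gains (kP, kI, kD) are placeholder values you would tune for your own servos.</p><pre><code>class PIDController:
    # Classic PID update: output = kP * error + kI * integral(error) + kD * d(error)/dt
    def __init__(self, kP=0.25, kI=0.1, kD=0.0):
        self.kP, self.kI, self.kD = kP, kI, kD
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error, dt=0.1):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kP * error + self.kI * self.integral + self.kD * derivative

# Each frame, the error is the offset between the frame center and the
# bounding-box centroid; the output nudges the pan (or tilt) servo angle.
pan = PIDController()
servo_angle = 0.0
frame_center_x, centroid_x = 160, 208  # example values
servo_angle += pan.update(frame_center_x - centroid_x)</code></pre><p>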

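</p><p>And here is a greedy <strong>Non-maximum Suppression</strong> sketch in plain Python, to make the filtering step above concrete. This is a simplified illustration, not the TFLite_Detection_PostProcess implementation; the (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions.</p><pre><code>def iou(a, b):
    # intersection-over-union of two (x1, y1, x2, y2) boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    # keep the highest-scoring box, then drop proposals that overlap it too much
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep</code></pre><p>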
</p>]]></content:encoded></item><item><title><![CDATA[Portable Computer Vision: Tensorflow 2 on a Raspberry Pi]]></title><description><![CDATA[Are you just getting started with computer vision, TensorFlow, or Raspberry Pi? Perfect, this post is for you!
]]></description><link>https://bitsy.ai/portable-computer-vision--tensorflow-2-0-on-a-raspberry-pi/</link><guid isPermaLink="false">6313cba487c91a0001179a0c</guid><category><![CDATA[Computer Vision]]></category><category><![CDATA[Prototypes]]></category><category><![CDATA[Python]]></category><category><![CDATA[Raspberry Pi]]></category><category><![CDATA[Getting Started]]></category><category><![CDATA[TensorFlow]]></category><category><![CDATA[TensorFlow Lite]]></category><dc:creator><![CDATA[Leigh Johnson]]></dc:creator><pubDate>Mon, 24 Jun 2019 00:00:00 GMT</pubDate><media:content url="https://bitsy.ai/content/images/2021/05/1-p69rZCP1ydnI55lXQ7LyNg.jpeg" medium="image"/><content:encoded><![CDATA[<h3 id="tiny-low-cost-object-detection-and-classification-">Tiny, low-cost object detection and classification.</h3><h2 id="part-1-introduction">Part 1&#x200A;&#x2014;&#x200A;Introduction</h2><img src="https://bitsy.ai/content/images/2021/05/1-p69rZCP1ydnI55lXQ7LyNg.jpeg" alt="Portable Computer Vision: Tensorflow 2 on a Raspberry Pi"><p>For roughly $100 USD, you can add deep learning to an embedded system or your next internet-of-things project.</p><p>Are you just getting started with machine/deep learning, TensorFlow, or Raspberry Pi? 
Perfect, this post is for you!</p><p>By the end of this post, you&apos;ll know:</p><ol><li>How to setup Raspberry Pi Camera and install software dependencies.</li><li>Basics of Convolutional Neural Networks for image classification </li><li>How to deploy a pre-trained model (<strong>MobileNetV2</strong>) <strong>to </strong>Raspberry Pi</li><li>Convert a model to <strong>TensorFlow Lite, </strong>a model format optimized for embedded and mobile devices.</li><li>Accelerate inferences of any <strong>TensorFlow Lite</strong> model with Coral&#x2019;s <strong>USB Edge TPU Accelerator</strong> and <strong>Edge TPU Compiler.</strong></li></ol><hr><h2 id="terms-references-">Terms &amp; References &#x1F4DA;</h2><p><a href="https://www.raspberrypi.org/"><strong>Raspberry Pi</strong></a>&#x200A;&#x2014;&#x200A;a small, affordable computer popular with educators, hardware hobbyists, and roboticists. &#x1F916;</p><p><a href="https://www.tensorflow.org/"><strong>TensorFlow</strong>&#x200A;</a>&#x2014;&#x200A;an open-source platform for machine learning.</p><p><a href="https://www.tensorflow.org/lite"><strong>TensorFlow Lite&#x200A;</strong></a>&#x2014;&#x200A;a lightweight library for deploying <strong>TensorFlow </strong>models on mobile and embedded devices.</p><p><strong>Convolutional Neural Network&#x200A;</strong>&#x2014;&#x200A;a type of deep-learning model well-suited for image classification and object detection applications.</p><p><a href="https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html"><strong>MobileNetV2&#x200A;</strong></a><strong>&#x2014;&#x200A;</strong>a state-of-the-art image recognition model optimized for performance on modest mobile phone processors.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/1-f57O6E5hQ61JmSJIemGZzg.png" class="kg-image" alt="Portable Computer Vision: Tensorflow 2 on a Raspberry Pi" loading="lazy" width="800" height="627" 
srcset="https://bitsy.ai/content/images/size/w600/2021/05/1-f57O6E5hQ61JmSJIemGZzg.png 600w, https://bitsy.ai/content/images/2021/05/1-f57O6E5hQ61JmSJIemGZzg.png 800w" sizes="(min-width: 720px) 720px"><figcaption>MobileNetV2: The Next Generation of On-Device Computer Vision&#xA0;Networks</figcaption></figure><p><a href="https://cloud.google.com/edge-tpu/"><strong>Edge TPU</strong></a>&#x200A;&#x2014;&#x200A;a tensor processing unit (TPU) is an integrated circuit for accelerating computations performed by <strong>TensorFlow.</strong> The <strong>Edge TPU </strong>was developed with a small footprint, for mobile and embedded devices &#x201C;at the edge&#x201D;</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/1-in1ZwQElextZ9mR4HRMoUA.jpeg" width="600" height="395" loading="lazy" alt="Portable Computer Vision: Tensorflow 2 on a Raspberry Pi" srcset="https://bitsy.ai/content/images/2021/05/1-in1ZwQElextZ9mR4HRMoUA.jpeg 600w"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/1-vyGk6VQn1tppT4u-AiG4wQ-1.jpeg" width="400" height="325" loading="lazy" alt="Portable Computer Vision: Tensorflow 2 on a Raspberry Pi"></div><div class="kg-gallery-image"><img src="https://bitsy.ai/content/images/2021/05/1-T0wROrsc7ik5DA3yTYioLA-1.png" width="600" height="382" loading="lazy" alt="Portable Computer Vision: Tensorflow 2 on a Raspberry Pi" srcset="https://bitsy.ai/content/images/2021/05/1-T0wROrsc7ik5DA3yTYioLA-1.png 600w"></div></div></div><figcaption>TPUv1 (left), TPUv2 (middle), Edge TPU (right)</figcaption></figure><hr><h2 id="part-2-build-list-">Part 2&#x200A;&#x2014;&#x200A;Build List&#xA0;&#x2705;</h2><h3 id="starter-kit">Starter Kit</h3><p>If you&#x2019;re just getting started with Raspberry Pi, I recommend the <a 
href="https://www.arrow.com/en/products/3275/adafruit-industries">Pi Camera Pack</a> ($90) by Arrow. It includes everything you need to begin immediately:</p><ul><li>5V 2.4A MicroUSB Power Supply</li><li>320x240 2.8&quot; TFT Model PiTFT Resistive Touch-screen</li><li>Raspberry Pi 3 Model B</li><li>Raspberry Pi Camera v2</li><li>Plastic Case</li><li>8GB MicroSD Card with NOOBS installation manager pre-loaded</li></ul><h3 id="coral-usb-edge-tpu-accelerator-optional-">Coral USB Edge TPU Accelerator (Optional)</h3><p>You can compile <strong>TensorFlow Lite</strong> models to run on Coral&#x2019;s USB Accelerator (<a href="https://coral.withgoogle.com/products/accelerator/">Link</a>), for quicker model predictions.</p><p><strong>Real-time applications</strong> benefit significantly from this speed-up. An example would be the decision-making module of an autonomous self-driving robot.</p><p>Some applications can tolerate a slower prediction speed and might not require TPU acceleration. For example, you would not need TPU acceleration to build a smart doggie door that unlocks for your pooch (but keeps raccoons out).</p><p>If you&#x2019;re just getting started, skip buying this component.</p><p>Are you not sure if you need the USB Accelerator? The MobileNet benchmarks below might help you decide. 
The measurements below depict inference speed (in ms)&#x200A;&#x2014;&#x200A;lower speeds are better!</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/1-Aqy5-TnnCCn-uL2UwQ7ksw.png" class="kg-image" alt="Portable Computer Vision: Tensorflow 2 on a Raspberry Pi" loading="lazy" width="582" height="392"><figcaption>Source: <a href="https://www.hackster.io/news/benchmarking-tensorflow-and-tensorflow-lite-on-the-raspberry-pi-43f51b796796">Benchmarking TensorFlow and TensorFlow Lite on the Raspberry Pi</a> by Alasdair Allan</figcaption></figure><h3 id="custom-build">Custom Build</h3><p>If you already have a Raspberry Pi or some components laying around, the starter kit might include items you don&#x2019;t need.</p><p>Here are the parts I used for my own builds (approximately $250 / unit).</p><ul><li>Raspberry Pi Model 3 B+ ($35)</li><li>Raspberry Pi Camera v2 ($30)</li><li>Coral USB Edge TPU Accelerator&#x200A;&#x2014;&#x200A;accelerates model inferencing ($75, <a href="https://coral.withgoogle.com/products/accelerator">link</a>)</li><li>Pi Foundation Display&#x200A;&#x2014;&#x200A;7&quot; Touchscreen Display ($80, <a href="https://www.adafruit.com/product/2718">link</a>)</li><li>SmartiPi Touch Stand ($25, <a href="http://www.adafruit.com/product/3187">link</a>)</li><li>Adjustable Pi Camera Mount ($5, <a href="https://www.adafruit.com/product/1434">link</a>)</li><li>Flex cable for RPi Camera 24&apos;&#x2019; ($3, <a href="https://www.adafruit.com/product/1731">link</a>)</li></ul><p>I would love to hear about your own build list! 
&#x2764;&#xFE0F; Tweet me <a href="https://twitter.com/grepLeigh">@grepLeigh</a> or comment below.</p><hr><h2 id="part-3-raspberry-pi-setup-">Part 3&#x2014; Raspberry Pi Setup&#xA0;&#x1F370;</h2><p>If you purchased an SD card pre-loaded with NOOBS, I recommend walking through this overview first: <a href="https://projects.raspberrypi.org/en/projects/raspberry-pi-setting-up/2">Setting up your Raspberry Pi</a></p><p><strong>Before proceeding</strong>, you&#x2019;ll want to:</p><ul><li>Connect your Pi to the internet (<a href="https://projects.raspberrypi.org/en/projects/raspberry-pi-using/4">doc</a>)</li><li>SSH into your Raspberry Pi (<a href="https://www.raspberrypi.org/documentation/remote-access/ssh/">doc</a>)</li></ul><hr><h2 id="part-4-primary-computer-download-install-dependencies">Part 4&#x2014; Primary Computer: Download &amp; Install Dependencies</h2><p>rpi-vision is a set of tools that makes it easier for you to:</p><ul><li>Install a lot of dependencies on your Raspberry Pi (TensorFlow Lite, TFT touch screen drivers, tools for copying PiCamera frame buffer to a TFT touch screen).</li><li>Deploy models to a Raspberry Pi.</li><li>Train new models on your computer or Google Cloud&#x2019;s AI Platform.</li><li>Compile 8-bit quantized models for an Edge TPU.</li></ul><ol><li>Clone the rpi-vision repo on your <strong>primary computer</strong> (not your Raspberry Pi)</li></ol><pre><code>$ git clone git@github.com:leigh-johnson/rpi-vision.git &amp;&amp; cd rpi-vision</code></pre><p>2. On your <strong>primary computer</strong>, create a new virtual environment, then install the rpi-vision package.</p><pre><code>$ pip install virtualenv; virtualenv -p $(which python3) .venv &amp;&amp; source .venv/bin/activate &amp;&amp; pip install -e .</code></pre><p>3. 
Verify you can <strong>SSH into your Raspberry Pi</strong> before proceeding.</p><p>If you&#x2019;re using the default Raspbian image, your Pi&#x2019;s hostname will be raspberrypi.local</p><pre><code>$ ssh pi@raspberrypi.local</code></pre><hr><h2 id="part-5-primary-computer-create-configuration-files">Part 5&#x2014; Primary Computer: create configuration files</h2><p>rpi-vision uses <strong>Ansible</strong> to manage deployments and tasks on your Raspberry Pi. <strong>Ansible </strong>is a framework for automating the configuration of computers.</p><p>Create 2 configuration files required by Ansible:</p><h3 id="-env-my-inventory-ini"><strong>.env/my-inventory.ini</strong></h3><p>If you&#x2019;re using a custom hostname for your Pi, replace raspberrypi.local.</p><pre><code>tee -a .env/my-inventory.ini &lt;&lt;EOF
[rpi_vision]
raspberrypi.local

[rpi_vision:vars]
ansible_connection=ssh
ansible_user=pi
ansible_python=/usr/bin/python3
EOF</code></pre><h3 id="-env-my-vars-json">.env/my-vars.json</h3><p>If you&#x2019;re using a custom hostname for your Pi, replace raspberrypi.local.</p><pre><code>tee -a .env/my-vars.json &lt;&lt;EOF
{ 
  &quot;RPI_HOSTNAME&quot;: &quot;raspberrypi.local&quot;,
  &quot;VERSION&quot;: &quot;release-v1.0.0&quot;
}
EOF</code></pre><hr><h2 id="part-6-raspberry-pi-install-dependencies">Part 6&#x2014; Raspberry Pi: Install Dependencies</h2><pre><code>$ make rpi-install</code></pre><p>You&#x2019;ll see the output of an <a href="https://docs.ansible.com/"><strong>Ansible playbook</strong></a>, which automates the installation tasks on your Pi.</p><p>A quick summary of what&#x2019;s being installed on your Pi:</p><ul><li>rpi-vision repo</li><li>rpi-fbcp (a tool for copying the framebuffer from the PiCamera to a TFT touch screen display)</li><li>TFT touch screen drivers and X11 configuration</li></ul><p>You can inspect the tasks run on your Raspberry Pi by opening playbooks/bootstrap-rpi.yml.</p><p>While the installation is running, read through the next section to learn <em><strong>how</strong></em> CNNs work and <em><strong>why</strong></em> they are useful for <strong>computer vision</strong> tasks.</p><hr><h2 id="part-7-introduction-to-cnns-convolutional-neural-networks-">Part 7&#x2014; Introduction to CNNs (convolutional neural networks)</h2><p>CNNs are the key technology powering self-driving cars and image search engines. The technology is common in computer vision, but can also be applied to any problem with a <strong>hierarchical pattern in the data</strong>, where a <strong>complex pattern</strong> can be <strong>assembled from simpler patterns</strong>.</p><h3 id="modeling-the-visual-cortex">Modeling the Visual&#xA0;Cortex</h3><p>In the late 1950s and 1960s, David H. 
Hubel and Torsten Wiesel performed experiments on cats and monkeys to better understand the visual cortex.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/1-XTuIUDFclEBtSfgL25FHmg.png" class="kg-image" alt="Portable Computer Vision: Tensorflow 2 on a Raspberry Pi" loading="lazy" width="600" height="351" srcset="https://bitsy.ai/content/images/2021/05/1-XTuIUDFclEBtSfgL25FHmg.png 600w"><figcaption>SINGLE UNIT ACTIVITY IN STRIATE CORTEX OF UNRESTRAINED CATS</figcaption></figure><p>They demonstrated that neurons in the striate cortex respond to stimuli in a <strong>limited visual field, </strong>which they called a <strong>receptive field.</strong></p><p>They noted concentric overlapping responses, where complex patterns were combinations of lower-level patterns.</p><p>Their findings also revealed <strong>specialization</strong>, where some neurons would <strong>only respond</strong> to a <strong>specific shape</strong> or pattern.</p><p>In the 1980s, inspired by Hubel and Wiesel, Kunihiko Fukushima published the <strong>neocognitron</strong>, a neural network capable of learning patterns with geometrical similarity.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/1-3Ou8n845wH09j7Qi8XVkqw.png" class="kg-image" alt="Portable Computer Vision: Tensorflow 2 on a Raspberry Pi" loading="lazy" width="423" height="600"><figcaption>Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in&#xA0;Position</figcaption></figure><p>The <strong>neocognitron</strong> has two key properties:</p><ul><li><em><strong>Learned patterns are hierarchical.</strong></em> Increasingly complex patterns are composed from simpler patterns.</li><li><em><strong>Learned patterns are translation-invariant (position-invariant)</strong></em><strong>. 
</strong>After the network learns a pattern, it can recognize the pattern at different locations. After <strong>learning how to classify a dog</strong>, the network can accurately classify a dog that appears in a new part of the image <strong>without learning an entirely new pattern.</strong></li></ul><p>The <strong>neocognitron</strong> model is the inspiration for modern <strong>convolutional neural networks.</strong></p><hr><h3 id="visualizing-a-convolution-operation-2d">Visualizing a Convolution Operation: 2D</h3><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/1-RovKo3n98vGb9au_EdlAiw.jpeg" class="kg-image" alt="Portable Computer Vision: Tensorflow 2 on a Raspberry Pi" loading="lazy" width="600" height="372" srcset="https://bitsy.ai/content/images/2021/05/1-RovKo3n98vGb9au_EdlAiw.jpeg 600w"><figcaption>(Left) 2D 4x4 Input matrix. (Middle) 2D 2x2 kernel. (Right) 2D 2x2 output feature&#xA0;map.</figcaption></figure><p>The <strong>input layer</strong> is fed into <strong>convolutional layers, </strong>which transform <strong>regions</strong> of the input using a <strong>filter.</strong></p><p>The <strong>filter</strong> is also referred to as a <strong>kernel.</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/1-IBqBbX2F698Nd91mBx2ACQ.gif" class="kg-image" alt="Portable Computer Vision: Tensorflow 2 on a Raspberry Pi" loading="lazy" width="600" height="524" srcset="https://bitsy.ai/content/images/2021/05/1-IBqBbX2F698Nd91mBx2ACQ.gif 600w"><figcaption>The filter &#x201C;slides&#x201D; to each possible position, and the result is added to the feature&#xA0;map.</figcaption></figure><p>For each position in the input matrix, the <strong>convolution operation</strong> multiplies the filter element-wise with the region of the input it covers.</p><p>The products are summed and stored in a <strong>feature map.</strong></p><p>The operation is repeated for each position in the input matrix.</p><h3 
id="visualizing-a-convolution-operation-3d">Visualizing a Convolution Operation: 3D</h3><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/1-sDaGvIgnrePW9BRQoTAtLw@2x.gif" class="kg-image" alt="Portable Computer Vision: Tensorflow 2 on a Raspberry Pi" loading="lazy" width="600" height="351" srcset="https://bitsy.ai/content/images/2021/05/1-sDaGvIgnrePW9BRQoTAtLw@2x.gif 600w"><figcaption>Applied Deep Learning&#x200A;&#x2014;&#x200A;Part 4: Convolutional Neural&#xA0;Networks</figcaption></figure><p>The <strong>input layer </strong>of a CNN is usually a 3D data structure with <em><strong>height</strong></em>, <em><strong>width</strong></em>, and <em><strong>channel</strong></em> (RGB or greyscale values).</p><p>The deeper we go in the feature map stack, the <strong>sparser</strong> each map layer becomes: each filter activates for fewer, more specific features.</p><p>The <strong>first few layers</strong> of the feature map stack <strong>detect simple edges and shapes</strong>, and look similar to the input image. As we go <strong>deeper</strong> into a feature map stack, <strong>features become more abstract</strong> to the human eye. 
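</p><p>The sliding-filter operation described above can be sketched in a few lines of NumPy. This is a minimal stride-1 &#x201C;valid&#x201D; convolution for illustration only (the input and kernel values are made up); frameworks like TensorFlow implement the same idea with heavily optimized kernels:</p>

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over every valid position of `image`,
    multiply element-wise, and sum each region into a feature map."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    feature_map = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(feature_map.shape[0]):
        for x in range(feature_map.shape[1]):
            region = image[y:y + kh, x:x + kw]
            feature_map[y, x] = np.sum(region * kernel)
    return feature_map

# A tiny vertical-edge filter: it responds strongly wherever
# pixel intensity changes between neighboring columns.
image = np.array([[0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9]], dtype=float)
kernel = np.array([[1, -1],
                   [1, -1]], dtype=float)
print(conv2d(image, kernel))  # non-zero only where the 0->9 edge sits
```

<p>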
Deeper feature layers <strong>encode classification data, </strong>like &#x201C;cat face&#x201D; or &#x201C;cat ear&#x201D;.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bitsy.ai/content/images/2021/05/1-c-QNnJz-l-6dgGHeukVEOg.png" class="kg-image" alt="Portable Computer Vision: Tensorflow 2 on a Raspberry Pi" loading="lazy" width="800" height="152" srcset="https://bitsy.ai/content/images/size/w600/2021/05/1-c-QNnJz-l-6dgGHeukVEOg.png 600w, https://bitsy.ai/content/images/2021/05/1-c-QNnJz-l-6dgGHeukVEOg.png 800w" sizes="(min-width: 720px) 720px"><figcaption>Applied Deep Learning&#x200A;&#x2014;&#x200A;Part 4: Convolutional Neural&#xA0;Networks</figcaption></figure><hr><h3 id="do-you-want-to-learn-more-about-cnns">Do you want to learn more about&#xA0;CNNs?</h3><p>Your dependency installation is probably done by now. To forge ahead, skip to <strong>Part 8&#x2014; Deploy Pre-trained Model (MobileNetV2).</strong></p><p>If you plan on training a custom classifier or want to read more about convolutional neural networks, start here:</p><ul><li><a href="https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2">Applied Deep Learning&#x200A;&#x2014;&#x200A;Part 4: Convolutional Neural Networks</a></li><li><a href="https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291">Hands-On Machine Learning with Scikit-Learn and TensorFlow</a>, <em>Chapter 13, Convolutional Neural Networks</em>, by Aur&#xE9;lien G&#xE9;ron</li><li><a href="https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438/">Deep Learning with Python</a>, <em>Chapter 5, Deep Learning for Computer Vision,</em> by Fran&#xE7;ois Chollet</li></ul><hr><h2 id="part-8-deploy-pre-trained-model-mobilenetv2-">Part 8&#x200A;&#x2014;&#x200A;Deploy Pre-trained Model (MobileNetV2)</h2><h3 id="live-demo-using-tensorflow-2-0-">Live Demo (using TensorFlow 2.0)</h3><ol><li>SSH into your Raspberry Pi</li></ol><pre><code>$ 
ssh pi@raspberrypi.local</code></pre><p>2. Start a new tmux session</p><pre><code>pi@raspberrypi:~ $ tmux new-session -s mobilenetv2</code></pre><p>3. Split the tmux session vertically by pressing control+b, then &#x201C; (the double-quote key)</p><p>4. Start an fbcp process, which will copy the framebuffer from the PiCamera to the TFT display via the SPI interface. Leave this process running.</p><pre><code>pi@raspberrypi:~ $ fbcp</code></pre><p>5. Switch tmux panes by pressing control+b, then o.</p><p>6. Activate the virtual environment installed earlier, in Part 6.</p><pre><code>pi@raspberrypi:~ $ cd ~/rpi-vision &amp;&amp; . .venv/bin/activate</code></pre><p>7. Start a mobilenetv2 agent process. The agent will take roughly 60 seconds to initialize.</p><pre><code>pi@raspberrypi:~/rpi-vision $ python rpi_vision/agent/mobilenet_v2.py</code></pre><p>You&#x2019;ll see a summary of the model&#x2019;s base, and then the agent will print inferences until stopped. <a href="https://gist.github.com/leigh-johnson/14541749afbd8e4471b85699ddd0c9f5">Click for a gist</a> of what you should see.</p><p>This demo uses weights for <strong>ImageNet</strong> classifiers, which you can look up at <a href="http://image-net.org/explore">image-net.org</a>.</p><h2 id="wrapping-up">Wrapping Up</h2><p>Congratulations, you just deployed an image classification model to your Raspberry Pi! &#x2728;</p><p>Follow me <a href="https://twitter.com/grepLeigh">@grepLeigh</a> to get updates on this blog series. In my <strong>next post</strong>, I will show you how to:</p><ol><li>Convert a model to <strong>TensorFlow Lite</strong>, a model format optimized for embedded and mobile devices.</li><li>Accelerate inferences of any <strong>TensorFlow Lite</strong> model with Coral&#x2019;s <strong>USB Edge TPU Accelerator</strong> and <strong>Edge TPU Compiler</strong>.</li><li>Employ <strong>transfer learning</strong> to re-train MobileNetV2 with a <strong>custom image classifier</strong>.</li></ol>]]></content:encoded></item></channel></rss>