“Talent wins games. Teamwork wins championships.”
— Michael Jordan
So here you are—time for action! Clients don’t hire Prolego to make sleek
documents and pretty slides. Ultimately our success is measured by our
impact. Your AI initiatives will be judged by the same ruthless standards.
In Part 4 I’ll share the major steps to deploy your first AI product in 90 days. But
first, here’s my point of view so you can understand my advice in context.
I’m a startup guy. For most of my career I have been a founder, investor, or
early employee in high-risk product companies. Most of these endeavors
haven’t worked out—that’s the reality of high-stakes ventures. It’s a reality
I willingly accept because a few successes (like Palantir) make up for the rest.
As the Lean Startup methodology shows us, the best way to increase your
long-term chances of success is by habitually confronting your products’
highest-risk assumptions as early as possible.
AI projects fail for a lot of reasons. Here are the most common:
How can you avoid failure? Pick a project worth pursuing. Know your data.
Build a great product team. Get early results. Get the model into production
as soon as you can. Evaluate and decide whether to continue.
How can you guarantee failure? Pick a vaguely defined project. Promise
success. Tell your data scientists to work on it without clearly defining
success criteria. Let them toil away without guidance or feedback. Keep
promising success until . . . the CFO cancels your project.
Let’s explore the steps you need to take to minimize your chance of failure
and get your first AI product rolling. These steps are roughly sequential,
although you can work on many of them simultaneously.
SETTING STRATEGIC ORGANIZATIONAL GOALS
Ultimately you need to make a business impact: increase revenue, reduce
costs, etc. Apart from those ends, you should also set organizational goals
that will move you closer to becoming an AI-driven organization.
For example, communicate to your leadership that you aim to reach the
following strategic milestones:
If your first AI project utterly fails in production—perhaps for reasons
outside of your control—you can point to the strategic value of hitting these
organizational milestones. Your milestones prepare your organization to
benefit from AI solutions and are therefore worthy goals even if they don’t
yield results in the near term. These objectives can help see you through the
often tedious and time-consuming work of getting your project off the ground.
In a perfect world your company would easily see the value in AI and begin
preparing for this fundamental shift. In reality you’ll need to persuasively
communicate the vision of what your company can do with AI before you can
get any resources or support. One of the most effective ways to do that is to
build a prototype.
Start by generating a broad list of potential AI use cases, using the four product
patterns as a guide. Interview business leaders, analysts, or developers and
ask for ideas. Use the resources discussed in Part 2 for inspiration.
I use a simple spreadsheet (Google Sheets or Microsoft Excel) to organize
ideas. For example:
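The exact columns will vary; a hypothetical sheet might look like the sketch below (every entry is illustrative, not a template):

Use case | Product pattern | Business owner | Candidate data | Open questions
Route support tickets to the right team | Classification | Customer support | Help-desk ticket archive | Are historical labels reliable?
Flag invoices likely to be paid late | Prediction | Finance | ERP payment history | Who owns the data?
Extract key terms from contracts | Text/NLP | Legal | Contract repository | Is the text digitized?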
Don’t do a lot of targeted research at this stage—just get all of the general
ideas on the table. I usually try to generate about 20 potential use cases in a
week or two.
After completing the table, I select opportunities according to the following criteria:
You might also have organization-specific criteria to consider. Usually a
handful of the potential opportunities meet the core criteria.
For each of the best opportunities, complete an AI Canvas. Review policies,
data dictionaries, and existing operational workflows so you can identify the
key challenges. Have your data scientists review arXiv for relevant papers.
I usually spend a few weeks on this process, depending on the availability of key
resources to help me answer strategic, data, and policy questions. If I can’t get an
answer in a few weeks, I flag it as an open issue on the canvas, and I move on.
Start your first AI governance board meeting with a bang. Open the
discussion by presenting your canvases. Have each board member
independently rank the canvases and pick your first AI project.
Your governance board may be reluctant to choose because of incomplete
information. No matter how thorough your planning, you will have a lot of
open questions. Accept the ambiguity and press forward. Then identify the
best AI Canvases and turn them over to your AI product team.
Before we discuss your founding AI team, let’s be clear on what you’re trying
to do. Your founding AI team needs to accomplish the following tasks:
You may not understand every step in this process, but you get the idea: you need a product team. Your team doesn’t have to consist of researchers with lofty credentials, but team members do need to be capable of hard, tedious work that requires extensive communication with many parts of the company.
You need a team which will run through walls to get your first release into
production. The two critical roles for this team are (1) AI product managers
and (2) machine learning engineers. As you begin deploying solutions, you
will also need (3) data engineers and perhaps other specialists. Let’s talk
about your AI team’s members.
Take a closer look at the 14 tasks your AI product team needs to complete.
How many involve communication, coordination, and planning? Almost all
of them. The product manager will drive this process. The ideal candidate
knows how AI systems work, knows your data architecture, and is good at
building consensus and making tradeoffs.
I evaluate potential candidates by asking myself, “Could this person design
a Kaggle competition entry based on our AI project?” Visit Kaggle, review
the active competitions, and think about the work required to organize a
competition. The competition’s requirements and assets show what an AI
product manager would need to do to build a similar AI solution:
Look for AI product manager candidates with a math/computer science/
statistics background, good project management skills, and tenacity. Be wary
of product managers who come from a design background; they may not
have the data and statistics background necessary to lead the team.
Your AI product manager will remove obstacles and gather resources so
the technical team can begin building your models and preparing to deploy
solutions into production—an activity I call machine learning engineering.
While the AI product manager is the most critical team member for the
nontechnical side of your project, machine learning engineers are the most
critical team members for the technical side.
Contrary to popular myth, machine learning engineers are no scarcer than any
other type of engineer. Not many people can do machine learning engineering,
but there is not yet great demand for the skill either.
So why the obsession with AI talent shortage? In recent years the large
tech companies have been rapaciously buying AI startups and gutting entire
research labs in a race to gather talent.19 These companies need the very
best talent to win the AI race.
But you aren’t doing cutting-edge AI research—you’re just trying to apply
current AI technology to your business challenges. So all you need is a team
which can quickly learn the AI engineering best practices and put them to
work on your data. This is what professional software engineers do.
I try to hire experienced software engineers who can do machine learning. A
software engineer with five years’ experience can learn to use modern
frameworks and build basic solutions with about 6 to 12 months of study.
Some return to graduate school, some take a temporary sabbatical from work,
and others are self-taught in their evenings and weekends. It doesn’t really
matter how they got the skills as long as they can demonstrate proficiency.
In addition to the technical skills (which are beyond the scope of this book),
here are some key attributes of the right machine learning engineers for
your team:
Here are warning signs of an engineer who won’t be a good fit for your team:
When machine learning engineer candidates show any of these warning signs,
spend a little more time examining their qualifications before deciding
whether they are a good fit for your team.
As you move from prototyping to deployment, you will need software
engineers who can manage data processing, pipelines, and jobs associated
with running AI models in production. I call these data engineers.
Data engineers need to understand how AI works but don’t have to be
skilled at building models. They need to be good programmers who can build
workflows so that the models run correctly and their outputs are written
to the interfaces where consumers can use them.
Hire experienced software engineers who have a history of building and
supporting complex data processing systems.
They should be proficient at the following:
Your company probably has people who manage ETL (extract, transform,
and load) systems to ensure data gets processed and stored in your data
warehouse. These people usually have the skills to configure and operate
workflow processes but may not have the software engineering skills
necessary for your AI team.
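To make the role concrete, here is a minimal sketch of the kind of job a data engineer builds and maintains: a nightly batch-scoring run. It assumes a scikit-learn model serialized with joblib and a SQL warehouse reachable through SQLAlchemy; every name, path, and table in it is hypothetical.

```python
# batch_score.py -- illustrative nightly scoring job (all names hypothetical)
import joblib
import pandas as pd
from sqlalchemy import create_engine

WAREHOUSE_URL = "postgresql://scoring_user@warehouse/analytics"  # hypothetical DSN
MODEL_PATH = "models/churn_model.joblib"                         # hypothetical artifact

def run_nightly_scoring():
    engine = create_engine(WAREHOUSE_URL)
    model = joblib.load(MODEL_PATH)

    # Pull yesterday's records from a (hypothetical) warehouse view.
    features = pd.read_sql(
        "SELECT customer_id, f1, f2, f3 FROM customer_features_daily", engine
    )
    if features.empty:
        return

    # Score and write results back where consumers can read them.
    features["churn_score"] = model.predict_proba(features[["f1", "f2", "f3"]])[:, 1]
    features[["customer_id", "churn_score"]].to_sql(
        "churn_scores", engine, if_exists="append", index=False
    )

if __name__ == "__main__":
    run_nightly_scoring()
```

A scheduler (cron, Airflow, or whatever your company already runs) kicks off a job like this every night; the data engineer’s craft is making it restartable, monitored, and boring.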
You need roughly one data engineer for every machine learning engineer
on your team. Unfortunately good data engineers who can build machine
learning pipelines are extremely hard to find. You will be competing with
other companies for talent from a very small pool.
Don’t give up on your talent search. While your machine learning engineers
can probably handle simple deployments on your first release, soon you will
need dedicated data engineers to monitor and run everything in production.
If you’re working with exceptionally large datasets, you may need an
engineer who can optimize your hardware. They might need to configure
clusters of GPUs, optimize cache, or allocate memory and storage, for example.
If you’re pushing the cutting edge in a domain like computer vision, robotics,
or NLP, you may need to bring on researchers who can keep you ahead of
the competition.
Hire these sorts of specialists as you need them.
Do you need data scientists? It depends on your needs and their skills. The term
“data scientist” has various meanings, so you’ll need to understand exactly what
they do to determine whether they could serve a useful role on your team. Here
are some tips for evaluating data scientists for fit within your AI product team.
Some data scientists specialize in building reports and graphs in applications
like Tableau. At a tech company this role is usually referred to as a business
analyst. These specialists may have deep expertise in your business
processes and data and may have good relationships with key stakeholders.
They do little hands-on programming but help others make business decisions
with data. These individuals are potential AI product manager candidates.
Some data scientists have a strong background in statistics and are good
at creating hypotheses, testing assumptions, and drawing conclusions from
historical data. They often focus on marketing and sales challenges such as
identifying the right region or customer set for a targeted campaign. They
often work with statistical packages like SAS, R, or (sometimes) Python.
These data scientists usually provide answers or rules which other engineering
teams deploy into products. They are usually not experienced at building models
which will be directly deployed in production. If they show enthusiasm and
aptitude for the latter, they are potential machine learning engineers.
Some data scientists are machine learning engineers. They have statistical
skills and know how to build and deploy machine learning models. At most
tech companies, this is what the title means. You want these people on your
AI product team.
You’ve identified your first AI pilot project and you’re beginning to build your AI
product team. Now you need to provide infrastructure so they can begin working.
Your team will be able to handle this step, but you’ll want to know the basic setup
for modeling and testing your models. We call the setup an AI Sandbox.
Your machine learning engineers will use an AI Sandbox to build and evaluate
models. AI tools and hardware are evolving rapidly, so it’s impossible to say
exactly what infrastructure you’ll end up with, but here are some general guidelines.
An AI Sandbox requires a server, storage, and a graphics processing unit (GPU).
Here is a good starting point for your first AI Sandbox:
With these basics, even a top-of-the-line system will cost less than $5000.
Don’t start by buying $200K of hardware from NVIDIA unless you have
a clear reason for it. For example, start with one GPU for each machine
learning engineer and add additional GPUs when your team needs to train
models faster.
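A quick way for the team to confirm what the sandbox actually exposes is a check like this sketch, which assumes PyTorch (one of the common frameworks) is installed:

```python
import torch

# Confirm the sandbox exposes the GPUs you provisioned.
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
```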
Your team will work faster if they can use the most popular tools and
hardware. As long as they’re using common tools, when they get stuck (this
happens every day) they can search online for solutions.
Unfortunately the tools and hardware that your company’s decision makers
deem adequate might not be adequate for your AI Sandbox. You may have to
go to battle for your team to get permission for the necessary infrastructure.
Here are the most popular tools for AI teams:
These tools will ease troubleshooting in your AI system.
You’re going to have a parade of product companies telling you that their AI
platform will magically solve your AI problems. If you haven’t already, you’ll
learn their buzzwords by heart: Automated workflow! Pretrained models!
Automated model tuning! Seamless collaboration!
Most of these vendors are selling solutions for small problems. You would
be wise to wait until you actually have a real problem—one that you
understand—before buying anything. In the meantime just use the open-source
tools like the rest of us do.
For the moment your two biggest challenges are (1) knowing what problem
to solve with AI and (2) having enough quality data to do it. Product
companies can’t help you with those challenges.
Your team needs complete control (what we call “root access” in Linux) over
the AI Sandbox. The machine learning engineers need the ability to update
tools without waiting for “approval.” They need to be able to download
untested beta code and see if it works.
AI tools and libraries change rapidly, and sometimes your team will need to update
them multiple times per day. Waiting for approval will slow your team’s
progress to a crawl.
Product teams worldwide have spent the past decade migrating their systems
to the cloud to reduce hardware maintenance costs and add resources on
demand. You can get the same benefits by putting your AI Sandbox in the
cloud, but doing so is not necessarily the best option.
Here are a few considerations:
Although stand-alone GPUs and cloud GPUs both have advantages and disadvantages, either option can work.
BUILDING YOUR TRAINING DATA
Your project manager wants more time. Your call center manager wants more reps. Your marketing team wants a bigger budget. And your AI product team wants more training data. Nobody likes limitations, but reality is reality.
We have enough training data.
— Said by no machine learning engineer, ever.
How much training data is “enough”? Although you may get better results if
you have a very large dataset, you don’t need Google-sized databases. Most
practical AI products do not require massive amounts of data.
How much data you need is difficult to determine. If your team can’t
recommend a reasonable dataset size, a good rule of thumb is 50,000 labeled
examples. With a dataset that size, your team can set aside 20%, or 10,000
samples, to validate their models. That is enough to detect 0.1% improvements
from minor changes.
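The arithmetic behind that rule of thumb is worth making explicit. The sketch below works through it and shows one common way to carve out the validation set (the arrays are stand-ins for your real data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_examples = 50_000
val_fraction = 0.20

# 20% of 50,000 leaves 10,000 samples for validation.
n_val = int(n_examples * val_fraction)    # 10,000

# Each validation sample is worth 1/10,000 = 0.01% of accuracy, so a
# 0.1% improvement shows up as roughly 10 more correct predictions.
print(int(n_val * 0.001))                 # 10

# The split itself; X and y stand in for your real features and labels.
X, y = np.zeros((n_examples, 8)), np.zeros(n_examples)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=val_fraction, random_state=42
)
```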
Once you set a size target for your dataset, your team will need the freedom
to gather data and the discipline to put it to work fast. Here are some practical
suggestions to set your team up for success.
Unfortunately you probably don’t have a good inventory of your data assets.
Your data has evolved over time, and you have data assets from companies
you’ve acquired. This means your AI product team could waste months
(seriously, this is common) trying to find the best sources of training data.
When they do find good candidate datasets, the owners may be reluctant to
release them to your team for many reasons. Before sending your team on a
data scavenger hunt, get the support of your colleagues who own the data.
Once your team starts finding data resources, give them a week or two to build
their first training dataset. This time constraint will force them to make trade-offs
and explore techniques like transfer learning (starting with pretrained models).
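Transfer learning is less exotic than it sounds. Here is a minimal sketch in PyTorch, assuming an image task and a recent version of torchvision; the five-class output layer is hypothetical:

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a backbone pretrained on ImageNet instead of training from scratch.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained weights so only the new head learns from our small dataset.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with one sized for our task (a hypothetical 5 classes).
model.fc = nn.Linear(model.fc.in_features, 5)

# Only the new head's parameters go to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

With a frozen backbone, even a few thousand labeled examples can produce a useful first model, which is exactly the trade-off the one-to-two-week constraint is meant to force.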
Refer to Part 2 for advice on building product-specific training data.
In time your team will start getting results in their AI Sandbox. They’ll report
that their prediction accuracy is promising, and they’ll assure you that better
results are on the way—they just need more time and data.
Avoid the temptation to delay deploying your models into production to give
the team more testing time. As soon as your team gets decent results, feed the
model a test dataset. If results are still good, deploy and iterate the model.
Your AI product team can very easily cheat or make a mistake in reporting
results from their models. A team’s model can get amazing results in the AI
Sandbox but poor results in the production environment. The easiest way to
prevent this problem is to test the team’s models on a dataset they don’t have.
This data is called the test dataset.
A good candidate for your test dataset is your most recent operational data. If
possible, have a different team run the test. If the model still yields satisfactory
results, you are in good shape.
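That check can be as simple as the sketch below, run by a different team against operational data the product team never saw. The model format and file paths are hypothetical:

```python
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

# Hypothetical artifacts: the team's model, and a held-back slice of
# recent operational data the team never had access to.
model = joblib.load("artifacts/candidate_model.joblib")
test = pd.read_csv("held_out/operational_last_quarter.csv")

X_test = test.drop(columns=["label"])
y_test = test["label"]

print("Held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```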
Your AI product team will initially work to build models based on historical data.
But your ultimate goal is deploying the models into production so they make
predictions on new data. Deploying your models has a technology component
and an organizational component. Once you have a handle on these, deploy and
iterate your product to make it efficient in the production environment.
In terms of technology, deploying AI models is similar to setting up any internal
API service. Any mid-level server-side programmer should have no problem
deploying your models. If you need more guidance, do a quick Google search
and you’ll find plenty of deployment instructions.21
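As an illustration of how modest the technology side can be, here is a minimal internal prediction service using Flask; the model artifact, route, and port are all hypothetical:

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("artifacts/model.joblib")  # hypothetical artifact path

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [[0.2, 1.7, 3.1]]}.
    features = request.get_json()["features"]
    return jsonify({"predictions": model.predict(features).tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Any application that can make an HTTP request can now consume the model’s predictions.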
The technological challenges for deployment will likely be much easier to solve
than the organizational challenges. By the time you’re ready to deploy, you
should already have a basic plan and broad support in place. But your plan will
still need a good guide.
Your AI product manager will have to work with the model consumers to decide
where to send the model’s results. Results are commonly sent to applications or
database tables.
Compel your team to get your model into production. Your team will probably
resist this request and will ask for more time to iterate the model in the AI
Sandbox. But the only way to confront the major risks associated with your AI
project is to get it live.
Expect the following:
If you don’t deploy early, you will waste time perfecting a product that’s not
equipped to handle real-world risks.
You’ll overcome the technical challenges of deployment the way you would
overcome the challenges of any traditional software engineering project. As
your AI production environment becomes more reliable, you will start identifying
ways to improve the predictive power of your models.
To improve the functionality of your product, your AI product manager will have
to drive the organizational changes and escalate challenges to you.