Data science projects often involve developing machine learning (ML) models to solve business problems. Although this may seem common in today's business world, it nevertheless carries several risks.
Namely, developing ML models is inherently uncertain, technically demanding, expensive, and time-consuming. These risks motivate project management frameworks specifically designed for data science projects.
Here I will describe one of these approaches and detail the main contributions of a project manager in this context.
The approach I like to use for data science projects is described by the 5-step framework shown below.
Digging deeper, here are some key activities for each phase.
- Phase 0: Problem definition and scope — Formulate the business problem. Design the data science solution. Define project milestones, tasks, and success indicators. Key role: Project Manager
- Phase 1: Data acquisition, exploration and preparation — Evaluate the available data. Acquire and explore data. Develop data pipelines. Key roles: Data Engineer, Data Scientist
- Phase 2: Solution development — Develop an ML solution. Evaluate the validity and value of the solution. Iterate with stakeholders and revisit past phases if necessary. Key role: Data Scientist
- Phase 3: Deployment of the solution — Integrate the solution into a real business context. Develop a solution monitoring pipeline. Key roles: ML Engineer, Data Scientist
- Phase 4: Assessment and documentation — Evaluate the results of the project. Provide technical documentation and user guides. Reflect on lessons learned and future work. Key role: Project Manager
An important point here is that data science projects often do not progress linearly through each of these phases. Rather, a certain number of iterations are necessary through key feedback loops. Here are some examples of what this might look like.
- Phase 1 → Phase 0: When exploring the available data, it becomes clear that key information is not available and the project plan needs to be revised.
- Phase 2 → Phase 1: After training a handful of models, it is discovered that an exception was not properly handled during data preparation.
- Phase 2 → Phase 0: Preliminary models do not demonstrate strong predictive performance, which requires reassessing the value of the project.
- Phase 4 → Phase 0: Each project has its opportunities for improvement. Once completed, teams can evaluate these opportunities and launch another project, starting with phase 0.
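One way to picture these loops is as a small transition map. The sketch below is only illustrative: the `Phase` names and the `next_phases` helper are hypothetical and invented for this example, and the loop set contains only the four examples listed above, not every loop a real project might need.

```python
from enum import IntEnum
from typing import List


class Phase(IntEnum):
    # Hypothetical short names for Phases 0-4 of the framework.
    PROBLEM_DEFINITION = 0
    DATA_PREPARATION = 1
    SOLUTION_DEVELOPMENT = 2
    DEPLOYMENT = 3
    ASSESSMENT = 4


# The example feedback loops from the list above, as (from, back_to) pairs.
FEEDBACK_LOOPS = {
    (Phase.DATA_PREPARATION, Phase.PROBLEM_DEFINITION),
    (Phase.SOLUTION_DEVELOPMENT, Phase.DATA_PREPARATION),
    (Phase.SOLUTION_DEVELOPMENT, Phase.PROBLEM_DEFINITION),
    (Phase.ASSESSMENT, Phase.PROBLEM_DEFINITION),
}


def next_phases(current: Phase) -> List[Phase]:
    """Return the next phase in sequence (if any), then any earlier
    phases that `current` can loop back to."""
    options = []
    if current < Phase.ASSESSMENT:
        options.append(Phase(current + 1))
    options.extend(sorted(dst for src, dst in FEEDBACK_LOOPS if src == current))
    return options


for phase in Phase:
    print(phase.name, "->", [p.name for p in next_phases(phase)])
```

Writing the loops down like this makes it explicit that moving backward is an expected part of the process rather than a sign of failure.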
The project manager (PM) is ultimately responsible for the success of a project. If the project is late, it's on the PM. If costs exceed estimates, it's on the PM. If the value doesn't meet expectations, it's on the PM.
Although this responsibility involves a wide range of tasks from multiple contributors, a key determinant of the success of a project is the PM's execution of phase 0 (as described above).
Phase 0 lays the foundations of a data science project. Just as a poorly constructed foundation will result in a difficult construction project, a poorly executed Phase 0 will result in a difficult data science project.
The 3 key elements of Phase 0 are problem diagnosis, solution design, and an implementation plan (1).
1) Diagnosis of the problem
Of the 3 elements, this is the most critical because if you get it wrong, you can spend a lot of time and money solving the wrong problem (i.e. little value is generated). Despite its importance, many tend to gloss over it (or skip it altogether) rather than taking the time to stop and think about the business problem.
Just as a doctor interviews a patient to make a diagnosis, a PM interviews stakeholders to better understand the business problem and identify the root cause. Although there are many ways to do this, I like to keep things simple and focus on two key questions.
- What problem are you trying to solve? — this is always the best starting point for these conversations (1)
- Why is this important to the business? — this can kick off a series of "why" questions to identify the root cause of the problem (see Toyota's 5 Whys approach) (2)
One of the most important PM skills is to collaborate effectively with stakeholders to understand their problems. I discuss this in more detail in a previous article.
2) Solution design
Once the business problem is clearly understood, the next step is to define how to solve it. Various solutions at different levels of complexity can solve any given problem.
For example, if customer churn is high due to a slow onboarding process, some potential solutions could be to remove unnecessary onboarding steps, analyze where churn is occurring and rework that step, or personalize onboarding based on customer information. Note that these solutions may not require machine learning (and that's okay).
Let's assume that after a lot of back and forth, the stakeholder wants to move forward with developing a personalized onboarding experience based on customer profiles. Even though this narrows things down, this solution can still be implemented in several ways. Therefore, the PM must use judgment to propose a solution based on conversations with stakeholders, similar industry projects, and available resources.
3) Implementation plan
The final element of Phase 0 involves translating the proposed solution into a concrete project implementation plan. This plan consists of two key elements: a project roadmap and project requirements.
A project roadmap lays out the key milestones of the project. I like to base these milestones on Phases 1-4, as described above. Each phase includes tasks, each assigned to a particular role (e.g. data engineer, data scientist, or ML engineer) and given a due date (1).
Project requirements specify all resources needed for implementation, including data requirements, key roles, software tools, and IT infrastructure.
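To make the two parts of the plan concrete, here is a minimal sketch of how a roadmap and its requirements could be written down as plain data. Everything in it is an assumption made for illustration: the class names, the `unstaffed_roles` check, and the example tasks, tools, and dates are placeholders loosely based on the onboarding example above, not a real plan.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Task:
    phase: int        # 1-4, matching the phases described above
    description: str
    role: str         # e.g. "Data Engineer", "Data Scientist", "ML Engineer"
    due: date


@dataclass
class Requirements:
    data: List[str] = field(default_factory=list)
    roles: List[str] = field(default_factory=list)
    software: List[str] = field(default_factory=list)
    infrastructure: List[str] = field(default_factory=list)


@dataclass
class ImplementationPlan:
    roadmap: List[Task]
    requirements: Requirements

    def unstaffed_roles(self) -> List[str]:
        """Roles that own roadmap tasks but are missing from the requirements."""
        needed = {task.role for task in self.roadmap}
        return sorted(needed - set(self.requirements.roles))


# Placeholder plan for the personalized-onboarding example (all values hypothetical).
plan = ImplementationPlan(
    roadmap=[
        Task(1, "Build customer profile data pipeline", "Data Engineer", date(2030, 1, 31)),
        Task(2, "Train and evaluate onboarding model", "Data Scientist", date(2030, 3, 15)),
        Task(3, "Deploy model and set up monitoring", "ML Engineer", date(2030, 4, 30)),
    ],
    requirements=Requirements(
        data=["customer profiles", "onboarding event logs"],
        roles=["Data Engineer", "Data Scientist"],
        software=["Python"],
        infrastructure=["cloud compute for training"],
    ),
)

print(plan.unstaffed_roles())
```

A simple cross-check like `unstaffed_roles` is one way to catch a gap between the roadmap and the requirements before the project starts. Here it would flag that the deployment task has no ML engineer listed among the required roles.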
I'll walk through Phase 0 for an example case study to solidify these ideas. While this is intended to be instructive, it is also a real project that I will implement (and document) in future articles in this series.