Free on-demand webinar
A data scientist told me once that training and selecting a model is only 20% of an AI project. This might sound anecdotal but it should not surprise any data science practitioners who have delivered an AI project.
So, what is the other 80%…
And how do you get your AI projects from an idea to a POC to adoption to ROI?
User needs, data gathering and cleansing, model validation, user training, governance (oh, we forgot about that one on the last project). And we’re not even talking about the vast complexity of the infrastructure needed around your models.
At the end of the day, algorithms don’t implement themselves and we need rigorous processes and project management to get the value we want out of our AI projects.
Our speakers, Eliot Ahdoot, Olivier Blais, and Simon Shaienks, will guide you through the steps in an AI project lifecycle to successfully implement your next AI project.
Below you’ll find the webinar transcript. We’ve obviously used AI to generate it, so maybe you’ll see some incoherence. Let us know if you do! 🙂
Olivier Blais, Simon Shaienks, Eliot Ahdoot
Simon Shaienks 00:02
Welcome, everyone, to the right path to AI and how to deliver a successful AI project. My name is Simon Shaienks, and I’ll be hosting today, along with our great speakers Eliot and Olivier. How are you guys doing? Good, how are you?
Very good. Perfect, guys. We’re gonna get you introduced shortly. We’re just gonna wait a couple of seconds while people are joining in; let’s give them a second. We’ve got a great webinar for you: we’ll be guiding you through the right steps to ensure your team has a successful AI project. So, a couple of important things to get out of the way before we get started. The webinar is recorded, and we will be sending the recording in the next 24 hours. I always get the question somehow, and I’m still gonna get it, I’m sure, but it is recorded. We will have a live Q&A session at the end of the presentation, so you can post your questions as they come up during the presentation. There’s a Q&A section at the bottom of your Zoom window. We will only be answering them at the end of the presentation, though. We’re going to stay for as long as you need us to, so feel free to ask as many questions as you want.
So, our speakers today. First we have Olivier Blais. He is the founder and VP of data and transformation at Moov AI, an AI consultancy firm, where he has led and supported many companies in their digital transformation, as well as implementing AI projects in different industries. He will be guiding us through the different steps of an AI project from A to Z. He is also a member of the Standards Council of Canada committee that defined the ISO norm on artificial intelligence solutions, where he proposed the technical specifications on quality evaluation guidelines for AI systems. Thanks, Olivier, for being here.
And we’re lucky to have Eliot. Eliot is the chief commercial and innovation officer at ai protect, bringing to the table more than 15 years of experience in various fields such as technology, IT, data centre design, server design, and disaster recovery. So, as you can assume, we will be covering how to bring your infrastructure together. And myself, Simon Shaienks: I’m a product marketing manager at Snitch AI, a validation tool for your models. I basically help people deliver more robust and high-quality AI every day.
And the idea for this webinar actually came through many of the conversations I have every day. Many of the folks I speak to have started building models, are thinking about it, or have deployed a few. They absolutely understand why it is important to test and have high-quality models. But what is a high-quality model worth if you don’t deploy it? Lots of folks aren’t really sure where an AI project starts and where it actually ends. So we hope to bring some answers for you guys today. Thanks, Eliot and Olivier, for being here and sharing your knowledge. So let’s get started. Olivier, the floor is yours.
Olivier Blais 03:43
Yes, hello, everybody. I’m super excited to be here. This is my favourite subject: how to make this happen. So when we talk about AI, often it’s going to be about technology. And it’s not really very concrete, because we don’t necessarily know how this works. But what we know is that we need to automate some decisions: some decision-making could be improved using automated technologies, and this is where we want to go. But you know what, it’s not the first step. Actually, there’s a journey to get there, and the first step is going to be about data. I am sure it’s not the first time you hear this: we absolutely need data, a good quantity and a good quality of data, to be able to get there. So the goal will be to identify what’s happening. If you want to automate some decision-making in finance or in sales, it’s going to be important to be able to track yourself, your customers, and everything that’s related to the decision that you intend to make. Here you can see in the graph that the steps are in order of maturity. At first it’s really immature: you’re only tracking what’s happening, and you don’t necessarily go into the data and mine it. So that’s the second step: being able to identify why something happened, by mining the data, finding trends, and finding root causes of problems. What it means here is that you start trusting the data. You’ll hear me talk about trust and adoption throughout the presentation, because this is major: if you build an AI system standalone in your basement, you know, like a mad scientist, it’s never going to work.
And I would say this is usually where companies get stuck: at step two. We have some data, we’re starting to build trust and analysis, and we’re starting to adapt our strategies using data. But then there’s a roadblock, something we call a chasm. Usually it’s because this is pretty counter-intuitive: trying to predict the future. When we were young, we often laughed about crystal balls and other signs like that; it’s impossible to predict the future. And yet here, this is what we’re trying to achieve. And by the way, there are thousands of success stories. With the right technologies and the right implementation, which is what we’re talking about today, we’re able to get there. But it’s a matter of being able to implement it in a smart way. So step three is going to be about generating predictions: predictive analytics. You’re able to have an insight into the future. Is it enough? This is a good question. For instance, say you’re trying to predict the demand in the future, and you see the nice curve going up or going down. We hope it’s going up, but we’re never sure. So step four is usually going to be about simulation, or prescriptive analytics: knowing what we currently know and what we predict for the future, here’s the best solution. So now, understanding where the demand is heading, you’re able to plan your production to reduce the inventory and to maximize what you have available in store. This would be a good next step for a model.
And then the last step is like an automated system. What it means here is that not only are you able to forecast what’s going to happen and to generate an optimal solution, but you’re able to implement everything automatically, without any human intervention. Is it necessary? We don’t know. And this is a good question.
And if we go to the next slide: it’s because we need to plan carefully what the best solution is for your needs. I think this is going to be pretty basic, but it is an important exercise to do. First, you need to measure the impact of what you’re trying to do. So here I’m coming back to the three different use cases I talked about: predicting the future, identifying an optimal solution, and automating the full workflow.
So what is the impact of getting there? Do you really have a greater impact by automating everything, or is it just a half-hour weekly gain in productivity because you don’t need someone to validate assumptions? Usually the impact will be a gain in efficiency, a cost reduction, or a gain in revenues and profits. It can also be the ability to offer new products and services: when you automate something, you’re able to generate new products and services. For instance, Google Maps was a capability that was not accessible before we were able to generate routing solutions. So this is the impact side: being able to understand what impact you’re looking for. And then the complexity is also a very important driver.
So if we go to the next slide, I’ll talk a little bit about the most important complexity dimensions. The first one is about the data. I talked about it before, and I’ll talk about it later; I’m telling you, this is so important. Do you really have data? Do you have enough data? Is the data you have in your database similar to the data that you will receive in real life? This is a very important dimension, and we need to make sure we are covered on all sides, because otherwise it’s not going to be possible to do a project. After that, there’s something called the task. The task is what you’re trying to achieve, and the simpler the task, the simpler the project is going to be. In other words, coming back to my use case on demand prediction, it’s going to be easier to predict the demand for tomorrow than to predict the demand for a year in the future. The longer the horizon, or the more complex the insight, the more difficult your task is, and the harder it will be for your project to achieve good performance. Those are two dimensions that are pretty straightforward for us.
Because this is a thing we know: when we do a project, we need to have a simple goal, and we need to have the data that supports it. But what we often forget at the beginning of the project is adoption. Adoption is critical, especially if you do something that’s really complex or really critical.
For instance, if you’re trying to predict cancer based on image recognition, or you’re trying to predict something on an airplane, it’s going to be way harder to get this project through the door and adopted by the end users. Because this is very difficult, you need to plan a lot of change management and a lot of work with the end users beforehand. And finally, the last dimension is about the integration. You might have a very good project, and everybody’s really excited. But where does it reside? And where can the end users get the predictions? Sometimes the integration is a project by itself. If you have a very good project but then you need to integrate it into SAP, that might take months, because it’s complex. So you need to plan that ahead of time as well. When you have estimated your impact and your complexity, it is pretty simple to map everything and then have a strategy to prioritize your initiatives. And here, it really depends on your corporate strategies and also your maturity.
But what I’ve mostly seen in the past is that organizations are usually not super mature; they have not implemented dozens of projects. So, considering the fact that you might not have implemented any project so far, what we would usually suggest is to work on the quick wins, because the quick wins will usually have good impact and low complexity. They will also be a success story that you can use for the next projects, because you’ll be able to build on this success.
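The impact and complexity mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1 to 5 scoring scale, the threshold of 3, the quadrant labels other than "quick win", and the example use cases are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    impact: int      # assumed scale: 1 (low) to 5 (high), estimated by the team
    complexity: int  # assumed scale: 1 to 5, covering data, task, adoption, integration


def quadrant(use_case: UseCase, threshold: int = 3) -> str:
    """Place a use case in an impact/complexity quadrant (labels are hypothetical)."""
    high_impact = use_case.impact >= threshold
    high_complexity = use_case.complexity >= threshold
    if high_impact and not high_complexity:
        return "quick win"
    if high_impact and high_complexity:
        return "strategic bet"
    if not high_impact and not high_complexity:
        return "nice to have"
    return "avoid for now"


def prioritize(use_cases: list[UseCase]) -> list[UseCase]:
    """Sort so that high-impact, low-complexity use cases come first."""
    return sorted(use_cases, key=lambda u: (u.complexity - u.impact, -u.impact))


# Made-up backlog, loosely based on the demand-forecasting example in the talk.
backlog = [
    UseCase("next-day demand forecast", impact=4, complexity=2),
    UseCase("fully automated replenishment", impact=5, complexity=5),
    UseCase("one-year demand forecast", impact=3, complexity=4),
]
ranked = prioritize(backlog)
for use_case in ranked:
    print(f"{use_case.name}: {quadrant(use_case)}")
```

In practice the scores would come from a workshop with business and technical stakeholders, and the ordering rule would be adjusted to the corporate strategy and maturity mentioned above.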
So if we go to the next slide, we’ll also talk about something that’s pretty interesting. When we talk about complexity, complexity usually comes with a cost. And artificial intelligence and machine learning are not always simple, especially when we have to work on data and on infrastructure. So you can think that it’s going to cost a lot. But fortunately, for people in Quebec, in Canada, and I’m pretty sure everywhere in the world, you can have access to grants, and there are some special programs that will encourage companies in their endeavours. So you will be able to get some money from those organizations. For instance, in Canada, we can think about the superclusters, such as Scale AI, and there are also other programs in Quebec, like Invest-AI. They are providing some funds for your project. So you’ll be able to get help with your project, but also with your data and the preparation to get to your project.
And they will also be able to support your infrastructure costs. So it’s really interesting for you to look it up. We have created an ebook on the different programs available in Quebec and Canada, so you should check this out. It’s pretty interesting to see the type of opportunities you have to get there a little bit faster. And, you know, that’s it for the preparation: you have your use case, and you have the means to achieve it.
So now we’ll go into slightly more concrete steps: how to build an AI model, or, let’s say, an AI solution. First of all, you need to identify the decision that you need to make. This sounds easy, but you need to identify the prediction that you need to make. And I’ll explain very simply what AI or machine learning is. The engine in machine learning is nothing more than looking at the past, identifying trends, and then applying those trends to new, unseen data to predict the solution. That’s it. It’s about looking at the past and identifying those trends.
And it sounds easy. But wait, there are some very complex algorithms. What do those complex algorithms do? They’re just better at finding patterns and trends. That’s how it works. So the first step is testing different algorithms and different configurations to find the best trends and patterns that you can. Then, when you have found those patterns, you test them, first on something called a validation data set, to validate that the patterns are good. And when you have identified the best model, and this is really important, your goal is to validate the quality of the model. Is this a model of good quality? Is it robust? If there is some noise in the data, or if some trends change a little bit, will it break, or will it still create value for your end users? We’re going to talk about this a little bit later, as there is a framework that you can use to validate the quality of the solution that you are about to put in production. And this is the next step: once this is validated and you’re able to demonstrate the quality, you’re ready to click on Run and deploy the chosen solution in the chosen infrastructure.
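The step of trying several algorithms and comparing them on a validation data set can be illustrated with a small, self-contained Python sketch. Everything here is an assumption made for illustration: the synthetic weekly demand data, the two candidate models (a mean baseline and a straight-line trend), and the choice of mean absolute error as the metric. A real project would use its own data and a proper machine learning library.

```python
import random


def fit_mean(xs, ys):
    """Baseline candidate: always predict the average of past values."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Second candidate: least-squares straight line through past observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_abs_error(model, xs, ys):
    """Average absolute gap between predictions and observed values."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic "past" data (assumption): demand grows with the week number, plus noise.
rng = random.Random(0)
weeks = list(range(40))
demand = [100 + 5 * w + rng.gauss(0, 10) for w in weeks]

# Learn patterns on the older weeks; hold out the most recent ones for validation.
train_x, valid_x = weeks[:30], weeks[30:]
train_y, valid_y = demand[:30], demand[30:]

candidates = {"mean baseline": fit_mean, "linear trend": fit_line}
scores = {
    name: mean_abs_error(fit(train_x, train_y), valid_x, valid_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

The validation weeks are never used for fitting, so the comparison reflects how each candidate handles data it has not seen, which is the point of the validation step described above.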
So this is the methodology. But the way a project needs to run is first by identifying your needs, like I said before, and then it needs to go iteratively. At first, your goal will be to focus a little bit more on the model, so on the trends: being able to demonstrate that using machine learning really adds value, and that you’re able to generate good predictions and generate trust.
And when you’re able to do this, then you go into MVP mode: you start working on everything else that will help you generate these predictions in production. We’ll talk about it later, because it’s not trivial; you’ll see there are a lot of moving pieces. Then we’ll test it. Again, I cannot repeat it enough: it’s all about making sure that end users are adopting the solution. This is why you want to go with a beta: you want to test this and demonstrate to the end users that it works well. And when that’s done, you tweak the things that could be improved, and you’re ready for deployment. You’re ready to have the real deal implemented in production.
So, talking about production, if we go to the next slide, you’ll see that when we talk about a solution in production, it’s a lot more than just the AI code. The model is actually just a tiny portion of the full solution. You see here anything from the infrastructure and the different tooling for analytics, data ingestion, and data management, to practices such as quality assurance and monitoring. Everything here is necessary to have a robust solution rather than just a robust model. And I’m very, very excited to have Eliot here to talk about exactly what the right choices are in terms of infrastructure.
Thank you, Olivier. So, talking about where your AI will reside, there are a few steps, like I mentioned. For your machine learning, you’ll typically have GPU-centric hardware, as you see in the picture. The size of that machine, or of those machines, will really depend on your data set and on how quickly you’re looking to complete your project.
For the Snitch AI software, testing is of course super important. This is a CPU-centric system, but it’s a very light load; it doesn’t take all that many resources to run. And then when your actual AI is in operation, you’ll turn to a CPU-centric server. Now, one thing that’s really important to note: there are pictures of servers here, but you can go from as small as a desktop or a workstation all the way up to, you know, a 4U, 10-GPU large machine. Getting those quick wins is extremely important, and there are ways to really reduce your capex early on and still get some good impact. So now let’s talk about where these machines reside.
So, next slide. Choosing where to have your AI reside depends on a few factors. The first is the criticality of the application. You’ve got to ask yourself: can you let this application be down for 24 or 48 hours? Or is it something that’s really production-related, and you can’t afford more than maybe an hour? Or maybe it’s even more important that it doesn’t go down, and you have zero flexibility for downtime. The second one is what is more important for your company to optimize: capex or opex. If you’re looking to reduce your opex budget in the long run, that’s one strategy. If you’re telling yourself, well, we can’t afford to put down capex, that’s another strategy. So looking at both those sides is extremely important. The size of the infrastructure will also make a difference. If you only have one machine running the system, don’t really bother with colocation; you can have that in your closet if you really want to. On the other side, if you have something like a megawatt of infrastructure, then colocation will probably be the way you want to go.
And funding. It’s very important to note that in certain instances, if you’re running some of that infrastructure on premises, you can get grants from various energy-saving entities, because you can recover the heat from the servers to heat your building. Of course, this is for places that have winter, like us; sometimes we say we have nine months of winter. So there’s funding available for that, and it also reduces your total opex by using the residual heat of the machines to heat your building. Now, what are the options? There are many. One of the cheapest ways, if you don’t have a critical application, is to just have a machine residing at your office or at your production location, which has what we call dirty power, or very little reliability, so it can go down whenever it goes down.
There’s another alternative, which is a high-capex, low-opex model: really building a data centre within your building. Now, it’s important to note that there are gray zones between these; you can take every step in between. It doesn’t have to be one or the other. I’m just giving a high-level understanding of what these look like. Then there’s medium capex, very low opex: this is a submerged solution, which will really reduce the environmental needs of those machines and give you a very, very low operational cost. Then, when your applications start being quite important, like high availability or medium availability, it gets important to have redundancy.
And in these cases, you’ll want to be in diverse locations. One of the options is to have some of your machines in on-premises data centre infrastructure and some outside, in a colocation facility. Of course, this is more expensive; a colocation facility will have its cost, generally a cost per kilowatt. Then you have the, let’s say, most expensive operational expenditure, which would be going to dual colocation facilities. And I didn’t mention it, but there’s another one after that, which is the highest-opex model: a fully cloud solution. This will be the most expensive to run out of all the options, but it’s definitely on the table if you’re looking for a worldwide presence. Next slide, please.
Eliot Ahdoot 26:39
So what do you need to do, and what are the dos and don’ts? After you’ve answered these questions and figured out what the criticality is and what you need, it’s really important that you tailor the machine to your needs. Every single application will have different needs, and getting the right-sized, functional machine is super important. For multisite options, I would really suggest that they’re geographically diverse, because when you have, you know, critical situations, they could take down whole regions. So it’s important that you have those mid steps. And it’s super important to plan for the future. We’re really confident that once you get that small win, maybe just with a workstation at first, it starts gaining traction within your organization, and then you’ll want to do the next project, and the next one. So it’s super important to have a plan for future growth, because we’re very confident that the ROI will speak for itself. Then, very important: don’t forget, you have to test the model. This is where things can break down. And I’ll have my friend talk about that. Simon?
Simon Shaienks 27:44
So at this point, we’ve found an idea, we’ve scoped our project, we’ve built our model, and we’ve built our infrastructure. Already at this point there are a lot of people involved, and there is some big investment. So how can we make sure that we reap the full benefits of these investments? I really want to take a few minutes here to talk about quality assurance and why we should include different model validation practices in our AI projects.
And I really like this quote from Gartner, because there’s a myriad of factors that can disrupt a model’s performance and ultimately produce erroneous outcomes. There are all sorts of things that can go wrong with ML. As Gartner mentions here, you can have bias in your predictions, your model can learn erroneous or non-replicable patterns, the model output can vary widely with just a slight variation in input, and, obviously, you can have poor model performance on unseen data.
So and I mean, there’s no shortage of headlines of how AI can go wrong. You’ve probably all seen a few of them. I mean, they’re not hard to find. And I won’t go into the specifics of one.
But how can we learn from existing quality assurance approaches? We don’t have to look too far. We can take inspiration from software development, which has built its QA practice over the last 20 to 30 years into a really mature one. What software developers do, in essence, is this: they know they don’t have full control over the way users will interact with their software. So they use tests to subject their program to expected but also unlikely scenarios, basically to provoke failure. They use those tests to develop software that is more robust to different eventualities. Of course, in order to test all these functions, they need to comprehend the entire logic of the code. And these developers will often run these tests many times before deploying a new version of their code. No software today would be released without rigorous testing, and it should be the same with your AI project.
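To make the software-testing analogy concrete, here is a minimal Python sketch of what such tests could look like when the thing under test is a prediction function. The function `predict_demand`, its formula, and the 5% stability tolerance are hypothetical stand-ins invented for this illustration; they are not how Snitch AI or any particular tool works.

```python
import math


def predict_demand(price: float, promo: bool) -> float:
    """Hypothetical stand-in for a trained model's predict function."""
    if not math.isfinite(price) or price <= 0:
        raise ValueError("price must be a positive, finite number")
    base = 1000.0 / price
    return base * (1.5 if promo else 1.0)


def test_expected_scenario():
    # A normal input should give the value we expect.
    assert predict_demand(10.0, promo=False) == 100.0


def test_unlikely_input_is_rejected():
    # Unlikely or invalid inputs should fail loudly, not return nonsense.
    for bad_price in (0.0, -5.0, float("nan"), float("inf")):
        try:
            predict_demand(bad_price, promo=False)
        except ValueError:
            continue
        raise AssertionError(f"accepted invalid price {bad_price}")


def test_small_input_change_gives_small_output_change():
    baseline = predict_demand(10.0, promo=True)
    nudged = predict_demand(10.0 * 1.01, promo=True)
    # Assumed tolerance: a 1% price change should move the prediction by under 5%.
    assert abs(nudged - baseline) / baseline < 0.05


for test in (
    test_expected_scenario,
    test_unlikely_input_is_rejected,
    test_small_input_change_gives_small_output_change,
):
    test()
print("all checks passed")
```

The third test mirrors one of the risks listed earlier: a model whose output varies widely with just a slight variation in input.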
So today, I’m actually not going to go into what you should be testing for, because we did a whole webinar on this subject, which is on our website. If you want to learn more about what kind of tests you should be doing to make sure that you’re producing high-quality models, and why you should test them, I highly encourage you to look into that webinar. What I want to answer for you today is: when should you test? What are the important quality steps that you should minimally have in place to make sure that your models will be performing and your project can be successful?
So there are three steps, or critical moments, in an AI project that I want to speak about today. The first is the moment where we’re building the first iteration of our machine learning model, going through what Olivier mentioned earlier to get to our MVM, or minimum viable model. At this point in our project, we’re trying to find a technical answer to our problem through an algorithm. It’s not a finished model, and we might even have a few different model options. We’re simply at a stage where we want to say that this is good enough that we can move on to our next phase, the MVP stage, where we’ll start working on a complete solution for our AI project. That means, obviously, plugging it into different systems and building the infrastructure, so even more investment. So at that point, we’re basically asking for permission to productionize the algorithm that we developed. But we also want to make sure that we understand some of its flaws and weaknesses and see where we can improve our model.
So this is the first moment. The second moment, or quality gate, would be at the MVP stage. At that point, you’re about to deploy: you’ve completed your whole solution, you’ve improved your model along the way, and you’re about to hit that deploy button. You basically want to answer: does this model meet the quality criteria and metrics that you gave yourself when you started this project? Is your model robust? Is it safe? Is it performant enough? You want to be able to answer these questions before you put it in the hands of your end users, and likely your beta. So we’re going to perform a second validation at that point, and we call it a permission to deploy. And our third stage is actually post-deployment. Again, it’s a simple question that we’ll want to answer here: is my model still performing, or has it degraded?
We need to remember that our model can start performing poorly on new, unseen data, and we’ll want to monitor any change in our data for shift. So the question here is not necessarily whether my model will start losing performance, but actually when. We obviously don’t live in a static world, and our data will inevitably shift. So we want to be able to capture these moments, these changes, and act on them before we start losing performance. As you can see with our three stages, even post-deployment, when you start monitoring, the project isn’t done once you’ve deployed. You’ll want to set up proper CI/CD, and I’ll leave it to Olivier to go into this next phase of the project.
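One common way to monitor a feature for shift after deployment is the Population Stability Index (PSI), which compares how a reference sample and a live sample are distributed across bins. The talk does not name a specific metric, so this Python sketch is an assumption about how such a check could be done, not the speakers’ method. The synthetic data, the ten equal-width bins, and the 0.1 and 0.25 cut-offs (a widely used rule of thumb, not a standard) are also assumptions.

```python
import math
import random


def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a live sample.

    Assumes the reference sample contains at least two distinct values.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[max(0, min(bins - 1, idx))] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_status(value, warn=0.1, alert=0.25):
    """Map a PSI value to a label using rule-of-thumb thresholds (assumed)."""
    if value >= alert:
        return "significant shift: consider retraining"
    if value >= warn:
        return "moderate shift: investigate"
    return "stable"


# Synthetic example (assumption): one feature observed at training time and later.
rng = random.Random(42)
training = [rng.gauss(100, 15) for _ in range(2000)]
same_period = [rng.gauss(100, 15) for _ in range(2000)]
after_shift = [rng.gauss(130, 15) for _ in range(2000)]

print("same distribution:", drift_status(psi(training, same_period)))
print("after a shift:", drift_status(psi(training, after_shift)))
```

A check like this would typically run on a schedule for each important input, so that a change is flagged before model performance visibly degrades.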
Olivier Blais 34:14
So at the end of the day, we’ve talked about identifying the right solution for your needs, developing it properly, and using the right infrastructure, to have a robust solution that will live for a long time. But like Simon just said, as soon as you deploy your solution, your first iteration, there will be degradation. I’ll give you an example. COVID-19 is a good example; not that it’s a good situation, but it’s a good example of something called data drift. When the context is changing and you look at past data to identify trends and patterns, then after COVID, the trends and patterns might have changed slightly or a lot. And that will make your model obsolete.
So what you need is a solution that will first capture those changes and also help you retrain and redeploy, so that it always stays current. With a good solution, you have everything in hand to demonstrate to your end users not only the quality of your model, but also that you have the workflow to make it good and to keep it good for a long time. This is how you really generate confidence and adoption. And on this, I was wondering if you had any questions?
Simon Shaienks 36:09
Thanks, Olivier. Thanks, Eliot. We’re going to move on to the last portion of this webinar today, the Q&A section. We’re gonna open up the Q&A panel right now. Remember, at the bottom, you’ll be able to post your questions. We actually already have a lot, so great, people; we’re going to go through all of them. The first one is from Ricardo: in terms of costs for setting up your infrastructure, can you give us an idea of a breakdown?
Sure, no problem. Bottom line, your cost breakdown is approximately 70 to 80% hardware, so your servers or your machines, and 20 to 30% colocation or power usage. Now, I think there was another question, which was: why don’t you talk about cloud? I talked about cloud very quickly. Bottom line, cloud will take those two factors and add a markup on top. And depending on whether you’re going hourly, monthly, or yearly, it’s x times higher. So that’s what it comes down to. Also, it’s very important to note that for any one of these, there’s financing available. Whether it’s the computer hardware or the data centre hardware, there are companies who finance that, so you can have it as an opex model as well.
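As a rough worked example of the 70 to 80% hardware split mentioned above, the following Python sketch backs out a total budget and the colocation or power portion from a hardware quote. The 60,000 hardware figure is made up, and the answer does not specify the time period the split applies to, so treating it as a single budget is an assumption.

```python
def infrastructure_budget(hardware_cost: float, hardware_share: float = 0.75) -> dict:
    """Estimate the total budget and hosting portion from the hardware cost.

    hardware_share is the fraction of the total that goes to hardware;
    the answer above puts it at roughly 0.70 to 0.80.
    """
    if not 0 < hardware_share < 1:
        raise ValueError("hardware_share must be between 0 and 1")
    total = hardware_cost / hardware_share
    return {
        "hardware": hardware_cost,
        "colocation_and_power": total - hardware_cost,
        "total": total,
    }


# Hypothetical hardware quote, evaluated at both ends of the quoted range.
low = infrastructure_budget(60_000, hardware_share=0.80)
high = infrastructure_budget(60_000, hardware_share=0.70)
print(
    "colocation and power estimate: "
    f"{low['colocation_and_power']:,.0f} to {high['colocation_and_power']:,.0f}"
)
```

The cloud markup mentioned in the answer is not modelled here, because no multiplier is given.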
Simon Shaienks 37:39
Awesome. Perfect. Thanks, Eliot. First, we’ve had an attendee ask: will the recording of the presentation be available? Yes, it will. We’ll send it out tomorrow. Tyron asked: can you talk about how to find and mitigate any bias in the data used by an AI/ML model? Olivier, do you want to take that one?
Olivier Blais 38:08
Yes, definitely. This is a very difficult question. We call this opening up the black box. It is not very easy, but it is definitely possible. What you need to do is identify the different insights that are allowing you to make the predictions. And the simpler your model, the easier it’s going to be. I’ll give you an example: if you have two variables and you try to predict something, it’s going to be easy to identify the weight of each variable for a given observation. But what happens when you have a million variables? I may be exaggerating, but maybe you have 1,000 variables, and then you’re using deep learning. Deep learning isn’t even linear, so it’s going to be complex. There are very good tools that exist. I don’t necessarily want to do some advertising, but for instance with Snitch, what we do, we have a good tool that is able to measure the variable importance, and then build a lot of different tools around this. One is identifying if some variables are too important. That happens if, for instance, you have some variables that should not be in your data set because they will not be accessible at the moment of the prediction. For instance, you’re trying to predict if someone is going to buy, and by mistake you put the thank-you email in your data set. It happens after the fact, but you forgot to remove it. So this variable would be very important, and Snitch AI will be able to track this and flag it to you. Same thing with variables that are useless. Sometimes it happens, we tend to be crazy and say, you know what, let’s put everything in my model.
But maybe there are only 10 variables that are really useful, and the rest creates risk, because if you have quality problems in your data set, they might be affecting useless variables that you should not have put in in the first place. So those are the different types of tools that exist. And there are also ways to assess the relationship between the different variables, like I said before.
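Editor’s sketch: the "too important" and "useless" variable checks can be illustrated with a very simple stand-in for variable importance, the absolute correlation between each variable and the target. This is not how Snitch measures importance (the transcript does not say); the thresholds, variable names, and data below are assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def flag_variables(features, target, leak_threshold=0.95, useless_threshold=0.05):
    """Label each variable by how strongly it tracks the target."""
    report = {}
    for name, values in features.items():
        strength = abs(pearson(values, target))
        if strength >= leak_threshold:
            report[name] = "suspicious: possible leakage"
        elif strength <= useless_threshold:
            report[name] = "likely useless"
        else:
            report[name] = "ok"
    return report


# Hypothetical purchase data: 1 means the customer bought.
bought = [1, 0, 1, 0, 1, 0, 0, 1]
features = {
    # Only exists after the purchase, so it mirrors the target exactly.
    "thank_you_email_sent": [1, 0, 1, 0, 1, 0, 0, 1],
    "pages_viewed": [9, 2, 7, 6, 8, 1, 5, 4],
    "day_of_week": [3, 5, 8, 2, 1, 7, 4, 6],
}

for name, verdict in flag_variables(features, bought).items():
    print(f"{name}: {verdict}")
```

A variable that predicts the target almost perfectly deserves a manual check before it deserves trust: the question to ask is whether it would really be available at prediction time.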
Simon Shaienks 41:14
Thanks, Olivier. We have some really, really good questions, guys. Let’s go through them. There’s Ram, who said: Hi Olivier, thank you for a great presentation. I’m currently at a data science bootcamp, and your impact-complexity chart is really interesting. Before starting a project, how do you find the impact to select projects that give quick wins? So I guess, how do you calculate those? I’ll let you take that one.
Olivier Blais 41:45
Thank you, Ram. The first mistake, and I’m guilty of this as a data scientist, is to try to be 100% quantifiable: you know what, I’ll put a mark out of 100, I’ll try to create a score for the quick wins, for the impact. In the end, the best way to get where you need to be is to first look at the values of the department or the company, and then look at the alignment between the different projects and the most important values and objectives of the company. Let’s say, for instance, your company is all about generating revenue with new technologies. In this case, you will probably prioritize initiatives that create new services or products, and the ones that help you generate revenue from them, a little bit more than one that is about gaining efficiency, because that one is a little less aligned with the values and objectives of the company. So this is a way to do it. After that, it’s by brainstorming with the stakeholders, which is another good point: the value needs to be defined by the end users and by the stakeholders. As data scientists, we sometimes tend to prioritize something more complex, or something that seems to make sense to us, but you really need to hear out your end users.
Simon Shaienks 43:41
Perfect, we’ve got another one. I think you’re going to like this one. We’ve got Seymour asking: is there a formal definition of quality to use as the corporate standard? And Seymour, you actually have an expert here in Olivier on standards and quality. So let him answer that one.
Olivier Blais 44:01
I’ll try to. I’m super excited by this, so I could talk about this for hours. I’ll try to be clear and concise. Maybe what I would suggest is, if you also have technical questions, or questions about the standards approach, please feel free to ping me. It could be on LinkedIn or it could be at my email address. It’s Olivier at Moov dot AI, so it’s pretty simple. But to answer the question quickly, the issue is that there’s not really a set of guidelines currently. It’s pretty much the Wild West out there, and usually people are trying their best. What I’m currently working on is with ISO on a standard; the project is called quality evaluation guidelines for AI systems. The goal is first to break down what it means to have a system of good quality. A system of good quality is going to be performant, it’s going to be robust, it’s going to evolve properly in the future because you have the workflow in place, and it’s going to mitigate risks and security threats. Those are some characteristics that define quality. Once you understand this, the goal is just to make sure that you’re able to address every one of those characteristics by testing and by making sure you have the right processes in place. In a nutshell, this is pretty much what I’m currently working on. There are some industry-specific documents. One of the most popular ones is in the financial industry in the US. I really like this one; it’s called SR 11-7. It is a set of processes and tests. But what we’re currently working on in the ISO standards is more generic: it applies to any type of AI system and to any type of industry.
And at the same time, in my role as project manager, we’re also trying to align Snitch, so that Snitch AI is able to answer the same guidelines. So the guidelines that will be made available in the ISO standards will also be met in terms of testing capabilities in Snitch.
Simon Shaienks 46:59
Perfect. Thanks, Olivier. We have an anonymous attendee who wrote: what type of profile do we need to do an AI project internally? Probably what type of profile do we need in terms of people, I’m assuming. If we didn’t get that question correctly, anonymous, let us know. But I guess you could clarify this if needed.
Olivier Blais 47:24
So the types of profiles can be very different. First of all, let’s clarify something: an AI project is a software development project. I think this is important, because if we plan a project and say, you know what, here are three data scientists, put them in a room for three months with food and beverages, it’s not going to happen. You won’t get where you want to go. What I mean is you need to have developers. And I’m coming back to what Eliot said, because it was really to the point: you need to have people who are in charge of infrastructure, and you need to have the right approach in terms of infrastructure. After that, once you have this definition, you will need data scientists, yes, for sure, to be able to build a model and analyze the data, and also data engineers to make sure that the data is well centralized, that it’s stable, and that it will remain stable over time.
Simon Shaienks 48:43
Perfect. So we’ve got another one from an anonymous attendee: how do you compare AI QA to software QA? I can start with a little answer, and then we can maybe complete it. I think, in essence, when you look at it from a very high level, it’s very similar, in the sense that machine learning models and software are both subject to unexpected inputs. They’re also both built in relationship with other software components, so they have to be plugged into various components. And we expect both of them to be really robust, consistent, reliable, and usable. So they have very many similarities. I think where it changes in terms of QA is in the tests that we’re going to do. There are very specific tests for models, and there are also very specific tests, like unit testing and things like that, for software. Maybe you want to complete that?
Olivier Blais 49:52
Yeah, I’ll add to this. I think what you’re saying is right. But how do you validate code that is not written? It’s easier to validate code that has been written, because you’re able to look at what the code includes, what it excludes, and the quality of the code. But with machine learning it’s different. I can give you an example with Google Translate. Before the switch to AI, Google Translate was 500,000 lines of code, and it did not match the performance of today’s AI-powered version of Google Translate. And now it’s 5,000 lines of code. So what happened here? It’s not simpler all of a sudden, but it’s less code. The key ingredient is the data that has been used to train the model. And here, how can you validate non-existent code? This is why we need a lot of other tests to catch errors, because the error is not going to be in a line of code. It’s going to be an error because you have included some variables that you shouldn’t have, or because you have trained on data that is incorrect, etc. So you need to test for everything that might go wrong, because you don’t know what you’re looking for.
Simon Shaienks 51:37
Perfect. A couple of other ones here. Davis asks: how can you determine if the quality and quantity of your data is sufficient for an AI model?
Olivier Blais 51:52
This is a very good question. There’s no rule. But hear me out, there is a rule of thumb for the minimum: usually, under 50 observations, forget about it; you won’t be able to identify valid trends. But then, do you really need millions of data points? Here, you need to be clear about two things. First, the difficulty of the task: the more complex the project, the more data you will need. Second, the quality of your data: the better the quality, the less data you need. Or vice versa, if your data has noise in it, you’ll need more data, because it will be harder to find the trends that you need to generate value. Those are the two extremes. In between, you simply have to test, and you test something called robustness. In Snitch, what we do is create a lot of different analyses. For instance, you add noise to your data to see if your model is robust to that noise. If you add a little bit of noise and then try to predict these data points, you will see if you have a good prediction or not, and based on this you will be able to estimate the robustness of your model. At the end of the day, you won’t have a straight answer. But with the current level of data that you had to train your model, and considering that you do use a separate validation or test data set, you will at least have a clear picture of whether your model is good enough at this stage to generate value in production.
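Editor’s sketch: the noise test described here can be shown with a toy model. The classifier below is a deliberately trivial one-variable threshold, chosen only so the example is self-contained; the function names, data, and noise levels are assumptions, and this is not the analysis that Snitch runs.

```python
import random


def fit_threshold(xs, labels):
    """Toy 1-D classifier: midpoint between the two class means.

    Assumes class 1 tends to have larger values than class 0.
    """
    mean0 = sum(x for x, y in zip(xs, labels) if y == 0) / labels.count(0)
    mean1 = sum(x for x, y in zip(xs, labels) if y == 1) / labels.count(1)
    return (mean0 + mean1) / 2


def accuracy(threshold, xs, labels):
    """Share of points on the correct side of the threshold."""
    hits = sum((x > threshold) == bool(y) for x, y in zip(xs, labels))
    return hits / len(xs)


def accuracy_under_noise(threshold, xs, labels, sigma, seed=0):
    """Accuracy after adding Gaussian noise of scale sigma to the inputs."""
    rng = random.Random(seed)
    noisy = [x + rng.gauss(0, sigma) for x in xs]
    return accuracy(threshold, noisy, labels)


# Hypothetical training set and separate held-out test set.
train_x = [1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.9, 3.1]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_x = [0.9, 1.3, 1.0, 2.8, 3.3, 3.0]
test_y = [0, 0, 0, 1, 1, 1]

threshold = fit_threshold(train_x, train_y)
print("clean accuracy:", accuracy(threshold, test_x, test_y))
for sigma in (0.1, 0.5, 2.0):
    print("sigma", sigma, "->", accuracy_under_noise(threshold, test_x, test_y, sigma))
```

The comparison that matters is between the clean score and the noisy scores on the held-out set: a model whose accuracy collapses under small perturbations is probably not ready for production, whatever its clean accuracy says.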
Simon Shaienks 54:16
Awesome. So we’ve got one for Eliot here: if we do a model to analyze images, do we need the same type of infrastructure?
Eliot Ahdoot
Yeah, so what I was talking about in terms of infrastructure was really on a general basis. You’ll probably have more of a GPU-centric system for images. I’ll give you an example with the financial sector. With trading data, their machine learning is actually done in CPU-intensive applications, with higher frequencies on the CPUs, where they take trading data from the last day and try to predict the best way to trade the day after. They’re doing this on an iterative basis, and these are very CPU-intensive applications. That’s why, at the end, I said it’s very important to tailor to your needs. There are so many different types of applications in AI that it’s really important to single out which machine will do the best. That’s where you get the best value out of your machine: you’ll get the best bang for your buck, and it’ll give you the most performance.
Simon Shaienks 55:21
Awesome, perfect. Thanks. So yeah, we’ve got a lot of questions, and we’re going to go through them. Thanks for sending in your questions. Another attendee is asking: Hi Olivier, do you consider the concept of fairness as well?
Olivier Blais 55:39
Okay, good, I really like this question. Fairness is something really important, and it is hard. First of all, let’s talk about discrimination for a minute. The more discriminant your model is, the better it is. That’s the whole purpose of machine learning: you’re able to use patterns to predict something. But are those patterns fair or not? If you’re thinking about credit scoring, a good pattern would be the spending behaviors: if a person spends more than the revenue that person gets, it’s a good pattern to measure. But if it comes down to the neighborhood, for instance, this is not a fair pattern. The difficulty is that there’s no straight answer. First, you need to understand the different patterns, and then you need to evaluate them and see if you are comfortable with those patterns being implemented in your model. So the answer is yes, this is considered, and it is highly encouraged to do this. But you also need manual intervention. There are also some automated interventions. For instance, we are able to create clusters of people and then see if some clusters are less accurate than others. A good example: Twitter had a problem with their image recognition, where they were unable to identify the face of a Black person. It created a big backlash, because this was not intentional, but their data set was a little too focused on Caucasian people, and because of that there were more errors when it came to diversity. So there are other tests that we can do to be able to flag this.
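Editor’s sketch: checking whether "some clusters are less accurate than others" comes down to computing accuracy per group and comparing. The group labels, the predictions, and the 10-point tolerance below are assumptions for illustration, and in practice the groups could come from a clustering step rather than a known attribute.

```python
from collections import defaultdict


def accuracy_by_group(groups, y_true, y_pred):
    """Accuracy computed separately for each group label."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, truth, pred in zip(groups, y_true, y_pred):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {group: hits[group] / totals[group] for group in totals}


def flag_gaps(per_group, max_gap=0.10):
    """Groups whose accuracy trails the best group by more than max_gap."""
    best = max(per_group.values())
    return sorted(g for g, acc in per_group.items() if best - acc > max_gap)


# Hypothetical evaluation data for two groups of people.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 0, 1, 1, 1, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]

per_group = accuracy_by_group(groups, y_true, y_pred)
print(per_group)             # {'A': 1.0, 'B': 0.5}
print(flag_gaps(per_group))  # ['B']
```

A flagged group is a prompt for the manual review mentioned above, for instance checking whether that group is under-represented in the training data.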
Simon Shaienks 58:12
Perfect. Thanks, Olivier. We’ve got Najib asking: for testing various ML/AI models, do you use a tool like MLflow or PyCaret, for instance, or your own solution? It’s a great question because it’ll allow us to shamelessly plug Snitch in there. In terms of testing ML models, yeah, we obviously use Snitch a lot. What we do is run a battery of tests with Snitch. We’ve automated not just one single test, but different tests on different aspects of the quality of your model, and it runs them automatically. It goes beyond things like accuracy: we’ll look into model robustness, sensitivity to noise, feature explainability, and things like that.
Olivier Blais 59:09
I’ll also add this, because like Simon explained a little bit earlier, there’s the identification of the best model. To find the best model at the first stage, you usually look at the performance, so the loss metric or the accuracy. It’s based on the accuracy because you will run potentially thousands of different tests to find the best model. PyCaret is a good solution for this, and also grid search or random search in scikit-learn. There are a lot of different tools that allow you to do this. But this is just a loop that tests a lot of different variations of a model, and the goal is to get the best metric. Where Snitch excels is that once you have this model, or the top five models, it is able to break down the model into a lot of different KPIs and say, you know what, and this happens all the time, you do have the best performance in terms of accuracy, but the model is a lot less robust. So what would you prefer? A model that’s 89% accurate but drops to 60% when you add 1% of noise, or another model that is 85% accurate but drops by only a few points when there’s a little bit of noise in it? You’ll probably go for the more robust model, because it makes sense. But you don’t know unless you test.
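Editor’s sketch: the trade-off in this example can be written as a small selection rule that filters out models whose accuracy collapses under noise before picking the most accurate one. The 5-point tolerance and the 82% noisy score for the second model are assumptions (the speaker only says it drops by "a few points"), and the rule itself is an illustration, not a description of Snitch.

```python
def pick_model(candidates, max_drop=0.05):
    """Most accurate model among those whose accuracy holds up under noise.

    Falls back to all candidates if none is within the tolerated drop.
    """
    robust = [c for c in candidates if c["clean"] - c["noisy"] <= max_drop]
    pool = robust or candidates
    return max(pool, key=lambda c: c["clean"])["name"]


candidates = [
    {"name": "model_a", "clean": 0.89, "noisy": 0.60},  # best accuracy, fragile
    {"name": "model_b", "clean": 0.85, "noisy": 0.82},  # assumed 3-point drop
]

print(pick_model(candidates))                # model_b
print(pick_model(candidates, max_drop=0.5))  # model_a: tolerance too loose
```

Selecting on accuracy alone would have returned model_a; the robustness check only changes the outcome because the noisy scores were measured in the first place.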
Simon Shaienks 1:01:09
Perfect, we’ve got another three questions to go. Ingmar is asking: you said adoption is one of the most important aspects of an ML product. Do you have some pointers on how to promote adoption amongst your end users?
Olivier Blais 1:01:27
You’re correct, you remember that. Adoption definitely is the most important aspect, adoption and data. For adoption, there are different ways, and it’s never going to be easy. Understanding this is the first key: it’s going to be hard. It’s probably going to be twice as hard as a typical software development transformation. I don’t know if you have ever deployed a CRM solution or an ERP solution; the transformation that you’re bringing to the table is going to be harder than this. That’s because not only are you bringing new technology into people’s processes, roles, and responsibilities, but you’re bringing a solution that is hard to understand. And you also take back a little bit of freedom, because people will get a suggestion instead of being able to form their own thoughts about the decision that is about to be automated. So what I’m suggesting is, first of all, that you incorporate this information into your project. This is why we were talking about starting with the simpler project: if a project is simple, it’s usually because it’s a lesser problem, one that is a little less sensitive. So first, you want to deal with this. Then you want to include the end users in the design sessions, and you want to work in iterative mode. You want to generate some predictions very quickly. They will probably be bad, and you’ll say, okay, you know what, guys, don’t use these as is, but we want to make sure that you’re comfortable with the way they’re expressed.
That way you have them on your side, because they will be part of the solution; they will help design the tool that they will use. It is also important not to try to automate everything at once. You probably want to start by generating the predictions. People get comfortable with this, and you take their feedback. Then you create a prescriptive layer suggesting the right solution, so they validate it and they agree. After that, if you need to go to a fully automated workflow, you do it in another iteration.
Eliot Ahdoot
And just to add to that, it’s so important to have financial gains. That’s why the quick wins are so important. If you can get a good financial gain in the beginning, you’ll gain a lot of traction within your organization.
Simon Shaienks 1:04:33
Awesome. Yeah. All right, two more. Martin asks: you’ve talked about data drift and non-stationarity, and that’s a hard word to say.
Olivier Blais 1:04:45
A hard one to say for a French speaker.
Simon Shaienks 1:04:49
Is it a factor affecting the amount of resources that should be dedicated to the model’s maintenance? If yes, is there any way to forecast how much maintenance and retraining a model requires, based on the underlying characteristics of the data?
Olivier Blais 1:05:05
This is definitely a factor that you need to take into consideration. First of all, at a minimum, you need to make sure there are ways to facilitate retraining and redeployment of your model, of your system. That’s a minimum, even though people will say, you know what, we’re in a conservative industry, things never change here. Until someone changes one field in the data set, and then the data set changes. So you need to take into consideration that the data will change at some point, and you need to be prepared. Then you also need to run tests and look at whether the data is varying over time. Especially right now, if you’re able to get two or three years of data, you’re able to see the fluctuation in the data, including worst-case scenarios. Has COVID affected your data set? Maybe yes, maybe no, depending on the use case. If you see that it has affected the data a lot, or if you see that the trend is changing, so the non-stationarity, you’re able to prepare for the worst. Here, you will need to think about a solution that retrains automatically. I can give you an example. Moov AI has worked with the Montreal transit authority, the STM, on predicting ridership in the metro stations. This project happened because of COVID-19 and the need for social distancing. You’re in the middle of COVID, and the impact was so violent on the transit options that they had to be able to measure whether there will be a crowd in the metro station today or not. So what we had to do was retrain the model every day to make sure that the model was learning the right patterns, considering the new trends that we perceived.
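Editor’s sketch: retraining every day on recent data can be illustrated with a sliding window, where each new observation pushes out the oldest one. The model below is just a rolling mean, chosen for brevity; it is not the model used in the transit project, and the class name, window size, and ridership figures are assumptions.

```python
from collections import deque


class RollingMeanForecaster:
    """Toy forecaster that is refit on the most recent `window` observations."""

    def __init__(self, window=7):
        self.history = deque(maxlen=window)  # old days fall out automatically
        self.prediction = None

    def retrain(self, observed):
        """Add the latest observation and refit on the current window."""
        self.history.append(observed)
        self.prediction = sum(self.history) / len(self.history)

    def predict(self):
        """Forecast for the next day, or None before any training."""
        return self.prediction


# Hypothetical daily ridership with an abrupt regime change on day 4.
forecaster = RollingMeanForecaster(window=3)
for day, riders in enumerate([1000, 1000, 1000, 400, 400, 400], start=1):
    forecaster.retrain(riders)
    print(f"day {day}: next-day forecast {forecaster.predict():.0f}")
```

With a three-day window the forecast has fully adapted three days after the change; a model trained once on the old regime would keep predicting the old level indefinitely.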
Simon Shaienks 1:07:35
Thanks, and congrats on saying non-stationarity so eloquently. All right. We’ve got Elisa: thanks a lot for your terrific presentation. Is it possible to explain a little bit about feature extraction and feature engineering without any training data quality reduction?
Olivier Blais 1:08:02
I’m not sure. Okay, sorry, there’s a correction underneath.
Simon Shaienks 1:08:06
Yeah, so instead of without any test data, it’s without any training data quality reduction.
Olivier Blais 1:08:17
So, feature extraction. It’s going to be painful. There are some schools of thought where people say, you know what, you don’t need to do feature extraction, just use a complex model. Andriy Burkov from Gartner once said something very funny: if you ask data scientists to clean data, they say, why should I clean data and take two weeks of my time when I can create a very complex model in a year and deal with this data automatically? So I think it is important to do some feature extraction. You will always lose a little bit of data quality. Let’s change the way you word your sentence a little, because I don’t think your goal is to not lose quality. You will lose quality, but you will gain robustness, because you won’t be searching for a needle in a haystack, a little bit like when you use those very complex deep learning models. So you’re trying to find a way in between having a lot of features to extract, and taking a lot of time to do this, or using a very complex method. At first you want to work iteratively. You probably want to test two approaches. One with a few features extracted; you can do it by looking at Granger causality, which is a good tool for time series, or at correlations. There are a lot of different tools that exist for this. And on the other end, you could test a complex deep learning tool. You see which one is best, which one is the more powerful and the more robust, and after that you see what you need to do. Usually you tend to want to be in the middle: you can use a technique that is a little more complex, but you extract the most important features just to add a little more confidence and robustness to your model.
Simon Shaienks 1:10:45
Perfect. And for our last one, we’ve still got one from Elisa, who says: great, thanks a lot. We’ve still got a lot of people here, and here’s the last question that we have. It goes for all three of us: what is the key takeaway piece of advice for someone just starting their AI project journey? Eliot, do you want to start this one?
Eliot Ahdoot
Sure. For me, it’s really: take something that’s simple and easy, that will be a quick win, and try to do it as simply as possible to start. That’s how you’ll get traction within your organization. If you’re looking for a home run, you’re going to have to convince a lot of people that this can be a viable product, and if you miss, it’s not going to go well. So if you go for quick hits and easy hits, you’re going to have success.
Olivier Blais 1:11:36
I couldn’t agree more. Yeah, this is exactly what I would have said, Yes.
Simon Shaienks 1:11:40
Okay, I’m going with you guys. I would have said the same. Great words of advice from Eliot.
Eliot Ahdoot
I think the final one is: once you’ve figured out what you want to do, come talk to us, because we will guide you through that whole process and get you to the finish line.
Simon Shaienks 1:11:57
Absolutely. If you’re unsure where to start or still have questions, we’re all going to stay available. We’re going to send you the recording, maybe this afternoon, maybe tomorrow morning. We’ll see how fast we can do it. And feel free to reach out. It was a pleasure to have you guys here today. Thanks a lot for coming, and have a great rest of the day.
Olivier Blais 1:12:19
Thank you very much. Bye!