Code-free machine learning platforms are on the rise, a new generation of tools that raises the question of the data scientist's future.
Alteryx, Azure ML Studio, Dataiku, DataRobot, H2O.ai … Code-free machine learning platforms are multiplying rapidly. Their promise? To make building AI accessible to non-programmers, who would thereby become citizen data scientists. This population can use these solutions to integrate multiple data sources on the fly and rely on automated machine learning (AutoML) technologies to generate their predictive models. “AutoML solutions allow you to quickly produce models that, in general, are still quite simple. The goal of vendors is to achieve explainable artificial intelligence,” says Ismaïl Lachheb, data scientist at the French consultancy Octo Technology (Accenture group). This is the product vision that Alteryx, for example, follows.
For a given problem, AutoML compares different types of algorithms, as well as algorithms of the same family with different hyperparameter configurations. The goal is to find the configuration that gives the best results. “It’s a time-saving way to quickly identify what can be done,” says Sergio Winter, machine learning engineer at Revolve, the AWS specialist entity of the IT services company Devoteam. “Models resulting from this process, however, can be difficult to deploy as they are.”
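To make the idea concrete, here is a minimal sketch of that kind of search, written in plain Python rather than with any vendor's product. The toy data, the two model families (a majority-class baseline and a k-nearest-neighbours classifier) and all names are assumptions chosen for illustration; real AutoML tools explore far larger spaces.

```python
from collections import Counter

# Toy data invented for this example: (x, y, label).
TRAIN = [(1, 1, "a"), (1, 2, "a"), (2, 1, "a"), (2, 2, "a"),
         (6, 6, "b"), (6, 7, "b"), (7, 6, "b")]
VALIDATION = [(1.5, 1.5, "a"), (6.5, 6.5, "b"), (2.5, 2.0, "a"), (5.5, 6.0, "b")]


def majority_baseline(train, point):
    """Ignore the point and always predict the most frequent training label."""
    return Counter(label for _, _, label in train).most_common(1)[0][0]


def knn(k):
    """Build a k-nearest-neighbours predictor; k is the hyperparameter."""
    def predict(train, point):
        nearest = sorted(
            train,
            key=lambda row: (row[0] - point[0]) ** 2 + (row[1] - point[1]) ** 2,
        )[:k]
        return Counter(label for _, _, label in nearest).most_common(1)[0][0]
    return predict


# Two algorithm types, one of them tried with three hyperparameter values.
CANDIDATES = {
    "majority_baseline": majority_baseline,
    "knn_k=1": knn(1),
    "knn_k=3": knn(3),
    "knn_k=7": knn(7),
}


def search(candidates, train, validation):
    """Score every candidate on held-out data and return the best one."""
    results = {}
    for name, predict in candidates.items():
        hits = sum(predict(train, row[:2]) == row[2] for row in validation)
        results[name] = hits / len(validation)
    best = max(results, key=results.get)
    return best, results


best, results = search(CANDIDATES, TRAIN, VALIDATION)
print(best, results)
```

The loop only ranks candidates by validation accuracy. Whether the winning model can be deployed as is, the caveat Sergio Winter raises, is a separate question the search does not answer.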
First drawback: the choice of hyperparameters is not neutral. “It is often necessary for the data scientist to interpret the consequences of the choice of parameters for the business,” warns Didier Gaultier, director of data science and AI at Business & Decision (Orange group). “In a recommendation engine, for example, a ranking threshold will have a direct impact on the turnover, the margin, and the share of the population targeted during a marketing campaign.”
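The effect of such a threshold can be sketched with a few lines of Python. Everything here is hypothetical: the scores are treated as predicted conversion probabilities, and the revenue per conversion and cost per contact are invented figures, not numbers from Business & Decision.

```python
# Hypothetical predicted conversion probabilities for eight customers.
SCORES = [0.75, 0.5, 0.5, 0.25, 0.25, 0.125, 0.125, 0.125]


def campaign_outcome(scores, threshold, revenue_per_conversion, cost_per_contact):
    """Estimate reach, expected revenue and margin for a given score threshold."""
    targeted = [score for score in scores if score >= threshold]
    revenue = sum(targeted) * revenue_per_conversion  # expected conversions x revenue
    cost = len(targeted) * cost_per_contact
    return {
        "reach": len(targeted) / len(scores),  # share of the population targeted
        "revenue": revenue,
        "margin": revenue - cost,
    }


for threshold in (0.5, 0.25, 0.125):
    print(threshold, campaign_outcome(SCORES, threshold, 100, 20))
```

With these invented numbers, lowering the threshold from 0.5 to 0.25 increases both expected revenue and margin, while lowering it again to 0.125 keeps increasing revenue but reduces the margin. Choosing between those outcomes is a business decision, which is the point the quote makes.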
Didier Gaultier adds: “AutoML also misses out on feature engineering, which involves preparing, refining (a bit like refining crude oil) and enriching the model’s input data.” The consultant insists: “We can recode this or that variable so that its distribution is compatible with the type of algorithm used. We can also cross one predictor with another to create a more relevant indicator at the input of the algorithm. This is where the impact of human reasoning on the end result is greatest, and it is above all the care taken during this phase that characterizes the competence of a data scientist.”
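As an illustration of the two operations mentioned in the quote, the sketch below recodes a skewed variable with a logarithm and crosses two predictors into a ratio. The field names (`income`, `total_spend`, `visits`) are hypothetical and the choice of transformations is an assumption; which recoding is appropriate depends on the data and the algorithm.

```python
import math


def engineer_features(row):
    """Derive two illustrative features from a raw customer record."""
    return {
        # Recoding: log(1 + x) compresses a long-tailed variable such as income.
        "log_income": math.log1p(row["income"]),
        # Crossing: combine two predictors into one indicator,
        # guarding against division by zero for customers with no visits.
        "spend_per_visit": row["total_spend"] / row["visits"] if row["visits"] else 0.0,
    }


print(engineer_features({"income": 42000, "total_spend": 120.0, "visits": 4}))
```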
The more complex the business problem, the deeper the feature engineering work will have to be. “In the maritime sector, for example, container identification numbers follow a specific format, with sequences of characters that identify the owner, the operator, the type of container, refrigerated containers, etc.,” explains Sergio Winter. All of these elements need to be taken into account in feature engineering … and only a data scientist can make these connections. “Code-free AI solutions give the end user more control over the models, which isn’t bad at all. But it doesn’t eliminate all the data science work,” Sergio Winter sums up.
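The article does not name the format in question. As a sketch, assume the identifiers follow the ISO 6346 layout: a three-letter owner code, a one-letter equipment category, a six-digit serial number and a check digit. The function below splits an identifier into those fields and verifies the check digit. Other attributes mentioned in the quote, such as the container type, are not covered here.

```python
import re
import string


def _letter_values():
    """ISO 6346 letter values: start at 10 and skip multiples of 11."""
    values, current = {}, 10
    for letter in string.ascii_uppercase:
        if current % 11 == 0:
            current += 1
        values[letter] = current
        current += 1
    return values


LETTER_VALUES = _letter_values()
CONTAINER_ID = re.compile(r"([A-Z]{3})([UJZ])([0-9]{6})([0-9])")


def parse_container_id(raw):
    """Split an identifier into fields, or return None if the layout is wrong."""
    match = CONTAINER_ID.fullmatch(raw.replace(" ", "").upper())
    if match is None:
        return None
    owner, category, serial, check = match.groups()
    # Each of the first ten characters is weighted by 2 ** position.
    total = sum(
        (LETTER_VALUES[ch] if ch.isalpha() else int(ch)) * 2 ** i
        for i, ch in enumerate(owner + category + serial)
    )
    return {
        "owner_code": owner,
        "category": category,
        "serial": serial,
        "check_digit_ok": total % 11 % 10 == int(check),
    }


print(parse_container_id("CSQU3054383"))
```

Fields such as `owner_code` can then be fed to a model as categorical features, and `check_digit_ok` can be used to filter out mistyped identifiers before training.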
No-code cannot make sense of the data
Ismaïl Lachheb insists: “Code-free AI platforms focus mainly on the choice of models and hyperparameters. But the bulk of a machine learning project, which still falls to the data scientist, is to understand the data and how it is produced, to control its biases, and to clean and prepare it.” He continues: “Predicting the behavior of an industrial production process involves integrating, federating and controlling different data streams from temperature or voltage sensors installed at different points on a production line. Code-free AI tools are far from being able to model this complexity.”
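A very reduced sketch of that kind of integration and control step is shown below. It assumes two streams keyed by timestamp and a plausible temperature range, all of which are invented for illustration; a real production line would involve many more sensors, clock drift and domain-specific validity rules.

```python
def merge_streams(temperature, voltage, temp_range=(-20.0, 150.0)):
    """Join two {timestamp: value} streams and flag rows that need attention."""
    rows = []
    for ts in sorted(set(temperature) | set(voltage)):
        temp, volt = temperature.get(ts), voltage.get(ts)
        issues = []
        if temp is None:
            issues.append("missing temperature")
        elif not temp_range[0] <= temp <= temp_range[1]:
            issues.append("temperature out of range")
        if volt is None:
            issues.append("missing voltage")
        rows.append({"timestamp": ts, "temperature": temp, "voltage": volt, "issues": issues})
    return rows


# Hypothetical readings, keyed by seconds since the start of the run.
temperature = {0: 71.5, 10: 72.0, 20: 999.0}
voltage = {0: 3.3, 20: 3.2, 30: 3.1}

for row in merge_streams(temperature, voltage):
    print(row)
```

Deciding what to do with the flagged rows (drop them, interpolate, or investigate the sensor) is exactly the kind of judgment the quote attributes to the data scientist.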
Downstream, no-code eases the deployment of machine learning models to production, ahead of their retraining. When it comes to MLOps, AI platforms like Dataiku or DataRobot integrate code-free environments designed to manage the entire lifecycle of a model, from training to production (see the MLOps comparison article: Dataiku and DataRobot versus open source alternatives). Octo also follows this logic with its IA Factory offer. “Along with the emergence of no-code, data scientists tend to increasingly specialize in MLOps, or in one of the many fields of AI, such as computer vision, natural language processing, or reinforcement learning,” says Ismaïl Lachheb. The data scientist still has a bright future ahead.