04-10-2023 08:43 AM - edited 04-10-2023 08:44 AM
Many years ago, I worked for a company that used AI to search for a way to increase the production of a certain medicine. The woman in charge of that project fed the AI algorithm all kinds of relevant data. I don't recall which relevant data she used, so I'll make up some data for the purpose of this discussion:
She measured each of these things at regular intervals over a period of weeks and fed that data and the amount of successful production to a machine learning engine designed to identify and rank the most important factors to grow the medicine.
Here's the twist. She threw in irrelevant data, too. In this case, she added the number of sunspots on each of the days. The idea was to give the AI something irrelevant to chew on, which should theoretically increase the weight of the relevant data.
It turned out that the AI identified sunspots as the most important factor affecting the production of the medicine.
I recall suggesting to her that sunspots could, indeed, have an influence on the production of the medicine. She rejected that notion, probably rightly so. Regardless, I don't recall the final conclusions.
My question to you is, do you deliberately inject irrelevant data into your machine learning process to draw out the more important factors? If so, have you ever run into an experience like the above, where the algorithm identifies the supposedly irrelevant data as the most important?
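To make the question concrete, here is a minimal sketch of the kind of experiment I mean. It is not her engine, which I can't reconstruct. The feature names (temperature, ph), the coefficients, and the use of absolute Pearson correlation as the "importance" score are all assumptions made up for illustration. The sunspots column is a random probe that never enters the production formula, so any score it earns is pure chance.

```python
import numpy as np


def abs_correlations(n_samples, rng):
    """Simulate one experiment; return |Pearson r| of each feature with production."""
    temperature = rng.normal(size=n_samples)  # hypothetical strong factor
    ph = rng.normal(size=n_samples)           # hypothetical weak factor
    sunspots = rng.normal(size=n_samples)     # probe: never used to build the target
    production = 2.0 * temperature + 0.3 * ph + rng.normal(size=n_samples)
    features = {"temperature": temperature, "ph": ph, "sunspots": sunspots}
    return {
        name: abs(np.corrcoef(column, production)[0, 1])
        for name, column in features.items()
    }


def probe_beats_weak_feature_rate(n_samples, trials=500, seed=0):
    """Fraction of simulated experiments where the probe outscores the weak real factor."""
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(trials):
        scores = abs_correlations(n_samples, rng)
        wins += scores["sunspots"] > scores["ph"]
    return wins / trials


if __name__ == "__main__":
    for n in (20, 5000):
        rate = probe_beats_weak_feature_rate(n)
        print(f"{n} samples: probe outranks the weak factor in {rate:.0%} of runs")
```

With only a few weeks of measurements, the probe can outrank a genuinely relevant but weak factor fairly often in this toy setup. With thousands of samples it almost never does.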
04-11-2023 12:52 PM
I think injecting irrelevant data into a machine learning process is not a common practice because a fundamental principle of ML is to feed the algorithm relevant data that is representative of the problem at hand. The goal is to train the algorithm to recognize patterns and make accurate predictions based on the data it has been trained on.
Sometimes, though, you do want to intentionally add noise to the training data set. That is one form of "regularization," whose purpose is to prevent overfitting, which occurs when an algorithm becomes too specialized in recognizing patterns in the training data and fails to generalize well to new, unseen data. Other regularization methods work by adding constraints or modifications to the model or the training process, such as adding penalties for large weights or randomly dropping out nodes during training. The goal is not to identify the irrelevant data as important, but rather to help the algorithm focus on the more relevant features in the data.
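As a rough illustration of the "penalties for large weights" idea, here is a minimal sketch of L2-penalized (ridge) linear regression using only NumPy. The synthetic data, the coefficients, and the function names are assumptions for the example, not anything from the original project. The third column is deliberately irrelevant to the target.

```python
import numpy as np


def make_data(n_samples=40, seed=0):
    """Synthetic data: columns 0 and 1 drive the target, column 2 is irrelevant."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, 3))
    y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n_samples)
    return X, y


def ridge_weights(X, y, penalty):
    """Closed-form ridge solution: solve (X^T X + penalty * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(n_features), X.T @ y)


if __name__ == "__main__":
    X, y = make_data()
    for penalty in (0.0, 10.0, 100.0):
        weights = ridge_weights(X, y, penalty)
        print(f"penalty={penalty:>5}: weights={np.round(weights, 3)}")
```

A larger penalty shrinks the overall size of the weight vector, which limits how much the model can lean on any single feature, including one that only looks useful by chance.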
04-11-2023 02:59 PM
Yes, this took place in the '80s. The engine she was using wasn't designed to train an algorithm. It was designed to identify the data one should use to train an algorithm.
The company made some fun machines. We had a dual-Z80 box with a connected ultrasonic transducer. We'd use the transducer to look at 30 bad spot welds and 30 good ones, and then the box could identify bad spot welds with 90%+ accuracy. It was fun stuff.