Computer Scientist Explains Machine Learning in 5 Levels of Difficulty WIRED
ruticker 04.03.2025 15:25:03 Recognized text from YouScriptor channel FLTC ITMO University
# Machine Learning Explained

## Level 1: Introduction

Hi, I'm Hilary Mason. I'm a computer scientist, and today I've been asked to explain machine learning in five levels of increasing complexity.

Machine learning gives us the ability to learn things about the world from large amounts of data that we, as human beings, can't possibly study or appreciate. So, machine learning is when we teach computers to learn patterns from looking at examples in data, such that they can recognize those patterns and apply them to new things that they haven't seen before.

## Level 2: Basic Understanding

Hi!

Hi, I'm Hilary. What's your name?

I'm Brynn.

Do you know what machine learning means? Have you heard that before?

No.

So, machine learning is a way that we teach computers to learn things about the world by looking at patterns and examples of things. Can I show you an example of how a machine might learn something?

Sure!

**Is this a dog or a cat?**

It's a dog.

**And this one?**

A cat.

**What makes a dog a dog and a cat a cat?**

Well, dogs are very playful, I think, more than cats. Cats lick themselves more than dogs.

Do you think if we look at these pictures, we could say, "Well, they both have pointy ears, but the dogs have a different kind of body, and the cats like to stand up a little differently"? Do you think that makes sense?

Yeah!

**What about this one?**

A dog.

**A cat?**

I'm thinking cat because it's more skinny, and also its legs are really tall, and its ears are a little pointy.

This one's a jackal, and it's actually a kind of dog, but you made a good guess. That's what machines do too; they make guesses.

**Is this a cat or a dog?**

None.

**What is it?**

It's humans.

**And how did you know that?**

Because cats and dogs walk on their paws, and their ears are like right here, not right here, and they don't wear watches.

So, you did something pretty amazing there because we asked the question, "Is it a cat or a dog?" and you said, "I disagree with your question; it's a human."

So, machine learning is when we teach machines to make guesses about what things are based on looking at a lot of different examples. I build products that use machine learning to learn about the world and make guesses about things in the world. When we try to teach machines to recognize things like cats and dogs, it takes a lot of examples. We have to show them tens of thousands or even millions of examples before they can get even close to as good at it as you are.

## Level 3: School and Testing

Do you have tests in school?

Yeah, I have a review after every unit, and then we have a test.

Are those like the practice problems you do before the test?

Well, just like, everything that's going to be on the test is on the review.

Which means that in the test, you're not seeing any problems that you don't know how to solve, as long as you did all your practice, right?

Yeah!

So, machines work the same way. If you show them a lot of examples and give them practice, they'll learn how to guess. Then, when you give them the test, they should be able to do that.

So, we looked at eight pictures, and you were able to answer really quickly. But what would you do if I gave you 10 million examples? Would you be able to do that so quickly?

Yeah!

So, one of the differences between people and machines is that people might be a little better at this but can't look at 10 million different things.

So now that we've been talking about machine learning, is this something you want to learn how to do?

Um, kind of, because I kind of want to become a spy, and we used to do coding, so I may be kind of good at it.

Machine learning is a great way to use all those math skills, all those coding skills, and would be a super cool tool for a spy.

## Level 4: Real-World Applications

Hello!

Hi! Are you a student?

Lucy. Yes, I just finished ninth grade.

Congratulations!

Thank you! It's very exciting.

Have you ever heard of machine learning before?

I'm going to assume that it means humans being able to teach machines or robots how to learn themselves.

That's right! It's when we teach machines to learn from data, to build a model from that data or a representation of that, and then to make a prediction. One of the places we often find machine learning in the real world is in things like recommendation systems. So, do you have an artist you really like?

Yeah, Melanie Martinez.

So, I'm gonna look up Melanie Martinez, and it says here, "If you like Melanie Martinez, one of the other songs you might like is by Aura." Do you know who that is?

I do not.

So, let's listen to a hint of this song.

**Why do you think Spotify might have recommended that song?**

Well, I know that in Melanie Martinez's music, she used a lot of the filtered voice to make it sound very deep and low, and that song had that.

That's actually a really interesting thing to think about because that creepy vibe is something that you can perceive and I can perceive, but it's actually really hard to describe to a machine. What do you think might go into that?

The pitch of the music: if it's really low or if it's super high, it could know that.

**What can the machine understand?**

It's a great question! The machine can understand whatever we tell it to understand. So, there might be a person thinking about things like the pitch or the pacing or the tone, or sometimes machines can figure out things about music or images or videos that we don't tell it to discover, but that it can learn from looking at a lot of different examples.

**Why do you think companies might use machine learning?**

I think things like Facebook or Instagram probably use it to target ads.

Sometimes the ads you see are really uncanny, and I think that's because they're based on so much data. They know where you live; they know where your device is. It's also important to realize that people in aggregate are actually pretty predictable.
Like when we talk to each other, we like to talk about the novel things. Like here, we're having this conversation; we don't do this every day, but we probably still ate breakfast, we're gonna eat lunch, we're gonna eat dinner. You probably are going to the same home you go to most of the time. So, they're able to take that data that we already give them and make predictions based on that as to what ads they should show us.

**So you're saying I give them enough data as it is about what I might be talking about or thinking about that they can read my mind?** But they just use the data that I've already given them, and it almost seems like they're watching us.

To do machine learning, we use something called algorithms. Have you heard of algorithms before?

A set of steps or a process carried out to complete something.

That's right!

So, do you think that we've been able to teach machines enough so that they can do things that even we can't do? And on the opposite side of that, do you think there are things that we can do that a machine might never be able to do?

So, there are things that machines are really great at that humans are actually not great at. Imagine watching every video posted to TikTok every day. We just don't have enough time to do that at the rate at which we can actually watch those videos. But a machine can analyze all of them and then make recommendations to us.

And then thinking about things that machines are bad at and people are good at: people are really great with only one or two examples of learning something new and incorporating that into our model of the world to make good decisions, whereas machines often need tens of thousands of examples. And that's not even getting into things like good judgment because we care about people. We can imagine a future that we want to live in that doesn't exist today, and that's something that is still uniquely human.
Machines are great at predicting based on what they've seen in the past, but they're not creative. They're not going to invent; they're not going to really change where we're going to go. That's up to us.

## Level 5: Advanced Concepts

I'm Sunny.

And what are you majoring in?

I study math and computer science.

So, in your studies, have you learned about machine learning?

Yeah, I have. So, to me, machine learning is essentially exactly what it sounds like. It's trying to teach a machine specifics about something by inputting a lot of data points, and slowly the machine will build up knowledge about it over time. For example, in my Gmail program, I assume that there would be a lot of machine learning models happening at once.

Right, absolutely! And that's a great example because you have models that are operating to do things like figure out if a new email is spam or not.

**So what would you think about if you were looking at an email and trying to decide if it went in one category or another?**

I'd probably look at certain keywords, maybe if the recipient and the sender had exchanged emails before, and what generally those fell into in the past.

So these are things we would call features. We go through a process where we do feature engineering, where somebody looks at the example and says, "Okay, these are the things that I think might allow us to statistically tell the difference from something in one category versus another." So, for example, perhaps you don't speak Russian, and you start getting a lot of email in Russian.

Obviously, the features that you just described are features that a person would have had to think about. Are there features that the machine itself could learn?

This is a great question because it really gets to the difference between some of our different tools in our machine learning tool belt in addressing problems like this.
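
The hand-picked email features mentioned in this exchange (certain keywords, whether sender and recipient have corresponded before) can be sketched roughly as follows. This is an illustrative toy, not how Gmail or any real spam filter works: the keyword list, the `Email` class, the sample messages, and the choice of a perceptron as the learner are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical keyword list; a real system would pick these from data.
SPAM_KEYWORDS = ("winner", "free", "prize", "urgent")


@dataclass
class Email:
    sender: str
    body: str


def extract_features(email, known_senders):
    """Feature engineering: turn one email into hand-picked numbers."""
    words = email.body.lower().split()
    keyword_hits = sum(words.count(k) for k in SPAM_KEYWORDS)
    has_history = 1.0 if email.sender in known_senders else 0.0
    return [float(keyword_hits), has_history, 1.0]  # last entry is a bias term


def train_perceptron(rows, labels, epochs=20):
    """Supervised learning: labels are 1 (spam) or 0 (not spam)."""
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            predicted = 1 if sum(w * v for w, v in zip(weights, x)) > 0 else 0
            # Weights change only when the guess was wrong.
            for i, v in enumerate(x):
                weights[i] += (y - predicted) * v
    return weights


def is_spam(email, known_senders, weights):
    x = extract_features(email, known_senders)
    return sum(w * v for w, v in zip(weights, x)) > 0


# Made-up labeled examples, purely for illustration.
known = {"alice@example.com"}
training = [
    (Email("alice@example.com", "lunch tomorrow?"), 0),
    (Email("promo@example.net", "you are a winner claim your free prize"), 1),
    (Email("alice@example.com", "free for a call later"), 0),
    (Email("x@example.org", "urgent free prize inside"), 1),
]
rows = [extract_features(e, known) for e, _ in training]
labels = [y for _, y in training]
weights = train_perceptron(rows, labels)

print(is_spam(Email("new@example.net", "win a free prize"), known, weights))
print(is_spam(Email("alice@example.com", "see you at lunch"), known, weights))
```

The person chose the features; the algorithm only learned how much weight each one deserves from the labeled examples.
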
So if we were to use a classic supervised classification approach, a person would need to think about those features and creatively come up with them, an approach we call the kitchen sink approach, which is just: try everything you can possibly think of and see what works. In unsupervised learning, where we don't have labeled data and we're trying to infer some structure out of the data, you're projecting that data into a space and looking for things like clusters. There's a bunch of really fun math about how you do that, about how you think about distance. And by distance, I mean that if we have two data points in space, how do we decide if they're similar or not?

**And how do the algorithms themselves usually differ between unsupervised and supervised learning?**

In supervised learning, we have our labels, and we're trying to figure out what statistically indicates if something matches one label or another label. In unsupervised learning, we don't necessarily have those labels; that's the thing we're trying to discover.

Reinforcement learning is another technique that we use sometimes. You can think about it like a turn in a game, and you can play millions and millions of trials so that you're able to develop a system that, by experimenting with reinforcement learning, can eventually learn to play these games pretty successfully.

Then there is deep learning, which is essentially using neural networks and very large amounts of data to eventually iterate on a network structure that can make predictions.

**How does reinforcement learning differ from deep learning?** It seems to me that reinforcement learning is sort of like the kitchen sink approach that you were talking about earlier, where you're just kind of trying everything.

It is, but it also thrives in environments where you have a decision point, a palette of actions to choose from. It actually comes historically from trying to train a robot to navigate a room.
If it bonks into this chair, it can't go forward anymore, and if it falls into that pit, you know it's not going to succeed. But if it keeps exploring, it'll eventually get to the goal.

Oh, like Roombas?

Yes!

Oh wow, I didn't realize it was that deep!

**Is there a situation where you'd want to use a deep learning algorithm over a reinforcement learning algorithm?**

Typically, you would choose deep learning if you have sufficient high-quality data, hopefully labeled in a useful way, if you really are happy not to necessarily understand or be able to interpret what your system is doing, or you're willing to invest in another set of work afterward to understand what the system is doing once you've already trained it. This also comes down to the fact that some things are actually really easy to solve with linear regression or simple statistical approaches, and some things are impossible.

**What would be the outcome if you were to choose the "wrong" approach?**

You build a system that could actually be useless. So, years ago, I had a client. It was a big telecom company, and they had a data scientist who built a deep learning system to predict customer churn. It actually was very accurate, but it wasn't useful because nobody knew why the prediction was what it was. So they could say, "Sunny, you're likely to quit next month," but they had no idea what to do about it. I think there are a bunch of failure modes.

Would that be an example of linear regression where the regression is accurate, but for marketing purposes, it's like, "If you don't know why I'm quitting the service, then how can we fix this?"

Yeah, this is actually a good example of a very real-world kind of machine learning problem where the solution to this was to build an interpretable system on top of the accurate predictions. So not to throw it away, but to do a bunch more work to figure out the why.
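
One common way to put an interpretable layer on top of accurate but opaque predictions is a surrogate model: fit a very simple rule to the black box's outputs and read the rule. The sketch below is not what the team in the anecdote built (the transcript does not say); the `black_box_predict` stand-in, the feature names, and the customer records are all invented for illustration.

```python
def black_box_predict(customer):
    """Stand-in for an accurate but opaque churn model (hypothetical)."""
    score = 0.9 * customer["support_calls"] - 0.05 * customer["months_as_customer"]
    return score > 2.0


def fit_stump(customers, predictions):
    """Find the single 'feature >= threshold' rule that best matches
    the black box's predictions, and report how often it agrees."""
    best = None  # (agreement count, feature, threshold)
    for feature in customers[0]:
        for threshold in sorted({c[feature] for c in customers}):
            agree = sum(
                (c[feature] >= threshold) == p
                for c, p in zip(customers, predictions)
            )
            if best is None or agree > best[0]:
                best = (agree, feature, threshold)
    agreement, feature, threshold = best
    return feature, threshold, agreement / len(customers)


# Made-up customer records, purely for illustration.
customers = [
    {"support_calls": 0, "months_as_customer": 24, "monthly_bill": 40},
    {"support_calls": 1, "months_as_customer": 6, "monthly_bill": 55},
    {"support_calls": 4, "months_as_customer": 12, "monthly_bill": 45},
    {"support_calls": 5, "months_as_customer": 3, "monthly_bill": 60},
    {"support_calls": 2, "months_as_customer": 36, "monthly_bill": 50},
    {"support_calls": 6, "months_as_customer": 18, "monthly_bill": 35},
]
predictions = [black_box_predict(c) for c in customers]

feature, threshold, agreement = fit_stump(customers, predictions)
print(f"Predicted churn is mostly explained by: {feature} >= {threshold} "
      f"(agrees with the black box on {agreement:.0%} of customers)")
```

A one-rule stump only considers rules of the form "feature at or above a threshold", so it is far cruder than the model it explains. What it offers is a reason someone can act on, for example following up with customers who have called support many times.
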

**How can we improve machine learning algorithms?**

It's actually fairly new that we're able to solve all of these problems and start to build these products and apply it in businesses and apply it everywhere. So we're still developing good practices and what it means to be a professional in machine learning. We're really developing a notion of what good looks like.

I'm in my first year of a PhD in computer science, and I'm studying natural language processing and machine learning.

So would you mind telling me a bit about what you've been working on or interested in lately?

I've been looking at understanding persuasion in online text and the ways that we might be able to automatically detect the intent behind that persuasion or who it's targeted at and what makes effective persuasive techniques.

**So what are some of the techniques you're applying to look at that debate data?**

Something I'm interested in exploring is how well it works to use deep learning and sort of automatically extracted features from this text versus using some of the more traditional techniques that we have: things like lexicons or some sort of template matching techniques for extracting features from text. That's a question I'm just interested in in general: when do we really need deep learning versus when can we use something that's a little bit more interpretable, something that's been around for a while?

Do you think there are going to be general principles that guide those decisions? Because right now, it's generally up to the machine learning engineer to decide what tools they want to apply.

I definitely think there are, but I also sort of see it varying a lot based on the use case. Something that kind of works out of the box and maybe works a little bit more automatically might be better, and in other cases, you do sort of want a lot of fine-grain control.

**So is that where some of that frustration around the lack of controllability and interpretability comes from?**

Yeah, if you're building a model that just predicts the next thing based off of everything it's seen from text online, then yeah, you're really going to be replicating whatever that distribution online is.

If you train a model off of language off the internet, it sometimes says uncomfortable things or inappropriate things and sometimes really biased things.

**Have you ever run into this yourself, and then how do you think about that problem of potentially even measuring the bias in a model that we've trained?**

Yeah, it's a really tricky question. As you said, these models are trained to sort of predict the next sequence of words given a certain sequence of words. So we could start with just sort of prompts like "the woman was" versus "the man was" and kind of pull out common words that are sort of more used with one phrase versus the other. So that's sort of a qualitative way of looking at it. It's not ever kind of a guarantee of how the model is going to behave in one particular instance, and I think that's what's really tricky. That's why I sort of think it's really good for creators of systems to just be honest and say, "This is sort of what we have seen," so that someone can then make their own judgment about, "Is this going to be too high risk for sort of my particular use case?"

I imagine in the last few years we've seen a lot of changes and improvements in the capabilities of NLP systems. So is there anything in that that you're particularly excited about exploring further?

I'm really interested in sort of the creative potential that we've started to see from NLP systems with things like GPT-3 and other really powerful language models.
It's really easy to write long grammatical passages. Thinking about the way that we can then harness the human ability to actually give meaning to those words and sort of provide structure, and how we can combine those things with the kind of generative capabilities of these models now, is really interesting.

Yeah, I agree!

**So hi, Claudia! It's so great to see you! It has been far too long. You know we first met 10, 11 years ago, and machine learning has changed a lot since then.**

The tooling that we now have, the capacity, and also an elevation of the problem sets that we're dealing with and how to frame the problem. I'm almost struggling to figure out whether it's a blessing or a curse that it has become as accessible and as democratized and as easy to execute. You just built another new company from scratch. **So what's been kind of your reflection on that?**

Well, you're absolutely right that the attention machine learning gets has grown dramatically. Twenty years ago, going to gatherings and telling people what I was working on meant seeing the blank face, or the turn and walk away, like, "Oh no!" The accessibility of the tooling: we can now do in like five lines of code something that would have taken 500 lines of very mathematical, messy, gnarly code even, you know, five years ago. And that's not an exaggeration. There are tools that mean that pretty much anyone can pick this up and start playing with it and start to build with it, and that is also really exciting.

In contrast, here is what I'm struggling with: a friend of mine asked me to look at some healthcare data for him. Despite the capabilities that we have, there are all of these bigger societal problems alongside data collection, engineering, all the gnarly stuff that is actually not the machine learning itself but the rest of it, where certain data isn't available. To me, it's staggering how difficult it is to get it off the ground and actually use it.
Part of the challenge of it is not the mathematics of building models, but the challenge is making sure that the data is sufficiently representative, potentially high quality. But how transparent do I need to build it for it to be adopted at some point? What types of biases in the data collection and then also in the usage? We now call it the bias, but we're still struggling with society not really living up to its expectations and then machine learning bringing it to the forefront, right?

**And so to say that another way, when you're collecting data from the real world and then building machine learning systems that automate decisions based on that data, all of the biases and problems that are already in the real world can then be magnified through that machine learning system, so it can make many of these problems much worse.**

I'm feeling increasingly challenged that my skill set of being very good at programming has become somewhat secondary, and it's really the bigger picture understanding of who would be using that, how transparent do I need to build it for it to be adopted at some point? What types of biases in the data collection and then also in the usage? I think in certain areas we have societal expectations as to what is fair and what isn't.

So it's not just the provenance of that data, but it's sort of deeply understanding why does it look the way it looks? Why was it collected this way? What are the limitations of it? We need to think about that in the entire process, including how we document that process. This is an issue in companies where somebody might create something that even their peers can't recreate.

**What have you seen in terms of industries and where they stand? Who is adopting now? Who is ready to utilize it? Where would you maybe wish they didn't even try?**

These are great questions! So there are things like actuarial science and operations research, where they actually are not using machine learning as much as you might think.
And then you have other sorts of companies on the fintech side or even the ad tech side of things where they perhaps are using machine learning to the point of even absurdity.

So I spent about eight years working in ad tech, and the motivation was really because it was such an amazingly exciting playground to push that technology that used to largely live in academia really out in the world and see kind of what it can achieve. It has created such a hunger for data that now everything is being collected. I'm curious when we're going to make a foray into things like agriculture, about smart production of the things we eat. You see and hear these interesting stories, but I feel like we're not ready yet to put that into an economically viable situation.

So when we think about the next five to ten years, the things that are really still holding us back are these uneven applications of resources to problems because the problems that get attention are the high-value ones in terms of how much money you can make or the things that are fashionable enough that you can publish a paper on it. **So what do you think is holding us back?**

I fully agree on the steps you pointed out and the processes. I think there is a chicken-and-egg problem. Like your former example, these areas that need to wait for data: the value of the data collection is then also slightly less apparent, and so it gets delayed further. You see that happening. But what my experience has been is, yes, unfortunately, I feel a drifting apart between academia and the uses of AI. But I'm somewhat frustrated with a generation of students who have standard data sets, who never think about what the model needs to be used for, and who never have to think about how the data was collected.

**So with all these challenges ahead of us, how optimistic are you about this world that I deeply believe we can create and the steps towards it?**

I am incredibly optimistic, and perhaps it's a personality flaw, but I can't help but look at the potential of the technology to reduce harm, to give us information, to help us make better decisions, and to think that we would choose to address the big problems ahead of us. I don't think we have a hope of addressing them without figuring out the role that machine learning will play, and to think that we would then choose not to do that is just unthinkable.

Despite the raised concerns about the challenges ahead, I think they also make us, as a society, better. They challenge us to be a lot clearer about what fairness means to all of us. So with all of the setbacks, I think we have exciting years to come, and I am looking forward to a world where a lot more of that is used for the right purposes.

I hope you learned something about machine learning. There has never been a better time to study machine learning because you're now able to build products that have tremendous potential and impact across any industry or area that you might be excited about.