{"id":21312,"date":"2026-02-12T23:31:27","date_gmt":"2026-02-12T23:31:27","guid":{"rendered":"https:\/\/microvibenews.com\/?p=21312"},"modified":"2026-02-12T23:31:27","modified_gmt":"2026-02-12T23:31:27","slug":"the-flawed-assumptions-behind-matt-shumers-viral-x-post-on-ais-looming-impact","status":"publish","type":"post","link":"https:\/\/microvibenews.com\/?p=21312","title":{"rendered":"The flawed assumptions behind Matt Shumer\u2019s viral X post on AI\u2019s looming impact"},"content":{"rendered":"<p><img src=\"https:\/\/fortune.com\/img-assets\/wp-content\/uploads\/2026\/02\/1644257218541.jpg?w=2048\" \/><\/p>\n<p>AI Influencer Matt Shumer penned a viral blog on X about AI\u2019s potential to disrupt, and ultimately automate, almost all knowledge work that has racked up more than 55 million views in the past 24 hours.<\/p>\n<p>Shumer\u2019s 5,000-word essay certainly hit a nerve. Written in a breathless tone, the blog is constructed as a warning to friends and family about how their jobs are about to be radically upended. (<em>Fortune <\/em>also ran an adapted version of Shumer\u2019s post as a commentary piece.)<\/p>\n<p>\u201cOn February 5th, two major AI labs released new models on the same day: GPT-5.3-Codex from OpenAI, and Opus 4.6 from Anthropic,\u201d he writes. \u201cAnd something clicked. Not like a light switch \u2026 more like the moment you realize the water has been rising around you and is now at your chest.\u201d<\/p>\n<p>Shumer says coders are the canary in the coal mine for every other profession. \u201cThe experience that tech workers have had over the past year, of watching AI go from \u2018helpful tool\u2019 to \u2018does my job better than I do,\u2019 is the experience everyone else is about to have,\u201d he writes. \u201cLaw, finance, medicine, accounting, consulting, writing, design, analysis, customer service. Not in 10 years. The people building these systems say one to five years. Some say less. 
And given what I\u2019ve seen in just the last couple of months, I think \u2018less\u2019 is more likely.\u201d<\/p>\n<p>But despite its viral nature, Shumer\u2019s assertion that what\u2019s happened with coding is a prelude to what will happen in other fields\u2014and, critically, that this will happen within just a few years\u2014seems wrong to me. And I write this as someone who wrote a book (<em>Mastering AI: A Survival Guide to Our Superpowered Future<\/em>) that predicted AI would massively transform knowledge work by 2029, something I still believe. I just don\u2019t think the full automation of processes that we are starting to see with coding is coming to other fields as quickly as Shumer contends. He may be directionally right, but the dire tone of his missive strikes me as fearmongering, and based largely on faulty assumptions.<\/p>\n<div>\n<h2 class=\"wp-block-heading\"><strong>Not all knowledge work is like software development<\/strong><\/h2>\n<p>Shumer says the reason code has been the area where autonomous agentic capabilities have had the biggest impact so far is that AI companies have devoted so much attention to it. They have done so, Shumer says, because these frontier model companies see autonomous software development as key to their own businesses, enabling AI models to help build the next generation of AI models. In this, the AI companies\u2019 bet seems to be paying off: The pace at which they are churning out better models has picked up markedly in the past year. And both OpenAI and Anthropic have said that the code behind their most recent AI models was largely written by AI itself.<\/p>\n<p>Shumer says that while coding is a leading indicator, the same performance gains seen in coding arrive in other domains, although sometimes about a year later than the uplift in coding. 
(Shumer does not offer a cogent explanation for why this lag might exist although he implies it is simply because the AI model companies optimize for coding first and then eventually get around to improving the models in other areas.)\u00a0<\/p>\n<p>But what Shumer doesn\u2019t mention is another reason that progress in automating software development has been more rapid than in other areas: Coding has some quantitative metrics of quality that simply don\u2019t exist in other domains. In programming, if the code is really bad it simply won\u2019t compile at all. Inadequate code may also fail various unit tests that the AI coding agent can perform. (Shumer doesn\u2019t mention that today\u2019s coding agents sometimes lie about conducting unit tests\u2014which is one of many reasons automated software development isn\u2019t foolproof.)<\/p>\n<p>Many developers say the code that AI writes is often decent enough to pass these basic tests but is still not very good: that it is inefficient, inelegant, and most important, insecure, opening an organization that uses it to cybersecurity risks. But in coding there are still some ways to build autonomous AI agents to address some of these issues. The model can spin up sub-agents that check the code it has written for cybersecurity vulnerabilities or critique the code on how efficient it is. Because software code can be tested in virtual environments, there are plenty of ways to automate the process of reinforcement learning\u2014where an agent learns by experience to maximize some reward, such as points in a game\u2014that AI companies use to shape the behavior of AI models after their initial training. That means the refinement of coding agents can be done in an automated way at scale.<\/p>\n<p>Assessing quality in many other domains of knowledge work is far more difficult. 
There are no compilers for law, no unit tests for a medical treatment plan, no definitive metric for how good a marketing campaign is before it is tested on consumers. It is much harder in other domains to gather sufficient amounts of data from professional experts about what \u201cgood\u201d looks like. AI companies realize they have a problem gathering this kind of data. It is why they are now paying millions to companies like Mercor, which in turn are shelling out big bucks to recruit accountants, finance professionals, lawyers, and doctors to help provide feedback on AI outputs so AI companies can train their models better.<\/p>\n<p>It is true that there are benchmarks that show the most recent AI models making rapid progress on professional tasks outside of coding. One of the best of these is OpenAI\u2019s GDPval benchmark. It shows that frontier models can achieve parity with human experts across a range of professional tasks, from complex legal work to manufacturing to health care. So far, the results aren\u2019t in for the models OpenAI and Anthropic released last week. But for their predecessors, Claude Opus 4.5 and GPT-5.2, the models achieve parity with human experts across a diverse range of tasks, and beat human experts in many domains.<\/p>\n<p>So wouldn\u2019t this suggest that Shumer is correct? Well, not so fast. It turns out that in many professions what \u201cgood\u201d looks like is highly subjective. Human experts only agreed with one another on their assessment of the AI outputs about 71% of the time. The automated grading system used by OpenAI for GDPval has even more variance, agreeing on assessments only 66% of the time. So those headline numbers about how good AI is at professional tasks could have a wide margin of error.<\/p>\n<h2 class=\"wp-block-heading\">Enterprises need reliability, governance, and auditability<\/h2>\n<p>This variance is one of the things that holds enterprises back from deploying fully automated workflows. 
It\u2019s not just that the output of the AI model itself might be faulty. It\u2019s that, as the GDPval benchmark suggests, the equivalent of an automated unit test in many professional contexts might produce an erroneous result a third of the time. Most companies cannot tolerate the possibility that poor quality work is being shipped in a third of cases. The risks are simply too great. Sometimes, the risk might be merely reputational. In others, it could mean immediate lost revenue. But in many professional tasks, the consequences of a wrong decision can be even more severe: professional sanction, lawsuits, the loss of licenses, the loss of insurance coverage, and, even, the risk of physical harm and death\u2014sometimes to large numbers of people.<\/p>\n<p>What\u2019s more, trying to keep a human in the loop to review automated outputs is problematic. Today\u2019s AI models are genuinely getting better. Hallucinations occur less frequently. But that only makes the problem worse. As AI-generated errors become less frequent, human reviewers become complacent. AI errors become harder to spot. AI is wonderful at being confidently wrong and at presenting results that are impeccable in form but lack substance. That bypasses some of the proxy criteria humans use to calibrate their level of vigilance. AI models often fail in ways that are alien to the ways humans fail at the same tasks, which makes guarding against AI-generated errors more of a challenge.<\/p>\n<p>For all these reasons, until the equivalent of software development\u2019s automated unit tests are developed for more professional fields, deploying automated AI workflows in many knowledge work contexts will be too risky for most enterprises. AI will remain an assistant or copilot to human knowledge workers in many cases, rather than fully automating their work.<\/p>\n<p>There are other reasons that the kind of automation software developers have observed is unlikely for other categories of knowledge work. 
In many cases, enterprises cannot give AI agents access to the kinds of tools and data systems they need to perform automated workflows. It is notable that the most enthusiastic boosters of AI automation so far have been developers who work either by themselves or for AI-native startups. These software coders are often unencumbered by legacy systems and tech debt, and often don\u2019t have a lot of governance and compliance systems to navigate.<\/p>\n<p>Big organizations often currently lack ways to link data sources and software tools together. In other cases, concerns about security risks and governance mean large enterprises, especially in regulated sectors such as banking, finance, law, and health care, are unwilling to automate without ironclad guarantees that the outcomes will be reliable and that there is a process for monitoring, governing, and auditing the outcomes. The systems for doing this are currently primitive. Until they become much more mature and robust, don\u2019t expect enterprises to fully automate the production of business critical or regulated outputs.<\/p>\n<h2 class=\"wp-block-heading\">Critics say Shumer is not honest about LLM failings<\/h2>\n<p>I\u2019m not the only one who found Shumer\u2019s analysis faulty. Gary Marcus, the emeritus professor of cognitive science at New York University who has become one of the leading skeptics of today\u2019s large language models, told me Shumer\u2019s X post was \u201cweaponized hype.\u201d And he pointed to problems with even Shumer\u2019s arguments about automated software development.<\/p>\n<p>\u201cHe gives no actual data to support this claim that the latest coding systems can write whole complex apps without making errors,\u201d Marcus said.<\/p>\n<p>He points out that Shumer mischaracterizes a well-known benchmark from the AI evaluation organization METR that tries to measure AI models\u2019 autonomous coding capabilities that suggests AI\u2019s abilities are doubling every seven months. 
Marcus notes that Shumer fails to mention that the benchmark has two thresholds for accuracy, 50% and 80%. But most businesses aren\u2019t interested in a system that fails half the time, or even one that fails one out of every five attempts.<\/p>\n<p>\u201cNo AI system can reliably do every five-hour-long task humans can do without error, or even close, but you wouldn\u2019t know that reading Shumer\u2019s blog, which largely ignores all the hallucination and boneheaded errors that are so common in everyday experience,\u201d Marcus said.<\/p>\n<p>He also noted that Shumer didn\u2019t cite recent research from Caltech and Stanford that chronicled a wide range of reasoning errors in advanced AI models. And he pointed out that Shumer has previously been caught making exaggerated claims about the abilities of an AI model he trained. \u201cHe likes to sell big. That doesn\u2019t mean we should take him seriously,\u201d Marcus said.<\/p>\n<p>Other critics of Shumer\u2019s blog point out that his economic analysis is ahistorical. Every other technological revolution has, in the long run, created more jobs than it eliminated. Connor Boyack, president of the Libertas Institute, a policy think tank in Utah, wrote an entire counter-blog-post making this argument.<\/p>\n<p>So, yes, AI may be poised to transform work. But the kind of full-task automation that some software developers have started to observe is possible for <em>some<\/em> tasks? 
For most knowledge workers, especially those embedded in large organizations, that is going to take much longer than Shumer implies.<\/p>\n<\/div>\n<p>#flawed #assumptions #Matt #Shumers #viral #post #AIs #looming #impact<\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI Influencer Matt Shumer penn&hellip; <\/p>\n","protected":false},"author":1,"featured_media":21313,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[2654,9397,4044,1827,2867,1202,2006,12885,1731,827,11500,7257,4599,12886,7280,10234,829,12887],"_links":{"self":[{"href":"https:\/\/microvibenews.com\/index.php?rest_route=\/wp\/v2\/posts\/21312"}],"collection":[{"href":"https:\/\/microvibenews.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/microvibenews.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/microvibenews.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/microvibenews.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=21312"}],"version-history":[{"count":0,"href":"https:\/\/microvibenews.com\/index.php?rest_route=\/wp\/v2\/posts\/21312\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/microvibenews.com\/index.php?rest_route=\/wp\/v2\/media\/21313"}],"wp:attachment":[{"href":"https:\/\/microvibenews.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=21312"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/microvibenews.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=21312"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/microvibenews.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=21312"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}