{"id":12332,"date":"2026-01-14T19:59:28","date_gmt":"2026-01-14T19:59:28","guid":{"rendered":"https:\/\/microvibenews.com\/?p=12332"},"modified":"2026-01-14T19:59:28","modified_gmt":"2026-01-14T19:59:28","slug":"ai-models-are-starting-to-crack-high-level-math-problems","status":"publish","type":"post","link":"https:\/\/microvibenews.com\/?p=12332","title":{"rendered":"AI models are starting to crack high-level math problems\u00a0"},"content":{"rendered":"<p><br \/>\n<\/p>\n<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Over the weekend, Neel Somani, a software engineer, former quant researcher, and startup founder, was testing the math skills of OpenAI\u2019s new model when he made an unexpected discovery. After pasting a problem into ChatGPT and letting it think for 15 minutes, he came back to a full solution. He evaluated the proof and formalized it with a tool from Harmonic, and it all checked out.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cI was curious to establish a baseline for when LLMs are effectively able to solve open math problems compared to where they struggle,\u201d Somani said. 
The surprise was that, using the latest model, the frontier started to push forward a bit.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">ChatGPT\u2019s <a href=\"https:\/\/chatgpt.com\/share\/69630fa9-02d4-8012-8ef2-84c443c04922\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">chain of thought<\/a>\u00a0is even more impressive, rattling off mathematical results like\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Legendre%27s_formula\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Legendre\u2019s formula<\/a>,\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Bertrand%27s_postulate\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Bertrand\u2019s postulate<\/a>,\u00a0and\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Star_of_David_theorem\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">the Star of David theorem<\/a>. Eventually, the model found\u00a0<a href=\"https:\/\/mathoverflow.net\/questions\/138209\/product-of-central-binomial-coefficients\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">a MathOverflow post from 2013<\/a>,\u00a0where\u00a0Harvard mathematician Noam\u00a0Elkies\u00a0had given an elegant solution to a similar problem. But ChatGPT\u2019s final proof differed from Elkies\u2019 work in important ways, and gave a more complete solution to a version of the problem posed by legendary mathematician Paul Erd\u0151s, whose vast collection of unsolved problems has become a proving ground for AI.<\/p>\n<p class=\"wp-block-paragraph\">For anyone skeptical of machine intelligence,\u00a0it\u2019s\u00a0a surprising result \u2014 and\u00a0it\u2019s\u00a0not the only one. AI tools have become ubiquitous in mathematics, from formalization-oriented LLMs like Harmonic\u2019s Aristotle to literature review tools like OpenAI\u2019s deep research. 
But since the release of GPT 5.2 \u2014 which Somani describes as \u201canecdotally more skilled at mathematical reasoning than previous iterations\u201d \u2014 the sheer volume of solved problems has become difficult to ignore, raising new questions about large language models\u2019 ability to push the frontiers of human knowledge.\u00a0\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Somani was looking at the Erd\u0151s problems, a set of over one thousand conjectures by the Hungarian mathematician that are\u00a0<a rel=\"nofollow\" href=\"https:\/\/www.erdosproblems.com\/\">maintained\u00a0online<\/a>. The problems have become a tempting target for AI-driven mathematics, varying significantly in both subject matter and difficulty. The first batch of autonomous solutions came in November from\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2511.02864\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">a Gemini-powered model called AlphaEvolve<\/a>\u00a0\u2014 but more recently, Somani and others have found GPT 5.2 to be remarkably adept with high-level math.\u00a0\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Since Christmas, 15 problems have been moved from \u201copen\u201d to \u201csolved\u201d on the Erd\u0151s website \u2014 and 11 of the solutions have specifically credited AI models as involved in the process.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">The revered mathematician Terence Tao has a more nuanced look at the progress <a rel=\"nofollow\" href=\"https:\/\/github.com\/teorth\/erdosproblems\/wiki\/AI-contributions-to-Erd%C5%91s-problems\">on his GitHub page<\/a>, counting eight different problems where AI models made meaningful autonomous progress on an Erd\u0151s problem, with six other cases where progress was made by\u00a0locating\u00a0and building on\u00a0previous\u00a0research.\u00a0It\u2019s\u00a0a long way from AI systems being able to do math without human intervention, but\u00a0it\u2019s\u00a0clear that\u00a0there\u2019s\u00a0an important role\u00a0for large 
models to play.\u00a0<\/p>\n<p class=\"wp-block-paragraph\"><a rel=\"nofollow\" href=\"https:\/\/mathstodon.xyz\/@tao\/115891257393270694\">On Mastodon<\/a>, Tao conjectured that the\u00a0scalable\u00a0nature of AI systems makes them \u201cbetter suited for being systematically applied to the \u2018long tail\u2019 of obscure Erd\u0151s problems, many of which\u00a0actually have\u00a0straightforward solutions.\u201d<\/p>\n<p class=\"wp-block-paragraph\">\u201cAs such, many of these easier Erd\u0151s problems are now more likely to be solved by purely AI-based methods than by human or hybrid means,\u201d Tao continued.<\/p>\n<p class=\"wp-block-paragraph\">Another driving force is a recent shift towards formalization, a labor-intensive task that makes mathematical reasoning easier to verify and extend. Formalization\u00a0doesn\u2019t\u00a0require use of AI or even computers, but a new crop of automated tools has made the process far easier. 
The open-source \u201cproof assistant\u201d Lean, which was developed at Microsoft Research in 2013, has become widely used within the field as a way of formalizing proofs, and AI tools like Harmonic\u2019s Aristotle promise to automate much of the work of formalization.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">For Harmonic founder Tudor Achim, the sudden jump in solved Erd\u0151s problems is less important than the fact that the world\u2019s greatest mathematicians are starting to take those tools seriously.\u00a0\u201cI care more about the fact that math and computer science professors are using [AI\u00a0tools],\u201d Achim said. \u201cThese people have reputations to protect, so when they\u2019re saying they use Aristotle or they use ChatGPT, that\u2019s real evidence.\u201d\u00a0<\/p>\n<\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/techcrunch.com\/2026\/01\/14\/ai-models-are-starting-to-crack-high-level-math-problems\/\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Over the weekend, Neel Somani,&hellip; 
<\/p>\n","protected":false},"author":1,"featured_media":12333,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[249],"tags":[420,2930,8807,8808],"_links":{"self":[{"href":"https:\/\/microvibenews.com\/index.php?rest_route=\/wp\/v2\/posts\/12332"}],"collection":[{"href":"https:\/\/microvibenews.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/microvibenews.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/microvibenews.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/microvibenews.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=12332"}],"version-history":[{"count":0,"href":"https:\/\/microvibenews.com\/index.php?rest_route=\/wp\/v2\/posts\/12332\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/microvibenews.com\/index.php?rest_route=\/wp\/v2\/media\/12333"}],"wp:attachment":[{"href":"https:\/\/microvibenews.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=12332"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/microvibenews.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=12332"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/microvibenews.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=12332"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}