
What Is Agentic Engineering? Complete History: From Turing to Karpathy, AutoGPT to Autoresearch & Beyond (2026)

The complete history of agentic engineering from Turing's first spark to Karpathy's 2026 declaration. How AI agents evolved from academic papers to a $4.7B industry, why vibe coding became passé, and what the shift to orchestrating autonomous agents means for every builder. Updated March 2026.

March 9, 2026·Updated March 30, 2026·80 min read·Taskade Team·AI·#agentic-engineering#vibe-coding#ai-agents

Agentic engineering is the discipline that will define how software gets built for the next decade. But it did not appear overnight. It is the product of seven decades of research, three waves of AI hype, a handful of viral open-source projects, one Stanford PhD who keeps coining the right term at the right time, and an industry that finally has models smart enough to act on their own.

This is the complete history — from Alan Turing's first spark to Andrej Karpathy's February 2026 declaration that vibe coding is passé, and from AutoGPT's 100,000-star explosion to the Agentic AI Foundation that now governs the standards. Every milestone, every inflection point, every thread that connects the dots.

TL;DR: Agentic engineering — coined by Karpathy in Feb 2026 — is orchestrating AI agents with human oversight. It evolved through 70+ years: Turing (1950) → deep learning (2012) → Transformers (2017) → AutoGPT (2023) → MCP (2024) → vibe coding (2025) → agentic engineering (2026). The agentic AI market is projected to grow from $7-9B in 2026 to $47-93B by 2030-2032 (Fortune Business Insights, Grand View Research, MarketsandMarkets). Gartner predicts 40% of enterprise apps will have AI agents by end of 2026, up from less than 5% in 2025. Taskade Genesis embodies this evolution — 150,000+ apps built with AI agents, automations, and workspace-level orchestration.


What Is Agentic Engineering?

Agentic engineering is a software development approach where humans orchestrate AI agents who do the actual coding, testing, and deployment, while the human provides architectural oversight, quality standards, and strategic direction. The term was coined by Andrej Karpathy on February 8, 2026, as the professional successor to vibe coding.

Karpathy's exact words:

"Agentic, because the new default is that you are not writing the code directly 99% of the time. You are orchestrating agents who do and acting as oversight. Engineering, to emphasize that there is an art and science and expertise to it."

The distinction is precise:

| | Vibe Coding | Agentic Engineering |
|---|---|---|
| Who writes code | AI generates, human accepts | AI generates, human reviews with the same rigor as a human PR |
| Planning | Start prompting immediately | Plan before prompting — design docs, specs, architecture |
| Testing | Hope it works | Test relentlessly — the biggest differentiator |
| Ownership | "It works, I think" | Own the system — docs, version control, CI, monitoring |
| Best for | Prototypes, exploration, learning | Production systems, team projects, anything that must be maintained |
| Risk | 1.7x more major issues, 2.74x more security vulnerabilities (CodeRabbit data) | Human-level quality with AI-level speed |
| Who benefits most | Beginners getting started | Senior engineers as force multipliers (Osmani) |

Google's Addy Osmani identified the "80% Problem": agents generate 80% of a solution fast, but the remaining 20% — architecture, edge cases, production hardening — requires deep engineering knowledge. Agentic engineering is the discipline of directing that last 20%.

This is not casual prompting. It is not "accept all and hope for the best." It is a discipline — with principles, tools, patterns, and a 70-year intellectual lineage that makes it the logical conclusion of everything computer science has been building toward.

To understand why agentic engineering matters, you need to understand where it came from.

Taskade Genesis — orchestrating AI agents to build live applications from a single prompt


The Prehistory: Foundations of Machine Intelligence (1950–2011)

Alan Turing and the First Spark (1950)

Every history of AI begins with Alan Turing. His 1950 paper "Computing Machinery and Intelligence" asked the question that launched the field: Can machines think?

Turing proposed what became known as the Turing Test — if a machine can converse with a human and the human cannot reliably distinguish it from another human, the machine can be said to "think." This was not a technical specification. It was a philosophical provocation. And it worked — it gave the field a North Star.


A rebuilt "Bombe" machine designed by Alan Turing. The device allowed the British to decipher encrypted German communication during World War II. Image credit: Antoine Taveneaux

The Birth of AI as a Field (1956)

In 1956, John McCarthy coined the term "artificial intelligence" at the Dartmouth Conference — a summer workshop where a small group of researchers declared that "every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it."

The optimism was extraordinary. Herbert Simon predicted in 1957 that within ten years, a computer would be chess champion and discover an important mathematical theorem. He was wrong by about four decades on the chess part and arguably still waiting on the math.

The First AI Winter (1974–1980)

Early AI research hit a wall. The models were too simple, the computers too slow, and the problems too hard. Funding dried up. DARPA cut grants. The field entered its first "AI winter" — a period of reduced funding and pessimism that would repeat.

Expert Systems and the Second Winter (1980–1993)

The 1980s brought expert systems — rule-based programs that encoded human knowledge into if-then rules. On a pivotal 1984 episode of The Computer Chronicles, three of AI's founding figures laid out the vision: John McCarthy (who coined "artificial intelligence" and invented LISP), Nils Nilsson (Stanford), and Edward Feigenbaum (who coined the term "knowledge engineering").

The promise was intoxicating. MYCIN could diagnose 20 infectious diseases using 300 hand-coded rules. Companies like Digital Equipment Corporation deployed XCON, which saved $40 million annually configuring computer orders. Dendral could infer molecular structures from mass spectrometry data. AI was a billion-dollar industry.

But McCarthy, even as the field celebrated, identified the fatal flaw: expert systems had no common sense. They could diagnose a rare blood infection but could not understand that a patient is a person who lives in a world with gravity, weather, and emotions. Feigenbaum's knowledge engineers could extract specialist expertise, but the "things everybody knows" — the vast ocean of implicit knowledge humans navigate unconsciously — proved impossible to formalize into rules.

Nilsson called these systems brittle — a word that would prove prophetic. A system that works perfectly within its narrow domain and fails catastrophically one step outside it is not intelligence. It is a lookup table with ambitions. By the late 1980s, expert systems collapsed under the weight of their own maintenance costs and inflexibility. The second AI winter followed.

The irony is that expert systems were the first proto-agents — software that made autonomous decisions within a domain. The concept of "knowledge engineering" — encoding human expertise into a system that can act on it — is a direct ancestor of today's agentic engineering. The difference: modern AI agents learn from data rather than from hand-coded rules, and they generalize across domains rather than shattering at the boundary.

Expert Systems → Modern AI Agents: The Lineage

Expert System (1980s)        AI Agent (2026)
┌──────────────────┐         ┌──────────────────┐
│ Hand-coded rules │         │ Learned weights  │
│ 300 rules max    │         │ Billions of      │
│ One domain only  │         │   parameters     │
│ Brittle at edges │         │ Cross-domain     │
│ No learning      │         │ Continuous       │
│ No memory        │         │   learning       │
│ No tool use      │         │ 22+ tools        │
└──────────────────┘         └──────────────────┘

Same goal: autonomous decision-making
Different foundation: rules vs. learned representations

From Perceptrons to Hopfield Networks: The Memory Problem (1957–1986)

Frank Rosenblatt's Perceptron stunned the world in 1957 — a machine that could learn to recognize patterns completely automatically. The New York Times reported it was "expected to walk, talk, see, write, reproduce itself, and be conscious of its existence." It learned by adjusting weighted connections between inputs (dials multiplying signals) until it could classify patterns correctly. The Perceptron Learning Rule was elegant: if the output is wrong, adjust the weights by a fixed learning rate. If correct, leave them alone.
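The Perceptron Learning Rule is simple enough to fit in a few lines. Here is a minimal NumPy sketch (an illustration of the rule, not Rosenblatt's hardware), trained on logical AND, which is linearly separable:

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Perceptron learning rule: adjust weights by a fixed rate only on mistakes."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            if pred != target:                    # wrong output -> nudge the weights
                w += lr * (target - pred) * xi
                b += lr * (target - pred)
    return w, b                                   # correct output -> leave them alone

# Logical AND: a linearly separable pattern the rule is guaranteed to learn
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
preds = [1 if xi @ w + b > 0 else 0 for xi in X]  # matches y after training
```

Swap the labels for XOR (`y = [0, 1, 1, 0]`) and this loop never converges — exactly the limitation Minsky and Papert exposed.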

But Marvin Minsky and Seymour Papert's 1969 book Perceptrons exposed a fatal limitation: single-layer networks could not learn non-linearly separable patterns like XOR. The field stalled — nobody could train networks with multiple layers. Widrow and Hoff's LMS algorithm came agonizingly close but could not push gradients through layers with binary step functions (slope = zero everywhere). Neural network research nearly died.

Then in 1982, John Hopfield published a paper that changed how we think about memory itself. His Hopfield network — a recurrent network where neurons influence each other through weighted connections — showed that memories in neural networks are not stored in locations like computer RAM. They are stored as stable states of the entire network. Feed the network a corrupted version of a memory and it auto-completes, gravitating back to the stored pattern. This is associative memory: you recall by content, not by address.

The insight was profound: computer memory has a place (a binary address), but neural network memory has a time — a dynamic trajectory toward a stable attractor. Hopfield proved that networks of simple neurons exhibit emergent memory as a natural behavior of the system, not as an engineered feature. His work won the 2024 Nobel Prize in Physics — recognition that the physics of neural networks is foundational science, not applied engineering.
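Associative recall can be demonstrated in a few lines of NumPy. This toy sketch uses the classic Hebbian outer-product storage rule (an illustration of the principle, not Hopfield's original formulation): store a pattern, corrupt it, and watch the network gravitate back to the stable state.

```python
import numpy as np

def store(patterns):
    """Hebbian outer-product rule: each memory becomes a stable state of the weights."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)        # no self-connections
    return W

def recall(W, state, steps=10):
    """Iterate the dynamics until the network settles into the nearest attractor."""
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1     # break ties toward +1
    return state

# One stored memory of +1/-1 values
pattern = np.array([1, -1, 1, -1, 1, -1, 1, -1])
W = store(pattern[None, :])

corrupted = pattern.copy()
corrupted[:2] *= -1               # flip two bits: recall by content, not by address
restored = recall(W, corrupted)   # settles back to the stored pattern
```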

This matters for the agentic engineering story because the same principle — memory as a dynamic property of connected systems, not static storage — is exactly what separates agentic workspaces from traditional software. A Workspace DNA system stores knowledge not as files in folders but as patterns of context that agents can retrieve associatively: ask a question and the relevant memory surfaces. Hopfield networks proved this was physically possible. Modern AI agents make it practical.

The Backpropagation Breakthrough and Neural Network Renaissance (1986–2011)

The solution to multi-layer training came in 1986 when Rumelhart, Hinton, and Williams replaced the binary step activation function with a smooth sigmoid curve — giving gradients a slope to follow. The backpropagation algorithm generalized Widrow and Hoff's delta rule through the chain rule of calculus, propagating error signals backward through every layer. The same principle — adjusted for scale — trains every neural network today, including the 175 billion parameters of GPT-3 and the transformer architectures behind modern AI agents.
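The mechanics can be sketched with a tiny two-layer network trained on XOR — the very pattern a single-layer perceptron cannot learn. This is an illustrative NumPy sketch of the principle (smooth sigmoid plus the chain rule), not any historical implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# XOR: not linearly separable, so it needs a hidden layer
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # hidden layer
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # output layer

lr = 0.5
losses = []
for _ in range(5000):
    # Forward pass: the smooth sigmoid gives gradients a slope to follow
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))

    # Backward pass: the chain rule pushes the error signal through each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

# The error shrinks as gradients flow through both layers: losses[-1] < losses[0]
```

Replace the sigmoid with a binary step function and every `out * (1 - out)` term becomes zero — the gradient has no slope to follow, which is exactly why the field stalled before 1986.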

IBM's Deep Blue defeated world chess champion Garry Kasparov in 1997 — the moment AI entered public consciousness.


Garry Kasparov competing against IBM's Deep Blue chess computer in 1997. Image credit: kasparov.com

The 2000s brought big data, better algorithms, and increasing compute. By 2011, IBM Watson won Jeopardy!, and the stage was set for the deep learning revolution that would change everything.

| Year | Milestone | Significance |
|---|---|---|
| 1950 | Turing's "Computing Machinery and Intelligence" | Proposed the Turing Test, launched the field |
| 1956 | Dartmouth Conference | McCarthy coins "artificial intelligence" |
| 1957 | Perceptron (Frank Rosenblatt) | First neural network hardware — learns by adjusting weighted connections |
| 1969 | Perceptrons (Minsky & Papert) | Exposed single-layer limits (XOR problem), nearly killed neural network research |
| 1974 | First AI Winter begins | Funding cuts, pessimism |
| 1982 | Hopfield network | Memory as stable states, not addresses — associative recall (2024 Nobel Prize in Physics) |
| 1984 | Expert systems peak (MYCIN, XCON) | McCarthy warns: no common sense |
| 1986 | Backpropagation (Rumelhart, Hinton, Williams) | Smooth activation functions + chain rule let gradients flow through layers |
| 1997 | Deep Blue defeats Kasparov | AI enters public consciousness |
| 2011 | IBM Watson wins Jeopardy! | NLP reaches mainstream awareness |

The Deep Learning Revolution (2012–2016)

ImageNet and the AlexNet Moment (2012)

In 2012, Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton submitted AlexNet to the ImageNet Large Scale Visual Recognition Challenge. It won by a staggering margin — reducing the error rate from 26% to 15.3%. This was not an incremental improvement. It was a paradigm shift.

The key insight: deep convolutional neural networks, trained on GPUs, could learn visual features that hand-engineered systems could not. The entire computer vision field pivoted to deep learning within months.

This matters for the agentic engineering story because one of AlexNet's co-authors — Ilya Sutskever — would go on to co-found OpenAI. And one of the students in the Stanford lab that developed the ImageNet dataset was Andrej Karpathy, who would later coin both "vibe coding" and "agentic engineering."

Andrej Karpathy: The Thread Through the Story

To understand agentic engineering, you need to understand the man who named it.

Andrej Karpathy was born in Bratislava, Czechoslovakia, in 1986. His family moved to Toronto when he was 15. He completed his undergraduate degree in Computer Science and Physics at the University of Toronto in 2009, a master's at the University of British Columbia in 2011, and a PhD at Stanford in 2015 under Fei-Fei Li — the computer scientist behind ImageNet.

During his PhD, Karpathy interned at Google Brain (2011), Google Research (2013), and DeepMind (2015). He authored and became primary instructor of Stanford's CS 231n: Convolutional Neural Networks for Visual Recognition — one of the largest classes at Stanford, growing from 150 students in 2015 to 750 by 2017.

| Period | Role | Key Contribution |
|---|---|---|
| 2009–2015 | Stanford PhD student | ImageNet research, CS 231n course |
| 2015–2017 | OpenAI founding member | Research scientist, built core AI capabilities |
| 2017–2022 | Tesla Director of AI | Led Autopilot vision, real-world AI deployment |
| Feb 2023 | Returned to OpenAI | Brief second stint |
| Feb 2024 | Left OpenAI | Founded Eureka Labs |
| Feb 2025 | Coined "vibe coding" | Changed how millions think about AI-assisted building |
| Jun 2025 | YC AI Startup School | "Software Is Changing (Again)" — defined Software 3.0 |
| Dec 2025 | 2025 LLM Year in Review | Identified 6 paradigm shifts including "ghosts" and "vibe coding" |
| Feb 2026 | Coined "agentic engineering" | Declared vibe coding passé, named the next era |
| Mar 2026 | Released autoresearch | Open-source proof of agentic engineering in ML research |
| Mar 2026 | Launched AgentHub | Agent-first collaboration platform — "GitHub for agents" |

Karpathy is not just an observer. He is the thread that connects deep learning research, real-world AI deployment at Tesla, OpenAI's foundational work, and the conceptual frameworks that name each era. When he coins a term, the industry listens.

DeepMind, AlphaGo, and Reinforcement Learning (2014–2016)

While Karpathy was at Stanford, Google acquired DeepMind in January 2014 for approximately $500 million. In March 2016, DeepMind's AlphaGo defeated world Go champion Lee Sedol 4-1 — a feat that many AI researchers had predicted was decades away.

AlphaGo's significance for the agentic engineering story: it demonstrated that AI could make decisions in complex, ambiguous environments with long-term consequences. Go has more possible board positions than atoms in the universe. AlphaGo learned to evaluate positions and plan sequences of moves — a precursor to the planning capabilities that modern AI agents would need.


The Transformer Paradigm (2017–2022)

"Attention Is All You Need" (2017)

In June 2017, eight Google researchers published a paper that would reshape the entire field: "Attention Is All You Need." The Transformer architecture they introduced replaced sequential processing with parallel attention mechanisms, enabling models to process entire sequences simultaneously.

The Transformer made everything that follows in this history possible — GPT, BERT, Claude, Gemini, and every AI agent that orchestrates them.

The same month the Transformer paper was published, Karpathy left OpenAI to become Tesla's Director of AI, where he would spend five years applying deep learning to real-world autonomous systems.

The GPT Series (2018–2022)

OpenAI used the Transformer to build the GPT (Generative Pre-trained Transformer) series:

| Model | Year | Parameters | Key Innovation |
|---|---|---|---|
| GPT-1 | 2018 | 117M | Proved unsupervised pre-training works |
| GPT-2 | 2019 | 1.5B | "Too dangerous to release" (initially withheld) |
| GPT-3 | 2020 | 175B | Few-shot learning, first signs of emergent behavior |
| InstructGPT | 2022 | — | RLHF alignment, followed instructions better |
| ChatGPT | Nov 2022 | — | 100M users in 2 months, fastest-growing consumer app ever |

ChatGPT's launch in November 2022 was the moment AI went mainstream. It reached 100 million users in two months — faster than TikTok (9 months) and Instagram (2.5 years). For the first time, anyone could have a conversation with an AI that felt genuinely intelligent.

But ChatGPT was a chatbot, not an agent. It could answer questions, not take actions. The gap between "impressive conversational AI" and "autonomous AI agent" would take another year to begin closing.

Anthropic CEO Dario Amodei drew this exact line in his interview with Nikhil Kamath (2026): "Coding is going away first. The broader task of software engineering will take longer." The elements that remain human — system design, understanding user demand, managing teams of AI models — are precisely the skills agentic engineering would later formalize.

The Academic Foundations of Agentic AI (2022)

Two academic papers published in 2022 laid the theoretical groundwork for everything that would follow:

Chain of Thought Prompting (Wei et al., 2022) — Researchers at Google demonstrated that prompting language models to "think step by step" dramatically improved performance on complex reasoning tasks. This was the first proof that LLMs could decompose problems into sequential steps — a prerequisite for any agent that needs to plan.

ReAct: Reasoning + Acting (Yao et al., 2022) — This paper introduced the agent loop that would power every subsequent AI agent framework: think → act → observe → repeat. ReAct showed that LLMs could synergize reasoning traces with tool use, overcoming hallucination by grounding responses in real-world interactions.

These papers were not consumer products. They were not viral tweets. But without Chain of Thought and ReAct, there is no AutoGPT, no LangChain, no Claude Code, and no agentic engineering.

The agent loop ReAct formalized: Perceive (read context, inputs, environment) → Reason (Chain of Thought decomposition) → Act (call tools, write code, search) → Observe (evaluate result against goal). If the goal is not met, the loop repeats; once it is, the agent returns the result.
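The ReAct loop can be sketched in a few lines of Python. The `reason` and `act` methods below are hypothetical stubs standing in for an LLM call and a tool call respectively — this illustrates the loop's shape, not any real framework's API:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    goal: str
    max_steps: int = 5                      # hard cap so a confused agent cannot loop forever
    trace: list = field(default_factory=list)

    def reason(self, observation):
        # Stand-in for an LLM call: pick the next action, or None when the goal is met.
        if observation is None:
            return {"tool": "search", "query": self.goal}
        return None

    def act(self, action):
        # Stand-in for a real tool call (search engine, code runner, API).
        return f"results for {action['query']}"

    def run(self):
        observation = None
        for _ in range(self.max_steps):
            action = self.reason(observation)           # Reason
            if action is None:
                return observation                      # goal met: return the result
            observation = self.act(action)              # Act
            self.trace.append((action, observation))    # Observe: ground the next step
        return observation

agent = Agent(goal="agent history")
result = agent.run()
```

The `trace` is the key ReAct idea: each observation is fed back into the next reasoning step, grounding the model in real tool output instead of letting it hallucinate.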


The Autonomous Agent Explosion (2023)

Toolformer: Machines Learn to Use Tools (February 2023)

In February 2023, Meta AI published Toolformer — a model that could teach itself which external tools (calculators, search engines, APIs) to call, when to call them, and how to incorporate results. This was the missing piece: language models that could not only reason but interact with the outside world.

AutoGPT: The Viral Proof of Concept (March 2023)

On March 30, 2023, game developer Toran Bruce Richards released AutoGPT — an open-source project that connected GPT-4 to a loop of planning, execution, and self-evaluation. AutoGPT could browse the web, write and execute code, manage files, and pursue multi-step goals with minimal human intervention.

The repository exploded. Within weeks, it had over 100,000 GitHub stars — one of the fastest-growing open-source projects in history.

AutoGPT was deeply flawed. It burned through API credits, got stuck in loops, and hallucinated confidently. But it proved something that academic papers could not: autonomous AI agents were not a research curiosity. They were a product category.

BabyAGI: The Minimalist Vision (April 2023)

Days after AutoGPT went viral, venture capitalist Yohei Nakajima released BabyAGI — a stripped-down Python script that demonstrated the core autonomous agent loop in just 140 lines of code. BabyAGI could create tasks, prioritize them, and execute them using GPT-4 and a vector database for memory.

If AutoGPT was the flashy demo, BabyAGI was the elegant proof that the agent pattern could be simple, composable, and practical.
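The BabyAGI pattern fits in a short sketch. This is a hedged reconstruction of the loop's shape, not Nakajima's script: the GPT-4 calls and the vector store are stubbed out.

```python
from collections import deque

def execute(task):
    return f"result of {task}"       # stub for a GPT-4 call

def create_tasks(result, objective):
    return []                        # stub: the real script asks the LLM for follow-ups

def prioritize(tasks, objective):
    return deque(sorted(tasks))      # stub for LLM-based reprioritization

def run(objective, initial_tasks, max_iters=10):
    tasks, results = deque(initial_tasks), []
    while tasks and len(results) < max_iters:
        task = tasks.popleft()
        result = execute(task)
        results.append(result)       # the real BabyAGI stores this in a vector DB
        tasks.extend(create_tasks(result, objective))
        tasks = prioritize(tasks, objective)
    return results
```

The whole pattern is three functions and a queue, which is exactly the point the original 140 lines made.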

LangChain: The Infrastructure Layer (2023)

Harrison Chase's LangChain emerged as the connective tissue of the agent ecosystem. What began as a library for chaining LLM calls evolved into a full orchestration framework with:

  • Agent abstractions for tool use and planning
  • Memory systems for maintaining conversation context
  • Retrieval-augmented generation (RAG) for grounding responses in documents
  • Integration with dozens of LLM providers and tools

LangChain's download numbers tell the story: 47+ million PyPI downloads and the largest community ecosystem in the agent space.

The Lilian Weng Blog Post (June 2023)

In June 2023, OpenAI researcher Lilian Weng published "LLM Powered Autonomous Agents" — a comprehensive blog post that became the definitive reference for how agent systems work. She formalized the architecture into four components:

  1. Planning — Task decomposition and self-reflection
  2. Memory — Short-term (context window) and long-term (vector databases)
  3. Tool use — APIs, code execution, web browsing
  4. Action — Executing plans in the real world

This framework became the blueprint that every subsequent agent platform would follow — including Taskade's AI Agents.
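Weng's four components map naturally onto a single object. A hedged sketch follows; the names are illustrative, not from her post.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    planning: list = field(default_factory=list)    # task decomposition steps
    short_term: list = field(default_factory=list)  # context-window memory
    long_term: dict = field(default_factory=dict)   # vector-store stand-in
    tools: dict = field(default_factory=dict)       # tool name -> callable

    def act(self, step):
        """Execute one planned step with a registered tool (the Action component)."""
        tool, arg = step
        result = self.tools[tool](arg)
        self.short_term.append(result)   # observations feed back into memory
        return result

agent = Agent(tools={"search": lambda q: f"results for {q}"})
```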

Project      | Launched | GitHub Stars | Key Innovation
AutoGPT      | Mar 2023 | 100K+        | First viral autonomous agent
BabyAGI      | Apr 2023 | 20K+         | Minimalist agent loop (140 lines)
LangChain    | 2023     | 94K+        | Agent orchestration framework
MetaGPT      | Mid 2023 | 48K+        | Multi-agent software company simulation
GPT-Engineer | Mid 2023 | 52K+        | Full codebase generation from prompts

Taskade AI agents — custom tools, slash commands, persistent memory for agentic workflows


The Infrastructure Year (2024)

If 2023 was the year of viral demos, 2024 was the year the industry built real infrastructure.

GPT-4o and the Reasoning Revolution (2024)

OpenAI's GPT-4o launched in May 2024 — the first truly multimodal model handling text, audio, and vision in real time. But the real paradigm shift came in September with o1-preview, OpenAI's first reasoning model that "thinks step by step" before answering.

This mattered enormously for agents: reasoning models could plan multi-step workflows, evaluate their own output, and course-correct — the exact capabilities that separate a useful agent from a hallucinating loop.

Devin: The First AI Software Engineer (March 2024)

On March 12, 2024, Cognition Labs announced Devin — marketed as "the world's first AI software engineer." Devin could plan and execute complex engineering tasks end-to-end, using a shell, code editor, and browser within a sandboxed environment.

Devin resolved 13.86% of real-world GitHub issues on the SWE-bench benchmark — far exceeding the previous state-of-the-art of 1.96%.

The reaction was polarizing. Some called it the beginning of the end for software engineering. Others pointed out that 13.86% was still failing 86% of the time. But Devin proved that autonomous coding agents were a real product category, not just an open-source experiment.

Anthropic's Model Context Protocol — MCP (November 2024)

In November 2024, Anthropic released the Model Context Protocol (MCP) — an open standard for connecting AI models to external tools and data sources. MCP defined how agents could securely interact with databases, APIs, file systems, and external services.

MCP was the USB-C of AI agents — a universal connector that made tools portable across platforms and reduced vendor lock-in. Its importance cannot be overstated: before MCP, every agent framework had its own proprietary tool integration. After MCP, tools became interoperable.

But adoption exposed a design problem. Jeremiah Lowin, creator of FastMCP and CEO of Prefect, observed that most early MCP servers simply mirrored CRUD operations — create_user, get_user, update_user, delete_user — which is "REST-brain" thinking. Lowin articulated the core principle that would define good MCP server design: design for outcomes, not operations. A single outcome-oriented tool (like check_order_status) can replace four or five CRUD tools, cutting token usage and reducing agent confusion. He also identified a critical performance threshold: agent quality degrades noticeably above approximately 50 tools, making curation essential. These design principles — flatten arguments, respect the token budget, curate ruthlessly, and treat errors as prompts for progressive disclosure — became the emerging best practices for the MCP ecosystem.
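Lowin's "outcomes, not operations" principle is easiest to see in code. The sketch below is plain Python, not a real MCP server; `check_order_status` and the order data are invented for illustration.

```python
# One outcome-oriented tool replacing a chain of CRUD tools.
# REST-brain: get_order, get_shipment, get_customer, get_tracking... each a
# separate tool the agent must chain, each spending tokens on its schema.

ORDERS = {"A-100": {"status": "shipped", "eta": "2026-03-12"}}  # fake data

def check_order_status(order_id: str) -> str:
    """Outcome-oriented: answer the question the agent is actually asked,
    with flat arguments (one string in, one string out)."""
    order = ORDERS.get(order_id)
    if order is None:
        # Errors double as prompts: tell the agent what to do next.
        return f"No order {order_id!r}. Ask the user to confirm the order ID."
    return f"Order {order_id} is {order['status']}, ETA {order['eta']}."
```

One tool call answers what would otherwise take four, which is how outcome-oriented design cuts both token usage and the tool count that degrades agents past roughly 50 entries.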

By March 2026, MCP had been adopted by OpenAI, Google DeepMind, Microsoft, and dozens of other companies. It was donated to the Linux Foundation's Agentic AI Foundation in December 2025.

Karpathy's LLM OS Vision (2024)

Throughout 2024, Andrej Karpathy developed his vision of the LLM Operating System — the idea that LLMs are not chatbots but the kernel process of a new computing paradigm. He described the system:

"LLMs not as a chatbot, but the kernel process of a new Operating System. It orchestrates input and output across modalities (text, audio, vision), code interpreter ability to write and run programs, browser/internet access, and embeddings database for files and internal memory storage and retrieval."

This framing was prophetic. Every major agent platform in 2025-2026 — Taskade Genesis, Cursor, Claude Code, Devin — implements some version of the LLM OS architecture.

The Competitive Landscape Crystallizes

Framework               | Category                 | Launch    | Key Innovation
LangGraph               | Enterprise orchestration | 2024      | Graph-based stateful agent workflows
CrewAI                  | Business automation      | 2024      | Role-based multi-agent systems
AutoGen (Microsoft)     | Research                 | 2023-2024 | Asynchronous multi-agent conversations
OpenAI Function Calling | API                      | 2023-2024 | Native tool use in GPT models
Anthropic MCP           | Standard                 | Nov 2024  | Universal agent-tool protocol
Devin (Cognition)       | Autonomous coder         | Mar 2024  | End-to-end software engineering

The Vibe Coding Phenomenon (2025)

February 2, 2025: The Tweet That Changed Everything

On February 2, 2025, Andrej Karpathy posted a tweet that would become the most influential statement about software development since "move fast and break things":

"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists."

He elaborated: "I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like 'decrease the padding on the sidebar by half' because I'm too lazy to find it. I 'Accept All' always, I don't read the diffs anymore."

The term went supernova. Within months:

  • Collins Dictionary named "vibe coding" its 2025 Word of the Year
  • The vibe coding market grew to $4.7 billion (projected $12.3B by 2027, 38% CAGR)
  • 63% of vibe coding users were non-developers
  • r/vibecoding grew to 153,000+ members
  • 25% of Y Combinator startups built 95% of their codebases using AI

Vibe coding gave permission. It told millions of people — many of them non-developers — that they could build software by describing what they wanted. The AI handles the code. You handle the vision.

Karpathy's Software 3.0 Framework (June 2025)

At Y Combinator's AI Startup School on June 17, 2025, Karpathy delivered a keynote titled "Software Is Changing (Again)" that formalized his thinking into the Software 3.0 framework:

Era          | Paradigm | Programming Interface                   | Who Programs
Software 1.0 | Code     | Explicit instructions (C, Python, Java) | Trained developers
Software 2.0 | Weights  | Data + optimization (neural networks)   | ML engineers
Software 3.0 | Prompts  | Natural language (English)              | Everyone

The key insight: LLMs are a new kind of programmable entity, and the programming language is natural language itself. This was not an incremental change — it was "the most profound shift in software development since the 1940s."

Karpathy's prescription: build "Iron Man suits" that augment expert capabilities, with a highly efficient "AI Generation → Human Verification" loop.

The Explosion of Vibe Coding Platforms (2025)

The vibe coding concept spawned an entire category of AI-powered development platforms:

Platform        | Category       | Key Metric               | Approach
Cursor          | AI code editor | $2B ARR in 24 months     | Background Agents in VS Code
Replit          | Cloud IDE      | 30M+ users               | Browser-based, instant deployment
Lovable         | App builder    | $100M ARR                | No-code, prompt-to-app
Bolt.new        | Web builder    | Rapid growth             | Instant web app generation
Taskade Genesis | AI workspace   | 150K+ apps built         | Agents + automations + workspace
Windsurf        | Code editor    | Acquired by OpenAI ($3B) | AI-first development
v0              | UI builder     | Vercel ecosystem         | React component generation

The Problems Surface (2025)

As vibe coding scaled, its limitations became impossible to ignore:

  1. Quality degradation — AI-generated code that "worked" on first test broke in edge cases, under load, or after updates
  2. Maintenance nightmare — Code nobody understands is code nobody can maintain
  3. Tech debt acceleration — Zoho CEO Sridhar Vembu's critique landed: "Vibe coding just piles up tech debt faster"
  4. Security vulnerabilities — Code generated without review contained injection vulnerabilities, leaked credentials, and insecure defaults
  5. The 80% problem — AI agents reliably handle 80% of a task but struggle with the remaining 20% that determines production readiness

Google's Addy Osmani crystallized the 80% problem: agents produce impressive first drafts that fail at the edges. The gap between "demo-quality" and "production-quality" became the central challenge.

Karpathy's 2025 LLM Year in Review (December 2025)

On December 19, 2025, Karpathy published his annual review identifying six paradigm shifts:

  1. RLVR (Reinforcement Learning from Verifiable Rewards) — The new dominant training methodology replacing RLHF
  2. Ghosts vs. Animals — LLMs are "summoned ghosts, not evolved animals" — optimized under entirely different constraints than biological intelligence
  3. Cursor / New LLM App Layer — Revealed a distinct bundling and orchestration layer for LLM applications
  4. Claude Code / AI on Your Computer — First convincing demonstration of extended agentic problem-solving: "a little spirit/ghost that lives on your computer"
  5. Vibe Coding — Code became "free, ephemeral, malleable, discardable after single use"
  6. Nano Banana / LLM GUI — First hints of graphical interfaces for LLMs

His conclusion about coding agents: they had "crossed a qualitative threshold since December — from brittle demos to sustained, long-horizon task completion with coherence and tenacity."

He described delegating an entire local deployment — SSH keys, vLLM, model download, benchmarking, server endpoint, UI, systemd service, and report — with minimal intervention. The future was not typing code. It was orchestrating agents.

The Convergence on Harness Engineering (2026)

By early 2026, the emerging discipline of harness engineering began crystallizing from multiple independent sources. OpenAI published a blog post titled "Harness Engineering." Anthropic released a guide on building effective harnesses for long-running agents. Manus (the AI company later acquired by Meta) published their context engineering lessons after rebuilding their entire agent framework five times in six months.

The term "harness" describes everything wrapped around the model: what context it can see, what tools it has access to, how it recovers from failures, and how it maintains state across sessions. The evolution was clear: prompt engineering (optimize a single turn) gave way to context engineering (optimize a single session), which gave way to harness engineering (design systems that work across sessions, agents, and workflows).

The EPICS Agent benchmark — which tests AI on real professional tasks that take humans 1-2 hours — revealed why this matters. The best frontier model completed those tasks only 24% of the time, despite scoring above 90% on standard benchmarks. After eight attempts: only ~40%. The failures were not about model intelligence. The agents could reason through problems fine. They failed at execution and orchestration — getting lost after too many steps, looping on failed approaches, losing track of the original objective.

Three of the most successful agent systems arrived at the same insight from completely different directions:

  1. OpenAI Codex: Layered architecture — orchestrator plans, executive handles tasks, recovery layer catches failures
  2. Claude Code: Minimal harness — just four tools (read, write, edit, bash) with extensibility via MCP and skills
  3. Manus: Reduce-offload-isolate — shrink context, use file system as memory, spin up sub-agents, bring back summaries

All three converged on the same conclusion: the harness matters more than the model. Richard Sutton's bitter lesson — that approaches scaling with compute always beat hand-engineered knowledge — applied directly: as models get smarter, harnesses should get simpler, not more complex.
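Manus's reduce-offload-isolate idea reduces to two small moves: park bulky output on the file system and let sub-agents return only summaries. A hedged sketch with invented names and thresholds:

```python
import pathlib

WORKDIR = pathlib.Path(".")

def offload(name, text, budget=200):
    """Offload: park bulky output on disk, keep only a pointer in context."""
    if len(text) <= budget:
        return text                       # small enough to stay in context
    path = WORKDIR / f"{name}.txt"
    path.write_text(text)
    return f"[{len(text)} chars offloaded to {path}]"

def isolate(task, big_input):
    """Isolate: run heavy work in a 'sub-agent', bring back only a summary."""
    summary = big_input[:60] + "..."      # stub for the sub-agent's summary
    return f"{task}: {summary}"
```

Both moves shrink the main agent's context, which is the "reduce" half of the pattern.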

Taskade workspace DNA — Memory, Intelligence, Execution working together


The Agentic Engineering Era (2026)

February 8, 2026: Karpathy Declares Vibe Coding Passé

Exactly one year after coining vibe coding, Karpathy declared his own term obsolete:

"LLMs have gotten much smarter. Vibe coding is passe."

His replacement — agentic engineering — was deliberately chosen:

"Agentic, because the new default is that you are not writing the code directly 99% of the time. You are orchestrating agents who do and acting as oversight. Engineering, to emphasize that there is an art and science and expertise to it."

The key phrase: "orchestrating agents who do and acting as oversight." The human role shifted from code writer to system architect, agent director, and quality gatekeeper.

Vibe coding (2025): human prompts → AI generates code → "Accept All" → hope it works. Agentic engineering (2026): human architects spec → AI agents implement → human reviews rigorously (iterate until approved) → test + deploy → agents maintain.

Why the Name Change Matters

This was not semantic wordplay. The shift from "vibe coding" to "agentic engineering" represented three critical changes:

Dimension               | Vibe Coding (2025)          | Agentic Engineering (2026)
Philosophy              | "Forget the code exists"    | "Own the architecture, delegate the implementation"
Human role              | Prompter                    | Architect + reviewer + orchestrator
Quality bar             | "Does it seem to work?"     | "Does it pass the test suite?"
AI role                 | Code generator              | Autonomous agent with tools
Maintenance             | "I'll prompt it again later" | Persistent memory + continuous testing
Professional legitimacy | Awkward in job descriptions | "Agentic Engineer" on your resume
Accountability          | Unclear                     | Human owns the system

Addy Osmani's Principles (February 2026)

Google engineering lead Addy Osmani published the most comprehensive framework for agentic engineering practice, which quickly became industry consensus:

1. Plan Before Prompting — Write a specification before touching an AI agent. Design docs, structured prompts, or task breakdowns — the spec is the highest-leverage artifact.

2. Direct with Precision — Give agents well-scoped tasks. The skill is decomposition: breaking a project into agent-sized work packages with clear inputs, outputs, and success criteria.

3. Review Rigorously — Evaluate AI output with the same rigor you would apply to a human engineer's PR. Do not assume the agent got it right because it looks right.

4. Test Relentlessly — "The single biggest differentiator between agentic engineering and vibe coding is testing." Test suites are deterministic validation for non-deterministic generation.

5. Own the System — Maintain documentation, use version control and CI, monitor production. The AI accelerates the work; you are responsible for the system.

The Factory Model: From Coder to Conductor

Osmani also published "The Factory Model," describing the generational evolution of AI coding tools:

Generation | Model                    | Human Role                       | Example
1st Gen    | Accelerated autocomplete | Writer with suggestions          | GitHub Copilot (early)
2nd Gen    | Synchronous agents       | Director with real-time review   | Cursor, Claude Code
3rd Gen    | Autonomous agents        | Architect with checkpoint review | Background Agents, Devin 2.0

The critical insight: "You are no longer just writing code. You are building the factory that builds your software."

And the data backed it up:

  • New website creation: +40% year-over-year
  • New iOS apps: +49% increase
  • GitHub code pushes in US: +35% jump

These metrics had been flat for years. Agentic engineering was not just changing how software was built — it was changing how much software existed.

The Four Species of AI Agents (2026)

As agentic engineering matured, a critical realization emerged: saying "agents" was too vague. Not all agents are the same — and using the wrong species for the wrong work is one of the most common and costly mistakes in production AI systems.

All four species share the same primitive — LLM + tools + feedback loop. What differs is the construction of that loop: the context, scope, human involvement, and optimization target. Getting this taxonomy right is fundamental to practicing agentic engineering effectively.

Species         | Scale                     | Human Role                                | Quality Gate                           | When to Use
Coding Harness  | Individual task           | Manager — decomposes, delegates, reviews  | Human judgment                         | Your judgment is the gold standard
Project Harness | Team / project            | Architect — involved at beginning and end | Planner agent + human review           | 8-20 developers' worth of complexity
Dark Factory    | Fully autonomous pipeline | Spec writer + evaluator                   | Automated eval + optional human review | You trust the evals, want to minimize bottlenecks
Auto Research   | Metric optimization       | Goal setter + result reviewer             | Metric improvement                     | You have a measurable rate to optimize

Coding Harnesses are the simplest pattern — an agent taking the place of a developer in an engineering process. Claude Code, Codex, and the Peter Steinberger model all operate here. The critical skill is decomposition: breaking a big problem into well-defined chunks, each given to a single-threaded agent. Karpathy runs his agents 16 hours a day; Steinberger manages multiple agents simultaneously across 10+ repository checkouts.

Project Harnesses extend the pattern to team-scale work. Cursor proved this across browsers and compilers — millions of lines of code — using a planner agent that manages tasks, keeps notes, tracks memory, and evaluates executor work. Short-running "grunt" agents are spun up for exactly one problem, then disposed. The critical learning: Cursor tried three levels of management hierarchy and it failed. Simple scales well with agents.

Dark Factories remove humans from the middle entirely. Spec goes in, software comes out. Humans are heavily involved at the top (design, requirements, excellent specifications) and at the end (verifying evals, code review for accountability), but the system runs autonomously in between. The name comes from Chinese automated factories where the lights are off — robots work end-to-end. Amazon learned the risks the hard way when AI-generated incidents from junior engineers triggered a company-wide review by senior and principal engineers.

Auto Research is a different species entirely — descended from classical machine learning, not software engineering. The agent climbs a hill by relentlessly running experiments to optimize a specific metric. Shopify CEO Tobi Lutke used it to make a 20-year-old codebase 53% faster overnight. Karpathy's autoresearch ran 700 experiments in two days. The critical distinction: is your problem software-shaped or metric-shaped? If you have a rate to optimize, use auto research. If you need working software, use a harness.

A fifth pattern — Orchestration — routes work across agents with genuinely specialized roles (researcher → writer → editor, or ticket pickup → research → resolution). Frameworks like LangGraph and CrewAI serve this pattern. The coordination overhead only pays off at scale — 10,000+ items, not 100.

The four species of AI agents: 🔧 Coding Harness (individual tasks), 🏗️ Project Harness (team-scale projects), 🏭 Dark Factory (spec in → software out, the human middle removed), 📈 Auto Research (metric optimization, a differently shaped problem).

How to Pick the Right Species: The Decision Flowchart

The most common mistake teams make is using the wrong agent species for the wrong kind of work. Use this decision tree:

What kind of problem do you have?

  • Is it a metric you want to optimize (conversion rate, latency, cost)? → 📈 Auto Research: run experiment loops.
  • No, I need working software → how much human oversight do you need?
  • Minimal, I trust the evals → 🏭 Dark Factory: spec in → software out.
  • Full control, I review everything → how complex is the project?
  • Single task or feature → 🔧 Coding Harness (Claude Code, Codex).
  • Team-scale, multi-file → 🏗️ Project Harness (planner + executors).
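The same decision tree can be encoded as a function. This is an illustrative mapping of the flowchart's labels, not an official API.

```python
def pick_species(metric_shaped, oversight, complexity=None):
    """Pick an agent species.
    metric_shaped: True if the problem is a rate to optimize.
    oversight: 'full' or 'minimal'.
    complexity: 'single-task' or 'team-scale' (used when oversight is 'full')."""
    if metric_shaped:
        return "Auto Research"        # a rate to optimize: run experiments
    if oversight == "minimal":
        return "Dark Factory"         # trust the evals, spec in -> software out
    if complexity == "single-task":
        return "Coding Harness"       # Claude Code, Codex
    return "Project Harness"          # planner + executor agents
```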

Real-World Agent Species in Production (2026)

Company                  | Agent Species   | What They Built                             | Result
Shopify (Tobi Lutke)     | Auto Research   | Optimized 20-year-old Liquid framework      | 53% faster runtime overnight
Anthropic (Boris Cherny) | Coding Harness  | Claude Code — multiple parallel instances   | 70% productivity gain per engineer
Cursor                   | Project Harness | Browser + compiler via planner-executor     | Millions of lines, shipped to production
OpenAI (Sherwin Wu)      | Coding Harness  | 95% of engineers on Codex daily             | 70% more PRs from agentic-leaning engineers
Monday.com (Eran Zinman) | Dark Factory    | Replaced 100-person SDR team with AI agents | Response time: 24h → 3 minutes
Stripe                   | Coding Harness  | Agent-authored PRs at scale                 | 1,000+ PRs/week merged from agents
Karpathy                 | Auto Research   | Autoresearch for GPT-2 optimization         | 700 experiments in 2 days, 11% speed gain

Where Taskade Genesis Fits in the Taxonomy

Taskade Genesis operates as a runtime dark factory — but with a critical difference from code-generating dark factories. Traditional dark factories produce code that still needs deployment, hosting, and maintenance. Genesis produces deployed, living applications with AI agents, automations, and a database built in.

Traditional dark factory: spec → agent codes → code files → human deploys → human maintains. Genesis: prompt → Genesis builds → live app + agents + automations + database.

The Workspace DNA architecture maps directly to the species taxonomy: Memory (projects and databases) provides the context that all four species need. Intelligence (AI agents with 22+ built-in tools) provides the execution engine. Execution (automations with 100+ integrations) provides the reliable workflow layer. When a user prompts Genesis, the system acts as an integrated dark factory where the "spec" is the prompt, the "eval" is the live application, and the "human review" is the builder iterating in real time.

For teams that want to experience agentic engineering without building their own agent infrastructure — no harness configuration, no prompt engineering, no deployment pipeline — Genesis is the fastest path from intent to deployed system. Over 150,000 apps built and counting.

The Anti-Patterns: What Goes Wrong

The anti-patterns are just as important as the patterns:

Anti-Pattern                                    | Why It Fails                                                                 | What to Do Instead
Using auto research to build software           | Auto research optimizes metrics; it does not produce working code            | Use a coding harness or dark factory
Calling individual assistants a "dark factory"  | Going to make coffee for 20 min ≠ autonomous pipeline                        | Be honest about human involvement level
Adding complexity to agent architectures        | Cursor tried 3 management levels and failed; Manus rebuilt 5 times, got simpler | Keep the harness simple; complexity kills agents
Skipping decomposition for individual harnesses | Decomposition is the skill                                                   | Break problems into agent-sized chunks first
Using orchestration at low scale                | Coordination overhead exceeds value under 1,000 items                        | Use a simple coding harness instead

The deeper lesson: the art of building good agents is often the art of finding different simple configurations that enable the agent to do the particular work you have in front of you. Frame your work around making it easy for the agent — not around keeping the human at the center of everything.

OpenAI's Internal Evidence

The shift from coder to conductor is not theoretical — it is already the default at the companies building the models themselves. Sherwin Wu, head of engineering for OpenAI's API and developer platform, shared that 95% of OpenAI engineers use Codex daily and 100% of PRs are reviewed by Codex. Engineers who lean into agentic tools open 70% more PRs than those who do not, and the gap is widening.

"Engineers are becoming tech leads. They're managing fleets and fleets of agents. It literally feels like we're wizards casting all these spells. And these spells are kind of like going out and doing things for you." — Sherwin Wu, OpenAI

Wu described engineers running 10 to 20 parallel Codex threads simultaneously — not actively coding, but steering agents, checking output, and providing feedback. One internal team is maintaining a 100% Codex-written codebase with no human escape hatch, forcing them to solve the exact context and documentation problems that agentic engineering principles address.

The biggest lesson from that experiment: when agents fail, the problem is almost always context — underspecified instructions or missing tribal knowledge. The fix is encoding that knowledge into the codebase via documentation, .md files, and structured code comments — exactly the kind of specification-first discipline that Osmani's five principles demand.

The pace of reinvention required is staggering. As Harry Stebbings observed on the 20VC podcast (2026): "The prize for winning is to reinvent the company from scratch and the product from scratch every 6 to 9 months." Companies that treat agentic engineering as a one-time adoption rather than a continuous discipline will fall behind.

Inside Claude Code: Building for the Model 6 Months From Now

Boris Cherny, the creator of Claude Code and a former Meta principal engineer, revealed a design philosophy that captures the essence of agentic engineering. In a 2025 interview, Cherny described the principle that guides Claude Code's development:

"Don't build for the model of today. Build for the model 6 months from now."

The product should get better as models improve — without changing any code. This is the opposite of traditional software engineering, where features are hand-built for current capabilities. Claude Code's architecture is designed so that smarter models automatically unlock better agentic workflows.

Cherny also described how agentic engineering has already transformed Anthropic internally: even though the company tripled in size, productivity per engineer grew ~70% because of Claude Code. Engineers run multiple Claude Code instances in parallel, let them work for hours, and return to completed PRs. Cherny gives agents tools like Puppeteer so they can see UI and self-correct — exactly the kind of feedback loop that distinguishes agentic engineering from passive code generation.

The hiring philosophy at Anthropic reinforces the shift. Cherny's Claude Code team recruits generalists — engineers who code, do product work, design, and talk to users:

"Our product managers code, our data scientists code, our user researchers code a little bit. I just love these generalists."

This is Osmani's "coder to conductor" transition made concrete. When the AI handles most implementation, the engineer who can think across product, design, and infrastructure becomes the highest-leverage contributor. Cherny's career arc — from building Undux (React state management) and writing the TypeScript book, to directing AI agents at Anthropic — is itself the agentic engineering story in miniature.

One more principle from Cherny that crystallizes the discipline: latent demand — the most important principle in product development. At Meta, 40% of Facebook Group posts were buy/sell activity. Users were already doing commerce; Marketplace just formalized it. The same pattern drives agentic engineering adoption: developers were already copy-pasting code from ChatGPT into their editors. Claude Code just formalized the workflow.

"You can never get people to do something they do not yet do. Find the intent they have and steer it." — Boris Cherny


The Standards War (Late 2025 – 2026)

The Agentic AI Foundation — AAIF (December 2025)

On December 9, 2025, the Linux Foundation announced the formation of the Agentic AI Foundation (AAIF) — the first neutral governance body for AI agent standards.

Founding contributions:

  • Anthropic → Model Context Protocol (MCP)
  • Block → goose (open-source local-first agent framework)
  • OpenAI → AGENTS.md (project-specific guidance standard)

Platinum members: AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, and OpenAI.

This was unprecedented. The companies building the most advanced AI systems — companies that compete fiercely on model quality — agreed to collaborate on the standards that connect those models to the real world.

Google's Agent2Agent Protocol — A2A (2025)

Google launched the Agent2Agent (A2A) protocol in April 2025 with support from over 50 partners including Salesforce, SAP, and ServiceNow. While MCP standardizes how agents connect to tools, A2A standardizes how agents communicate with each other.

The emerging stack:

Layer            | Standard  | Purpose                                    | Governed By
Agent-to-Tool    | MCP       | Connect agents to external tools and data  | AAIF (Linux Foundation)
Agent-to-Agent   | A2A       | Inter-agent communication and coordination | Linux Foundation
Agent-to-Project | AGENTS.md | Project-specific agent configuration       | AAIF

The Enterprise Adoption Wave

Gartner and McKinsey data paint a clear picture of where the industry is heading:

Metric                                          | Value                                   | Source
Enterprise apps with AI agents by end of 2026   | 40% (up from <5% in 2025)               | Gartner
Enterprise software with agentic AI by 2028     | 33%                                     | Gartner
Agentic AI annual value potential               | $2.6T–$4.4T                             | McKinsey
Median ROI for mature implementations           | 540%                                    | McKinsey
Organizations investing in agentic AI           | 61% (19% significant, 42% conservative) | Gartner
Agentic AI projects canceled by end of 2027     | >40%                                    | Gartner
Day-to-day decisions made by agentic AI by 2028 | 15% (up from 0% in 2024)                | Gartner

The cancellation figure is sobering: Gartner predicts over 40% of agentic AI projects will be canceled by the end of 2027. Agentic engineering is not magic. Without the discipline Karpathy and Osmani describe, agent projects fail.


Karpathy's Autoresearch: Agentic Engineering in Action (March 2026)

On March 7, 2026, Karpathy open-sourced autoresearch — a 630-line Python tool that lets AI agents run autonomous ML experiments on a single GPU. It was not just a tool release. It was a live demonstration of every agentic engineering principle.

How It Works

Autoresearch gives an AI agent a small but real LLM training setup and lets it experiment overnight:

  1. Agent reads human-provided instructions (the spec)
  2. Agent modifies training code — architecture, optimizers, hyperparameters
  3. Training runs for exactly 5 minutes per experiment
  4. Agent evaluates results against an unambiguous metric: validation bits-per-byte (lower is better)
  5. Agent keeps or discards the change
  6. Repeat — approximately 12 experiments per hour, ~100 experiments overnight

    AUTORESEARCH: AGENTIC ENGINEERING IN PRACTICE
    ══════════════════════════════════════════════

HUMAN (Agentic Engineer)           AI AGENT
┌──────────────────────┐           ┌──────────────────────┐
│ 1. Write spec        │──────────►│ 2. Read instructions │
│ 2. Set metric        │           │ 3. Modify code       │
│ 3. Review results    │◄──────────│ 4. Train (5 min)     │
│ 4. Adjust direction  │           │ 5. Evaluate metric   │
│                      │           │ 6. Keep or discard   │
│                      │           │ 7. Repeat x100       │
└──────────────────────┘           └──────────────────────┘

Principles demonstrated:
✓ Plan before prompting (human writes spec)
✓ Direct with precision (5-min time budget, single metric)
✓ Test relentlessly (every experiment evaluated)
✓ Own the system (human reviews final results)
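The human/agent split above compresses into a few lines of Python. This is a minimal illustrative sketch, not Karpathy's actual implementation — `propose` and `evaluate` are hypothetical stand-ins for the agent's code edit and the fixed metric script:

```python
import random

def autoresearch(propose, evaluate, baseline, n_experiments=100):
    """Minimal keep-or-discard loop: a change survives only if it improves
    the metric (lower is better, e.g. validation bits-per-byte)."""
    best, best_score = baseline, evaluate(baseline)
    for _ in range(n_experiments):
        candidate = propose(best)        # agent modifies the one editable artifact
        score = evaluate(candidate)      # fixed evaluation the agent cannot touch
        if score < best_score:           # keep the change...
            best, best_score = candidate, score
    return best, best_score              # ...otherwise discard and repeat

# Toy stand-ins: the "code" is a single number; the metric is distance from 3.0.
propose = lambda x: x + random.uniform(-0.5, 0.5)
evaluate = lambda x: abs(x - 3.0)
best, score = autoresearch(propose, evaluate, baseline=0.0)
```

Because every candidate is scored by the same untouchable `evaluate`, the loop can only move the metric in the right direction; all of the interesting engineering lives in what `propose` and `evaluate` contain.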

The Three-File Architecture: Why Autoresearch Works

Autoresearch's elegance lies in a strict three-file constraint that prevents the agent from gaming its own evaluation:

  1. program.md — The human-written instruction file. Defines the goal, constraints, and rules the agent must follow. This is the most important file — it is where the human sets the objective. Karpathy optimized his own program.md extensively before letting the agent run.

  2. train.py — The one file the agent can modify. This could be training code, a configuration, a prompt template, a marketing script — literally anything you want optimized. The constraint is crucial: one file, not two, not zero.

  3. prepare.py — The evaluation script the agent cannot touch. This defines what "better" means. Without this restriction, the agent could rewrite the scoring function to fake its results. The metric must be unambiguous and automatically computable.

The fixed 5-minute time budget per experiment is equally critical. By giving every experiment the same compute budget, the system ensures fair comparison — only the raw quality of the idea wins, not how long the agent trains. As Karpathy explains: if you give one applicant seven days and another seven minutes, the results are meaningless. Equal time makes every experiment directly comparable.

TL;DR: One file to change, one metric to chase, one time budget per experiment. If you can score it, you can auto-research it.

What the Agent Actually Found

The results were remarkable — not just for the improvements, but for what they revealed about agent-driven research:

  • 56% improvement in validation bits-per-byte (val_bpb) on the Tiny Stories dataset — a metric where even single-digit gains are considered significant in language modeling
  • The agent discovered and fixed bugs in the training code that humans had missed for years — subtle issues in data loading and gradient accumulation that only surfaced through systematic experimentation
  • Running on a single consumer GPU, the agent matched or exceeded results that would typically require a researcher spending days or weeks of manual hyperparameter tuning
  • The agent's experimentation log showed it developing what Karpathy called "intuition by brute force" — trying architectural modifications, learning rate schedules, and tokenization changes that a human researcher might dismiss but that yielded measurable gains
AUTORESEARCH RESULTS (TINY STORIES DATASET)
═══════════════════════════════════════════

Metric: val_bpb (validation bits-per-byte — lower = better)

Baseline (human config)  ████████████████████████████ 1.00x
After 50 experiments     ██████████████████████       0.78x
After 100 experiments    ████████████████             0.62x
Final (agent-optimized)  ████████████                 0.44x ← 56% improvement

Key findings by the agent:
✓ Fixed data loading bug (missed by humans for years)
✓ Discovered non-obvious learning rate schedule
✓ Identified optimal tokenization strategy
✓ Found architecture modifications humans wouldn't try

The most telling metric was not the improvement itself but the prediction accuracy: on a held-out set of Tiny Stories completions, the agent-optimized model's predictions were nearly indistinguishable from human predictions — approaching the theoretical floor of what is predictable given the randomness inherent in language.
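Bits-per-byte itself is nothing exotic: it is the model's cross-entropy loss converted from nats to bits and normalized by bytes of raw text rather than tokens, which makes runs with different tokenizers directly comparable. A minimal version (the function name and example numbers are illustrative):

```python
import math

def bits_per_byte(mean_loss_nats, n_tokens, n_bytes):
    """Convert mean cross-entropy (nats per token) into bits per byte of text.
    Lower is better; normalizing by bytes makes the number tokenizer-independent."""
    total_bits = mean_loss_nats * n_tokens / math.log(2)  # nats -> bits
    return total_bits / n_bytes

# Example: ~2 bits of loss per token, with each token covering 4 bytes of text
bpb = bits_per_byte(mean_loss_nats=1.386, n_tokens=1000, n_bytes=4000)  # ≈ 0.5
```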

Real-World Impact

Following the release, Shopify CEO Tobi Lutke adapted the autoresearch framework internally. An agent-optimized smaller model achieved a 19% improvement in validation scores, eventually outperforming a larger model configured through standard manual methods.

This was agentic engineering working exactly as Karpathy described: human sets the goal, agent executes autonomously, results are objectively measurable, and the human reviews and adjusts direction. The autoresearch experiment proved something deeper: agents are not just automating existing research workflows — they are finding things humans miss, because they test hypotheses a human would dismiss as unlikely and they never get tired of systematic iteration.

Beyond Text: Autoresearch for Music Generation

The autoresearch framework proved its generality when developers applied it beyond text to ABC notation sheet music — training a model on the Sanderwoods Irishman dataset of traditional Irish folk music. The results demonstrated that autoresearch's power extends to any domain with a measurable objective:

  • Baseline: val BPB of 2.08 (model essentially lost, producing garbled notation that sounded like "a child running on a piano")
  • After 18 experiments: val BPB dropped to 0.97 — a 53% improvement
  • The optimized model produced coherent melodies with proper chord progressions, bar structure, and musical rhythm
  • Key insight: the optimal strategy for small, structured, low-entropy datasets was making the model smaller and faster to see the data more times within the 5-minute budget, rather than building a larger model that barely completes one pass

The winning configuration: aspect ratio of 32, head dimension of 64, batch size of 2^14, depth of 8, and 5% warm-up — discovered entirely by the agent through systematic experimentation. The biggest single win came from reducing batch size (4x more optimizer steps), not from increasing model capacity. This counterintuitive finding — that for structured data, throughput beats capacity — is exactly the kind of insight agents find because they test hypotheses humans dismiss.

Autoresearch as a Work Primitive

The deeper significance of autoresearch extends beyond ML research. The concept of an iterative agentic loop — define goal, execute experiment, measure result, keep or discard, repeat — is emerging as a new fundamental work primitive.

Work primitives are basic building blocks so fundamental they show up everywhere across roles and industries. New ones don't appear often. The last major primitive was arguably the spreadsheet (1979). Autoresearch demonstrates that agentic loops may be the next one:

  • A/B testing for marketing — agent writes landing page variants, sends traffic, measures conversions, keeps winners, iterates indefinitely
  • Niche optimization agents — Amazon listing experimenter, email sequence tuner for realtors, SaaS pricing optimizer — each a packaged autoresearch loop tuned for one painful niche
  • Trading signal generation — agent runs backtests of simple trading rules overnight, keeps promising strategies
  • CRM lead qualification — agent tests scoring rules and follow-up messages against conversion data, surfaces only high-value leads
  • Internal productivity labs — define KPIs (response time, close rate, ticket resolution), let agents iterate on workflows, templates, and routing rules

The scale of this shift is staggering. As marketing strategist Eric Siu observed: "Most marketing teams run 30 experiments per year. The next generation will run 36,000 — roughly 100 per day." Each experiment follows the same autoresearch pattern: agent modifies the copy, measures conversions, decides whether to keep or discard.

Practical autoresearch use cases emerging in 2026:

  • Website performance optimization — Agent tweaks CSS, JavaScript, and asset loading; measures page load time via Puppeteer benchmarks; keeps improvements, reverts regressions. In one demo, a portfolio site went from 50ms to 25ms load time — a 50% improvement — in under 4 minutes of autonomous iteration
  • Trading strategy refinement — Agent adjusts buy/sell rules and risk parameters across years of historical market data, scoring each experiment by its Sharpe ratio (risk-adjusted returns). Hundreds of strategies tested overnight while the trader sleeps
  • Prompt engineering at scale — Agent fine-tunes system instructions behind AI agents, testing different phrasing, tone levels (beginner, PhD-level), even different languages to find which prompt configuration produces the best task completion rate
  • Open-source model compression — Developers point autoresearch at open-source LLMs to find configurations that run faster on consumer hardware. The prediction: Sonnet-quality models running on iPhones within months, discovered entirely through agent-driven experimentation
  • Email and ad creative testing — Agent generates subject lines, body copy, and CTA variants; sends to test segments; measures open rates and click-through; iterates 100x faster than any human marketing team

The key to successful autoresearch in business contexts is the metric hierarchy — a three-tier scoring system that prevents agents from gaming shallow metrics:

| Tier | Role | Example (Email Marketing) | Example (Landing Page) |
| --- | --- | --- | --- |
| Primary | The metric you optimize for | Reply rate | Conversion rate |
| Secondary | Supporting metrics that validate quality | Open rate, click-through rate | Time on page, scroll depth |
| Guardrail | Hard limits the agent cannot violate | Unsubscribe rate < 2%, spam rate < 0.1% | Bounce rate < 40%, load time < 3s |

Without guardrail metrics, agents find shortcuts — subject lines that maximize opens but tank conversions, or landing pages that convert but load so slowly they lose 60% of visitors. The metric hierarchy is intent engineering applied to autoresearch: primary metrics define what you want, guardrail metrics define what you will not sacrifice to get it.
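In code, the hierarchy collapses to a scoring function that disqualifies any variant violating a guardrail before primary metrics are compared. A sketch using the email-marketing example; the metric names and limits are illustrative:

```python
GUARDRAILS = {                  # hard limits the agent cannot violate
    "unsubscribe_rate": 0.02,   # must stay below 2%
    "spam_rate": 0.001,         # must stay below 0.1%
}

def score_variant(metrics):
    """Return the primary metric (reply rate) if every guardrail holds,
    otherwise None so the variant is discarded outright."""
    for name, limit in GUARDRAILS.items():
        if metrics[name] >= limit:
            return None
    return metrics["reply_rate"]

variants = [
    {"reply_rate": 0.09, "unsubscribe_rate": 0.03, "spam_rate": 0.0},  # gamed: high replies, churned list
    {"reply_rate": 0.06, "unsubscribe_rate": 0.01, "spam_rate": 0.0},  # clean
]
winner = max(variants,
             key=lambda m: s if (s := score_variant(m)) is not None else -1.0)
```

The first variant gamed the primary metric, so the guardrail disqualifies it and the clean variant wins despite a lower reply rate.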

A practical 4-week autoresearch implementation roadmap:

| Week | Focus | Deliverable |
| --- | --- | --- |
| Week 1 | Define metric + baseline | Primary metric chosen, guardrail limits set, baseline measured across 7 days |
| Week 2 | Build the loop | Agent configured with one editable variable, evaluation automated, first 50 experiments run |
| Week 3 | Analyze + refine | Review winning experiments, adjust metric hierarchy if guardrails triggered, expand to secondary variables |
| Week 4 | Scale + systematize | Move from single variable to multi-variable optimization, document learnings, share pattern with team |

In Taskade Genesis, this maps to: Week 1 — create a project with your baseline data. Week 2 — train an AI agent to modify and test one variable. Week 3 — set up automations to run the loop on schedule. Week 4 — expand the workspace with additional agents for multi-variable experiments.

Stripe CEO Patrick Collison and Shopify CEO Tobi Lutke have both publicly endorsed the pattern — recognizing that autoresearch is not limited to ML but applies to any measurable business process.

Shopify CEO Tobi Lutke captured this shift: "Auto research works even better for optimizing any piece of software. Make an auto folder. Add a program.md and a bench script. Make a branch and let it rip."

[Diagram: the autoresearch loop — Define Goal + Metric → Agent Runs Experiment → Measure Result → Keep or Discard — applied across ML research (val BPB ↓56%), music generation (val BPB ↓53%), A/B testing (conversion rate), trading signals (backtest returns), CRM optimization (lead conversion), and productivity apps (KPI improvement)]

The pattern is the same everywhere: human defines the objective and evaluation metric, agent executes the search autonomously, results are measured against ground truth. The only things that change are the domain, the search space, and the metric. This is agentic engineering distilled to its essence.

The Three Conditions (And Where Autoresearch Fails)

Autoresearch works when three conditions are met simultaneously:

  1. A clear metric — One number with a clear direction (lower latency, higher conversion rate, better Sharpe ratio). Not a committee vote, not a feeling, not "does this look good?"
  2. An automated evaluation — No human in the loop during the experiment cycle. If you need a human to judge each result, the loop runs at human speed and loses its power. The evaluation must be scriptable.
  3. One file the agent can change — A single, bounded search space. Multiple files create combinatorial explosion that agents handle poorly.

Remove any one condition and the loop breaks:

| Missing Condition | What Happens |
| --- | --- |
| No clear metric | Agent optimizes in a random direction with high confidence |
| Human in the loop | Loop slows to human speed; no longer runs while you sleep |
| Multiple files to edit | Combinatorial search space; agent makes conflicting changes |

Where autoresearch fails: Brand design, UX feel, pricing strategy (for low-traffic sites), editorial voice — anything where "better" is subjective. If the success criterion is a judgment call or a feeling, the agent cannot tell what is working. It will optimize confidently in the wrong direction.

The key insight: if you give it a bad metric, it will very confidently optimize the wrong thing. Choosing the right metric is the human skill that makes autoresearch valuable — and the skill that will separate practitioners from amateurs in the agentic engineering era.

AgentHub: GitHub for Agents

Following autoresearch, Karpathy launched AgentHub — an agent-first collaboration platform described as "GitHub for agents." Where GitHub organizes human collaboration around branches, PRs, and merges, AgentHub strips all of that away:

  • No main branch — a sprawling DAG of commits in every direction
  • No PRs or merges — agents commit directly
  • Message board — agents coordinate via a shared message board rather than code review
  • First use case: autoresearch, but designed to be far more general

AgentHub represents a vision where agent swarms work on the same codebase simultaneously, each exploring different directions. The first use case was autoresearch — multiple agents running parallel experiments on the same training code — but the architecture supports any collaborative agent workflow.

As Karpathy wrote: "Think of it like a stripped-down GitHub where there's no main branch, no PRs, no merges — a sprawling DAG of commits in every direction with a message board for agents to coordinate." The repo already has 25,000+ GitHub stars.

[Diagram: the evolution from GitHub's human workflow (main branch, pull requests, code review, merge) to AgentHub — a DAG of commits in every direction, a message board for agent coordination, and parallel experiments with no merges needed]

Karpathy's SETI@Home Vision for AI Research

Karpathy's end vision for autoresearch reaches far beyond individual experiments. In the early 2000s, the SETI@Home project let anyone donate spare computer power to search for extraterrestrial intelligence. Karpathy envisions the same model for AI research: millions of AI agents distributed across thousands of computers, with humans allocating where that research effort goes.

This is not speculative — it is the logical extension of autoresearch + AgentHub. If one agent running 100 experiments overnight can achieve a 56% improvement on a training benchmark, what happens when thousands of agents run millions of experiments across distributed infrastructure? The answer is recursive self-improvement at civilization scale — and Karpathy believes we may already be in its early stages.

"We might be in the early stages of the singularity."

Every frontier AI lab — OpenAI, Anthropic, Google DeepMind — is investing tens of millions in researchers doing essentially this same work manually. Karpathy made the pattern open-source and accessible to anyone with a GPU and a clear metric.

Karpathy's Claws: The Layer Above Agents

In his March 2026 interview on the No Priors podcast, Karpathy described a new abstraction layer above agents called claws — persistent autonomous entities with their own sandboxes, looping independently, with sophisticated memory systems:

"Really, when I say a claw, I mean this layer that takes persistence to a whole new level. It's not something that you are interactively in the middle of. It kind of has its own little sandbox, does stuff on your behalf even if you're not looking."

His personal claw, Dobby the House Elf, controls his entire home. The discovery was startling in its simplicity — he told an agent "I think I have Sonos at home. Can you try to find it?" The agent did an IP scan of the local network, found the Sonos system (which had no password protection), reverse-engineered the APIs, and played music. Three prompts from discovery to playback.

"I can't believe I just typed in 'can you find my Sonos?' And suddenly it's playing music."

Dobby now controls lights, HVAC, shades, the pool and spa, and a security camera system where a Qwen vision model watches camera feeds via change detection and sends WhatsApp alerts — "Hey, a FedEx truck just pulled up." Six separate smart home apps replaced by one natural language interface.

"I used to use like six apps, completely different apps and I don't have to use these apps anymore. Dobby controls everything in natural language. It's amazing."

The implications extend beyond home automation. Karpathy sees claws as the consumer-ready layer of AI — where agents are still semi-finished primitives requiring interactive guidance, claws are autonomous entities that maintain state, make decisions, and execute without human intervention. The hierarchy is clear: LLMs (raw token generators) → Agents (semi-finished) → Claws (consumer-ready, deployable).

[Diagram: the AI abstraction layers — LLMs (raw token generators) → Agents (semi-finished, interactive) → Claws (consumer-ready, autonomous) — with Dobby at the center controlling lights & HVAC, Sonos & music, pool & spa, and security cameras (vision model + WhatsApp alerts)]

For builders on Taskade Genesis, the claw pattern maps directly to Workspace DNA: Memory provides the persistent state, AI Agents provide the intelligence, and Automations provide the autonomous execution loop. A Genesis app with workspace memory, trained agents, and triggered automations is functionally a claw — a system that acts on your behalf without requiring your presence.

The Multi-Agent Reality: Token Throughput as the New Metric

The interview revealed how top practitioners actually work with agents in 2026. Karpathy described the Peter Steinberger model — multiple Codex agents displayed on a monitor wall, each running ~20-minute tasks across 10+ repository checkouts simultaneously:

"It's not just like here's a line of code, here's a new function. It's like here's a new functionality and delegate it to agent one. Here's a new functionality that's not going to interfere with the other one. Give it to two."

The developer's role becomes orchestration at the macro action level — research agent, code agent, planning agent, all running in parallel. The metric that matters is no longer lines of code or features shipped. It is token throughput:

"What is your token throughput and what token throughput do you command? I feel nervous when I have subscription left over — that just means I haven't maximized my token throughput."

Karpathy compared this to his PhD days when idle GPUs felt like wasted potential. The resource anxiety shifted from FLOPs to tokens. And when capability outstrips what any individual can direct, the diagnosis is always the same:

"It all kind of feels like skill issue when it doesn't work. It's not that the capability is not there. It's that you just haven't found a way to string it together of what's available."

This framing — that agent limitations are configuration problems, not capability problems — has profound implications for agentic engineering. The agents.md file, the memory system, the parallelization strategy — these are the new engineering skills. Karpathy's progression rule maps the path: single session → multiple agents → agent teams → claws → optimization over claws.

Karpathy's autoresearch is part of a broader wave of autonomous AI research systems emerging in 2025-2026: Google DeepMind's FunSearch discovered new mathematical constructions by having LLMs write and evaluate programs. Weco AI's AIDE automates ML engineering pipelines end-to-end. Sakana AI's The AI Scientist generates research hypotheses, runs experiments, and writes papers. What unites them all is the agentic engineering pattern: human defines the objective and evaluation metric, agent executes the search, results are measured against ground truth.

Evolutionary Agents: From Stepping Stones to Scientific Discovery

Agentic engineering is not limited to coding and deployment. In March 2026, Sakana AI published Shinka Evolve — a system that uses frontier LLMs as mutation operators inside an evolutionary algorithm to discover new solutions to open mathematical and scientific problems.

The architecture mirrors agentic engineering principles. A population of programs is maintained in a database. Parent programs are sampled, paired with "inspiration" programs, and handed to an LLM that proposes mutations — diffs, full rewrites, or crossovers between two parents. Each mutated program is evaluated against a fitness function, and successful innovations propagate through the tree.

Three innovations made Shinka Evolve remarkably sample-efficient, matching or exceeding Google DeepMind's AlphaEvolve results in under 200 program evaluations:

  1. Multi-model ensembling with bandit selection — Instead of using a single frontier model, Shinka Evolve ensembles models from OpenAI, Anthropic, and Google, using an Upper Confidence Bound (UCB) algorithm to adaptively select which model proposes each mutation. Different models excel at different types of edits, and the system learns which to deploy when.

  2. Meta scratch pad — Programs are summarized, and global insights are extracted and fed back into the system prompt. This creates a form of semantic memory — the evolutionary process accumulates not just better programs but better understanding of why they work.

  3. Adaptive operator selection — The algorithm itself co-evolves alongside the solutions. The evolutionary strategy adapts on the fly — hence the name: Shinka means "evolve" in Japanese, so Shinka Evolve literally means "evolve evolve."
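The bandit selection in point 1 can be illustrated with the textbook UCB1 rule: pick the model whose average payoff plus an exploration bonus is largest. This is a toy reconstruction, not Sakana's code — the model names and simulated hit rates are made up:

```python
import math
import random

def ucb1_select(stats, total_pulls, c=1.0):
    """UCB1: choose the arm maximizing mean reward + exploration bonus.
    Unpulled arms score infinity so every model is tried at least once."""
    def ucb(name):
        n, total_reward = stats[name]
        if n == 0:
            return float("inf")
        return total_reward / n + c * math.sqrt(math.log(total_pulls) / n)
    return max(stats, key=ucb)

stats = {"model_a": (0, 0.0), "model_b": (0, 0.0), "model_c": (0, 0.0)}
hit_rate = {"model_a": 0.1, "model_b": 0.7, "model_c": 0.3}  # hypothetical mutation success rates

random.seed(0)
for t in range(1, 501):
    model = ucb1_select(stats, t)
    reward = 1.0 if random.random() < hit_rate[model] else 0.0  # did the mutation improve fitness?
    n, total_reward = stats[model]
    stats[model] = (n + 1, total_reward + reward)

most_used = max(stats, key=lambda m: stats[m][0])  # the bandit converges on the best model
```

Over enough mutations the bandit routes most proposals to whichever model actually improves fitness most often, while still occasionally probing the others.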

The deepest insight from this work echoes Kenneth Stanley's Why Greatness Cannot Be Planned: sometimes solving the wrong problem works better. Shinka Evolve's circle packing experiments showed that using a relaxed fitness function (allowing tiny circle overlaps as a surrogate problem) converged faster than the exact formulation. The surrogate problem served as a stepping stone — a concept from open-endedness research where intermediate discoveries enable future breakthroughs even when they do not directly solve the target problem.

This has profound implications for agentic engineering. Current AI agents optimize for the exact problem they are given. But human researchers routinely reformulate problems, invent proxies, and transfer insights across domains. The next frontier of agentic systems — what Robert Lange calls "vibe optimization" and "vibe researching" — envisions AI shepherds overseeing populations of evolving solutions across parallel threads, checking results in the morning like a researcher reviewing overnight experiments.

The connection to Workspace DNA is structural: Memory stores the population of solutions and accumulated insights. Intelligence (multi-model agents) proposes mutations and evaluates fitness. Execution runs the evaluations and propagates successful innovations. The evolutionary loop is the Memory-Intelligence-Execution cycle, operating at the frontier of scientific discovery.


The Shopify Precedent: Agentic Engineering Goes Corporate

Shopify's adoption of agentic engineering principles deserves special attention because it shows where every company is heading.

In April 2025, Shopify CEO Tobi Lutke sent an internal memo that became public:

"Reflexive AI usage is now a baseline expectation at Shopify."

The key mandate: before requesting additional headcount, teams must demonstrate why they cannot accomplish the work using AI. The memo asked teams to consider: "What would this area look like if autonomous AI agents were already part of the team?"

This is agentic engineering applied to organizational design — not just code, but every knowledge work function.

Monday.com CEO Eran Zinman shared a concrete example of this shift on the 20VC podcast (2026): his company replaced its entire 100-person SDR team with AI agents, cutting response times from 24 hours to 3 minutes while improving conversion rates across every metric. All Monday.com developers now use Claude Code and Cursor. "Nobody will want to buy software that's not doing the majority of the work for them," Zinman said — a statement that makes agentic engineering not optional but existential for software companies.


How Taskade Genesis Embodies Agentic Engineering

When Karpathy described agentic engineering — "orchestrating agents who do and acting as oversight" — he described the architecture Taskade Genesis has been building since launch.

The Workspace DNA Architecture

Taskade Genesis implements agentic engineering through three pillars that form a self-reinforcing loop:

| Agentic Engineering Principle | Workspace DNA Pillar | Implementation |
| --- | --- | --- |
| Persistent context | Memory (Projects) | Projects store data, history, and context across 8 views (List, Board, Calendar, Table, Mind Map, Gantt, Org Chart, Timeline) |
| Autonomous execution | Intelligence (Agents) | AI Agents v2 with 22+ built-in tools, custom tools via MCP, persistent memory, multi-agent collaboration |
| Reliable workflows | Execution (Automations) | Automations with durable execution, 100+ integrations, branching/looping/filtering |

Memory feeds Intelligence → Intelligence triggers Execution → Execution creates Memory. This is not a marketing framework. It is the engineering architecture that makes agentic engineering practical at scale.

[Diagram: the Workspace DNA loop — Memory (Projects as databases: 8 views, persistent context) feeds Intelligence (AI Agents v2: 22+ tools, multi-agent), which triggers Execution (Automations: 100+ integrations), which creates new Memory]

Why Platform Beats Framework

The tools comparison for agentic engineering reveals a critical insight:

| Approach | Example | Requires | Deploys To | Maintains Via |
| --- | --- | --- | --- | --- |
| Code generator | Cursor, Devin | Developer skills | Separate hosting | Manual updates |
| Agent framework | CrewAI, LangGraph | Python skills | BYO infrastructure | Custom code |
| AI workspace | Taskade Genesis | Natural language | Instant (built-in) | Agents + automations |

For the 63% of AI-assisted builders who are non-developers, Taskade Genesis is the only platform that implements all five agentic engineering principles without requiring code:

  1. Plan → Write a detailed prompt (the spec) — or grab one from the prompt template library
  2. Direct → AI agents build the app using 11+ frontier models from OpenAI, Anthropic, and Google
  3. Review → Interact with the live app immediately
  4. Test → Iterate by describing changes
  5. Own → AI agents and automations maintain the system over time

150,000+ apps built. Custom domains, password protection, Community Gallery publishing, 7-tier RBAC (Owner, Maintainer, Editor, Commenter, Collaborator, Participant, Viewer).

Taskade Genesis feature capabilities — the full platform for agentic engineering


The Complete Timeline: From Turing to Agentic Engineering

| Year | Event | Significance for Agentic Engineering |
| --- | --- | --- |
| 1950 | Turing's "Computing Machinery and Intelligence" | First formal framework for machine intelligence |
| 1956 | Dartmouth Conference — "AI" coined | Field gets a name |
| 1986 | Backpropagation (Hinton) | Neural networks can learn |
| 1997 | Deep Blue defeats Kasparov | AI beats humans at complex strategy |
| 2012 | AlexNet wins ImageNet | Deep learning revolution begins |
| 2015 | OpenAI founded (Karpathy co-founds) | Mission: safe, beneficial AGI |
| 2016 | AlphaGo defeats Lee Sedol | AI handles ambiguous, long-horizon planning |
| 2017 | "Attention Is All You Need" (Transformer) | Architecture that enables everything |
| 2017 | Karpathy joins Tesla as Director of AI | Real-world AI deployment at scale |
| 2018 | GPT-1 | Unsupervised pre-training works |
| 2020 | GPT-3 (175B parameters) | Emergent few-shot learning |
| 2022 | Chain of Thought prompting (Wei et al.) | LLMs can reason step-by-step |
| 2022 | ReAct: Reasoning + Acting (Yao et al.) | Think → Act → Observe loop |
| Nov 2022 | ChatGPT launches | AI goes mainstream (100M users in 2 months) |
| Feb 2023 | Toolformer (Meta) | LLMs learn to use external tools |
| Mar 2023 | AutoGPT released | 100K+ stars, autonomous agents go viral |
| Apr 2023 | BabyAGI released | Minimalist agent loop proves the pattern |
| Jun 2023 | Lilian Weng's agent architecture post | Definitive reference for agent design |
| 2023 | LangChain ecosystem emerges | Agent orchestration infrastructure |
| Feb 2024 | Karpathy leaves OpenAI, founds Eureka Labs | Independent AI education and research |
| Mar 2024 | Devin announced (Cognition) | "First AI software engineer" — 13.86% SWE-bench |
| Sep 2024 | OpenAI o1-preview | First reasoning model, think-before-answer |
| Nov 2024 | Anthropic releases MCP | Universal agent-tool protocol |
| Dec 2024 | OpenAI o3 preview | 87.5% on ARC-AGI benchmark |
| Feb 2025 | Karpathy coins "vibe coding" | "Forget the code exists" — goes viral |
| Apr 2025 | Google launches A2A protocol | Agent-to-agent communication standard |
| Apr 2025 | Shopify memo: "Reflexive AI usage" | Enterprise agentic engineering mandate |
| Jun 2025 | Karpathy YC keynote: Software 3.0 | Natural language as programming interface |
| Aug 2025 | GPT-5 launches | Algorithmic efficiency > brute-force scale |
| Nov 2025 | Collins Dictionary: "vibe coding" Word of the Year | Cultural mainstreaming of AI-assisted building |
| Dec 2025 | AAIF formed (Linux Foundation) | Neutral governance for agent standards |
| Dec 2025 | Karpathy: 2025 LLM Year in Review | 6 paradigm shifts, "ghosts on your computer" |
| Feb 2026 | Karpathy coins "agentic engineering" | Declares vibe coding passé |
| Feb 2026 | Osmani publishes agentic engineering principles | 5 principles become industry consensus |
| Mar 2026 | Karpathy releases autoresearch | Live demo of agentic engineering in ML research |

What Comes Next: The Agentic Engineering Roadmap

The trajectory from vibe coding to agentic engineering points to a clear future:

Phase 1: Vibe Coding (2025) — Completed

Humans prompt, AI generates, humans accept or reject. Minimal oversight, minimal quality control. Proved the concept: AI can write functional software.

Phase 2: Agentic Engineering (2026) — Current

Humans architect and oversee, AI agents implement with human review. The middle loop emerges. Quality improves dramatically. The discipline gets a name and principles.

Phase 3: Supervised Autonomy (2027–2028)

AI agents handle entire subsystems with human checkpoint reviews. Agents run test suites, fix their own bugs, and flag only high-risk changes for human review. The middle loop becomes shorter and more focused.

Phase 4: Autonomous Systems (2029+)

AI agents build, maintain, and improve software autonomously. Humans set goals and constraints; agents handle everything else. Karpathy's "tokens tsunami" — tight agentic loops requiring massive token throughput — becomes the dominant compute workload.

Taskade Genesis is built for this trajectory. Workspace DNA — Memory, Intelligence, Execution — provides the foundation where each phase builds on the previous one. Today's agentic engineering becomes tomorrow's supervised autonomy, all within the same workspace.

[Diagram: the four-phase roadmap — Phase 1: human prompts, AI generates → Phase 2: Agentic Engineering (2026), add oversight + discipline → Phase 3: Supervised Autonomy (2027–28), agents handle subsystems → Phase 4: Autonomous Systems (2029+), agents build autonomously]

Taskade automations — durable execution powering agentic engineering workflows


The Agentic Engineering Stack (2026)

For Non-Developers

| Layer | Tool | Purpose |
| --- | --- | --- |
| Specification | Natural language prompt | Define what to build |
| Building | Taskade Genesis | AI agents build the app |
| Infrastructure | Taskade Workspace | Database, hosting, security, 8 views |
| Intelligence | Taskade AI Agents | 22+ tools, persistent memory, multi-agent |
| Automation | Taskade Automations | 100+ integrations, durable execution |
| Deployment | Instant (built-in) | Custom domains, password protection |

For Developers

Layer         | Tool Options                                | Purpose
Specification | Design docs, structured specs               | Define architecture + requirements
Building      | Cursor, Claude Code, Devin, Taskade Genesis | AI agents write code
Orchestration | LangGraph, CrewAI, AutoGen                  | Multi-agent coordination
Testing       | TDD frameworks, CI pipelines                | Deterministic validation
Standards     | MCP, A2A, AGENTS.md                         | Interoperability
Deployment    | CI/CD, or Taskade for instant deploy        | Ship to production

The Convergence

The agentic engineering landscape is moving toward what industry analysts call the Agentic Mesh — a modular ecosystem where different tools specialize in different layers:

Layer                    | Best Tool                  | Function
End-user apps            | Taskade Genesis            | Non-developers build living software
Business automation      | CrewAI                     | Role-based multi-agent workflows
Enterprise orchestration | LangGraph                  | Production agent systems
Code development         | Cursor, Devin, Claude Code | AI-assisted engineering
Standards                | MCP + A2A (AAIF)           | Universal interoperability
Model infrastructure     | OpenAI, Anthropic, Google  | Foundation models

The winning strategy is not choosing one tool. It is choosing the right tool for each layer. For most teams, that means Taskade Genesis for end-user applications and team tools, combined with developer-focused agents for custom engineering work.

Start practicing agentic engineering →


Related Reading

  • From Vibe Coding to Agentic Engineering: What Karpathy's New Term Means — Deep dive on the paradigm shift
  • Agentic Engineering Tools and Platforms — 10+ platforms compared
  • What Is Vibe Coding? — The foundational concept Karpathy evolved from
  • Best Claude Code Alternatives — Terminal-first AI coding agents compared
  • Best OpenClaw Alternatives — Managed alternatives to the open-source agent framework
  • Best Vibe Coding Tools — 15 tools for the full spectrum
  • What Is OpenAI? Complete History — The company behind GPT and the agent revolution
  • What Is Anthropic? History of Claude AI — MCP, Claude Code, and the safety-first approach
  • What Are AI Agents? — Foundational guide to AI agents
  • How Workspace DNA Works Inside Taskade Genesis — The architecture behind it
  • Taskade Genesis Reviews — What users are building with agentic engineering
  • Vibe Coding vs No-Code vs Low-Code — How AI app building compares
  • What Are AI Micro Apps? — The output of agentic engineering at scale
  • Vibe Coding for Teams — Team-level agentic engineering in practice
  • AI Prompts Library — 1,000+ ready-to-use prompts for agentic workflows
  • AI Convert Tools — Transform content with AI agents

Context Engineering: The Foundation of Agentic Systems

Context engineering is the discipline of designing the information environment that AI agents operate in — what data they can access, which documents they reference, what tools they can call, and how instructions are structured. The term gained traction in 2026 through Gartner research and Phil Schmid at Hugging Face, who argued that most agent failures are not model failures but context failures.

The relationship between context engineering and agentic engineering is hierarchical. Context engineering is the foundation; agentic engineering is the execution layer built on top of it.

The Vercel case study illustrates this perfectly. When Vercel's team analyzed their AI coding agent's accuracy, they discovered that removing unnecessary tools from the agent's context — giving it fewer options, not more — pushed accuracy from 80% to 100%, reduced token usage by 40%, and made responses 3.5x faster. The lesson: better context beats bigger models.
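The tool-pruning idea behind that result can be sketched in a few lines: filter the agent's tool registry down to only what the task needs before building its context. The registry, tags, and tag-matching heuristic below are hypothetical illustrations of the principle, not Vercel's implementation.

```python
# Hypothetical tool registry: each tool carries tags describing when it is relevant.
TOOL_REGISTRY = {
    "read_file":   {"desc": "read a file from the repo", "tags": {"code", "docs"}},
    "write_file":  {"desc": "write a file to the repo",  "tags": {"code"}},
    "run_tests":   {"desc": "run the test suite",        "tags": {"code", "test"}},
    "search_docs": {"desc": "search documentation",      "tags": {"docs"}},
    "deploy":      {"desc": "deploy to production",      "tags": {"ops"}},
    "query_db":    {"desc": "run a SQL query",           "tags": {"data"}},
}

def prune_tools(task_tags, registry=TOOL_REGISTRY):
    """Return only the tools whose tags overlap the task's tags."""
    return {name: tool for name, tool in registry.items()
            if tool["tags"] & task_tags}

# A bug-fix task needs code and test tools, not deploy or database access.
tools = prune_tools({"code", "test"})
```

The agent now chooses among three relevant tools instead of six, which is the "fewer options, not more" effect: a smaller context and fewer wrong turns.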

This aligns with the EPICS benchmark findings (2026), which tested frontier models on real professional tasks across engineering, product management, and customer support. The result: even the best models achieved only 24% success on authentic workplace tasks. The bottleneck was not model intelligence — it was context. Models failed when they lacked the right documents, the right tool access, or the right framing of the problem.

Prompt Engineering (single-turn instructions) → Context Engineering (data, docs, tools, memory) → Harness Engineering (pipelines, guardrails, routing) → Agentic Engineering (orchestration, autonomy, oversight)

Each layer builds on the previous. Prompt engineering handles single-turn instructions. Context engineering designs what the model sees. Harness engineering adds pipelines, guardrails, and routing. Agentic engineering adds autonomous decision-making, multi-step execution, and human oversight loops.

Taskade's Workspace DNA implements all four layers natively: Memory provides context (documents, knowledge bases, project history), Intelligence provides agentic capabilities (AI agents with 22+ tools and persistent memory), and Execution provides harness-level automation (100+ integrations with branching, looping, and error handling).


Intent Engineering: The Third Discipline

Prompt engineering taught us how to talk to AI. Context engineering taught us what AI needs to know. Intent engineering — the discipline emerging in 2026 — teaches us what AI needs to want.

The distinction matters because AI agents that succeed at the wrong objective cause more damage than agents that fail entirely. In January 2026, fintech company Klarna reported that its AI agent handled 2.3 million customer conversations across 23 markets in 35 languages, doing the work of 700 full-time employees. Resolution times dropped from 11 minutes to 2. The CEO projected $60 million in savings.

Then customers started complaining. Generic answers, robotic tone, no judgment. The AI agent was technically brilliant — optimizing for exactly the metric it was given (resolve tickets fast). But Klarna's actual organizational goal was not fast resolution. It was building lasting customer relationships that drive lifetime value in a competitive fintech market. Those are profoundly different objectives requiring profoundly different decisions at the point of interaction.

Klarna CEO Sebastian Siemiatkowski later reflected on this publicly. In a 20VC interview, he acknowledged the early approach had "too much focus on cost" and described the pivot: "The future of VIP experience will be the human connection, the relationship... We need to transform our customer service from thinking about it as just good customer service to making it the human part of what Klarna is." Klarna now recruits its most passionate customers — not outsourced call center workers — as part-time support agents through an Uber-style model, resulting in dramatically higher NPS and customer satisfaction.

The deeper lesson: Siemiatkowski also explained why Klarna could not buy customer service off the shelf. "For customer service agents, whether AI or human, to answer questions really well, they need as much context as possible. Where is that context? It's in the source code of your software." This is the intent engineering problem in miniature — AI agents need not just data but organizational context: how the company calculates interest, when to bend policy, which customers are at risk. When that tacit knowledge was never formalized, the AI optimized a proxy metric (speed) instead of the real objective (relationships).

A senior human agent with five years at the company knows when to bend policy, when to spend extra time because a customer's tone signals they are about to churn, when efficiency is the right move versus when generosity is the right move. That knowledge was never documented — it lived in tacit institutional experience. When the human agents were laid off, that knowledge walked out the door.

The three disciplines of AI engineering stack on each other:

Discipline          | Era       | Core Question              | What It Governs
Prompt Engineering  | 2023-2024 | How do I talk to AI?       | Individual instructions
Context Engineering | 2025-2026 | What does AI need to know? | Information state, RAG, MCP
Intent Engineering  | 2026+     | What does AI need to want? | Goals, values, trade-offs, decision boundaries

Intent engineering requires something most organizations have never had to produce: machine-readable expressions of organizational purpose. Not "increase customer satisfaction" (a human-readable aspiration), but structured parameters an agent can act on: What signals indicate satisfaction in our context? What data sources contain those signals? What actions am I authorized to take? What trade-offs am I empowered to make — speed versus thoroughness, cost versus quality? Where are the hard boundaries I may not cross?
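One way to make "machine-readable organizational purpose" concrete is a small, structured intent spec that an agent can check its actions against. The schema and example values below are hypothetical, a sketch of the idea rather than any standard format.

```python
from dataclasses import dataclass, field

@dataclass
class IntentSpec:
    goal: str                                             # the real objective, not a proxy metric
    success_signals: list = field(default_factory=list)   # where satisfaction shows up in our data
    authorized_actions: set = field(default_factory=set)  # what the agent may do
    tradeoffs: dict = field(default_factory=dict)         # e.g. speed vs thoroughness
    hard_boundaries: set = field(default_factory=set)     # lines the agent may never cross

    def allows(self, action: str) -> bool:
        """An action is permitted only if authorized and not a hard boundary."""
        return action in self.authorized_actions and action not in self.hard_boundaries

# Illustrative spec for a support agent whose real goal is relationships, not speed.
spec = IntentSpec(
    goal="build lasting customer relationships",
    success_signals=["repeat purchases", "NPS", "churn-risk tone"],
    authorized_actions={"issue_refund", "extend_deadline", "escalate_to_human"},
    tradeoffs={"speed_vs_thoroughness": "prefer thoroughness for at-risk customers"},
    hard_boundaries={"waive_interest_without_approval"},
)
```

The point is not the code but the discipline: each field forces the organization to write down something that previously lived only in senior employees' heads.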

This is why Workspace DNA matters at the organizational level. Memory stores the institutional knowledge that senior employees carry in their heads. Intelligence (AI agents) interprets that knowledge against live context. Execution (automations) acts within defined boundaries. The workspace becomes the intent layer — encoding not just what the agent can do, but what it should do given the organization's actual values.

Deloitte's 2026 State of AI report found that 84% of companies have not redesigned jobs around AI capabilities and only 21% have a mature model for agent governance. Meanwhile, 74% report no tangible value from AI deployments. The models work. The context pipelines are improving. What is missing is the organizational infrastructure that connects AI capability to organizational purpose.

The Microsoft Copilot story reinforces this pattern. One of the most heavily invested enterprise AI products in history — billions in infrastructure, AI embedded in every Office application — achieved 85% Fortune 500 adoption. Then it stalled. Gartner found only 5% of organizations moved from Copilot pilot to larger-scale deployment. Bloomberg reported Microsoft slashing internal sales targets. Inside companies that signed six-figure Copilot deals, employees preferred other AI tools. The issue was not model quality or UX — it was deploying AI across an organization without intent alignment. Forty thousand knowledge workers given AI tools but never told how those tools connect to what the company is trying to accomplish.

The investment behind this gap is staggering. Big tech's combined AI capital expenditure approached half a trillion dollars in 2025 and is projected to exceed that in 2026 — with the big five (Amazon, Microsoft, Google, Meta, Oracle) planning to add over $2 trillion in AI-related assets in the next four years. Meanwhile, the SWE-bench coding benchmark went from 4% AI solve rate in 2023 to approximately 90-95% saturation in 2025 — a capability doubling time that is itself shrinking. The models are not the bottleneck. The organizational infrastructure that connects model capability to organizational purpose — that is the bottleneck.

The companies that win the next phase will not be the ones with the best model subscription. They will be the ones with the best organizational intent architecture — goals, values, decision frameworks, and trade-off hierarchies that are discoverable, structured, and agent-actionable. As one analyst put it: a company with a mediocre model and extraordinary intent infrastructure will outperform a company with a frontier model and fragmented organizational knowledge every single time.

The autoresearch insight applies here too: if you give the agent a bad metric, it will very confidently optimize the wrong thing. Choosing the right metric — the one that reflects actual organizational intent, not just the one that is easiest to measure — is the skill that separates successful AI deployments from expensive failures.
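The mechanics are simple to sketch: the agent maximizes whatever the evaluation function returns, so the entire outcome hinges on the human's choice of metric. In the toy loop below, `propose` and `evaluate` are stand-ins for the real LLM-driven experiment cycle, not autoresearch's actual code.

```python
import random

def autoresearch_loop(propose, evaluate, budget=100, seed=0):
    """Run `budget` experiments; keep the best (config, score) under the chosen metric."""
    random.seed(seed)
    best = (None, float("-inf"))
    for _ in range(budget):
        config = propose()        # agent picks the next experiment to try
        score = evaluate(config)  # objective, human-chosen metric
        if score > best[1]:
            best = (config, score)
    return best

# Toy metric: reward learning rates close to 3e-4. Swap in a bad metric here
# and the loop will just as confidently converge on the wrong answer.
config, score = autoresearch_loop(
    propose=lambda: {"lr": 10 ** random.uniform(-5, -2)},
    evaluate=lambda c: -abs(c["lr"] - 3e-4),
)
```

The agent never questions `evaluate`; it only climbs it. That is why metric selection, not execution, is the human's highest-leverage decision.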

For teams building with Taskade Genesis, intent engineering starts with Workspace DNA: define your goals as structured project data (Memory), train AI agents with explicit decision boundaries and knowledge bases (Intelligence), and encode your workflows with the right triggers and escalation rules (Automations). The workspace is your intent layer — persistent, collaborative, and auditable. Start building →


Agentic Engineering Platforms Compared

The agentic engineering ecosystem in 2026 spans no-code platforms, developer frameworks, and low-code automation tools. Here is how the major platforms compare:

Platform        | Code Required | Multi-Agent | Memory     | Integrations | Pricing
Taskade Genesis | No            | Yes         | Persistent | 100+         | $16/mo (10 users)
CrewAI          | Python        | Yes         | Custom     | Via code     | Open source
LangGraph       | Python        | Yes         | Custom     | Via code     | Open source
n8n             | Low-code      | Limited     | Basic      | 400+         | $20+/mo
AutoGen         | Python        | Yes         | Custom     | Via code     | Open source

Taskade Genesis is the only platform that delivers agentic engineering without code — persistent memory across sessions, multi-agent collaboration, and 100+ native integrations out of the box. Developer frameworks like CrewAI, LangGraph, and AutoGen offer more customization but require Python expertise and custom infrastructure. n8n bridges the gap as a low-code option but has limited multi-agent orchestration.

For most teams, the right approach is Taskade Genesis for business workflows and team tools, combined with developer-focused frameworks for custom engineering projects. See our full agentic engineering tools comparison.


Get Started: Build Your First Agentic Workflow

You do not need to be a developer to practice agentic engineering. Taskade Genesis lets any team build agentic workflows in minutes:

Step 1: Create a workspace. Go to taskade.com/create and describe what you want to build. Genesis generates a living application — not a prototype, but a deployed system with a database, UI, and logic.

Step 2: Add AI agents with custom tools and knowledge. Configure AI agents with persistent memory, train them on your documents and knowledge sources, and equip them with 22+ built-in tools. Browse the Prompts Library for ready-to-use agent instructions.

Step 3: Connect automations to trigger agent workflows. Set up automation workflows with 100+ integrations — Slack, email, CRM, payments, and more. Agents run on schedule, on trigger, or on demand. Explore what others have built in the Community Gallery.

This is agentic engineering in practice: you define the goal, configure the agents, set the guardrails, and let the system execute. The same pattern Karpathy describes for code applies to every workflow — plan, direct, review, test, own.
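The plan, direct, review, test, own pattern can be sketched as a supervisory loop: the agent implements, automated tests gate the output first, then a human checkpoint approves or sends it back. All names below are illustrative stand-ins, not a Taskade or Karpathy API.

```python
def middle_loop(task, agent, run_tests, approve, max_rounds=3):
    """Direct an agent, then gate its output through tests and human review."""
    for round_no in range(1, max_rounds + 1):
        draft = agent(task)                 # direct: agent implements the task
        if not run_tests(draft):            # test: deterministic validation first
            task = f"{task} (fix failing tests, attempt {round_no})"
            continue
        if approve(draft):                  # review: human checkpoint
            return draft                    # own: accepted into the system
        task = f"{task} (address review feedback, attempt {round_no})"
    raise RuntimeError("escalate: agent could not satisfy tests and review")

# Toy run: an 'agent' whose second attempt passes the tests.
attempts = iter(["buggy draft", "passing draft"])
result = middle_loop(
    task="add input validation",
    agent=lambda t: next(attempts),
    run_tests=lambda d: d == "passing draft",
    approve=lambda d: True,
)
```

Note the ordering: tests run before the human looks at anything, so reviewer attention is spent only on drafts that already pass deterministic checks.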

Start building your first agentic workflow →


FAQ

What exactly is agentic engineering?

Agentic engineering is orchestrating AI agents that write, test, and deploy code while you provide architectural oversight, quality standards, and strategic direction. Coined by Andrej Karpathy in February 2026, the term emphasizes that directing AI agents effectively is an art and a science — not just casual prompting. The five core principles: plan, direct, review, test, own.

How is agentic engineering different from vibe coding?

Vibe coding means accepting whatever AI generates without rigorous review. Agentic engineering adds five disciplines: plan before prompting, direct with precision, review rigorously, test systematically, and own the architecture. Both use AI to build software, but agentic engineering produces production-quality results.

Who coined the term and when?

Andrej Karpathy coined agentic engineering on February 8, 2026. He had previously coined vibe coding on February 2, 2025. Almost exactly one year later, he declared vibe coding passé because LLMs had gotten smart enough that casual prompting was no longer sufficient — orchestration with oversight was the new professional standard.

What are the five principles of agentic engineering?

Google's Addy Osmani codified them: 1) Plan before prompting — write specs and break work into agent-sized tasks, 2) Direct with precision — give agents well-scoped tasks, 3) Review rigorously — evaluate output like a human PR, 4) Test relentlessly — the single biggest differentiator from vibe coding, 5) Own the system — maintain docs, version control, CI, and production monitoring.

Do I need to be a developer to practice agentic engineering?

No. The principles apply to anyone orchestrating AI agents. On Taskade Genesis, non-developers practice agentic engineering by writing detailed prompts (planning), reviewing generated apps (oversight), iterating on designs (testing), and deploying AI agents for ongoing improvement. 63% of AI-assisted builders are non-developers.

What is the Model Context Protocol (MCP)?

MCP is an open standard created by Anthropic in November 2024 for connecting AI models to external tools and data sources. Think of it as USB-C for AI agents — a universal connector. It was donated to the Linux Foundation's Agentic AI Foundation in December 2025 and adopted by OpenAI, Google, Microsoft, and dozens of others.
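In practice, an MCP server advertises each tool as a name, a description, and a JSON Schema for its inputs, returned in response to a client's tools/list request. The sketch below follows that tool shape but omits the JSON-RPC transport and handshake; the `search_docs` tool itself is a made-up example.

```python
import json

# Simplified MCP-style tool descriptor: name, description, and a JSON Schema
# telling any client how to construct a valid call.
tool_descriptor = {
    "name": "search_docs",
    "description": "Full-text search over project documentation",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "search terms"},
            "limit": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}

# Any MCP-compatible client (an IDE agent, a chat assistant) can read this
# descriptor and know how to call the tool — the "USB-C" interoperability.
wire_message = json.dumps({"tools": [tool_descriptor]})
```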

What are the best agentic engineering tools?

By category: Taskade Genesis for non-developers (free tier, Pro $16/mo for 10 users). CrewAI for role-based business automation (open-source). LangGraph for enterprise orchestration. Cursor ($20/mo) and Devin 2.0 ($20/mo) for professional coding. Claude Code for terminal-based workflows. See our full agentic engineering tools comparison.

What did Gartner predict about agentic AI?

Gartner predicts 40% of enterprise applications will feature task-specific AI agents by end of 2026, up from less than 5% in 2025. By 2028, 33% of enterprise software will include agentic AI. However, they also predict over 40% of agentic AI projects will be canceled by end of 2027 due to escalating costs, unclear business value, or inadequate risk controls.

What is Karpathy's autoresearch project?

Autoresearch is a 630-line Python tool released by Karpathy on March 7, 2026. It gives an AI agent an LLM training setup and lets it experiment autonomously — approximately 12 experiments per hour, 100 overnight. It demonstrates agentic engineering: human sets the goal and metric, agent executes autonomously, results are objectively measurable.

How does Taskade Genesis implement agentic engineering?

Taskade Genesis implements agentic engineering through Workspace DNA — Memory (projects as databases), Intelligence (AI agents with 22+ tools and persistent memory), and Execution (automations with 100+ integrations). Users orchestrate these components to build, deploy, and maintain living software — exactly the pattern Karpathy describes.

What is the middle loop in agentic engineering?

The middle loop is supervisory work between writing code (inner loop) and delivery operations (outer loop). It involves directing AI agents, evaluating their output, calibrating trust, and maintaining architectural coherence. Senior engineering leaders identified it as the most important emerging skill category for the AI era.

Is agentic engineering a fad or a lasting shift?

Agentic engineering represents a permanent shift. The $4.7B vibe coding market growing at 38% CAGR, Gartner's 40% enterprise adoption forecast, the Linux Foundation's AAIF, and MCP becoming the universal standard all point to structural change. The discipline of orchestrating agents becomes more valuable as AI becomes more capable, not less.

What is cognitive debt?

Cognitive debt is the gap between system complexity and human understanding — when AI-generated systems work but no human fully comprehends why. It is the agentic engineering equivalent of technical debt. Taskade Genesis reduces cognitive debt by keeping architecture visible (workspace structure), agents transparent (inspectable instructions), and history preserved.

How does agentic engineering connect to the "SaaS is dead" debate?

Y Combinator CEO Garry Tan predicted non-technical teams would vibe-code custom solutions instead of buying SaaS, naming Taskade among the disruptors. Klarna CEO Sebastian Siemiatkowski went further in his 20VC interview, arguing that AI agents will demolish SaaS switching costs entirely: "The next thing that's going to hit everyone bad is the switching cost of data... What's going to happen is people are going to start solving that problem — how do I get all my data from the existing vendor and move it to the new vendor with the help of AI through one click." On a weekend, Siemiatkowski built what he calls "company in a box" — an open-source accounting system + CRM + Claude agent that could bookkeep invoices and manage customers via natural language. The winner of the future, he argues, is not a siloed SaaS tool but something "extremely broad" — an AI-native operating system for the entire company. Klarna has already dropped Salesforce and approximately 1,200 other SaaS services, shrinking from 7,000 employees to below 3,000 through AI-driven agentic workflows. Agentic engineering elevates the SaaS debate: teams will orchestrate AI agents to build, deploy, and maintain living software that replaces over-bundled per-seat tools. See: The SaaSpocalypse Explained and Will Vibe Coding Kill SaaS?

What is the difference between agentic engineering and context engineering?

Context engineering focuses on designing the information environment for AI — what data, documents, and tools agents can access. Agentic engineering is broader: it includes context engineering plus the orchestration patterns, tool use, and autonomous decision-making that make agents useful. Think of context engineering as the foundation and agentic engineering as the full building. Taskade's Workspace DNA implements both — Memory provides context, Intelligence provides agentic capabilities, Execution automates the results.

How do I start with agentic engineering without code?

Taskade Genesis lets non-technical teams build agentic workflows without writing a single line of code. Create AI agents with 22+ built-in tools, train them on your knowledge sources, connect 100+ integrations, and set up automation workflows — all through a visual interface. Over 150,000 apps have been built this way. Start free →



What Is Agentic Engineering? Complete History & Guide (2026) | Taskade Blog