A Bundesbank BOP-F survey of more than 7,000 firms found generative AI use rising from 26% in 2024 to an expected 56% in 2026. In manufacturing, though, the share of working time involving GenAI remains low: 7.5% in 2024, expected to reach 8.9% by 2026. That gap explains the real state of generative AI in manufacturing. Interest is high. Plant-level maturity is uneven. The useful question is not whether GenAI can help. It is where it can help without creating safety, quality, or compliance risk.
For manufacturing leaders, plant teams, engineering groups, IT/OT teams, and product owners, the near-term value is clearest in the knowledge work around production: maintenance support, documentation, planning, design exploration, quality investigation, training, and technical search. If you want a companion view of practical examples, Refact’s guide to generative AI use cases breaks down where these systems fit without pretending they can run the plant.
Generative AI in manufacturing is an interface layer, not a factory brain
The strongest manufacturing use cases treat GenAI as a guarded assistant. It retrieves, summarizes, drafts, explains, and translates messy human inputs into structured requests for other systems. It does not replace the systems that already handle control, scheduling, validation, or safety.
That distinction matters because a fluent answer can still be wrong. A model may draft a maintenance checklist that sounds reasonable but cites an obsolete procedure. It may suggest a planning change that ignores a capacity constraint. It may generate PLC code that breaks assumptions built into an interlock. In a factory, those errors can become downtime, scrap, safety exposure, warranty cost, or a compliance problem.
The better pattern is hybrid. GenAI handles language and knowledge access. Existing tools handle deterministic work: MES for execution, PLM for product data, APS or optimization solvers for schedules, CAD and CAE for engineering validation, CMMS for maintenance workflows, and safety systems for control boundaries.
A 2024 scheduling paper by Ding et al. shows this pattern clearly. The LLM translates informal production constraints into structured forms, then a mixed-integer linear programming solver produces the schedule. The model does not become the scheduler. It helps people express the problem in a way the scheduler can solve.
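To make the division of labor concrete, here is a minimal sketch of that hybrid pattern. It is not the method from the paper. The `Job` and `Precedence` types are hypothetical structured forms an LLM might be asked to emit, and the exhaustive search in `solve` is only a stand-in for a real mixed-integer solver so the example runs without extra libraries.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Job:
    name: str
    hours: int  # processing time on a single machine
    due: int    # due time in hours from shift start


@dataclass(frozen=True)
class Precedence:
    before: str
    after: str


def validate(jobs, rules):
    """Reject structured output that is malformed before it reaches the solver."""
    names = {j.name for j in jobs}
    if len(names) != len(jobs):
        raise ValueError("duplicate job name")
    for j in jobs:
        if j.hours <= 0:
            raise ValueError(f"non-positive duration for {j.name}")
    for r in rules:
        if r.before not in names or r.after not in names:
            raise ValueError(f"rule references unknown job: {r}")


def solve(jobs, rules):
    """Stand-in for a MILP solver: try every sequence, minimize total tardiness."""
    validate(jobs, rules)
    best_order, best_tardiness = None, None
    for order in permutations(jobs):
        position = {j.name: i for i, j in enumerate(order)}
        if any(position[r.before] > position[r.after] for r in rules):
            continue  # sequence violates a precedence rule
        clock = tardiness = 0
        for j in order:
            clock += j.hours
            tardiness += max(0, clock - j.due)
        if best_tardiness is None or tardiness < best_tardiness:
            best_order, best_tardiness = order, tardiness
    if best_order is None:
        raise ValueError("no feasible sequence")
    return [j.name for j in best_order], best_tardiness


# Hypothetical structured form of "A must run before C, B is due first".
jobs = [Job("A", 3, 10), Job("B", 2, 2), Job("C", 4, 9)]
rules = [Precedence("A", "C")]
order, tardiness = solve(jobs, rules)
print(order, tardiness)  # ['B', 'A', 'C'] 0
```

The point of the design is the `validate` step: the language model only proposes the structured input, and anything it gets wrong is rejected before a deterministic solver touches it.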
The best first use cases are small, frequent, and reviewable
Manufacturing teams often start too broadly. “AI for the plant” is not a project. “Reduce the time technicians spend searching past work orders for one machine family” is a project.
The first use case should have four traits:
- It happens often enough to create measurable waste.
- The source material already exists in manuals, tickets, SOPs, logs, reports, or drawings.
- A qualified person can review the output before it affects production.
- The result can be measured against a baseline.
Good starting points include document search, maintenance-ticket triage, shift report generation, technical publication drafting, requirements analysis, OEE summaries, batch deviation summaries, and work instruction drafts. These workflows are valuable because they sit close to production but do not directly change machine behavior.
PTC and Accenture have reported a 40% to 50% reduction in effort for generating technical publications in automotive contexts, and up to an 80% reduction in time when converting functional requirements into MBSE models in one specific assisted workflow. Those numbers should not be treated as universal benchmarks. They do show where GenAI can save time when the task is document-heavy, repetitive, and bounded by review.
Planning support can also produce measurable gains when it is tied to real systems and KPIs. A Vassardigital planning AI case study reported a 15% improvement in schedule adherence, a 12% reduction in finished goods inventory, and planning cycle time cut from days to hours. The lesson is not “AI fixes planning.” The lesson is that a planning assistant can help when it is connected to the constraints, data, and decision process planners already use.
Data quality is the part most teams underestimate
Manufacturing data usually looks cleaner in diagrams than it does inside a plant. The needed context may be split across MES, ERP, PLM, APS, CMMS, SCADA, HMI screens, PLCs, spreadsheets, line-side PCs, emails, PDFs, and handwritten notes. Asset names may vary by department. A part may have several identifiers. A procedure may exist in three versions, only one of which applies to the machine on line two.
This is why generic chatbots tend to fail in factories. The model may understand the words but not the plant configuration behind them. It needs to know which asset is being discussed, which equipment revision applies, which document version is current, which safety procedure is valid, and which user is allowed to see the answer.
Bosch’s reported service-assistant experience shows the risk. Broad retrieval over service documents could produce obsolete service steps because the system found a similar document that did not apply to the exact asset and revision. The fix was not a larger model. The fix was retrieval restricted to documents tagged as current for the specific equipment version, with confidence thresholds and the ability to say “I do not know.”
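A rough sketch of that kind of restricted retrieval is shown here. It is not Bosch's implementation. The `ServiceDoc` fields, the `"current"` status value, the 0.75 threshold, and the idea that an upstream retriever supplies a similarity `score` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDoc:
    doc_id: str
    asset_model: str
    revision: str
    status: str   # "current" or "superseded" (hypothetical tag values)
    score: float  # similarity from an upstream retriever, assumed to be in 0..1


def select_sources(candidates, asset_model, revision, min_score=0.75):
    """Keep only current documents for the exact model and revision."""
    eligible = [
        d for d in candidates
        if d.asset_model == asset_model
        and d.revision == revision
        and d.status == "current"
        and d.score >= min_score
    ]
    return sorted(eligible, key=lambda d: d.score, reverse=True)


def answer_or_abstain(candidates, asset_model, revision, min_score=0.75):
    """Return cited sources, or abstain when nothing trustworthy remains."""
    sources = select_sources(candidates, asset_model, revision, min_score)
    if not sources:
        return {"message": "I do not know", "sources": []}
    return {
        "message": "draft answer from cited sources",
        "sources": [d.doc_id for d in sources],
    }


candidates = [
    ServiceDoc("D1", "PX-200", "B", "current", 0.91),
    ServiceDoc("D2", "PX-200", "A", "superseded", 0.95),  # similar but obsolete
    ServiceDoc("D3", "PX-200", "B", "current", 0.60),     # below threshold
    ServiceDoc("D4", "PX-300", "B", "current", 0.88),     # wrong model
]
hit = answer_or_abstain(candidates, "PX-200", "B")
miss = answer_or_abstain(candidates, "PX-200", "C")
print(hit["sources"], miss["message"])  # ['D1'] I do not know
```

Note that the highest-scoring document, `D2`, is exactly the one that gets filtered out. Similarity alone would have picked the superseded procedure.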
Practitioner discussions echo the same point. The complaint is rarely that the model is not clever enough. The complaint is that manufacturing has dirty data, old systems with weak connections, manual mapping, inconsistent names, and Excel exports sitting between the physical process and the digital record.
If your team is still deciding whether to use APIs, RAG, fine-tuning, or a custom model, Refact’s article on building an AI model explains why the system around the model usually matters more than the model alone.
RAG needs manufacturing context, not just more documents
Retrieval-augmented generation, or RAG, is often presented as the cure for hallucinations. In manufacturing, basic RAG is only a start. A system that retrieves “relevant” documents can still retrieve the wrong revision, the wrong work center, the wrong machine variant, or a superseded safety step.
Manufacturing RAG needs a context layer. That layer should map relationships among assets, BOMs, parts, documents, procedures, work orders, maintenance history, engineering changes, user roles, and current configuration state. Knowledge graphs and domain ontologies help because they describe how those things relate to each other.
For example, a technician asking about a vibration alarm should not receive generic bearing guidance pulled from any manual in the company. The system should narrow the answer to the asset, model, location, revision, maintenance history, and current approved procedure. It should cite the source. It should preserve the retrieved document set for audit. If the source base is incomplete, it should stop.
AWS has written about using retrieval-augmented generation for manufacturing knowledge access in the context of workforce support. That framing is useful because the problem is not chat. The problem is giving workers faster access to verified internal knowledge without losing control over source, permission, or context.
GenAI should support safety-critical work, not bypass it
Manufacturing has a different risk profile from office knowledge work. A bad sales email is embarrassing. A bad torque value, batch release summary, ladder logic change, or setpoint suggestion can create physical and regulatory consequences.
That is why GenAI should draft, explain, suggest, and summarize in high-stakes workflows, then route the decision through existing controls. In pharma, a model may summarize a batch record or flag anomalies, but release decisions belong in validated systems with human sign-off. In automation, a model may generate a PLC code snippet, but production code needs review, test, validation, and change control. In process industries, suggested setpoint changes must respect safety margins, sensor drift assumptions, and functional safety standards such as IEC 61508 or IEC 61511.
One chemical-plant example from the research behind this article shows the failure mode. A pilot was halted after generated ladder logic broke existing interlock assumptions and the compliance team could not produce a defensible validation trail. That is the right failure to study. The issue was not whether the model could write code. It was whether the organization could prove the change was safe, reviewed, tested, and traceable.
For regulated manufacturing, logging is part of the product. Store the prompt, retrieved sources, model version, output, user, approval action, and linked work order, ticket, batch record, or engineering change order. Refact’s piece on the AI TRiSM framework covers this control mindset for AI systems that touch sensitive workflows.
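As an illustration, the logged fields listed above could be captured in one immutable record. This is a sketch under assumptions, not a validated design: the `AuditRecord` name, the three approval values, and the SHA-256 fingerprint used for tamper detection are all choices made for this example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from datetime import datetime, timezone

ALLOWED_APPROVALS = {"approved", "rejected", "edited"}  # hypothetical values


@dataclass(frozen=True)
class AuditRecord:
    prompt: str
    retrieved_sources: tuple
    model_version: str
    output: str
    user: str
    approval_action: str
    linked_record: str  # work order, ticket, batch record, or change order id
    logged_at: str      # ISO 8601 timestamp

    def __post_init__(self):
        if self.approval_action not in ALLOWED_APPROVALS:
            raise ValueError(f"unknown approval action: {self.approval_action}")
        if not self.linked_record:
            raise ValueError("every record must link to a plant record")


def fingerprint(record):
    """Stable hash of the record so later edits to the log are detectable."""
    payload = json.dumps(asdict(record), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = AuditRecord(
    prompt="Summarize vibration alarms on pump P-12",
    retrieved_sources=("SOP-114 rev C", "WO-20931"),
    model_version="example-model-2025-01",
    output="Draft summary text",
    user="j.doe",
    approval_action="approved",
    linked_record="WO-20977",
    logged_at=datetime(2025, 1, 15, 8, 30, tzinfo=timezone.utc).isoformat(),
)
digest = fingerprint(record)
```

Making the record frozen and hashing it are small choices, but they support the claim a compliance team cares about: what was logged at approval time is what is still in the log.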
Operator adoption depends on physical context
Text quality is not the same as shop-floor usefulness. A work instruction can be accurate and still fail because it is too long, spatially unclear, or detached from the workstation.
A 2025 human-factors study in an automotive plant found that GenAI-generated instructions could increase cognitive load when they used vague spatial language such as “tighten the left screw.” Left from whose position? Which side of the assembly? Which orientation of the part? Operators went back to original manuals when the assistant made the task harder.
The fix required more than better prompting. Teams brought in ergonomics experts, predefined instruction templates, photos, 3D models, and spatially grounded language tied to the operator’s position, tool, part orientation, and task sequence.
This is why human-in-the-loop design is not a checkbox. Operators, technicians, planners, quality teams, manufacturing engineers, IT/OT staff, and compliance leaders need to shape the workflow before rollout. If the tool adds another screen, another approval path, or another ambiguous instruction, people will avoid it.
Do not measure the model. Measure the workflow.
A demo can look impressive and still produce no P&L impact. Several enterprise AI analyses report that roughly 80% to 95% of AI pilots fail to generate measurable P&L impact, and other surveys suggest that only 10% to 15% of organizations scale AI beyond pilots. The exact numbers vary, but the pattern is familiar: pilots stall when they are disconnected from plant budgets, daily workflows, and owned KPIs.
Start with a baseline. If the pilot is a maintenance assistant, measure diagnostic time, repeat tickets, mean time to repair, first-time fix rate, or technician search time. If it is a planning assistant, measure schedule adherence, planning cycle time, inventory, expedite costs, or planner rework. If it is a documentation assistant, measure draft time, review time, defects found in review, and publication cycle time.
| Workflow | Useful GenAI role | Metric to track |
|---|---|---|
| Maintenance troubleshooting | Retrieve service history and draft guided checks | Diagnostic time, repeat failures, mean time to repair |
| Technical publications | Draft procedures from approved source material | Draft effort, review cycles, publication time |
| Production planning | Explain scenarios and translate constraints | Schedule adherence, inventory, planning cycle time |
| Quality investigation | Summarize defects, inspections, and possible causes | Investigation time, rework, first-time-right rate |
| Operator onboarding | Turn approved SOPs into guided learning material | Training time, supervisor questions, error rates |
The KPI owner matters as much as the metric. If no plant leader, engineering manager, quality owner, or operations sponsor is accountable for the number, the pilot will drift toward demonstration work.
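The before-and-after comparison does not need to be elaborate. The sketch below computes the percent change in a workflow metric between a baseline period and a pilot period. The function name, the minimum sample size, and the repair times are illustrative assumptions, not data from any real plant.

```python
from statistics import mean


def percent_change(baseline, pilot, min_samples=3):
    """Percent change in the mean of a metric, pilot versus baseline."""
    if len(baseline) < min_samples or len(pilot) < min_samples:
        raise ValueError("not enough observations to compare")
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero; percent change is undefined")
    return round((mean(pilot) - base) / base * 100, 1)


# Hypothetical mean-time-to-repair samples, in hours.
baseline_mttr = [4.0, 5.0, 6.0]
pilot_mttr = [3.5, 4.0, 4.5]
change = percent_change(baseline_mttr, pilot_mttr)
print(change)  # -20.0
```

A negative number is an improvement for time-based metrics such as mean time to repair, and a regression for rate-based metrics such as first-time fix rate, so record the intended direction next to each KPI.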
A practical roadmap starts before model selection
Teams usually pick a model too early. A stronger roadmap starts with the workflow, the data, the risk boundary, and the expected business result.
Choose one bounded workflow
Pick one plant, line, machine family, document set, or planning process. Avoid broad assistants that promise to answer anything about the factory. Scope protects the project from confusion and makes evaluation possible.
Map the source systems
List the systems the assistant needs to read: MES, PLM, ERP, APS, CMMS, SCADA, maintenance tickets, SOP repositories, CAD files, QA reports, or shift logs. Then identify which system is the source of truth for each entity. If two systems disagree about an asset name or procedure version, resolve that before the pilot depends on it.
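One way to surface those disagreements early is a small conflict report. This sketch assumes each entity's fields have already been exported per system into a nested dictionary. The `SOURCE_OF_TRUTH` mapping and the sample values are hypothetical.

```python
# Which system owns each field (an assumption for this example).
SOURCE_OF_TRUTH = {"asset_name": "CMMS", "procedure_version": "PLM"}


def find_conflicts(records):
    """List fields where systems disagree, with the owning system's value.

    records has the shape {entity_id: {field: {system: value}}}.
    """
    conflicts = []
    for entity_id, fields in sorted(records.items()):
        for field_name, by_system in sorted(fields.items()):
            if len(set(by_system.values())) > 1:
                owner = SOURCE_OF_TRUTH.get(field_name)
                conflicts.append({
                    "entity": entity_id,
                    "field": field_name,
                    "values": dict(by_system),
                    # None when no owner is defined or the owner has no value
                    "authoritative": by_system.get(owner) if owner else None,
                })
    return conflicts


records = {
    "press-07": {
        "asset_name": {"CMMS": "Press 7", "MES": "PRS-07"},
        "procedure_version": {"PLM": "C", "CMMS": "C"},
    },
}
conflicts = find_conflicts(records)
print(len(conflicts), conflicts[0]["authoritative"])  # 1 Press 7
```

A conflict with `authoritative` set to `None` is the more important finding: it means nobody has decided which system wins for that field.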
Build a configuration-aware retrieval layer
Tag documents by asset, model, revision, line, product, process, approval status, effective date, and permission level. Do not let the assistant retrieve obsolete procedures unless the user is specifically asking for history.
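A tag schema along these lines might look like the sketch below. The `DocTag` fields mirror the list above, but the field names, the integer permission levels, and the `include_history` flag are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DocTag:
    doc_id: str
    asset: str
    revision: str
    line: str
    approval_status: str          # "approved", "draft", or "obsolete"
    effective_from: date
    effective_to: Optional[date]  # None means still in effect
    permission_level: int         # higher means more restricted


def retrievable(tag, *, asset, revision, user_level, on, include_history=False):
    """Decide whether one tagged document may be returned to this user."""
    if tag.asset != asset or tag.permission_level > user_level:
        return False
    if include_history:
        # History requests may see obsolete versions, but never drafts.
        return tag.approval_status in {"approved", "obsolete"}
    in_effect = tag.effective_from <= on and (
        tag.effective_to is None or on <= tag.effective_to
    )
    return tag.revision == revision and tag.approval_status == "approved" and in_effect


tags = [
    DocTag("SOP-12-v2", "mixer-3", "B", "L2", "obsolete",
           date(2022, 1, 1), date(2023, 6, 30), 1),
    DocTag("SOP-12-v3", "mixer-3", "B", "L2", "approved",
           date(2023, 7, 1), None, 1),
    DocTag("SOP-99", "mixer-3", "B", "L2", "approved",
           date(2023, 1, 1), None, 3),  # restricted document
]
query = dict(asset="mixer-3", revision="B", user_level=1, on=date(2024, 3, 1))
current = [t.doc_id for t in tags if retrievable(t, **query)]
history = [t.doc_id for t in tags if retrievable(t, include_history=True, **query)]
print(current, history)  # ['SOP-12-v3'] ['SOP-12-v2', 'SOP-12-v3']
```

The obsolete procedure only appears when the caller explicitly asks for history, and the restricted document never appears for a level 1 user in either mode.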
Define what the system may not do
Set action boundaries early. A maintenance assistant may draft checks but not close work orders. A planning assistant may explain scenarios but not publish a schedule. A code assistant may draft snippets but not deploy PLC logic. Boundaries reduce risk and make training easier.
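These boundaries can be enforced in code rather than left to the prompt. A minimal sketch, assuming a deny-by-default allowlist keyed by assistant type; the assistant names and action names are hypothetical.

```python
# Actions each assistant may perform. Anything not listed is denied.
ALLOWED_ACTIONS = {
    "maintenance": frozenset({"draft_checklist"}),
    "planning": frozenset({"explain_scenario"}),
    "code": frozenset({"draft_snippet"}),
}


class ActionNotPermitted(Exception):
    """Raised when an assistant requests an action outside its boundary."""


def authorize(assistant, action):
    """Return True for an allowed action, raise for everything else."""
    allowed = ALLOWED_ACTIONS.get(assistant, frozenset())
    if action not in allowed:
        raise ActionNotPermitted(f"{assistant} may not perform {action}")
    return True


authorize("maintenance", "draft_checklist")  # allowed
try:
    authorize("maintenance", "close_work_order")
except ActionNotPermitted as exc:
    print(exc)  # maintenance may not perform close_work_order
```

Because unknown assistants and unknown actions both fall through to a refusal, adding a new capability requires an explicit, reviewable change to the allowlist.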
Design review and audit paths
Decide who approves outputs, what gets logged, how corrections are captured, and how the system is monitored after launch. In regulated environments, make validation evidence part of the build from the start.
Test against real edge cases
Use past incidents, obsolete manuals, renamed assets, similar part numbers, missing sensor data, confusing work instructions, and known exceptions. If the assistant performs well only on clean examples, it is not ready for production use.
Refact sees the same principle in automation work outside manufacturing. In our Automated News Pipeline project, the core challenge was not simply connecting tools. It was making the data flow reliable enough that people could trust the output inside a daily operating workflow. Manufacturing raises the stakes, but the pattern is similar: automation works when the process, data, and ownership are clear before code starts.
What to avoid when planning a manufacturing GenAI pilot
The expensive mistakes tend to happen early. Teams buy a platform before they define the workflow. They connect a chatbot to a broad document dump. They test with clean sample data and ignore the ugly cases. They measure model accuracy but not time saved, scrap avoided, or schedule performance. Then the pilot never leaves the lab.
Avoid these patterns:
- Broad factory chatbots. They create risk because they blur scope, permission, and source grounding.
- Unversioned document retrieval. Manufacturing answers depend on the exact procedure, revision, asset, and effective date.
- Autonomous control claims. GenAI should not directly modify safety-critical setpoints, PLC logic, batch release decisions, or validated schedules.
- Prompt-only fixes. Adding rules after every error can make systems brittle. Fix the context layer, source data, and workflow instead.
- ROI without baselines. A pilot needs a before-and-after number, not a general belief that AI saves time.
- Interfaces outside daily tools. If planners, technicians, or operators must leave their normal system to use the assistant, adoption will suffer.
GenAI is one layer in a broader industrial AI stack. Forecasting, optimization, simulation, computer vision, digital twins, predictive maintenance, expert systems, statistical process control, and rules engines often do the core technical work. GenAI makes those systems easier to query, explain, and operate.
The right question is readiness, not ambition
Generative AI in manufacturing is real, but the useful version is narrower and more disciplined than the marketing version. It helps people find the right information, understand scenarios, draft documents, investigate problems, and move faster through review-heavy workflows. It does not remove the need for clean data, validated systems, safety controls, or experienced judgment.
The next step is a readiness check, not a platform demo. Pick one painful workflow. Identify the source systems. Decide which outputs require approval. Define the metric. Test against ugly real cases. If the pilot cannot survive that level of clarity, it was not ready for the plant.
If you are trying to decide which manufacturing workflow is safe and valuable enough to build first, Refact’s AI development work starts with that early scoping. Clarity before code matters most when the cost of a wrong answer is higher than a bad demo.




