MIS 661 Topic 1 DQ 1
School
Grand Canyon University *
Course
661
Subject
Information Systems
Date
Feb 20, 2024
Uploaded by MasterTitanium11775
Data mining should be viewed as a process. As with all good statistical analyses, one needs to be
clear about the purpose of the analysis. Just to "mine data" without a clear purpose, without an
appreciation of the subject area, and without a modeling strategy will usually not be successful.
Describe a data mining process that is popular in today's landscape that could be an alternative to
the CRISP-DM Lifecycle.
CRISP-DM is widely adopted and has extensive documentation and support from the data
mining community. It is generally seen as a practical and effective methodology for data mining
projects. CRISP-DM covers the entire data mining project lifecycle, including understanding
business goals, data collection and preparation, model building, evaluation, and deployment.
A data mining process that is popular in today's landscape and could serve as an alternative to
the CRISP-DM lifecycle is SEMMA. CRISP-DM stands for Cross-Industry Standard Process for
Data Mining; SEMMA stands for Sample, Explore, Modify, Model, Assess. SEMMA was
developed by SAS (a software company) as a framework for its data mining software. It
focuses primarily on the modeling phase and is more specific to SAS’s software suite. However,
it has also been used more broadly in the context of data analysis and modeling. SEMMA and
CRISP-DM are both process models used in data mining and machine learning to guide the
steps involved in developing predictive models and extracting useful insights from data.
While they share some similarities, they also have distinct differences.
SEMMA outlines five key phases:
1. Sample
2. Explore
3. Modify
4. Model
5. Assess
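As a rough illustration, the five SEMMA phases can be sketched as a small pipeline. This is only a minimal sketch under stated assumptions: it uses plain Python rather than SAS software, the data is synthetic, the function names (sample, explore, modify, model, assess, run_semma, make_demo_data) are hypothetical, and a one-variable least-squares line stands in for whatever model a real project would choose.

```python
import random
import statistics


def make_demo_data(n=200, seed=42):
    """Synthetic (x, y) rows, assumed for illustration: y = 3 + 2x + noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 3.0 + 2.0 * x + rng.gauss(0, 0.5)))
    return rows


def sample(rows, fraction, seed=0):
    """Sample: draw a random subset of the full data set."""
    rng = random.Random(seed)
    k = max(2, int(len(rows) * fraction))
    return rng.sample(rows, k)


def explore(rows):
    """Explore: summarize the variables to understand their ranges."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return {
        "x_mean": statistics.mean(xs),
        "x_stdev": statistics.stdev(xs),
        "y_mean": statistics.mean(ys),
    }


def modify(rows, x_mean, x_stdev):
    """Modify: transform variables; here, standardize x."""
    return [((x - x_mean) / x_stdev, y) for x, y in rows]


def model(rows):
    """Model: fit y = a + b*x by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def assess(rows, a, b):
    """Assess: mean squared error on rows not used for fitting."""
    return statistics.mean([(y - (a + b * x)) ** 2 for x, y in rows])


def run_semma(data, fraction=0.5, seed=0):
    """Run the five phases in order on a list of (x, y) rows."""
    sampled = sample(data, fraction, seed)
    split = int(len(sampled) * 0.8)
    train, holdout = sampled[:split], sampled[split:]
    stats = explore(train)
    # Reuse the training statistics so the hold-out rows stay unseen.
    train_m = modify(train, stats["x_mean"], stats["x_stdev"])
    holdout_m = modify(holdout, stats["x_mean"], stats["x_stdev"])
    a, b = model(train_m)
    return {"stats": stats, "coefficients": (a, b),
            "holdout_mse": assess(holdout_m, a, b)}


data = make_demo_data()
result = run_semma(data)
print(result["stats"])
print(result["coefficients"])
print(result["holdout_mse"])
```

Note that the sketch starts with data already in hand and stops at assessment; it has no step for defining business goals or for deployment, which mirrors the narrower scope of SEMMA relative to CRISP-DM.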
SEMMA concentrates on the modeling work itself, offering guidance on data sampling,
exploration, modification, modeling, and model assessment. It is more specific to SAS
software and is often used as a companion to other, more comprehensive methodologies such as
CRISP-DM. CRISP-DM, by contrast, is a more comprehensive and widely accepted data mining
process model that covers the entire project lifecycle, including the business understanding
and deployment stages that SEMMA leaves out.
The choice between the two depends on the specific needs and tools of a given project:
CRISP-DM is the more general and flexible approach, while SEMMA is useful for model-building
within the SAS environment but may be less familiar and less widely adopted outside of
the SAS user base. The SEMMA methodology can still be applied to a wide range of
business problems, including fraud identification, customer retention and turnover, database
marketing, customer loyalty, bankruptcy forecasting, market segmentation, and risk,
affinity, and portfolio analysis.
Hotz, N. (2023). What is SEMMA? Data Science Process Alliance.
https://www.datascience-pm.com/semma/#:~:text=You%20can%20use%20the%20SEMMA,%2C%20affinity%2C%20and%20portfolio%20analysis.
Starburst. (n.d.). SEMMA vs CRISP-DM.
https://www.starburst.io/learn/data-fundamentals/semma-crisp-dm/