Domain modeling is an important step in the transition from
natural-language requirements to precise specifications. For
large systems, building a domain model manually is a laborious
task. Several approaches exist to assist engineers with
this task, whereby candidate domain model elements are
automatically extracted using Natural Language Processing
(NLP). Despite the existing work on domain model extraction,
important facets remain under-explored: (1) there is
limited empirical evidence about the usefulness of existing
extraction rules (heuristics) when applied in industrial settings;
(2) existing extraction rules do not adequately exploit
the natural-language dependencies detected by modern NLP
technologies; and (3) an important class of rules developed
by the information retrieval community for information extraction
remains unutilized for building domain models.
To address these limitations, we develop
a domain model extractor by bringing together existing extraction
rules in the software engineering literature, extending
these rules with complementary rules from the information
retrieval literature, and proposing new rules to better
exploit results obtained from modern NLP dependency
parsers. We apply our model extractor to four industrial
requirements documents and report how frequently the different
extraction rules are applied. We conduct an expert
study over one of these documents, investigating the accuracy
and overall effectiveness of our domain model extractor.
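
To make the idea of a dependency-based extraction rule concrete, the following minimal sketch shows one plausible rule: nouns become candidate concepts, and a verb with both a subject and a direct object yields a candidate association. This is an illustration under stated assumptions, not the rule set of the extractor described above. The names `Token` and `extract_candidates` are hypothetical, the dependency labels `nsubj` and `dobj` follow the common Stanford-style convention, and the parse is annotated by hand rather than produced by an NLP parser.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    """One token of a dependency parse (hypothetical, simplified structure)."""
    text: str
    pos: str   # coarse part-of-speech tag, e.g. "NOUN" or "VERB"
    dep: str   # dependency label linking this token to its head
    head: int  # index of the head token; the root points to itself


def extract_candidates(
    tokens: List[Token],
) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    """Apply two illustrative rules to a single parsed sentence.

    Rule 1 (assumed): every noun is a candidate concept.
    Rule 2 (assumed): a verb with an nsubj and a dobj dependent yields a
    candidate association (subject, verb, object).
    """
    concepts: List[str] = []
    for tok in tokens:
        name = tok.text.lower()
        if tok.pos == "NOUN" and name not in concepts:
            concepts.append(name)

    associations: List[Tuple[str, str, str]] = []
    for index, tok in enumerate(tokens):
        if tok.pos != "VERB":
            continue
        # Collect the direct dependents of this verb by dependency label.
        subjects = [t.text.lower() for t in tokens
                    if t.head == index and t.dep == "nsubj"]
        objects = [t.text.lower() for t in tokens
                   if t.head == index and t.dep == "dobj"]
        for subj in subjects:
            for obj in objects:
                associations.append((subj, tok.text.lower(), obj))

    return concepts, associations


# Hand-annotated parse of "The operator configures the simulator."
tokens = [
    Token("The", "DET", "det", 1),
    Token("operator", "NOUN", "nsubj", 2),
    Token("configures", "VERB", "ROOT", 2),
    Token("the", "DET", "det", 4),
    Token("simulator", "NOUN", "dobj", 2),
    Token(".", "PUNCT", "punct", 2),
]

concepts, associations = extract_candidates(tokens)
print(concepts)       # ['operator', 'simulator']
print(associations)   # [('operator', 'configures', 'simulator')]
```

In practice the token list would come from a dependency parser, and a realistic extractor would combine many such rules and handle noun phrases, passive voice, and prepositional attachments, none of which this sketch attempts.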
Author(s): Chetan Arora, Mehrdad Sabetzadeh, Lionel Briand, Frank Zimmer
Year: 2016
Language: English
Pages: 11
Tags: Model Extraction; Natural-Language Requirements; Natural Language Processing; Case Study Research