Text mining and software engineering: an integrated source code and document analysis approach
- Author(s): R. Witte (1); Q. Li (1); Y. Zhang (2); J. Rilling (2)
- DOI: 10.1049/iet-sen:20070110
Affiliations:
1: Institut für Programmstrukturen und Datenorganisation (IPD), Fakultät für Informatik, Universität Karlsruhe (TH), Germany
2: Department of Computer Science and Software Engineering, Concordia University, Montréal, Canada
- Source: Volume 2, Issue 1, February 2008, pp. 3–16
- ISSN: 1751-8806 (print), 1751-8814 (online)
Documents written in natural language constitute a major part of the artefacts produced during the software engineering life cycle. Especially during software maintenance or reverse engineering, the semantic information conveyed in these documents can provide important knowledge for the software engineer. This paper presents a text mining system that populates a software ontology with information detected in documents. A particular novelty is the integration of results from automated source code analysis into a natural language processing pipeline, which allows software artefacts represented in code and in natural language to be cross-linked on a semantic level.
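To make the idea of cross-linking more concrete, the following is a minimal sketch under stated assumptions, not the authors' ontology-based system. It assumes that a source code analysis step has already produced a list of named code entities (represented here by the hypothetical `CodeEntity` type) and uses those names as a gazetteer when scanning documentation text, emitting a hypothetical `TraceLink` for every mention found. The helper names `build_gazetteer` and `link_document` are illustrative only, and plain string lookup stands in for the semantic analysis described in the abstract.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeEntity:
    """An entity assumed to come from a (hypothetical) source code analysis step."""
    name: str  # identifier as it appears in the code, e.g. "DocumentParser"
    kind: str  # e.g. "class", "method", "variable"


@dataclass(frozen=True)
class TraceLink:
    """A link between a mention in a document and a code entity."""
    sentence: int  # index of the sentence containing the mention
    mention: str   # surface form found in the text
    entity: CodeEntity


def build_gazetteer(entities):
    # Index code entities by lower-cased name, so a mention is matched
    # regardless of capitalisation in the prose.
    return {e.name.lower(): e for e in entities}


def link_document(text, gazetteer):
    links = []
    # Naive sentence splitting: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for i, sentence in enumerate(sentences):
        # Identifier-like tokens: a letter or underscore, then letters, digits, underscores.
        for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sentence):
            entity = gazetteer.get(token.lower())
            if entity is not None:
                links.append(TraceLink(i, token, entity))
    return links


if __name__ == "__main__":
    entities = [CodeEntity("DocumentParser", "class"), CodeEntity("parse", "method")]
    doc = "The DocumentParser reads the input file. Call parse to build the tree."
    for link in link_document(doc, build_gazetteer(entities)):
        print(link.sentence, link.mention, link.entity.kind)
    # prints:
    # 0 DocumentParser class
    # 1 parse method
```

This kind of purely lexical matching also links every ordinary use of a common word such as "parse", which is exactly the ambiguity that a semantic, ontology-backed pipeline is meant to resolve.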
Inspec keywords: reverse engineering; software maintenance; natural language processing; text analysis
Subjects: Natural language interfaces; Software engineering techniques; Document processing and analysis techniques