OneSpace: Detecting cross-language clones by learning a common embedding space
dc.contributor.author | Elarnaoty, Mohammed | |
dc.contributor.author | Servant-Cortés, Francisco Javier | |
dc.date.accessioned | 2024-12-12T09:52:46Z | |
dc.date.available | 2024-12-12T09:52:46Z | |
dc.date.issued | 2024 | |
dc.identifier.citation | Mohammed El Arnaoty, Francisco Servant, OneSpace: Detecting cross-language clones by learning a common embedding space, Journal of Systems and Software, Volume 208, 2024, 111911, ISSN 0164-1212, DOI: https://doi.org/10.1016/j.jss.2023.111911 | es_ES |
dc.identifier.uri | https://hdl.handle.net/10630/35608 | |
dc.description.abstract | Identifying cloned code fragments across different languages can enhance the productivity of software developers in several ways. However, the clone detection task is often studied in the context of a single language and is less explored for code snippets spanning different languages. In this paper, we present OneSpace, a new cross-language clone detection approach. OneSpace projects different programming languages into the same embedding space using both code and API data. OneSpace then leverages a Siamese network to infer the similarity of the embedded programs. We evaluated OneSpace by detecting clones across three language pairs: Java-Python, Java-C++, and Java-C. We compared OneSpace with the state-of-the-art techniques SupLearn and CLCDSA. In our evaluation, OneSpace provided higher effectiveness than the state of the art. Our ablation study validated some of our intuitions in designing OneSpace, particularly that using a single embedding space (as opposed to separate ones) provides higher effectiveness. Additionally, we designed a variant of OneSpace that uses the Word Mover's Distance algorithm; it provides lower effectiveness but is much more efficient. We also found that OneSpace provides higher effectiveness than the state of the art even for complex implementations, single-method implementations, varying ratios of positive to negative clones in training, varying amounts of training data, and additional programming languages. | es_ES |
dc.description.sponsorship | NSF CCF-2046403, URJC C01INVESDIST, AEI PID2022-142964OA-I00. | es_ES |
dc.language.iso | eng | es_ES |
dc.publisher | Elsevier | es_ES |
dc.rights | info:eu-repo/semantics/openAccess | es_ES |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Software - Design | es_ES |
dc.subject.other | Clone detection | es_ES |
dc.subject.other | Software engineering | es_ES |
dc.subject.other | Machine learning | es_ES |
dc.subject.other | Code embedding | es_ES |
dc.subject.other | Siamese neural networks | es_ES |
dc.title | OneSpace: Detecting cross-language clones by learning a common embedding space | es_ES |
dc.type | info:eu-repo/semantics/article | es_ES |
dc.identifier.doi | 10.1016/j.jss.2023.111911 | |
dc.rights.cc | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.type.hasVersion | info:eu-repo/semantics/acceptedVersion | es_ES |