Abstract:
Background: Identifier names are one of the main sources of information in program comprehension, the activity on which developers spend the majority of their time. When reading natural language texts or code, readers perform lexical access to retrieve the orthographic (word shape), phonological (pronunciation), and semantic (meaning) representations of words from memory. Successful retrieval of these representations is vital for comprehension and for subsequent code maintenance and evolution.
Objective: This paper examines the cost of orthographic, phonological, or semantic similarity between identifiers and how it affects debugging performance and programmer workload. By recognizing common identifier naming combinations that hinder code comprehension, we can discover new programming best practices and create automated tools that flag problematic naming combinations.
Method: Through a human experiment (n=43), we explore the impact of orthographic, phonological, and semantic similarity on debugging success, time, and workload. In our experiment, participants debugged three programs. Each program has two versions that are identical except for one pair of identifiers, which are either similar (e.g., i and j) or dissimilar (e.g., row and column). Participants were randomly assigned one version of the code, and their performance was recorded to measure debugging success and time. At the end of each trial, they reported their subjective workload using the NASA Task Load Index (NASA-TLX).
Results: Among advanced programmers, we observed some differences in debugging success and duration between the similar and dissimilar identifier versions, but these differences were not statistically significant.
Conclusion: The results call for further investigation of identifier similarity and its influence on code comprehension. Studying identifier similarity may reveal new linguistic anti-patterns that hinder code comprehension.