
How researchers used grounded theory to decode Copilot problems

Abstract and 1. Introduction

2. Methodology and 2.1. Research questions

2.2. Data collection

2.3. Data labeling

2.4. Data extraction

2.5. Data analysis

3. Results and interpretation and 3.1. Type of problems (RQ1)

3.2. Type of causes (RQ2)

3.3. Type of solutions (RQ3)

4.

4.1. Implications for Copilot users

4.2. Implications for the Copilot team

4.3. Implications for researchers

5. Threats to validity

6. Related work

6.1. Evaluation of the quality of the code generated by Copilot

6.2. Copilot's impact on practical development and 6.3. Conclusive summary

7. Conclusions, data availability, acknowledgements, CRediT authorship contribution statement, and references

To answer the three RQs in Section 2.1, we established a set of data items for data extraction, as shown in Table 1. Data items D1-D3 are intended to extract the problems, their underlying causes, and potential solutions from the filtered data to answer RQ1-RQ3, respectively. These three data items could be extracted from any part of a GitHub issue, GitHub discussion, or Stack Overflow (SO) post, such as the title, the problem description, the comments, and the discussions.

2.4.1. Pilot data extraction

The first and third authors conducted a pilot data extraction on 20 randomly selected GitHub issues, 20 GitHub discussions, and 20 SO posts; in case of disagreements, the second author was involved to reach a consensus. The results indicated that all three data items could be extracted from our dataset. Based on these observations, we established the following criteria for the formal data extraction: (1) If the same problem was reported by multiple users, we recorded it only once. (2) If multiple problems were identified in the same GitHub issue, GitHub discussion, or SO post, we recorded each of them separately. (3) For a problem with multiple mentioned causes, we recorded the cause confirmed by the problem reporter or the Copilot team as the root cause. (4) For a problem with multiple suggested solutions, we recorded only the solutions that the problem reporter or the Copilot team confirmed to have actually solved the problem.

2.4.2. Formal data extraction

The first and third authors carried out the formal data extraction on the filtered dataset to extract the data items. They then discussed inconsistencies with the second author and reached a consensus, ensuring that the data extraction process adhered to the predefined criteria. Each extracted data item was checked multiple times by the three authors to ensure accuracy. The data extraction results were compiled and recorded in MS Excel (Zhou et al., 2024).

It is important to note that not all collected data contain both the cause and the solution of a problem. Although we selected closed GitHub issues and answered GitHub discussions and SO posts during the data collection phase, the level of detail available for each data item varies considerably. Sometimes respondents to a Copilot-related problem offered a solution without analyzing the problem in detail, which prevented us from extracting the underlying cause. In other cases, although the cause of a problem was identified, the user did not describe the specific resolution process. For example, a user found that Copilot was “unable to operate properly on the VSCode remote server” and realized that this was due to “the bad network”, but did not provide any detailed solution (Discussion #14907). Moreover, even when some answers provided both causes and solutions, they may not have been accepted or confirmed as effective by the problem reporter or the Copilot team members. For example, a user asked for “a way to configure GitHub Copilot in Google Colab”, but neither accepted nor replied to any of the three proposed answers (SO #72431032). Therefore, we could not consider any of the three answers as an effective solution to the problem.
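The extraction criteria of Section 2.4.1 were applied manually by the authors. Purely as an illustration of how they interact with the incomplete data described above, they can be expressed as the minimal Python sketch below. The record layout (`Candidate`, `Extracted`), the source labels, and the rule that two reports describe the same problem when their normalized descriptions are identical are our assumptions for illustration, not part of the study, where such judgments were made by the authors and recorded in MS Excel.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional, Tuple

# Hypothetical labels for the three source types used in the study.
Source = Literal["github_issue", "github_discussion", "so_post"]


@dataclass
class Candidate:
    """One problem found in a source item, before the criteria are applied.

    Criterion (2): a source item reporting several problems yields several
    Candidate objects that share the same source_id.
    """
    source: Source
    source_id: str
    problem: str  # normalized problem description (assumed agreed on by the coders)
    causes: List[Tuple[str, bool]] = field(default_factory=list)     # (cause, confirmed by reporter/Copilot team?)
    solutions: List[Tuple[str, bool]] = field(default_factory=list)  # (solution, confirmed to solve it?)


@dataclass
class Extracted:
    source: Source
    source_id: str
    problem: str               # D1 -> RQ1
    root_cause: Optional[str]  # D2 -> RQ2 (None if no cause was confirmed)
    solutions: List[str]       # D3 -> RQ3 (empty if no solution was confirmed)


def extract(candidates: List[Candidate]) -> List[Extracted]:
    """Apply criteria (1), (3), and (4) to a list of candidate problems."""
    seen = set()
    records = []
    for c in candidates:
        # Criterion (1): a problem reported by several users is recorded once.
        # Assumption: "same problem" is approximated by an identical description.
        if c.problem in seen:
            continue
        seen.add(c.problem)
        # Criterion (3): keep the cause confirmed by the reporter or the Copilot team.
        confirmed = [text for text, ok in c.causes if ok]
        root_cause = confirmed[0] if confirmed else None
        # Criterion (4): keep only solutions confirmed to actually solve the problem.
        solutions = [text for text, ok in c.solutions if ok]
        records.append(Extracted(c.source, c.source_id, c.problem, root_cause, solutions))
    return records


if __name__ == "__main__":
    # The two examples from the text: a confirmed cause without a solution,
    # and three unconfirmed answers (placeholder texts) that yield no solution.
    items = [
        Candidate("github_discussion", "14907",
                  "Copilot unable to operate properly on the VSCode remote server",
                  causes=[("bad network", True)]),
        Candidate("so_post", "72431032",
                  "Configure GitHub Copilot in Google Colab",
                  solutions=[("answer 1", False), ("answer 2", False), ("answer 3", False)]),
    ]
    for r in extract(items):
        print(r.source_id, r.root_cause, r.solutions)
```

Keeping unconfirmed causes and solutions in `Candidate` rather than discarding them early makes it easy to see, per problem, why D2 or D3 ended up empty.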

Table 1. Extracted data items and their corresponding RQs

2.5. Data analysis

To answer the three RQs formulated in Section 2, we conducted data analysis using open coding and constant comparison, two techniques widely used in Grounded Theory for qualitative data analysis (Stol et al., 2016). Open coding is not constrained by pre-existing theoretical frameworks; instead, it encourages researchers to generate codes based on the actual content of the data. These codes are descriptive summaries of the data, aimed at capturing the underlying themes. In constant comparison, researchers continuously compare the coded data, dynamically refining and adjusting the categories according to their similarities and differences.

The specific data analysis process consists of four steps:

1) The first author carefully examined the collected data and assigned descriptive codes that succinctly captured their central themes. For example, the problem in Discussion #10598 was coded as “Stop giving inline suggestions”; it was reported by a user who noticed that their previously working Copilot had suddenly stopped providing code suggestions in VSCode.

2) The first author compared the codes to identify patterns, commonalities, and distinctions among them. Through this iterative comparison, similar codes were merged into higher-level types and categories. For example, the code of Discussion #10598, together with other similar codes, was grouped into the type Functionality Failure, which in turn belongs to the category Operation Issue.

3) Whenever uncertainties arose, the first author held discussions with the second and third authors to reach a consensus. It should be noted that, owing to the nature of constant comparison, the types and categories went through several rounds of refinement before reaching their final form.

4) The initial version of the analysis results was further verified by the second and third authors, and the negotiated agreement approach (Campbell et al., 2013) was used to resolve conflicts. The final results are presented in Section 3.
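The coding itself was done by hand, but the three-level structure it produces (descriptive codes grouped into types, and types grouped into categories) can be sketched in Python as follows. The mapping names and the helper `build_taxonomy` are hypothetical; the single entry simply reuses the Discussion #10598 example from steps 1) and 2).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Step 1) open coding: each source item receives a descriptive code.
codes: Dict[str, str] = {"Discussion #10598": "Stop giving inline suggestions"}

# Step 2) constant comparison: codes are grouped into types, types into categories.
# These mappings are revised in each refinement round.
code_to_type: Dict[str, str] = {"Stop giving inline suggestions": "Functionality Failure"}
type_to_category: Dict[str, str] = {"Functionality Failure": "Operation Issue"}


def build_taxonomy(
    codes: Dict[str, str],
    code_to_type: Dict[str, str],
    type_to_category: Dict[str, str],
) -> Dict[str, Dict[str, List[Tuple[str, str]]]]:
    """Return {category: {type: [(source item, code), ...]}}."""
    taxonomy = defaultdict(lambda: defaultdict(list))
    for item, code in codes.items():
        ptype = code_to_type[code]          # raises KeyError if a code is still unassigned
        category = type_to_category[ptype]
        taxonomy[category][ptype].append((item, code))
    # Convert the nested defaultdicts to plain dicts for display and comparison.
    return {cat: dict(types) for cat, types in taxonomy.items()}


if __name__ == "__main__":
    print(build_taxonomy(codes, code_to_type, type_to_category))
```

Keeping the code-to-type and type-to-category mappings separate reflects the iterative nature of constant comparison: when a type is merged or moved to another category in a later round, only the mapping changes and the taxonomy can be rebuilt without recoding the data.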

Authors:

(1) Xiyu Zhou, School of Computer Science, Wuhan University, Wuhan, China;

(2) Peng Liang (corresponding author), School of Computer Science, Wuhan University, Wuhan, China;

(3) Beiqi Zhang, School of Computer Science, Wuhan University, Wuhan, China;

(4) Zengyang Li, School of Computer Science, Central China Normal University, Wuhan, China;

(5) Aakash Ahmad, School of Computing and Communications, Lancaster University Leipzig, Leipzig, Germany;

(6) Mojtaba Shahin, School of Computing Technologies, RMIT University, Melbourne, Australia;

(7) Muhammad Waseem, Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland.
