
Behind the scenes: the prompts and tips that made many-shot ICL work

Summary and 1 Introduction

2 Related work

3 Methods and 3.1 Models

3.2 Datasets

3.3 Evaluation metrics

4 Results and 4.1 Increasing number of demonstrating examples

4.2 Impact of batching queries

4.3 Cost and latency analysis

5 Discussion

6 Conclusion and references

A. Prompts used for ICL experiments

B. Prompt selection

C. GPT-4(V)-Turbo performance under many-shot ICL

D. Many-shot ICL performance on medical QA tasks

Acknowledgments and funding disclosure

A Prompts used for ICL experiments

A.1 Prompts used for image classification experiments

A.2 Prompts used for image classification experiments with batching

A.3 Prompts used for batching ablation experiments

A.3.1 Image prefixing

B Prompt selection

We use a diverse set of prompts to test the robustness of many-shot ICL to differences in prompt wording. We randomly sample two datasets (HAM10000 and EuroSAT) for this experiment due to budget constraints.

B.1 Prompts used for prompt selection experiments

Note that only the question section is shown here, and that prompt 1 is the one used for all other image classification experiments.

B.1.1 Prompt 1

B.1.2 Prompt 2

B.1.3 Prompt 3

Figure 5: Sensitivity analysis of many-shot ICL. These plots show the change in task performance on two datasets as the number of demonstration examples increases, using three different prompts. The Gemini 1.5 Pro model is used for all sensitivity-analysis experiments. The x-axis is on a logarithmic scale, representing the number of demonstration examples plus one. The log-linear improvement up to optimal performance is consistent across all selected prompts.

B.2 Prompt selection results

Figure 5 shows the sensitivity of performance to the choice of prompt on two datasets with three prompts. Although there is a small gap in performance between prompts, the overall log-linear improvement trend is consistent.
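The log-linear trend described above can be checked numerically: plot accuracy against log(N + 1), where N is the number of demonstration examples, and fit a straight line. A minimal sketch, using hypothetical accuracy numbers (not figures from the paper):

```python
import numpy as np

# Hypothetical accuracies for increasing numbers of demonstration
# examples; the paper plots accuracy against log(N + 1) so that the
# zero-shot point (N = 0) maps to x = 0.
n_examples = np.array([0, 1, 4, 16, 64, 256])
accuracy = np.array([0.52, 0.55, 0.61, 0.66, 0.72, 0.78])

# Fit accuracy ~ a * log(N + 1) + b: a straight line in this
# transformed x-axis is exactly the "log-linear improvement" trend.
x = np.log(n_examples + 1)
a, b = np.polyfit(x, accuracy, 1)
print(f"slope={a:.3f}, intercept={b:.3f}")
```

A positive slope `a` under this fit corresponds to the consistent log-linear improvement seen across all three prompts in Figure 5.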

C GPT-4(V)-Turbo performance under many-shot ICL

GPT-4(V)-Turbo shows mixed results for many-shot ICL, with substantial performance improvements on HAM10000, UCMerced, EuroSAT, and DTD, but minimal or no improvement on the other six datasets (Figure 6). However, we note that we were unable to scale the number of demonstration examples to the same level as Gemini 1.5 Pro, because GPT-4(V)-Turbo has a shorter context window and is more prone to timeout errors during scaling. In addition, GPT-4(V)-Turbo generally appears to underperform Gemini 1.5 Pro across the datasets, except for FIVES and EuroSAT, where it largely matches Gemini 1.5 Pro's performance. GPT-4(V)-Turbo's performance on DrugOOD shows high variance, resembling that of Gemini 1.5 Pro, with peak performance at 40 demonstration examples.
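The context-window constraint mentioned above puts a hard cap on how many demonstration examples can fit in a single request. A rough back-of-the-envelope sketch, where the window sizes, per-example token cost, and reserved budget are all illustrative assumptions rather than measurements from the paper:

```python
def max_demo_examples(context_window: int,
                      tokens_per_example: int,
                      reserved_tokens: int = 2_000) -> int:
    """Number of demonstration examples that fit in the context window
    after reserving room for the instruction prompt, the query image,
    and the model's answer. All token counts are assumptions."""
    return max(0, (context_window - reserved_tokens) // tokens_per_example)

# Assumed budgets: a ~128k-token window versus a ~1M-token window,
# with a hypothetical 600 tokens per image demonstration example.
print(max_demo_examples(128_000, tokens_per_example=600))
print(max_demo_examples(1_000_000, tokens_per_example=600))
```

Under these assumed numbers, the larger window admits roughly an order of magnitude more demonstration examples, which is consistent with why scaling stopped earlier for the model with the shorter context.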

D Many-shot ICL performance on medical QA tasks

D.1 Prompts used for medical QA experiments (MedQA, MedMCQA)

Figure 6: GPT-4(V)-Turbo many-shot ICL performance and GPT-4o zero-shot performance. The x-axis is on a logarithmic scale.

Figure 7: Many-shot ICL performance on medical QA tasks.

D.2 Results

Figure 7 shows the results on the medical QA tasks.

Acknowledgments and funding disclosure

We thank Dr. Jeff Dean, Yuhui Zhang, Dr. Mutallip Anwar, Kefan Dong, Rishi Bommasani, Ravi B. Sojitra, Chen Shani, and Annie Chen for their feedback on ideas and the manuscript. Yixing Jiang is supported by a National Science Scholarship (PhD). This work is also supported by Google Cloud credits. Dr. Jonathan Chen has received research funding support in part from the NIH/National Institute of Allergy and Infectious Diseases (1R01AI17812101), NIH/National Institute on Drug Abuse Clinical Trials Network (UG1DA015815 – CTN-0136), Gordon and Betty Moore Foundation (Grant #12409), Stanford Artificial Intelligence in Medicine and Imaging – Human-Centered Artificial Intelligence (AIMI-HAI) Partnership Grant, a Google, Inc. research collaboration as Co-I to leverage EHR data to predict a range of clinical outcomes, the American Heart Association – Strategically Focused Research Network – Diversity in Clinical Trials, and an NIH-NCATS-CTSA grant (UL1TR003142) in support of shared research resources.

