Using AI to help with daily life and development exploration


To do a good job, one must first sharpen one's tools. AIGC tools such as ChatGPT and Stable Diffusion have emerged, and the AI wave is sweeping across industries. As developers, how to leverage AI to work more efficiently is a question worth studying.

It is important to note that content generated by GPT is not always accurate and may contain errors. Its potential, however, lies in the prospect of a general-purpose AI assistant: no one can have expert-level knowledge in every field, but using GPT effectively in unfamiliar domains can significantly lower the cost of entry.

In this article, I will introduce some cases of using GPT to assist in work, life, and learning, as well as explore application scenarios for secondary development using the GPT API. At the end of the article, I will share some Prompts I use and recommend some useful GPT tools, as well as a solution for encapsulating Azure OpenAI services using Cloudflare Worker.

API Request

You can sign up for an OpenAI account to get an API key, but the free quota expires after a period of time, and risk control has recently been strict, making accounts easy to ban.
It is recommended to apply for the Azure-hosted OpenAI service instead, which is stable enough and accessible from within China. However, be aware that the API is called differently on Azure than on OpenAI. At the end of the article, I provide a Cloudflare Worker-based solution for the Azure OpenAI service.

Azure allows you to deploy models as needed:

I have also been granted trial access to GPT-4 on Azure, which provides the gpt-4 and gpt-4-32k models; these are more powerful (and more expensive) than GPT-3.5.

Available GPT models include:

  • gpt-3.5-turbo
  • gpt-4
  • gpt-4-32k

This article will still use the official OpenAI API as the example. To call the GPT API, you send an HTTP POST request:

curl --location 'https://api.openai.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-xxxxxxxxxxxxxxxxxxxxxxxxxx' \
--data '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "Hello"}]
}'

In messages, you can specify the role and content of the message. GPT offers three types of roles: system, user, and assistant.

  • The content of system is a preset role for GPT, such as “You are a professional C++ developer. Please evaluate whether my C++ code adheres to the C++11 standard or above, and respond in Chinese.” This can be understood as the information to establish a persona for GPT.
  • The user part is the information that the user wants to provide to GPT, such as the content of a question.
  • assistant is the information that GPT replies with.

Using the above POST request as an example, GPT will respond with a JSON structure:

{
    "id": "chatcmpl-78k8efoUk2bHIH3G2dUr29K2CKYPz",
    "object": "chat.completion",
    "created": 1682318336,
    "model": "gpt-35-turbo",
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 10,
        "total_tokens": 19
    },
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "Hey there! How can I assist you today?"
        },
        "finish_reason": "stop",
        "index": 0
    }]
}

By parsing this structure, you can obtain the reply content from GPT.

Note: When calling the GPT API, you need to manage the conversation context yourself. That means every request must include the complete conversation history in messages; as the number of chat rounds grows, this consumes more and more tokens, so control the amount of context you send carefully.
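As a rough illustration of that bookkeeping (a minimal sketch in Python with the requests library; the key, persona, and follow-up question are placeholders of my own, not from the article), each round appends the assistant's reply plus the next user message to the same list:

import requests

API_URL = "https://api.openai.com/v1/chat/completions"
HEADERS = {"Authorization": "Bearer sk-xxxxxxxxxxxxxxxxxxxxxxxxxx"}  # placeholder key

# The full conversation lives client-side and is resent on every call.
messages = [
    {"role": "system", "content": "You are a professional C++ developer."},
    {"role": "user", "content": "Hello"},
]

def chat(history):
    resp = requests.post(API_URL, headers=HEADERS,
                         json={"model": "gpt-3.5-turbo", "messages": history})
    return resp.json()["choices"][0]["message"]

# Round 1: send system + user, then keep the assistant reply in the history.
messages.append(chat(messages))

# Round 2: append the new question; the whole history is sent again,
# so token usage grows with every round.
messages.append({"role": "user", "content": "Does C++11 support lambdas?"})
print(chat(messages)["content"])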

Daily Usage

Currently, I frequently use GPT to handle tasks such as:

  1. Translation, grammar checking, and word explanations
  2. Explaining, checking, and optimizing given code
  3. Polishing descriptions and wording
  4. Summarizing documents
  5. Q&A: asking for viewpoints on a topic
  6. "What is it": having GPT identify what a piece of content might be and generate a matching regular expression
  7. Having GPT assume a role to answer questions
  8. Generating topic outlines

Let’s illustrate a few examples.

Generating Code

Sometimes I need to write Python scripts, but there are too many libraries to keep in mind. In such cases, I can describe the requirement and have GPT generate code for reference.

Take the POST request shown earlier as an example. The question:

Question
Convert the following curl command to Python code using the requests library: curl --location 'https://api.openai.com/v1/chat/completions' \ --header 'Content-Type: application/json' \ --header 'Authorization: Bearer sk-xxxxxxxxxxxxxxxxxxxxxxxxxx' \ --data '{ "model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}] }'

And GPT’s response:

Within seconds, it produced the output.
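The generated code was shown as a screenshot in the original post; a sketch of what such a requests-based conversion looks like (with the key still a placeholder) is:

import requests

url = "https://api.openai.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer sk-xxxxxxxxxxxxxxxxxxxxxxxxxx",  # placeholder key
}
data = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Equivalent of the curl command above.
response = requests.post(url, headers=headers, json=data)
print(response.json())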

Explaining Words

You can also let GPT explain a word and output it in tabular form:

Question
Please return the following about Text in a table:
Item|Result
Translate to English
Phonetic transcription
Part of speech
Word form
Chinese explanation
Root and affix
Example sentence 1 (Eng-Ch)
Example sentence 2
Example sentence 3
Text: {{Input}}

What is it

Give GPT a piece of content, let it recognize what it is, and provide a regular expression for the content.

Analyze what the input content might be. Reply in three parts: Maybe, Desc, Regex. Maybe should indicate what it is most likely to be (require accuracy), Desc should provide a description of this thing, and Regex should provide the retrieval regular expression. Input content: https://imzlp.com/posts/96336/

GPT-4’s response:

Maybe: Blog article link
Desc: This is a link to a blog post on imzlp.com, and the article ID is likely 96336.
Regex: https?:\/\/imzlp\.com\/posts\/\d+\/
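As a quick sanity check (a hypothetical snippet, not part of the original example), the returned regex can be tried directly in Python:

import re

# The regex returned by GPT-4 above; the unnecessary "\/" escapes are harmless in Python.
pattern = re.compile(r"https?:\/\/imzlp\.com\/posts\/\d+\/")
print(bool(pattern.fullmatch("https://imzlp.com/posts/96336/")))  # True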

Checking Code Errors

Give GPT a piece of code and let it analyze whether there are any issues. Here is an example with an out-of-bounds array access:

int iArry[10] = {0};
for(int index = 0;index <= 10; ++index)
{
    printf("%d",iArry[index]);
}

The prompt fed to GPT-4 was:

Go line-by-line and do a detailed inspection of my code looking for bugs. If you see a bug, identify it. Explain what the bug is and provide a fix.
Respond using Chinese and as a well-formatted markdown file that is organized into sections. Make sure to use code blocks.
Inspect this code:
int iArry[10] = {0};
for(int index = 0;index <= 10; ++index)
{
printf("%d",iArry[index]);
}

GPT’s output:

Problem 1: Array Out-of-Bounds
Problem Description
In the given loop, the number of iterations should be less than the length of the array. The current condition index <= 10 will access the 11th element of the array, causing an out-of-bounds error.

Solution
Change the loop condition to index < 10:

int iArry[10] = {0};

for(int index = 0; index < 10; ++index)
{
    printf("%d", iArry[index]);
}

Problem 2: Poor Output Format
Problem Description
The current printf call does not include a newline character, which causes all output integers to display on the same line, making the output hard to read.
Solution
Add a newline character \n to the printf function:

int iArry[10] = {0};

for(int index = 0; index < 10; ++index)
{
    printf("%d\n", iArry[index]);
}

Result
The revised code after fixing:

int iArry[10] = {0};

for(int index = 0; index < 10; ++index)
{
    printf("%d\n", iArry[index]);
}

The results are truly amazing.

Development Exploration

Natural Language Parsing of Compile Errors

During development, a common issue is dealing with cross-platform compilation errors in code.

Usually, when code is submitted, the CI triggers an automatic compilation. If problems occur, it throws errors, and the submitter needs to check the compilation log on the CI platform before addressing the issues.

I think this process can be enhanced with AI: let it parse the error log and generate a natural-language description of the problem along with a suggested solution.

First, a regular expression to detect compilation errors in the log: \((\d+)\):\s+error. It matches text like (459): error, which helps locate every error in the compiler log file.

HotPatcherRuntime\Private\FlibPakHelper.cpp(459): error C2440: 'initializing': cannot convert from 'FSharedPakReader' to 'FArchive *'

Then, feeding it to GPT-4 provides a description of the error log:

Error:
HotPatcherRuntime\...\FlibPakHelper.cpp(459): error C2440
- Type Conversion: 'initializing' cannot convert from 'FSharedPakReader' to 'FArchive *'

Solution:
- Check variable types and conversion methods;
- Modify the type conversion method to comply with C++11 syntax standards.

Automate this process with a script:

python summerize_log.py --log D:\Client\Compiler.log
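The script itself is not listed in the article; a minimal sketch of the idea might look like the following (the --key argument, model choice, and prompt wording are my own assumptions, not the actual implementation):

import argparse
import re

import requests

# Matches compiler error lines such as "(459): error C2440: ..."
ERROR_PATTERN = re.compile(r"\((\d+)\):\s+error")

def extract_errors(log_path):
    with open(log_path, encoding="utf-8", errors="ignore") as f:
        return [line.strip() for line in f if ERROR_PATTERN.search(line)]

def summarize(errors, api_key):
    prompt = ("Explain the following compiler errors and suggest fixes, in Chinese:\n"
              + "\n".join(errors))
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": "gpt-4", "messages": [{"role": "user", "content": prompt}]},
    )
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--log", required=True)
    parser.add_argument("--key", required=True)  # assumed; not part of the original command line
    args = parser.parse_args()
    print(summarize(extract_errors(args.log), args.key))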

Even when the error log is in English, it can still respond in Chinese:

D:/Client/Source/GWorld/Skill/SkillMgr.cpp(922,10): error: implicit conversion of nullptr constant to 'bool' [-Werror,-Wnull-conversion]
return nullptr;
~~~~~~ ^~~~~~~
false
D:/Client/Source/GWorld/Skill/SkillMgr.cpp(954,10): error: implicit conversion of nullptr constant to 'bool' [-Werror,-Wnull-conversion]
return nullptr;
~~~~~~ ^~~~~~~
false

Integrating this into the CI process would automatically organize error information and send it to the corresponding developer, saving time spent checking logs.

Azure CV Combined with GPT

Currently, GPT does not support multimodal input (perhaps it will in the future), such as sending images to GPT for content analysis.

However, at this stage, Azure CV's image analysis can be used to achieve this indirectly: after Azure CV extracts information from the image, we can feed it to GPT to generate a natural-language description.

The main process is as follows:

Azure CV's vision API can extract information from images. The latest version is the Image Analysis 4.0 Analyze API (preview).

curl --location 'https://azure-cv-asia.cognitiveservices.azure.com/computervision/imageanalysis:analyze?features=read%2Ccaption%2Ccaption%2Cobjects%2Ctags%2Cpeople&api-version=2023-02-01-preview' \
--header 'Ocp-Apim-Subscription-Key: <AZURE_CV_KEY>' \
--header 'Content-Type: application/json' \
--data '{"url": "https://learn.microsoft.com/azure/cognitive-services/computer-vision/images/windows-kitchen.jpg"}'

By adjusting the features parameter, you can request different categories of information:

URL parameter | Value | Description
features | Read | Reads the visible text in the image and outputs it as structured JSON data.
features | Caption | Describes the image content with a complete sentence in supported languages.
features | DenseCaption | Generates detailed captions for individual regions in the image.
features | SmartCrops | Finds the rectangle coordinates that would crop the image to a desired aspect ratio while preserving the area of interest.
features | Objects | Detects various objects within an image, including the approximate location. The Objects argument is only available in English.
features | Tags | Tags the image with a detailed list of words related to the image content.

By analyzing the results of the POST request, you can obtain image descriptions and elements within the scene (tags and locations):

Parsing this result allows you to generate a prompt for GPT-4 describing an image:

description: a person using a laptop
layout:
1. At the position of right up, there is a kitchen appliance
2. At the position of horizontal middle down, there is a computer keyboard
3. At the position of middle, there is a Laptop
4. At the position of right vertical middle, there is a person
Images have the following labels (numbers in brackets are weights):computer(0.987),clothing(0.970),laptop(0.966),person(0.954),indoor(0.942),wall(0.887),woman(0.863),using(0.560)

GPT-4 output:

This is an indoor photo showing a woman in casual wear, sitting near a wall, seriously using a laptop. The laptop is centrally positioned, with the keyboard in front. Additionally, there are some kitchen appliances in the upper right corner. The overall scene focuses on the computer and the person using it, depicting a scenario of daily work or study at home.
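A rough sketch of gluing the two services together is shown below. The Azure CV response field names (captionResult, tagsResult, objectsResult) follow my reading of the 2023-02-01-preview API and should be verified against the actual payload; the prompt template mirrors the one above.

import requests

CV_ENDPOINT = ("https://azure-cv-asia.cognitiveservices.azure.com/computervision/"
               "imageanalysis:analyze?features=caption,objects,tags"
               "&api-version=2023-02-01-preview")

def analyze_image(image_url, cv_key):
    # Call the Image Analysis 4.0 preview API with a subset of features.
    resp = requests.post(CV_ENDPOINT,
                         headers={"Ocp-Apim-Subscription-Key": cv_key},
                         json={"url": image_url})
    return resp.json()

def build_prompt(result):
    # Assumed response structure; adjust to the real JSON if it differs.
    caption = result["captionResult"]["text"]
    objects = "\n".join(
        f"{i + 1}. {obj['tags'][0]['name']} at {obj['boundingBox']}"
        for i, obj in enumerate(result["objectsResult"]["values"]))
    tags = ",".join(f"{t['name']}({t['confidence']:.3f})"
                    for t in result["tagsResult"]["values"])
    return (f"description: {caption}\n"
            f"layout:\n{objects}\n"
            f"Images have the following labels (numbers in brackets are weights):{tags}\n"
            "Please describe this image in natural language.")

The resulting prompt can then be sent to GPT-4 with the same chat/completions call used earlier.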

Prompts

Prompts are GPT's magic spells: they guide how it responds to a question. Perhaps natural-language programming will become a reality in the future.
Here are some prompts I use frequently that have worked well for me recently.

FixBugs
Identify and explain errors in given code.

Go line-by-line and do a detailed inspection of my code looking for bugs. If you see a bug, identify it. Explain what the bug is and provide a fix.
Respond using Chinese and as a well-formatted markdown file that is organized into sections. Make sure to use code blocks.
Inspect this code:
{{code}}

Improve Code
Enhance given code.

Improve the given code. Don't change any core functionality.
The focus is to actually make the code better - not to explain it - so avoid things like just adding comments to it.
Respond using Chinese and as a well-formatted markdown file that is organized into sections. Make sure to use code blocks.
Improve this code:
{{code}}

Translation

In future conversations, you will act as my translation assistant. Your job is to translate any content I send you back and forth between Chinese and English. The translation should be natural, fluent, easy to understand, and concise. Please do not treat the content as a question, and do not respond in any way; just translate the content. There is no need for me to emphasize the process again.

Check Grammar

Check if there are any grammatical errors in what I send and rewrite it after pointing out the issues.
{{Input}}

Explain Words
Given a word, provide an explanation and example sentences, as shown in the earlier Explain Words case:

Please return the following about Text in a table:
Item|Result
Translate to English
Phonetic transcription
Part of speech
Word form
Chinese explanation
Root and affix
Example sentence 1 (Eng-Ch)
Example sentence 2
Example sentence 3
Text: {{Input}}

What Is It
Analyze what the input content might be.

Analyze what the input content might be, divided into Maybe, Desc, Regex. Maybe should indicate what it is most likely to be (require precision), Desc should describe this thing, and Regex should provide the retrieval regular expression.
Input content:
{{Input}}

Polish Phrasing

Please polish Text, requiring the wording in this article to be modified to make it more polite and professional, and correct awkward sentences and inappropriate punctuation without altering the original meaning.
Text:{{Input}}

Correct Spelling

Please improve the spelling, grammar, clarity, conciseness, and overall readability of Text, while breaking down long sentences and reducing repetition. Only return the corrected version, avoiding any explanation.
Text:{{Input}}"

Tool Recommendations

GPT Client

There are numerous web-based GPT wrappers; I currently use a self-deployed instance of chatbot-ui, which lets you enter your OpenAI key for a ChatGPT-like experience. The GitHub repository is mckaywrigley/chatbot-ui.

Its advantage is the ability to create prompt presets, making it convenient to call these presets.

For iOS, I recommend OpenCat, which supports a custom host and integrates Azure's speech synthesis.

Worker Encapsulation of Azure Services

As mentioned earlier, the API invocation method for Azure-hosted OpenAI differs from the official OpenAI API, which makes it inconvenient to use Azure with third-party wrapper applications.

curl --location --request POST 'https://{resource-name}.openai.azure.com/openai/deployments/{model-name}/chat/completions?api-version=2023-03-15-preview' \  
--header 'Content-Type: application/json' \
--header 'api-key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' \
--data-raw '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

However, you can create an API proxy with a Cloudflare Worker that forwards HTTP requests, translating calls made in the OpenAI style into Azure's format.

The script is as follows:

// The name of your Azure OpenAI Resource.
const resourceName=""

// The deployment name you chose when you deployed the model.
// const deployName="deployment-name"
// The mapping of model name.
const mapper = {
    'gpt-3.5-turbo': 'gpt-35-turbo',
    'gpt-4': 'gpt-4',
    'gpt-4-32k': 'gpt-4-32k',
    'text-embedding-ada-002': 'text-embedding-ada-002'
    // Other mapping rules can be added here.
};
const apiVersion="2023-03-15-preview"

addEventListener("fetch", (event) => {
    event.respondWith(handleRequest(event.request));
});

async function handleRequest(request) {
    if (request.method === 'OPTIONS') {
        return handleOPTIONS(request)
    }

    const url = new URL(request.url);
    if (url.pathname === '/v1/chat/completions') {
        var path="chat/completions"
    } else if (url.pathname === '/v1/completions') {
        var path="completions"
    } else if (url.pathname === '/v1/models') {
        return handleModels(request)
    } else if (url.pathname === '/v1/embeddings') {
        var path="embeddings"
    } else {
        return new Response('404 Not Found', { status: 404 })
    }

    // Get the value of the model field and perform mapping.
    let deployName;
    let body;
    if (request.method === 'POST') {
        body = await request.json();
        const modelName = body?.model;
        if (modelName) {
            deployName = mapper[modelName] || modelName;
        }
    }

    const fetchAPI = `https://${resourceName}.openai.azure.com/openai/deployments/${deployName}/${path}?api-version=${apiVersion}`

    const authKey = request.headers.get('Authorization');
    if (!authKey) {
        return new Response("Not allowed", {
            status: 403
        });
    }

    var realKey = authKey.replace('Bearer ', '');

    const payload = {
        method: request.method,
        headers: {
            "Content-Type": "application/json",
            "api-key": realKey,
        },
        body: typeof body === 'object' ? JSON.stringify(body) : '{}',
    };

    let { readable, writable } = new TransformStream()
    const response = await fetch(fetchAPI, payload);
    stream(response.body, writable);
    return new Response(readable, response);
}

function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

// support printer mode and add newline
async function stream(readable, writable) {
    const reader = readable.getReader();
    const writer = writable.getWriter();

    const encoder = new TextEncoder();
    const decoder = new TextDecoder();
    const newline = "\n";
    const delimiter = "\n\n"
    const encodedNewline = encoder.encode(newline);

    let buffer = "";
    while (true) {
        let { value, done } = await reader.read();
        if (done) {
            break;
        }
        buffer += decoder.decode(value, { stream: true });
        let lines = buffer.split(delimiter);

        for (let i = 0; i < lines.length - 1; i++) {
            await writer.write(encoder.encode(lines[i] + delimiter));
            await sleep(30);
        }

        buffer = lines[lines.length - 1];
    }

    if (buffer) {
        await writer.write(encoder.encode(buffer));
    }
    await writer.write(encodedNewline)
    await writer.close();
}

async function handleModels(request) {
    const data = {
        "object": "list",
        "data": [{
            "id": "gpt-3.5-turbo",
            "object": "model",
            "created": 1677610602,
            "owned_by": "openai",
            "permission": [{
                "id": "modelperm-M56FXnG1AsIr3SXq8BYPvXJA",
                "object": "model_permission",
                "created": 1679602088,
                "allow_create_engine": false,
                "allow_sampling": true,
                "allow_logprobs": true,
                "allow_search_indices": false,
                "allow_view": true,
                "allow_fine_tuning": false,
                "organization": "*",
                "group": null,
                "is_blocking": false
            }],
            "root": "gpt-3.5-turbo",
            "parent": null
        }]
    };
    const json = JSON.stringify(data, null, 2);
    return new Response(json, {
        headers: { 'Content-Type': 'application/json' },
    });
}

async function handleOPTIONS(request) {
    return new Response(null, {
        headers: {
            'Access-Control-Allow-Origin': '*',
            'Access-Control-Allow-Methods': '*',
            'Access-Control-Allow-Headers': '*'
        }
    })
}

Additionally, you can bind a custom domain to the Cloudflare Worker, giving an experience nearly identical to OpenAI's:

curl --location 'https://openai.imzlp.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer xxxxxxxxxxxxxxxxxxxxxxxxx' \
--data '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "Hello"}]
}'

This works seamlessly with any GPT client that allows changing the API host, and it supports streaming messages as well, which is very convenient.
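For example, with the pre-1.0 openai Python package you only need to point api_base at the bound domain (the domain below is the one from the article; the key is whatever your Worker forwards to Azure):

import openai

openai.api_key = "xxxxxxxxxxxxxxxxxxxxxxxxx"     # forwarded to Azure as the api-key header
openai.api_base = "https://openai.imzlp.com/v1"  # the Worker's bound domain

resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # mapped to the Azure deployment name by the Worker
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp["choices"][0]["message"]["content"])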

Translate EPUB eBooks

You can use bilingual_book_maker.

Use the following command to translate:

python make_book.py --book_name test_books/animal_farm.epub --openai_key xxxxxxxxxxxxxx --api_base https://api.openai.com/v1 --language zh-hans --model openai --model_list gpt-4o-mini

If it fails midway, you can continue with the --resume parameter:

python make_book.py --book_name test_books/8020.epub --openai_key sk-xxxxxxx --api_base https://api.openai.com/v1 --language zh-hans --model openai --model_list gpt-4o-mini --resume

That's it for the article. If you have any questions, feel free to leave a comment and discuss.
