
You Cut 97%? I'll Go Free! Large Models Rush Into a Price War

Published: 2024-05-21 17:12:30

China's homegrown AI large models took off last year; this year, the price war among them has reached a fever pitch.

ByteDance is regarded as one of the instigators of this round of the price war. Last week, ByteDance announced that its large model was 99.3% cheaper than the industry average. Alibaba then announced price cuts for its own large models, and today Baidu Intelligent Cloud announced that the two main models of its Wenxin (ERNIE) family are completely free. Barely a year after large models burst onto the scene, the free era has arrived.

On May 15, ByteDance's Doubao model was officially released at a Volcano Engine conference. Volcano Engine is ByteDance's cloud service platform. According to Tan Dai, president of Volcano Engine, after a year of iteration and market testing, Doubao has become one of the most widely used large models in China, with the richest set of application scenarios. It currently processes 120 billion tokens of text and generates 30 million images per day.

"With heavy usage, a good model gets polished, and the unit cost of model inference can be driven down dramatically. In the enterprise market, the main Doubao model is priced at just 0.0008 yuan per thousand tokens; 0.8 li (0.0008 yuan) can process more than 1,500 Chinese characters, which is 99.3% cheaper than the industry average." Tan said that moving large-model pricing from fen (0.01 yuan) down to li (0.001 yuan) will help companies accelerate business innovation at lower cost.
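The quoted figures can be cross-checked against each other. Only the 0.0008 yuan price, the 99.3% discount, and the 1,500-characters-per-thousand-tokens figure come from the article; the industry baseline below is derived from them, not quoted anywhere:

```python
# Doubao main model price quoted in the article: 0.0008 yuan per 1,000 tokens,
# claimed to be 99.3% cheaper than the industry average.
doubao_price = 0.0008  # yuan per 1,000 tokens
discount = 0.993

# Implied industry baseline: doubao_price = baseline * (1 - discount)
baseline = doubao_price / (1 - discount)
print(f"Implied industry price: {baseline:.4f} yuan per 1,000 tokens")

# 0.8 li (0.0008 yuan) buys 1,000 tokens, which the article says covers
# more than 1,500 Chinese characters -- roughly 1.5 characters per token.
chars_per_token = 1500 / 1000
print(f"~{chars_per_token} Chinese characters per token")
```

The implied baseline works out to roughly 0.11 yuan per thousand tokens, which is the order of magnitude the "99.3% cheaper" claim presupposes.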

Not everyone in the industry accepts this claimed cost reduction.

A large-model service provider explained to reporters: "When we use a large model, we ask it questions. From the vendor's perspective, a question can simply be treated as a request to the model. A request has two parts, input and output: the user poses a question, and the model returns an answer. The amount of text in the question and the answer is converted into a unit the model can account for, the token. Based on the nature of the Q&A content, the model's compute consumption, and other factors, vendors generally set different prices for input and output."
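The billing scheme described above can be sketched in a few lines. The split between input and output pricing comes from the article; the specific prices and token counts below are hypothetical placeholders for illustration, not any vendor's list price:

```python
# Sketch of per-request billing as described: input and output tokens are
# metered separately at different unit prices (yuan per 1,000 tokens).
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost of one request in yuan; prices are yuan per 1,000 tokens."""
    return (input_tokens / 1000) * input_price + (output_tokens / 1000) * output_price

# A question of 800 input tokens answered with 200 output tokens, priced at
# 0.0008 yuan/1k in and 0.002 yuan/1k out (illustrative numbers only):
cost = request_cost(800, 200, input_price=0.0008, output_price=0.002)
print(f"{cost:.6f} yuan")  # 0.00064 + 0.00040 = 0.001040 yuan
```

Because output is typically priced higher than input, a headline input price alone does not determine what a request actually costs, which is the crux of the dispute that follows.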

Payment, meanwhile, falls into two main modes: prepaid and postpaid. Generally speaking, the prepaid mode is cheaper than the postpaid mode. Volcano Engine's newly released Doubao-pro-32k sets an ultra-low postpaid price for model input, but the model's output price, which is far higher than the input price, was not shown at the launch event.

More importantly, users get this price only under a strict concurrency cap: 60 Q&A requests per minute, that is, one request per second. This means the Doubao model's "ultra-low price" is suitable only for trials and testing, not for real production use.

ByteDance, in turn, explained to reporters: inference input accounts for the vast majority of model inference, with the industry generally putting the input-to-output ratio at about 5:1, so the launch event used the inference input price as its example. Enterprises are also free to choose prepaid or postpaid. Prepaid comes with additional service guarantees and carries a higher list price than postpaid (think business class versus economy), but it is also heavily discounted, down to as little as one-tenth of list price. A small number of large customers who need guaranteed high concurrency will prefer prepaid; most customers will use postpaid. Postpaid plans in the industry generally come with the tightest TPM (tokens per minute) and RPM (requests per minute) caps; compared with competitors, the Doubao model's postpaid TPM and RPM caps are very high, enough to meet business needs, and customers can also purchase additional model units to raise concurrency.
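ByteDance's 5:1 argument is that the blended per-token price is dominated by the cheap input side. A minimal sketch, in which only the 0.0008 yuan input price and the 5:1 ratio come from the article; the output price is an assumed placeholder:

```python
# Blended price under ByteDance's claimed 5:1 input-to-output token ratio.
input_price = 0.0008   # yuan per 1,000 tokens (quoted in the article)
output_price = 0.002   # yuan per 1,000 tokens (ASSUMED, for illustration only)
input_share, output_share = 5, 1

# Weighted average price per 1,000 tokens across a typical workload.
blended = (input_share * input_price + output_share * output_price) / (input_share + output_share)
print(f"Blended price: {blended:.5f} yuan per 1,000 tokens")
# (5 * 0.0008 + 1 * 0.002) / 6 = 0.001 yuan per 1,000 tokens
```

Even with output priced at several times the input rate, the blended figure stays close to the input price when input tokens dominate, which is the substance of ByteDance's rebuttal.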

After ByteDance touted its model as 99.3% cheaper than the industry, Alibaba also announced price cuts, and on the per-thousand-token input price that ByteDance had highlighted, Alibaba went straight to a lower number.

On May 21, Alibaba Cloud announced on its official Weibo that it was cutting prices across the Tongyi Qianwen models. The cut on Qwen-Long, its GPT-4-class flagship, reached as much as 97%: the API input price dropped from 0.02 yuan per thousand tokens (the basic unit of text) to 0.0005 yuan per thousand tokens. One yuan now buys 2 million tokens, roughly the text of five Xinhua dictionaries.
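The article's arithmetic here checks out, and can be verified directly from the two quoted prices:

```python
# Checking the article's arithmetic: at 0.0005 yuan per 1,000 tokens,
# how many tokens does one yuan buy?
price_per_1k = 0.0005  # yuan per 1,000 tokens after the cut
tokens_per_yuan = 1 / price_per_1k * 1000
print(f"{tokens_per_yuan:,.0f} tokens per yuan")  # 2,000,000

# The cut from the old price of 0.02 yuan per 1,000 tokens:
cut = 1 - price_per_1k / 0.02
print(f"Cut: {cut:.1%}")  # 97.5%, consistent with "even reached 97%"
```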


The model supports long-text input of up to 10 million tokens, and after the cut its price is roughly 1/400 of GPT-4's.

ByteDance responded to Alibaba's price cut.

The head of Volcano Engine said the Tongyi Qianwen price cuts were very welcome, helping enterprises explore AI transformation at lower cost and accelerating the real-world deployment of large-model applications. Volcano Engine added that alongside its steep price cuts, the Doubao model also offers customers industry-leading TPM (tokens per minute) and RPM (requests per minute) limits; its per-minute token-processing cap is reportedly several times that of comparable models in the industry and can support large volumes of concurrent requests, helping enterprises call the model from production systems.

Baidu, the first company in China to release a large model, quickly joined the price fight. Today, Baidu Intelligent Cloud announced that ERNIE Speed and ERNIE Lite, two main models in the Wenxin (ERNIE) family, are completely free, effective immediately.

Both models were released in March this year and support context lengths of 8K and 128K. ERNIE Lite is Baidu's lightweight large language model, balancing strong output quality with inference performance, making it suitable for inference on low-compute AI accelerator cards. ERNIE Speed is Baidu's high-performance large language model, well suited as a base model for fine-tuning on domain-specific problems, with excellent inference performance.

From ByteDance launching Doubao and declaring that large-model pricing has entered the "li era" (thousandths of a yuan), to Baidu making its two main models free outright, the depth and speed of these price cuts have far exceeded market expectations.

2024 is widely seen as the first year of AI applications. Industry observers say the high cost of inference still constrains the large-scale adoption of large models, and that price cuts will help attract more users to the products built on them.

"Whether anyone actually wins is hard to say, but on price, at least, the fight is on," said a person close to one of the major companies.