The supercomputer rankings may face a reshuffle

A few days ago we published a brief analysis of the various coprocessors used to accelerate computation, including approaches such as CUDA; details can be found in "Rational analysis of mainstream coprocessors: Where does 100x efficiency come from?". One point in that piece deserves further attention: how coprocessors are actually put to use. Products such as TESLA add ALUs to supercomputers on a massive scale and greatly raise their nominal computing power, but how much of that power can be used, and how is it used in practice? News of an update to the TOP500 benchmark standard was reported recently, so it is worth combining the two questions and looking at the TOP500 specification in light of how accelerated computing hardware is used today.

How Linpack came to reign

The TOP500 is the list of the world's fastest supercomputers, updated regularly and ranked by Linpack benchmark scores. The most important news in this field at the moment is that, starting with the next ranking, the HPCG (High Performance Conjugate Gradient) benchmark will be used alongside the existing HPL (High Performance Linpack, i.e. the Linpack benchmark) standard, giving a more scientific measure of supercomputer performance.
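
To give a sense of what HPCG exercises: the benchmark is built around the conjugate gradient method, whose inner loop is dominated by matrix-vector products and reductions and therefore stresses memory bandwidth rather than raw floating-point throughput. Below is only a minimal illustrative sketch of that loop, using a small dense symmetric positive-definite system in Python/NumPy; real HPCG runs use a large sparse 3D problem distributed across the whole machine.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A with the CG iteration."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p                      # matrix-vector product: the bandwidth-bound kernel
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Small illustrative SPD system (real HPCG uses a huge sparse stencil matrix).
n = 200
M = np.random.rand(n, n)
A = M @ M.T + n * np.eye(n)             # make the matrix symmetric positive-definite
b = np.random.rand(n)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))        # residual should be tiny
```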

As a performance-measurement tool, Linpack has been in use for some 30 years; the latest version, HPL 2.0, was released in 2008. For a long time Linpack's greatest strength has been that its core task, solving a dense system of linear equations by Gaussian elimination, scales well across the thousands of cores in a supercomputer and keeps them all busy, which is why its position has been so hard to shake.
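
By contrast with HPCG, what the Linpack/HPL number measures boils down to timing the solution of one huge dense system A x = b and converting the elapsed time into floating-point operations per second. A minimal single-node sketch of that idea follows; the matrix size is purely illustrative, while real HPL runs distribute a matrix hundreds of thousands of rows wide across thousands of nodes.

```python
import time
import numpy as np

n = 2000                                   # illustrative; HPL matrices are vastly larger
A = np.random.rand(n, n)
b = np.random.rand(n)

start = time.perf_counter()
x = np.linalg.solve(A, b)                  # LU factorization with partial pivoting, as in HPL
elapsed = time.perf_counter() - start

flops = (2.0 / 3.0) * n**3 + 2.0 * n**2    # standard HPL operation count for one solve
print(f"{flops / elapsed / 1e9:.2f} GFLOPS, residual {np.linalg.norm(A @ x - b):.2e}")
```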

Thirty years ago, or even ten years ago, Linpack's metric was perfectly reasonable, because supercomputer architectures were relatively simple: a system contained essentially only one type of processor. A supercomputer could be thought of as thousands of identical processors with the same frequency, instruction set and other key characteristics. In other words, well-written multithreaded software saw the machine in much the same way Linpack did, so using Linpack to measure supercomputer performance drew little doubt. What questioning there was amounted to pointing out Linpack's idealization: real applications do not mobilize that many processors, and not all code can be parallelized, as the sketch below illustrates.
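
The parallelization caveat can be made concrete with a toy example (Python/NumPy here only stands in for the idea, and the numbers are invented): work in which every element is independent can in principle be spread across as many processing units as the hardware offers, while a computation whose every step depends on the previous result cannot be made faster by adding more of them.

```python
import numpy as np

n = 1_000_000
x = np.random.rand(n)

# Data-parallel work: every element is independent, so it can in principle
# be spread across as many ALUs as the hardware provides.
y_parallel = 2.0 * x + 1.0

# Inherently serial work: each step depends on the previous result, so no
# amount of extra ALUs shortens the dependency chain.
y_serial = np.empty(n)
acc = 0.0
for i in range(n):
    acc = 0.5 * acc + x[i]
    y_serial[i] = acc
```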

In modern times, supercomputing found a shortcut in its pursuit of higher performance: piling up ALUs. In the PC field, graphics cards originally handled rendering, but gradually took on more and more general mathematical work to relieve the CPU, pushing demand for graphics-chip computing performance ever higher. That demand eventually produced a qualitative change: "stream processor" architectures such as CUDA and STREAM appeared, and accelerator hardware built by stacking ALUs was derived from them, which immediately matched the needs of supercomputing. At the same time, thanks to advances in chip manufacturing, some of the more capable players began placing their own high-performance processors alongside Intel and AMD platforms; China's supercomputers, for example, introduced domestically developed processors such as Godson (Loongson), Feiteng and Shenwei (Sunway). The result is a multi-way heterogeneous supercomputer architecture consisting of central processors, accelerators and third-party processors.

The top supercomputers on the TOP500 list are technology pioneers: they were the first to introduce accelerators such as TESLA, forming dual-heterogeneous systems, and later pioneered bringing still more specialized chips into supercomputers, creating multi-way heterogeneity. Under these circumstances Linpack still mobilizes and sums all of the computing resources, presenting to the public the total computing power of every chip that can take part in the calculation. But this time, supercomputer users did not agree.

The supercomputer as a road network

If we use road construction as a metaphor for this change, we might describe the supercomputer in terms of traffic conditions. The examples Beijing residents complain about most are the Xizhimen and Siyuan Bridge overpasses, and the complaints persist to this day. The seemingly scientific traffic-diversion designs of these two interchanges are feasible only in theory: in actual use some lanes sit idle while others are congested, and the ideal scheme of routing vehicles by type and direction fails to achieve its goal. Heterogeneous computing has the same problem: some channels (CPUs) run fully loaded for long stretches while others (GPUs) sit idle for long stretches. There are many reasons, such as drivers (programmers) not being familiar with the rules of the road, the road layout not matching the direction of peak traffic, or it simply being easy to get lost.

The evolution of supercomputer architecture follows this scenario. The early supercomputer was like an ordinary arterial road: there might be vehicles of different speeds, types and destinations, but there was only one kind of road. Throughput depended on the road's speed limit and the number of lanes, whether one, ten or fifty. Sometimes only a few cars passed, but as long as vehicles were guided onto the road promptly, nothing stood in their way.

However, roads of this kind could not be made fast enough, and there was little space left to build more arterial roads. Facing ever-growing traffic demand, designers began looking for new approaches. After observation, they found patterns and classified vehicles by direction, size and speed, then built custom infrastructure: while keeping the main road, they introduced multi-level interchanges to divert traffic and added a large number of auxiliary roads, such as lanes for heavy trucks, lanes for light trucks, and bicycle lanes. But an overpass requires dedicated scheduling; if the scheduling is poor, it may cause congestion or simply go unused. In short, travel under ideal conditions becomes faster, but actual use must cope with a far more complex situation, such as the need to identify and direct vehicles by type and destination and to monitor road conditions in real time.

When there is more than one type of road and vehicle scheduling has become very complicated, Linpack still goes its own way: it adds up the optimal capacity of all the roads, presents the total, and uses it as the basis for ranking how well the system is designed. Over time, road designers found that building all kinds of dedicated roads like those above is easier than laying large arterial roads and also greatly improves the Linpack score. Driven by this, designers began using every possible means to inflate the theoretical transport capacity without considering whether it was practical, leading to a disconnect between a road's theoretical capacity and its actual capacity. In the same way, supercomputer development has been led by Linpack down an awkward path: only the ideal case is estimated, while the actual situation is considered less and less.

We can set different rules for different roads, requiring each type of vehicle to travel only on its designated road. But if drivers cannot quickly understand and follow the rules, traffic may move no faster than if everyone simply took the main road, and the point of the multi-level design is lost. Modern information theory has studied this kind of phenomenon in depth: the focus is on how to make rules simple, because only with simple rules is the whole road system efficient. The quantity describing the complexity of the rules is information entropy, proposed by Claude Elwood Shannon (1916-2001), the father of information theory. In the traffic example, resources are effectively scheduled only if the information entropy of the scheduling is less than or equal to the information entropy of not scheduling; otherwise the scheduling itself wastes time, and everyone would be better off queueing on the main road.
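
As a rough illustration of the idea (the probabilities below are invented purely for demonstration), Shannon entropy measures how many bits of information are needed on average to make a routing decision; a scheduling rule that sends almost everything one way carries far less entropy than one that splits work evenly across many lanes.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over the nonzero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two hypothetical rule sets for routing jobs to CPU/GPU "lanes".
simple_rule  = [0.9, 0.1]                 # almost everything stays on the CPU
complex_rule = [0.25, 0.25, 0.25, 0.25]   # four equally likely routing choices

print(shannon_entropy(simple_rule))   # ~0.47 bits per decision
print(shannon_entropy(complex_rule))  # 2.0 bits per decision
```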

Heterogeneous computing is hard to control, prompting a change of standard

Hardware can be piled up with money, but software, algorithms and rules cannot be reformed overnight, and the development of software and hardware in supercomputing is extremely uneven. The current situation is that research on taming this complexity lags far behind hardware: the information entropy of parallel and heterogeneous computing keeps rising, so supercomputer hardware performance cannot be brought into play, or the hardware is even left completely unused. This has already shown up in TOP500 testing. Even the founder of the Linpack standard is concerned about the disconnect between theoretical and actual supercomputer performance: Jack Dongarra, professor at the University of Tennessee, Knoxville, founder of the Linpack benchmark and initiator of the new TOP500 standard, questioned Linpack's objectivity at this stage in an interview, taking TITAN as an example.

TITAN has 18,688 nodes, each with a 16-core Opteron processor, 32 GB of memory and an NVIDIA TESLA K20 accelerator. TITAN's ranking is based on the HPL standard as measured by Linpack. In the Linpack test the Opteron processors contribute only a small part of the performance, and almost all of the floating-point work is done on the TESLA K20s. In actual use, however, very few workloads exercise the machine the way Linpack does: in most cases the application cannot use the TESLA K20 at all and relies only on the CPU's computing resources, and applications that are not suited to acceleration will even deliberately shut down the TESLA K20 to reduce the chance of errors.
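
A back-of-the-envelope calculation shows why the accelerators dominate TITAN's Linpack score. The per-chip peak figures below are commonly cited approximations rather than official numbers, and are included only to show the proportions.

```python
# Rough peak-FLOPS split for TITAN; per-chip numbers are approximate.
nodes = 18688
cpu_peak_per_node = 0.14   # TFLOPS: 16-core Opteron, ~2.2 GHz, 4 DP flops/cycle/core
gpu_peak_per_node = 1.31   # TFLOPS: one TESLA K20-class accelerator (double precision)

cpu_total = nodes * cpu_peak_per_node / 1000   # PFLOPS
gpu_total = nodes * gpu_peak_per_node / 1000   # PFLOPS

print(f"CPU peak ~{cpu_total:.1f} PFLOPS, GPU peak ~{gpu_total:.1f} PFLOPS")
print(f"GPU share of total peak: {gpu_total / (cpu_total + gpu_total):.0%}")
```

On these assumptions roughly nine tenths of the machine's peak floating-point capability sits in the accelerators, which is exactly the part most real applications cannot use.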

Obviously, in today's era of heterogeneous computing, Linpack's unchanged test method reflects a supercomputer's maximum computing performance, but that figure is a long way from actual use. As for why heterogeneous acceleration is not used, the main obstacle is development difficulty. "There is no real difference between a program running 10 hours and 15 hours; either way you get the result after a night's sleep. But to save those few hours you have to put in several times the work writing the code, and you also risk introducing unpredictable errors. It is simply not worth the candle," said one friend who works in development.

Therefore, since the corresponding software development environment cannot change in a short time, the actual performance of all heterogeneous systems, including supercomputers, must be re-examined, and a new, more scientific metric that reflects real-world performance must be introduced for evaluation. On the one hand this prevents new supercomputers from climbing the list with heterogeneous designs that are not yet practical; on the other, it helps users choose suitable equipment. Under the new specification two sets of results can be produced, one under HPL and one under HPCG: those who emphasize heterogeneous computing can refer to the HPL result, while those who want a more conservative estimate can refer to the HPCG score.

In any case, this means the next TOP500 update is extremely important and may even reshuffle the rankings of many supercomputers. Heterogeneous computing has allowed supercomputers to make rapid progress in a short time, but only a few years later we have discovered that it is a hard-to-tame, high-performance monster whose full potential is difficult to tap. We must change the existing standards and use a more scientific method to measure how much of that energy these monsters can actually exert.
