I can not stop looking at these rainbow long exposure photographs by Daniel Mercadante. He captured his partner running with a custom built lighting rig. So much appreciation for this. So much!

The brain has evolved over a long time, from very simple worm brains 500 million years ago to a diversity of modern structures today. The human brain, for example, can accomplish a wide variety of activities, many of them effortlessly — telling whether a visual scene contains animals or buildings feels trivial to us, for example. To perform activities like these, artificial neural networks require careful design by experts over years of difficult research, and typically address one specific task, such as to find what's in a photograph, to call a genetic variant, or to help diagnose a disease. Ideally, one would want to have an automated method to generate the right architecture for any given task.

One approach to generate these architectures is through the use of evolutionary algorithms. Traditional research into neuro-evolution of topologies (e.g. Stanley and Miikkulainen 2002) has laid the foundations that allow us to apply these algorithms at scale today, and many groups are working on the subject, including OpenAI, Uber Labs, Sentient Labs and DeepMind. Of course, the Google Brain team has been thinking about AutoML too. In addition to learning-based approaches (eg. reinforcement learning), we wondered if we could use our computational resources to programmatically

In “Large-Scale Evolution of Image Classifiers,” presented at ICML 2017, we set up an evolutionary process with simple building blocks and trivial initial conditions. The idea was to "sit back" and let evolution at scale do the work of constructing the architecture. Starting from very simple networks, the process found classifiers comparable to hand-designed models at the time. This was encouraging because many applications may require little user participation. For example, some users may need a better model but may not have the time to become machine learning experts. A natural question to consider next was whether a combination of hand-design and evolution could do better than either approach alone. Thus, in our more recent paper, “Regularized Evolution for Image Classifier Architecture Search” (2018), we participated in the process by providing sophisticated building blocks and good initial conditions (discussed below). Moreover, we scaled up computation using Google's new TPUv2 chips. This combination of modern hardware, expert knowledge, and evolution worked together to produce state-of-the-art models on CIFAR-10 and ImageNet, two popular benchmarks for image classification.

The following is an example of an experiment from our first paper. In the figure below, each dot is a neural network trained on the CIFAR-10 dataset, which is commonly used to train image classifiers. Initially, the population consists of one thousand identical simple

Inception-ResNet classifier in one step, we would be incorrectly concluding that the algorithm found a good answer. Yet, in that case, all we would have done is hard-coded the final answer into a complex mutation, rigging the outcome. If instead we stick with simple mutations, this cannot happen and evolution is truly doing the job. In the experiment in the figure,

In all the above, even though we were minimizing the researcher's participation by having simple initial architectures and intuitive mutations, a good amount of expert knowledge went into the building blocks those architectures were made of. These included important inventions such as convolutions, ReLUs and batch-normalization layers. We were evolving an

After our first paper, we wanted to reduce the search space to something more manageable by giving the algorithm fewer choices to explore. Using our architectural analogy, we removed all the possible ways of making large-scale errors, such as putting the wall above the roof, from the search space. Similarly with neural network architecture searches, by fixing the large-scale structure of the network, we can help the algorithm out. So how to do this? The inception-like modules introduced in Zoph et al. (2017) for the purpose of architecture search proved very powerful. Their idea is to have a deep

The building blocks introduced in Zoph et al. (2017). The diagram on the left is the outer structure of the full neural network, which parses the input data from bottom to top through a stack of repeated cells. The diagram on the right is the inside structure of a cell. The goal is to find a cell that yields an accurate network. |

Even though the mutation/selection evolutionary process is not complicated, maybe an even more straightforward approach (like random search) could have done the same. Other alternatives, though not simpler, also exist in the literature (like reinforcement learning). Because of this, the main purpose of our second paper was to provide a controlled comparison between techniques.

Comparison between evolution, reinforcement learning, and random search for the purposes of architecture search. These experiments were done on the CIFAR-10 dataset, under the same conditions as Zoph et al. (2017), where the search space was originally used with reinforcement learning. |

One important feature of the evolutionary algorithm we used in our second paper is a form of

The state-of-the-art models we evolved are nicknamed

You’ve heard Justin Bieber mangled into gorgeous ambient cascades of sound. Now, you can experience the magic of PaulStretch as a free plug-in.

It may give you that “A-ha” moment in ambient music. You know:

The developer has various warnings about using this plug-in, which for me make me want to use it even more. (Hey, no latency reporting to the DAW? Something weird in Cubase! No manual? Who cares! Let’s give it a go – first I’m going to run with scissors to grab a beer which I’ll drink at my laptop!)

Specifically:

The plugin is only suitable for radical transformation of sounds. It is not suitable at all for subtle time corrections and such. Ambient music and sound design are probably the most suitable use cases.

You had me at radical / not subtle.

Okay… yeah, this was probably meant for me:

You can use it two ways: either load an audio file, and just run PaulStretch in your DAW, or use it as a live processor on inputs. (That’s weird, given what it does – hey, there was some latency. Like… a *whole lot of latency*.)

It’s on Mac and Windows but code is available and Linux is “likely.”

https://xenakios.wordpress.com/paulxstretch-plugin/

If you want the original:

http://hypermammut.sourceforge.net/paulstretch/

https://github.com/paulnasca/paulstretch_cpp

That does other nifty tricks, like binaural beats and additional stretch modes. And it’s more reliable than this plug-in. The one catch: building for Windows and Linux is easy enough, and there are pre-built binaries available, but Mac users are mostly on their own. There was one project a few years back – but it seems it hasn’t been updated. (The blog post explains some of the complexity):

http://music.cornwarning.com/2011/12/07/new-paulstretch-os-x-build/

But the plug-in I think just became the easiest way to use it. Now go forth and make long sounds and chill to them.

The post A free plug-in brings extreme PaulStretch stretching to your DAW appeared first on CDM Create Digital Music.

If you've come across the Balls Into Bins problem, you probably heard about in context of hash tables. When you hash things into a hash table (especially with separate chaining) it's really useful to be able to ask "If I throw đ‘€ balls into đ‘ bins, what is the distribution of balls in bins?" You can see how this is fundamental to hash tables: the amortized complexity argument for hash tables depends on their being some *load factor* (i.e. đ‘€/đ‘) for which most bins contain a small number of items. Once this stops being true, lookup and insertion time on hash tables starts to get ugly. So from that perspective it's already clearly an important problem.

Hash tables aren't the only place that the Balls Into Bins problem is interesting. It comes up often in distributed systems, too. For one example, think about a load balancer (in this case a distributor of independent requests) sending load to some number of backends. Requests (đ‘€) are balls, and the backends are bins (đ‘) and typically there are multiple requests going to each backend (đ‘€ > đ‘). If we know how to solve for the number of balls in each bin, we can understand the limits of random load balancing, or whether we need a stateful load balancing algorithm like *least connections*. This is an important question to ask, because sharing consistent state limits scalability, and sharing eventually-consistent state can even make load balancing decisions worse. Load balancing is much easier if it can be done statelessly.

A related problem is push-based work allocation. Here, there is some co-ordinator handing out work items to a fleet of workers, and trying to have those workers do approximately equal amounts of work. One way that systems end up with this pattern is if they are using shuffle sharding or consistent hashing to distribute work items (or records). These hashing-based methods can be great for scaling, and so are widely used across all kinds of large-scale systems. Just as with load balancing, its interesting to be able to understand how well requests are distributed.

Traditionally, papers about this problem have been most concerned about the expectation of the maximum number of balls in a bin ("how bad can it get?"), but other statistics like the expectation of the mean and expectation of the median can be interesting when planning and designing for load. It's also interesting to understand the variance of the maximum, and the size of the right tail on the distribution of the maximum. If the maximum can get really high, but will do so infrequently, then load testing can be difficult.

Gaston Gonnet's Expected Length of the Longest Probe Sequence in Hash Code Searching, was one of the first papers to tackle analyzing the problem, in context of separate-chaining hash tables^{2}. Michael Mitzenmacher's PhD thesis (the power of two choices in randomized load balancing) simplifies Gonnet's analysis and finds, for the đ‘€=đ‘ case, the maximum number of balls is^{1} đ(log đ‘/log log đ‘). That's not a curve you'll come across often, so this is what it looks like:

In other words, it grows, but grows pretty slowly. Most real-world cases are probably going to have đ‘€>đ‘, and many will have đ‘€â‰Ťđ‘. To understand those cases, we can turn to one of my favorite papers in this area, Balls Into Bins: A Simple and Tight Analysis by Raab and Steger. They provide a great overview of the problem, a useful survey of the literature, and a table of bounds on the maximum number of balls in any bin, depending on the relationship between đ‘€ and đ‘. The proofs are interesting, and somewhat enlightening, but not necessary to understand to find the paper useful.

While this kind of analysis is very useful, when I've needed to solve these problems in the past, I haven't tended to use the results directly. Instead, I've used them to sanity check the output of simulations. This is where *engineering practice* diverges a bit from *computer science theory* (although it does it in a pretty theoretically rigorous way).

There are a couple of limitations on the usefulness of the closed-form analysis of this problem. One problem is that it's fairly difficult to understand clearly (at least for me), and quite complex to communicate. The bigger problem, though, is that it's quite inflexible: extending the analysis to include cases like balls of different sizes (as requests are in the real world) and balls coming out of bins (requests completing) is difficult, and difficult to code review unless you are lucky enough to work with a team that's very mathematically sophisticated. The good news, though, is that this problem is exceptionally easy to simulate.

When I think about doing these kinds of simulations, I don't generally think about using specialized simulation tools or frameworks (although you could certainly do that). Instead, I generally think about just writing a few tens of lines of Python or R which directly try the thing that I want an answer for many times, and then output data in a form that's easy to plot. Computer simulation is a broad and subtle topic, but this kind of thing (throw balls into bins, count, repeat) avoids many of the subtleties because you can avoid floating point (its just counting) and because you can avoid being too concerned about the exact values.

Knowing the closed-form analysis makes it easy to sanity-check the simulation. According to Gonnet, the đ‘€=đ‘ case should approach logđ‘/loglogđ‘ (1+đ‘œ(1)), and we can plot that curve (choosing a value of đ‘œ(1) to minimize the difference) alongside the simulation results to see if the simulation matches the theory. The results look pretty good.

Gonnet's paper also contains a table of example values, which compare very well to our simulated and modelled numbers. That all increases our confidence that the simulation is telling us sensible things.

You can also extend this basic counting to be closer to real-world load-balancing. Follow a Poisson process (a fancy way of saying "use exponentially distributed random numbers to decide how long to wait") to add random balls into bins over time, and follow your completion time distribution (probably exponential too) to pull them out of the bins. Every so often, sample the size of the biggest bin. Next, output those samples for analysis. If you have real arrival-time data, and completion-time distributions, you can use those to avoid making *any* statistical assumptions. Which is nice.

When you've got the basic simulation, it's easy to add in things like different-sized requests, or bursts of traffic, or the effects of scaling up and down the backends.

For small M and N, the constant factors are a big problem. With đ‘€=đ‘=100, I get an expected maximum of around 4.2. In other words, we can expect the busiest backend to be over 4x busier than the average. That means that you either need to significantly over-scale all your backends, or put up with the problems that come with hotspotting. This problem also gets worse (although very very slowly, going back to the closed-form) with scale.

In the closer-to-reality case with đ‘€=1000 and đ‘=100, the gap shrinks. The expected maximum comes to as 18.8, compared to a mean (aka đ‘€/đ‘) of 10. That still means that the hottest backend gets 80% more traffic, but the gap is closing. By đ‘€=10000 and đ‘=100, the gap has closed to 25%, which starts to be close to the realm of acceptable. Up to đ‘€=100,000 and the gap is closed to 8%. In most distributed systems contexts, 8% is probably within the variation in performance due to other factors.

Still, the conclusion of all of this is that random load balancing (and random shuffle-sharding, and random consistent hashing) distributed load rather poorly when đ‘€ is small. Load-sensitive load-balancing, either stateful or stateless with an algorithm like best-of-two is still very much interesting and relevant. The world would be simpler and more convenient if that wasn't the case, but it is.

- That's a Big Theta, if you're not familiar with it wikipedia has a good explanation of what it means. If you don't feel like reading that, and replace it with a big O in your head, that's close enough in this case.
- That paper also contains some great analysis on the numerical properties of different hash-table probing strategies versus seperate chaining. If you like algorithm analysis, the conclusion section is particularly interesting.

-Paul Dirac, Quantum Mechanics of Many-Electron Systems (1929)

In this passage, physicist Paul Dirac laments that while quantum mechanics accurately models all of chemistry, exactly simulating the associated equations appears intractably complicated. Not until 1982 would Richard Feynman suggest that instead of surrendering to the complexity of quantum mechanics, we might harness it as a computational resource. Hence, the original motivation for quantum computing: by operating a computer according to the laws of quantum mechanics, one could efficiently unravel exact simulations of nature. Such simulations could lead to breakthroughs in areas such as photovoltaics, batteries, new materials, pharmaceuticals and superconductivity. And while we do not yet have a quantum computer large enough to solve classically intractable problems in these areas, rapid progress is being made. Last year, Google published this paper detailing the first quantum computation of a molecule using a superconducting qubit quantum computer. Building on that work, the quantum computing group at IBM scaled the experiment to larger molecules, which made the cover of Nature last month.

Today, we announce the release of OpenFermion, the first open source platform for translating problems in chemistry and materials science into quantum circuits that can be executed on existing platforms. OpenFermion is a library for simulating the systems of interacting electrons (fermions) which give rise to the properties of matter. Prior to OpenFermion, quantum algorithm developers would need to learn a significant amount of chemistry and write a large amount of code hacking apart other codes to put together even the most basic quantum simulations. While the project began at Google, collaborators at ETH Zurich, Lawrence Berkeley National Labs, University of Michigan, Harvard University, Oxford University, Dartmouth College, Rigetti Computing and NASA all contributed to alpha releases. You can learn more details about this release in our paper, OpenFermion: The Electronic Structure Package for Quantum Computers.

One way to think of OpenFermion is as a tool for generating and compiling physics equations which describe chemical and material systems into representations which can be interpreted by a quantum computer

The core OpenFermion library is designed in a quantum programming framework agnostic way to ensure compatibility with various platforms being developed by the community. This allows OpenFermion to support external packages which compile quantum assembly language specifications for diverse hardware platforms. We hope this decision will help establish OpenFermion as a community standard for putting quantum chemistry on quantum computers. To see how OpenFermion is used with diverse quantum programming frameworks, take a look at OpenFermion-ProjectQ and Forest-OpenFermion - plugins which link OpenFermion to the externally developed circuit simulation and compilation platforms known as ProjectQ and Forest.

The following workflow describes how a quantum chemist might use OpenFermion in order to simulate the energy surface of a molecule (for instance, by preparing the sort of quantum computation we described in our past blog post):

- The researcher initializes an OpenFermion calculation with specification of:

- An input file specifying the coordinates of the nuclei in the molecule.
- The basis set (e.g. cc-pVTZ) that should be used to discretize the molecule.
- The charge and spin multiplicity (if known) of the system.

- The researcher uses the OpenFermion-Psi4 plugin or the OpenFermion-PySCF plugin to perform scalable classical computations which are used to optimally stage the quantum computation. For instance, one might perform a classical Hartree-Fock calculation to choose a good initial state for the quantum simulation.
- The researcher then specifies which electrons are most interesting to study on a quantum computer (known as an active space) and asks OpenFermion to map the equations for those electrons to a representation suitable for quantum bits, using one of the available procedures in OpenFermion, e.g. the Bravyi-Kitaev transformation.
- The researcher selects a quantum algorithm to solve for the properties of interest and uses a quantum compilation framework such as OpenFermion-ProjectQ to output the quantum circuit in assembly language which can be run on a quantum computer. If the researcher has access to a quantum computer, they then execute the experiment.

**1 **If we may be allowed one sentence for the experts: the primary function of OpenFermion is to encode the electronic structure problem in second quantization defined by various basis sets and active spaces and then to transform those operators into spin Hamiltonians using various isomorphisms between qubit and fermion algebras.^{↩}

Next Page of Stories