distributed collective mind

Monitoring that really monitors

Today, we’re welcoming a new sponsor, TruePath Technologies! TruePath Technologies is an industry-leading provider of enterprise monitoring software, hardware and services. They specialize in configuring and maintaining your monitoring software so you can avoid costly network downtime and maintenance. TruePath is a premier technical support provider and reseller for partners like SolarWinds, Check_MK, OP5, HWgroup, ntop and more! Visit truepathtechnologies.com to learn more about how they can help optimize your IT network and be sure you never miss downtime again 😉


sness: truth

The Kanban dance


sness: sherpa-injector metrics

Learning to Generalize from Sparse and Underspecified Rewards



Reinforcement learning (RL) presents a unified and flexible framework for optimizing goal-oriented behavior, and has enabled remarkable success in addressing challenging tasks such as playing video games, continuous control, and robotic learning. The success of RL algorithms in these application domains often hinges on the availability of high-quality and dense reward feedback. However, broadening the applicability of RL algorithms to environments with sparse and underspecified rewards is an ongoing challenge, requiring a learning agent to generalize (i.e., learn the right behavior) from limited feedback. A natural way to investigate the performance of RL algorithms in such problem settings is via language understanding tasks, where an agent is provided with a natural language input and needs to generate a complex response to achieve a goal specified in the input, while only receiving binary success-failure feedback.

For instance, consider a "blind" agent tasked with reaching a goal position in a maze by following a sequence of natural language commands (e.g., "Right, Up, Up, Right"). Given the input text, the agent (green circle) needs to interpret the commands and take actions based on such interpretation to generate an action sequence (a). The agent receives a reward of 1 if it reaches the goal (red star) and 0 otherwise. Because the agent doesn't have access to any visual information, the only way for the agent to solve this task and generalize to novel instructions is by correctly interpreting the instructions.
In this instruction-following task, the action trajectories a1, a2 and a3 reach the goal, but the sequences a2 and a3 do not follow the instructions. This illustrates the issue of underspecified rewards.
In these tasks, the RL agent needs to learn to generalize from sparse (only a few trajectories lead to a non-zero reward) and underspecified (no distinction between purposeful and accidental success) rewards. Importantly, because of underspecified rewards, the agent may receive positive feedback for exploiting spurious patterns in the environment. This can lead to reward hacking, causing unintended and harmful behavior when deployed in real-world systems.
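
To make the setup concrete, here is a minimal toy sketch of such a task (this is not the paper's environment; the command set, grid, and function names are invented for illustration). Any action sequence that happens to end on the goal earns the same reward of 1, which is exactly the underspecification described above.

```python
MOVES = {"Left": (-1, 0), "Right": (1, 0), "Up": (0, 1), "Down": (0, -1)}

def run_episode(instructions, actions, start=(0, 0)):
    """Return the sparse binary reward for one episode of the toy task."""
    goal = start
    for cmd in instructions:              # the goal is implied by the instruction text
        dx, dy = MOVES[cmd]
        goal = (goal[0] + dx, goal[1] + dy)
    pos = start
    for act in actions:                   # what the agent actually did
        dx, dy = MOVES[act]
        pos = (pos[0] + dx, pos[1] + dy)
    # Underspecified: any trajectory that ends on the goal gets reward 1,
    # whether or not it followed the instructions.
    return 1 if pos == goal else 0

instructions = ["Right", "Up", "Up", "Right"]
a1 = ["Right", "Up", "Up", "Right"]       # follows the instructions
a2 = ["Up", "Right", "Right", "Up"]       # ignores the order but lands on the goal anyway
print(run_episode(instructions, a1), run_episode(instructions, a2))  # 1 1
```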

In "Learning to Generalize from Sparse and Underspecified Rewards", we address the issue of underspecified rewards by developing Meta Reward Learning (MeRL), which provides more refined feedback to the agent by optimizing an auxiliary reward function. MeRL is combined with a memory buffer of successful trajectories collected using a novel exploration strategy to learn from sparse rewards. The effectiveness of our approach is demonstrated on semantic parsing, where the goal is to learn a mapping from natural language to logical forms (e.g., mapping questions to SQL programs). In the paper, we investigate the weakly-supervised problem setting, where the goal is to automatically discover logical programs from question-answer pairs, without any form of program supervision. For instance, given the question "Which nation won the most silver medals?" and a relevant Wikipedia table, an agent needs to generate an SQL-like program that results in the correct answer (i.e., "Nigeria").
The proposed approach achieves state-of-the-art results on the WikiTableQuestions and WikiSQL benchmarks, improving upon prior work by 1.2% and 2.4% respectively. MeRL automatically learns the auxiliary reward function without using any expert demonstrations (e.g., ground-truth programs), making it more widely applicable and distinct from previous reward-learning approaches. The diagram below depicts a high-level overview of our approach:
Overview of the proposed approach: we employ (1) mode-covering exploration to collect a diverse set of successful trajectories in a memory buffer, and (2) meta-learning or Bayesian optimization to learn an auxiliary reward that provides more refined feedback for policy optimization.
Meta Reward Learning (MeRL)
The key insight of MeRL in dealing with underspecified rewards is that spurious trajectories and programs that achieve accidental success are detrimental to the agent's generalization performance. For example, an agent might be able to solve a specific instance of the maze problem above. However, if it learns to perform spurious actions during training, it is likely to fail when provided with unseen instructions. To mitigate this issue, MeRL optimizes a more refined auxiliary reward function, which can differentiate between accidental and purposeful success based on features of action trajectories. The auxiliary reward is optimized by maximizing the trained agent's performance on a hold-out validation set via meta learning.
Schematic illustration of MeRL: The RL agent is trained via the reward signal obtained from the auxiliary reward model while the auxiliary rewards are trained using the generalization error of the agent.
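
As a rough sketch of the bi-level structure only (not the paper's algorithm): below, the auxiliary reward is a linear function of invented trajectory features, the "policy" is just a softmax weighting over a buffer of successful trajectories, and the outer update uses a finite-difference gradient of validation return in place of MeRL's meta-gradient (the overview above also mentions Bayesian optimization as an alternative for this outer step).

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented setup: each buffered successful trajectory gets a 2-D feature vector.
# Feature 0 happens to correlate with "followed the instructions", feature 1 with a
# spurious shortcut; only the validation return is visible to the learner.
features = rng.normal(size=(20, 2))
follows_instructions = (features[:, 0] > 0.0).astype(float)  # hidden ground truth

def train_policy(phi):
    """Inner-loop stand-in: softmax weights over buffered trajectories under the auxiliary reward."""
    aux = features @ phi
    w = np.exp(aux - aux.max())
    return w / w.sum()

def validation_return(policy):
    """Generalization proxy: weight placed on genuinely instruction-following trajectories."""
    return float(policy @ follows_instructions)

# Outer loop: adjust the auxiliary-reward parameters to maximize validation return,
# using a crude finite-difference gradient as a stand-in for the meta-gradient.
phi, eps, lr = np.zeros(2), 0.1, 1.0
for _ in range(200):
    grad = np.zeros_like(phi)
    for i in range(len(phi)):
        step = np.zeros_like(phi)
        step[i] = eps
        grad[i] = (validation_return(train_policy(phi + step)) -
                   validation_return(train_policy(phi - step))) / (2 * eps)
    phi += lr * grad

print(phi)  # in this toy, the weight on the instruction-following feature comes out positive
```
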
Learning from Sparse Rewards
To learn from sparse rewards, effective exploration is critical to find a set of successful trajectories. Our paper addresses this challenge by utilizing the two directions of Kullback–Leibler (KL) divergence, a measure of how different two probability distributions are. In the example below, we use KL divergence to minimize the difference between a fixed bimodal (shaded purple) and a learned Gaussian (shaded green) distribution, which can represent the distribution of the agent's optimal policy and our learned policy respectively. One direction of the KL objective learns a distribution that tries to cover both modes, while the distribution learned by the other objective seeks out a particular mode (i.e., it prefers one mode over the other). Our method exploits the mode-covering KL's tendency to focus on multiple peaks to collect a diverse set of successful trajectories, and the mode-seeking KL's implicit preference between trajectories to learn a robust policy.
Left: Optimizing the mode-covering KL. Right: Optimizing the mode-seeking KL.
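
As a small numerical sketch of the two directions (illustrative numbers, not the figure's exact distributions): fitting a single Gaussian to a fixed bimodal target by grid search shows the mode-covering direction, KL(p||q), spreading over both peaks, and the mode-seeking direction, KL(q||p), locking onto one of them.

```python
import numpy as np

x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

p = 0.5 * gauss(x, -3.0, 0.7) + 0.5 * gauss(x, 3.0, 0.7)   # fixed bimodal target

def kl(a, b):
    """Numerical KL(a || b) on the grid."""
    eps = 1e-12
    return float(np.sum(a * (np.log(a + eps) - np.log(b + eps))) * dx)

best_cover, best_seek = None, None
for mu in np.linspace(-4, 4, 81):
    for sigma in np.linspace(0.3, 5.0, 48):
        q = gauss(x, mu, sigma)
        cover, seek = kl(p, q), kl(q, p)
        if best_cover is None or cover < best_cover[0]:
            best_cover = (cover, mu, sigma)
        if best_seek is None or seek < best_seek[0]:
            best_seek = (seek, mu, sigma)

print(f"mode-covering KL(p||q): mu={best_cover[1]:.2f}, sigma={best_cover[2]:.2f}")  # wide, straddles both modes
print(f"mode-seeking  KL(q||p): mu={best_seek[1]:.2f}, sigma={best_seek[2]:.2f}")    # narrow, sits on one mode
```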

Conclusion
Designing reward functions that distinguish between optimal and suboptimal behavior is critical for applying RL to real-world applications. This research takes a small step in the direction of modelling reward functions without any human supervision. In future work, we'd like to tackle the credit-assignment problem in RL from the perspective of automatically learning a dense reward function.

Acknowledgements
This research was done in collaboration with Chen Liang and Dale Schuurmans. We thank Chelsea Finn and Kelvin Guu for their review of the paper.
sness: intermittent rewards

Fixnzip


This product allows you to replace a broken zipper slider without having to buy and install an entirely new zipper. The slider on my favorite alpaca sweater disintegrated in my hand. I went to a tailor and she wanted $40 to replace the zipper and in addition, I had to supply the new zipper. The existing zipper was fine but no slider. The Fixnzip costs under $13 with shipping and did the trick.

It is a clever slider with a clamp built in so you can install it by slipping it onto one side of the zipper and then the other and turning a thumbscrew to tighten the clamp. It worked perfectly. Detailed instructions are on the Fixnzip website.

-- Jack Lieberman

Fixnzip ($9+)

Available from Amazon

sness: omg it was the slider the whole time
mburch42: Sharing so I can find this again if I ever need it.

Transformer-XL: Unleashing the Potential of Attention Models



To correctly understand an article, sometimes one will need to refer to a word or a sentence that occurs a few thousand words back. This is an example of long-range dependence — a common phenomenon found in sequential data — that must be understood in order to handle many real-world tasks. While people do this naturally, modeling long-term dependency with neural networks remains a challenge. Gating-based RNNs and the gradient clipping technique improve the ability to model long-term dependency, but are still not sufficient to fully address this issue.

One way to approach this challenge is to use Transformers, which allow direct connections between data units, offering the promise of better capturing long-term dependency. However, in language modeling, Transformers are currently implemented with a fixed-length context, i.e., a long text sequence is truncated into fixed-length segments of a few hundred characters, and each segment is processed separately.
Vanilla Transformer with a fixed-length context at training time.
This introduces two critical limitations:
  1. The algorithm is not able to model dependencies that are longer than a fixed length.
  2. The segments usually do not respect sentence boundaries, resulting in context fragmentation, which leads to inefficient optimization. This is troublesome even for short sequences, where long-range dependency isn't an issue.
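
To make the fixed-length setup concrete, here is a toy sketch (invented text and segment size): each segment is modeled on its own, so no dependency can cross a cut, and whatever sentence the cut happens to split loses part of its context.

```python
# Toy illustration of vanilla fixed-length segmentation.
tokens = ("to correctly understand an article sometimes one will need to refer "
          "to a word or a sentence that occurs a few thousand words back").split()
SEGMENT_LEN = 8   # real character-level models use segments of a few hundred positions

segments = [tokens[i:i + SEGMENT_LEN] for i in range(0, len(tokens), SEGMENT_LEN)]
for seg in segments:
    print(seg)    # this slice is the only context the model sees for these tokens
```
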
To address these limitations, we propose Transformer-XL, a novel architecture that enables natural language understanding beyond a fixed-length context. Transformer-XL consists of two techniques: a segment-level recurrence mechanism and a relative positional encoding scheme.

Segment-level Recurrence
During training, the representations computed for the previous segment are fixed and cached to be reused as an extended context when the model processes the next new segment. This additional connection increases the largest possible dependency length by N times, where N is the depth of the network, because contextual information is now able to flow across segment boundaries. Moreover, this recurrence mechanism also resolves the context fragmentation issue, providing necessary context for tokens in the front of a new segment.
Transformer-XL with segment-level recurrence at training time.
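
A simplified single-layer sketch of the caching idea (plain numpy with invented sizes; positional information, multiple heads and layers, and all training details are omitted): the previous segment's outputs are kept and reused as extra keys and values for the current segment, while queries come only from the current segment and the cached states are treated as fixed, with no gradient flowing into them.

```python
import numpy as np

rng = np.random.default_rng(0)
d, seg_len = 16, 4                          # invented model width and segment length

def attention_layer(h, mem, Wq, Wk, Wv):
    """Self-attention over [cached memory ; current segment]; queries come from `h` only."""
    ctx = np.concatenate([mem, h], axis=0) if mem is not None else h
    q, k, v = h @ Wq, ctx @ Wk, ctx @ Wv
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
mem = None
for step in range(3):                       # three consecutive segments of one long text
    h = rng.normal(size=(seg_len, d))       # stand-in for this segment's token representations
    out = attention_layer(h, mem, Wq, Wk, Wv)
    mem = out                               # cache this segment's outputs for the next one
print(out.shape)                            # (4, 16): the last segment attended over memory + itself
```
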
Relative Positional Encodings
Naively applying segment-level recurrence does not work, however, because the positional encodings are not coherent when we reuse the previous segments. For example, consider an old segment with contextual positions [0, 1, 2, 3]. When a new segment is processed, we have positions [0, 1, 2, 3, 0, 1, 2, 3] for the two segments combined, where the semantics of each position id is incoherent throughout the sequence. To this end, we propose a novel relative positional encoding scheme to make the recurrence mechanism possible. Moreover, different from other relative positional encoding schemes, our formulation uses fixed embeddings with learnable transformations instead of learnable embeddings, and thus is more generalizable to longer sequences at test time. When both of these approaches are combined, Transformer-XL has a much longer effective context than a vanilla Transformer model at evaluation time.
Vanilla Transformer with a fixed-length context at evaluation time.

Transformer-XL with segment-level recurrence at evaluation time.
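
A tiny sketch of why relative positions stay coherent while absolute ones don't (invented segment length; the actual model additionally projects fixed sinusoidal embeddings of these distances with learnable weights, which is omitted here):

```python
import numpy as np

seg_len = 4   # invented; stands in for the real segment length

# Absolute ids restart at every segment, so once the previous segment is cached as
# memory the combined context carries [0 1 2 3 0 1 2 3]: the same id means two things.
absolute_ids = np.concatenate([np.arange(seg_len), np.arange(seg_len)])

# Relative encodings only need the distance i - j between a query position i (current
# segment) and a key position j (memory + current segment), which stays well defined
# no matter how many segments have been cached. In a causal LM only the non-negative
# distances (keys at or before the query) are actually attended to.
query_pos = np.arange(seg_len) + seg_len            # queries sit at combined positions 4..7
key_pos = np.arange(2 * seg_len)                    # keys cover positions 0..7
relative_distance = query_pos[:, None] - key_pos[None, :]

print(absolute_ids)        # [0 1 2 3 0 1 2 3]
print(relative_distance)   # first row: [4 3 2 1 0 -1 -2 -3]
```
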
Furthermore, Transformer-XL is able to process the elements in a new segment all together without recomputation, leading to a significant speed increase (discussed below).

Results
Transformer-XL obtains new state-of-the-art (SoTA) results on a variety of major language modeling (LM) benchmarks, including character-level and word-level tasks on both long and short sequences. Empirically, Transformer-XL enjoys three benefits:
  1. Transformer-XL learns dependency that is about 80% longer than RNNs and 450% longer than vanilla Transformers, which generally have better performance than RNNs, but are not the best for long-range dependency modeling due to fixed-length contexts (please see our paper for details).
  2. Transformer-XL is up to 1,800+ times faster than a vanilla Transformer during evaluation on language modeling tasks, because no re-computation is needed (see figures above).
  3. Transformer-XL has better performance in perplexity (more accurate at predicting a sample) on long sequences because of long-term dependency modeling, and also on short sequences by resolving the context fragmentation problem.
Transformer-XL improves the SoTA bpc/perplexity from 1.06 to 0.99 on enwiki8, from 1.13 to 1.08 on text8, from 20.5 to 18.3 on WikiText-103, from 23.7 to 21.8 on One Billion Word, and from 55.3 to 54.5 on Penn Treebank (without fine tuning). We are the first to break through the 1.0 barrier on char-level LM benchmarks.

We envision many exciting potential applications of Transformer-XL, including but not limited to improving language model pretraining methods such as BERT, generating realistic, long articles, and applications in the image and speech domains, which are also important areas in the world of long-term dependency. For more detail, please see our paper.

The code, pretrained models, and hyperparameters used in our paper are also available in both TensorFlow and PyTorch on GitHub.

It’s Time for Some Queueing Theory


Queueing theory is the scientific study of waiting in line. It can apply to familiar lines like those at the grocery store or bank but also to things like web servers, highway traffic, and telecommunications…basically any situation where you have things entering a system, being processed by a system for a certain period of time, and leaving the system.

The study of queueing is necessary because the effects of waiting in line often run counter to our intuition (which causes people to get cranky about it). Take this example from John Cook of tellers serving customers at a bank:

Suppose a small bank has only one teller. Customers take an average of 10 minutes to serve and they arrive at the rate of 5.8 per hour. What will the expected waiting time be? What happens if you add another teller?

We assume customer arrivals and customer service times are random (details later). With only one teller, customers will have to wait nearly five hours on average before they are served.

Five hours?! I would not have guessed anywhere close to that, would you? Now, add a second teller into the mix. How long is the average wait now? 2.5 hours? 1 hour? According to Cook, much lower than that:

But if you add a second teller, the average waiting time is not just cut in half; it goes down to about 3 minutes. The waiting time is reduced by a factor of 93x

Our lack of intuition about queues has to do with how much the word “average” is hiding…the true story is much more complex.
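
For anyone who wants to check those teller numbers, here is a short sketch using the standard Erlang C formula for an M/M/c queue, i.e., the same Poisson-arrival, exponential-service assumptions Cook's example relies on (the formula itself isn't in the quoted post):

```python
import math

lam = 5.8          # arrivals per hour
mu = 6.0           # services per hour per teller (10 minutes each)

def mmc_wait(lam, mu, c):
    """Average time spent waiting in queue (hours) for an M/M/c queue (Erlang C)."""
    a = lam / mu                       # offered load
    rho = a / c                        # utilization per teller, must be < 1
    p_wait = (a**c / (math.factorial(c) * (1 - rho))) / (
        sum(a**k / math.factorial(k) for k in range(c))
        + a**c / (math.factorial(c) * (1 - rho)))
    return p_wait / (c * mu - lam)

print(f"1 teller : {mmc_wait(lam, mu, 1) * 60:.0f} minutes")   # ~290 minutes, i.e. nearly 5 hours
print(f"2 tellers: {mmc_wait(lam, mu, 2) * 60:.1f} minutes")   # ~3 minutes
```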

Aside from the math, designers of queueing systems also have to take human psychology into account.

There are three givens of human nature that queuing psychologists must address: 1) We get bored when we wait in line. 2) We really hate it when we expect a short wait and then get a long one. 3) We really, really hate it when someone shows up after us but gets served before us.

The boredom issue has been tackled in myriad ways — from the mirrors next to elevator banks to the TVs in dentist's waiting rooms. Larson mentions a clever solution from the Manhattan Savings Bank, which once hired a concert pianist to play in its lobby as customers waited for tellers. "But Disney has been the absolute master of this aspect of queue psychology," says Larson. "You might wait 45 minutes for an 8-minute ride at Disney World. But they'll make you feel like the ride has started while you're still on line. They build excitement and provide all kinds of diversions in the queue channel." Video screens tease the thrills ahead, and a series of varied chambers that the queue moves through creates a sense of progress. Another solution: those buzzing pagers that restaurants in malls sometimes give you while you're waiting for a table. Instead of focusing on the misery of the wait, you can go off and entertain yourself, secure in the knowledge that you'll be alerted when it's your turn.

Whole Foods had to work around our expectations when it switched to “serpentine” lines that seemed longer but actually served customers more quickly.

By 7 p.m. on a weeknight, the lines at each of the four Whole Foods stores in Manhattan can be 50 deep, but they zip along faster than most lines with 10 shoppers.

Because people stand in the same line, waiting for a register to become available, there are no “slow” lines, delayed by a coupon-counting customer or languid cashier. And since Whole Foods charges premium prices for its organic fare, it can afford to staff dozens of registers, making the line move even faster.

“No way,” is how Maggie Fitzgerald recalled her first reaction to the line at the Whole Foods in Columbus Circle. For weeks, Ms. Fitzgerald, 26, would not shop there alone, assigning a friend to fill a grocery cart while she stood in line.

When she discovered the wait was about 4 minutes, rather than 20, she began shopping by herself, and found it faster than her old supermarket.

See also How to Pick the Fastest Line at the Supermarket, Queue Theory and Design from 99% Invisible, and this paper from Bob Wescott, Seven Insights Into Queueing Theory. One of his insights:

It’s very hard to use the last 15% of anything. As the service center gets close to 100% utilization the response time will get so bad for the average transaction that nobody will be having any fun. The graph below is exactly the same situation as the previous graph except this graph is plotted to 99% utilization. At 85% utilization the response time is about 7x and it just gets worse from there.

For grocery stores or call centers, that means you’re going to have operators or cashiers sitting there “doing nothing” sometimes because if you don’t, you’re gonna be in trouble when a rush hits.
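
As a back-of-the-envelope sketch of that "last 15%" effect, the textbook M/M/1 relationship response time = service time / (1 - utilization) reproduces the shape Wescott describes, including roughly 7x at 85% utilization (his actual graph may be based on a slightly different model):

```python
service_time = 1.0
for utilization in (0.50, 0.70, 0.85, 0.95, 0.99):
    response = service_time / (1 - utilization)   # M/M/1 response-time blowup
    print(f"{utilization:.0%} busy -> response time {response:4.0f}x the service time")
```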

Update: John Frost shares an anecdote about how his grandfather's team designed the queueing system for the Matterhorn Bobsleds at Disneyland:

Another fun family story is the invention of the Matterhorn’s first of its kind switchback queue. Vic Greene and his team of Imagineers developed a system that would have the entrance to the switchback part of the queue be lower than the exit. When you stood at the entrance, the exit would appear closer to you in an optical illusion. The idea was to make your wait seem less cumbersome by visually shortening the queue.

Tags: Disney, mathematics
sness: this was a great course in grad school, the math was rough though