It was seen that the best policy converged in 18 iterations with an average reward of -19 points.
2. Stage1 [Q-learning]: Experiment results
The RL agent, whose only motive was to maximize its reward, exhibited different behavior when the rewarding process and learning hyperparameters were changed. Though a variety of experiments were conducted to arrive at an appropriate training mechanism, only a summary of the agent’s performance in those experiments is listed below. Note that seeding was used in the program to generate consistent results.
i. Experiments with learning rate
Learning rate and speed of convergence: An inverse relationship was found between learning rate and speed of convergence. The higher the learning rate, the quicker was the …
Figure 43: Q-learning experiments with different policies: (a) Varied penalty for convergence duration (b) Varied penalty for revisiting same cell
iv. Experiments with discount rate
Convergence was quicker with higher discount rates, thanks to the higher valuation of long-term returns available to the agent. When the discount rate γ was lower, the agent did not have much visibility of future benefits. It was myopic and had to make decisions based on immediate reward.
Figure 44: Q-learning experiments with different discount rates
3. Stage2 [Q-network]: Q-function approximation
Function approximation with a neural network was successfully implemented for the path-finding problem. The table below summarizes the observations from comparing it to Q-learning.
Figure 45: Comparison of experiment results between Q-network and Q-learning
Stage2 experiments were very similar to Stage1 experiments, as all tests were done by varying the hyperparameters. The results did not differ much, but the noticeable difference was the reduction of loss and quicker convergence (highlighted below) as the model was retrained over different epochs. Lastly, both models were successful in identifying the optimal policy, even though the Q-network took a little longer.
Figure 46: Loss reduction with Q-function training over various epochs
Elaboration of all experiments is not done, as a flavor of these experiments is already conveyed as part of Stage 1 experiments and the only
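The roles of the learning rate and the discount rate in the Stage1 experiments can be illustrated with a minimal sketch of the standard tabular Q-learning update. This is not the report's own program: the function names `q_update` and `make_table`, the two-cell layout ("A" next to "B"), the reward of -1 and the stored value of 10 are all hypothetical, chosen only to show how a low discount rate makes the agent myopic.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one tabular Q-learning update and return the new Q(state, action).

    alpha is the learning rate and gamma is the discount rate.
    """
    # Value of the best known action in the next state (0.0 if none is known).
    next_values = q[next_state].values()
    best_next = max(next_values) if next_values else 0.0
    # Bootstrapped target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha of the way toward the target.
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def make_table():
    """Hypothetical Q-table: cell 'B' already has a learned value of 10."""
    q = defaultdict(lambda: defaultdict(float))
    q["B"]["right"] = 10.0
    return q


if __name__ == "__main__":
    # Same transition (A -> B, reward -1), evaluated under two discount rates.
    for gamma in (0.9, 0.1):
        q = make_table()
        new_value = q_update(q, "A", "right", reward=-1.0, next_state="B",
                             alpha=0.5, gamma=gamma)
        print(f"gamma={gamma}: Q(A, right) = {new_value:.2f}")
```

With these assumed numbers, the update with γ = 0.9 raises Q(A, right) to 4.0, while γ = 0.1 leaves it at 0.0, because the future value stored in cell B is almost entirely discounted away. The learning rate `alpha` only scales how far each update moves the estimate toward the target.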
Controls: The control in this experiment was very important because, if it had not been contained, the data would have been faulty. It was very difficult to keep
(or Objective) This part of the experiment expressed clearly in only one or two sentences,
He had strong general convictions, and he set his administration's overall priorities early on. He obviously placed his defensive buildup and his economic programs ahead of everything else. He was very tactically flexible; he often showed no regret when he had to adjust to political opposition or to changed circumstances. He was also a great negotiator, setting his demands higher than the minimum, accepting what he could get, and making his decisions easily and promptly.
The experiment will be
Do the test with smell and food, then with no smell and food, then with only food. Continue with these steps until the experiment is finished. The results for cinnamon were 11/12 correct with scent and food, 12/12 correct with no scent and food, and 12/12 correct with food. The results for mint were 8/12 correct with scent and food, 12/12 correct with no scent and food, and 12/12 correct with food. The results for vanilla were 12/12 correct with scent and food, 12/12 correct with no scent and food, and 12/12 correct with food. The prediction was that, with scent and food together, the volunteers would no longer be able to taste the food. If this project were performed again, one change would be to have one volunteer at a time do the test. Another would be to give each volunteer a glass of water to clear the mouth of any taste before tasting something new. One experimental error was not taking the volunteers' sense of smell away completely. The hypothesis was that if the smell was cinnamon, then the taste of ketchup would decrease. The hypothesis was not supported. When comparing the foods with smell against the foods without smell, the cinnamon had a 9% difference, mint had a 40%
The eventual cost for such lack of foresight quickly became
permanent development that was necessary in a modern economy. Roosevelt couldn’t rely on the courts to
At the beginning of the study, the subjects were not well informed about the whole purpose of the research, nor were they informed of the inherent dangers of the study. The experiment was
What were the key findings from the experiment?
Therefore, they discussed how some adjustments could be made to the experiment in the future. Although the data did support the hypothesis, the experiment might have worked better if the experimenters had had enough background knowledge of the regular salt concentration usually found in the potato itself. In addition, using multiple types of potatoes would produce a lot of data, but it could also help the result be narrowed down and made more specific, especially since so many examples are involved. As mentioned, one thing that helped the experiment was the set of controlled factors, such as the beaker with 0% salinity. It helped the experimenters compare the other tests to some sort of a
Delineated what problems they were trying to solve, what issues they faced, what resources and investments were needed
What did you think about the “Empty Cup”? As you will see, each chapter in this book will start with a koan. After reading each koan, take a moment to reflect on it before continuing to the narrative that follows. Whatever interpretation I provide is not definitive and should only serve as a guideline. Zen practitioners believe the true meaning of a koan is subjective and will evolve after each reading, so it is normal to discover a different meaning or insight each time.
After using the Cornell Note Taking Method, I found it to be less beneficial than I thought. It is more confusing for me to use than a standard outline. I think it would be very beneficial in AP classes for getting chapter topics organized and remembering questions to ask the teacher. It would also be very useful in fact-based classes such as biology, economics, and government, where the Cornell Method would make it easier to find notes. I would mostly use the summary portion of the Cornell Method because it would be very useful when reading large sections of text. This method will not affect my future, because I prefer using an outline for notes to using the Cornell Method.
When the experiment was conducted, the results for each alcohol were where they were anticipated to be, supporting the
Instead of diluting the resources and efforts on multiple goals and supporting them in a sickly manner they focused on