Talk:Neural network (machine learning)

History section: request to approve edits of 15-16 September 2024

As discussed with User:North8000, on 15-16 September 2024, I edited Neural network (machine learning). He reverted and wrote, "you are doing massive reassignment of credit for Neural Networks based on your interpretation of their work and primary sources and deleting secondary sourced assignments. Please slow down and take such major reassignments to talk first." So here. Please note that most of my edits are not novel! They resurrect important old references deleted on 7 August 2024 in a major edit when User:Cosmia Nebula (whose efforts I appreciate) tried to compress the text. This massive edit has remained unchallenged until now. I also fixed links in some of the old references, added a few new ones (both primary and secondary sources), corrected many little errors, and tried to streamline some of the explanations. IMO these edits restored important parts and further improved the history section of the article, although a lot remains to be done. Now I kindly ask User:North8000 and User:Cosmia Nebula who seem to know a lot about the subject: please review the details once more and revert the revert! Best regards, Speedboys (talk) 21:24, 17 September 2024 (UTC)[reply]

Recapping my response from our conversation at my talk page: Thanks for your work and your post. The series of rapid-fire edits ended up being entangled such that they can't be reviewed / potentially reverted separately. In that bundle were several which IMO pretty creatively shifted/assigned credit for being the one to pioneer various aspects. So I'm not open to reinstating that whole linked bundle including those. Why not just slow down and put those things back in at a pace where they can be reviewed? And take the ones that are a reach (transferring or assigning credit for invention) to talk first. You are most familiar with the details of your edits and are in the best position to know those. Sincerely, North8000 (talk) 00:35, 18 September 2024 (UTC)[reply]

Hello @Speedboys
My main concerns with the page were: 1. It had too many details that probably should go into History of artificial neural networks. 2. It relies too heavily on Schmidhuber's history, especially "Annotated History of Machine Learning", and Schmidhuber is an unreliable propagandist who bitterly contests priority with everyone else. He aims to show that modern deep learning mostly originated with his team, or with others like Lapa and Fukushima etc., specifically *not* LeCun, Bengio, etc. You can press ctrl+f and type "did not" and find phrases like "This work did not cite the earlier LSTM", "which the authors did not cite", "extremely unfair that Schmidhuber did not get the Turing award"...
It is even more revealing if you ctrl+f on "Hinton". More than half of the citations to Hinton are followed by "Very similar to [FWP0-2]", "although this type of deep learning dates back to Schmidhuber's work of 1991", "does not mention the pioneering works", "The authors did not cite Schmidhuber's original"... You can try the same exercise by ctrl+f on "LeCun" and "Bengio". It is very funny.
His campaign reached levels of absurdity when he claimed that Amari (1972)'s RNN is "based on the (uncited) Lenz-Ising recurrent architecture". If you can call the Ising model "The first non-learning recurrent NN architecture", then I can call the heat death of the universe "The first non-evolving model of Darwinian evolution". The entire point of RNN is that it is dynamic, and the entire point of the Ising model is that it is about thermal equilibrium at a point where all dynamics has *stopped*.
As one example, the phrase "one of the most important documents in the history of machine learning" used to appear several times across Wikipedia; it is an obvious violation of WP:NPOV, and it came straight from his "Annotated History of Machine Learning". I removed all instances of this phrase from Wikipedia except on his own page (he is entitled to his own opinions). In fact, the entire paper is scattered with such propagandistic sentences:
> [DEC] J. Schmidhuber (AI Blog, 02/20/2020, updated 2021, 2022). The 2010s: Our Decade of Deep Learning / Outlook on the 2020s. The recent decade's most important developments and industrial applications based on the AI of Schmidhuber's team, with an outlook on the 2020s, also addressing privacy and data markets.
As a general principle, if I can avoid quoting Schmidhuber, I must, because Schmidhuber is extremely non-NPOV. I had removed almost all citations to his Annotated History except those that genuinely cannot be found anywhere else. For example, I kept all citations to that paper about Amari and Saito, because 1. H. Saito is so extremely obscure that if we don't cite Schmidhuber on this, we have no citation for this. 2. I can at least trust that he didn't make up the "personal communication" with Amari.
> [GD2a] H. Saito (1967). Master's thesis, Graduate School of Engineering, Kyushu University, Japan. Implementation of Amari's 1967 stochastic gradient descent method for multilayer perceptrons.[GD1] (S. Amari, personal communication, 2021.) pony in a strange land (talk) 01:36, 18 September 2024 (UTC)[reply]
Dear User:Cosmia Nebula alias "pony in a strange land," thanks for your reply! I see where you are coming from. The best reference to the mentioned priority disputes between Jürgen Schmidhuber, Geoffrey Hinton, Yoshua Bengio, and Yann LeCun (JS,GH,YB,YL) is the very explicit 2023 report[1] which to my knowledge has not been challenged. The most comprehensive surveys of the field are those published by JS in 2015[2] and 2022,[3] with over 1000 references in total; wouldn't you agree? They really credit the deep learning pioneers, unlike the surveys of GH/YB/YL.[4][5] I'd say that JS has become a bit like the chief historian of the field, with the handicap that he is part of it (as you wrote: non-NPOV?). Anyway, without his surveys, many practitioners would not even know the following facts: Alexey Ivakhnenko had a working deep learning algorithm in 1965. Shun'ichi Amari had Deep Learning by Stochastic Gradient Descent in 1967. Kunihiko Fukushima had ReLUs in 1969, and the CNN architecture in 1979. Shun'ichi Amari had Hopfield networks 10 years before Hopfield, plus a sequence-learning generalization (the "dynamic RNN" as opposed to the "equilibrium RNN" you mentioned), all using the must-cite Ising architecture (1925). Alan Turing had early unpublished work (1948) with "ideas related to artificial evolution and learning RNNs."[3] Seppo Linnainmaa had backpropagation (reverse mode of auto-diff) in 1970. G.M. Ostrovski republished this in 1971. Henry J. Kelley already had a precursor in 1960. Two centuries ago, Gauss and Legendre had the method of least squares which is exactly what's now called a linear neural network (only the name has changed). If JS is non-NPOV (as you write), then how non-NPOV are GH/YB/YL who do not cite any of this? You blasted JS' quote, "one of the most important documents in the history of machine learning," which actually refers to the 1991 diploma thesis of his student Sepp Hochreiter, who introduced residual connections or "constant error flow," the "roots of LSTM / Highway Nets / ResNets."[3] Anyway, thanks for toning that down. You deleted important references to JS' 1991 work on self-supervised pre-training, neural network distillation, GANs, and unnormalized linear Transformers; I tried to undo this on 16 Sept 2024. Regardless of the plagiarism disputes, one cannot deny that this work predates GH/YB and colleagues by a long way. In the interest of historical accuracy, I still propose to revert the revert of my 10 edits, and continue from there. In the future, we could strive to explicitly mention details of the priority disputes between these important people, trying to represent all sides in an NPOV way. I bet you could contribute a lot here. What do you think? Speedboys (talk) 14:21, 18 September 2024 (UTC)[reply]
The most important issue is that any citation to Schmidhuber's blog posts, essays, and "Annotated History" invariably taints a Wikipedia page with non-NPOV. Before getting into all those details, this is the main problem with citing Schmidhuber. Citing earlier works is fine, but it is *NOT* fine to cite Schmidhuber's interpretation of these earlier works.
"Annotated History of Modern AI and Deep Learning" was cited about 63 times, while "Deep learning in neural networks: An overview" was cited over 22k times. It is clear why if you compare the two. "Deep learning in neural networks" is a mostly neutral work (if uncommonly citation-heavy), while the "Annotated History" is extremely polemical (even beginning the essay with a giant collage of people's faces and their achievements, recalling the book covers from the 17th-century pamphlet wars). It is very strange that you would combine them in one sentence and say "with over 1000 references in total" as if they were of nearly the same order of magnitude in citations.
As for the "very explicit 2023 report", it is... not a report. It is the most non-NPOV thing I have seen (beginning the entire report with a damned caricature comic?) and I do not want to read it. He is not the chief historian. He is the chief propagandist. If you want a better history of deep learning, I would recommend something else, such as:
  • The quest for artificial intelligence: a history of ideas and achievements, by Nilsson, Nils J.
  • Mikel Olazaran, A Historical Sociology of Neural Network Research (PhD dissertation, Department of Sociology, University of Edinburgh, 1991); Olazaran, `A Sociological History of the Neural Network Controversy', Advances in Computers, Vol. 37 (1993), 335-425.
  • Anderson, James A., and Edward Rosenfeld, eds. Talking nets: An oral history of neural networks. MIT Press, 2000.
Calling something "unnormalized linear Transformers" is a great rhetorical trick, and I can call feedforward networks "attentionless Transformers". I am serious. People are trying to figure out if attention really is necessary (for example, "Sparse MLP for image recognition: Is self-attention really necessary?" or MLP-mixers). Does that mean feedforward networks are "attentionless Transformers"? Or can I just put Rosenblatt into the Transformer page's history section?
Ising architecture (1925) is NOT a must-cite. It is not even a neural network architecture (though you can retroactively call it an "architecture"; historians call that presentism). Physicists don't cite Newton when they write new papers. They don't even cite Schrödinger. Mathematicians don't cite Gauss-Legendre for least squares. They have a vague feeling that they did something about least squares, and that's enough. It is not a serious problem. Historians will do all that detailed credit assignment later.
Ising architecture is NOT a must-cite even in the 1970s, because, as you might notice in the RNN page, there were several ways to arrive at RNN. One route goes through neuroanatomy. The very first McCulloch and Pitts 1943 paper already had RNN, Hebbian learning, and universality. They had no idea of Ising, nor did they need to, because they got the idea from neuroscientists like Lorente de No. Hopfield cited Amari, btw.
Schmidhuber is not reliable, by the way. I just checked his "Deep learning in neural networks" and immediately saw an error: "Early NN architectures (McCulloch and Pitts, 1943) did not learn." In fact, it is stated right there in the paper:
> We suppose that some axonal terminations cannot at first excite the succeeding neuron; but if at any time the neuron fires, and the axonal terminations are simultaneously excited, they become synapses of the ordinary kind, henceforth capable of exciting the neuron.
That is Hebbian learning (6 years before Hebb's 1949 book, but... Hebbian learning was an immediately obvious idea once you have associationism with the neuron doctrine).
You can find it if you ctrl+f "learn" in the paper. A little later they showed that Hebbian learning in a feedforward network is equivalent to an RNN by unrolling that RNN in time. ("THEOREM VII. Alterable synapses can be replaced by circles.", and Figure 1.i. The dashed line is the learnable synapse)
But I am tired of battling over the historical minutiae. Misunderstanding history doesn't hurt the practitioners, because ideas are cheap and are rediscovered all the time (see: Schmidhuber's long list of grievances), so not citing earlier works is not an issue. This is tiring, and I'm signing out of the debate. A word of advice: If you must use Schmidhuber's history, go directly to the source. Do not use his interpretation. @Speedboys, you seem passionate about history. It would be good to actually read the primary sources, not trust Schmidhuber's interpretation, and read some histories other than his. Other than the references I gave above, I can also recommend https://gwern.net/tank as a good example of a microhistory on a specific problem in neural network research. pony in a strange land (talk) 22:07, 18 September 2024 (UTC)[reply]
Dear User:Cosmia Nebula, thanks! I am always going to the source when I find something of interest in a survey. You condemn JS and recommend alternative surveys such as Nils John Nilsson (2009). Unfortunately, Nilsson is not a very good source because he writes things such as, "Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams (1985), introduced a new technique, called back propagation," without mentioning the true inventors of backpropagation. He also writes that "the physicist John J. Hopfield" invented the Hopfield network, without citing Amari who published it 10 years earlier. Neither Nilsson nor the even older surveys you mention cite Ivakhnenko who started deep learning in 1965. Isn't that a rather US-centric non-NPOV here? Most of the community learned about the true pioneers from JS' much more meticulous surveys which you criticize. See my previous message. His 2015 survey lists nearly 900 references, his 2022 update over 500, adding stuff that has become important since 2015 (this is not about citations). Could it be that you have a tiny little bit of non-NPOV of your own? Maybe we all have. But then let's find a consensus. You call "unnormalized linear Transformers" a "great rhetorical trick." Why? Unlike the older networks you mention, they do have linearized attention and scale linearly. The terminology "linear Transformer" is due to Katharopoulos et al. (2020), but JS had the machinery already in 1991, as was pointed out in 2021 (see reverted edits). You also claim that early NN architectures (McCulloch and Pitts, 1943) did learn. I know the paper, and couldn't find a working learning algorithm in it. Could you? Note that Gauss and Legendre had a working learning algorithm for linear neural nets over 200 years ago, another must-cite. Anyway, I'll try to follow the recommendations on this talk page and go step by step from now on, in line with WP:CONSENSUS. Speedboys (talk) 11:58, 19 September 2024 (UTC)[reply]
> but if at any time the neuron fires, and the axonal terminations are simultaneously excited, they become synapses of the ordinary kind, henceforth capable of exciting the neuron
That is a learning algorithm. Ignore it if you must. As I said, I'm tired of fighting over this priority dispute.
Again: Gauss and Legendre are not a must-cite. I have read hundreds of math and CS papers and never have I needed to know who proposed least squares, or in what paper.
Do not reply anymore. pony in a strange land (talk) 21:03, 19 September 2024 (UTC)[reply]
Dear User:Cosmia Nebula, you say, "do not reply," but I must: that's not a learning algorithm. Sure, McCulloch and Pitts' Turing-equivalent model (1943) is powerful enough to implement any learning algorithm, but they don't describe one: no goal, no objective function to maximise, no explicit learning algorithm. Otherwise it would be known as the famous McCulloch and Pitts learning algorithm.
The 2024-09-16 diff under discussion: The original version of the "Early Work" section has a very good and accessible overview of the field, and it wikilinks related subjects in a rather fluid way. I think your version of that section, by going deep into crediting and describing a single primary source on each topic, just doesn't work. As noted above, doing such a fine-grained step-by-step review of primary works of the history is better for the History of neural networks subarticle.
I don't know the sources on this at all, but I lend support to the editors above for at least this section: on prose, accessibility, and accuracy in a broader conceptual sense, you should not restore your edits wholesale. (I know it's a lot of work, as writing good accessible prose is super hard, but the hardest part -- finding and understanding the source material -- you've already done and banked, so you should definitely keep up editing on this and the many related articles.) SamuelRiv (talk) 19:30, 18 September 2024 (UTC)[reply]
Speedboys, whatever else may be the case, I don't think that you should "revert the revert... and continue from there." WP:CONSENSUS is sufficiently against the content that you had added, that it should not be reverted back in the same form. Please follow the advice of other editors above, and propose specific text to add back, here in talk. --Tryptofish (talk) 18:58, 18 September 2024 (UTC)[reply]

Dear SamuelRiv and Tryptofish, thanks. I'll try to follow your recommendations and go step by step from now on, in line with WP:CONSENSUS. Speedboys (talk) 11:58, 19 September 2024 (UTC)[reply]

Dear all, please review my first proposed edit in line with WP:CONSENSUS. I propose to replace the section "Neural network winter" by the section "Deep learning breakthroughs in the 1960s and 1970s" below. Why? The US "neural network winter" (if any) did not affect Ukraine and Japan, where fundamental breakthroughs occurred in the 1960s and 1970s: Ivakhnenko (1965), Amari (1967), Fukushima (1969, 1979). The Kohonen maps (1980s) should be moved to a later section. I should point out that much of the proposed text is based on older resurrected text written by other editors. Speedboys (talk) 12:17, 19 September 2024 (UTC)[reply]

Not bad, but there is some anti-U.S. tone. E.g. the phrase "of course" falls afoul of MOS:INSTRUCT. Also, the extraordinary claim that CNNs "began with" Neocognitron -- that makes it sound like Neocognitron leveraged the key insight of CNNs, which was to reduce the number of weights by using the same weights, effectively, for each pixel, running the kernel(s) across the image. From my limited understanding, that is not the case with Neocognitron. The article dedicated to Neocognitron uses the more accurate phrasing that CNNs were "inspired by" Neocognitron. Michaelmalak (talk) 13:22, 19 September 2024 (UTC)[reply]
Dear Michaelmalak, thanks! I agree, I must delete the phrase "of course" in the draft below. I just did. Regarding the Neocognitron: that's another article that must be corrected, because the Neocognitron CNN did have "massive weight replication," and a third party reference on this is section 5.4 of the 2015 survey.[2] I added this to the draft below. Speedboys (talk) 10:29, 20 September 2024 (UTC)[reply]

Deep learning breakthroughs in the 1960s and 1970s

Fundamental research was conducted on ANNs in the 1960s and 1970s. The first working deep learning algorithm was the Group method of data handling, a method to train arbitrarily deep neural networks, published by Alexey Ivakhnenko and Lapa in Ukraine (1965). They regarded it as a form of polynomial regression,[6] or a generalization of Rosenblatt's perceptron.[7] A 1971 paper described a deep network with eight layers trained by this method,[8] which is based on layer-by-layer training through regression analysis. Superfluous hidden units are pruned using a separate validation set. Since the activation functions of the nodes are Kolmogorov-Gabor polynomials, these were also the first deep networks with multiplicative units or "gates."[3]
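
To make the layer-by-layer idea concrete, here is a rough Python sketch (my own illustrative simplification with made-up data, not Ivakhnenko's exact procedure): each candidate unit is a quadratic polynomial of two inputs fitted by regression, units that do poorly on a held-out validation set are pruned, and the surviving outputs feed the next layer.

```python
import numpy as np
from itertools import combinations

def quad_features(a, b):
    """Kolmogorov-Gabor style quadratic terms of a pair of inputs."""
    return np.stack([np.ones_like(a), a, b, a * b, a * a, b * b], axis=1)

def gmdh_layer(X_tr, y_tr, X_va, y_va, keep=4):
    """One GMDH-style layer: fit a quadratic unit for every input pair by
    least squares, then keep only the units that do best on the validation set."""
    units = []
    for i, j in combinations(range(X_tr.shape[1]), 2):
        coef, *_ = np.linalg.lstsq(quad_features(X_tr[:, i], X_tr[:, j]), y_tr, rcond=None)
        val_pred = quad_features(X_va[:, i], X_va[:, j]) @ coef
        units.append((np.mean((val_pred - y_va) ** 2), i, j, coef))
    units.sort(key=lambda u: u[0])            # prune superfluous units
    best = units[:keep]
    out_tr = np.column_stack([quad_features(X_tr[:, i], X_tr[:, j]) @ c for _, i, j, c in best])
    out_va = np.column_stack([quad_features(X_va[:, i], X_va[:, j]) @ c for _, i, j, c in best])
    return out_tr, out_va, best[0][0]         # outputs become inputs of the next layer

# Toy usage: grow two layers, one after the other (no backpropagation involved).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] ** 2 + 0.1 * rng.normal(size=200)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

h_tr, h_va, err1 = gmdh_layer(X_tr, y_tr, X_va, y_va)     # layer 1
h_tr, h_va, err2 = gmdh_layer(h_tr, y_tr, h_va, y_va)     # layer 2 built on layer-1 outputs
print("best validation MSE, layer 1 vs layer 2:", err1, err2)
```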

The first deep learning multilayer perceptron trained by stochastic gradient descent[9] was published in 1967 by Shun'ichi Amari.[10] In computer experiments conducted by Amari's student Saito, a five-layer MLP with two modifiable layers learned internal representations to classify non-linearly separable pattern classes.[3] Subsequent developments in hardware and hyperparameter tuning have made end-to-end stochastic gradient descent the currently dominant training technique.
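
For comparison with current practice, a minimal sketch of a small multilayer perceptron trained end-to-end by stochastic gradient descent (illustrative only, with made-up data and modern backpropagated gradients, not Amari's 1967 formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-linearly separable problem (XOR-like quadrants), illustrative only.
X = rng.uniform(-1, 1, size=(500, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)

# One hidden layer of 8 tanh units, one sigmoid output unit.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.1

for epoch in range(200):
    for i in rng.permutation(len(X)):          # stochastic: one example at a time
        x, t = X[i:i + 1], y[i:i + 1]
        h = np.tanh(x @ W1 + b1)               # forward pass
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        g2 = p - t                             # gradient of cross-entropy at the output
        g1 = (g2 @ W2.T) * (1.0 - h ** 2)      # backpropagate through tanh
        W2 -= lr * h.T @ g2;  b2 -= lr * g2.sum(axis=0)   # SGD weight updates
        W1 -= lr * x.T @ g1;  b1 -= lr * g1.sum(axis=0)

h = np.tanh(X @ W1 + b1)
p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
print("training accuracy:", np.mean((p > 0.5) == y))
```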

In 1969, Kunihiko Fukushima introduced the ReLU (rectified linear unit) activation function.[11][12][3] The rectifier has become the most popular activation function for deep learning.[13]

Nevertheless, research stagnated in the United States following the work of Minsky and Papert (1969),[14] who emphasized that basic perceptrons were incapable of processing the exclusive-or circuit. This insight was irrelevant for the deep networks of Ivakhnenko (1965) and Amari (1967).

Deep learning architectures for convolutional neural networks (CNNs), with convolutional layers, downsampling layers, and weight replication, began with the Neocognitron introduced by Kunihiko Fukushima in 1979, though it was not trained by backpropagation.[15][16][2]
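
To illustrate what weight replication means here, a tiny sketch (illustrative only, not the Neocognitron itself): one small kernel is reused at every image position, so the layer needs far fewer weights than a fully connected layer, and a downsampling (pooling) step follows.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))      # toy 8x8 input "image"
kernel = rng.normal(size=(3, 3))     # ONE shared 3x3 kernel: 9 weights in total

# Convolutional layer: the same 9 weights are replicated across every 3x3
# patch of the image, instead of having separate weights per position.
k = kernel.shape[0]
H, W = image.shape
conv = np.zeros((H - k + 1, W - k + 1))
for r in range(conv.shape[0]):
    for c in range(conv.shape[1]):
        conv[r, c] = np.sum(image[r:r + k, c:c + k] * kernel)

# Downsampling layer: 2x2 average pooling halves the spatial resolution.
pooled = conv.reshape(3, 2, 3, 2).mean(axis=(1, 3))

print("conv map:", conv.shape, "pooled map:", pooled.shape)
print("shared weights:", kernel.size,
      "vs. a fully connected layer of the same size:", image.size * conv.size)
```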

Others may think differently, but I'd be happy if you just made smaller edits at a slower pace on a WP:BRD basis and (just) seek prior consensus on the controversial ones such as assigning / implying credit to individuals. A slower pace with smaller edits makes it reviewable and so is itself a review process. North8000 (talk) 13:13, 19 September 2024 (UTC)[reply]
Dear User:North8000, thanks for encouraging me to resume the traditional way of editing. I tried to address the comments of the other users. Now I want to edit the article accordingly, and go step by step from there, as you suggested. Speedboys (talk) 10:29, 20 September 2024 (UTC)[reply]
Done. Now the section on CNNs must be adjusted a bit, to reflect the beginnings in the 1970s. Speedboys (talk) 11:08, 20 September 2024 (UTC)[reply]
I'd suggest smaller edits and waiting a day or 2 between them. North8000 (talk) 13:05, 20 September 2024 (UTC)[reply]
I'll wait a bit, as you suggested. But there is still a lot to do. Speedboys (talk) 13:25, 20 September 2024 (UTC)[reply]
I waited for a day, as suggested. The latest edit resurrects references deleted by User:Cosmia Nebula on 7 August: JS' 1991 work on self-supervised pre-training, neural network distillation, GANs, and unnormalized linear Transformers, using the improved text of 24 September.

Dear User:North8000, my next proposed edit (see draft below based on the reverted edit of 15 September) is about important work predating Frank Rosenblatt's work on perceptrons (1958). My third party source is R.D. Joseph (1960) who mentions an even earlier perceptron-like device by Farley and Clark: "Farley and Clark of MIT Lincoln Laboratory actually preceded Rosenblatt in the development of a perceptron-like device." I am also copying additional Farley and Clark references (1954) from History_of_artificial_neural_networks. Finally, Frank Rosenblatt also cites Joseph's work (1960) on adaptive hidden units in multilayer perceptrons. Speedboys (talk) 11:19, 21 September 2024 (UTC)[reply]

I don't have the specialized knowledge to fully evaluate it but overall it looks pretty good to me. Mentions people in the context of early developments without being heavy on claim/credit type wording. North8000 (talk) 01:48, 22 September 2024 (UTC)[reply]

Thanks User:North8000. I inserted this in the article, and will wait a bit. My next proposed edit (see extended draft below based on the reverted edit of 15 September) is about additional important text deleted by User:Cosmia Nebula on 7 August (see discussion above): two centuries ago, Gauss and Legendre had the method of least squares which is exactly what's now called a linear neural network (only the name has changed). Speedboys (talk) 10:23, 22 September 2024 (UTC)[reply]

Forgive me that my limited wiki-minutes permit only a superficial look, coupled with me not having your depth of knowledge on this. That said, if you are using that analysis to assign credit, that sounds like too much of a reach per WP:OR and WP:VER. If you are just looking to put in info without making such claims, it sounds cool to me. North8000 (talk) 22:36, 22 September 2024 (UTC)[reply]

Suggest just proceeding (albeit slowly) and taking just any more extraordinary claims to talk first. Nothing needs my approval, and I don't have your depth of expertise on this. Plus for the next 3 weeks I don't have many wiki-minutes, and will be off the grid for about 1/2 of that. Sincerely, North8000 (talk) 14:26, 30 September 2024 (UTC)[reply]

Ok. Just copied more accurate text on backpropagation from Feedforward neural network. Speedboys (talk) 13:20, 4 October 2024 (UTC)[reply]

Early work

Today's deep neural networks are based on early work in statistics over 200 years ago. The simplest kind of feedforward neural network (FNN) is a linear network, which consists of a single layer of output nodes with linear activation functions; the inputs are fed directly to the outputs via a series of weights. The sum of the products of the weights and the inputs is calculated at each node. The mean squared error between these calculated outputs and the given target values is minimized by adjusting the weights. This technique has been known for over two centuries as the method of least squares or linear regression. It was used as a means of finding a good rough linear fit to a set of points by Legendre (1805) and Gauss (1795) for the prediction of planetary movement.[17][18][19][3][20]
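
To make the correspondence concrete, a minimal numpy sketch (the data and variable names are made up for illustration, not taken from the cited sources): fitting a single layer of linear output nodes by minimizing the mean squared error is exactly the method of least squares.

```python
import numpy as np

# Toy data: two input features and a target value per example (values are made up).
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0]])
y = np.array([3.1, 2.4, 4.6, 7.0])

# Append a constant input so the single linear output node has a bias weight.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

# Least squares gives the weights of the single-layer linear network that
# minimize the mean squared error between outputs and targets.
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

y_hat = Xb @ w                      # each output is the weighted sum of the inputs
print("weights:", w)
print("mean squared error:", np.mean((y_hat - y) ** 2))
```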

In 1958, psychologist Frank Rosenblatt described the perceptron, one of the first implemented artificial neural networks,[21][22][23][24] funded by the United States Office of Naval Research.[25] R. D. Joseph (1960)[26] mentions an even earlier perceptron-like device by Farley and Clark[3]: "Farley and Clark of MIT Lincoln Laboratory actually preceded Rosenblatt in the development of a perceptron-like device." However, "they dropped the subject." Farley and Clark[27] (1954) also used computational machines to simulate a Hebbian network. Other neural network computational machines were created by Rochester, Holland, Habit and Duda (1956).[28] The perceptron raised public excitement for research in artificial neural networks, causing the US government to drastically increase funding. This contributed to "the Golden Age of AI" fueled by the optimistic claims made by computer scientists regarding the ability of perceptrons to emulate human intelligence.[29] The first perceptrons did not have adaptive hidden units. However, Joseph (1960)[26] also discussed multilayer perceptrons with an adaptive hidden layer. Rosenblatt (1962)[30] (section 16) cited and adopted these ideas, also crediting work by H. D. Block and B. W. Knight. Unfortunately, these early efforts did not lead to a working learning algorithm for hidden units, i.e., deep learning.

References

  1. ^ Schmidhuber, Juergen (14 December 2023). "How 3 Turing Awardees Republished Key Methods and Ideas Whose Creators They Failed to Credit. Technical Report IDSIA-23-23". IDSIA, Switzerland. Archived from the original on 16 Dec 2023. Retrieved 19 Dec 2023.
  2. ^ a b c Schmidhuber, J. (2015). "Deep Learning in Neural Networks: An Overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
  3. ^ a b c d e f g h Schmidhuber, Jürgen (2022). "Annotated History of Modern AI and Deep Learning". arXiv:2212.11279 [cs.NE].
  4. ^ LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). "Deep Learning" (PDF). Nature. 521 (7553): 436–444. Bibcode:2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442. S2CID 3074096.
  5. ^ Bengio, Yoshua; LeCun, Yann; Hinton, Geoffrey (2021). "Turing Lecture: Deep Learning for AI". Communications of the ACM. S2CID 3074096.
  6. ^ Ivakhnenko, A. G.; Lapa, V. G. (1967). Cybernetics and Forecasting Techniques. American Elsevier Publishing Co. ISBN 978-0-444-00020-0.
  7. ^ Ivakhnenko, A.G. (March 1970). "Heuristic self-organization in problems of engineering cybernetics". Automatica. 6 (2): 207–219. doi:10.1016/0005-1098(70)90092-0.
  8. ^ Ivakhnenko, Alexey (1971). "Polynomial theory of complex systems" (PDF). IEEE Transactions on Systems, Man, and Cybernetics. SMC-1 (4): 364–378. doi:10.1109/TSMC.1971.4308320. Archived (PDF) from the original on 2017-08-29. Retrieved 2019-11-05.
  9. ^ Robbins, H.; Monro, S. (1951). "A Stochastic Approximation Method". The Annals of Mathematical Statistics. 22 (3): 400. doi:10.1214/aoms/1177729586.
  10. ^ Amari, Shun'ichi (1967). "A theory of adaptive pattern classifier". IEEE Transactions. EC (16): 279–307.
  11. ^ Fukushima, K. (1969). "Visual feature extraction by a multilayered network of analog threshold elements". IEEE Transactions on Systems Science and Cybernetics. 5 (4): 322–333. doi:10.1109/TSSC.1969.300225.
  12. ^ Sonoda, Sho; Murata, Noboru (2017). "Neural network with unbounded activation functions is universal approximator". Applied and Computational Harmonic Analysis. 43 (2): 233–268. arXiv:1505.03654. doi:10.1016/j.acha.2015.12.005. S2CID 12149203.
  13. ^ Ramachandran, Prajit; Barret, Zoph; Quoc, V. Le (October 16, 2017). "Searching for Activation Functions". arXiv:1710.05941 [cs.NE].
  14. ^ Minsky, Marvin; Papert, Seymour (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press. ISBN 978-0-262-63022-1.
  15. ^ Fukushima, K. (1979). "Neural network model for a mechanism of pattern recognition unaffected by shift in position—Neocognitron". Trans. IECE (in Japanese). J62-A (10): 658–665.
  16. ^ Fukushima, K. (1980). "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position". Biol. Cybern. 36 (4): 193–202. doi:10.1007/bf00344251. PMID 7370364. S2CID 206775608.
  17. ^ Mansfield Merriman, "A List of Writings Relating to the Method of Least Squares"
  18. ^ Stigler, Stephen M. (1981). "Gauss and the Invention of Least Squares". Ann. Stat. 9 (3): 465–474. doi:10.1214/aos/1176345451.
  19. ^ Bretscher, Otto (1995). Linear Algebra With Applications (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
  20. ^ Stigler, Stephen M. (1986). The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge: Harvard. ISBN 0-674-40340-1.
  21. ^ Haykin, Simon (2008). Neural Networks and Learning Machines (3rd ed.).
  22. ^ Rosenblatt, F. (1958). "The Perceptron: A Probabilistic Model For Information Storage And Organization in the Brain". Psychological Review. 65 (6): 386–408. CiteSeerX 10.1.1.588.3775. doi:10.1037/h0042519. PMID 13602029. S2CID 12781225.
  23. ^ Werbos, P.J. (1975). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences.
  24. ^ Rosenblatt, Frank (1957). "The Perceptron—a perceiving and recognizing automaton". Report 85-460-1. Cornell Aeronautical Laboratory.
  25. ^ Olazaran, Mikel (1996). "A Sociological Study of the Official History of the Perceptrons Controversy". Social Studies of Science. 26 (3): 611–659. doi:10.1177/030631296026003005. JSTOR 285702. S2CID 16786738.
  26. ^ a b Joseph, R. D. (1960). Contributions to Perceptron Theory, Cornell Aeronautical Laboratory Report No. VG-11 96--G-7, Buffalo.
  27. ^ Farley, B.G.; W.A. Clark (1954). "Simulation of Self-Organizing Systems by Digital Computer". IRE Transactions on Information Theory. 4 (4): 76–84. doi:10.1109/TIT.1954.1057468.
  28. ^ Rochester, N.; J.H. Holland; L.H. Habit; W.L. Duda (1956). "Tests on a cell assembly theory of the action of the brain, using a large digital computer". IRE Transactions on Information Theory. 2 (3): 80–93. doi:10.1109/TIT.1956.1056810.
  29. ^ Russell, Stuart; Norvig, Peter (2010). Artificial Intelligence: A Modern Approach (PDF) (3rd ed.). United States of America: Pearson Education. pp. 16–28. ISBN 978-0-13-604259-4.
  30. ^ Rosenblatt, Frank (1962). Principles of Neurodynamics. Spartan, New York.