Guangning Yu

guangningyu.com

Guangning Yu

37 days ago

N Ways to Implement a 360° Panorama

mp.weixin.qq.com

Guangning Yu

40 days ago

Building Streaming ETL at NetEase Games on Flink

mp.weixin.qq.com

Guangning Yu

68 days ago

Raising a Child Is a Game the Whole Family Bands Together to Play Against the World

mp.weixin.qq.com

Guangning Yu

75 days ago

Written Before the World Is Reshaped

mp.weixin.qq.com

Guangning Yu

99 days ago

Myanmar in Turmoil: The Real Story of Aung San Suu Kyi

mp.weixin.qq.com

Guangning Yu

107 days ago

After 18 Years, China Connects the Northeast Once Again!

mp.weixin.qq.com

Guangning Yu

107 days ago

The People Who Play Games in Sanhe

chuapp.com

Guangning Yu

163 days ago

Front-End Developers Are Being Forced to Become Full-Stack Developers

mp.weixin.qq.com

Guangning Yu

174 days ago

Interview | Anthropologist Xiang Biao on Involution: A Competition That Allows Neither Failure nor Exit

m.thepaper.cn

Guangning Yu

212 days ago

Parallelising Python with Threading and Multiprocessing

quantstart.com

Guangning Yu

257 days ago

How WebAssembly Evolved into the Browser's "Second Programming Language"

mp.weixin.qq.com

Guangning Yu

283 days ago

A Beginner’s Guide to Developing an Addon for World of Warcraft Classic

jimhribar.com

Guangning Yu

366 days ago

Introducing Pandas UDF for PySpark

databricks.com

Guangning Yu

429 days ago

Serverless web application

docs.microsoft.com

Guangning Yu

430 days ago

Why you should (almost) never use an absolute path to your APIs again

freecodecamp.org

Guangning Yu

430 days ago

Should Your Browser Make Client-Side Web API Calls?

dzone.com

Guangning Yu

489 days ago

While the name “DataOps” implies that it borrows most heavily from DevOps, it is all three of these methodologies — Agile, DevOps and statistical process control — that comprise the intellectual heritage of DataOps.

DataOps is NOT Just DevOps for Data

medium.com

Guangning Yu

499 days ago

Git Internals Revealed: From File Changes to Code Storage, How Git Actually Works

mp.weixin.qq.com

Guangning Yu

506 days ago

Exclusive Analysis: Alibaba Discloses the Architecture of Its In-House Apsara Big Data Platform for the First Time

infoq.cn

Guangning Yu

507 days ago

20 Predictions about Software Development trends in 2020

towardsdatascience.com

Guangning Yu

512 days ago

How to Secure Data in the Cloud? A Detailed Look at Cloud-Native Full-Link Encryption

mp.weixin.qq.com

Guangning Yu

525 days ago

Getting Started with K8s from Scratch | A Deep Dive into Linux Containers

mp.weixin.qq.com

Guangning Yu

527 days ago

Deployment Strategies Defined

blog.itaysk.com

Guangning Yu

768 days ago

Defining a Distinguished Engineer

blog.jessfraz.com

Guangning Yu

792 days ago

Organizations need people who can talk to both people and machines, and they need people in their upper echelons who specialize in talking to machines.

Why AI Underperforms and What Companies Can Do About It

hbr.org

Guangning Yu

837 days ago

Overall, we found that Giraph was better able to handle production-scale workloads, while GraphX offered a number of features that enabled easier development.

A comparison of state-of-the-art graph processing systems

engineering.fb.com

Guangning Yu

1084 days ago

Ethereum for web developers

medium.com

Guangning Yu

1262 days ago

Partitioning Large Tables

gpdb.docs.pivotal.io

Guangning Yu

1273 days ago

Big Data 101 – The Rise and Fall of Greenplum

ness.com

Guangning Yu

1273 days ago

Big Data 101: Massively Parallel Processing

ness.com

Guangning Yu

1374 days ago

Singular Value Decomposition Part 1: Perspectives on Linear Algebra

jeremykun.com

Guangning Yu

1402 days ago

Complete Guide to Parameter Tuning in XGBoost with codes in Python

analyticsvidhya.com

Guangning Yu

1406 days ago

Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python

analyticsvidhya.com

Guangning Yu

1444 days ago

The two reasons to use XGBoost are also the two goals of the project:
1. Execution Speed.
2. Model Performance.

A Gentle Introduction to XGBoost for Applied Machine Learning

machinelearningmastery.com

Guangning Yu

1460 days ago

Our serverless AI application uses two kinds of technology. First, it uses the object storage and database services offered by the public cloud, collectively referred to as BaaS (Backend as a Service). Second, it uses the Lambda framework, referred to as FaaS (Functions as a Service).

Using BaaS and FaaS is the defining characteristic of serverless applications; an application that meets both of these criteria can be called a serverless application.

Serverless: The Future of Small Backend Programs

geek.csdn.net
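
To make the BaaS + FaaS split above concrete, here is a minimal sketch assuming an AWS Lambda-style Python function and a hypothetical S3 bucket name: the function itself is the FaaS part, and the managed object store it writes to plays the BaaS role.

```python
# Hypothetical AWS Lambda-style handler (FaaS); S3 object storage plays the BaaS role.
import json

import boto3

s3 = boto3.client("s3")          # managed backend service consumed by the function
BUCKET = "my-ai-app-bucket"      # hypothetical bucket name

def handler(event, context):
    """Runs per request; no server is provisioned or managed by the application."""
    payload = json.loads(event.get("body", "{}"))
    key = "results/{}.json".format(payload.get("id", "unknown"))
    # Persist the result to the BaaS object store.
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload))
    return {"statusCode": 200, "body": json.dumps({"stored": key})}
```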

Guangning Yu

1463 days ago

The stochastic gradient descent method and its variants are algorithms of choice for many Deep Learning tasks. These methods operate in a small-batch regime wherein a fraction of the training data, usually 32–512 data points, is sampled to compute an approximation to the gradient. It has been observed in practice that when using a larger batch there is a significant degradation in the quality of the model, as measured by its ability to generalize.

Tradeoff batch size vs. number of iterations to train a neural network

stats.stackexchange.com
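
A toy NumPy sketch of the small-batch regime the abstract describes: each step samples a mini-batch (128 points here, inside the quoted 32–512 range) to approximate the full gradient; batch_size is the knob whose enlargement the quote associates with worse generalization. The dataset and hyperparameters are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))                    # toy regression data
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=10_000)

w = np.zeros(20)
batch_size, lr = 128, 0.01                           # 128 sits inside the quoted 32-512 range

for step in range(1_000):
    idx = rng.integers(0, len(X), size=batch_size)   # sample a small batch
    Xb, yb = X[idx], y[idx]
    grad = 2.0 / batch_size * Xb.T @ (Xb @ w - yb)   # gradient estimated from the batch only
    w -= lr * grad                                   # SGD update
```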

Guangning Yu

1470 days ago

If the sequence_length vector is provided, dynamic calculation is performed. This method of calculation does not compute the RNN steps past the maximum sequence length of the minibatch (thus saving computational time).

Unclear documentation on dynamic_rnn vs rnn for efficient dynamic sequence length computation · Issue #3801 · tensorflow/tensorflow

github.com
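
A minimal TensorFlow 1.x-era sketch of the behavior under discussion, with illustrative shapes: passing sequence_length to tf.nn.dynamic_rnn lets the implementation skip work beyond each example's true length, which is the time-saving point the quoted documentation makes.

```python
import numpy as np
import tensorflow as tf   # TensorFlow 1.x-style API (tf.compat.v1 under TF 2.x)

batch, max_len, feat = 4, 10, 8
inputs = tf.placeholder(tf.float32, [batch, max_len, feat])
seq_len = tf.placeholder(tf.int32, [batch])      # true length of each example

cell = tf.nn.rnn_cell.BasicLSTMCell(16)
# With sequence_length supplied, outputs past each example's length are zeroed
# and the returned state is the one at that length, saving computation.
outputs, state = tf.nn.dynamic_rnn(cell, inputs, sequence_length=seq_len,
                                   dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(outputs, {inputs: np.zeros((batch, max_len, feat), np.float32),
                             seq_len: [3, 10, 5, 7]})
```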

Guangning Yu

1472 days ago

Recurrent neural networks (RNNs) offer several advantages for our product.

1. Most prominently, they operate directly on sequences of data and thus are a perfect fit for modeling consumer histories.
2. Time-intensive human feature engineering is no longer required. In general, learning from raw data can help to avoid limitations when placing too much confidence in human domain modeling.
3. Furthermore, demand for explaining the predictions of machine learning models is increasing strongly. RNNs can be helpful in providing explanations as they make it easy to directly relate event sequences to predicted probabilities.

Deep Learning in Production for Predicting Consumer Behavior

tech.zalando.com
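
A hedged Keras sketch of the kind of model the excerpt points at (not Zalando's actual architecture): an LSTM that consumes raw, padded sequences of event IDs from consumer histories and outputs a predicted probability, with no hand-engineered features. Sizes and data are invented.

```python
import numpy as np
from tensorflow.keras import layers, models

n_event_types, max_history = 1000, 50            # illustrative sizes

model = models.Sequential([
    layers.Embedding(input_dim=n_event_types, output_dim=32,
                     mask_zero=True),            # event ID 0 is reserved for padding
    layers.LSTM(64),                             # consumes the raw event sequence directly
    layers.Dense(1, activation="sigmoid"),       # predicted probability (e.g. of a purchase)
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Toy padded histories: one row per consumer, one event ID per time step.
histories = np.random.randint(1, n_event_types, size=(256, max_history))
labels = np.random.randint(0, 2, size=256)
model.fit(histories, labels, epochs=1, batch_size=32, verbose=0)
```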

Guangning Yu

1482 days ago

In which vertical domains does Alibaba's speech technology currently have relatively mature applications?
1. The quality-inspection cloud system released together with Kunshi: once speech recognition turns calls into text, fraud scripts become much easier to detect.
2. Generating subtitles for live streams.
3. Broader quality-inspection services, such as helping short-term rental partners monitor whether users on both sides are exchanging contact information.
4. Courtroom transcription.

Interview | Dr. Chu Min and Dr. Chen Yining of Alibaba iDST: How to Break the Vicious Circle of Getting Speech Technology into Real-World Use

dataunion.org

Guangning Yu

1483 days ago

CRISPR is the most powerful genetic engineering tool ever created.

But CRISPR only allows us to modify one gene at a time, one organism at a time. To make species-level changes, CRISPR must be amplified by another powerful phenomenon: gene drive.

Gene drive is any mechanism that makes a gene particularly “selfish” in that it increases the probability that that particular gene will be inherited above 50%, regardless of any selection pressure.

Hacking DNA: The Story of CRISPR, Ken Thompson, and the Gene Drive

blog.ycombinator.com
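
A toy Python calculation (mine, not the article's) of the "above 50%" point: if heterozygous carriers transmit the drive allele with probability d > 0.5 instead of the Mendelian 0.5, its frequency ratchets upward each generation of random mating.

```python
def next_freq(p, d=0.9):
    """Drive-allele frequency after one generation of random mating.
    Homozygous carriers always transmit the allele; heterozygotes transmit it
    with probability d (0.5 under ordinary Mendelian inheritance, >0.5 under a drive)."""
    return p ** 2 + 2 * p * (1 - p) * d

p = 0.01                            # edited allele starts in 1% of the gene pool
for gen in range(1, 26):
    p = next_freq(p, d=0.9)
    if gen % 5 == 0:
        print("generation {:2d}: frequency {:.3f}".format(gen, p))
# With d=0.5 the same recursion leaves p unchanged, which is why a drive is needed
# to push an edit through a whole population.
```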

Guangning Yu

1483 days ago

i.e. "TDM"

What is a term-document matrix?

quora.com

Guangning Yu

1484 days ago

Why did Deep Learning only take off in the 2010s and not earlier?
1. Some important discoveries in the 2000s made training deep neural nets feasible.
2. Computing power and the amount of data required to train deep neural nets was not available until recently.

Artificial Intelligence

quora.com

Guangning Yu

1484 days ago

Natural language understanding is hard. Natural language processing is now done with data-driven methods, and it boils down to five fundamental problems: classification, matching, translation, structure prediction, and Markov decision processes.
For a concrete problem, once you have data you can run the AI closed loop and keep improving the system's performance and the algorithms' capability.
Deep learning does well on the first four of these five tasks, especially seq2seq translation and speech recognition. Single-turn dialogue is also getting better and better, but multi-turn dialogue still needs research to be solved.

Li Hang of Huawei: NLP Has 5 Fundamental Problems, and Deep Learning Does 4 of Them Well (PPT) | Peking University AI Open Course

mp.weixin.qq.com

Guangning Yu

1484 days ago

Stemming = heuristically removing the affixes of a word, to get its stem (root).

Lemmatization = morphological analysis of a word that returns its lemma, which is a normalized form of a set of morphologically related forms, chosen by convention (nominative singular for nouns, infinitive for verbs, etc.) to represent that set. This is the form in which a word appears in the dictionary.

What is difference between stemming and lemmatization?

quora.com
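
A short NLTK sketch contrasting the two definitions (assumes the wordnet corpus can be downloaded; the example words are chosen only for illustration).

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)     # dictionary data the lemmatizer looks up
nltk.download("omw-1.4", quiet=True)

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word, pos in [("running", "v"), ("studies", "v"), ("feet", "n")]:
    print("{:8s} stem={:8s} lemma={}".format(
        word,
        stemmer.stem(word),                     # heuristic affix stripping
        lemmatizer.lemmatize(word, pos=pos)))   # dictionary (lemma) form
```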

Guangning Yu

1485 days ago

WordNet is a semantically-oriented dictionary of English, consisting of synonym sets — or synsets — and organized into a network.

2. Accessing Text Corpora and Lexical Resources

nltk.org
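
A small example of the synset-and-network structure described above, using NLTK's WordNet interface, which the cited chapter also uses (requires the wordnet corpus).

```python
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

# Synsets: sets of synonymous lemmas for a given sense of a word.
for syn in wn.synsets("car")[:3]:
    print(syn.name(), "->", syn.lemma_names())

# ...organized into a network: hypernyms are the "is-a" links between synsets.
print(wn.synset("car.n.01").hypernyms())
```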

Guangning Yu

1485 days ago

When positive and negative samples are both extremely scarce, use data synthesis; when negative samples are plentiful but positive samples are very scarce and the ratio is extremely lopsided, consider one-class classification; when both classes are plentiful and the ratio is not especially lopsided, consider sampling or re-weighting.

How to Deal with Data Imbalance in Machine Learning

dataunion.org
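
A hedged scikit-learn sketch of the third case in the excerpt (both classes plentiful, imbalance not extreme), handled here by re-weighting; the other two cases would instead reach for data synthesis (e.g. SMOTE) or one-class methods. The dataset is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic, moderately imbalanced data: roughly 10% positives, but both classes plentiful.
X, y = make_classification(n_samples=20_000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Re-weighting: mistakes on the rare class are penalized proportionally more.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```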

Guangning Yu

1485 days ago

I think the real focus behind this debate runs deeper than the relationship between theory and application: it reveals that different scholars hold entirely different methodologies for studying intelligence (vision included). The argument rests on a subtle but very important difference in assumptions, namely whether achieving intelligence "requires" and "can possibly" reduce a complex system to a rigorous, easily understood description the way physics does.

How should we evaluate UCLA Professor Song-Chun Zhu's criticism of deep learning in his recent interview?

dataunion.org

Guangning Yu

1485 days ago

Negative sampling is one of the ways of addressing this problem: just select a couple of contexts c1 at random. The end result is that if cat appears in the context of food, then the vector of food is more similar to the vector of cat (as measured by their dot product) than the vectors of several other randomly chosen words (e.g. democracy, greed, freddy), instead of all other words in the language.

What is negative sampling?

quora.com
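
A bare-bones NumPy sketch of the update described above, simplified to a single shared vector table (word2vec proper keeps separate input and output matrices): the target word is pulled toward its observed context word and pushed away from a few randomly chosen negatives instead of the whole vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["cat", "food", "democracy", "greed", "freddy"]
W = rng.normal(scale=0.1, size=(len(vocab), 50))   # one 50-d vector per word

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_step(target, pos_ctx, neg_ctxs, lr=0.1):
    """Pull target toward its observed context, push it away from sampled negatives."""
    for ctx, label in [(pos_ctx, 1.0)] + [(c, 0.0) for c in neg_ctxs]:
        t, c = vocab.index(target), vocab.index(ctx)
        v_t, v_c = W[t].copy(), W[c].copy()
        grad = sigmoid(v_t @ v_c) - label      # dot product as the similarity measure
        W[t] -= lr * grad * v_c
        W[c] -= lr * grad * v_t

# "cat" observed in the context of "food", contrasted with two random negatives.
negative_sampling_step("cat", "food", ["democracy", "greed"])
```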

Guangning Yu

1486 days ago

To summarize: you first reduce dimensionality to a small set of numbers.

Then, by randomly choosing numbers in this small set and decompressing them, you get a set of images similar to those in your training set.

How can deep learning networks generate images? - Quora

quora.com
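
A schematic Keras sketch of the two steps in the quote, assuming a hypothetical decoder whose matching encoder was already trained to compress real images (this decoder is untrained, so its outputs are noise): sample random points in the small latent space, then "decompress" them back into image space.

```python
import numpy as np
from tensorflow.keras import layers, models

latent_dim = 8                       # the "small set of numbers" from the quote

# Decoder only; its matching encoder would have been trained to compress real
# images into latent_dim numbers. This one is untrained, so outputs are noise.
decoder = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(latent_dim,)),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28)),
])

# Step two from the quote: choose random points in the small latent space
# and "decompress" them back into image space.
z = np.random.normal(size=(16, latent_dim))
generated = decoder.predict(z, verbose=0)     # 16 images of shape 28x28
print(generated.shape)
```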

Guangning Yu

1486 days ago

A word embedding is a representation of a word, just like an oil painting might be a representation of a sunflower. Word embeddings use numbers to represent words. For a neural network like word2vec, they may use 300-500 numbers. Each of those numbers is in a dimension, and each locates a word in, say, 300-dimensional space.

What is the definition of word embedding (word representation)?

quora.com
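
A tiny NumPy illustration of the idea above: each word gets a point in a 300-dimensional space, and similarity is measured between those points. The vectors here are random placeholders; word2vec would learn them from text so that related words end up close together.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"sunflower": 0, "rose": 1, "tractor": 2}
E = rng.normal(size=(len(vocab), 300))       # one 300-dimensional vector per word

def vector(word):
    return E[vocab[word]]                    # embedding lookup: word -> 300 numbers

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# With trained embeddings, "sunflower" would sit closer to "rose" than to "tractor";
# with these random vectors both similarities hover near zero.
print(cosine(vector("sunflower"), vector("rose")))
print(cosine(vector("sunflower"), vector("tractor")))
```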
