[{"categories":["tech"],"content":" Overview # In functional programming, we deal with exceptions and errors. Arrow is an FP (functional programming) library for Kotlin, which aims to provide interfaces and abstractions across Kotlin libraries.\n# ","date":"2020-07-29T15:06:14+02:00","lastmod":"2020-07-29T15:06:14+02:00","permalink":"https://www.xiaoli-yang.com/engineering/languages/kotlin-arrow/","summary":"This is an introduction to leveraging Arrow to handle exceptions and errors in functional programming.","tags":["Kotlin","Functional Programming","Error Handling","Arrow","Either","Exception"],"title":"Using Arrow to Handle Errors and Exceptions in Kotlin","topics":["Programming Languages"]},{"categories":["tech"],"content":"Shell programming is an essential skill to master if you want to step further into the Linux/Unix world. Being able to program in the shell will speed up your operations work, because in daily operations you cannot avoid searching and analyzing logs, periodically running and processing other system scripts, and so on. I know it can get boring quickly, which is why I want to write down these tips.\nBash Shell Script Basics # We usually call it the Bash shell, because Bash is the most widely used shell interpreter. It is available on various operating systems and is the default command interpreter on most GNU/Linux systems. For more information and a definition, see: Bash Scripting Tutorial for Beginners\nIn this post, I will briefly introduce 10 tips for writing Bash scripts.\n1. Variable # Operation # String # 3. Loop # 3. Conditional Statement # 4. Function # 5. 
Search # Built-in Command lines # regular expression # Cronjob # ","date":"2020-03-01T09:57:55+01:00","lastmod":"2020-03-01T09:57:55+01:00","permalink":"https://www.xiaoli-yang.com/engineering/tools/bash-operations/","summary":"Shell programming is an essential skill to master if you want to step further into the Linux/Unix world.","tags":["shell","MAINT","SDE"],"title":"10 Basic Tips on Bash Scripting to Optimize Your Operation Work","topics":["Developer Tools"]},{"categories":["tech"],"content":" Eco-system Hadoop # Basic Concept # How much do you know about RedHat and Apache? I know, these are straightforward google-it questions. however\nHadoop ecosystem\nHDFS # What is the mechanism? Can we change a part of it? [structure] Namenode, Datanode\nThe namenode contains the metadata, which is important, so usually it has at least\nsequentialized\nWhat is a block? In Windows, the block is x KB\nRDD # ","date":"2019-08-18T16:31:00+02:00","lastmod":"2019-08-18T16:31:00+02:00","permalink":"https://www.xiaoli-yang.com/engineering/data/spark-hadoop-qa/","summary":"This series of blog posts contains some essential questions about Spark and Hadoop that are frequently asked in interviews. These Q\u0026A will also help you better understand the big data framework and ecosystem, and especially how to apply them better","tags":["bigdata","framework"],"title":"Spark Hadoop: the essential questions you may need to know","topics":["Data Engineering"]},{"categories":["tech"],"content":" Introduction # Git is more than a version control tool; it glues teamwork together. It is used by almost all developers nowadays. Basic usage can be found in git-cheatsheet\nHowever, some scenarios require more advanced skills. In this post, I will present some usages that can improve our efficiency.\n1. 
git stash: temporarily store a dirty working tree # Scenario: Interrupted workflow When you are in the middle of coding, your boss comes in and demands that you fix something immediately. Traditionally, you would make a commit to a temporary branch to store your changes away, and return to your original branch to make the emergency fix, like this:\n$ git commit -a -m \u0026quot;WIP\u0026quot; $ git checkout bugfix/branch $ ...edit emergency fix $ git commit -a -m \u0026quot;Fix in a hurry\u0026quot; $ git checkout my_original_branch # ... continue hacking ... You can use git stash to simplify the above, like this:\n# ...doing your work until your boss comes... $ git stash $ edit emergency fix (in this branch or another branch) $ git commit -a -m \u0026quot;Fix in a hurry\u0026quot; $ git stash pop # ... continue hacking ... more reference\n2. Retrieve committed code # You committed code but forgot to push it to the remote repo. After checking out another branch and switching back to the previous one, you find that all the changes are gone. How can you retrieve the code? This method may apply to other scenarios too.\n# check the commit info and the abbreviated commit hash $ git reflog # list dangling commits with their full hashes (reflogs ignored); copy the hash $ git fsck --cache --no-reflogs # retrieve $ git reset \u0026lt;commit-hash\u0026gt; 3. Completely overwrite a branch # The production branch regularly takes a complete copy of the latest version of the dev branch, replacing the old version (not merging) to avoid conflicts.\ngit checkout -B production_branch dev_branch 4. git fetch vs. git pull # In most cases:\ngit pull = git fetch + git merge\ngit fetch: When no remote is specified, by default the origin remote will be used, unless there’s an upstream branch configured for the current branch. 
You can retrieve new updates (for example, branches newly created in the remote repo).\ngit pull: Fetch from and integrate with another repository or a local branch\nReference # https://docs.microsoft.com/en-us/biztalk/technical-guides/planning-the-development-testing-staging-and-production-environments https://blogs.technet.microsoft.com/devops/2016/06/21/a-git-workflow-for-continuous-delivery/ https://nvie.com/posts/a-successful-git-branching-model/ ","date":"2019-06-28T09:57:55+01:00","lastmod":"2019-06-28T09:57:55+01:00","permalink":"https://www.xiaoli-yang.com/engineering/tools/advanced-git/","summary":"Git is widely used. Some scenarios require more advanced skills. In this post, I will present some usages that can improve our efficiency.","tags":["Git","SDE"],"title":"Advanced Git Usage","topics":["Developer Tools"]},{"categories":["tech"],"content":"Hyperledger Fabric is not as stable and robust as we thought. You will often run into version conflicts, disappearing Docker images, invalid links, unclear descriptions, and shabby documentation. As one blogger said:\nI have faced “n” number of errors while I was working in Blockchain project, I have spent countless hours reading out articles, reaching out experts to fix the same. This article is to just help the people who are beginners \u0026amp; struggling to solve the errors that are quite often pop up during your development stage. So I\u0026rsquo;d love to share my experience in debugging.\nError No.1 # DEBU 003 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = \u0026#34;transport: Error while dialing dial tcp: lookup orderer.example.com on 127.0.0.11:53: no such host\u0026#34;; Reconnecting to {orderer.example.com:7050 \u0026lt;nil\u0026gt;} Reason: In most cases, the orderer failed to start. 
# Use docker ps to take a look (note that no orderer container is listed):\nCONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES d0b84b7aa41e hyperledger/fabric-tools:latest \u0026#34;/bin/bash\u0026#34; 31 seconds ago Up 29 seconds cli 13d174121481 hyperledger/fabric-peer:latest \u0026#34;peer node start\u0026#34; 36 seconds ago Up 32 seconds 0.0.0.0:9051-\u0026gt;7051/tcp, 0.0.0.0:9053-\u0026gt;7053/tcp peer0.Distributor.example.com e959d8a0b974 hyperledger/fabric-peer:latest \u0026#34;peer node start\u0026#34; 36 seconds ago Up 31 seconds 0.0.0.0:6051-\u0026gt;7051/tcp, 0.0.0.0:6053-\u0026gt;7053/tcp peer0.Retailer.example.com aa0ababa9028 hyperledger/fabric-peer:latest \u0026#34;peer node start\u0026#34; 36 seconds ago Up 33 seconds 0.0.0.0:7051-\u0026gt;7051/tcp, 0.0.0.0:7053-\u0026gt;7053/tcp peer0.Operator.example.com ba5fa6d6c8ff hyperledger/fabric-peer:latest \u0026#34;peer node start\u0026#34; 36 seconds ago Up 30 seconds 0.0.0.0:8051-\u0026gt;7051/tcp, 0.0.0.0:8053-\u0026gt;7053/tcp peer0.Supplier.example.com Solution 1: # Check the location info written in docker-compose-base.yml and in any other file containing the relevant information:\nvolumes: - ../channel-artifacts/genesis.block:/var/hyperledger/orderer/orderer.genesis.block - ../crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/msp:/var/hyperledger/orderer/msp - ../crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/tls/:/var/hyperledger/orderer/tls For paths that point to a file, we should add quotation marks around the entry:\n- \u0026#34;../channel-artifacts/genesis.block:/var/hyperledger/orderer/orderer.genesis.block\u0026#34; Solution 2: # If the problem is still unsolved, we should read the logs to find the cause. Here\u0026rsquo;s a way to find the error in the logs. We try to start the orderer on its own by creating a new config file docker-compose-cli-test.yaml with the content:\n# Copyright IBM Corp. All Rights Reserved. 
# # SPDX-License-Identifier: Apache-2.0 # version: \u0026#39;2\u0026#39; services: orderer.example.com: extends: file: base/docker-compose-base.yaml service: orderer.example.com container_name: orderer.example.com and run the corresponding command\ndocker-compose -f docker-compose-cli-test.yaml up Well, I found yet another bug; it\u0026rsquo;s like recursion\u0026hellip; So I have to find what\norderer.example.com | Kafka.Topic.ReplicationFactor = 3 orderer.example.com | Debug.BroadcastTraceDir = \u0026#34;\u0026#34; orderer.example.com | Debug.DeliverTraceDir = \u0026#34;\u0026#34; orderer.example.com | 2018-05-31 19:40:14.823 UTC [orderer/common/server] initializeServerConfig -\u0026gt; INFO 003 Starting orderer with TLS enabled orderer.example.com | 2018-05-31 19:40:14.850 UTC [fsblkstorage] newBlockfileMgr -\u0026gt; INFO 004 Getting block information from block storage orderer.example.com | 2018-05-31 19:40:14.882 UTC [orderer/commmon/multichannel] newLedgerResources -\u0026gt; PANI 005 Error creating channelconfig bundle: initializing channelconfig failed: could not create channel Orderer sub-group config: setting up the MSP manager failed: the supplied identity is not valid: x509: certificate signed by unknown authority (possibly because of \u0026#34;x509: ECDSA verification failure\u0026#34; while trying to verify candidate authority certificate \u0026#34;ca.example.com\u0026#34;) orderer.example.com | panic: Error creating channelconfig bundle: initializing channelconfig failed: could not create channel Orderer sub-group config: setting up the MSP manager failed: the supplied identity is not valid: x509: certificate signed by unknown authority (possibly because of \u0026#34;x509: ECDSA verification failure\u0026#34; while trying to verify candidate authority certificate \u0026#34;ca.example.com\u0026#34;) orderer.example.com | What worked for me is the following method:\n# remove previous crypto material and config transactions rm -fr channel-artifacts/* rm 
-fr crypto-config/* Error No.2: channel creation failed # Error: got unexpected status: BAD_REQUEST -- Attempted to include a member which is not in the consortium Reason: the channel name was not set. # Solution: # export CHANNEL_NAME=mychannel peer channel create -o orderer.example.com:7050 -c $CHANNEL_NAME -f ./channel-artifacts/channel.tx --tls --cafile /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/ordererOrganizations/example.com/orderers/orderer.example.com/msp/tlscacerts/tlsca.example.com-cert.pem ","date":"2018-06-21T16:22:23+02:00","lastmod":"2018-06-21T16:22:23+02:00","permalink":"https://www.xiaoli-yang.com/engineering/blockchain/hyperledger-errors/","summary":"I'm an avid open-source supporter and have always thought that open-source projects outweigh most similar projects. Only recently, after encountering countless errors and problems with Hyperledger Fabric, did I start to think about the limitations of open-source software","tags":["hyperledger-fabric","blockchain","debug"],"title":"Errors and Solutions with Hyperledger Fabric","topics":["Blockchain"]},{"categories":["tech"],"content":" Find out what type of CPU your computer has # Use the command lscpu\nsherry@sherry:~/Documents/Blog/public$ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 2 Core(s) per socket: 2 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 78 Model name: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz Stepping: 3 CPU MHz: 1291.148 CPU max MHz: 2800,0000 CPU min MHz: 400,0000 BogoMIPS: 4800.00 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 3072K NUMA node0 CPU(s): 0-3 ARM vs. x86 # Types ARM x86 ISA RISC CISC Whenever we mention ARM vs. x86, it sounds like a fight between Apple and Intel+AMD. ARM is one type of CPU; the name stands for Advanced RISC Machine. 
More commonly used x86 processors made by Intel and AMD use CISC.\nCPU architecture types # An instruction set architecture (ISA) describes a computer architecture in terms of the set of instructions it can execute. We usually categorize ISAs into two types:\nComplex Instruction Set Computers (CISC)\nPowerful, complex instructions Instructions are variable in length (1 - n bytes) Reduced Instruction Set Computers (RISC)\nFixed instruction length Enables efficient pipelining \u0026amp; high clock frequencies Clear distinction between data loading/storing and manipulation Little Endian vs. Big Endian # Big Endian Byte Order: The most significant byte (the \u0026ldquo;big end\u0026rdquo;) of the data is placed at the byte with the lowest address. The rest of the data is placed in order in the next three bytes in memory.\nLittle Endian Byte Order: The least significant byte (the \u0026ldquo;little end\u0026rdquo;) of the data is placed at the byte with the lowest address. The rest of the data is placed in order in the next three bytes in memory.\nGlossary # Bandwidth (B): Maximum data rate that can be sustained during a transfer\nMessage Passing Interface (MPI): A community-driven specification for the message-passing model.\nMoore’s law: The observation made by Gordon Moore that the number of transistors in an integrated circuit doubles roughly every two years.\nReference # https://www.maketecheasier.com/differences-between-arm-and-intel/ https://chortle.ccsu.edu/AssemblyTutorial/Chapter-15/ass15_3.html\n","date":"2018-05-18T09:57:55+01:00","lastmod":"2018-05-18T09:57:55+01:00","permalink":"https://www.xiaoli-yang.com/engineering/tools/understanding-cpus/","summary":"You may get confused by the questions about your CPUs, here's some frequently questions I","tags":["CPU","Hardware","linux"],"title":"Know about your CPU","topics":["Developer Tools"]},{"categories":["tech"],"content":" Foreword # Recently I joined a Hololens-Unity AR development 
project. Since the client requires the backend server to run on a Mac server, but not everyone in the backend team owns a Mac, we decided to use Docker as the middleware.\nWhat\u0026rsquo;s Docker? # https://docs.docker.com/get-started/#setup\nIn short # When to use Docker? # https://hub.docker.com/r/ibmcom/swift-ubuntu/ https://github.com/IBM-Swift/swift-ubuntu-docker\n","date":"2017-12-11T16:22:23+02:00","lastmod":"2017-12-11T16:22:23+02:00","permalink":"https://www.xiaoli-yang.com/engineering/languages/swift-on-linux/","summary":"How to test a Swift package on Linux using Docker.","tags":["techniques","backend","Docker","Linux"],"title":"Test swift via Docker on Ubuntu","topics":["Programming Languages"]},{"categories":["tech"],"content":" Preface # Block - A set of transactions that are bundled together and added to the chain at the same time.\nBlockchain Application - In a blockchain application, the blockchain will store the state of the system, in addition to the immutable record of transactions that created that state. A client application will be used to send transactions to the blockchain. The smart contracts will encode some (if not all) of the business logic.\nByzantine Fault Tolerance Algorithm - A consensus algorithm designed to defend against failures in the system caused by forged or malicious messages. In order to be fault tolerant of a Byzantine fault, the number of nodes that must reach consensus is 2f+1 in a system containing 3f+1 nodes, where f is the number of faults in the system.\nChannels - Data partitioning mechanisms that allow transaction visibility for stakeholders only. Each channel is an independent chain of transaction blocks containing only transactions for that particular channel.\nChaincode - Smart contracts in Hyperledger Fabric. 
They encapsulate both the asset definitions and the business logic (or transactions) for modifying those assets. Chaincode is programmable code, written in Go, and instantiated on a channel.\nConsensus Algorithm - Refers to a system of ensuring that parties agree to a certain state of the system as the true state.\nCouchDB\nCryptocurrency - A digital asset that is used as a medium of exchange. A cryptocurrency is exchanged by using digital signatures to transfer ownership from one cryptographic key pair to another key pair. Since this digital asset has characteristics of money (like store of value and medium of exchange), it is generally referred to as currency. Note: It should not be confused with digital currency or virtual currency.\nCryptography - The study of the techniques used to allow secure communication between different parties, and to ensure the authenticity and immutability of the data being communicated.\nDistributed Ledger - A type of data structure which resides across multiple computer devices, generally spread across locations and regions. You could also call it the system of record for a business, shared by all participants.\nHash Function - It is used to map data of any size to a fixed length. The output of a hash function is referred to as a hash, hash value, or digest. One important characteristic of a hash function is that, when given a specific input, the hash function will always produce the exact same output.\nKey/Value Pair - It consists of two parts, one designated as a \u0026lsquo;key\u0026rsquo;, and another as a \u0026lsquo;value\u0026rsquo;. The \u0026lsquo;key\u0026rsquo; is an identifier that allows you to look up the \u0026lsquo;value\u0026rsquo;. 
The \u0026lsquo;value\u0026rsquo; is the data that is stored for a given \u0026lsquo;key\u0026rsquo;.\nMining - The process of solving computationally challenging puzzles in order to create new blocks in the Bitcoin blockchain.\nNode - Computer device attached to a blockchain network. Types of nodes include: mining nodes, validator nodes, committer nodes, and endorser nodes. Nodes are sometimes also called \u0026lsquo;peers\u0026rsquo; because they make up the devices within a peer-to-peer network.\nPeer-to-Peer Network - A network which consists of computer systems that are directly connected to each other via the Internet without a central server.\nPrivate/Public Keys - Private keys are used to derive a public key. While private keys remain confidential, public keys are available to everyone in the network (similar to an email address). Anything encrypted with a public key can only be decrypted using its corresponding private key, and vice versa.\nProof of Elapsed Time (PoET) - Consensus algorithm used by Hyperledger Sawtooth that utilizes a lottery function in which the node with the shortest wait time creates the next block.\nProof of Stake (PoS) - Consensus algorithm where nodes are randomly selected to validate blocks, and the probability of this random selection depends on the amount of stake held.\nProof of Work (PoW) - Consensus algorithm first utilized by Bitcoin that involves solving a computationally challenging puzzle in order to create a new block.\nSmart Contract - Computer program that executes predefined actions when certain conditions within the system are met. Smart contracts were first proposed by Nick Szabo in 1996 (http://www.fon.hum.uva.nl/rob/Courses/InformationInSpeech/CDROM/Literature/LOTwinterschool2006/szabo.best.vwh.net/smart_contracts_2.html).\nState - Contains up-to-date data that represents the latest values for all keys included in the network\u0026rsquo;s ledger. 
The state of a network encompasses all past transactions in the network, from the genesis block to the present time.\nTransaction - A record of an event, cryptographically secured with a digital signature, that is verified, ordered, and bundled with other such records into blocks.\nTuring-Complete - Named after Alan Turing, an English mathematician and computer scientist, it refers to a computer that can solve any problem that a Turing Machine can. A Turing Machine is a machine that can simulate any computer algorithm, no matter how complicated. Bitcoin\u0026rsquo;s scripting language is not Turing-Complete, as it has no looping constructs. Ethereum\u0026rsquo;s Solidity language is considered Turing-Complete, as it supports looping as well as branching.\nReference # https://courses.edx.org/courses/course-v1:LinuxFoundationX+LFS171x+3T2017/6c82a18df6994ffca695cabb2a44ce72/ http://hyperledger-fabric.readthedocs.io/en/release/glossary.html https://github.com/ethereum/wiki/wiki/Glossary ","date":"2017-11-09T09:57:55+01:00","lastmod":"2017-11-09T09:57:55+01:00","permalink":"https://www.xiaoli-yang.com/engineering/blockchain/glossary/","summary":"","tags":["Blockchain","Concepts"],"title":"Blockchain glossary","topics":["Blockchain"]},{"categories":["tech"],"content":" Prologue # This time I was handed bare machines, so I configured everything from scratch. A cluster setup is actually much like a single-node one; see my earlier blog post.\nMachine configuration # Centos 5.8 4 cores 8G Node layout Masters\u0026amp;Slaves # Master 119.254.168.33 Slaves1 119.254.168.34 Slaves2 119.254.168.36 Slaves3 119.254.168.38 Environment setup # Java environment # See Apache Spark single-node installation and environment setup Scala environment # See Apache Spark single-node installation and environment setup SSH configuration # Background: setting up a Hadoop environment requires passwordless login. Passwordless login really means logging in via certificate-based authentication, using what is called public/private key (RSA) authentication for the ssh login. On Linux, ssh is the default tool for remote login; because its protocol uses RSA/DSA encryption algorithms, it is very safe for remote administration of Linux systems.\nHere, the ssh setup means logging in to servers over ssh without a password, which relies on the RSA algorithm. I will write up the details and principles when I have time.\nMake sure ssh is installed # ssh (Ubuntu): $ sudo apt-get update $ sudo apt-get install openssh-server $ sudo /etc/init.d/ssh start ssh (CentOS): confirm that SSH is already installed on the system. rpm -qa | grep openssh rpm -qa | grep rsync yum install ssh 
// install the SSH protocol yum install rsync // rsync is a remote data synchronization tool that can quickly sync files between multiple hosts over LAN/WAN service sshd restart // start the service 2. Generate and add the key: $ ssh-keygen -t rsa $ cat ~/.ssh/id_rsa.pub \u0026gt;\u0026gt; ~/.ssh/authorized_keys $ chmod 0600 ~/.ssh/authorized_keys service sshd restart // the service generally needs a restart after any change If you have already generated a key, just run the last two commands. Test ssh localhost\n$ ssh localhost $ exit Check whether the port is open:\nnetstat -anp |grep ssh Hadoop cluster Installation Basically the same as before.\nEdit hdfs-site.xml\n\u0026lt;configuration\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;dfs.namenode.secondary.http-address\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;kexinyun1:9001\u0026lt;/value\u0026gt; \u0026lt;/property\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;dfs.namenode.name.dir\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;file:///opt/hadoop-2.6.1/dfs/name\u0026lt;/value\u0026gt; \u0026lt;/property\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;dfs.datanode.data.dir\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;file:///opt/hadoop-2.6.1/dfs/data\u0026lt;/value\u0026gt; \u0026lt;/property\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;dfs.replication\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;3\u0026lt;/value\u0026gt; \u0026lt;/property\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;dfs.webhdfs.enabled\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;true\u0026lt;/value\u0026gt; \u0026lt;/property\u0026gt; \u0026lt;/configuration\u0026gt; mapred-site.xml\n\u0026lt;configuration\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;mapreduce.framework.name\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;yarn\u0026lt;/value\u0026gt; \u0026lt;/property\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;mapreduce.jobtracker.http.address\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;nameNode:50030\u0026lt;/value\u0026gt; \u0026lt;/property\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;mapreduce.jobhistory.address\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;nameNode:10020\u0026lt;/value\u0026gt; 
\u0026lt;/property\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;mapreduce.jobhistory.webapp.address\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;nameNode:19888\u0026lt;/value\u0026gt; \u0026lt;/property\u0026gt; \u0026lt;/configuration\u0026gt; YARN changes\n\u0026lt;configuration\u0026gt; \u0026lt;!-- Site specific YARN configuration properties --\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;yarn.nodemanager.aux-services\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;mapreduce_shuffle\u0026lt;/value\u0026gt; \u0026lt;/property\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;yarn.resourcemanager.address\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;kexinyun1:8032\u0026lt;/value\u0026gt; \u0026lt;/property\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;yarn.resourcemanager.scheduler.address\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;kexinyun1:8030\u0026lt;/value\u0026gt; \u0026lt;/property\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;yarn.resourcemanager.resource-tracker.address\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;kexinyun1:8031\u0026lt;/value\u0026gt; \u0026lt;/property\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;yarn.resourcemanager.admin.address\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;kexinyun1:8033\u0026lt;/value\u0026gt; \u0026lt;/property\u0026gt; \u0026lt;property\u0026gt; \u0026lt;name\u0026gt;yarn.resourcemanager.webapp.address\u0026lt;/name\u0026gt; \u0026lt;value\u0026gt;kexinyun1:8088\u0026lt;/value\u0026gt; \u0026lt;/property\u0026gt; \u0026lt;/configuration\u0026gt; slaves file\n119.254.168.38 //(slaves3) 119.254.168.36 //(slaves2) 119.254.168.34 //(slaves1) vi hadoop-env.sh export JAVA_HOME=your java home vi yarn-env.sh export JAVA_HOME=your java home Format (same as before) Start/stop\nCheck: jps\nAccess:\nhttp://ip:9001/\nSpark Cluster Installation\nBasically the same as the single-node setup. Configuration file section\nexport SCALA_HOME=/opt/scala-2.11.4 export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.101.x86_64/ export 
SPARK_HOME=/opt/spark export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop export SPARK_JAR=/opt/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar export SPARK_MASTER_IP=localhost export SPARK_MASTER_PORT=7077 export SPARK_WORKER_CORES=1 export SPARK_WORKER_INSTANCES=1 export SPARK_WORKER_MEMORY=1g Start\n$SPARK_HOME/sbin/start-all.sh problem # http://www.2cto.com/os/201209/155681.html\nenglish version\nhttp://pingax.com/install-hadoop2-6-0-on-ubuntu/\nReference # http://www.cnblogs.com/lanxuezaipiao/p/3525554.html http://pingax.com/install-hadoop2-6-0-on-ubuntu/ http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html http://blog.csdn.net/greensurfer/article/details/39450369\n","date":"2016-03-14T16:22:23+02:00","lastmod":"2016-03-14T16:22:23+02:00","permalink":"https://www.xiaoli-yang.com/engineering/data/spark-multi-node/","summary":"Spark - Multinodes - start from 0","tags":["bigdata","framework"],"title":"Spark Multi-Node Configuration","topics":["Data Engineering"]},{"categories":["tech"],"content":" Excerpt # The Desiderata (part 2)\nby Max Ehrmann, 1927\nIf you compare yourself with others,\nyou may become vain or bitter;\nfor always there will be greater and lesser persons than yourself.\nEnjoy your achievements as well as your plans.\nKeep interested in your own career, however humble;\nit is a real possession in the changing fortunes of time. –TBC\nInstallation options # Building from source: you need to clone the source from GitHub and build it with Maven or sbt. Readers can download it from the official website.\nIf you want to build it yourself, you need to set up the environment (see below) and then build with sbt (see below). Of course, if that seems like too much trouble, you can download a prebuilt version directly.\nPrebuilt version: the prebuilt version is much more convenient: spark-1.6.0-bin-hadoop2.6.tgz\nDownload it to the chosen location\nUnpack it\nRun the shell\ncd spark/bin\n./spark-shell\nThe shell output shows a lot of information, such as the OpenJDK version and the Scala version. You can also reach the web UI at http://localhost:4040/jobs/\nEnvironment setup # Spark supports multiple Hadoop distributions, whether Apache Hadoop, Cloudera's CDH, or Hortonworks, so which Hadoop version you use is not much of an issue.\nSpark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). So Windows works too, along with Unix-family machines: Mac and Linux (Ubuntu, Red Hat, CentOS, etc.).\nSpark runs on Java 7+, Python 2.6+ and R 3.1+. 
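Spark 1.6.0 uses Scala 2.10. So install a Scala 2.10.x version.\nInstalling the build tool - sbt # The Spark website recommends Maven or sbt for compiling Scala files. I personally recommend sbt, which is lightweight and simple, although you may easily run into problems during installation (GFW). sbt is also needed later when running Spark projects: you compile and package the project, then run it with spark-submit.\nOfficial download page: http://www.scala-sbt.org/\nRun the installer\nCreating a project # Create a new project folder. Suppose the project is named “sparksample”. Then\ncd sparksample mkdir project mkdir -p src/main/scala A typical project layout looks like this:\nproject - project definition files\nproject/build/.scala - the main project definition file\nproject/build.properties - project, sbt, and Scala version definitions\nsrc/main - your application code goes here; different subdirectory names indicate different programming languages (e.g. src/main/scala, src/main/java)\nsrc/main/resources - static files you want to add to the jar (e.g. logging configuration files)\nlib_managed - the jar files your project depends on; they are added to this directory when sbt updates\ntarget - the directory where the final generated files are stored (e.g. generated Thrift code, class files, jar files)\nWrite the build.sbt file\nname := \u0026quot;SparkSample\u0026quot; version := \u0026quot;1.0\u0026quot; scalaVersion := \u0026quot;2.10.3\u0026quot; libraryDependencies += \u0026quot;org.apache.spark\u0026quot; %% \u0026quot;spark-core\u0026quot; % \u0026quot;1.1.1\u0026quot; Pay attention to the versions used here, for example whether the Scala and Spark Streaming versions match.\nLook them up at: http://mvnrepository.com/artifact/org.apache.spark/spark-streaming_2.10/1.4.1\nBuild the jar package In the project directory (e.g. 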
“sparksample”)\nsbt package Submit it with spark-submit:\ncd /opt/spark-versionnumber/bin/ ./spark-submit --class \u0026quot;org.apache.spark.examples.streaming.sparksample\u0026quot; --packages org.apache.spark:spark-streaming-kafka_2.10:1.4.1 --master local[2] /home/sherry/sparksample/target/scala-2.10/sparksample-1.0.jar 10.81.52.88:9092 tintin For how to write the arguments, see the official documentation:\nhttp://spark.apache.org/docs/latest/submitting-applications.html#submitting-applications\nNote: one slightly annoying pitfall is that the packages you call must be added manually via --packages\nreference # My CSDN blog\nhttp://www.tuicool.com/articles/AJnIvq\nhttp://www.scala-sbt.org/release/docs/index.html\nhttp://www.supergloo.com/fieldnotes/apache-spark-cluster-part-2-deploy-a-scala-program-to-spark-cluster/\n","date":"2016-01-08T13:26:32+02:00","lastmod":"2016-01-08T13:26:32+02:00","permalink":"https://www.xiaoli-yang.com/chinese/apache-spark-setting/","summary":"","tags":["framework","bigdata"],"title":"Apache Spark Installation and Environment Setup","topics":null},{"categories":["tech"],"content":" Excerpt # Most blog lovers go through three phases. First, as beginners, we feel curious and excited to try all kinds of places where we can post our articles. Writing on a free platform is a good choice, even though the templates and layouts look monotonous. Later you may have requirements that the platform can no longer fulfil, so you buy a domain and set up your own website. 
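In the third phase, you are tired of maintaining a website and let someone else do it for you, so that you can focus only on the writing.\nSetting up your own blog takes some tinkering, but it is fun. Since technical articles involve a lot of English terms and commands, I will write those in English; overviews and reflections will be in Chinese. P.S. my old CSDN blog address\nSo there are several plans for building the new blog:\nShort term: basic features such as categories, tags, and contact information Medium term: comments, website analytics, RSS, image hosting, Gallery Long term: separate from the top-level domain, and split the blog into En and Ch, perhaps also De. The short-term part is more or less done; the tinkering continues. Adding comments with Disqus # Long ago, comments were a resource owned by the blogger, and not a single comment could be lost when migrating a blog; the data lived in the site's own data. With the rise of social networking in recent years, bloggers use third-party sites to host comments in order to encourage visitor interaction. Disqus is such a third-party social commenting platform. After signing up for Disqus you will find that it is really a social platform in its own right! So, after people comment on your blog, you can log in to the Disqus dashboard and see comments, recommendations, favorites, and so on.\nPersonal information # Profile: Full Name (a nickname, anything you like) Avatar: the avatar can be uploaded locally, given as an image URL, taken from the default avatar of the current site, or taken from Facebook or Twitter. The latter two, of course, require linking the corresponding social account. Services: the management page for third-party accounts, where you can link and unlink social accounts. Besides the three major social network accounts, Yahoo! is currently supported. Linked account names are public in the Disqus comment system. Account # Username: must be unique Email: used for verification. Getting started with Disqus # Sign up. The Site name you register here is the shortname we will use. After registering you get something like sherrysblog.disqus.com/, where sherrysblog is my shortname; put it into config.toml. In my account, at https://sherrysblog.disqus.com/admin/settings/general/, you can configure the site, the language, and so on. The information in the settings must match config.toml. Drawbacks # It may become a paid service No support for Chinese social accounts No private messaging. It may be blocked by the firewall Google Analytics website analytics # Google Analytics is actually a very powerful tool, but it is a bit scary to think about: commercial users can monitor the behavior of us ordinary users in real time. This is so-called big data %\u0026gt;_\u0026lt;%\nRegister a Google Analytics account Enter the location of your blog, tintinsnowy.com Paste the tracking code into your config.toml. Check visitor traffic through the Google Analytics dashboard. -- TBC written by Sherry photographed by Sherry. \u0026lsquo;Wiener Staatsoper\u0026rsquo;\n","date":"2015-12-27T13:26:32+02:00","lastmod":"2015-12-27T13:26:32+02:00","permalink":"https://www.xiaoli-yang.com/engineering/tools/blog-setup/","summary":"","tags":["techniques"],"title":"Blog Tool","topics":["Developer Tools"]},{"categories":["tech"],"content":" Hello GitHub! # GitHub is more than a collaborative tool to make your software development more effective. In fact, it is more like a community where you share and contribute your code.\nSince we want to build our own blog, the whole static website should be stored on GitHub. Now here we go. 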
Sign up on GitHub\nOkay! You are now a member of GitHub, enjoy your coding journey!\nBravo! Hugo # Hugo is a powerful and fast website engine, written in Go and a treat for Go lovers! Hugo flexibly works with many formats and is ideal for blogs, docs, portfolios and much more. Hugo’s speed fosters creativity and makes building a website fun again.\nIt helps you build your blog without having to know the details of CSS, JS, MySQL, etc. It saves you a lot of trouble, so you can concentrate on your writing and coding.\nUsage of Hugo # Install Hugo # Follow the quickstart\nBuild from the source\nDownload and unzip Move \u0026ldquo;hugo/\u0026rdquo; to \u0026ldquo;/usr/bin/hugo\u0026rdquo; Test hugo Relevant Commands # 1. Create a new website # $ hugo new site tintinsnowy.github.io If something is wrong with your path, use the absolute path, or configure the path of Hugo. Then you can see the tree:\n▸ archetypes/ ▸ content/ ▸ layouts/ ▸ static/ config.toml The content of the blog\n2. Create the content # Your articles should be stored under content/post. All the articles are in Markdown format.\n$ cd tintinsnowy.github.io $ hugo new about.md $ hugo new post/firstblog.md Then open firstblog.md and you can see the automatically generated front matter:\n+++ date = \u0026quot;2015-12-26T08:36:54-07:00\u0026quot; draft = true title = \u0026quot;firstblog\u0026quot; +++ 3. Find a joyful theme for your blog # There are a few open-source themes. Every theme has a detailed description of its usage, but NOT all the configurations are the same. You should customize the one you’ve chosen. For example,\n$ pwd \u0026gt;/home/sherry/code/hugo/tintinsnowy.github.io $ mkdir themes $ cd themes $ git clone https://github.com/spf13/hyde.git Now you can see that the theme is in your themes/ directory.\n4. Set configuration of your theme # This step is extremely important. Different themes are configured in different ways, so now it’s your time to design. 
But before you do, you should read the homepage of your theme.\nHugo can read your configuration as JSON, YAML or TOML. Hugo supports custom configuration parameters too. Refer to the theme that you’ve chosen for details.\n…….. You can see what it looks like on localhost (hugo-icarus is the name of the theme, change it to yours): open http://localhost:1313/\nsherry@sherry:~/code/hugo/tintinsnowy.github.io $ sudo /home/sherry/code/hugo/hugo server --buildDrafts --watch If you are satisfied with your blog on localhost, you want to display it on the web, right? Okay, do the following steps!\nCreate your blog’s repository on GitHub # Aha, I suppose that you are familiar with git. If not, there are a whole lot of tutorials on git and GitHub. So shall we start?\n1. Create a new repository # User \u0026amp; Organization Pages live in a special repository dedicated to GitHub Pages files. You will need to name this repository after the account name.\nYou must use the username.github.io naming scheme. Content from the master branch will be used to build and publish your GitHub Pages site. A repository like joe/bob.github.io will not build a User Pages site. See here. When User Pages are built, they are available at http(s)://\u0026lt;username\u0026gt;.github.io.\nFor example, my account name is tintinsnowy, so I have to name my repository tintinsnowy.github.io\n2. Build your static website # $ hugo --theme=hugo-icarus --baseUrl=\u0026quot;https://tintinsnowy.github.io/\u0026quot; Upload your website\n$ cd public $ git init $ git remote add origin https://github.com/tintinsnowy/tintinsnowy.github.io.git $ git add -A $ git commit -m \u0026quot;first commit\u0026quot; $ git push -u origin master See your website\nIn your GitHub repository settings, you can see this:\nYour site is published at https://tintinsnowy.github.io\nNow, it seems like the end of our project. 
Yet if you want your blog’s address to be more unique, you may buy a custom domain.\nBuy domain # A few suggestions:\ngodaddy aliyun Setting # Adding a CNAME file to your repository: In the file name field, type CNAME (with all caps!). Use tintinsnowy.com, not https://tintinsnowy.com. Note that there can only be one domain in the CNAME file. Setting your account CNAME @ tintinsnowy.github.io. A www 192.30.252.*** A www 192.30.252.*** Don’t forget the . after tintinsnowy.github.io If you want to know why it should be set like this, see the references.\nWait a while, then check: tintinsnowy.com Cheers!\nReference: # Hugo Setting up a custom domain with GitHub Pages Github Pages Setting cnfeat.com coderzh quantumman.me tonybai’s blog ","date":"2015-12-21T16:22:23+02:00","lastmod":"2015-12-21T16:22:23+02:00","permalink":"https://www.xiaoli-yang.com/engineering/tools/hugo-blog-setup/","summary":"Hugo is the world’s fastest framework for building (static) websites. For bloggers, it saves a lot of technical trouble.","tags":["techniques","frontend"],"title":"Set up a Blog based on Hugo","topics":["Developer Tools"]}]