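utsubo – Page 7

How to output debug prints in shiny

Posted by: utsubo Posted on: 2016-08-30 in R

A quick note, because I always forget this.

What is shiny?

shiny is a lightweight web framework for R. It is very convenient because it lets you present the results of R computations on the web with little effort.

Debugging in RStudio is probably the proper way, but if you start coding directly on the server and hit an incomprehensible error, you are really stuck.
A simple way to debug is described below.

Browser

In Chrome, select "More tools" > "Developer tools" from the menu.
Open the Console there, and server-side errors are displayed via JavaScript.
This is very handy.

Debug prints

To embed a debug print in the source code and check the contents of a variable: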

cat(file=stderr(),"debug=",value)

Write it in server.R like this. The output then goes to a file under /var/log/shiny-server.
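
As a convenience, the newest log under that directory can be located and tailed with a short script. This is only a sketch in Python: the helper name `tail_newest_log` is made up for illustration, and it assumes that the logs are `*.log` files sitting directly under the log directory and that you have permission to read them.

```python
import glob
import os


def tail_newest_log(log_dir="/var/log/shiny-server", lines=20):
    """Return the last `lines` lines of the most recently modified *.log file.

    Hypothetical helper: the *.log naming pattern is an assumption.
    Returns an empty list when no log file is found.
    """
    candidates = glob.glob(os.path.join(log_dir, "*.log"))
    if not candidates:
        return []
    newest = max(candidates, key=os.path.getmtime)
    with open(newest, "r", errors="replace") as f:
        return f.read().splitlines()[-lines:]


if __name__ == "__main__":
    for line in tail_newest_log():
        print(line)
```

Running it right after reproducing the error should show the `debug=` line near the end of the output, provided the assumptions above hold on your server.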

Summary

R is a scripting language, so it runs normally right up to the point where the error occurs. Placing a debug print just before that point is a good approach.

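Getting a dirt-cheap (under $3) Arduino Nano from eBay to work

Posted by: utsubo Posted on: 2016-08-27 in Arduino

Notes on how I somehow got a dirt-cheap Arduino Nano to work.

I have been buying a lot on eBay lately. Arduinos, for example, can be had for not just half but about one-tenth of what they cost domestically.

However, they use compatible chips, so you may need to install drivers and deal with various other hassles.

The Arduino Nano I bought this time was not straightforward, so these are my notes on how I dealt with it.

Environment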

PC: Mac OS X 10.11
IDE: Arduino IDE 1.6.11

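CH340G driver

The USB chip on this board is apparently a cheap compatible part, and the PC cannot recognize it unless a driver is installed separately. The procedure is written up in many places, so install the driver by following one of them.

IDE

After adding the driver, connect the Arduino Nano to the PC. I'm on a Mac, so the Arduino IDE sees the port as the device

/dev/cu.wchusbserial1420

so select that as the port.

Writing the Blink program

Connect the USB cable and open File > Examples > 01.Basics > Blink.
Under Tools, set the board to Arduino Nano, the processor to ATmega328, and the port to the one above.
Running Upload as-is gives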

avrdude: stk500_recv(): programmer is not responding
avrdude: stk500_getsync() attempt 1 of 10: not in sync: resp=0x00

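this error... I looked into it at length: unplug and replug the cable, hold reset while plugging in the cable, and so on. None of it helped at all.

Looking closely, I noticed that the RX and L LEDs did not light up even with the power on.

Burning the bootloader

Suspecting that the bootloader had never been written, I searched around and found exactly the solution on this page!

The steps are as follows.

What you need

Arduino UNO
Male-to-female jumper wires

Steps

STEP 1

Connect the Arduino UNO to the PC
Set Tools > Serial Port to the Arduino UNO's port
Set Tools > Board to Arduino UNO
Open ArduinoISP from File > Examples
Run Upload
Unplug the USB cable to disconnect the UNO from the PC

STEP 2

Connect Arduino Nano jumper pin 1 to UNO D12
Connect Arduino Nano jumper pin 2 to UNO 5V
Connect Arduino Nano jumper pin 3 to UNO D13
Connect Arduino Nano jumper pin 4 to UNO D11
Connect Arduino Nano jumper pin 5 to UNO D10
Connect Arduino Nano jumper pin 6 to UNO GND

STEP 3

Connect the Arduino UNO to the PC
Set Tools > Serial Port to the Arduino UNO's port
Set Tools > Board to Arduino Nano
Set Tools > Processor to ATmega328
Set Tools > Programmer to Arduino as ISP
Run Tools > Burn Bootloader

This writes the bootloader, and the L LED on the Arduino Nano finally lights up.

STEP 4

Disconnect all the wires and check that the Arduino Nano works.

Connect the Arduino Nano to the PC
Set Tools > Serial Port to the Arduino Nano's port
Set Tools > Board to Arduino Nano
Set Tools > Processor to ATmega328
Set Tools > Programmer back to AVRISP mkII (may not be necessary)
Open File > Examples > 01.Basics > Blink
Run Upload

With this, the LED blinked successfully.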

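MVC with bottle

Posted by: utsubo Posted on: 2016-08-22 in python

When doing data analysis in Python, there are times when you want to present the results visually.
There are many Python web frameworks, but building with bottle, the simplest of them, is the easiest.

On this site there was a sample that structures Bottle like an MVC framework, so I tinkered with it a little. I got stuck along the way, so here are my notes.

Environment

OS: ubuntu14.04
python: 2.7.6
MySQL version: 5.1.63
MySQL encoding: shift_jis
nginx 1.4.6

Configuration

nginx

Edit the nginx configuration file so that the app can be reached at http://server/python

/etc/nginx/sites-available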

server {
..

        location /python {
                rewrite ^/python/(.*)$ /$1 break;
                proxy_pass http://localhost:8081;
                proxy_redirect http://localhost:8081/ $scheme://$host/python/;
                proxy_http_version 1.1;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection $connection_upgrade;
                proxy_read_timeout 20d;
        }
..
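
To see what the `rewrite` line does to a request path, the same regular expression can be tried in Python. This only illustrates the pattern, not nginx itself; the function name `strip_prefix` is made up for this sketch.

```python
import re

# Same pattern as the nginx rule: rewrite ^/python/(.*)$ /$1 break;
REWRITE = re.compile(r"^/python/(.*)$")


def strip_prefix(path):
    """Return the path the backend would see after the rewrite.

    Paths that do not match the pattern are returned unchanged,
    mirroring how a non-matching rewrite rule leaves the URI alone.
    """
    return REWRITE.sub(r"/\1", path)


print(strip_prefix("/python/hello"))  # /hello
print(strip_prefix("/python/"))       # /
print(strip_prefix("/python"))        # /python (no trailing slash, so no match)
```

In other words, the application behind gunicorn can define its routes without the /python prefix.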

Scripts

Create the startup script

start.sh

gunicorn -b 127.0.0.1:8081 -c gunicorn.conf.py -w 1 index:app -D --reload

gunicorn.conf.py

proc_name = "gunicorn"

bind = 'unix:/tmp/{0}.sock'.format(proc_name)
backlog = 2048


workers = 1
worker_class = 'sync'
worker_connections = 1000
timeout = 30
keepalive = 2


debug = False
spew = False

daemon = True
pidfile = "/tmp/gunicorn.pid"
umask = 0
user = None
group = None
tmp_upload_dir = None

errorlog = '/var/log/gunicorn/error.log'
loglevel = 'debug'
accesslog = '/var/log/gunicorn/access.log'


def post_fork(server, worker):
    server.log.info("Worker spawned (pid: %s)", worker.pid)

def pre_fork(server, worker):
    pass

def pre_exec(server):
    server.log.info("Forked child, re-executing.")

def when_ready(server):
    server.log.info("Server is ready. Spawning workers")

def worker_int(worker):
    worker.log.info("worker received INT or QUIT signal")

    ## get traceback info
    import threading, sys, traceback
    id2name = dict([(th.ident, th.name) for th in threading.enumerate()])
    code = []
    for threadId, stack in sys._current_frames().items():
        code.append("\n# Thread: %s(%d)" % (id2name.get(threadId,""),
            threadId))
        for filename, lineno, name, line in traceback.extract_stack(stack):
            code.append('File: "%s", line %d, in %s' % (filename,
                lineno, name))
            if line:
                code.append("  %s" % (line.strip()))
    worker.log.debug("\n".join(code))

def worker_abort(worker):
    worker.log.info("worker received SIGABRT signal")

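Garbled characters

If you build the app in the environment above based on the reference site, strings from the DB come out garbled when displayed. After a lot of head-scratching, the following change fixed it.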

app/models/db.py

dbhandle = MySQLdb.connect(
  host = config.get('live_db', 'host'),
  port = config.getint("live_db","port"), 
  user = config.get('live_db', 'user'),
  passwd = config.get('live_db', 'password'),
  db = config.get('live_db', 'database'),
  charset = "sjis",  # add this line
  use_unicode=1
)
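
Why the `charset` setting matters can be reproduced without a database. The sketch below is written for Python 3 (unlike the Python 2.7 used in this post) and uses an arbitrary sample string: it encodes the string as Shift_JIS, as a Shift_JIS database would hold it, and then decodes the bytes with the matching and with a non-matching codec.

```python
# Arbitrary sample text, encoded the way a Shift_JIS database would hold it.
original = "日本語"
raw = original.encode("shift_jis")

# Decoding with the matching codec restores the text.
ok = raw.decode("shift_jis")

# Decoding with a different codec "succeeds" but yields garbled text.
bad = raw.decode("latin-1")

print(ok == original)   # True
print(bad == original)  # False
```

Passing `charset` to `MySQLdb.connect` plays the role of choosing the matching codec here.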

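Wi-Fi with ESP-WROOM-02 and Arduino Uno

Posted by: utsubo Posted on: 2016-08-21 in Arduino

I connect an Arduino Uno to the ESP-WROOM-02 Wi-Fi module (DIP conversion kit) and try communicating with it.

The ESP-WROOM-02 can apparently be programmed and used as an Arduino by itself, but I don't have a USB-serial adapter to connect to it, so for now I'll hook it up to an Arduino and check whether it works.

Equipment used

Circuit

A circuit connecting the Arduino and the ESP-WROOM-02 is needed, but I struggled because I could not find one on the web that wires this DIP kit to an Arduino Uno.
Referring to this page, this one, and also this page, I came up with the following.

Program

I wrote this based on the sites I referred to.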

//https://ics.media/entry/10457/3
//http://okiraku-camera.tokyo/blog/?p=2873
/*
  Software serial multiple serial test

 Receives from the hardware serial, sends to software serial.
 Receives from software serial, sends to hardware serial.

 The circuit:
 * RX is digital pin 11 (connect to TX of other device)
 * TX is digital pin 10 (connect to RX of other device)

 Note:
 Not all pins on the Mega and Mega 2560 support change interrupts,
 so only the following can be used for RX:
 10, 11, 12, 13, 50, 51, 52, 53, 62, 63, 64, 65, 66, 67, 68, 69

 Not all pins on the Leonardo and Micro support change interrupts,
 so only the following can be used for RX:
 8, 9, 10, 11, 14 (MISO), 15 (SCK), 16 (MOSI).

 created back in the mists of time
 modified 25 May 2012
 by Tom Igoe
 based on Mikal Hart's example

 This example code is in the public domain.

 */
#include <SoftwareSerial.h>

SoftwareSerial mySerial(11, 10); // RX, TX

void setup() {
  // Open serial communications and wait for port to open:
  Serial.begin(115200);
  while (!Serial) {
    ; // wait for serial port to connect. Needed for native USB port only
  }


  Serial.println("Goodnight moon!");

  // set the data rate for the SoftwareSerial port
  mySerial.begin(115200);
  mySerial.println("Hello, world?");
}

void loop() { // run over and over
  if (mySerial.available()) {
    //Serial.println("mySerial available");
    Serial.write(mySerial.read());
  }
  if (Serial.available()) {
    //Serial.println("Serial available");
    mySerial.write(Serial.read());
  }
}

Running it

Open Tools > Serial Monitor in the Arduino IDE and check the output

Goodnight moon!
Hello, world?

ERROR
HELO  # <- input

ERROR
AT     # <- input


OK
HELO   # <- input

ERROS

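The characters I typed are echoed back as-is.

Summary

At first "Hello, world" did not come back when I ran the program, and I fiddled with the circuit in various ways. I don't really understand why it works this way, but it works, so I'll call it good. Next I'll try actual Wi-Fi communication.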

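Candlestick charts with Python's matplotlib

Posted by: utsubo Posted on: 2016-08-01 in python

Fetch data from MySQL and draw candlesticks with Python's matplotlib.

Environment

OS:MacOS10.11
python:2.7.12
MySQL 5.6

Table

The MySQL table layout is as follows. Daily bars, weekly bars, anything will do.
The database name is assumed to be dbname.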

create table price_table(
  date datetime not null,
  code varchar(8) not null,
  open double precision null,
  high double precision null,
  low  double precision null,
  close double precision null,
  volume double precision null
)
;
create unique index idx_price on price_table(date,code)
;

Taking holidays into account

This simply displays the candlesticks.
Where there are holidays, a gap is left.

#!/usr/bin/env python
# coding:utf-8

import matplotlib.pyplot as plt
# matplotlib.finance was removed in matplotlib 2.2; on newer versions the
# same function is available from the separate mpl_finance package.
from matplotlib.finance import candlestick_ohlc
import time
import MySQLdb

connection = MySQLdb.connect(host="localhost",db="dbname",user="root",passwd="")
cursor = connection.cursor()

code = "6758"
date='2016-04-01'
cursor.execute("select date,open,high,low,close,volume from price_table where code=%s and date>=%s",[code,date])
result = cursor.fetchall()

ohlc=[]
fdate=[]  # float
ddate=[]  # datetime
for row in result:
  tmp=time.mktime(row[0].timetuple())  # unix time is used as the x position
  ohlc.append((tmp,row[1],row[2],row[3],row[4],row[5]))
  ddate.append(row[0])
  fdate.append(tmp)
cursor.close()
connection.close()


# map the float dates on the graph to their display strings (every 5th row)
plt.xticks(
	fdate[::5],
	[x.strftime('%Y-%m-%d') for x in ddate][::5]
)


ax = plt.subplot()

candlestick_ohlc(ax,ohlc)

plt.xlabel('Date')
plt.ylabel('Price')
plt.title("title")
plt.legend()
plt.show()

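Ignoring holidays

This simply displays the candlesticks.
Holidays are ignored and the candles are drawn packed together. This is more convenient when overlaying technical indicators.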

#!/usr/bin/env python
# coding:utf-8

import matplotlib.pyplot as plt
# matplotlib.finance was removed in matplotlib 2.2; on newer versions the
# same function is available from the separate mpl_finance package.
from matplotlib.finance import candlestick_ohlc
import MySQLdb

connection = MySQLdb.connect(host="localhost",db="dbname",user="root",passwd="")
cursor = connection.cursor()

code = "6758"
date='2016-04-01'
cursor.execute("select date,open,high,low,close,volume from price_table where code=%s and date>=%s",[code,date])
result = cursor.fetchall()

ohlc=[]
fdate=[]  # x positions (running index)
ddate=[]  # datetime
adr=1
for row in result:
  # use a running index instead of the date as the x position
  ohlc.append((adr,row[1],row[2],row[3],row[4],row[5]))
  ddate.append(row[0])
  fdate.append(adr)
  adr=adr+1
cursor.close()
connection.close()


# map the x positions on the graph to their display strings (every 5th row)
plt.xticks(
	fdate[::5],
	[x.strftime('%Y-%m-%d') for x in ddate][::5]
)


ax = plt.subplot()

candlestick_ohlc(ax,ohlc)

plt.xlabel('Date')
plt.ylabel('Price')
plt.title("title")
plt.legend()
plt.show()

With this, a chart is displayed, at least.
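
The difference between the two variants comes down to the x value given to each candle. A small standalone sketch (no matplotlib or MySQL needed; the three dates are made-up sample data spanning a weekend) makes this visible:

```python
import datetime
import time

# Hypothetical sample: a Friday, then the following Monday and Tuesday.
dates = [
    datetime.datetime(2016, 4, 1),
    datetime.datetime(2016, 4, 4),
    datetime.datetime(2016, 4, 5),
]

# Variant 1: x is the unix time, so the weekend leaves a gap.
x_time = [time.mktime(d.timetuple()) for d in dates]
gaps_in_days = [round((b - a) / 86400.0) for a, b in zip(x_time, x_time[1:])]
print(gaps_in_days)  # [3, 1]

# Variant 2: x is a running index, so the candles are evenly spaced.
x_index = list(range(1, len(dates) + 1))
print(x_index)       # [1, 2, 3]
```

With the running index every candle sits exactly one unit from its neighbour, which is why indicators computed per row line up more easily in the second variant.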

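Trying out Word Cloud

Posted by: utsubo Posted on: 2016-07-27 in python

There is a library called WordCloud, so I gave it a try.
It did not quite work out of the box in my environment, so here are my notes.

I used this page as a reference.

Environment

– MacOS10.11
– python 2.7.12
– mecab 0.996

Installation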

brew install python
brew install mecab
brew install mecab-ipadic


git clone https://github.com/amueller/word_cloud
cd word_cloud 
pip install -r requirements.txt
python setup.py install
pip install beautifulsoup4
pip install requests

Errors

Running this sample as-is produces an error

/usr/local/lib/python2.7/site-packages/bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 53 of the file word_cloud.py. To get rid of this warning, change code that looks like this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "html.parser")

  markup_type=markup_type))
Traceback (most recent call last):
  File "word_cloud.py", line 53, in <module>
    wordlist = get_wordlist_from_QiitaURL(url)
  File "word_cloud.py", line 30, in get_wordlist_from_QiitaURL
    return mecab_analysis(text)
  File "word_cloud.py", line 10, in mecab_analysis
    t = mc.Tagger('-Ochasen -d /usr/local/Cellar/mecab/0.996/lib/mecab/dic/mecab-ipadic-neologd/')
  File "/usr/local/lib/python2.7/site-packages/MeCab.py", line 307, in __init__
    this = _MeCab.new_Tagger(*args)
RuntimeError

Specify the HTML parser explicitly

    soup = BeautifulSoup(res.text,"html.parser")

Running it as-is gives yet another error

Traceback (most recent call last):
  File "word_cloud.py", line 59, in <module>
    create_wordcloud(" ".join(wordlist).decode('utf-8'))
  File "word_cloud.py", line 50, in create_wordcloud
    stopwords=set(stop_words)).generate(text)
  File "/usr/local/lib/python2.7/site-packages/wordcloud-1.2.1-py2.7-macosx-10.11-x86_64.egg/wordcloud/wordcloud.py", line 463, in generate
    return self.generate_from_text(text)
  File "/usr/local/lib/python2.7/site-packages/wordcloud-1.2.1-py2.7-macosx-10.11-x86_64.egg/wordcloud/wordcloud.py", line 448, in generate_from_text
    words = self.process_text(text)
  File "/usr/local/lib/python2.7/site-packages/wordcloud-1.2.1-py2.7-macosx-10.11-x86_64.egg/wordcloud/wordcloud.py", line 391, in process_text
    self.stopwords_lower_ = set(map(str.lower, self.stopwords))
TypeError: descriptor 'lower' requires a 'str' object but received a 'unicode'

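This looks like a Unicode-related error. The Unicode conversion of stop_words does not seem to work well, so I changed them to plain strings.

Code changes

I adjusted the code to suit my environment. The MeCab dictionary path now points to the ipadic dictionary installed above.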

#!/usr/bin/env python
# coding:utf-8
#%matplotlib inline

import matplotlib.pyplot as plt
from wordcloud import WordCloud
from bs4 import BeautifulSoup
import requests
import MeCab as mc



def mecab_analysis(text):
    t = mc.Tagger('-Ochasen -d /usr/local/Cellar/mecab/0.996/lib/mecab/dic/ipadic/')
    enc_text = text.encode('utf-8')
    node = t.parseToNode(enc_text)
    output = []
    while(node):
        if node.surface != "":  # skip the header and footer nodes
            word_type = node.feature.split(",")[0]
            # keep adjectives, verbs, nouns and adverbs
            if word_type in ["形容詞", "動詞","名詞", "副詞"]:
                output.append(node.surface)
        node = node.next
        if node is None:
            break
    return output


def get_wordlist_from_QiitaURL(url):
    res = requests.get(url)
    soup = BeautifulSoup(res.text,"html.parser")

    text = soup.body.section.get_text().replace('\n','').replace('\t','')
    return mecab_analysis(text)

def create_wordcloud(text):

    # Set the font path to suit your environment.
    #fpath = "/System/Library/Fonts/HelveticaNeue-UltraLight.otf"
    #fpath = "/Library/Fonts/ヒラギノ角ゴ Pro W3.otf"
    fpath = "/Library/Fonts/Osaka.ttf"

    # Stop words (plain str instead of the original u'...' literals)
    #stop_words = [ u'てる', u'いる', u'なる', u'れる', u'する', u'ある', u'こと', u'これ', u'さん', u'して', u'くれる', u'やる', u'くださる', u'そう', u'せる', u'した',  u'思う',  u'それ', u'ここ', u'ちゃん', u'くん', u'', u'て',u'に',u'を',u'は',u'の', u'が', u'と', u'た', u'し', u'で', u'ない', u'も', u'な', u'い', u'か', u'ので', u'よう', u'']
    stop_words = [ 'てる', 'いる', 'なる', 'れる', 'する', 'ある', 'こと', 'これ',
                   'さん', 'して', 'くれる', 'やる', 'くださる', 'そう', 'せる', 'した',
                   '思う',  'それ', 'ここ', 'ちゃん', 'くん', '', 'て','に','を','は','の',
                   'が', 'と', 'た', 'し', 'で', 'ない', 'も', 'な', 'い', 'か', 'ので', 'よう', '']

    wordcloud = WordCloud(background_color="white",font_path=fpath, width=900, height=500, \
                          stopwords=set(stop_words)).generate(text)

    plt.figure(figsize=(15,12))
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()

url = "http://qiita.com/t_saeko/items/2b475b8657c826abc114"
wordlist = get_wordlist_from_QiitaURL(url)
create_wordcloud(" ".join(wordlist).decode('utf-8'))

Running it

python word_cloud.py

This displays the image.

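Installing bzip2-1.0.6 on CentOS 5

Posted by: utsubo Posted on: 2016-07-21 in linux

If you are still using CentOS 5, all sorts of inconveniences come up. For a start, when you try to use a recent application, the libraries installed by default are too old and it will not install at all.

This time I install bzip2-1.0.6 on CentOS 5.11.

Download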

$ cd /usr/local/src
$ wget http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz
$ tar xzvfp  bzip2-1.0.6.tar.gz
$ cd bzip2-1.0.6

Compile

The shared library is built as well.
First, the main program

$ make
# make install

Next, the shared library

$ make -f Makefile-libbz2_so 
$ make
gcc -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.6 blocksort.o huffman.o crctable.o randtable.o compress.o decompress.o bzlib.o
/usr/bin/ld: blocksort.o: relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
blocksort.o: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [all] Error 1

An error occurs here

Solution

Apparently the objects have to be compiled with -fPIC.
The main build needs the change too, so edit the Makefile

CFLAGS=-Wall -Winline -O2 -g $(BIGFILES) -fPIC  # added -fPIC
CXXFLAGS=-fPIC   # added this line

Compile

$ make clean
$ make
# make install

Next, the shared library. Edit Makefile-libbz2_so

CXXFLAGS=-fPIC # added this line

Compile

$ make -f Makefile-libbz2_so 
# mv libbz2.so.1.0* /usr/local/lib

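word2vec in R

Posted by: utsubo Posted on: 2016-07-20 in R

I tried running Word2Vec in R with this page as a reference and got a little stuck, so here are my notes.

Installing packages

Install the required packages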

install.packages("devtools")
library(devtools)
install.packages("tsne")
install.packages("magrittr")
install.packages("stringi")
library(tsne)
library(magrittr)
library(stringi)
devtools::install_github("bmschmidt/wordVectors")

Preparing the data

Natsume Soseki's Sanshiro from Aozora Bunko is used as the test data

$ wget http://www.aozora.gr.jp/cards/000148/files/794_ruby_4237.zip
$ unzip 794_ruby_4237.zip
$ nkf -w  --overwrite sanshiro.txt
$ mecab -Owakati sanshiro.txt -o data.txt

The character encoding is converted to UTF-8. Also, make sure the mecab you have installed is the UTF-8 build.

Running R

Run it from R

library(devtools)
library(wordVectors)
library(magrittr)
library(tsne)
wordVectors::train_word2vec(
  train_file = "data.txt", output_file = "model.txt",
  vectors = 200, window = 10,
  threads = 3
)

Set threads to roughly the number of CPUs minus one.
Running this gives

Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, numerals = numerals,  : 
  invalid multibyte string at '<f6>(;<a4><d0>;<fb><ba>{<d4>ֺ<b8>3:q<fd><c5>:ף<f6>9<9a><99><dd>8<f6>(<ec><ba><d7>c<91>:'
In addition: Warning messages:
1: In utils::read.table(filename, header = F, skip = 1, colClasses = c("character",  :
  line 1 appears to contain embedded nulls
2: In utils::read.table(filename, header = F, skip = 1, colClasses = c("character",  :
  line 2 appears to contain embedded nulls
3: In utils::read.table(filename, header = F, skip = 1, colClasses = c("character",  :
  line 3 appears to contain embedded nulls
4: In utils::read.table(filename, header = F, skip = 1, colClasses = c("character",  :
  line 4 appears to contain embedded nulls
5: In utils::read.table(filename, header = F, skip = 1, colClasses = c("character",  :
  line 5 appears to contain embedded nulls
6: In utils::read.table(filename, header = F, skip = 1, nrows = 1,  :
  line 1 appears to contain embedded nulls

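An error like this appears. The "embedded nulls" warnings suggest that the model file was written in binary format but read back as text.
In that case, ignore the error and reload the model in R as binary:

word2vec_model <- read.vectors("model.txt",binary=TRUE)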

Checking the result

> nearest_to(word2vec_model,word2vec_model[["三四郎"]])
      三四郎       与次郎       じっと           秋     ますます         勇気     見合わせ     おかしく           次 
3.330669e-16 4.131871e-01 4.873280e-01 4.886600e-01 4.895344e-01 4.948309e-01 5.033392e-01 5.220245e-01 5.278047e-01 
      腹の中 
5.329964e-01 
> nearest_to(word2vec_model,word2vec_model[["東京"]])
        東京       生まれ     これから       ずっと         変る         おれ       いなか         うえ       いっそ 
5.551115e-16 3.663971e-01 3.754231e-01 3.864140e-01 4.024397e-01 4.056571e-01 4.078920e-01 4.080678e-01 4.103215e-01 
    文芸時評 
4.340431e-01

Well, that's about what you'd expect.
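
The numbers printed by `nearest_to` look like cosine distances (1 minus cosine similarity). That would explain why the query word itself scores essentially zero (3.3e-16 is floating-point noise) and why smaller values mean closer words. This reading is an assumption based on the shape of the output, not something confirmed here. The sketch below shows the calculation in Python with made-up 3-dimensional vectors.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b); 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Made-up vectors purely for illustration (the model above uses 200 dimensions).
query = [1.0, 2.0, 3.0]
similar = [1.0, 2.0, 2.5]
different = [-3.0, 0.5, 1.0]

# A vector compared with itself gives a distance of (almost exactly) zero.
print(abs(cosine_distance(query, query)) < 1e-12)  # True

# A vector pointing in a similar direction is closer than a dissimilar one.
print(cosine_distance(query, similar) < cosine_distance(query, different))  # True
```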

Installing the latest R on ubuntu14.04 with apt-get

Posted by: utsubo Posted on: 2016-07-12 in R

The version of R that apt-get installs on ubuntu14.04 is 3.0.2, which is a bit old.
To install the latest version, do the following

$ echo "deb http://cran.rstudio.com/bin/linux/ubuntu trusty/" | sudo tee -a /etc/apt/sources.list
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
$ sudo add-apt-repository ppa:marutter/rdev
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install r-base

Behind a proxy, you also need to set the following environment variable:

export http_proxy="http://server:port/"

I used this page as a reference.