ChangeLog: latest page

~~markup - ~matubara/ChangeLog~~ → Moved

Last updated: 2009-02-01 00:57

2007-04-17 Tue

■ python + docutils apparently miscalculates the width of full-width characters [markup]

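In reStructuredText, tables have to be lined up like ASCII art, otherwise you get a syntax error.
Japanese and similar characters occupy two columns in a monospace font,
so characters have to be counted with weights.
Older versions of python + docutils apparently do not do this.

With python 2.4 or later and the latest docutils, it works.

Note that as of 2007-04-17 the stable CentOS release still ships python 2.3.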
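
As a rough sketch of what "counting with weights" means (this is not the actual docutils code), the following Python uses `unicodedata.east_asian_width` from the standard library, which has been available since python 2.4. The helper names `display_width` and `pad` are made up for illustration.

```python
import unicodedata


def display_width(text: str) -> int:
    """Return the number of monospace columns that `text` occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks do not take a column of their own
        # 'W' (wide) and 'F' (fullwidth) characters take two columns
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad(text: str, columns: int) -> str:
    """Right-pad `text` with spaces so that it fills `columns` columns."""
    return text + " " * max(0, columns - display_width(text))


if __name__ == "__main__":
    for cell in ("abc", "日本語", "日本語abc"):
        print("|" + pad(cell, 10) + "|", display_width(cell))
```

Padding table cells with `pad` instead of `str.ljust` keeps the column borders aligned when a cell contains Japanese text.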

2007-03-12 Mon

■ UIMA 101 -- Part 1 - Getting Started with UIMA [markup][java][net]

<http://www-06.ibm.com/jp/developerworks/ysl/library/y-uima101a/index.shtml>

2007-01-29 Mon

■ XML and overlapping hierarchies [markup][net]

<http://nl.ijs.si/et/talks/tsujiilab-crossing/crossing.pdf>

<A><B> ... </A></B>
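
A small sketch, using only Python's standard `xml.etree.ElementTree`, of why the overlapping markup above is a problem: an XML parser rejects it as not well-formed, while the properly nested variant parses. The function name `is_well_formed` is just an illustrative helper.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if `xml_text` parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


overlapping = "<A><B> ... </A></B>"  # B is still open when A closes
nested = "<A><B> ... </B></A>"       # properly nested

if __name__ == "__main__":
    print(is_well_formed(overlapping))
    print(is_well_formed(nested))
```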

2007-01-18 Thu

■ 2007 CSS Study Meeting [markup][net]

<http://ja.reddit.com/goto?rss=true&id=yxja>

2007-01-05 Fri

■ Wiki to XML, through SGML [markup][net]

<http://www.xml.com/pub/a/2004/03/03/sgmlwiki.html>

2006-12-22 Fri

■ XSLT [markup]

XSLT登竜門: table of contents
Link collection - XSL Transformations (XSLT) - - Personnel
最速インターフェース研究会 :: I tried building an XSL editor

2006-12-21 Thu

■ PDFBox - Java PDF Library [markup][programming][net]

<http://www.pdfbox.org/>
A library that can extract the contents of a PDF.

java -cp lib/PDFBox-0.7.3.jar:../FontBox-0.1.0/lib/FontBox-0.1.0.jar:$CLASSPATH org.pdfbox.ExtractText xxx.pdf xxx.txt

It even outputs Greek letters as Unicode characters.

2006-12-19 Tue

■ Embedding metadata in XHTML and extracting them as RDF [markup][net]

<http://www.kanzaki.com/docs/sw/xh2rdf.html>
Uses XSLT to extract the logical information that is mixed in with
presentational information in XHTML.

2006-12-19 Tue

■ Converting XHTML back to XML [markup][perl]

#! /usr/bin/env perl
use warnings;
use strict;
use WWW::Mechanize;
use WWW::Mechanize::Link;
use URI::URL;
use URI::file;
use URI::Escape;
use Getopt::Long;
use Pod::Usage;
use XML::Simple;
use Data::Dumper;
use Encode;
use utf8;  # source is UTF-8; the old "use encoding" pragma is deprecated
use open OUT => ':utf8';
use open qw/:std/;
$XML::Simple::PREFERRED_PARSER = 'XML::Parser';

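# Decode an HTTP response body according to its charset, clean it up
# with tidy, and parse the result with XML::Simple.
sub xml_in_decoding($$) {
  my($xml_simple, $http_response) = @_;
  my $content = $http_response->content;

  # [\w-]+ rather than \w+, so that names such as EUC-JP or UTF-8 match in full
  if ( join(' ',$http_response->headers->content_type) =~ m/charset=([\w-]+)/ ) {
    $content = decode($1, $content);
  }
  # the content is decoded now, so make the declared encoding agree
  $content =~ s{"Shift_JIS"}{"UTF-8"};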

  # normalize the markup with the external tidy command (edits the temp file in place)
  {
    my $tmp = '/tmp/tidyresult';
    {
      open my $f, '>:encoding(utf-8)', $tmp or die "$!: $tmp";
      print $f $content;
      close $f;
    }
    system qq{tidy -modify -utf8 $tmp 2>/dev/null};
    {
      open my $f, '<:encoding(utf-8)', $tmp or die "$!: $tmp";
      $content = join '', <$f>;
      close $f;
    }
  }
  return $xml_simple->xml_in($content);
}

# get arguments
my $wait_seconds = 3;
my $verbose = 0;
my $input_encoding = '';
GetOptions(
           'wait=i'     => \$wait_seconds,
           'verbose'    => \$verbose,
           'encoding=s' => \$input_encoding,
           'help'       => sub{pod2usage(0)}
          );
my $url = shift @ARGV;

my $m = WWW::Mechanize->new();
my $xs = XML::Simple->new(ForceArray => 1,
                          KeyAttr => []);

# get a village
my %village;
$m->get($url); # assuming this as prologue
exit 1  unless ( $m->success );
{
  my @a;
  foreach my $link ( $m->find_all_links( url_regex => qr{_progress_\d+} ) ) {
    $link->url =~ m/_progress_(\d+)/;
    my $n = $1;
    next if defined $a[$n];
    print STDERR "waiting $wait_seconds seconds before retrieving ".$link->url."...\n" if $verbose;
    sleep $wait_seconds;
    $m->follow_link(url => $link->url);
    $a[$n] = xml_in_decoding($xs, $m->response);
  }
  $village{progresses} = \@a;
}

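print STDERR "waiting $wait_seconds seconds before retrieving 'party'...\n" if $verbose;
sleep $wait_seconds;  # wait before the request, as the message says
if ( $m->follow_link( url => q{_party_} ) ) {
  # parse the page body; $m->ct would only return the Content-Type string
  $village{epilogue} = xml_in_decoding($xs, $m->response);
}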
print $xs->xml_out(\%village);

#my $log = Parse::RecDescent->new($grammar)->parse($text);
#print Dumper($log);

__END__

=head1 NAME

  ninjin_crawl.pl - Jinro BBS crawler

=head1 SYNOPSIS

  ninjin_crawl.pl [options] URL

=head1 OPTIONS

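  --encoding      input encoding (accepted but currently unused;
                  the charset is taken from the Content-Type header)
  --help          shows this help
  --verbose       prints progress messages to STDERR
  --wait          waiting time in seconds between requests [3]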

=head1 DESCRIPTION

  ninjin_crawl.pl retrieves and parses the whole log
  of the village specified in the argument,
  and writes the result to standard output as XML.

=head1 SEE ALSO

L<WWW::Mechanize>

=cut

2006-11-13 Mon

■ RecipeML - Examples [lx][markup][net]

<http://www.formatdata.com/recipeml/examples.html>
An approach that annotates recipes written in natural language.

<recipeml version="0.5">
  <recipe>
    <head>
      <title>The Needless-Markman Hoax Chocolate-Chip Cookie</title>
    </head>
    <ingredients>
      <ing>
        <amt><qty>2</qty><unit>cups</unit></amt>
        <item>butter</item>
      </ing>
      <ing>
        <amt><qty>4</qty><unit>cups</unit></amt>
        <item>flour</item>
      </ing>
    </ingredients>
    <directions>
      <step>Measure oatmeal and blend in a blender to a fine
    powder</step>
      <step>Cream the butter and both sugars</step>
      <step>Add eggs and vanilla; mix together with
    flour, oatmeal, salt, baking powder, and soda</step>
      <step>Add chocolate chips, Hershey Bar and nuts</step>
      <step>Roll into balls and place two inches apart
    on a cookie sheet</step>
      <step>Bake for 10 minutes at 375 degrees</step>
    </directions>
  </recipe>
</recipeml>

There has also been some talk of describing a semantic representation of the directions.
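
To show what the annotation buys you, here is a minimal sketch that reads the structured parts back out with Python's standard `xml.etree.ElementTree`. The `RECIPE` string is a shortened copy of the example above, and the function names `ingredients` and `steps` are made up for illustration; this is not part of RecipeML itself.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the RecipeML example above
RECIPE = """<recipeml version="0.5">
  <recipe>
    <head>
      <title>The Needless-Markman Hoax Chocolate-Chip Cookie</title>
    </head>
    <ingredients>
      <ing>
        <amt><qty>2</qty><unit>cups</unit></amt>
        <item>butter</item>
      </ing>
      <ing>
        <amt><qty>4</qty><unit>cups</unit></amt>
        <item>flour</item>
      </ing>
    </ingredients>
    <directions>
      <step>Cream the butter and both sugars</step>
      <step>Bake for 10 minutes at 375 degrees</step>
    </directions>
  </recipe>
</recipeml>"""


def ingredients(xml_text):
    """Return a list of (quantity, unit, item) tuples, one per <ing>."""
    root = ET.fromstring(xml_text)
    return [
        (ing.findtext("amt/qty"), ing.findtext("amt/unit"), ing.findtext("item"))
        for ing in root.iter("ing")
    ]


def steps(xml_text):
    """Return the free-text <step> elements in document order."""
    root = ET.fromstring(xml_text)
    return [step.text for step in root.iter("step")]


if __name__ == "__main__":
    for qty, unit, item in ingredients(RECIPE):
        print(qty, unit, item)
    for number, text in enumerate(steps(RECIPE), start=1):
        print(number, text)
```

The ingredients come out as structured data, while each step is still plain natural-language text, which is where a semantic representation of the directions would come in.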

2006-08-20 Sun

■ Higher-Order Functional Programming with XSLT 2.0 and FXSL [markup][fp]

<http://www.idealliance.org/papers/extreme/proceedings/html/2006/Novatchev01/EML2006Novatchev01.html>
Apparently XSLT 2.0, together with FXSL, supports higher-order functional programming.

2006-07-16 Sun

■ Centering a block element [markup][howto]

CSS for when you want the same effect as

<table align=center>

I don't want to use a table tag, but I do want the effect of one.
So: combine a horizontal margin:auto
with display:table.

<h1 style="border:1px black solid;   display:table;   margin: 0 auto;">test</h1>

A horizontal margin:auto is interpreted much like \hfill in LaTeX.
display:table shrinks the bounding box to its minimum size, just wide enough for the content.

2006-05-23 Tue

■ reStructuredText -> LaTeX [markup][latex]

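Converts a simple Wiki-like markup into LaTeX source (among other formats).
It seems to be the most widely used tool of its kind.

To get started, install docutils and rst-mode.el.
Following "Meadow/Emacs memo -- ロードパス" (load path),
put the .el file somewhere on the load-path
(/usr/share/emacs/site-lisp is probably fine), and add

(require 'rst-mode)

to your Emacs init file.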

rst2html.py xxx.txt > xxx.html

2006-01-30 Mon

■ HTML Slidy [presentation][markup][net]

<http://www.w3.org/Talks/Tools/Slidy/>
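A presentation tool from the W3C, built on CSS + XHTML + JavaScript.

It fills much the same niche as LaTeX + Beamer,
but it is good at dynamic effects and weak at math.
Being able to change the font size during a presentation,
and needing nothing but an editor and a browser, are both welcome.
With LaTeX you have to set up a TeX system first...

See also Prince, an (X)(HT)ML -> PDF converter.
Slidy itself apparently cannot print yet, though.

ASCIIMathML: write math directly in HTML
(JavaScript converts it to MathML on the client side).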

2005-07-24 Sun

■ CSS Dencitie [markup][net]

<http://www6.plala.or.jp/go_west/nextcss/>
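A comprehensive guide to the CSS specifications, browser implementations, and more.
It also has a substantial series of articles called 中級者入門 (an introduction for intermediate users).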

Powered by chalow