clpeak - peak performance of your opencl device


As OpenCL developers, we all want to know the peak capabilities of our devices. I have been working on a small repo on GitHub, github.com/krrishnarraj/clpeak, to measure the peak performance of an OpenCL device

So what does it do? It measures peak global memory bandwidth for all vector widths of float, single- and double-precision compute capacity for all vector widths, transfer bandwidth between host and device, and kernel launch latency. Here is a preview from a Cayman GPU

Platform: AMD Accelerated Parallel Processing
  Device: Cayman
    Driver version: 1348.4 (Linux x64)

    Global memory bandwidth (GBPS)
      float   : 130.97
      float2  : 131.36
      float4  : 90.50
      float8  : 69.91
      float16 : 35.27
    Single-precision compute (GFLOPS)
      float   : 674.44
      float2  : 1345.68
      float4  : 2601.47
      float8  : 2586.69
      float16 : 2573.38

    Double-precision compute (GFLOPS)
      double   : 671.24
      double2  : 671.59
      double4  : 670.93
      double8  : 669.51
      double16 : 666.50

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 3.53
      enqueueReadBuffer          : 4.43
      enqueueMapBuffer(for read) : 152.89
        memcpy from mapped ptr   : 4.40
      enqueueUnmap(after write)  : 1781.26
        memcpy to mapped ptr     : 4.42

    Kernel launch latency : 44.22 us     

Highlights:
  •  Reached a maximum global memory bandwidth of 131 GBPS with float/float2
  •  2600 GFLOPS with float4, against a hardware peak of 2700. The VLIW4 architecture needs vector code to reach peak capacity; with scalar code you only get about a quarter of it
  •  Double precision doesn’t need vector code: scalar code reached 671 GFLOPS against a hardware peak of 675 (a quarter of single precision)
  •  Transfer bandwidth depends on the host machine’s specs. Map/unmap appears to be zero-copy for a buffer created with the CL_MEM_ALLOC_HOST_PTR flag (or it could be a bug in the code that makes it look like zero-copy)
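To put these highlights in perspective, here is a quick back-of-the-envelope check of how close each measurement gets to the hardware peak. The measured values and the peaks (2700 and 675 GFLOPS) are taken from the results above; the helper name `efficiency` is just something made up for this sketch.

```python
def efficiency(measured_gflops, peak_gflops):
    """Return measured throughput as a percentage of the hardware peak."""
    return 100.0 * measured_gflops / peak_gflops

# (measured, peak) pairs from the Cayman run and highlights above.
results = {
    "float  (scalar)": (674.44, 2700.0),
    "float4 (vector)": (2601.47, 2700.0),
    "double (scalar)": (671.24, 675.0),
}

for name, (measured, peak) in results.items():
    print(f"{name}: {efficiency(measured, peak):.1f}% of peak")
```

Scalar float lands at roughly a quarter of peak, which fits the VLIW4 explanation, while float4 and scalar double both get within a few percent of their peaks.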

There is something interesting about Intel CPUs on the Intel platform. Here is the same CPU with the 32-bit driver (left) and the 64-bit driver (right)

Platform: Intel(R) OpenCL                                 Platform: Intel(R) OpenCL
  Device: Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz     Device: Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
    Driver version: 1.2 (Win32)                               Driver version: 1.2 (Win64)

    Single-precision compute (GFLOPS)                         Single-precision compute (GFLOPS)
      float   : 25.44                                             float   : 25.41
      float2  : 50.81                                             float2  : 101.06
      float4  : 51.61                                             float4  : 171.80
      float8  : 52.55                                             float8  : 81.04
      float16 : 51.92                                             float16 : 95.44

    Double-precision compute (GFLOPS)                         Double-precision compute (GFLOPS)
      double   : 25.45                                            double   : 25.41
      double2  : 25.47                                            double2  : 87.65
      double4  : 25.71                                            double4  : 34.09
      double8  : 26.23                                            double8  : 30.03
      double16 : 27.16                                            double16 : 86.77

    Transfer bandwidth (GBPS)                                 Transfer bandwidth (GBPS)
      enqueueWriteBuffer          : 2.30                         enqueueWriteBuffer         : 2.26
      enqueueReadBuffer           : 6.47                        enqueueReadBuffer          : 7.56
      enqueueMapBuffer(for read)  : 1.#J                         enqueueMapBuffer(for read) : 1.#J
        memcpy from mapped ptr : 8.39                           memcpy from mapped ptr : 9.26
      enqueueUnmap(after write)   : 1.#J                         enqueueUnmap(after write)  : 1.#J
        memcpy to mapped ptr    : 8.13                           memcpy to mapped ptr   : 8.95

There is a huge compute performance difference between 32-bit and 64-bit modes. I got suspicious and asked on the Intel forum, software.intel.com/en-us/forums/topic/495379 . The compiler has more registers available in x64, and running the same program in x64 shows more than 3x the performance!!! The hardware compute peak is 153.6 GFLOPS at 2.4 GHz. Arik indicated that the actual frequency can go higher due to turbo mode, so 172 GFLOPS in x64 is possibly because of turbo mode
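The 153.6 GFLOPS figure can be reproduced with simple arithmetic. The sketch below assumes 4 cores, each retiring one 8-wide AVX add and one 8-wide AVX multiply per cycle (16 single-precision FLOPs per cycle per core); those per-core numbers are my assumption about this CPU, not something clpeak reports. Dividing the measured float4 result by the same factors gives a rough idea of the clock the cores must have been running at.

```python
def peak_gflops(cores, flops_per_cycle, ghz):
    """Theoretical peak: cores x FLOPs per cycle per core x clock in GHz."""
    return cores * flops_per_cycle * ghz

def implied_clock_ghz(measured_gflops, cores, flops_per_cycle):
    """Clock the cores would need to sustain the measured throughput."""
    return measured_gflops / (cores * flops_per_cycle)

CORES = 4             # assumption: quad-core part
FLOPS_PER_CYCLE = 16  # assumption: 8-wide AVX add + 8-wide AVX multiply

print(f"peak at 2.4 GHz : {peak_gflops(CORES, FLOPS_PER_CYCLE, 2.4):.1f} GFLOPS")
print(f"implied clock   : {implied_clock_ghz(171.80, CORES, FLOPS_PER_CYCLE):.2f} GHz")
```

An implied clock of about 2.68 GHz is above the 2.4 GHz base clock, which is consistent with the turbo mode explanation.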

Coming to the next device: the Mali GPU in my laptop (yes yes, it’s a Chromebook) with the Exynos 5250 SoC

Platform: ARM Platform
  Device: Mali-T604
    Driver version: 1.1 (Linux ARM)

    Global memory bandwidth (GBPS)
      float   : 1.56
      float2  : 4.41
      float4  : 5.75
      float8  : Out of resources! Skipped

    Single-precision compute (GFLOPS)
      float   : 2.38
      float2  : 16.40
      float4  : 8.07
      float8  : 21.84
      float16 : 16.37

    No double precision support! Skipped

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 5.84
      enqueueReadBuffer          : 2.59
      enqueueMapBuffer(for read) : 934.99
        memcpy from mapped ptr   : 2.83
      enqueueUnmap(after write)  : 1813.75
        memcpy to mapped ptr     : 2.85

    Kernel launch latency : 149.46 us

Highlights:

  • Max bandwidth of around 5.75 GBPS
  • Compute peak of around 22 GFLOPS. I am not sure what the real hardware peak is; it could be 32 or so
  • Around 6 GBPS of host-to-device bandwidth, which is relatively high compared to standard x86 PCIe-based systems. Map and unmap of buffers are zero-copy
  • Kernel launch latency measures the time between when the kernel was queued on the host and when it started executing on the device. 150 us is acceptable for a GPU
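The launch latency described above boils down to the gap between two event timestamps. In OpenCL these are available from clGetEventProfilingInfo as nanosecond counters (CL_PROFILING_COMMAND_QUEUED and CL_PROFILING_COMMAND_START); that clpeak computes it from exactly this pair is my assumption. The sketch below only does the arithmetic, and the two timestamps are made-up values chosen to reproduce the 149.46 us figure.

```python
def launch_latency_us(queued_ns, start_ns):
    """Microseconds between a kernel being queued and starting to execute."""
    if start_ns < queued_ns:
        raise ValueError("start timestamp precedes queued timestamp")
    return (start_ns - queued_ns) / 1000.0

# Hypothetical profiling timestamps in nanoseconds, for illustration only.
queued_ns = 1_000_000
start_ns = 1_149_460

print(f"Kernel launch latency : {launch_latency_us(queued_ns, start_ns):.2f} us")
```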

Let’s take one device and profile 3 different OpenCL runtimes on it. The device is an Intel i3-550 @ 3.2 GHz on Linux x64, a Nehalem-based CPU with a peak compute capacity of 51.2 GFLOPS.
pocl is an open-source implementation of the OpenCL spec (portablecl.org). It currently targets CPU devices. For this setup, pocl was compiled against LLVM 3.2

Platform                           :    AMD             Intel           POCL
Driver version                     :    1214.3          1.2.0.76921     0.9

Global memory bandwidth (GBPS)
  float                            :    7.58            8.28            7.73
  float2                           :    8.63            8.15            8.70
  float4                           :    7.67            7.34            7.67
  float8                           :    7.82            7.63            7.81
  float16                          :    7.51            7.48            7.64

Single-precision compute (GFLOPS)
  float                            :    3.64            14.56           3.64
  float2                           :    7.26            29.07           7.30
  float4                           :    14.45           46.65           14.58
  float8                           :    28.50           28.68           28.51
  float16                          :    18.31           45.49           49.67

Double-precision compute (GFLOPS)
  double                           :    3.19            12.56           3.19
  double2                          :    6.37            22.47           6.39
  double4                          :    12.65           8.55            12.70
  double8                          :    21.19           21.26           19.98
  double16                         :    4.88            10.25           6.73

Transfer bandwidth (GBPS)
  enqueueWriteBuffer               :    6.59            1.40            6.60
  enqueueReadBuffer                :    4.16            4.13            4.05
  enqueueMapBuffer(for read)       :    8159.13         1355.73         28256.36
    memcpy from mapped ptr         :    4.17            4.13            4.05
  enqueueUnmap(after write)        :    8801.16         1682.98         51622.20
    memcpy to mapped ptr           :    4.31            4.26            4.24

Kernel launch latency (us)         :    11.36           5.33            5.12

Highlights:

  • Bandwidth-wise, all 3 platforms performed almost equally. Although the theoretical peak is 21 GBPS, at most about half of it was reachable because only 1 RAM slot was populated. You need RAM in both slots to realize peak bandwidth
  • For single-precision compute, AMD reached only about half of the hardware peak. Intel peaked decently at float4, while pocl peaked at float16
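The 21 GBPS theoretical figure is consistent with dual-channel DDR3-1333, though the memory type is my assumption since it is not stated above. Each 64-bit channel moves 8 bytes per transfer, so the sketch below shows why a single populated slot caps the achievable bandwidth at about half of the dual-channel peak.

```python
def dram_peak_gbps(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical DRAM bandwidth in GB/s for 64-bit (8-byte) channels."""
    return channels * bus_bytes * mega_transfers_per_s * 1e6 / 1e9

# Assumption: DDR3-1333, i.e. 1333 mega-transfers per second per channel.
dual = dram_peak_gbps(2, 1333)
single = dram_peak_gbps(1, 1333)
best_measured = 8.70  # highest global memory bandwidth in the table (pocl, float2)

print(f"dual-channel peak   : {dual:.2f} GBPS")
print(f"single-channel peak : {single:.2f} GBPS")
print(f"best measured       : {100 * best_measured / single:.0f}% of single-channel peak")
```

Under that assumption the measured 8.7 GBPS is a decent fraction of what one channel can deliver, even though it looks poor against the dual-channel number.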
 
Check github.com/krrishnarraj/clpeak/tree/master/results for results of some more devices. Send in results of your device to krrishnarraj@gmail.com or send a pull request

Mali drivers on chromebook


ARM has now released Linux drivers for graphics and OpenCL, so it is time for some experiments. The Mali developer site has a dedicated page for Samsung Chromebook setup; some steps can be skipped depending on your system setup. There are 2 versions of the driver, fbdev and x11. fbdev is a CPU-based backend, while x11 can use a GPU-based display backend. The userspace driver package contains drivers for EGL, GLESv2 and OpenCL

I used a basic test as a display benchmark, the IE fishtank test:
fbdev backend:      ~25 fps
armsoc x11 backend: ~47 fps
Looks like there is some GPU acceleration on the graphics side

Coming to my favorite part: OpenCL. There are 2 versions of the drivers, fbdev and armsoc; I am not sure whether that matters for OpenCL. libOpenCL.so has a linker dependency on libmali.so. In fact, libmali.so contains all the OpenCL symbols, and libOpenCL.so is just a dummy stub library. I thought I could have pocl and the Mali drivers working together, but adding an ICD entry for Mali in /etc/OpenCL/vendors didn’t work. All the precompiled CL utilities like clinfo complained about missing symbols when loaded against the Mali OpenCL library, probably because they were compiled against OpenCL 1.2 while Mali provides OpenCL 1.1. Since pocl and Mali can’t co-exist, I deleted pocl

So to use the Mali driver, copy libmali.so and libmali_cinstr_plugin.so to /usr/lib. libmali.so is itself the OpenCL library. Create symlinks

ln -s /usr/lib/libmali.so /usr/lib/libOpenCL.so.1
ln -s /usr/lib/libOpenCL.so.1 /usr/lib/libOpenCL.so

Download mali opencl sdk. It contains opencl 1.1 headers. Copy CL/ from include to /usr/include. You are almost done. I used clinfo code from http://graphics.stanford.edu/~yoel/notes/clInfo.c

Found 1 platform(s).
platform[0x76e46de8]: profile: FULL_PROFILE
platform[0x76e46de8]: version: OpenCL 1.1 
platform[0x76e46de8]: name: ARM Platform
platform[0x76e46de8]: vendor: ARM
platform[0x76e46de8]: extensions: cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_arm_core_id cl_khr_egl_event cl_khr_egl_image
platform[0x76e46de8]: Found 1 device(s).

device[0x76e34818]: NAME: Mali-T604
device[0x76e34818]: VENDOR: ARM
device[0x76e34818]: PROFILE: FULL_PROFILE
device[0x76e34818]: VERSION: OpenCL 1.1 
device[0x76e34818]: EXTENSIONS: cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_arm_core_id cl_khr_egl_event cl_khr_egl_image
device[0x76e34818]: DRIVER_VERSION: 1.1

device[0x76e34818]: Type: GPU
device[0x76e34818]: EXECUTION_CAPABILITIES: Kernel
device[0x76e34818]: GLOBAL_MEM_CACHE_TYPE: Read-Write (2)
device[0x76e34818]: CL_DEVICE_LOCAL_MEM_TYPE: Global (2)
device[0x76e34818]: SINGLE_FP_CONFIG: 0x3f
device[0x76e34818]: QUEUE_PROPERTIES: 0x3

device[0x76e34818]: VENDOR_ID: 1767243777
device[0x76e34818]: MAX_COMPUTE_UNITS: 4
device[0x76e34818]: MAX_WORK_ITEM_DIMENSIONS: 3
device[0x76e34818]: MAX_WORK_GROUP_SIZE: 256
device[0x76e34818]: PREFERRED_VECTOR_WIDTH_CHAR: 16
device[0x76e34818]: PREFERRED_VECTOR_WIDTH_SHORT: 8
device[0x76e34818]: PREFERRED_VECTOR_WIDTH_INT: 4
device[0x76e34818]: PREFERRED_VECTOR_WIDTH_LONG: 2
device[0x76e34818]: PREFERRED_VECTOR_WIDTH_FLOAT: 4
device[0x76e34818]: PREFERRED_VECTOR_WIDTH_DOUBLE: 0
device[0x76e34818]: MAX_CLOCK_FREQUENCY: 533
device[0x76e34818]: ADDRESS_BITS: 64
device[0x76e34818]: MAX_MEM_ALLOC_SIZE: 525550592
device[0x76e34818]: IMAGE_SUPPORT: 1
device[0x76e34818]: MAX_READ_IMAGE_ARGS: 128
device[0x76e34818]: MAX_WRITE_IMAGE_ARGS: 8
device[0x76e34818]: IMAGE2D_MAX_WIDTH: 65536
device[0x76e34818]: IMAGE2D_MAX_HEIGHT: 65536
device[0x76e34818]: IMAGE3D_MAX_WIDTH: 65536
device[0x76e34818]: IMAGE3D_MAX_HEIGHT: 65536
device[0x76e34818]: IMAGE3D_MAX_DEPTH: 65536
device[0x76e34818]: MAX_SAMPLERS: 16
device[0x76e34818]: MAX_PARAMETER_SIZE: 1024
device[0x76e34818]: MEM_BASE_ADDR_ALIGN: 1024
device[0x76e34818]: MIN_DATA_TYPE_ALIGN_SIZE: 128
device[0x76e34818]: GLOBAL_MEM_CACHELINE_SIZE: 64
device[0x76e34818]: GLOBAL_MEM_CACHE_SIZE: 131072
device[0x76e34818]: GLOBAL_MEM_SIZE: 2102202368
device[0x76e34818]: MAX_CONSTANT_BUFFER_SIZE: 65536
device[0x76e34818]: MAX_CONSTANT_ARGS: 8
device[0x76e34818]: LOCAL_MEM_SIZE: 32768
device[0x76e34818]: ERROR_CORRECTION_SUPPORT: 0
device[0x76e34818]: PROFILING_TIMER_RESOLUTION: 1000
device[0x76e34818]: ENDIAN_LITTLE: 1
device[0x76e34818]: AVAILABLE: 1
device[0x76e34818]: COMPILER_AVAILABLE: 1

As you notice some device query calls have failed. Strange!! 

Update 1:
The mali device access problem has been resolved at http://community.arm.com/thread/4478 . Create a udev rule so that users in ‘video’ group can access mali device. Create a file say ‘10-mali.rules’ in /etc/udev/rules.d/ and add this

KERNEL=="mali[0-9]", GROUP="video", MODE="0660"

and add yourself to video group, or change mode to 0666

Running strings on libmali.so reveals this:
Mali-T604
Mali-T658
Mali-T628
Mali-T622
Mali-T624
Mali-T678
Mali-T7516

So when is Mali T7516 coming?


OpenCL on arm linux


OpenCL on ARM looks very exciting. New architecture, new set of optimizations. I managed to get 2 opencl platforms working on arm chromebook

Number of platforms: 2
Platform Name: Portable Computing Language
Platform Vendor: The pocl project
Platform Version: OpenCL 1.2 pocl 0.9-pre
Platform Profile: FULL_PROFILE
Platform Extensions: cl_khr_icd

Platform Name: coprthr
Platform Vendor: Brown Deer Technology, LLC.
Platform Version: coprthr-1.6-CURRENT (Freewill)
Platform Profile: <profile>
Platform Extensions: cl_khr_icd

I purchased this Samsung Chromebook thinking that I could work on Mali, the first OpenCL-capable ARM GPU. Much to my sorrow, there are no GPU drivers for Linux yet. They provided only Android OpenCL drivers for the Arndale board. So won’t this Android driver work on Linux, if Android is nothing but Linux?? NO. Android uses a different C library called bionic. All these drivers are compiled against bionic and not glibc!!

Anyways this chromebook has an ARM cortex A15 cpu which is more than enough to explore. pocl is the most popular opensource opencl implementation which works on all kinds of cpu x86, x64, arm and many more. It heavily uses clang and llvm.

1. pull pocl from github. It depends on libhwloc. Download or build it from source. Set these flags

export TARGET_CLANG_FLAGS="-mcpu=cortex-a9 -mfloat-abi=hard -mfpu=neon"
export CLFLAGS="-mcpu=cortex-a9 -mfloat-abi=hard -mfpu=neon"
export HOST_CLANG_FLAGS="-mcpu=cortex-a9 -mfloat-abi=hard -mfpu=neon"

Currently cl.hpp.in requires a patch to work on arm

diff --git a/include/CL/cl.hpp.in b/include/CL/cl.hpp.in
index 06448e2..a9f194e 100644
--- a/include/CL/cl.hpp.in
+++ b/include/CL/cl.hpp.in
@@ -212,8 +212,11 @@
#if defined(linux) || defined(__APPLE__) || defined(__MACOSX)
#include <alloca.h>

+#ifdef __SSE2__
#include <emmintrin.h>
#include <xmmintrin.h>
+#endif // __SSE2__
+
#endif // linux

#include <cstring>
@@ -1035,7 +1038,12 @@ namespace detail {
#endif // !_WIN32
}

+#ifdef __SSE2__
inline void fence() { _mm_mfence(); }
+#else
+ inline void fence() {}
+#endif
+
}; // namespace detail

Copy cl.hpp.in to cl.hpp in include/CL. Passing the --enable-install-opencl-headers option to the configure script will also install the OpenCL headers. The rest is straightforward.

Now you need an ICD loader. A tgz source of the ICD loader is available at http://www.khronos.org/registry/cl/ . Install cmake and copy the CL folder from the pocl include directory to icd/inc. Running make will generate libOpenCL.so in bin/ . Copy it to /usr/lib/ and you are ready with an OpenCL platform

clinfo is a very handy utility. A good one can be found at https://github.com/Oblomov/clinfo

2. Next platform that I tried was coprthr from browndeer technology. Quite unpopular though. Code can be found at https://github.com/browndeer/coprthr . It depends on libelf, libconfig and libevent. Download or build it from source. Rest should be fine

portland group HAD provided opencl compilers for arm and other systems some time back. I remember seeing the download  page. Now that the great nvidia has acquired portland group this product is missing from pgi website http://www.pgroup.com/products/pgcl/ . No wonder Linus Torvalds pointed that finger at nvidia :D

Now I need to benchmark these platforms for opencl


Bootloader unlock on samsung arm chromebook


 

Well, one more post on the Chromebook. This Chromebook is so secure that it’s really difficult to break its boot sequence and take control of the bootloader. http://www.chromium.org/chromium-os/developer-information-for-chrome-os-devices/custom-firmware explains the sequence. Some people might have been frustrated with that mandatory key press in developer mode. The Arch Linux guys have found a way to flash nv-uboot (non-verified U-Boot) to the SPI firmware. This way you get a GRUB-like interface to boot the kernel of your choice.

Courtesy http://archlinuxarm.org/forum/viewtopic.php?f=27&t=4016&hilit=protect&start=80#p29341 post in arch forum.

SPI is read-only by default. You can open the chromebook and remove a sticker in the motherboard which makes SPI read-write

Steps:

0. Take a backup of all important data. Anything can go wrong. Have a bootable external mmc ready in case anything does go wrong


1. Open backcover of chromebook. Follow instructions from ifixit http://www.ifixit.com/Teardown/Samsung+Chromebook+11.6+Teardown/12225/2#s45950



2. Next to the USB 3 port lies a round, ring-shaped circuit. Remove the sticker from it, which will unshort the connection. This makes the SPI flash read-write. Make sure there are no traces of metal left around it


3. Boot into Chrome OS, press ctrl+alt+T and get to a sudo prompt
    $flashrom --wp-disable
    $flashrom --wp-status
   

Check that write protection is disabled. If not, go back and clean the circuit; I had to open it twice. Download the nv-uboot image from https://www.dropbox.com/s/6pzvraf3ko14sz9/nv_image-snow.bin.gz (source: Strats’s post on the Arch Linux forum) and gunzip it. You should have a 4 MB bin file. If the downloaded bin is corrupt, you might end up with a bricked device. MD5 of the extracted binary: CA50D23D315F1378B43E4552D8D441AD

    // Take backup and then flash
    $flashrom -p linux_spi:dev=/dev/spidev1.0 -r orig_image-snow.bin
    $flashrom -p linux_spi:dev=/dev/spidev1.0 -w nv_image-snow.bin
    $sync and reboot
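Since a corrupt image can brick the device, it is worth checking the MD5 before running the write command above. Here is a minimal sketch using Python’s standard hashlib module; the function names are made up for illustration, and the file name in the usage comment assumes the extracted image sits in the current directory.

```python
import hashlib

# MD5 of the extracted nv_image-snow.bin as given in step 3, lower-cased.
EXPECTED_MD5 = "ca50d23d315f1378b43e4552d8d441ad"

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_image_intact(path, expected=EXPECTED_MD5):
    """Return True only if the file's MD5 matches the expected digest."""
    return md5_of_file(path) == expected.lower()

# Usage (assumed file location):
#   is_image_intact("nv_image-snow.bin")  -> flash only if this is True
```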
    
4. Press space on reboot to get into uboot prompt. 
    $setenv bootdelay 1
    $saveenv
    
    $vboot_twostop          will boot into chromeos
    
5. Booting custom OS
    format mmc in gpt format. Use cgpt/parted
    create a fat partition for boot, say size 256 MB - mmcblk1p1. Copy vmlinux(or any other kernel) from /boot of chromeos into this partition. Create an ext4 partition for rootfs - mmcblk1p2. copy any linux rootfs. Arch, suse-jeos and fedora worked for me
    Reboot
    
    At uboot prompt

    // choose 2nd mmc device. Internal mmc is dev 0
    $mmc dev 1

    // ls contents in filesystem. I get lot of dcache warnings. Still works
    $fatls mmc 1:1 /

    // load kernel to memory
    $fatload mmc 1:1 ${loadaddr} /vmlinux

   // set kernel boot parameters
    $setenv bootargs console=tty1 root=/dev/mmcblk1p2 rootfstype=ext4 rootwait rw   

    $bootm ${loadaddr}
    
    Now you have the freedom to boot any kernel without signing it.
   
    Disclaimer: TRY AT YOUR OWN RISK


THE samsung arm chromebook


So what is cool about this laptop? Everything. It’s the first commercial ARM laptop, for just $250. It is the top-selling laptop on Amazon http://hothardware.com/News/Samsung-Chromebook-Now-Top-Selling-Laptop-on-Amazon/ and is still on top as of now http://www.amazon.com/Best-Sellers-Electronics-Laptop-Computers/zgbs/electronics/565108/


Coming to specs, it sports an Exynos 5250 with dual-core ARM Cortex-A15 CPUs and a Mali T604 GPU, plus an 11.6” screen and a 16 GB SSD. There are no heat sinks or fans because the SoC hardly draws 5W at full speed, which makes it even lighter

The sad part is that it’s not available here in India. Since I badly wanted to own this device, I got it via eBay. It comes with Chromium OS, which is decent enough for browsing. A lot of enthusiasts out there are trying to get a stable Linux distro running on this device so that they can have a complete development setup. Here are a few of the projects

1. Crouton: Most popular for its ease of use https://github.com/dnschneid/crouton It creates a chroot environment on top of Chrome OS. It doesn’t give you a complete Linux experience

2. Marcin Juszkiewicz was working at Canonical. His most popular post http://marcin.juszkiewicz.com.pl/2013/02/14/how-to-install-ubuntu-13-04-on-chromebook/ explained how to get ubuntu on chromebook. Now that he has left Canonical, he doesn’t expect much support from ubuntu community for this device

3. ArchLinux has excellent support for arm chromebook http://archlinuxarm.org/platforms/armv7/samsung/samsung-chromebook Even the forums are very active

4. Fedora 19 has a remix version for samsung chromebook https://fedoraproject.org/wiki/Architectures/ARM/F19/Remixes#Samsung_Exynos_5_Dual_Core_Cortex_A15 After trying out all other distros I have finally settled with this distro for development

What is the challenge? Although you can boot a mainline kernel, it requires a lot of patches for all the component devices like wifi to work. Google has forked 3.4 and 3.8 and maintains a kernel with all the required patches for Chrome devices http://git.chromium.org/gitweb/?p=chromiumos/third_party/kernel-next.git;a=summary

Although Linux support for this device is far from complete, it’s exciting to work on a gen-next ultra-portable ARM device


Airtel 4G LTE on Linux


I recently purchased Airtel 4G internet for its reasonable cost and high speed. You get a constant download speed of 10 Mbps here in b’lor. My problem with this USB device was that I couldn’t get it working on Linux! Yes, they did provide software for Windows and Linux, but all of it was x86-based. I have this ARM Chromebook, and who is going to provide that piece of software with its connect and disconnect buttons for it!

The modem is basically from ZTE, ZTE MF825A. The documentation provided by zte is (in)dispensable for the fact that this product is not listed anywhere in their website. If you provide a good looking GUI with connect/disconnect button and not say how it works, how can it satisfy a linux guy

OK, so this is what happens. When you connect the device, it is first detected as a USB storage device. Then udev kicks in and mode-switches it, changing the USB product ID from 1225 to 1408. The cdc_ether module picks up this device and creates an ethernet connection. Check the dmesg log below

[ 104.025283] usb 2-1: new high-speed USB device number 4 using s5p-ehci
[ 104.183421] usb 2-1: New USB device found, idVendor=19d2, idProduct=1225
[ 104.183498] usb 2-1: New USB device strings: Mfr=2, Product=3, SerialNumber=4
[ 104.183562] usb 2-1: Product: ZTE WCDMA Technologies MSM
[ 104.183616] usb 2-1: Manufacturer: ZTE,Incorporated
[ 104.183667] usb 2-1: SerialNumber: MF8250ZTED000000
[ 104.327876] usbcore: registered new interface driver libusual
[ 104.380155] Initializing USB Mass Storage driver...
[ 104.383780] scsi0 : usb-storage 2-1:1.0
[ 104.388458] usbcore: registered new interface driver usb-storage
[ 104.388518] USB Mass Storage support registered.
[ 105.395342] scsi 0:0:0:0: CD-ROM CWID USB SCSI CD-ROM 2.31 PQ: 0 ANSI: 2
[ 105.407270] scsi 0:0:0:1: Direct-Access ZTE MMC Storage 2.31 PQ: 0 ANSI: 2
[ 105.463160] sd 0:0:0:1: [sda] Attached SCSI removable disk
[ 105.553641] sr0: scsi-1 drive
[ 105.553668] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 105.554806] sr 0:0:0:0: Attached scsi CD-ROM sr0
[ 110.415922] usb 2-1: USB disconnect, device number 4
[ 110.790178] usb 2-1: new high-speed USB device number 5 using s5p-ehci
[ 110.948539] usb 2-1: New USB device found, idVendor=19d2, idProduct=1408
[ 110.948614] usb 2-1: New USB device strings: Mfr=2, Product=3, SerialNumber=4
[ 110.948677] usb 2-1: Product: ZTE WCDMA Technologies MSM
[ 110.948730] usb 2-1: Manufacturer: ZTE,Incorporated
[ 110.948780] usb 2-1: SerialNumber: MF8250ZTED000000
[ 110.976822] scsi1 : usb-storage 2-1:1.2
[ 111.139335] cdc_ether 2-1:1.0: eth0: register 'cdc_ether' at usb-s5p-ehci-1, CDC Ethernet Device, 34:4b:50:b6:ab:3c
[ 111.141926] usbcore: registered new interface driver cdc_ether

and ifconfig

eth0 Link encap:Ethernet HWaddr 34:4b:50:b6:ab:3c
inet addr:192.168.0.144 Bcast:192.168.0.255 Mask:255.255.255.0
………………………………………………………………..
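The mode switch in the dmesg log can also be spotted programmatically. This is a small sketch that pulls the vendor and product IDs out of the "New USB device found" lines with a regular expression; the two sample lines are copied from the log above, and the helper name `usb_ids` is hypothetical.

```python
import re

# Matches e.g. "idVendor=19d2, idProduct=1225" in kernel log lines.
USB_ID = re.compile(r"idVendor=([0-9a-f]{4}), idProduct=([0-9a-f]{4})")

def usb_ids(dmesg_text):
    """Return (vendor, product) pairs in the order they appear in the log."""
    return USB_ID.findall(dmesg_text)

log = """\
[ 104.183421] usb 2-1: New USB device found, idVendor=19d2, idProduct=1225
[ 110.948539] usb 2-1: New USB device found, idVendor=19d2, idProduct=1408
"""

ids = usb_ids(log)
# Same vendor, different product ID: the dongle re-enumerated after the switch.
switched = len(ids) >= 2 and ids[0][0] == ids[-1][0] and ids[0][1] != ids[-1][1]
print(ids, "mode switched:", switched)
```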


Although an internal IP is assigned, there is no internet at this point. What surprised me was: how can a USB modem get an ethernet connection? How? I went to Linux and Ubuntu forums, no help! I assumed that the kernel had wrongly detected this device, so I forced the QMI module to load its driver for this device by editing the kernel. Although it loaded the QMI driver, /dev/cdc-wdm0 was not usable. Finally I contacted Bjorn (author of qmi_wwan.c in the Linux tree). After looking at the logs, he told me that this device is probably not a QMI device. He asked me to stick with the cdc_ether connection and open the gateway IP (192.168.0.1) in the browser. “Page not found”!

My last resort was to go to a Linux x86 machine and trace what that connect/disconnect ‘GUI’ was doing using Wireshark. Take it. So simple. The Connect button calls http://192.168.0.1/goform/goform_set_cmd_process using the POST method with goformId=CONNECT_NETWORK as the parameter. That’s all, and you are connected to the internet. Here is a small HTML snippet that imitates the GUI

<form action="http://192.168.0.1/goform/goform_set_cmd_process" method="post">
  <input type="hidden" name="goformId" value="CONNECT_NETWORK">
  <button name="submit" type="submit">Connect</button>
</form>
<form action="http://192.168.0.1/goform/goform_set_cmd_process" method="post">
  <input type="hidden" name="goformId" value="DISCONNECT_NETWORK">
  <button name="submit" type="submit">Disconnect</button>
</form>

 The green led blinks after connecting to internet

Above html snippet can be downloaded from https://drive.google.com/file/d/0ByAtbw9O9wYyc2JCbzVwa2RZNTQ/edit?usp=sharing
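If you would rather script it than click a button, the same POST can be sent from Python’s standard library. This is only a sketch: the URL and the goformId values come from the Wireshark trace described above, while the function names are made up, and I am assuming the modem does not need the extra `submit` field that the HTML form also sends.

```python
from urllib import parse, request

MODEM_URL = "http://192.168.0.1/goform/goform_set_cmd_process"

def build_command(goform_id):
    """Build the POST request the connect/disconnect buttons send."""
    body = parse.urlencode({"goformId": goform_id}).encode("ascii")
    return request.Request(MODEM_URL, data=body, method="POST")

def send_command(goform_id, timeout=10):
    """Send the command to the modem; returns (HTTP status, response text)."""
    with request.urlopen(build_command(goform_id), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")

# Usage, with the dongle plugged in and eth0 up:
#   send_command("CONNECT_NETWORK")
#   send_command("DISCONNECT_NETWORK")
```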

Update 1:

Good news. This method works on android too. I just received my new Moto G and it works

First turn off edge/hspa/wifi, or in simple terms make sure no IP is assigned to the system. Connect the dongle to the phone using an OTG cable. Wait for some time until the red LED turns green. This means the kernel has detected the device. Go to Settings->About->Status and check the IP address. If a 192.168.0.144/192.168.0.145 IP is assigned, you are good to go. Restart once if you don’t get the IP.

Download above html page and open it in chrome (default html viewer doesn’t work) file:///sdcard/path/to/html/file and connect

4G internet on a 3G phone :)



Hello World


And my long-awaited blog is finally up. Ah, what a relief. I had so many things running through my mind (ya, technical) which I wanted to share. Many experiments have gone undocumented

So what am I up to? Basically I am an OpenCL guy, a huge fan, and recently into OpenCL and GPGPU stuff on ARM mobiles. Most of my posts will be based on this. And of course THE Samsung Chromebook - “The first ARM laptop in the world”. The very nature of it being an ARM laptop is exciting. Posts on this will surely come. Apart from OpenCL, I am a huge fan of Linux too

So here starts my blog. Thank you for reading